1.16 Wide vs. Long data
Read more on tidy data here: https://r4ds.hadley.nz/data-tidy
The data on lung function were originally recorded in wide format, with separate variables for the mother’s and father’s FEV1 scores (MFEV1 and FFEV1). In this format, the data contain one record per family.
fev <- read.delim("https://norcalbiostat.netlify.com/data/Lung_081217.txt",
sep="\t", header=TRUE)
head(fev)
## ID AREA FSEX FAGE FHEIGHT FWEIGHT FFVC FFEV1 MSEX MAGE MHEIGHT MWEIGHT MFVC
## 1 1 1 1 53 61 161 391 3.23 2 43 62 136 370
## 2 2 1 1 40 72 198 441 3.95 2 38 66 160 411
## 3 3 1 1 26 69 210 445 3.47 2 27 59 114 309
## 4 4 1 1 34 68 187 433 3.74 2 36 58 123 265
## 5 5 1 1 46 61 121 354 2.90 2 39 62 128 245
## 6 6 1 1 44 72 153 610 4.91 2 36 66 125 349
## MFEV1 OCSEX OCAGE OCHEIGHT OCWEIGHT OCFVC OCFEV1 MCSEX MCAGE MCHEIGHT
## 1 3.31 2 12 59 115 296 2.79 NA NA NA
## 2 3.47 1 10 56 66 323 2.39 NA NA NA
## 3 2.65 1 8 50 59 114 1.11 NA NA NA
## 4 2.06 2 11 57 106 256 1.85 1 9 49
## 5 2.33 1 16 61 88 260 2.47 2 12 60
## 6 3.06 1 15 67 100 389 3.55 1 13 57
## MCWEIGHT MCFVC MCFEV1 YCSEX YCAGE YCHEIGHT YCWEIGHT YCFVC YCFEV1
## 1 NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA
## 3 NA NA NA NA NA NA NA NA NA
## 4 56 159 1.30 NA NA NA NA NA NA
## 5 85 268 2.34 2 10 50 53 154 1.43
## 6 87 276 2.37 2 10 55 72 195 1.69
To analyze the effect of gender on FEV1, the data need to be in long format, with a single variable for FEV1 and a separate variable for gender. The following code chunk demonstrates one method of combining the data on height, gender, age and FEV1 for fathers and mothers, so that each row is one parent rather than one family.
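# Stack the father's columns on top of the mother's columns: one row per parent
fev2 <- data.frame(gender = c(fev$FSEX, fev$MSEX),
                   rev    = c(fev$FFEV1, fev$MFEV1),    # rev holds the FEV1 score
                   ht     = c(fev$FHEIGHT, fev$MHEIGHT),
                   age    = c(fev$FAGE, fev$MAGE))
# FSEX is coded 1 and MSEX is coded 2, so level 1 becomes "M" and level 2 becomes "F"
fev2$gender <- factor(fev2$gender, labels=c("M", "F"))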
head(fev2)
## gender rev ht age
## 1 M 3.23 61 53
## 2 M 3.95 72 40
## 3 M 3.47 69 26
## 4 M 3.74 68 34
## 5 M 2.90 61 46
## 6 M 4.91 72 44
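Once the data are in long format, a comparison by gender can be written as a single formula. The following is a minimal sketch, not part of the original analysis: it uses only the first three fathers and mothers, typed in by hand from the head(fev) output above so that it runs without downloading the file. The name fev_small is made up for this illustration.

```r
# First three fathers (coded 1) and mothers (coded 2), copied from head(fev) above
fev_small <- data.frame(
  gender = factor(c(1, 1, 1, 2, 2, 2), labels = c("M", "F")),
  rev    = c(3.23, 3.95, 3.47, 3.31, 3.47, 2.65)  # FFEV1 values, then MFEV1 values
)

# Mean FEV1 for each gender; the formula works because FEV1 is a single column
means <- aggregate(rev ~ gender, data = fev_small, FUN = mean)
means
```

In the wide layout the same comparison would require naming FFEV1 and MFEV1 separately, which is why the formula interface expects long data.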
Nearly all analysis procedures and most graphing procedures require the data to be in long format. Several R packages can help with this reshaping, including reshape2 and tidyr.
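As one possible illustration of the tidyr approach, the sketch below reshapes the parent columns with pivot_longer(), which is available in tidyr 1.0.0 and later. It is not the method used above. The data frame fev_wide is a hand-typed copy of the first three families from the head(fev) output, and the regular expression in names_pattern is an assumption based on the column names shown there (an F or M prefix followed by the measurement name). For the full file, only the father and mother columns would be passed to cols, since the child columns do not match this pattern.

```r
library(tidyr)

# First three families, typed in from the head(fev) output above
fev_wide <- data.frame(
  ID      = 1:3,
  FSEX    = c(1, 1, 1),
  FAGE    = c(53, 40, 26),
  FHEIGHT = c(61, 72, 69),
  FFEV1   = c(3.23, 3.95, 3.47),
  MSEX    = c(2, 2, 2),
  MAGE    = c(43, 38, 27),
  MHEIGHT = c(62, 66, 59),
  MFEV1   = c(3.31, 3.47, 2.65)
)

# Split each column name into a parent prefix (F or M) and a measurement name.
# The special name ".value" keeps the measurement names as output columns.
fev_long <- pivot_longer(
  fev_wide,
  cols          = -ID,
  names_to      = c("parent", ".value"),
  names_pattern = "^([FM])(SEX|AGE|HEIGHT|FEV1)$"
)

# Same labeling step as for fev2: code 1 becomes "M", code 2 becomes "F"
fev_long$SEX <- factor(fev_long$SEX, labels = c("M", "F"))
fev_long
```

The result has one row per parent, with the family ID retained, so the two parents in a family can still be linked after reshaping.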