Repeated-measures ANOVA
Repeated-measures designs, in which the same participants are measured multiple times, are often used in psychology. One popular example is the longitudinal design, in which the same participants are followed and measured over time. Another example is the cross-over study, in which participants receive a sequence of different treatments. Such data can be analyzed with repeated-measures ANOVA.
An example
The ACTIVE study measured verbal ability with the Hopkins Verbal Learning Test on 4 occasions, so the data are repeated measures. Using these data as an example, we try to answer the following questions:
- Whether there is any difference in verbal ability across the 4 measurement occasions. In this case, time is a within-subject factor.
- Whether there is any difference in verbal ability between male and female participants. In this case, sex is a between-subject factor.
- Whether there is an interaction effect between time and sex.
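One conventional way to write the three questions is through the usual mixed-design (split-plot) ANOVA model with sum-to-zero constraints on the fixed effects. The notation here is ours, not the source's: $y_{ijk}$ is the hvltt score of participant $i$ in sex group $k$ at time $j$, $\alpha_j$ is the time effect, $\beta_k$ the sex effect, $(\alpha\beta)_{jk}$ their interaction, and $\pi_{i(k)}$ a random participant effect nested within sex.

```latex
\begin{aligned}
y_{ijk} &= \mu + \alpha_j + \beta_k + (\alpha\beta)_{jk} + \pi_{i(k)} + \varepsilon_{ijk},
  \qquad j = 1,\dots,4,\; k = 1, 2,\\
\pi_{i(k)} &\sim N(0, \sigma_\pi^2), \qquad \varepsilon_{ijk} \sim N(0, \sigma^2),\\
\text{time: } H_0 &: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0,\\
\text{sex: } H_0 &: \beta_1 = \beta_2 = 0,\\
\text{interaction: } H_0 &: (\alpha\beta)_{jk} = 0 \text{ for all } j, k.
\end{aligned}
```

Each question corresponds to one of the three null hypotheses in the last three lines.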
Long format of data
Repeated-measures ANOVA in R requires the data in long format. The current data are in wide format, in which the hvltt data at each time are stored as a separate variable (column) of the data frame. In long format, the repeated measurements are stacked into a single column, so that each row holds one measurement of one participant at one time. To reshape the data, the function melt() from the R package reshape2 can be used. Specifically, for the ACTIVE data, the following code can be used.
    > library(reshape2)
    >
    > usedata('active')
    > head(active)
      site age edu group booster sex reason ufov hvltt hvltt2 hvltt3 hvltt4 mmse id
    1    1  76  12     1       1   1     28   16    28     28     17     22   27  1
    2    1  67  10     1       1   2     13   20    24     22     20     27   25  2
    3    6  67  13     3       1   2     24   16    24     24     28     27   27  3
    4    5  72  16     1       1   2     33   16    35     34     32     34   30  4
    5    4  69  12     4       0   2     30   16    35     29     34     34   28  5
    6    1  70  13     1       1   1     35   23    29     27     26     29   23  6
    >
    > active_long <- melt(active,
    +                     id.vars=c("id", "sex"),
    +                     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
    +                     variable.name="time",
    +                     value.name="hvltt"
    +                     )
    >
    > head(active_long)
      id sex  time hvltt
    1  1   1 hvltt    28
    2  2   2 hvltt    24
    3  3   2 hvltt    24
    4  4   2 hvltt    35
    5  5   2 hvltt    35
    6  6   1 hvltt    29
Note that in the melt() function,
- The first input is the data frame to be reshaped.
- id.vars provides a vector of variables that will be repeated (not stacked) in the new long-format data set.
- measure.vars lists the variables in the wide format to be stacked in the long format.
- variable.name gives the name of the new variable that records which wide-format column each value came from. In this example, that is time, whose levels are hvltt, hvltt2, hvltt3, and hvltt4.
- value.name gives the name of the new variable that holds the stacked values. In this example, that is the hvltt data.
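To make the reshaping concrete without depending on reshape2, the base-R sketch below builds the same long layout by hand from the first three rows of the ACTIVE data printed above (keeping only id, sex, and the four hvltt columns). It illustrates what melt() does with these arguments; it is not melt()'s actual implementation, and the names wide, long, and measure_vars are hypothetical.

```r
# First three rows of the ACTIVE data printed above (id, sex, and the
# four hvltt columns only); a toy subset, not the full data set.
wide <- data.frame(
  id     = 1:3,
  sex    = c(1, 2, 2),
  hvltt  = c(28, 24, 24),
  hvltt2 = c(28, 22, 24),
  hvltt3 = c(17, 20, 28),
  hvltt4 = c(22, 27, 27)
)

measure_vars <- c("hvltt", "hvltt2", "hvltt3", "hvltt4")
k <- length(measure_vars)

# id.vars: id and sex are repeated once per measured column (not stacked).
# variable.name: a factor recording which wide column each value came from.
# value.name: the scores stacked column after column.
long <- data.frame(
  id    = rep(wide$id, times = k),
  sex   = rep(wide$sex, times = k),
  time  = factor(rep(measure_vars, each = nrow(wide)), levels = measure_vars),
  hvltt = unlist(wide[measure_vars], use.names = FALSE)
)

print(long)
```

The first three rows of long match the first three rows of head(active_long) above, and the next three rows hold the hvltt2 scores of the same participants.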
One within-subject factor
We first consider just one within-subject factor, time, to evaluate whether verbal ability differs across the 4 measurement occasions. To answer the question, the following analysis can be conducted.
    > library(reshape2)
    > usedata('active')
    >
    > active_long <- melt(active,
    +                     id.vars=c("id", "sex"),
    +                     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
    +                     variable.name="time",
    +                     value.name="hvltt"
    +                     )
    >
    > ex1<-aov(hvltt~time+Error(id/time), data=active_long)
    > summary(ex1)

    Error: id
              Df Sum Sq Mean Sq F value Pr(>F)
    Residuals  1 0.3716  0.3716

    Error: id:time
         Df Sum Sq Mean Sq
    time  3   3800    1267

    Error: Within
                Df Sum Sq Mean Sq F value   Pr(>F)
    time         3   1112   370.7   13.12 1.54e-08 ***
    Residuals 6292 177742    28.2
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
To evaluate the effect of time, we check the "Error: Within" section of the output. The F value is 13.12 with 3 and 6292 degrees of freedom, giving a p-value of 1.54e-08. Therefore, verbal ability differs significantly across the 4 measurement occasions.
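The reported F value and p-value can be reproduced, up to rounding, from the "Error: Within" table. A small sketch using base R's pf(), with the mean squares and degrees of freedom copied by hand from the output above (the variable names are hypothetical):

```r
# Numbers copied from the "Error: Within" table of summary(ex1) above.
ms_time  <- 370.7    # Mean Sq for time
ms_resid <- 28.2     # Mean Sq for Residuals
df_time  <- 3
df_resid <- 6292

# F is the ratio of the two mean squares; the rounded mean squares give
# about 13.15, close to the printed 13.12 computed from unrounded values.
f_ratio <- ms_time / ms_resid

# The p-value is the upper-tail area of the F(3, 6292) distribution
# beyond the printed F value.
p_value <- pf(13.12, df1 = df_time, df2 = df_resid, lower.tail = FALSE)

print(round(f_ratio, 2))
print(signif(p_value, 3))
```

The p-value computed this way should come out close to the printed 1.54e-08; small differences reflect rounding of the printed F value.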
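Note that the repeated-measures ANOVA is conducted with the R function aov(), which specifies the model with an R model formula. In this case, the outcome is hvltt and the predictor is time. The term Error(id/time), which expands to id + id:time, partitions the error variance into strata: variation between participants (id) and variation within participants across occasions (id:time). Each effect is then tested against the error term of its own stratum, which is how the repeated measures are taken into account. For this partition to work as intended, id must be a factor, e.g., after active_long$id <- factor(active_long$id). When id is stored as a number, as in the output above, aov() treats it as a single numeric covariate, which is why the Error: id stratum shows only 1 degree of freedom rather than one fewer than the number of participants.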
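A minimal sketch of the intended stratification, using simulated data rather than the ACTIVE data (the number of participants, the score model, and the seed are all made up for illustration), with id stored as a factor. With one score per participant per occasion, summary() should show two strata: Error: id, holding the between-participant variation, and Error: id:time, where time is tested against the participant-by-time residual.

```r
set.seed(123)

n_id  <- 30                                 # hypothetical number of participants
times <- c("hvltt", "hvltt2", "hvltt3", "hvltt4")

# Long-format layout: one row per participant per occasion, id as a factor.
sim <- data.frame(
  id   = factor(rep(seq_len(n_id), times = length(times))),
  time = factor(rep(times, each = n_id), levels = times)
)

# Simulated scores: a person-specific level, a small time trend, and noise.
person_effect <- rnorm(n_id, mean = 0, sd = 4)
time_effect   <- c(0, -1, -1.5, -2)
sim$hvltt <- 26 +
  person_effect[as.integer(sim$id)] +
  time_effect[as.integer(sim$time)] +
  rnorm(nrow(sim), sd = 3)

# Same model formula as ex1, but with a factor id.
fit <- aov(hvltt ~ time + Error(id/time), data = sim)
summary(fit)
```

With 30 simulated participants, the Error: id stratum has 29 degrees of freedom, and the Error: id:time stratum splits its 90 degrees of freedom into 3 for time and 87 for the residual.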
One within-subject factor and one between-subject factor
As an example, we consider an additional between-subject factor, sex. With the two factors, we can also test the interaction effect between time and sex. The code below can be used to conduct the analysis.
    > library(reshape2)
    > usedata('active')
    >
    > active_long <- melt(active,
    +                     id.vars=c("id", "sex"),
    +                     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
    +                     variable.name="time",
    +                     value.name="hvltt"
    +                     )
    >
    > ex2<-aov(hvltt~time*sex+Error(id/(time*sex)), data=active_long)
    > summary(ex2)

    Error: id
        Df Sum Sq Mean Sq
    sex  1 0.3716  0.3716

    Error: id:time
         Df Sum Sq Mean Sq
    time  3   3800    1267

    Error: id:sex
        Df Sum Sq Mean Sq
    sex  1   5020    5020

    Error: id:time:sex
         Df Sum Sq Mean Sq
    time  3  204.9   68.29

    Error: Within
                Df Sum Sq Mean Sq F value   Pr(>F)
    time         3   1091   363.8  13.383 1.05e-08 ***
    sex          1   1592  1592.1  58.564 2.26e-14 ***
    time:sex     3    116    38.6   1.419    0.235
    Residuals 6284 170830    27.2
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
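Based on the "Error: Within" section of the output, the main effects of time (F = 13.383, p = 1.05e-08) and sex (F = 58.564, p = 2.26e-14) are significant, but the time-by-sex interaction is not (F = 1.419, p = 0.235). Because sex varies between participants rather than within them, the more conventional specification keeps it out of the error term, i.e., aov(hvltt ~ time*sex + Error(id/time)) with id as a factor, so that sex is tested in the Error: id stratum against between-participant variation.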
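The same idea extends to the mixed design. Below is a hedged sketch on simulated data (participant count, effect sizes, and seed are made up) in which sex is kept out of the error term, i.e., Error(id/time), so that sex is tested in the between-participant stratum while time and the time-by-sex interaction are tested in the within-participant stratum.

```r
set.seed(2024)

n_id  <- 40                                   # hypothetical number of participants
times <- c("hvltt", "hvltt2", "hvltt3", "hvltt4")
sex_per_person <- rep(1:2, length.out = n_id) # sex is constant within a participant

# Long-format layout with id, sex, and time as factors.
sim <- data.frame(
  id   = factor(rep(seq_len(n_id), times = length(times))),
  sex  = factor(rep(sex_per_person, times = length(times))),
  time = factor(rep(times, each = n_id), levels = times)
)

# Simulated scores: person level, a sex shift, a time trend, and noise.
person_effect <- rnorm(n_id, sd = 4)
sim$hvltt <- 26 +
  person_effect[as.integer(sim$id)] +
  ifelse(sim$sex == "2", 1.5, 0) +
  c(0, -1, -1.5, -2)[as.integer(sim$time)] +
  rnorm(nrow(sim), sd = 3)

# Between-subject factor sex is not part of the Error() term.
fit2 <- aov(hvltt ~ time * sex + Error(id/time), data = sim)
summary(fit2)
```

With 40 simulated participants split evenly by sex, the Error: id stratum should show sex with 1 df and a residual with 38 df, and the Error: id:time stratum should show time (3 df), time:sex (3 df), and a residual with 114 df.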
To cite the book, use:
Zhang, Z. & Wang, L. (2017-2025). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.