Repeated-measures ANOVA

Repeated-measures designs, in which the same participants are measured multiple times, are often used in psychology. One popular example is the longitudinal design, in which the same participants are followed and measured over time. Another example is the cross-over study, in which participants receive a sequence of different treatments. To analyze such data, repeated-measures ANOVA can be used.

An example

The ACTIVE study measured verbal ability with the Hopkins Verbal Learning Test at 4 time points, so the data are repeated measures. Using the data as an example, we try to answer the following questions:

  • Whether there is any difference in verbal ability among the 4 measurement occasions. In this case, time is a within-subject factor.
  • Whether there is any difference between male and female participants. In this case, sex is a between-subject factor. 
  • Whether there is an interaction effect between time and sex.

Long format of data

Repeated-measures ANOVA in R requires the data to be in long format. The current data are in wide format, in which the hvltt scores at each time are stored as a separate variable (column) in the data frame. For the long format, we need to stack the scores from all times into a single vector. To reshape the data, the function melt() from the R package reshape2 can be used. Specifically, for the ACTIVE data, the following code can be used.

> library(reshape2)
> 
> usedata('active')
> head(active)
  site age edu group booster sex reason ufov hvltt hvltt2 hvltt3 hvltt4 mmse id
1    1  76  12     1       1   1     28   16    28     28     17     22   27  1
2    1  67  10     1       1   2     13   20    24     22     20     27   25  2
3    6  67  13     3       1   2     24   16    24     24     28     27   27  3
4    5  72  16     1       1   2     33   16    35     34     32     34   30  4
5    4  69  12     4       0   2     30   16    35     29     34     34   28  5
6    1  70  13     1       1   1     35   23    29     27     26     29   23  6
> 
> active_long <- melt(active,
+     id.vars=c("id", "sex"),
+     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
+     variable.name="time",
+     value.name="hvltt"
+ )
> 
> head(active_long)
  id sex  time hvltt
1  1   1 hvltt    28
2  2   2 hvltt    24
3  3   2 hvltt    24
4  4   2 hvltt    35
5  5   2 hvltt    35
6  6   1 hvltt    29
> 

Note that in the melt() function,

  • The first input is the data set name.
  • id.vars provides a vector of variables that will be repeated (not stacked) in the new long-format data set.
  • measure.vars specifies the variables in the wide format that will be stacked in the long format.
  • variable.name gives the name of the new variable that records which wide-format column each value came from. In this example, that is time.
  • value.name gives the name of the new variable that holds the stacked values. In this example, that is hvltt.
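
To make the roles of id.vars and measure.vars concrete, the sketch below reproduces the stacking by hand in base R on a small made-up data set. The data frame wide and its values are hypothetical and are not taken from the ACTIVE data. The code only illustrates the resulting layout; it is not how melt() is implemented.

```r
# Hypothetical wide-format data: 3 participants, 2 measurement occasions
wide <- data.frame(
  id     = 1:3,
  sex    = c(1, 2, 2),
  hvltt  = c(28, 24, 24),
  hvltt2 = c(28, 22, 24)
)

# Long format by hand:
#  - id and sex (the id.vars) are repeated once per occasion
#  - hvltt and hvltt2 (the measure.vars) are stacked into one column
#  - time records which wide-format column each value came from
long <- data.frame(
  id    = rep(wide$id, times = 2),
  sex   = rep(wide$sex, times = 2),
  time  = factor(rep(c("hvltt", "hvltt2"), each = nrow(wide))),
  hvltt = c(wide$hvltt, wide$hvltt2)
)

long
```

Each id and sex value appears once per occasion, while the scores from the two wide-format columns end up in the single hvltt column.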

One within-subject factor

We first consider just one within-subject factor, time, to evaluate whether there is any difference in verbal ability across the 4 measurement occasions. To answer the question, the following analysis can be conducted.

> library(reshape2)
> usedata('active')
> 
> active_long <- melt(active,
+     id.vars=c("id", "sex"),
+     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
+     variable.name="time",
+     value.name="hvltt"
+ )
> 
> ex1<-aov(hvltt~time+Error(id/time), data=active_long)
> summary(ex1)

Error: id
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  1 0.3716  0.3716               

Error: id:time
     Df Sum Sq Mean Sq
time  3   3800    1267

Error: Within
            Df Sum Sq Mean Sq F value   Pr(>F)    
time         3   1112   370.7   13.12 1.54e-08 ***
Residuals 6292 177742    28.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 

To evaluate the effect of "time", we check the output in the "Error: Within" section. For the F value of 13.12 with 3 and 6292 degrees of freedom, the p-value is 1.54e-08. Therefore, there is a significant difference in verbal ability among the 4 measurement occasions.
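
As a quick check of how the reported numbers fit together, the F value is the ratio of the two mean squares in the "Error: Within" section, and the p-value is the upper tail of the corresponding F distribution. The sketch below uses the rounded sums of squares printed above, so it matches the output only up to rounding. The variable names are illustrative.

```r
# Values copied from the "Error: Within" section of the output
ss_time  <- 1112      # Sum Sq for time
df_time  <- 3         # Df for time
ss_resid <- 177742    # Sum Sq for Residuals
df_resid <- 6292      # Df for Residuals

# Mean squares and their ratio (the F value)
ms_time  <- ss_time / df_time
ms_resid <- ss_resid / df_resid
f_value  <- ms_time / ms_resid

# Upper-tail probability of the F distribution (the p-value)
p_value <- pf(f_value, df1 = df_time, df2 = df_resid, lower.tail = FALSE)

f_value
p_value
```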

Note that the R function aov() is used to conduct the repeated-measures ANOVA. The function uses a model formula to specify the model. In this case, the outcome is hvltt and the predictor is time. The term Error(id/time) tells aov() that time is nested within participants (id), so the error variance is divided into separate strata (the "Error:" sections of the output), which takes the repeated measures into account.
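
In the output shown above, the "Error: id" stratum has a single degree of freedom, which is what aov() produces when id is stored as a number rather than as a factor. The sketch below fits the same kind of model on simulated data after converting id with factor(). The data frame sim, the variable score, and all simulated values are hypothetical; this is not the ACTIVE data and not the analysis reported above.

```r
set.seed(1)

n_id  <- 20                          # hypothetical number of participants
times <- c("t1", "t2", "t3", "t4")   # four measurement occasions

# One row per participant and occasion (long format)
sim <- expand.grid(id = seq_len(n_id), time = times)

# Simulated scores: a participant-specific effect plus random noise
subject_effect <- rnorm(n_id, sd = 3)
sim$score <- 25 + subject_effect[sim$id] + rnorm(nrow(sim), sd = 2)

# Store id and time as factors so that each participant is its own level
sim$id   <- factor(sim$id)
sim$time <- factor(sim$time)

fit <- aov(score ~ time + Error(id/time), data = sim)
summary(fit)
```

With id as a factor, the "Error: id" stratum has one fewer degree of freedom than the number of participants, and the test of time is reported in the "Error: id:time" stratum.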

One within-subject factor and one between-subject factor

As an example, we consider an additional between-subject factor, sex. With the two factors, we can also test the interaction effect between time and sex. The code below can be used to conduct the analysis.

> library(reshape2)
> usedata('active')
> 
> active_long <- melt(active,
+     id.vars=c("id", "sex"),
+     measure.vars=c("hvltt", "hvltt2", "hvltt3", "hvltt4"),
+     variable.name="time",
+     value.name="hvltt"
+ )
> 
> ex2<-aov(hvltt~time*sex+Error(id/(time*sex)), data=active_long)
> summary(ex2)

Error: id
    Df Sum Sq Mean Sq
sex  1 0.3716  0.3716

Error: id:time
     Df Sum Sq Mean Sq
time  3   3800    1267

Error: id:sex
    Df Sum Sq Mean Sq
sex  1   5020    5020

Error: id:time:sex
     Df Sum Sq Mean Sq
time  3  204.9   68.29

Error: Within
            Df Sum Sq Mean Sq F value   Pr(>F)    
time         3   1091   363.8  13.383 1.05e-08 ***
sex          1   1592  1592.1  58.564 2.26e-14 ***
time:sex     3    116    38.6   1.419    0.235    
Residuals 6284 170830    27.2                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 

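Based on the "Error: Within" section of the output, the effects of time (F = 13.383, p = 1.05e-08) and sex (F = 58.564, p = 2.26e-14) are significant, but the interaction between time and sex is not (F = 1.419, p = 0.235).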
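
A common way to specify this kind of design in aov() is to store id and sex as factors and to put only the within-subject factor inside Error(), since sex does not vary within a participant. The sketch below does this on simulated data. The data frame sim2, the variable score, and all simulated values are hypothetical; this is not the ACTIVE data and not the analysis reported above.

```r
set.seed(2)

n_id  <- 20                          # hypothetical number of participants
times <- c("t1", "t2", "t3", "t4")   # four measurement occasions

# One row per participant and occasion (long format)
sim2 <- expand.grid(id = seq_len(n_id), time = times)

# Between-subject factor: each participant has one fixed sex
sex_by_id <- rep(c("male", "female"), length.out = n_id)
sim2$sex  <- factor(sex_by_id[sim2$id])

# Simulated scores: a participant-specific effect plus random noise
subject_effect <- rnorm(n_id, sd = 3)
sim2$score <- 25 + subject_effect[sim2$id] + rnorm(nrow(sim2), sd = 2)

# Store id and time as factors
sim2$id   <- factor(sim2$id)
sim2$time <- factor(sim2$time)

fit2 <- aov(score ~ time * sex + Error(id/time), data = sim2)
summary(fit2)
```

In this specification, sex is tested in the "Error: id" stratum (between participants), while time and the time-by-sex interaction are tested in the "Error: id:time" stratum (within participants).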

To cite the book, use: Zhang, Z. & Wang, L. (2017-2025). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.