One-way ANOVA
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences between group means, developed by R.A. Fisher (Fisher, 1925). In ANOVA, the observed variance in a particular variable, usually an outcome variable, is partitioned into components attributable to different sources of variation: typically the between-group variation and the within-group variation. Simply put, ANOVA provides a statistical test of whether the means of several groups are equal, and therefore generalizes the t-test to more than two groups.
One-way ANOVA
One-way ANOVA typically evaluates whether there is a difference in the means of three or more groups, although it also works for a two-group analysis. In other words, it investigates the effect of a grouping variable on the outcome variable.
In the ACTIVE data, we have 4 groups: one control group and three training groups (memory, reasoning, and speed training). As an example, we test whether the four groups differ in memory performance measured by the Hopkins Verbal Learning Test, denoted by the hvltt2 variable in the data set. One-way ANOVA can be used for this purpose.
Specifically for this example, the null and alternative hypotheses are
\[ H_{0}:\:\mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}\]
\[H_{a}:\;H_{0}\,\mbox{is not true or at least two groups have different means.}\]
For the hypothesis testing, an F-test is used. The basic idea is to decompose the total variation of the outcome variable into the variation due to the differences between groups and the variation within groups. The variation is measured by sums of squares. For one-way ANOVA, we have \[ SST = SSB + SSW \] where
- SST is the total sum of squares.
- SSB is the between-group sum of squares.
- SSW is the within-group sum of squares.
Then the F-statistic is \[ F=\frac{SSB/df_B}{SSW/df_W} = \frac{MSB}{MSW}\]
- $df_B$ is the between-group degrees of freedom, which is equal to the number of groups - 1. For the ACTIVE example, it is $4-1=3$.
- $df_W$ is the within-group degrees of freedom, which is equal to the total sample size - the number of groups. Since the sample size is 1575, $df_W = 1575-4 = 1571$.
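To make the decomposition concrete, the quantities above can be computed directly from the data. The following is a minimal sketch (not part of the original example), assuming the active data set has been loaded and attached as in the code later in this section; it should reproduce the F value reported by anova() below.

# Compute the ANOVA decomposition by hand for the ACTIVE data
grand.mean <- mean(hvltt2)
group.mean <- tapply(hvltt2, group, mean)    # mean of each group
group.size <- tapply(hvltt2, group, length)  # size of each group

SSB <- sum(group.size * (group.mean - grand.mean)^2)      # between-group sum of squares
SSW <- sum((hvltt2 - group.mean[as.character(group)])^2)  # within-group sum of squares

df.B <- length(group.mean) - 1               # 4 - 1 = 3
df.W <- length(hvltt2) - length(group.mean)  # 1575 - 4 = 1571

F.stat <- (SSB / df.B) / (SSW / df.W)        # should match the F value from anova()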
If the null hypothesis is true, the F-statistic should follow an F distribution with degrees of freedom $df_B$ and $df_W$ under certain conditions. The R code below conducts the one-way ANOVA for the ACTIVE data.
- ANOVA in R is based on linear regression. Therefore, the model is fitted using the function lm(). The function anova() is then used to construct the ANOVA source-of-variation table. In the table, the sum of squares (Sum Sq), mean sum of squares (Mean Sq), degrees of freedom (Df), F value, and p-value (Pr(>F)) are included.
- In the data set, the group variable is coded using numeric values. To conduct ANOVA, one first needs to convert it to a grouping variable, which is done using the function factor().
- Specifically for this example, we have F = 10.051. Comparing it to an F distribution with degrees of freedom (3, 1571), we obtain the p-value 1.466e-06. Therefore, one would reject the null hypothesis: memory performance differed significantly across the 4 groups.
> usedata('active')
> attach(active)
> 
> model<-lm(hvltt2~factor(group))
> anova(model)
Analysis of Variance Table

Response: hvltt2
                Df Sum Sq Mean Sq F value    Pr(>F)    
factor(group)    3    859 286.203  10.051 1.466e-06 ***
Residuals     1571  44736  28.476                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
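As a side note (an equivalent alternative, not used in the example above), the same table can be produced with the aov() function, which fits the ANOVA model directly:

# Equivalent one-way ANOVA using aov(); summary() prints the same source table
model2 <- aov(hvltt2 ~ factor(group))
summary(model2)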
Post-hoc multiple comparison
In the one-way ANOVA, we found that at least two groups differ in memory performance. We can further investigate which two groups are different. In R, the function pairwise.t.test() can be used to conduct a t-test for any two groups to see if they are significantly different. For example, the code below shows how to conduct such an analysis.
> usedata('active')
> attach(active)
> 
> pairwise.t.test(hvltt2, group, p.adj = "none")

	Pairwise comparisons using t tests with pooled SD 

data:  hvltt2 and group 

  1       2      3     
2 2.9e-05 -      -     
3 0.0014  0.2995 -     
4 2.8e-07 0.3375 0.0445

P value adjustment method: none 
> 
From the output, we might "naively" conclude that group 1 is statistically different from groups 2, 3, and 4, and that group 3 is different from group 4. However, in doing so, we run into the multiple comparison problem: when more than one hypothesis test is conducted simultaneously, the tests are more likely to incorrectly reject at least one true null hypothesis. To demonstrate this, consider two hypotheses:
\[ H_{01}: \text{group 1 = group 2} \] and \[ H_{02}: \text{group 3 = group 4}. \]
Suppose that for each individual hypothesis the type I error rate is controlled at 0.05, meaning there is a 5% probability of rejecting the hypothesis even when it is true. Then the probability of rejecting at least one of the two hypotheses when both are true is $1 - (1-0.05)\times(1-0.05) = 0.0975$, which is about twice the nominal level of 0.05. This indicates that to control the overall type I error rate, one should use a smaller significance level for each individual test.

Such a significance level can be calculated. Suppose we have $k$ simultaneous hypotheses to test and we want to keep the overall type I error rate at 0.05. Let $\alpha^*$ be the significance level to use. Then we have \[ 1-(1-\alpha^*)^k = 0.05. \] Solving this equation leads to \[ \alpha^*= 1 - 0.95^{1/k} \approx 0.05/k. \] For the one-way ANOVA post-hoc comparison, we conducted a total of 6 hypothesis tests. Therefore, the corrected significance level should be $0.05/6 \approx 0.008$, and only a p-value smaller than this can be considered significant. In this case, the difference between group 3 and group 4 is no longer statistically significant. An equivalent way to handle the problem is to multiply each obtained p-value by the number of tests and then compare it with 0.05 as usual. This way of adjusting the p-values is called the Bonferroni correction (Dunn, 1961). It can be done easily in R as shown below.
> usedata('active')
> attach(active)
> 
> pairwise.t.test(hvltt2, group, p.adj = "bonf")

	Pairwise comparisons using t tests with pooled SD 

data:  hvltt2 and group 

  1       2       3      
2 0.00017 -       -      
3 0.00828 1.00000 -      
4 1.7e-06 1.00000 0.26720

P value adjustment method: bonferroni 
> 
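As a quick check (an illustration added here, using the p-values printed in the two outputs above), each Bonferroni-adjusted p-value is simply the unadjusted p-value multiplied by the number of tests, 6, and capped at 1; the corrected significance level from the formula above can be computed the same way:

# Unadjusted pairwise p-values copied from the earlier output
p.raw <- c(2.9e-05, 0.0014, 2.8e-07, 0.2995, 0.3375, 0.0445)
pmin(1, 6 * p.raw)   # matches the Bonferroni-adjusted output up to rounding
1 - 0.95^(1/6)       # exact corrected level, approximately equal to
0.05 / 6             # the simpler 0.05/k rule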
Other p-value correction methods
The Bonferroni correction does not take the actual sizes of the p-values into consideration. Other methods can have greater power than the Bonferroni correction. One such method is Holm's step-down method, which obtains the corrected level for rejecting a null hypothesis in the following way.
- Specify $\alpha$, the level at which to control the overall type I error.
- Compute the $p$-values, $p_1, \ldots, p_m$, for the $m$ null hypotheses $H_{01}, \ldots, H_{0m}$.
- Order the $m$ $p$-values so that $p^{(1)} \leq p^{(2)} \leq \ldots \leq p^{(m)}$.
- Define $L = \min\left\{ j: p^{(j)} > \frac{\alpha}{m+1-j} \right\}$.
- Reject all null hypotheses $H_{0j}$ for which $p_j < p^{(L)}$.
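In R, the Holm procedure is available through p.adjust() with method = "holm", which returns adjusted p-values that can be compared with $\alpha$ directly. The following sketch (added here for illustration) writes the adjustment out by hand; it should agree with p.adjust(p, method = "holm") for any vector of p-values. The code that follows then extracts the p-values stored in a pairwise.t.test() result and passes them to p.adjust().

# A by-hand version of the Holm adjustment (equivalent to p.adjust(p, method = "holm"))
holm.adjust <- function(p) {
  m   <- length(p)
  o   <- order(p)                                      # sort from smallest to largest
  adj <- pmin(1, cummax((m - seq_len(m) + 1) * p[o]))  # step-down multipliers m, m-1, ..., 1
  out <- numeric(m)
  out[o] <- adj                                        # restore the original order
  out
}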
> attach(active)
> 
> all.pvalue <- pairwise.t.test(hvltt2, group, p.adj = "bonf")
> all.pvalue.rm.na <- na.omit(c(all.pvalue$p.value))
> p.adjust(all.pvalue.rm.na, method="holm")
[1] 8.724122e-04 3.312008e-02 9.976196e-06 1.000000e+00 1.000000e+00
[6] 8.015928e-01
> 
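Note that the code above applies the Holm adjustment to the p-values extracted from the Bonferroni-corrected result. The adjustment can also be requested directly from pairwise.t.test() (an alternative call; output not shown):

# Holm-adjusted pairwise comparisons in a single call
pairwise.t.test(hvltt2, group, p.adj = "holm")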
Barplot with standard error or confidence interval
With a fairly large sample size, it can be useful to make a barplot to show the group information. The height of each bar is proportional to the mean of each group. In addition, the standard error or the confidence interval for the mean of each group can be added to the bars. Such a plot can be generated using the R package ggplot2. To generate such a plot, we need a data frame with the following information: the grouping variable, the mean of each group, and the standard error. For the ACTIVE data, the information is given in the following table.
| group | mean | se   |
|-------|------|------|
| 1     | 26.1 | 0.27 |
| 2     | 24.5 | 0.27 |
| 3     | 24.9 | 0.25 |
| 4     | 24.1 | 0.29 |
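For reference, the means and standard errors in this table can also be computed with base R alone (a small sketch assuming active is attached as before; the standard error of each group mean is its standard deviation divided by the square root of its sample size). The summarySE() function used below returns the same quantities plus confidence intervals.

# Group means and standard errors computed with tapply()
group.mean <- tapply(hvltt2, group, mean)
group.sd   <- tapply(hvltt2, group, sd)
group.n    <- tapply(hvltt2, group, length)
round(cbind(mean = group.mean, se = group.sd / sqrt(group.n)), 2)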
The R code below calculates the information in the table and generates a bar plot as shown in the output. We now explain each line of the R code.
> library(Rmisc)
Loading required package: lattice
Loading required package: plyr
> library(ggplot2)
> 
> usedata('active')
> stats <- summarySE(active, measurevar="hvltt2", groupvars=c("group"))
> stats
  group   N   hvltt2       sd        se        ci
1     1 392 26.10204 5.293416 0.2673579 0.5256389
2     2 387 24.49871 5.307817 0.2698115 0.5304842
3     3 407 24.89189 5.116197 0.2536005 0.4985340
4     4 389 24.13111 5.625401 0.2852192 0.5607685
> 
> ggplot(stats, aes(x=factor(group), y=hvltt2)) + 
+   geom_bar(stat="identity", fill="blue", width=0.8) +
+   geom_errorbar(aes(ymin=hvltt2-se, ymax=hvltt2+se),
+                 width=.2, colour='red') + 
+   xlab("Groups") +
+   ylab("Memory test score") + 
+   scale_x_discrete(breaks=c("1", "2", "3", "4"),
+                    labels=c("Memory", "Reasoning", "Speed", "Control"))
> 
- To generate the plot, two R packages, Rmisc and ggplot2, are used.
- The function summarySE() from Rmisc is first used to obtain the needed information, including the mean and standard error of hvltt2 for each of the four groups. For the summarySE() function, (1) the first input is the name of the data frame to be used, (2) measurevar gives the target variable to be analyzed, and (3) groupvars gives the grouping variable to be used.
- To generate the plot, the ggplot() function from the ggplot2 package is used. ggplot() can generate many kinds of plots and allows the use of different options. Note that the different options (layers) are connected using the symbol "+".
- The ggplot() call provides the basic information for the plot. It says the data set stats is used. The aes() function provides additional information: here it specifies that the variable on the x-axis is the group variable, through x=factor(group), and that the group means (the hvltt2 column of stats) are plotted on the y-axis.
- geom_bar() indicates that a bar plot is used. stat="identity" means the height of each bar equals the data value in the data set. fill="blue" specifies the color used to fill the bars, blue in this example. width=0.8 sets the width of the bars.
- geom_errorbar() adds error bars to the plot. aes(ymin=hvltt2-se, ymax=hvltt2+se) specifies the lower and upper values of the error bars: the lower value is the group mean minus one standard error and the upper value is the mean plus one standard error. width=.2 sets the width and colour='red' the color of the error bars.
- xlab() and ylab() add the labels for the x-axis and y-axis.
- scale_x_discrete() can be used to customize the labels for each group. In the original data, the 4 groups are represented by 1, 2, 3, and 4. breaks gives the positions in the original data and labels provides the new labels for each group.
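As the section title suggests, confidence intervals can be shown instead of standard errors. Since summarySE() also returns a ci column (the half-width of the 95% confidence interval for each group mean), only the geom_errorbar() layer needs to change. A minimal sketch using the same stats data frame:

# Bar plot with 95% confidence intervals instead of standard errors
ggplot(stats, aes(x=factor(group), y=hvltt2)) +
  geom_bar(stat="identity", fill="blue", width=0.8) +
  geom_errorbar(aes(ymin=hvltt2-ci, ymax=hvltt2+ci),   # ci column from summarySE()
                width=.2, colour='red') +
  xlab("Groups") +
  ylab("Memory test score")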