t test

The t-test refers to a family of statistical hypothesis tests in which the test statistic follows a Student's t-distribution under the null hypothesis. The most widely used t-tests are the one-sample t-test, the t-test for paired samples, and the independent two-sample t-test.

One-sample t-test

The one-sample t-test is used to test the null hypothesis that the population mean \(\mu\) is equal to a specified value \(\mu_0\), which is often 0. Therefore, the null hypothesis is

\[H_0: \mu = \mu_0.\]

Depending on the alternative hypothesis, we can carry out either a one-tailed or two-tailed test. If the sign for the difference between \(\mu\) and \(\mu_0\) is not known, a two-tailed test should be used and the alternative hypothesis is

\[H_a: \mu \neq \mu_0. \]

Otherwise, a one-tailed test is used and the alternative hypothesis is

\[H_a: \mu > \mu_0 \]

if it is expected the population mean is greater than \(\mu_0\), or

\[H_a: \mu < \mu_0 \]

if it is expected the population mean is less than \(\mu_0\).

Test statistic

For the one-sample t-test, the test statistic is

\[t={\frac {{\overline {x}}-\mu _{0}}{s/{\sqrt {n}}}}\]

where \(\overline{x}\) is the sample mean, \(s\) is the sample standard deviation, and \(n\) is the sample size. This is also called the t-statistic. Under the assumption that the data follow a normal distribution, it follows a \(t\)-distribution with \(n - 1\) degrees of freedom, and the p-value is obtained by comparing the observed t-statistic with this \(t\)-distribution.
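
To make the formula concrete, the t-statistic and a one-tailed p-value can also be computed step by step. Below is a minimal sketch using a small made-up vector x (hypothetical data, not the ACTIVE variables); t.test() reproduces the same result.

x <- c(11, 13, 12, 15, 14, 12, 16, 13, 12, 14)   # hypothetical sample
mu0 <- 12                                        # hypothesized population mean

n <- length(x)
tstat <- (mean(x) - mu0) / (sd(x) / sqrt(n))        # t = (xbar - mu0) / (s / sqrt(n))
pval  <- pt(tstat, df = n - 1, lower.tail = FALSE)  # one-tailed p-value for Ha: mu > mu0

tstat
pval
t.test(x, mu = mu0, alternative = "greater")     # same t and p-value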

Using the ACTIVE data, we want to test whether the average education level of people older than 65 years is above high school (i.e., years of education greater than 12). From the t-test output below, the t-value is 3.7234. Comparing that with a t-distribution with 1574 degrees of freedom, we get a p-value of about 0.0001. We therefore reject the null hypothesis.

> usedata('active')
> attach(active)
>
> t.test(edu, mu=12, alternative = "greater")

	One Sample t-test

data:  edu
t = 3.7234, df = 1574, p-value = 0.0001017
alternative hypothesis: true mean is greater than 12
95 percent confidence interval:
 13.30691      Inf
sample estimates:
mean of x 
 14.34222 

Effect size

For the one-sample t-test, the effect size is defined as

\[\delta={\frac {\mu-\mu _{0}}{\sigma}},\]

where \(\sigma\) is the population standard deviation. The sample effect size is estimated as

\[d={\frac {{\overline {x}}-\mu _{0}}{s}}.\]

Whether an effect size should be interpreted as small, medium, or large depends on its substantive context and its operational definition. Some guidelines have been provided in the literature, as shown in the table below. However, they should be used with great caution.

Effect size    d
Very small     0.01
Small          0.20
Medium         0.50
Large          0.80
Very large     1.20
Huge           2.00

For the ACTIVE example, the estimated effect size is \(d = 0.047\), indicating a very small effect. Note that even though the result of the t-test was statistically significant, the difference was actually quite small in practical terms.

> usedata('active')
> attach(active)
>
> (14.23233 - 13)/sd(edu)
[1] 0.04656958

t-test for paired samples

This test is used when we have paired samples, that is, when the two samples can be matched or "paired". A common example is a pre- and post-test design or repeated measures. For example, suppose we want to assess the effect of an intervention method for reducing depression. We can enroll 100 participants and measure each participant's depression level. Then all the participants are given the intervention, after which their depression levels are measured again. Our interest is in whether the intervention has any effect on the mean depression level.

To answer the question, we can first calculate the difference in depression level before and after the intervention for each participant: \(d_i = y_{i2} - y_{i1}\). The differences \(d_i\) can then be analyzed using the one-sample t-test. The test statistic is

\[t = \frac{\overline{d}-\mu_0}{s_d/\sqrt{n}},\]

where \(\overline{d}\) is the average and \(s_d\) is the standard deviation of the differences \(d_i\). Under the null hypothesis \(H_0:\mu_d=\mu_0\), where \(\mu_d\) is the population mean of the differences, the t-statistic follows a t-distribution with \(n-1\) degrees of freedom, where \(n\) represents the number of pairs.
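
The equivalence between the paired t-test and a one-sample t-test on the differences can be checked directly. Below is a minimal sketch with hypothetical pre and post vectors (not variables from the ACTIVE data); the two calls give identical t, df, and p-values.

pre  <- c(20, 18, 25, 22, 19, 24, 21, 23)   # hypothetical pre-intervention scores
post <- c(18, 17, 24, 20, 18, 22, 20, 21)   # hypothetical post-intervention scores

d <- post - pre                    # within-person differences

t.test(d, mu = 0)                  # one-sample t-test on the differences
t.test(post, pre, paired = TRUE)   # paired t-test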

Note that although the mean difference is the same whether the two samples are treated as paired or unpaired, the statistical significance levels can be very different. This is because the variance of the difference \(d\) is

\[\text{var}(d) = \text{var}(y_2 - y_1) = \text{var}(y_1) + \text{var}(y_2) - 2\rho \sqrt{\text{var}(y_1) \text{var}(y_2)},\]

where \(\rho\) is the correlation between the measurements before and after the treatment. Since this correlation is often positive, the variance of the differences for paired samples is often smaller than for unpaired samples.
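
As a quick check of this variance identity and its consequence, the sketch below simulates positively correlated pre/post scores (purely hypothetical data) and compares the paired and unpaired tests.

set.seed(1)
n    <- 50
pre  <- rnorm(n, mean = 20, sd = 5)
post <- pre + rnorm(n, mean = 1, sd = 3)   # post is positively correlated with pre

var(post - pre)                                             # variance of the differences
var(pre) + var(post) - 2*cor(pre, post)*sd(pre)*sd(post)    # same value

t.test(post, pre, paired = TRUE)$p.value   # paired test
t.test(post, pre)$p.value                  # unpaired test, typically a larger p-value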

In the ACTIVE data, the verbal test was measured four times. As an example, we want to test whether there is any difference in the score between the first time (hvltt) and the last time (hvltt4). The input and output are shown below. Note that the same t.test() function is used, but with the option paired=TRUE.

> usedata('active')
> attach(active)
>
> t.test(hvltt, hvltt4, paired=TRUE)

	Paired t-test

data:  hvltt and hvltt4
t = -0.45061, df = 1811, p-value = 0.6523
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2658514  0.1665137
sample estimates:
mean of the differences 
            -0.04966887 

Independent two-sample t-test

The independent samples t-test is used when two separate sets of independent and identically distributed samples are obtained, one from each of the two populations being compared. For example, in evaluating the effect of an intervention, we enroll 100 participants and randomly assign 50 to the treatment group and the other 50 to the control group. In this case, we have two independent samples and should use the independent two-sample t-test.

Welch's t test (unpooled two independent sample t test)

Welch's t-test is used when the two population variances are not assumed to be equal (the two sample sizes may or may not be equal). The \(t\) statistic to test whether the population means are different is calculated as:

\[t=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{\overline{\Delta}}}\]

where

\[s_{\overline{\Delta}}=\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}.\]

Here, \(s_{1}^{2}\) and \(s_{2}^{2}\) are the unbiased estimators of the variances of the two samples, and \(n_{k}\) is the number of participants in group \(k\) (\(k\) = 1 or 2). For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's \(t\) distribution with the degrees of freedom calculated as

\[\mathrm{d.f.}=\frac{(s_{1}^{2}/n_{1}+s_{2}^{2}/n_{2})^{2}}{(s_{1}^{2}/n_{1})^{2}/(n_{1}-1)+(s_{2}^{2}/n_{2})^{2}/(n_{2}-1)}\]

This is known as the Welch-Satterthwaite equation. The true distribution of the test statistic actually depends (slightly) on the two unknown population variances.
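
To connect the formulas with the t.test() output shown below, Welch's t and the Welch-Satterthwaite degrees of freedom can also be computed by hand. This is only a sketch with two hypothetical samples x1 and x2 (not the ACTIVE variables).

x1 <- c(23, 25, 28, 22, 26, 30, 24, 27)          # hypothetical group 1
x2 <- c(20, 31, 18, 29, 33, 17, 25, 28, 21, 30)  # hypothetical group 2

n1 <- length(x1); n2 <- length(x2)
v1 <- var(x1);    v2 <- var(x2)

se   <- sqrt(v1/n1 + v2/n2)                      # standard error of the mean difference
tval <- (mean(x1) - mean(x2)) / se               # Welch's t

df <- (v1/n1 + v2/n2)^2 /
      ((v1/n1)^2/(n1 - 1) + (v2/n2)^2/(n2 - 1))  # Welch-Satterthwaite degrees of freedom

tval; df
t.test(x1, x2)     # default var.equal = FALSE gives the same t and df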

In R, the function t.test() can be used to conduct a t-test. The following code conducts Welch's t-test. Note that alternative = "greater" sets the alternative hypothesis; the other options are "two.sided" and "less".

> usedata('active')
> attach(active)
>
> training<-hvltt2[group==1]
> control<-hvltt2[group==4]
>
> mean(training, na.rm=T)-mean(control, na.rm=T)
[1] 1.538577
>
> t.test(training, control, alternative = 'greater')

	Welch Two Sample t-test

data:  training and control
t = 4.6022, df = 1272.7, p-value = 2.299e-06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.9882856       Inf
sample estimates:
mean of x mean of y 
 25.15493  23.61635 

Pooled two independent sample t test

The pooled t-test is used when the two samples can be assumed to have the same population variance. The \(t\) statistic can be calculated as follows:

\[t=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{p}\cdot\sqrt{\frac{1}{n_{1}}+\frac{1}{n_{2}}}}\]

where

\[s_{p}=\sqrt{\frac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}\]

is an estimator of the pooled standard deviation of the two samples. \(n_{k}-1\) is the degrees of freedom for each group, and the total sample size minus two (\(n_{1}+n_{2}-2\)) is the total number of degrees of freedom, which is used in significance testing.
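
As a sketch, the pooled standard deviation and the t statistic can also be computed directly from these formulas, using two hypothetical samples x1 and x2 (not the ACTIVE variables).

x1 <- c(23, 25, 28, 22, 26, 30, 24, 27)           # hypothetical group 1
x2 <- c(20, 24, 18, 26, 27, 17, 25, 23, 21, 22)   # hypothetical group 2

n1 <- length(x1); n2 <- length(x2)
sp <- sqrt(((n1 - 1)*var(x1) + (n2 - 1)*var(x2)) / (n1 + n2 - 2))  # pooled SD
tval <- (mean(x1) - mean(x2)) / (sp * sqrt(1/n1 + 1/n2))           # pooled t statistic

tval
t.test(x1, x2, var.equal = TRUE)   # same t, with n1 + n2 - 2 degrees of freedom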

The pooled two independent sample t test can also be conducted using the t.test() function by setting the option var.equal=T or TRUE.

> usedata('active')
> attach(active)
>
> training<-hvltt2[group==1]
> control<-hvltt2[group==4]
>
> t.test(training, control, var.equal=T, alternative = 'greater')

	Two Sample t-test

data:  training and control
t = 4.602, df = 1273, p-value = 2.301e-06
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.9882598       Inf
sample estimates:
mean of x mean of y 
 25.15493  23.61635 

Effect size

For the two-sample t-test, Cohen's d is defined as the difference between two means divided by a standard deviation.

\[d=\frac{\bar{x}_{1}-\bar{x}_{2}}{s_{p}}\]

Cohen defined \(s_{p}\), the pooled standard deviation, as

\[s_{p}=\sqrt{\frac{(n_{1}-1)s_{x_{1}}^{2}+(n_{2}-1)s_{x_{2}}^{2}}{n_{1}+n_{2}-2}}.\]

For the ACTIVE data analysis example, the effect size is calculated as below.
> usedata('active')
> attach(active)
>
> training<-hvltt2[group==1]
> control<-hvltt2[group==4]
>
> mean1=mean(training,na.rm=T)
> mean2=mean(control,na.rm=T)
> meandiff=mean1-mean2
>
> n1=length(training)-sum(is.na(training))
> n2=length(control)-sum(is.na(control))
>
> v1=var(training,na.rm=T)
> v2=var(control,na.rm=T)
> s=sqrt(((n1-1)*v1+(n2-1)*v2)/(n1+n2-2))
> s
[1] 5.968894
>
> cohend=meandiff/s
> cohend
[1] 0.2577658
