Moderation Analysis
What is a moderator?
To explain what a moderator is, we start with a bivariate relationship between an input variable $X$ and an outcome variable $Y$. For example, $X$ could be the number of training sessions (training intensity) and $Y$ could be a math test score. We can hypothesize that the number of training sessions predicts math test performance. Using a diagram, we can portray the relationship below.
The above path diagram can be expressed using a regression model as
\[ Y=\beta_{0}+\beta_{1}*X+\epsilon \] where $\beta_{0}$ is the intercept and $\beta_{1}$ is the slope.
A moderator variable $Z$ is a variable that alters the strength of the relationship between $X$ and $Y$. In other words, the effect of $X$ on $Y$ depends on the level of the moderator $Z$. For instance, if male students ($Z=0$) benefit more (or less) from training than female students ($Z=1$), then gender can be considered a moderator. Using the diagram, if the coefficient $a$ is different from the coefficient $b$, there is a moderation effect.
To summarize, a moderator $Z$ is a variable that alters the direction and/or strength of the relation between a predictor $X$ and an outcome $Y$.
Questions involving moderators address “when” or “for whom” a variable most strongly predicts or causes an outcome variable. Using a path diagram, we can express the moderation effect as:
How to conduct moderation analysis?
Moderation analysis can be conducted by adding one or multiple interaction terms in a regression analysis. For example, if $Z$ is a moderator for the relation between $X$ and $Y$, we can fit a regression model
\begin{eqnarray*} Y & = & \beta_{0}+\beta_{1}*X+\beta_{2}*Z+\beta_{3}*X*Z+\epsilon\\ & = & \beta_{0}+\beta_{2}*Z+(\beta_{1}+\beta_{3}*Z)*X+\epsilon. \end{eqnarray*}
Thus, if \(\beta_{3}\) is not equal to 0, the relationship between $X$ and $Y$ depends on the value of $Z$, which indicates a moderation effect. In fact, from the regression model, we can get:
- If $Z=0$, the effect of $X$ on $Y$ is $\beta_{1}+\beta_{3}*0=\beta_{1}$.
- If $Z=2$, the effect of $X$ on $Y$ is $\beta_{1}+\beta_{3}*2$.
- If $Z=4$, the effect of $X$ on $Y$ is $\beta_{1}+\beta_{3}*4$.
If $Z$ is a dichotomous/binary variable, for example, gender, the above equation can be written as
\begin{eqnarray*} Y & = & \beta_{0}+\beta_{1}*X+\beta_{2}*Z+\beta_{3}*X*Z+\epsilon\\ & = & \begin{cases} \beta_{0}+\beta_{1}*X+\epsilon & \mbox{For male students}(Z=0)\\ \beta_{0}+\beta_{2}+(\beta_{1}+\beta_{3})*X+\epsilon & \mbox{For female students}(Z=1) \end{cases} \end{eqnarray*}
Thus, if $\beta_{3}$ is not equal to 0, the relationship between $X$ and $Y$ differs between the two groups, which indicates a moderation effect. When $Z=0$ (male students), the effect of $X$ on $Y$ is $\beta_{1}+\beta_{3}*0=\beta_{1}$; when $Z=1$ (female students), the effect of $X$ on $Y$ is $\beta_{1}+\beta_{3}*1=\beta_{1}+\beta_{3}$.
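As a small illustration of the expressions above, the conditional effect of $X$ at a given moderator value can be computed directly from $\beta_{1}$ and $\beta_{3}$. The function name `conditional_slope` and the coefficient values are hypothetical, chosen only for this sketch.

```python
def conditional_slope(beta1, beta3, z):
    """Effect of X on Y when the moderator equals z: beta1 + beta3 * z."""
    return beta1 + beta3 * z


# Hypothetical coefficients, used only to illustrate the formula.
beta1, beta3 = 0.8, -0.25

# Effect of X at the moderator values used in the text (z = 0, 2, 4).
slopes = {z: conditional_slope(beta1, beta3, z) for z in (0, 2, 4)}
for z, slope in slopes.items():
    print(f"z = {z}: effect of X on Y = {slope:.2f}")
```

With a nonzero `beta3` the effect of $X$ changes with `z`; if `beta3` were 0, all three values would equal `beta1`.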
Steps for moderation analysis
A moderation analysis typically consists of the following steps.
- Compute the interaction term XZ=X*Z.
- Fit a multiple regression model with X, Z, and XZ as predictors.
- Test whether the regression coefficient for XZ is significant or not.
- Interpret the moderation effect.
- Display the moderation effect graphically.
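The first four steps can be sketched on synthetic data as follows. This is a minimal illustration under stated assumptions, not the analysis used later in this section: the data are simulated, the true coefficients are made up, and ordinary least squares is computed with NumPy's `lstsq` rather than a regression package. The plotting step is omitted here.

```python
import numpy as np

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, n)      # predictor X
z = rng.integers(0, 2, n)      # binary moderator Z (0 or 1)
y = 5 - 0.3 * x - 3 * z + 0.5 * x * z + rng.normal(0, 1, n)

# Step 1: compute the interaction term XZ = X * Z.
xz = x * z

# Step 2: fit Y ~ X + Z + XZ by ordinary least squares.
design = np.column_stack([np.ones(n), x, z, xz])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

# Step 3: t statistic for each coefficient (estimate / standard error).
resid = y - design @ beta
df_resid = n - design.shape[1]
sigma2 = resid @ resid / df_resid
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
t_stats = beta / se

# Step 4: interpret the moderation effect through the slope in each group.
slope_z0 = beta[1]             # effect of X when Z = 0
slope_z1 = beta[1] + beta[3]   # effect of X when Z = 1

print("interaction estimate:", round(beta[3], 3), "t =", round(t_stats[3], 2))
print("slope when Z = 0:", round(slope_z0, 3))
print("slope when Z = 1:", round(slope_z1, 3))
```

The t statistic for the interaction term is compared with a t distribution with `df_resid` degrees of freedom to obtain the p-value that regression software reports.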
An example
The data set mathmod.csv includes three variables: training intensity, gender, and math test score. Using the example, we investigate whether the effect of training intensity on math test performance depends on gender. Therefore, we evaluate whether gender is a moderator.
The Python code for the analysis is given below.
>>> import pandas as pd
>>> mathmod = pd.read_csv('https://advstats.psychstat.org/data/mathmod.csv')
>>> mathmod
     training  gender  math
0         0.0       0   4.5
1         0.1       1   2.4
2         0.2       1   3.1
3         0.3       0   5.8
4         0.4       1   3.2
..        ...     ...   ...
96        9.6       1   3.2
97        9.7       1   3.8
98        9.8       1   3.2
99        9.9       1   6.1
100      10.0       0   0.6

[101 rows x 3 columns]
>>>
>>> mathmod['xz'] = mathmod['gender'] * mathmod['training']
>>>
>>> ## fit the model
>>> import statsmodels.formula.api as smf
>>> reg = smf.ols("math ~ training + gender + xz", data=mathmod).fit()
>>> print(reg.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   math   R-squared:                       0.380
Model:                            OLS   Adj. R-squared:                  0.361
Method:                 Least Squares   F-statistic:                     19.81
Date:                Wed, 27 Nov 2024   Prob (F-statistic):           4.26e-10
Time:                        13:27:14   Log-Likelihood:                -136.43
No. Observations:                 101   AIC:                             280.9
Df Residuals:                      97   BIC:                             291.3
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      4.9900      0.275     18.146      0.000       4.444       5.536
training      -0.3394      0.054     -6.301      0.000      -0.446      -0.233
gender        -2.7569      0.379     -7.272      0.000      -3.509      -2.004
xz             0.5043      0.068      7.367      0.000       0.368       0.640
==============================================================================
Omnibus:                        0.021   Durbin-Watson:                   2.305
Prob(Omnibus):                  0.990   Jarque-Bera (JB):                0.050
Skew:                          -0.025   Prob(JB):                        0.975
Kurtosis:                       2.903   Cond. No.                         34.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Since the regression coefficient (0.504) for the interaction term XZ is significant at the alpha level 0.05 with a p-value of 5.8e-11, there exists a significant moderation effect. In other words, the effect of training intensity on math performance significantly depends on gender.
When $Z=0$ (male students), the estimated effect of training intensity on math performance is \(\hat{\beta}_{1}=-.34\). When $Z=1$ (female students), the estimated effect of training intensity on math performance is \(\hat{\beta}_{1}+\hat{\beta}_{3}=-.34+.50=.16\). The moderation analysis tells us that the effects of training intensity on math performance for males (-.34) and females (.16) are significantly different for this example.
Interaction plot
A moderation effect indicates that the regression slopes are different for different groups. Therefore, if we plot the regression line for each group, the lines are not parallel and intersect at some point. Such a plot is called an interaction plot. To get the plot, we first calculate the intercept and slope for each level of the moderator. For this example, we have
\begin{eqnarray*} \hat{Y} & = & \hat{\beta}_{0}+\hat{\beta}_{1}*X+\hat{\beta}_{2}*Z+\hat{\beta}_{3}*X*Z \\ & = & \begin{cases} \hat{\beta}_{0}+\hat{\beta}_{1}*X & \mbox{For male students }(Z=0) \\ \hat{\beta}_{0}+\hat{\beta}_{2}+(\hat{\beta}_{1}+\hat{\beta}_{3})*X & \mbox{For female students }(Z=1) \end{cases} \\ & = & \begin{cases} 5 - 0.34*X & \mbox{For male students }(Z=0)\\ 2.23 + 0.16*X & \mbox{For female students }(Z=1)\end{cases} \end{eqnarray*}
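With this information, we can generate a plot using the Python code below. Note that the two lines are drawn from the estimated regression model rather than from the raw data; the data set is read only to obtain the range of training intensity.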
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>> ## read data
>>> mathmod = pd.read_csv('https://advstats.psychstat.org/data/mathmod.csv')
>>>
>>> # Define x values (minimum to maximum of training)
>>> x = np.linspace(np.min(mathmod['training']), np.max(mathmod['training']), 100)
>>>
>>> # Define the equations of the two lines
>>> y1 = 5 - 0.34 * x
>>> y2 = 2.23 + 0.16 * x
>>>
>>> # Create the plot
>>> _=plt.figure(figsize=(8, 6))
>>>
>>> # Plot the two lines
>>> _=plt.plot(x, y1, label="Male", color='blue', linewidth=2)
>>> _=plt.plot(x, y2, label="Female", color='red', linewidth=2)
>>>
>>> # Add labels and title
>>> plt.xlabel('Training intensity')
Text(0.5, 0, 'Training intensity')
>>> plt.ylabel('Math score')
Text(0, 0.5, 'Math score')
>>> plt.title('Moderation plot')
Text(0.5, 1.0, 'Moderation plot')
>>>
>>> # Add a legend
>>> _=plt.legend()
>>>
>>> # Show the plot
>>> plt.grid(True)
>>> plt.savefig('mod.svg', format='svg')
>>>
>>> plt.show()
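The intercepts and slopes hard-coded in the plot can be derived from the coefficient estimates in the regression output, and the same arithmetic gives the point where the two lines cross. The sketch below assumes only the four printed estimates; the variable names are chosen for illustration.

```python
# Coefficient estimates copied from the regression output
# (Intercept, training, gender, xz).
b0, b1, b2, b3 = 4.9900, -0.3394, -2.7569, 0.5043

# Intercept and slope of the fitted line in each group.
male_intercept, male_slope = b0, b1                  # Z = 0
female_intercept, female_slope = b0 + b2, b1 + b3    # Z = 1

# The lines meet where b0 + b1*x = (b0 + b2) + (b1 + b3)*x, that is x = -b2 / b3.
x_cross = -b2 / b3
y_cross = male_intercept + male_slope * x_cross

print(f"male:   math = {male_intercept:.2f} + ({male_slope:.2f}) * training")
print(f"female: math = {female_intercept:.2f} + ({female_slope:.2f}) * training")
print(f"lines cross at training = {x_cross:.2f}, math = {y_cross:.2f}")
```

The crossing point, at roughly 5.5 on the training scale, lies inside the observed range of 0 to 10, which is why the two lines visibly cross in the plot.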
Another example - continuous moderator
The data set depress.csv includes three variables: Stress, Social support, and Depression. Suppose we want to investigate whether social support is a moderator for the relation between stress and depression. That is, we study whether the effect of stress on depression depends on the level of social support. Note that the potential moderator, social support, is a continuous variable.
The analysis is given below. The regression coefficient estimate of the interaction term is -.39 with t = -20.754, p < .001. Therefore, social support is a significant moderator for the relation between stress and depression. The relation between stress and depression significantly depends on the level of social support.
>>> import pandas as pd
>>> depress = pd.read_csv('https://advstats.psychstat.org/data/depress.csv')
>>> depress.head()
   stress  support  depress
0       7        5       32
1       8        7       20
2       2        2       30
3       7        6       25
4       6        9       19
>>>
>>> depress['xz'] = depress['stress'] * depress['support']
>>>
>>> ## fit the model
>>> import statsmodels.formula.api as smf
>>> reg = smf.ols("depress ~ stress + support + xz", data=depress).fit()
>>> print(reg.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                depress   R-squared:                       0.964
Model:                            OLS   Adj. R-squared:                  0.963
Method:                 Least Squares   F-statistic:                     853.0
Date:                Sun, 24 Nov 2024   Prob (F-statistic):           4.83e-69
Time:                        20:58:40   Log-Likelihood:                -172.76
No. Observations:                 100   AIC:                             353.5
Df Residuals:                      96   BIC:                             363.9
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     29.2583      0.691     42.351      0.000      27.887      30.630
stress         1.9956      0.116     17.185      0.000       1.765       2.226
support       -0.2356      0.111     -2.125      0.036      -0.456      -0.015
xz            -0.3902      0.019    -20.754      0.000      -0.428      -0.353
==============================================================================
Omnibus:                        1.159   Durbin-Watson:                   2.093
Prob(Omnibus):                  0.560   Jarque-Bera (JB):                0.673
Skew:                           0.142   Prob(JB):                        0.714
Kurtosis:                       3.284   Cond. No.                         187.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Since social support is a continuous variable, there are no natural groups at which to examine the relationship between stress and depression. However, we can choose several different levels. One common choice is to use three levels of the moderator: one standard deviation below the mean, the mean, and one standard deviation above the mean. For this example, the three values for social support are 2.56, 5.37, and 8.18. The fitted regression lines for the three values are
\begin{eqnarray*} \widehat{depress} & = & 29.26+2.00*stress-.24*support-.39*stress*support\\ & = & \begin{cases} 28.65+1.00*stress & \;support=2.56\\ 27.97-.09*stress & \;support=5.37\\ 27.30-1.19*stress & \;support=8.18 \end{cases} \end{eqnarray*}
From these equations, we can clearly see that as social support increases, the relationship between stress and depression changes from positive to negative. This can also be seen from the interaction plot below.
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>>
>>> ## read data
>>> depress = pd.read_csv('https://advstats.psychstat.org/data/depress.csv')
>>>
>>> # Define x values (minimum to maximum of stress)
>>> x = np.linspace(np.min(depress['stress']), np.max(depress['stress']), 100)
>>>
>>> # Define the equations of the three lines (low, medium, high support)
>>> y1 = 28.65 + 1.0 * x
>>> y2 = 27.97 - 0.09 * x
>>> y3 = 27.30 - 1.19 * x
>>>
>>> # Create the plot
>>> _=plt.figure(figsize=(8, 6))
>>>
>>> # Plot the three lines
>>> _=plt.plot(x, y1, label="Low support", color='blue', linewidth=2)
>>> _=plt.plot(x, y2, label="Medium support", color='red', linewidth=2)
>>> _=plt.plot(x, y3, label="High support", color='green', linewidth=2)
>>>
>>> # Add labels and title
>>> plt.xlabel('Stress')
Text(0.5, 0, 'Stress')
>>> plt.ylabel('Depression')
Text(0, 0.5, 'Depression')
>>> plt.title('Moderation plot')
Text(0.5, 1.0, 'Moderation plot')
>>>
>>> # Add a legend
>>> _=plt.legend()
>>>
>>> # Show the plot
>>> plt.savefig('mod.svg', format='svg')
>>>
>>> plt.show()
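The three conditional regression lines used for the continuous moderator can be reproduced from the rounded coefficients of the fitted equation. The sketch below is an illustration only: the helper `conditional_line` and the variable names are hypothetical, and using the unrounded estimates from the output would shift some results in the second decimal place.

```python
# Rounded coefficients from the fitted equation in the text
# (intercept, stress, support, stress * support).
b0, b1, b2, b3 = 29.26, 2.00, -0.24, -0.39

# Support levels reported in the text: mean - 1 SD, mean, mean + 1 SD.
levels = {"low": 2.56, "medium": 5.37, "high": 8.18}


def conditional_line(support):
    """Intercept and stress slope of the fitted line at a fixed support value."""
    return b0 + b2 * support, b1 + b3 * support


lines = {name: conditional_line(value) for name, value in levels.items()}
for name, (intercept, slope) in lines.items():
    print(f"{name:>6} support ({levels[name]}): "
          f"depress = {intercept:.2f} + ({slope:.2f}) * stress")

# Support value at which the stress slope is zero: b1 + b3 * support = 0.
support_zero_slope = -b1 / b3
print(f"stress slope changes sign at support = {support_zero_slope:.2f}")
```

Under these rounded values the stress slope changes sign at a support level of about 5.1, slightly below the mean of 5.37, which is consistent with the small negative slope at the mean.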
To cite the book, use:
Zhang, Z. & Wang, L. (2017-2022). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.