Structural Equation Models

Simply speaking, a structural equation model (SEM) is a combination of confirmatory factor analysis and path analysis. Structural equation modeling includes two sets of models: the measurement model and the structural model. The measurement model can be expressed as a factor model. Figure 1 shows a model that measures cognitive ability using three variables: verbal ability, math ability, and speed ability (note that each of them can itself be viewed as a factor measured by lower-level observed variables).

Figure 1. A factor model to measure cognitive ability

Figure 2 gives another example of a measurement model: a model to measure health.

Figure 2. A factor model to measure health

If one believes that health influences cognitive ability, then one can fit a path model using the two factors, cognitive ability and health. The structural model is therefore a path model among the factors. Putting the measurement models and the structural model together gives the model in Figure 3, which is called a structural equation model.

Figure 3. A structural equation model relating health to cognitive ability
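As a rough illustration of how the combined model in Figure 3 might be written in lavaan, the sketch below fits the two measurement models and the structural path in a single call. The indicator names (verbal, math, speed, h1, h2, h3) and all population values are assumptions made up for this example, and the data are simulated rather than taken from any study.

```r
library(lavaan)

# Hypothetical population model, used only to simulate illustrative data.
# All loadings and the regression coefficient are made-up values.
pop.model <- '
  cognitive =~ 1*verbal + 0.8*math + 0.7*speed
  health    =~ 1*h1 + 0.9*h2 + 0.8*h3
  cognitive ~ 0.5*health
'
set.seed(1)
sim.data <- simulateData(pop.model, sample.nobs = 500)

# The SEM: two measurement models plus one structural path
sem.model <- '
  # measurement model
  cognitive =~ verbal + math + speed
  health    =~ h1 + h2 + h3
  # structural model
  cognitive ~ health
'
sem.res <- sem(sem.model, data = sim.data)
summary(sem.res, fit.measures = TRUE)
```

The `=~` lines define the measurement part and the `~` line defines the structural part, mirroring the two sets of models described above.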

Example 1. Autoregressive model

In the ACTIVE study, three variables measure reasoning ability: word series (ws), letter series (ls), and letter sets (lt). Data on all three variables are available before and after training. Suppose we want to test whether reasoning ability before training predicts reasoning ability after training. Then the SEM in Figure 4 can be used. Note that we allow the factor at time 1 to predict the factor at time 2. In addition, we allow the uniqueness factors of each observed variable at the two time points to be correlated. The R code for the analysis is given below.

Figure 4. An autoregressive factor model

First look at model fit. The chi-square value is 27.213 with 5 degrees of freedom, and the p-value for the chi-square test is less than 0.001. Thus, based on the chi-square test alone, this is not a good model. However, CFI (0.996) and TLI (0.989) are both close to 1, the RMSEA is about 0.063, and the SRMR is about 0.013. Considering that the sample size here is large (N = 1114), which makes the chi-square test sensitive to small misfit, we may accept this model as a fairly good model overall. Then we can answer our question. Because the regression coefficient from reasoning1 to reasoning2 is significant (estimate = 1.073, z = 43.919), reasoning ability before training seems to predict reasoning ability after training. In other words, those with higher reasoning ability before training tend to have higher reasoning ability after training.

> library(lavaan)
This is lavaan 0.6-19
lavaan is FREE software! Please report any bugs.
> usedata('active.full.csv')
> automodel <- '
+ reasoning1 =~ ws1 + ls1 + lt1
+ reasoning2 =~ ws2 + ls2 + lt2
+ reasoning2 ~ reasoning1
+ ws1 ~~ ws2
+ ls1 ~~ ls2
+ lt1 ~~ lt2
+ '
> 
> auto.res <- sem(automodel,  data=active.full)
> summary(auto.res, fit=TRUE)
lavaan 0.6-19 ended normally after 66 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        16

  Number of observations                          1114

Model Test User Model:
                                                      
  Test statistic                                27.213
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.000

Model Test Baseline Model:

  Test statistic                              5827.630
  Degrees of freedom                                15
  P-value                                        0.000

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.996
  Tucker-Lewis Index (TLI)                       0.989

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -16447.377
  Loglikelihood unrestricted model (H1)     -16433.771
                                                      
  Akaike (AIC)                               32926.755
  Bayesian (BIC)                             33007.006
  Sample-size adjusted Bayesian (SABIC)      32956.186

Root Mean Square Error of Approximation:

  RMSEA                                          0.063
  90 Percent confidence interval - lower         0.041
  90 Percent confidence interval - upper         0.087
  P-value H_0: RMSEA <= 0.050                    0.151
  P-value H_0: RMSEA >= 0.080                    0.133

Standardized Root Mean Square Residual:

  SRMR                                           0.013

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning1 =~                                       
    ws1               1.000                           
    ls1               1.192    0.030   40.091    0.000
    lt1               0.422    0.016   26.301    0.000
  reasoning2 =~                                       
    ws2               1.000                           
    ls2               1.110    0.026   43.371    0.000
    lt2               0.411    0.014   29.443    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning2 ~                                        
    reasoning1        1.073    0.024   43.919    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .ws1 ~~                                              
   .ws2               1.216    0.327    3.718    0.000
 .ls1 ~~                                              
   .ls2               0.356    0.401    0.888    0.375
 .lt1 ~~                                              
   .lt2               1.596    0.138   11.544    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ws1               5.511    0.385   14.324    0.000
   .ls1               4.547    0.475    9.575    0.000
   .lt1               3.996    0.181   22.065    0.000
   .ws2               5.021    0.419   11.995    0.000
   .ls2               5.161    0.507   10.182    0.000
   .lt2               3.963    0.181   21.875    0.000
    reasoning1       19.103    1.060   18.014    0.000
   .reasoning2        2.447    0.289    8.479    0.000

> 
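The fit indices in the output above can be reproduced approximately by hand from the two chi-square statistics. The sketch below uses common textbook formulas for CFI, TLI, and RMSEA; the use of N (rather than N - 1) in the RMSEA denominator is an assumption, and with this sample size either choice gives the same value to three decimals.

```r
# Values copied from the lavaan output above
chisq.m <- 27.213;   df.m <- 5    # user model
chisq.b <- 5827.630; df.b <- 15   # baseline model
n <- 1114                         # number of observations

# CFI: 1 minus the ratio of model to baseline noncentrality
cfi <- 1 - max(chisq.m - df.m, 0) /
           max(chisq.b - df.b, chisq.m - df.m, 0)

# TLI: compares chi-square/df ratios of baseline and user model
tli <- (chisq.b / df.b - chisq.m / df.m) / (chisq.b / df.b - 1)

# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea <- sqrt(max(chisq.m - df.m, 0) / (df.m * n))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
# CFI 0.996, TLI 0.989, RMSEA 0.063
```

Because all three indices depend on the excess of chi-square over its degrees of freedom relative to the baseline model or the sample size, they can indicate close fit even when the chi-square test itself rejects the model.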

Example 2. Mediation analysis with latent variables

In the path analysis chapter, we fitted a complex mediation model. Since we know that ws1, ls1, and lt1 are measurements of reasoning ability, we can form a latent reasoning ability variable. Thus, our mediation model can be expressed as in Figure 5.

Figure 5. A mediation model with latent variables

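Given CFI = 0.997, RMSEA = 0.034, and SRMR = 0.015, we accept the model as a good model even though the chi-square test is significant (chi-square = 18.363, df = 8, p = 0.019). Based on the Sobel test, the total indirect effect from age to ept1 through hvltt1 and reasoning is significant (estimate = -0.174, z = -9.368). Note that the total parameter, as defined in the code below, adds p7*p3 (the path from edu through hvltt1) rather than p4*p5, so the reported value of 0.075 is not the total effect of age on ept1. Using the rounded estimates in the output, the total effect of age is p1 + p2*p3 + p4*p5, or about -0.144.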

> library(lavaan)
This is lavaan 0.5-23.1097
lavaan is BETA software! Please report any bugs.
> usedata('active.full.csv')
> #head(active.full)
> med.model <- '
+ reasoning =~ ws1 + ls1 + lt1
+ reasoning ~ p4*age + p8*edu
+ hvltt1 ~ p2*age + p7*edu
+ hvltt1 ~~ reasoning
+ ept1 ~ p1*age + p6*edu + p3*hvltt1 + p5*reasoning
+ indirect := p2*p3 + p4*p5
+ total := p1 + p2*p3 + p7*p3
+ '
> 
> med.res <- sem(med.model, data=active.full)
> summary(med.res, fit=TRUE)
lavaan (0.5-23.1097) converged normally after  51 iterations

  Number of observations                          1114

  Estimator                                         ML
  Minimum Function Test Statistic               18.363
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.019

Model test baseline model:

  Minimum Function Test Statistic             3344.605
  Degrees of freedom                                20
  P-value                                        0.000

User model versus baseline model:

  Comparative Fit Index (CFI)                    0.997
  Tucker-Lewis Index (TLI)                       0.992

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)             -20675.968
  Loglikelihood unrestricted model (H1)     -20666.787

  Number of free parameters                         17
  Akaike (AIC)                               41385.937
  Bayesian (BIC)                             41471.204
  Sample-size adjusted Bayesian (BIC)        41417.207

Root Mean Square Error of Approximation:

  RMSEA                                          0.034
  90 Percent Confidence Interval          0.013  0.055
  P-value RMSEA <= 0.05                          0.889

Standardized Root Mean Square Residual:

  SRMR                                           0.015

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning =~                                        
    ws1               1.000                           
    ls1               1.181    0.029   41.262    0.000
    lt1               0.430    0.016   26.758    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  reasoning ~                                         
    age       (p4)   -0.228    0.023   -9.737    0.000
    edu       (p8)    0.745    0.047   15.792    0.000
  hvltt1 ~                                            
    age       (p2)   -0.161    0.027   -6.074    0.000
    edu       (p7)    0.429    0.052    8.177    0.000
  ept1 ~                                              
    age       (p1)    0.030    0.022    1.389    0.165
    edu       (p6)    0.396    0.046    8.527    0.000
    hvltt1    (p3)    0.169    0.026    6.488    0.000
    reasoning (p5)    0.643    0.035   18.186    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .reasoning ~~                                        
   .hvltt1            7.255    0.592   12.263    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ws1               5.334    0.366   14.594    0.000
   .ls1               4.850    0.444   10.910    0.000
   .lt1               3.917    0.181   21.657    0.000
   .hvltt1           20.618    0.874   23.601    0.000
   .ept1             11.505    0.522   22.027    0.000
   .reasoning        13.814    0.777   17.777    0.000

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    indirect         -0.174    0.019   -9.368    0.000
    total             0.075    0.025    3.008    0.003

> 
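The defined parameters in the output above can be checked by hand from the rounded path coefficients. The sketch below recomputes the indirect effect, the total as it is coded in the model, and the total effect of age. It also applies the first-order Sobel formula to the single path through hvltt1. That z value is only an approximation for illustration: lavaan's standard error for the summed indirect effect also uses the covariances among the estimates, which are not shown in the output.

```r
# Point estimates and standard errors copied from the Regressions table above
p1 <- 0.030                     # age -> ept1 (direct effect)
p2 <- -0.161; se.p2 <- 0.027    # age -> hvltt1
p3 <- 0.169;  se.p3 <- 0.026    # hvltt1 -> ept1
p4 <- -0.228                    # age -> reasoning
p5 <- 0.643                     # reasoning -> ept1
p7 <- 0.429                     # edu -> hvltt1

# Total indirect effect of age through both mediators
indirect <- p2 * p3 + p4 * p5            # about -0.174

# "total" exactly as coded in the model string (includes the edu path p7*p3)
total.as.coded <- p1 + p2 * p3 + p7 * p3 # about 0.075

# Total effect of age: direct effect plus both indirect paths
total.age <- p1 + indirect               # about -0.144

# First-order Sobel z for the single path age -> hvltt1 -> ept1
sobel.z <- (p2 * p3) / sqrt(p2^2 * se.p3^2 + p3^2 * se.p2^2)

round(c(indirect = indirect, total.as.coded = total.as.coded,
        total.age = total.age, sobel.z = sobel.z), 3)
```

The first two values agree with the Defined Parameters table to three decimals, which confirms how lavaan evaluated the `:=` expressions.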

To cite the book, use: Zhang, Z. & Wang, L. (2017-2022). Advanced statistics using R. Granger, IN: ISDSA Press. https://doi.org/10.35566/advstats. ISBN: 978-1-946728-01-2.