The following considerations apply to the construction and assessment of many structural equation models.
Model specification Building or specifying a model requires attending to: • the set of variables to be employed, • what is known about the variables, • what is theorized or hypothesized about the variables' causal connections and disconnections, • what the researcher seeks to learn from the modeling, and • the instances of missing values and/or the need for imputation. Structural equation models attempt to mirror the worldly forces operative for causally homogeneous cases – namely cases enmeshed in the same worldly causal structures but whose values on the causes differ and who therefore possess different values on the outcome variables. Causal homogeneity can be facilitated by case selection, or by segregating cases in a multi-group model. A model's specification is not complete until the researcher specifies: • which effects and/or correlations/covariances are to be included and estimated, • which effects and other coefficients are forbidden or presumed unnecessary, • and which coefficients will be given fixed/unchanging values (e.g. to provide measurement scales for latent variables as in Figure 2). The latent level of a model is composed of
endogenous and exogenous variables. The endogenous latent variables are the true-score variables postulated as receiving effects from at least one other modeled variable. Each endogenous variable is modeled as the dependent variable in a regression-style equation. The exogenous latent variables are background variables postulated as causing one or more of the endogenous variables and are modeled like the predictor variables in regression-style equations. Causal connections among the exogenous variables are not explicitly modeled but are usually acknowledged by modeling the exogenous variables as freely correlating with one another. The model may include intervening variables – variables receiving effects from some variables but also sending effects to other variables. As in regression, each endogenous variable is assigned a residual or error variable encapsulating the effects of unavailable and usually unknown causes. Each latent variable, whether
exogenous or endogenous, is thought of as containing the cases' true-scores on that variable, and these true-scores causally contribute valid/genuine variations into one or more of the observed/reported indicator variables. The LISREL program assigned Greek names to the elements in a set of matrices to keep track of the various model components. These names became relatively standard notation, though the notation has been extended and altered to accommodate a variety of statistical considerations. Texts and programs "simplifying" model specification via diagrams or by using equations permitting user-selected variable names, re-convert the user's model into some standard matrix-algebra form in the background. The "simplifications" are achieved by implicitly introducing default program "assumptions" about model features with which users supposedly need not concern themselves. Unfortunately, these default assumptions easily obscure model components that leave unrecognized issues lurking within the model's structure, and underlying matrices. Two main components of models are distinguished in SEM: the
structural model showing potential causal dependencies between
endogenous and exogenous latent variables, and the
measurement model showing the causal connections between the latent variables and the indicators. Exploratory and confirmatory
factor analysis models, for example, focus on the causal measurement connections, while
path models more closely correspond to SEMs latent structural connections. Modelers specify each coefficient in a model as being
free to be estimated, or
fixed at some value. The free coefficients may be postulated effects the researcher wishes to test, background correlations among the exogenous variables, or the variances of the residual or error variables providing additional variations in the endogenous latent variables. The fixed coefficients may be values like the 1.0 values in Figure 2 that provide a scales for the latent variables, or values of 0.0 which assert causal disconnections such as the assertion of no-direct-effects (no arrows) pointing from Academic Achievement to any of the four scales in Figure 1. SEM programs provide estimates and tests of the free coefficients, while the fixed coefficients contribute importantly to testing the overall model structure. Various kinds of constraints between coefficients can also be used. "Accepting" failing models as "close enough" is also not a reasonable alternative. A cautionary instance was provided by Browne, MacCallum, Kim, Anderson, and Glaser who addressed the mathematics behind why the test can have (though it does not always have) considerable power to detect model misspecification. The probability accompanying a test is the probability that the data could arise by random sampling variations if the current model, with its optimal estimates, constituted the real underlying population forces. A small probability reports it would be unlikely for the current data to have arisen if the current model structure constituted the real population causal forces – with the remaining differences attributed to random sampling variations. Browne, McCallum, Kim, Andersen, and Glaser presented a factor model they viewed as acceptable despite the model being significantly inconsistent with their data according to . The fallaciousness of their claim that close-fit should be treated as good enough was demonstrated by Hayduk, Pazkerka-Robinson, Cummings, Levers and Beres who demonstrated a fitting model for Browne, et al.'s own data by incorporating an experimental feature Browne, et al. overlooked. The fault was not in the math of the indices or in the over-sensitivity of testing. The fault was in Browne, MacCallum, and the other authors forgetting, neglecting, or overlooking, that the amount of ill fit cannot be trusted to correspond to the nature, location, or seriousness of problems in a model's specification. Many researchers tried to justify switching to fit-indices, rather than testing their models, by claiming that increases (and hence probability decreases) with increasing sample size (N). There are two mistakes in discounting on this basis. First, for proper models, does not increase with increasing N, is the strongest available structural equation model test. Numerous fit indices quantify how closely a model fits the data but all fit indices suffer from the logical difficulty that the size or amount of ill fit is not trustably coordinated with the severity or nature of the issues producing the data inconsistency. The evidence of model-data inconsistency was too statistically solid to be dislodged or discarded, but people could at least be provided a way to distract from the "disturbing" evidence. Career-profits can still be accrued by developing additional indices, reporting investigations of index behavior, and publishing models intentionally burying evidence of model-data inconsistency under an MDI (a mound of distracting indices). There seems no general justification for why a researcher should "accept" a causally wrong model, rather than attempting to correct detected misspecifications. And some portions of the literature seems not to have noticed that "accepting a model" (on the basis of "satisfying" an index value) suffers from an intensified version of the criticism applied to "acceptance" of a null-hypothesis. Introductory statistics texts usually recommend replacing the term "accept" with "failed to reject the null hypothesis" to acknowledge the possibility of Type II error. A Type III error arises from "accepting" a model hypothesis when the current data are sufficient to reject the model. Whether or not researchers are committed to seeking the world's structure is a fundamental concern. Displacing test evidence of model-data inconsistency by hiding it behind index claims of acceptable-fit, introduces the discipline-wide cost of diverting attention away from whatever the discipline might have done to attain a structurally-improved understanding of the discipline's substance. The discipline ends up paying a real costs for index-based displacement of evidence of model misspecification. The frictions created by disagreements over the necessity of correcting model misspecifications will likely increase with increasing use of non-factor-structured models, and with use of fewer, more-precise, indicators of similar yet importantly-different latent variables. •
Akaike information criterion (AIC) • An index of relative model fit: The preferred model is the one with the lowest AIC value. • \mathit{AIC} = 2k - 2\ln(L)\, • where
k is the number of
parameters in the
statistical model, and
L is the maximized value of the
likelihood of the model. •
Root Mean Square Error of Approximation (RMSEA) • Fit index where a value of zero indicates the best fit. Guidelines for determining a "close fit" using RMSEA are highly contested. •
Standardized Root Mean Squared Residual (SRMR) • The SRMR is a popular absolute fit indicator. Hu and Bentler (1999) suggested .08 or smaller as a guideline for good fit. •
Comparative Fit Index (CFI) • In examining baseline comparisons, the CFI depends in large part on the average size of the correlations in the data. If the average correlation between variables is not high, then the CFI will not be very high. A CFI value of .95 or higher is desirable. The following table provides references documenting these, and other, features for some common indices: the RMSEA (Root Mean Square Error of Approximation), SRMR (Standardized Root Mean Squared Residual), CFI (Confirmatory Fit Index), and the TLI (the Tucker-Lewis Index). Additional indices such as the AIC (Akaike Information Criterion) can be found in most SEM introductions. Direct-effect estimates are interpreted in parallel to the interpretation of coefficients in regression equations but with causal commitment. Each unit increase in a causal variable's value is viewed as producing a change of the estimated magnitude in the dependent variable's value given control or adjustment for all the other operative/modeled causal mechanisms. Indirect effects are interpreted similarly, with the magnitude of a specific indirect effect equaling the product of the series of direct effects comprising that indirect effect. The units involved are the real scales of observed variables' values, and the assigned scale values for latent variables. A specified/fixed 1.0 effect of a latent on a specific indicator coordinates that indicator's scale with the latent variable's scale. The presumption that the remainder of the model remains constant or unchanging may require discounting indirect effects that might, in the real world, be simultaneously prompted by a real unit increase. And the unit increase itself might be inconsistent with what is possible in the real world because there may be no known way to change the causal variable's value. If a model adjusts for measurement errors, the adjustment permits interpreting latent-level effects as referring to variations in true scores. Understanding causal implications implicitly connects to understanding "controlling", and potentially explaining why some variables, but not others, should be controlled. As models become more complex these fundamental components can combine in non-intuitive ways, such as explaining how there can be no correlation (zero covariance) between two variables despite the variables being connected by a direct non-zero causal effect. The caution appearing in the Model Assessment section warrants repeat. Interpretation should be possible whether a model is or is not consistent with the data. The estimates report how the world would appear to someone believing the model – even if that belief is unfounded because the model happens to be wrong. Interpretation should acknowledge that the model coefficients may or may not correspond to "parameters" – because the model's coefficients may not have corresponding worldly structural features. Adding new latent variables entering or exiting the original model at a few clear causal locations/variables contributes to detecting model misspecifications which could otherwise ruin coefficient interpretations. The correlations between the new latent's indicators and all the original indicators contribute to testing the original model's structure because the few new and focused effect coefficients must work in coordination with the model's original direct and indirect effects to coordinate the new indicators with the original indicators. If the original model's structure was problematic, the sparse new causal connections will be insufficient to coordinate the new indicators with the original indicators, thereby signaling the inappropriateness of the original model's coefficients through model-data inconsistency. Dependable fitting models are rarer than failing models or models inappropriately bludgeoned into fitting, but appropriately-fitting models are possible. The multiple ways of conceptualizing PLS models complicate interpretation of PLS models. Many of the above comments are applicable if a PLS modeler adopts a realist perspective by striving to ensure their modeled indicators combine in a way that matches some existing but unavailable latent variable. Non-causal PLS models, such as those focusing primarily on
R2 or out-of-sample predictive power, change the interpretation criteria by diminishing concern for whether or not the model's coefficients have worldly counterparts. The fundamental features differentiating the five PLS modeling perspectives discussed by Rigdon, Sarstedt and Ringle The controversy over model testing declined as clear reporting of significant model-data inconsistency becomes mandatory. Scientists do not get to ignore, or fail to report, evidence just because they do not like what the evidence reports. The comments by Bollen and Pearl regarding myths about causality in the context of SEM for example, makes it easy to overlook that a researcher might begin with one terrible model and one atrocious model, and end by retaining the structurally terrible model because some index reports it as better fitting than the atrocious model. It is unfortunate that even otherwise strong SEM texts like Kline (2016) Overall, the contributions that can be made by structural equation modeling depend on careful and detailed model assessment, even if a failing model happens to be the best available. An additional controversy that touched the fringes of the previous controversies awaits ignition. Factor models and theory-embedded factor structures having multiple indicators tend to fail, and dropping weak indicators tends to reduce the model-data inconsistency. Reducing the number of indicators leads to concern for, and controversy over, the minimum number of indicators required to support a latent variable in a structural equation model. Researchers tied to factor tradition can be persuaded to reduce the number of indicators to three per latent variable, but three or even two indicators may still be inconsistent with a proposed underlying factor common cause. Hayduk and Littvay (2012) discussed how to think about, defend, and adjust for measurement error, when using only a single indicator for each modeled latent variable. Single indicators have been used effectively in SE models for a long time, but controversy remains only as far away as a reviewer who has considered measurement from only the factor analytic perspective. Though declining, traces of these controversies are scattered throughout the SEM literature, and you can easily incite disagreement by asking: What should be done with models that are significantly inconsistent with the data? Or by asking: Does model simplicity override respect for evidence of data inconsistency? Or, what weight should be given to indexes which show close or not-so-close data fit for some models? Or, should we be especially lenient toward, and "reward", parsimonious models that are inconsistent with the data? Or, given that the RMSEA condones disregarding some real ill fit for each model degree of freedom, doesn't that mean that people testing models with null-hypotheses of non-zero RMSEA are doing deficient model testing? Considerable variation in statistical sophistication is required to cogently address such questions, though responses will likely center on the non-technical matter of whether or not researchers are required to report and respect evidence. == Extensions, modeling alternatives, and statistical kin ==