Psychometricians have developed a number of different measurement theories. These include
classical test theory (CTT) and
item response theory (IRT). An approach that is mathematically similar to IRT, yet quite distinctive in its origins and features, is the
Rasch model for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences. Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include
factor analysis, a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for
determining the number of latent factors. A usual procedure, known as the Kaiser criterion, is to stop factoring when eigenvalues drop below one, since a factor with an eigenvalue below one accounts for less variance than a single original variable. The lack of clear cutoff points affects other multivariate methods as well.
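The eigenvalue-greater-than-one rule can be sketched directly from the correlation matrix. A minimal illustration on simulated (made-up) data with two underlying factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores for 200 respondents on 6 items: the first three items
# load on one latent factor and the last three on another (made-up data).
factor1 = rng.normal(size=(200, 1))
factor2 = rng.normal(size=(200, 1))
noise = rng.normal(scale=0.5, size=(200, 6))
scores = np.hstack([factor1.repeat(3, axis=1),
                    factor2.repeat(3, axis=1)]) + noise

# Eigenvalues of the correlation matrix, largest first; the Kaiser
# criterion retains only factors whose eigenvalue exceeds one.
corr = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
n_factors = int(np.sum(eigenvalues > 1))
print(n_factors)
```

With two strong simulated factors, the rule recovers two factors here; on real data the decision is rarely this clear-cut, which is the ambiguity noted above.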
Multidimensional scaling is a method for finding a simple, low-dimensional representation of data that involve a large number of latent dimensions.
Cluster analysis is an approach to grouping objects that are similar to one another. Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data. More recently,
structural equation modeling and
path analysis represent more sophisticated approaches to working with large
covariance matrices. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis can be helpful. Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."
== Key concepts ==
Key concepts in classical test theory are
reliability and
validity. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity. Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called
test-retest reliability. Similarly, the equivalence of different versions of the same measure can be indexed by a
Pearson correlation, and is called
equivalent forms reliability or a similar term. Validity may be assessed by correlating a measure with a criterion: an external sample of behavior that the measure should, in theory, predict. That external sample of behavior can be many things, including another test; college grade point average, as when the high school SAT is used to predict performance in college; and even behavior that occurred in the past, for example, when a test of current psychological symptoms is used to predict the occurrence of past victimization (which would more accurately be termed postdiction). When the criterion measure is collected at the same time as the measure being validated the goal is to establish
concurrent validity; when the criterion is collected later the goal is to establish
predictive validity. A measure has
construct validity if it is related to measures of other constructs as required by theory.
Content validity is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a
job analysis.
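The reliability coefficients described above reduce to a Pearson correlation between two sets of scores. A minimal sketch with NumPy, using made-up scores for ten examinees on two administrations of the same test:

```python
import numpy as np

# Made-up scores for ten examinees on two administrations of one test.
first_administration = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
second_administration = np.array([13, 14, 10, 19, 18, 12, 15, 17, 9, 16])

# The Pearson correlation between the two administrations serves as the
# test-retest reliability coefficient; the same computation on two test
# versions would give an equivalent-forms coefficient.
reliability = np.corrcoef(first_administration, second_administration)[0, 1]
print(round(reliability, 3))
```

Values near 1 indicate that examinees are rank-ordered consistently across administrations; these illustrative vectors yield a high coefficient by construction.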
Item response theory models the relationship between
latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait, as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be inferred from his or her score on a university test and then compared reliably with a high school student's knowledge inferred from a less difficult test. Scores derived by classical test theory do not have this characteristic; actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.

== Standards of quality ==