Psychometricians have developed a number of different measurement theories. These include
classical test theory (CTT) and
item response theory (IRT). An approach that is mathematically similar to IRT, yet quite distinctive in its origins and features, is the
Rasch model for measurement. The development of the Rasch model, and the broader class of models to which it belongs, was explicitly founded on requirements of measurement in the physical sciences. Psychometricians have also developed methods for working with large matrices of correlations and covariances. Techniques in this general tradition include
factor analysis, a method of determining the underlying dimensions of data. One of the main challenges faced by users of factor analysis is a lack of consensus on appropriate procedures for
determining the number of latent factors. A usual procedure, known as the Kaiser criterion, is to stop factoring when eigenvalues drop below one, since a factor with an eigenvalue below one accounts for less variance than a single original variable. The lack of clear cutoff points affects other multivariate methods as well.
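The eigenvalue-greater-than-one rule can be sketched directly from the correlation matrix. A minimal illustration on simulated (made-up) data with two underlying factors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated scores for 200 respondents on 6 items: the first three items
# load on one latent factor and the last three on another (made-up data).
factor1 = rng.normal(size=(200, 1))
factor2 = rng.normal(size=(200, 1))
noise = rng.normal(scale=0.5, size=(200, 6))
scores = np.hstack([factor1.repeat(3, axis=1),
                    factor2.repeat(3, axis=1)]) + noise

# Eigenvalues of the correlation matrix, largest first; the Kaiser
# criterion retains only factors whose eigenvalue exceeds one.
corr = np.corrcoef(scores, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
n_factors = int(np.sum(eigenvalues > 1))
print(n_factors)
```

With two strong simulated factors, the rule recovers two factors here; on real data the decision is rarely this clear-cut, which is the ambiguity noted above.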
Multidimensional scaling is a method for finding a simple, low-dimensional representation of data that involve a large number of latent dimensions.
Cluster analysis is an approach to grouping objects that are similar to one another. Factor analysis, multidimensional scaling, and cluster analysis are all multivariate descriptive methods used to distill simpler structures from large amounts of data. More recently,
structural equation modeling and
path analysis represent more sophisticated approaches to working with large
covariance matrices. These methods allow statistically sophisticated models to be fitted to data and tested to determine if they are adequate fits. Because at a granular level psychometric research is concerned with the extent and nature of multidimensionality in each of the items of interest, a relatively new procedure known as bi-factor analysis can be helpful. Bi-factor analysis can decompose "an item's systematic variance in terms of, ideally, two sources, a general factor and one source of additional systematic variance."
== Key concepts ==
Key concepts in classical test theory are
reliability and
validity. A reliable measure is one that measures a construct consistently across time, individuals, and situations. A valid measure is one that measures what it is intended to measure. Reliability is necessary, but not sufficient, for validity. Both reliability and validity can be assessed statistically. Consistency over repeated measures of the same test can be assessed with the Pearson correlation coefficient, and is often called
test-retest reliability. Similarly, the equivalence of different versions of the same measure can be indexed by a
Pearson correlation, and is called
equivalent forms reliability or a similar term. Validity may be assessed by correlating a measure with a criterion: an external sample of behavior that the measure should, in theory, predict. That external sample of behavior can be many things, including another test; college grade point average, as when the high school SAT is used to predict performance in college; and even behavior that occurred in the past, for example, when a test of current psychological symptoms is used to predict the occurrence of past victimization (which would more accurately be termed postdiction). When the criterion measure is collected at the same time as the measure being validated the goal is to establish
concurrent validity; when the criterion is collected later the goal is to establish
predictive validity. A measure has
construct validity if it is related to measures of other constructs as required by theory.
Content validity is a demonstration that the items of a test do an adequate job of covering the domain being measured. In a personnel selection example, test content is based on a defined statement or set of statements of knowledge, skill, ability, or other characteristics obtained from a
job analysis.
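The reliability coefficients described above reduce to a Pearson correlation between two sets of scores. A minimal sketch with NumPy, using made-up scores for ten examinees on two administrations of the same test:

```python
import numpy as np

# Made-up scores for ten examinees on two administrations of one test.
first_administration = np.array([12, 15, 9, 20, 17, 11, 14, 18, 10, 16])
second_administration = np.array([13, 14, 10, 19, 18, 12, 15, 17, 9, 16])

# The Pearson correlation between the two administrations serves as the
# test-retest reliability coefficient; the same computation on two test
# versions would give an equivalent-forms coefficient.
reliability = np.corrcoef(first_administration, second_administration)[0, 1]
print(round(reliability, 3))
```

Values near 1 indicate that examinees are rank-ordered consistently across administrations; these illustrative vectors yield a high coefficient by construction.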
Item response theory models the relationship between
latent traits and responses to test items. Among other advantages, IRT provides a basis for obtaining an estimate of the location of a test-taker on a given latent trait, as well as the standard error of measurement of that location. For example, a university student's knowledge of history can be inferred from his or her score on a university test and then compared reliably with a high school student's knowledge inferred from a less difficult test. Scores derived by classical test theory do not have this characteristic; actual ability (rather than ability relative to other test-takers) must be assessed by comparing scores to those of a "norm group" randomly selected from the population. In fact, all measures derived from classical test theory are dependent on the sample tested, while, in principle, those derived from item response theory are not.

== Standards of quality ==