=== Intelligence ===
The earliest application of factor analysis was in locating and measuring components of human intelligence. It was believed that intelligence had various uncorrelated components, such as spatial intelligence, verbal intelligence, induction, deduction, etc., and that scores on these could be adduced by factor analysis from results on various tests, to give a single index known as the
Intelligence Quotient (IQ). The pioneering statistical psychologist
Spearman actually developed factor analysis in 1904 for his
two-factor theory of intelligence, adding a formal technique to the science of
psychometrics. In 1924
Thurstone looked for 56 factors of intelligence, developing the notion of Mental Age. Standard IQ tests today are based on this early work.

=== Residential differentiation ===
In 1949, Shevky and Williams introduced the theory of
factorial ecology, which dominated studies of residential differentiation from the 1950s to the 1970s. Neighbourhoods in a city could be distinguished from one another by various characteristics, which could be reduced to three by factor analysis. These were known as 'social rank' (an index of occupational status), 'familism' or family size, and 'ethnicity'. Cluster analysis could then be applied to divide the city into clusters or precincts according to values of the three key factor variables. An extensive literature developed around factorial ecology in urban geography, but the approach went out of fashion after 1980 as being methodologically primitive and having little place in postmodern geographical paradigms. One of the problems with factor analysis has always been finding convincing names for the various artificial factors. In 2000, Flood revived the factorial ecology approach to show that principal components analysis actually gave meaningful answers directly, without resorting to factor rotation. The principal components were actually dual variables or shadow prices of 'forces' pushing people together or apart in cities. The first component was 'accessibility', the classic trade-off between demand for travel and demand for space, around which classical urban economics is based. The next two components were 'disadvantage', which keeps people of similar status in separate neighbourhoods (mediated by planning), and 'ethnicity', where people of similar ethnic backgrounds try to co-locate. At about the same time, the Australian Bureau of Statistics defined distinct indexes of advantage and disadvantage by taking the first principal component of sets of key variables that were thought to be important. These SEIFA indexes are regularly published for various jurisdictions, and are used frequently in spatial analysis.
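
The idea of defining an index as the first principal component of a set of key variables can be illustrated with a minimal sketch. It is not the Australian Bureau of Statistics' procedure: the data are synthetic, and the names `first_component_index`, `latent`, and the three made-up indicators are assumptions introduced only for illustration.

```python
import numpy as np

def first_component_index(X):
    """Score each row of X on the first principal component of its standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-score every variable
    corr = np.corrcoef(X, rowvar=False)           # correlation matrix of the variables
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns eigenvalues in ascending order
    w = eigvecs[:, -1]                            # loadings of the largest component
    if w.sum() < 0:                               # the sign of an eigenvector is arbitrary
        w = -w
    share = eigvals[-1] / eigvals.sum()           # share of total variance carried by the index
    return Z @ w, w, share

# Synthetic example (assumption): 500 areas, three indicators driven by one hidden trait.
rng = np.random.default_rng(0)
n_areas = 500
latent = rng.normal(size=n_areas)
X = np.column_stack([latent + 0.5 * rng.normal(size=n_areas) for _ in range(3)])

scores, weights, share = first_component_index(X)
```

Here `scores` plays the role of the index value for each area, `weights` shows how much each indicator contributes, and `share` indicates how much of the variation in the indicators the single index captures.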

=== Development indexes ===
PCA can be used as a formal method for the development of indexes. As an alternative,
confirmatory composite analysis has been proposed to develop and assess indexes. The City Development Index was developed by PCA from about 200 indicators of city outcomes in a 1996 survey of 254 global cities. The first principal component was subject to iterative regression, adding the original variables singly until about 90% of its variation was accounted for. The index ultimately used about 15 indicators but was a good predictor of many more variables. Its comparative value agreed very well with a subjective assessment of the condition of each city. The coefficients on items of infrastructure were roughly proportional to the average costs of providing the underlying services, suggesting the Index was actually a measure of effective physical and social investment in the city. The country-level
Human Development Index (HDI) from
UNDP, which has been published since 1990 and is very extensively used in development studies, has very similar coefficients on similar indicators, strongly suggesting it was originally constructed using PCA.
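
The iterative regression step described for the City Development Index (adding original variables one at a time until about 90% of the variation in the first principal component is accounted for) might look roughly like the following sketch. The greedy forward-selection rule, the synthetic data, and the names `r_squared` and `forward_select` are assumptions for illustration; the source does not specify how variables were ordered.

```python
import numpy as np

def r_squared(A, y):
    """R^2 of an ordinary least-squares fit of y on the columns of A plus an intercept."""
    A1 = np.column_stack([np.ones(len(y)), A])
    coef, *_ = np.linalg.lstsq(A1, y, rcond=None)
    resid = y - A1 @ coef
    return 1.0 - resid.var() / y.var()

def forward_select(Z, target, threshold=0.90):
    """Greedily add columns of Z until they explain `threshold` of the target's variance."""
    chosen, r2 = [], 0.0
    remaining = list(range(Z.shape[1]))
    while remaining and r2 < threshold:
        # pick the variable that raises R^2 the most when added to the current set
        best = max(remaining, key=lambda j: r_squared(Z[:, chosen + [j]], target))
        chosen.append(best)
        remaining.remove(best)
        r2 = r_squared(Z[:, chosen], target)
    return chosen, r2

# Synthetic data (assumption): 300 cities, 20 noisy indicators of one hidden trait.
rng = np.random.default_rng(0)
n_cities, n_indicators = 300, 20
latent = rng.normal(size=(n_cities, 1))
X = latent + rng.normal(size=(n_cities, n_indicators))
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# First principal component of the standardized indicators.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigvecs[:, -1]

chosen, r2 = forward_select(Z, pc1)
```

In this toy setting the loop stops well before all 20 indicators are used, which mirrors the way a short list of indicators can stand in for a component built from many more.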

=== Population genetics ===
In 1978
Cavalli-Sforza and others pioneered the use of principal components analysis (PCA) to summarise data on variation in human gene frequencies across regions. The components showed distinctive patterns, including gradients and sinusoidal waves. They interpreted these patterns as resulting from specific ancient migration events. Since then, PCA has been ubiquitous in population genetics, with thousands of papers using PCA as a display mechanism. Genetic variation depends largely on geographic proximity, so the first two principal components actually show spatial distribution and may be used to map the relative geographical location of different population groups, thereby showing individuals who have wandered from their original locations. PCA in genetics has been technically controversial, in that the technique has been performed on discrete non-normal variables and often on binary allele markers. The lack of any measures of standard error in PCA is also an impediment to more consistent usage. In August 2022, the molecular biologist
Eran Elhaik published a theoretical paper in
Scientific Reports analyzing 12 PCA applications. He concluded that it was easy to manipulate the method, which, in his view, generated results that were 'erroneous, contradictory, and absurd.' Specifically, he argued, the results achieved in population genetics were characterized by cherry-picking and
circular reasoning.
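
As a minimal sketch of how PCA is typically used as a display mechanism on binary allele markers, the following simulates two hypothetical populations and projects individuals onto the first two components. Everything here is synthetic and assumed for illustration (the allele frequencies, sample sizes, and variable names); it says nothing about any real data set, and its outcome depends on exactly the kind of sampling choices that the criticism above is concerned with.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_markers = 200, 300

# Hypothetical allele frequencies: population B is a perturbed copy of population A.
freq_a = rng.uniform(0.1, 0.9, size=n_markers)
freq_b = np.clip(freq_a + rng.normal(scale=0.25, size=n_markers), 0.05, 0.95)

# Binary (0/1) marker matrix: one row per individual, one column per marker.
geno = np.vstack([
    rng.binomial(1, freq_a, size=(n_per_pop, n_markers)),
    rng.binomial(1, freq_b, size=(n_per_pop, n_markers)),
]).astype(float)
labels = np.repeat([0, 1], n_per_pop)

# PCA via the singular value decomposition of the column-centered matrix.
centered = geno - geno.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U[:, :2] * S[:2]                  # coordinates of each individual on PC1 and PC2
explained = S**2 / np.sum(S**2)         # fraction of variance per component

# How well does the sign of PC1 alone split the two simulated populations?
guess = (pcs[:, 0] > 0).astype(int)
agreement = max(np.mean(guess == labels), np.mean(guess != labels))
```

Plotting `pcs[:, 0]` against `pcs[:, 1]` gives the familiar scatter plot; changing `n_per_pop` for one group or the set of markers changes the picture, which is one way to explore the sensitivity of such displays.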

=== Market research and indexes of attitude ===
Market research has been an extensive user of PCA. It is used to develop customer satisfaction or customer loyalty scores for products, and, with clustering, to develop market segments that may be targeted with advertising campaigns, in much the same way as factorial ecology locates geographical areas with similar characteristics. PCA rapidly transforms large amounts of data into a smaller number of variables that can be more readily analyzed. In any consumer questionnaire, there is a series of questions designed to elicit consumer attitudes, and principal components seek out latent variables underlying these attitudes. For example, the Oxford Internet Survey in 2013 asked 2000 people about their attitudes and beliefs, and from these analysts extracted four principal component dimensions, which they identified as 'escape', 'social networking', 'efficiency', and 'problem creating'. Another example from Joe Flood in 2008 extracted an attitudinal index toward housing from 28 attitude questions in a national survey of 2697 households in Australia. The first principal component represented a general attitude toward property and home ownership. The index, or the attitude questions it embodied, could be fed into a General Linear Model of tenure choice. The strongest determinant of private renting by far was the attitude index, rather than income, marital status or household type.
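
To make the idea of latent attitude dimensions concrete, here is a small sketch on a made-up questionnaire. The eight items, the two hidden attitudes, and the 1 to 5 agreement scale are all assumptions; the code does not reproduce either of the surveys mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
n_respondents = 1000

# Two hypothetical hidden attitudes, each driving four questionnaire items.
attitude_a = rng.normal(size=(n_respondents, 1))
attitude_b = rng.normal(size=(n_respondents, 1))
raw = np.hstack([
    attitude_a + 0.6 * rng.normal(size=(n_respondents, 4)),
    attitude_b + 0.6 * rng.normal(size=(n_respondents, 4)),
])
answers = np.clip(np.rint(3 + raw), 1, 5)       # responses on a 1-5 agreement scale

# PCA on the correlation matrix of the items.
corr = np.corrcoef(answers, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]               # sort components by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()             # variance share of each component
loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])  # item loadings on the first two components
```

Inspecting `explained` shows two dominant components, and `loadings` shows which items move together; naming those components is the interpretive step left to the analyst.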

=== Quantitative finance ===
In
quantitative finance, PCA is used in
financial risk management, and has been applied to
other problems such as
portfolio optimization. PCA is commonly used in problems involving
fixed income securities and
portfolios, and
interest rate derivatives. Valuations here depend on the entire
yield curve, comprising numerous highly correlated instruments, and PCA is used to define a set of components or factors that explain rate movements. In simulation-based value at risk (VaR) calculations, for each simulation sample, the components are stressed, and rates, and
in turn option values, are then reconstructed; with VaR calculated, finally, over the entire run. PCA is also used in
hedging exposure to
interest rate risk, given
partial durations and other sensitivities. In both applications, typically the first three principal components of the system are of interest (
representing "shift", "twist", and "curvature"). These principal components are derived from an eigen-decomposition of the
covariance matrix of
yield at predefined maturities; and where the
variance of each component is its
eigenvalue (and as the components are
orthogonal, no correlation need be incorporated in subsequent modelling). For
equity, an optimal portfolio is one where the
expected return is maximized for a given level of risk, or alternatively, where risk is minimized for a given return; see
Markowitz model for discussion. Thus, one approach is to reduce portfolio risk, where
allocation strategies are applied to the "principal portfolios" instead of the underlying
stocks. A second approach is to enhance portfolio return, using the principal components to select companies' stocks with upside potential. PCA has also been applied to stress testing, essentially an analysis of a bank's ability to endure
a hypothetical adverse economic scenario. Its utility is in "distilling the information contained in [several]
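macroeconomic variables into a more manageable data set, which can then [be used] for analysis."

=== Neuroscience ===
A variant of PCA is used in neuroscience to identify the specific properties of a stimulus that increase a neuron's probability of generating an action potential. This technique is known as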
spike-triggered covariance analysis. In a typical application an experimenter presents a
white noise process as a stimulus (usually either as a sensory input to a test subject, or as a
current injected directly into the neuron) and records a train of action potentials, or spikes, produced by the neuron as a result. Presumably, certain features of the stimulus make the neuron more likely to spike. In order to extract these features, the experimenter calculates the
covariance matrix of the
spike-triggered ensemble, the set of all stimuli (defined and discretized over a finite time window, typically on the order of 100 ms) that immediately preceded a spike. The
eigenvectors of the difference between the spike-triggered covariance matrix and the covariance matrix of the
prior stimulus ensemble (the set of all stimuli, defined over the same length time window) then indicate the directions in the
space of stimuli along which the variance of the spike-triggered ensemble differed the most from that of the prior stimulus ensemble. Specifically, the eigenvectors with the largest positive eigenvalues correspond to the directions along which the variance of the spike-triggered ensemble showed the largest positive change compared to the variance of the prior. Since these were the directions in which varying the stimulus led to a spike, they are often good approximations of the sought-after relevant stimulus features. In neuroscience, PCA is also used to discern the identity of a neuron from the shape of its action potential.
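
The spike-triggered covariance procedure described earlier in this section can be sketched on a toy model neuron. The neuron model (it spikes when the squared projection of the stimulus onto a hidden feature is large), the window length, and all variable names are assumptions made for illustration, not a description of any particular experiment.

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, window = 20000, 20

# Prior stimulus ensemble: white-noise windows, one per row.
stimuli = rng.normal(size=(n_samples, window))

# Hidden stimulus feature the toy neuron responds to (unit length).
feature = np.sin(np.linspace(0, np.pi, window))
feature /= np.linalg.norm(feature)

# Toy neuron (assumption): spikes when the squared projection onto the feature is large.
# The response is symmetric in sign, so the information sits in the variance, not the mean.
drive = stimuli @ feature
spiked = drive**2 > 2.0
ensemble = stimuli[spiked]                       # spike-triggered ensemble

# Difference between spike-triggered and prior covariance matrices.
prior_cov = np.cov(stimuli, rowvar=False)
stc = np.cov(ensemble, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(stc - prior_cov)   # ascending eigenvalues

recovered = eigvecs[:, -1]                       # direction with largest positive eigenvalue
overlap = abs(recovered @ feature)               # 1.0 would be a perfect match (up to sign)
```

In this synthetic case the eigenvector with the largest positive eigenvalue closely matches the hidden feature, since that is the only direction along which spiking changes the stimulus variance.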
Spike sorting is an important procedure because
extracellular recording techniques often pick up signals from more than one neuron. In spike sorting, one first uses PCA to reduce the dimensionality of the space of action potential waveforms, and then performs
clustering analysis to associate specific action potentials with individual neurons. PCA as a dimension reduction technique is particularly suited to detect coordinated activities of large neuronal ensembles. It has been used in determining collective variables, that is,
order parameters, during
phase transitions in the brain.

== Relation with other methods ==