Spatial data comes in many varieties and it is not easy to arrive at a system of classification that is simultaneously exclusive, exhaustive, imaginative, and satisfying. -- G. Upton & B. Fingelton
Spatial data analysis Urban and Regional Studies deal with large tables of spatial data obtained from censuses and surveys. It is necessary to simplify the huge amount of detailed information in order to extract the main trends. Multivariable analysis (or
Factor analysis, FA) allows a change of variables, transforming the many variables of the census, usually correlated between themselves, into fewer independent "Factors" or "Principal Components" which are, actually, the
eigenvectors of the data correlation matrix weighted by the inverse of their eigenvalues. This change of variables has two main advantages: • Since information is concentrated on the first new factors, it is possible to keep only a few of them while losing only a small amount of information; mapping them produces fewer and more significant maps • The factors, actually the eigenvectors, are orthogonal by construction, i.e. not correlated. In most cases, the dominant factor (with the largest eigenvalue) is the Social Component, separating rich and poor in the city. Since factors are not-correlated, other smaller processes than social status, which would have remained hidden otherwise, appear on the second, third, ... factors. Factor analysis depends on measuring distances between observations : the choice of a significant metric is crucial. The Euclidean metric (Principal Component Analysis), the Chi-Square distance (Correspondence Analysis) or the Generalized Mahalanobis distance (Discriminant Analysis) are among the more widely used. More complicated models, using communalities or rotations have been proposed. Using multivariate methods in spatial analysis began really in the 1950s (although some examples go back to the beginning of the century) and culminated in the 1970s, with the increasing power and accessibility of computers. Already in 1948, in a seminal publication, two sociologists,
Wendell Bell and Eshref Shevky, had shown that most city populations in the US and in the world could be represented with three independent factors : 1- the « socio-economic status » opposing rich and poor districts and distributed in sectors running along highways from the city center, 2- the « life cycle », i.e. the age structure of households, distributed in concentric circles, and 3- « race and ethnicity », identifying patches of migrants located within the city. In 1961, in a groundbreaking study, British geographers used FA to classify British towns. Brian J Berry, at the University of Chicago, and his students made a wide use of the method, applying it to most important cities in the world and exhibiting common social structures. The use of Factor Analysis in Geography, made so easy by modern computers, has been very wide but not always very wise. Since the vectors extracted are determined by the data matrix, it is not possible to compare factors obtained from different censuses. A solution consists in fusing together several census matrices in a unique table which, then, may be analyzed. This, however, assumes that the definition of the variables has not changed over time and produces very large tables, difficult to manage. A better solution, proposed by psychometricians, groups the data in a « cubic matrix », with three entries (for instance, locations, variables, time periods). A Three-Way Factor Analysis produces then three groups of factors related by a small cubic « core matrix ». This method, which exhibits data evolution over time, has not been widely used in geography. In Los Angeles, however, it has exhibited the role, traditionally ignored, of Downtown as an organizing center for the whole city during several decades.
Spatial autocorrelation Local
Moran's I. Spatial
autocorrelation statistics measure and analyze the degree of dependency among observations in a geographic space. Classic spatial autocorrelation statistics include
Moran's I,
Geary's C,
Getis's G and the
standard deviational ellipse. These statistics require measuring a
spatial weights matrix that reflects the intensity of the geographic relationship between observations in a neighborhood, e.g., the distances between neighbors, the lengths of shared border, or whether they fall into a specified directional class such as "west". Classic spatial autocorrelation statistics compare the spatial weights to the covariance relationship at pairs of locations. Spatial autocorrelation that is more positive than expected from random indicate the clustering of similar values across geographic space, while significant negative spatial autocorrelation indicates that neighboring values are more dissimilar than expected by chance, suggesting a spatial pattern similar to a chess board. Spatial autocorrelation statistics such as Moran's I and Geary's C are global in the sense that they estimate the overall degree of spatial autocorrelation for a dataset. The possibility of spatial heterogeneity suggests that the estimated degree of autocorrelation may vary significantly across geographic space.
Local spatial autocorrelation statistics provide estimates disaggregated to the level of the spatial analysis units, allowing assessment of the dependency relationships across space. G statistics compare neighborhoods to a global average and identify local regions of strong autocorrelation. Local versions of the I and C statistics are also available.
Spatial heterogeneity Spatial interaction Spatial interaction or "
gravity models" estimate the flow of people, material or information between locations in geographic space. Factors can include origin propulsive variables such as the number of commuters in residential areas, destination attractiveness variables such as the amount of office space in employment areas, and proximity relationships between the locations measured in terms such as driving distance or travel time. In addition, the topological, or
connective, relationships between areas must be identified, particularly considering the often conflicting relationship between distance and topology; for example, two spatially close neighborhoods may not display any significant interaction if they are separated by a highway. After specifying the functional forms of these relationships, the analyst can estimate model parameters using observed flow data and standard estimation techniques such as ordinary least squares or maximum likelihood. Competing destinations versions of spatial interaction models include the proximity among the destinations (or origins) in addition to the origin-destination proximity; this captures the effects of destination (origin) clustering on flows.
Spatial interpolation Spatial interpolation methods estimate the variables at unobserved locations in geographic space based on the values at observed locations. Basic methods include
inverse distance weighting: this attenuates the variable with decreasing proximity from the observed location.
Kriging is a more sophisticated method that interpolates across space according to a spatial lag relationship that has both systematic and random components. This can accommodate a wide range of spatial relationships for the hidden values between observed locations. Kriging provides optimal estimates given the hypothesized lag relationship, and error estimates can be mapped to determine if spatial patterns exist.
Spatial regression Spatial regression methods capture spatial dependency in
regression analysis, avoiding statistical problems such as unstable parameters and unreliable significance tests, as well as providing information on spatial relationships among the variables involved. Depending on the specific technique, spatial dependency can enter the regression model as relationships between the independent variables and the dependent, between the dependent variables and a spatial lag of itself, or in the error terms.
Geographically weighted regression (GWR) is a local version of spatial regression that generates parameters disaggregated by the spatial units of analysis. This allows assessment of the spatial heterogeneity in the estimated relationships between the independent and dependent variables. The use of
Bayesian hierarchical modeling in conjunction with
Markov chain Monte Carlo (MCMC) methods have recently shown to be effective in modeling complex relationships using Poisson-Gamma-CAR, Poisson-lognormal-SAR, or Overdispersed logit models. Statistical packages for implementing such Bayesian models using MCMC include
WinBugs,
CrimeStat and many packages available via
R programming language. Spatial stochastic processes, such as
Gaussian processes are also increasingly being deployed in spatial regression analysis. Model-based versions of GWR, known as spatially varying coefficient models have been applied to conduct Bayesian inference. and Nearest Neighbor Gaussian Processes (NNGP).
Spatial neural networks Spatial volatility Spatial volatility models describe spatial or spatiotemporal dependence in the conditional variance of a process, extending the concept of
Autoregressive conditional heteroskedasticity (ARCH) from time series to spatial settings. Such models account for the fact that variability at one location may be related to variability at neighbouring locations, as defined by a spatial weights matrix. This is in keeping with one formulation of
Arbia's law of geography which states that "everything is related to everything else, but things observed at a coarse spatial resolution are more related than things observed at a finer resolution." A generalised spatial and spatiotemporal ARCH/GARCH framework was introduced by Otto, Schmid, and Garthoff (2018), allowing the conditional variance at a location to depend on weighted past squared residuals from neighbouring locations and, in the spatiotemporal case, on its own past conditional variances. Sato and Matsuda (2017) proposed a spatial log-ARCH model as an alternative formulation. Spatial volatility models find applications in disciplines where risk or uncertainty propagate over space, including regional economics, environmental risk assessment, and financial networks. A recent review summarises methodological developments, estimation strategies, and applications of spatial and spatiotemporal volatility models across disciplines.
Simulation and modeling Spatial interaction models are aggregate and top-down: they specify an overall governing relationship for flow between locations. This characteristic is also shared by urban models such as those based on mathematical programming, flows among economic sectors, or bid-rent theory. An alternative modeling perspective is to represent the system at the highest possible level of disaggregation and study the bottom-up emergence of complex patterns and relationships from behavior and interactions at the individual level.
Complex adaptive systems theory as applied to spatial analysis suggests that simple interactions among proximal entities can lead to intricate, persistent and functional spatial entities at aggregate levels. Two fundamentally spatial simulation methods are cellular automata and agent-based modeling.
Cellular automata modeling imposes a fixed spatial framework such as grid cells and specifies rules that dictate the state of a cell based on the states of its neighboring cells. As time progresses, spatial patterns emerge as cells change states based on their neighbors; this alters the conditions for future time periods. For example, cells can represent locations in an urban area and their states can be different types of land use. Patterns that can emerge from the simple interactions of local land uses include office districts and
urban sprawl.
Agent-based modeling uses software entities (agents) that have purposeful behavior (goals) and can react, interact and modify their environment while seeking their objectives. Unlike the cells in cellular automata, simulysts can allow agents to be mobile with respect to space. For example, one could model traffic flow and dynamics using agents representing individual vehicles that try to minimize travel time between specified origins and destinations. While pursuing minimal travel times, the agents must avoid collisions with other vehicles also seeking to minimize their travel times. Cellular automata and agent-based modeling are complementary modeling strategies. They can be integrated into a common geographic automata system where some agents are fixed while others are mobile. Calibration plays a pivotal role in both CA and ABM simulation and modelling approaches. Initial approaches to CA proposed robust calibration approaches based on stochastic, Monte Carlo methods. ABM approaches rely on agents' decision rules (in many cases extracted from qualitative research base methods such as questionnaires). Recent Machine Learning Algorithms calibrate using training sets, for instance in order to understand the qualities of the built environment.
Multiple-point geostatistics (MPS) Spatial analysis of a conceptual geological model is the main purpose of any MPS algorithm. The method analyzes the spatial statistics of the geological model, called the training image, and generates realizations of the phenomena that honor those input multiple-point statistics. A recent MPS algorithm used to accomplish this task is the pattern-based method by Honarkhah. In this method, a distance-based approach is employed to analyze the patterns in the training image. This allows the reproduction of the multiple-point statistics, and the complex geometrical features of the training image. Each output of the MPS algorithm is a realization that represents a random field. Together, several realizations may be used to quantify spatial uncertainty. One of the recent methods is presented by Tahmasebi et al. uses a cross-correlation function to improve the spatial pattern reproduction. They call their MPS simulation method as the CCSIM algorithm. This method is able to quantify the spatial connectivity, variability and uncertainty. Furthermore, the method is not sensitive to any type of data and is able to simulate both categorical and continuous scenarios. CCSIM algorithm is able to be used for any stationary, non-stationary and multivariate systems and it can provide high quality visual appeal model., == Geospatial and hydrospatial analysis ==