Geostatistics

Geostatistics is intimately related to interpolation methods but extends far beyond simple interpolation problems. Geostatistical techniques rely on statistical models based on random function (or random variable) theory to model the uncertainty associated with spatial estimation and simulation.

A number of simpler interpolation methods/algorithms, such as inverse distance weighting, bilinear interpolation and nearest-neighbor interpolation, were already well known before geostatistics. Geostatistics goes beyond the interpolation problem by treating the studied phenomenon at unknown locations as a set of correlated random variables.

Let $Z(\mathbf{x})$ be the value of the variable of interest (e.g., temperature, rainfall, piezometric level or geological facies) at a certain location $\mathbf{x}$. This value is unknown. Although a value that could be measured exists at location $\mathbf{x}$, geostatistics treats it as random because it has not been measured, or has not been measured yet. The randomness of $Z(\mathbf{x})$ is not complete, however: it is described by a cumulative distribution function (CDF) that depends on the information known about $Z(\mathbf{x})$:
:F(\mathit{z}, \mathbf{x}) = \operatorname{Prob} \lbrace Z(\mathbf{x}) \leqslant \mathit{z} \mid \text{information} \rbrace .
Typically, if the value of $Z$ is known at locations close to $\mathbf{x}$ (in the neighborhood of $\mathbf{x}$), the CDF of $Z(\mathbf{x})$ can be constrained by this neighborhood: if high spatial continuity is assumed, $Z(\mathbf{x})$ can only take values similar to those found in the neighborhood. Conversely, in the absence of spatial continuity, $Z(\mathbf{x})$ can take any value allowed by its marginal distribution. The spatial continuity of the random variables is described by a model of spatial continuity, which is either a parametric function in variogram-based geostatistics or a non-parametric form in other methods such as multiple-point simulation or pseudo-genetic techniques.
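How the neighborhood constrains the conditional CDF can be illustrated with a minimal sketch. It assumes that $Z$ is a stationary Gaussian random function on a 1-D axis with a known mean and an exponential covariance model; under that assumption $F(z, \mathbf{x})$ is itself Gaussian, and its mean and variance are given by simple kriging. The function names (`exp_cov`, `simple_kriging`, `conditional_cdf`), the covariance model and its parameters (`sill`, `corr_len`) are hypothetical choices made for illustration only.

```python
import math
import numpy as np

def exp_cov(h, sill=1.0, corr_len=10.0):
    """Assumed exponential covariance model: C(h) = sill * exp(-|h| / corr_len)."""
    return sill * np.exp(-np.abs(h) / corr_len)

def simple_kriging(x_data, z_data, x0, mean=0.0, sill=1.0, corr_len=10.0):
    """Conditional mean and variance of Z(x0) given data at 1-D locations x_data.

    Assumes a stationary Gaussian random function with known mean, so these two
    moments fully determine the conditional CDF F(z, x0).
    """
    x_data = np.asarray(x_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    K = exp_cov(x_data[:, None] - x_data[None, :], sill, corr_len)  # data-to-data covariances
    k = exp_cov(x_data - x0, sill, corr_len)                          # data-to-target covariances
    w = np.linalg.solve(K, k)                                         # simple kriging weights
    cond_mean = mean + w @ (z_data - mean)
    cond_var = max(sill - w @ k, 0.0)                                 # clip round-off below zero
    return float(cond_mean), float(cond_var)

def conditional_cdf(z, cond_mean, cond_var):
    """F(z, x0) = Prob{Z(x0) <= z | data} for the Gaussian conditional distribution."""
    if cond_var == 0.0:
        return 1.0 if z >= cond_mean else 0.0
    sd = math.sqrt(cond_var)
    return 0.5 * (1.0 + math.erf((z - cond_mean) / (sd * math.sqrt(2.0))))

x_data, z_data = [0.0, 20.0, 50.0], [1.2, 0.8, -0.5]
for x0 in (1.0, 200.0):
    m, v = simple_kriging(x_data, z_data, x0)
    print(f"x0={x0}: mean={m:.3f}, variance={v:.3f}, F(1.0, x0)={conditional_cdf(1.0, m, v):.3f}")
```

With these assumed parameters, the location $x_0 = 1$, next to a datum, gets a conditional mean of about 1.10 and a variance of about 0.18, so its CDF is tightly constrained by the neighborhood. At $x_0 = 200$, far beyond the correlation length, the mean falls back to the assumed global mean of 0 and the variance to essentially the sill of 1, matching the "no spatial continuity" case described above.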
By applying a single spatial model to an entire domain, one assumes that $Z$ is a stationary process, meaning that the same statistical properties apply throughout the domain. Several geostatistical methods provide ways of relaxing this stationarity assumption.

In this framework, one can distinguish two modeling goals:
• Estimating the value of $Z(\mathbf{x})$, typically by the expectation, the median or the mode of the distribution whose CDF is $F(z, \mathbf{x})$. This is usually called an estimation problem.
• Sampling from the entire probability distribution of $Z(\mathbf{x})$ by considering each of its possible outcomes at each location. This is generally done by creating several alternative maps of $Z$, called realizations. Consider a domain discretized into $N$ grid nodes (or pixels). Each realization is a sample of the complete $N$-dimensional joint distribution function
:F(\mathbf{z}, \mathbf{x}) = \operatorname{Prob} \lbrace Z(\mathbf{x}_1) \leqslant z_1, Z(\mathbf{x}_2) \leqslant z_2, \ldots, Z(\mathbf{x}_N) \leqslant z_N \rbrace .

In this approach, the existence of multiple solutions to the interpolation problem is acknowledged. Each realization is considered a possible scenario of what the real variable could be. All associated workflows then consider ensembles of realizations, and consequently ensembles of predictions that allow for probabilistic forecasting. For this reason, geostatistics is often used to generate or update spatial models when solving inverse problems.

A number of methods exist for both geostatistical estimation and multiple-realization approaches. Several reference books provide a comprehensive overview of the discipline.

==Methods==
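The two modeling goals distinguished above, estimation and the generation of realizations, can be contrasted with a minimal sketch. It assumes a stationary Gaussian random function with a known mean and an exponential covariance on a 1-D grid of $N$ nodes; the function name `estimate_and_simulate`, the covariance model and all parameter values are hypothetical. Under this assumption the $N$ grid values conditioned on the data are jointly Gaussian: the mean vector of that $N$-dimensional distribution is a single estimated map (the simple kriging estimate), and multiplying independent standard normal draws by a Cholesky factor of its covariance samples the joint distribution, giving one realization per draw.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_len=10.0):
    """Assumed exponential covariance model: C(h) = sill * exp(-|h| / corr_len)."""
    return sill * np.exp(-np.abs(h) / corr_len)

def estimate_and_simulate(grid, x_data, z_data, n_real, mean=0.0, sill=1.0,
                          corr_len=10.0, seed=0):
    """Return one estimated map and n_real conditional realizations on a 1-D grid.

    Assumes a stationary Gaussian random function with known mean; the grid
    values conditioned on the data are then jointly (N-dimensional) Gaussian.
    """
    grid = np.asarray(grid, dtype=float)
    x_data = np.asarray(x_data, dtype=float)
    z_data = np.asarray(z_data, dtype=float)
    c_dd = exp_cov(x_data[:, None] - x_data[None, :], sill, corr_len)  # data-data
    c_gd = exp_cov(grid[:, None] - x_data[None, :], sill, corr_len)    # grid-data
    c_gg = exp_cov(grid[:, None] - grid[None, :], sill, corr_len)      # grid-grid
    weights = np.linalg.solve(c_dd, c_gd.T).T        # (N, n_data) simple kriging weights
    estimate = mean + weights @ (z_data - mean)      # estimation goal: conditional mean map
    cond_cov = c_gg - weights @ c_gd.T               # N x N conditional covariance
    cond_cov = 0.5 * (cond_cov + cond_cov.T)         # remove round-off asymmetry
    # Small diagonal jitter so Cholesky succeeds where the variance is ~0 (data nodes)
    chol = np.linalg.cholesky(cond_cov + 1e-8 * np.eye(grid.size))
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((grid.size, n_real))
    realizations = estimate[:, None] + chol @ noise  # simulation goal: one realization per column
    return estimate, realizations

grid = np.arange(0.0, 60.0, 1.0)                     # N = 60 grid nodes
estimate, reals = estimate_and_simulate(grid, [0.0, 20.0, 50.0], [1.2, 0.8, -0.5], n_real=200)
print(reals.shape)                                   # (60, 200)
# Ensemble spread: near zero at the datum x = 20, much larger between data at x = 35
print(reals[20].std(), reals[35].std())
```

At a data location the realizations nearly coincide with the measured value, while between data they spread out; that spread is the uncertainty an ensemble carries into probabilistic forecasts. A dense Cholesky factorization costs $O(N^3)$, so this direct approach only suits small grids; practical simulation algorithms such as sequential Gaussian simulation avoid it.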