In
statistics, prediction is a part of
statistical inference. One particular approach to such inference is known as
predictive inference, but the prediction can be undertaken within any of the several approaches to statistical inference. Indeed, one possible description of statistics is that it provides a means of transferring knowledge about a sample of a population to the whole population, and to other related populations, which is not necessarily the same as prediction over time. When information is transferred across time, often to specific points in time, the process is known as
forecasting. Forecasting usually requires
time series methods, while prediction is often performed on
cross-sectional data. Statistical techniques used for prediction include
regression and its various sub-categories such as
linear regression,
generalized linear models (
logistic regression,
Poisson regression,
Probit regression), etc. In the case of forecasting,
autoregressive moving average models and
vector autoregression models can be utilized. When these or related generalized regression or
machine learning methods are deployed in commercial use, the field is known as
predictive analytics. In many applications, such as time series analysis, it is possible to estimate the models that generate the observations. If models can be expressed as
transfer functions or in terms of state-space parameters then smoothed, filtered and predicted data estimates can be calculated. If the underlying generating models are linear then a minimum-variance
Kalman filter and a minimum-variance smoother may be used to recover data of interest from noisy measurements. These techniques rely on one-step-ahead predictors (which minimize the variance of the
prediction error). When the generating models are nonlinear then stepwise linearizations may be applied within
extended Kalman filter and smoother recursions. However, in nonlinear cases, optimum minimum-variance performance guarantees no longer apply. To use regression analysis for prediction, data are collected on the variable that is to be predicted, called the
dependent variable or response variable, and on one or more variables whose values are
hypothesized to influence it, called
independent variables or explanatory variables. A
functional form, often linear, is hypothesized for the postulated causal relationship, and the
parameters of the function are
estimated from the data—that is, are chosen so as to optimize in some way the
fit of the function, thus parameterized, to the data. That is the estimation step. For the prediction step, explanatory variable values that are deemed relevant to future (or current but not yet observed) values of the dependent variable are input to the parameterized function to generate predictions for the dependent variable.
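The two steps above can be sketched with an ordinary least-squares fit in NumPy; the data here are invented purely for illustration:

```python
import numpy as np

# Invented illustrative data: one explanatory variable x, one response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# Estimation step: choose the intercept and slope that minimize the sum of
# squared residuals, i.e. fit the hypothesized linear functional form.
X = np.column_stack([np.ones_like(x), x])  # design matrix [1, x]
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Prediction step: input new explanatory-variable values, deemed relevant to
# not-yet-observed responses, into the parameterized function.
x_new = np.array([6.0, 7.0])
y_pred = intercept + slope * x_new
```

Any other estimation criterion (e.g. maximum likelihood for a generalized linear model) slots into the same two-step pattern: fit parameters on observed data, then evaluate the fitted function at new explanatory-variable values.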
==Machine learning and artificial intelligence==
In recent decades, prediction has become a central task in
machine learning and
artificial intelligence research. Supervised learning algorithms, such as
support vector machines,
decision trees, and
neural networks, are trained on historical datasets to predict outcomes on new, unseen data. These models are widely applied in domains such as
natural language processing,
computer vision,
health informatics, and
financial technology. Recent studies have emphasized the importance of model interpretability and fairness, since predictions can influence critical decisions in healthcare, criminal justice, and public policy.{{cite journal |last=Rudin |first=Cynthia |year=2019}} An unbiased performance estimate of a model can be obtained on
hold-out test sets. The predictions can visually be compared to the ground truth in a
parity plot.

==Science==