Random variable

A random variable X is a measurable function X \colon \Omega \to E from a sample space \Omega, the set of possible outcomes, to a measurable space E. For the measurability of X to be meaningful, the sample space \Omega needs to belong to a probability triple (\Omega, \mathcal{F}, \operatorname{P}) (see the measure-theoretic definition). A random variable is often denoted by capital Roman letters such as X, Y, Z, T.

The probability that X takes on a value in a measurable set S \subseteq E is written as

\operatorname{P}(X \in S) = \operatorname{P}(\{ \omega \in \Omega \mid X(\omega) \in S \}).

=== Standard case ===

In many cases, X is real-valued, i.e. E = \mathbb{R}. In some contexts, the term random element (see Extensions) is used to denote a random variable not of this form.

When the image (or range) of X is finite or countably infinite, the random variable is called a discrete random variable and its distribution is a discrete probability distribution, i.e. it can be described by a probability mass function that assigns a probability to each value in the image of X. If the image is uncountably infinite (usually an interval), then X is called a continuous random variable. In the special case that it is absolutely continuous, its distribution can be described by a probability density function, which assigns probabilities to intervals; in particular, each individual point must necessarily have probability zero for an absolutely continuous random variable. Not all continuous random variables are absolutely continuous.

Any random variable can be described by its cumulative distribution function, which describes the probability that the random variable will be less than or equal to a certain value.

=== Extensions ===

The term "random variable" in statistics is traditionally limited to the real-valued case (E = \mathbb{R}).
In this case, the structure of the real numbers makes it possible to define quantities such as the expected value and variance of a random variable, its cumulative distribution function, and the moments of its distribution.

However, the definition above is valid for any measurable space E of values. Thus one can consider random elements of other sets E, such as random Boolean values, categorical values, complex numbers, vectors, matrices, sequences, trees, sets, shapes, manifolds, and functions. One may then specifically refer to a random variable of type E, or an E-valued random variable.

This more general concept of a random element is particularly useful in disciplines such as graph theory, machine learning, natural language processing, and other fields in discrete mathematics and computer science, where one is often interested in modeling the random variation of non-numerical data structures. In some cases, it is nonetheless convenient to represent each element of E using one or more real numbers. In this case, a random element may optionally be represented as a vector of real-valued random variables (all defined on the same underlying probability space \Omega, which allows the different random variables to covary). For example:

• A random word may be represented as a random integer that serves as an index into the vocabulary of possible words. Alternatively, it can be represented as a random indicator vector, whose length equals the size of the vocabulary, where the only values of positive probability are (1 \ 0 \ 0 \ 0 \ \cdots), (0 \ 1 \ 0 \ 0 \ \cdots), (0 \ 0 \ 1 \ 0 \ \cdots) and the position of the 1 indicates the word.
• A random sentence of given length N may be represented as a vector of N random words.
• A random graph on N given vertices may be represented as an N \times N matrix of random variables, whose values specify the adjacency matrix of the random graph.
• A random function F may be represented as a collection of random variables F(x), giving the function's values at the various points x in the function's domain. The F(x) are ordinary real-valued random variables provided that the function is real-valued. For example, a stochastic process is a random function of time, a random vector is a random function of some index set such as 1, 2, \ldots, n, and a random field is a random function on any set (typically time, space, or a discrete set).

== Distribution functions ==
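
As a small illustration of how a distribution arises from the definition of a random variable, the following sketch builds a finite probability space and computes probabilities of the form \operatorname{P}(X \in S) through the preimage \{ \omega \in \Omega \mid X(\omega) \in S \}. The example itself (two fair dice, with X their sum) and the helper names `omega`, `prob`, `prob_X_in`, `pmf` and `cdf` are assumptions chosen for illustration; they are not taken from the text.

```python
from fractions import Fraction
from itertools import product

# Assumed example: the sample space is the set of ordered outcomes
# of two fair six-sided dice (36 equally likely outcomes).
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) for a collection of outcomes, under the uniform measure on omega."""
    return Fraction(len(event), len(omega))


def X(w):
    """The random variable: maps an outcome (a, b) to the sum a + b."""
    return w[0] + w[1]


def prob_X_in(S):
    """P(X in S), computed as the probability of the preimage of S under X."""
    return prob([w for w in omega if X(w) in S])


def pmf(x):
    """Probability mass function of X: P(X = x)."""
    return prob_X_in({x})


def cdf(x):
    """Cumulative distribution function of X: P(X <= x)."""
    return prob([w for w in omega if X(w) <= x])


print(pmf(7))  # 1/6
print(cdf(4))  # 1/6
```

Exact fractions are used here instead of floating-point numbers so that the probabilities can be compared without rounding error. Since the image of X is finite, X is a discrete random variable, and the values of `pmf` over 2, ..., 12 sum to 1.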