
Wallenius' noncentral hypergeometric distribution

In probability theory and statistics, Wallenius' noncentral hypergeometric distribution is a generalization of the hypergeometric distribution where items are sampled with bias.

Univariate distribution
Univariate Wallenius' noncentral hypergeometric distribution (probability mass function)
Parameters: m_1, m_2 \in \mathbb{N}, N = m_1 + m_2, n \in [0,N), \omega \in \mathbb{R}_+
Support: x \in [x_{min},x_{max}], where x_{min}=\max(0,n-m_2) and x_{max}=\min(n,m_1)
PMF: \binom{m_1}{x} \binom{m_2}{n-x} \int_0^1 (1-t^{\omega/D})^{x} (1-t^{1/D})^{n-x} \operatorname{d}t, where D=\omega(m_1-x)+(m_2-(n-x))
Mean: approximated by the solution \mu to \frac{\mu}{m_1} + \left(1-\frac{n-\mu}{m_2}\right)^{\omega} = 1
Variance: \approx \frac{Nab}{(N-1)(m_1 b + m_2 a)}, where a=\mu(m_1-\mu) and b=(n-\mu)(\mu+m_2-n)

Wallenius' distribution is particularly complicated because each ball has a probability of being taken that depends not only on its weight, but also on the total weight of its competitors, and the weight of the competing balls depends on the outcomes of all preceding draws. This recursive dependency gives rise to a difference equation whose solution is given in open form by the integral in the expression for the probability mass function in the table above.

Closed-form expressions for the probability mass function exist (Lyons, 1980), but they are not very useful for practical calculations because of extreme numerical instability, except in degenerate cases. Several other calculation methods are used, including recursion, Taylor expansion and numerical integration (Fog, 2007, 2008).

The most reliable calculation method is recursive calculation of f(x,n) from f(x,n-1) and f(x-1,n-1) using the recursion formula given below under properties. The probabilities of all (x,n) combinations on all possible trajectories leading to the desired point are calculated, starting with f(0,0) = 1, as shown in the figure. The total number of probabilities to calculate is n(x+1)-x^2.
Other calculation methods must be used when n and x are so large that this method is too inefficient.

The probability that all balls have the same color is easier to calculate. See the formula below under multivariate distribution.

No exact formula for the mean is known (short of complete enumeration of all probabilities). The equation given above is reasonably accurate. This equation can be solved for μ by Newton-Raphson iteration. The same equation can be used for estimating the odds from an experimentally obtained value of the mean.

Properties of the univariate distribution
Wallenius' distribution has fewer symmetry relations than Fisher's noncentral hypergeometric distribution. The only symmetry relates to the swapping of colors:
:\operatorname{wnchypg}(x;n,m_1,m_2,\omega) = \operatorname{wnchypg}(n-x;n,m_2,m_1,1/\omega)\,.
Unlike Fisher's distribution, Wallenius' distribution has no symmetry relating to the number of balls not taken.

The following recursion formula is useful for calculating probabilities:
:\operatorname{wnchypg}(x;n,m_1,m_2,\omega) =
::\operatorname{wnchypg}(x-1;n-1,m_1,m_2,\omega) \frac{(m_1-x+1)\omega}{(m_1-x+1)\omega+m_2+x-n} +
::\operatorname{wnchypg}(x;n-1,m_1,m_2,\omega) \frac{m_2+x-n+1}{(m_1-x)\omega+m_2+x-n+1}
Another recursion formula is also known:
:\operatorname{wnchypg}(x;n,m_1,m_2,\omega) =
::\operatorname{wnchypg}(x-1;n-1,m_1-1,m_2,\omega) \frac{m_1\omega}{m_1\omega+m_2} +
::\operatorname{wnchypg}(x;n-1,m_1,m_2-1,\omega) \frac{m_2}{m_1\omega+m_2}\,.
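The first recursion formula can be turned into a short program. The following Python sketch is illustrative only: the function name `wallenius_pmf` is hypothetical, it computes the whole distribution for all x rather than only the trajectories leading to one point, and it assumes n ≤ m_1 + m_2 and ω > 0.

```python
from math import comb


def wallenius_pmf(n, m1, m2, omega):
    """Return [P(X = 0), ..., P(X = n)] for n draws (hypothetical helper).

    Builds f(x, k) for k = 1..n from f(x-1, k-1) and f(x, k-1),
    starting from f(0, 0) = 1, using the first recursion formula.
    """
    if not 0 <= n <= m1 + m2:
        raise ValueError("n must satisfy 0 <= n <= m1 + m2")
    if omega <= 0:
        raise ValueError("omega must be positive")
    prev = [1.0]  # f(0, 0) = 1
    for k in range(1, n + 1):  # k = number of balls drawn so far
        cur = [0.0] * (k + 1)
        # Only x values inside the support after k draws can be non-zero.
        for x in range(max(0, k - m2), min(k, m1) + 1):
            p = 0.0
            if x >= 1:
                # Arrive from (x-1, k-1) by drawing a ball of color 1.
                red = (m1 - x + 1) * omega
                p += prev[x - 1] * red / (red + m2 + x - k)
            if x <= k - 1:
                # Arrive from (x, k-1) by drawing a ball of color 2.
                white = m2 + x - k + 1
                p += prev[x] * white / ((m1 - x) * omega + white)
            cur[x] = p
        prev = cur
    return prev


if __name__ == "__main__":
    probs = wallenius_pmf(n=5, m1=6, m2=4, omega=2.5)
    for x, p in enumerate(probs):
        print(x, p)
    # With omega = 1 the result should agree with the central
    # hypergeometric probabilities comb(6, x) * comb(4, 5 - x) / comb(10, 5).
```

Storing only the previous row keeps memory use proportional to n. Two simple sanity checks are that the probabilities sum to 1 and that swapping the colors while inverting ω reproduces the symmetry relation stated above.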
The probability is bounded by
:\operatorname{f}_1(x) \le \operatorname{wnchypg}(x;n,m_1,m_2,\omega) \le \operatorname{f}_2(x)\,,\,\,\text{for}\,\, \omega < 1\,,
:\operatorname{f}_1(x) \ge \operatorname{wnchypg}(x;n,m_1,m_2,\omega) \ge \operatorname{f}_2(x)\,,\,\,\text{for}\,\, \omega > 1\,,
where
:\operatorname{f}_1(x)=\binom{m_1}{x}\binom{m_2}{n-x} \frac{n!}{(m_1+m_2/\omega)^{\underline{x}}\, (m_2+\omega(m_1-x))^{\underline{n-x}}}
:\operatorname{f}_2(x)=\binom{m_1}{x}\binom{m_2}{n-x} \frac{n!}{(m_1+(m_2-x_2)/\omega)^{\underline{x}}\, (m_2+\omega m_1)^{\underline{n-x}}}\,,
with x_2 = n - x the number of balls of the second color taken, and where the underlined superscript indicates the falling factorial a^{\underline{b}} = a(a-1)\ldots(a-b+1).
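The approximate equation for the mean in the univariate table, \frac{\mu}{m_1} + \left(1-\frac{n-\mu}{m_2}\right)^{\omega} = 1, can be solved numerically as described earlier in this section. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the Newton-Raphson step is safeguarded with bisection (an added precaution, not something the source prescribes) so that the iterate stays between x_min and x_max. Rearranging the same equation gives ω directly from an observed mean, which is what `estimate_odds` does.

```python
import math


def wallenius_mean_approx(n, m1, m2, omega, tol=1e-12, max_iter=100):
    """Approximate mean: root of mu/m1 + (1 - (n - mu)/m2)**omega - 1 = 0."""
    if not 0 <= n <= m1 + m2 or omega <= 0:
        raise ValueError("need 0 <= n <= m1 + m2 and omega > 0")
    lo = float(max(0, n - m2))  # x_min
    hi = float(min(n, m1))      # x_max
    if lo == hi:
        return lo  # degenerate case: only one possible outcome

    def g(mu):
        base = max(1.0 - (n - mu) / m2, 0.0)
        return mu / m1 + base ** omega - 1.0

    def dg(mu):
        # Clamp the base away from zero so a negative exponent is safe.
        base = max(1.0 - (n - mu) / m2, 1e-300)
        return 1.0 / m1 + (omega / m2) * base ** (omega - 1.0)

    mu = 0.5 * (lo + hi)
    for _ in range(max_iter):
        val = g(mu)
        if abs(val) < tol:
            break
        # g is increasing, so the sign of g(mu) tells which side the root is on.
        if val > 0:
            hi = mu
        else:
            lo = mu
        new = mu - val / dg(mu)        # Newton-Raphson step
        if not lo <= new <= hi:
            new = 0.5 * (lo + hi)      # fall back to bisection
        if abs(new - mu) < tol:
            mu = new
            break
        mu = new
    return mu


def estimate_odds(mu, n, m1, m2):
    """Solve the same equation for omega, given an observed mean mu.

    Assumes 0 < mu < m1 and 0 < n - mu < m2, so both logarithms
    are finite and non-zero.
    """
    return math.log(1.0 - mu / m1) / math.log(1.0 - (n - mu) / m2)


if __name__ == "__main__":
    mu = wallenius_mean_approx(n=10, m1=20, m2=30, omega=2.5)
    print(mu, estimate_odds(mu, n=10, m1=20, m2=30))
```

For ω = 1 the equation is linear and the solution reduces to the central hypergeometric mean n m_1 / N, which is a convenient check.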
Multivariate distribution
The distribution can be expanded to any number of colors c of balls in the urn. The multivariate distribution is used when there are more than two colors.

Multivariate Wallenius' noncentral hypergeometric distribution (probability mass function)
Parameters: c \in \mathbb{N}, \mathbf{m}=(m_1,\ldots,m_c) \in \mathbb{N}^c, N = \sum_{i=1}^c m_i, n \in [0,N), \boldsymbol{\omega} = (\omega_1,\ldots,\omega_c) \in \mathbb{R}_+^c
Support: \mathrm{S} = \left\{ \mathbf{x} \in \mathbb{Z}_{0+}^c \, : \, \sum_{i=1}^{c} x_i = n \right\}
PMF: \left(\prod_{i=1}^c \binom{m_i}{x_i} \right) \int_0^1 \prod_{i=1}^c (1-t^{\omega_i/D})^{x_i} \operatorname{d}t, where D=\boldsymbol{\omega}\cdot (\mathbf{m}-\mathbf{x}) = \sum_{i=1}^c \omega_i(m_i-x_i)
Mean: approximated by the solution \mu_1,\ldots,\mu_c to \left(1-\frac{\mu_1}{m_1}\right)^{1/\omega_1} = \left(1-\frac{\mu_2}{m_2}\right)^{1/\omega_2} = \ldots = \left(1-\frac{\mu_c}{m_c}\right)^{1/\omega_c} \, \wedge \, \sum_{i=1}^c \mu_i = n \, \wedge \, \forall\, i \in [1,c]\, :\, 0 \le \mu_i \le m_i
Variance: approximated by the variance of Fisher's noncentral hypergeometric distribution with the same mean

The probability mass function can be calculated by various Taylor expansion methods or by numerical integration (Fog, 2008).

The probability that all balls have the same color, j, can be calculated as:
:\operatorname{mwnchypg}((0,\ldots,0,x_j,0,\ldots);n,\mathbf{m}, \boldsymbol{\omega}) = \frac{m_j^{\,\,\underline{n}}} {\left( \frac{1}{\omega_j}\sum_{i=1}^{c}m_i\omega_i \right) ^{\underline{n}}}
for x_j = n \le m_j, where the underlined superscript denotes the falling factorial.

A reasonably good approximation to the mean can be calculated using the equation given above. The equation can be solved by defining θ so that
:\mu_i = m_i(1-e^{\omega_i\theta})
and solving
:\sum_{i=1}^c \mu_i = n
for θ by Newton-Raphson iteration.
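The θ substitution described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name is hypothetical, all weights are assumed to be strictly positive, n < N is assumed, and the starting point θ = 0 is a choice made here (from that point the Newton-Raphson iterates decrease towards the root, since the function being solved is concave and decreasing in θ).

```python
import math


def mwallenius_mean_approx(n, m, omega, tol=1e-12, max_iter=200):
    """Approximate means mu_i = m_i * (1 - exp(omega_i * theta)).

    theta is found by Newton-Raphson iteration on
        h(theta) = sum_i m_i * (1 - exp(omega_i * theta)) - n = 0.
    """
    if len(m) != len(omega):
        raise ValueError("m and omega must have the same length")
    if not 0 <= n < sum(m):
        raise ValueError("n must satisfy 0 <= n < N")
    if any(w <= 0 for w in omega):
        raise ValueError("this sketch assumes strictly positive weights")

    theta = 0.0  # h(0) = -n <= 0, so the root lies at theta <= 0
    for _ in range(max_iter):
        e = [math.exp(w * theta) for w in omega]
        h = sum(mi * (1.0 - ei) for mi, ei in zip(m, e)) - n
        dh = -sum(mi * wi * ei for mi, wi, ei in zip(m, omega, e))
        step = h / dh
        theta -= step
        if abs(step) < tol:
            break
    return [mi * (1.0 - math.exp(wi * theta)) for mi, wi in zip(m, omega)]


if __name__ == "__main__":
    mu = mwallenius_mean_approx(n=10, m=[20, 30, 50], omega=[2.5, 1.0, 0.5])
    print(mu, sum(mu))
```

With equal weights the result reduces to μ_i = n m_i / N. With two colors and ω_2 = 1 it satisfies the univariate equation for the mean.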
The equation for the mean is also useful for estimating the odds from experimentally obtained values for the mean.

No good way of calculating the variance is known. The best known method is to approximate the multivariate Wallenius distribution by a multivariate Fisher's noncentral hypergeometric distribution with the same mean, and insert the mean as calculated above in the approximate formula for the variance of the latter distribution.

Properties of the multivariate distribution
The order of the colors is arbitrary so that any colors can be swapped.

The weights can be arbitrarily scaled:
:\operatorname{mwnchypg}(\mathbf{x};n,\mathbf{m}, \boldsymbol{\omega}) = \operatorname{mwnchypg}(\mathbf{x};n,\mathbf{m}, r\boldsymbol{\omega})\,\,
for all r \in \mathbb{R}_+.

Colors with zero number (m_i = 0) or zero weight (ω_i = 0) can be omitted from the equations.

Colors with the same weight can be joined:
:\operatorname{mwnchypg}\left(\mathbf{x};n,\mathbf{m}, (\omega_1,\ldots,\omega_{c-1},\omega_{c-1})\right)\, =
::\operatorname{mwnchypg}\left((x_1,\ldots,x_{c-1}+x_c); n,(m_1,\ldots,m_{c-1}+m_c), (\omega_1,\ldots,\omega_{c-1})\right)\, \cdot
::\operatorname{hypg}(x_c; x_{c-1}+x_c, m_c, m_{c-1}+m_c)\,,
where \operatorname{hypg}(x;n,m,N) is the (univariate, central) hypergeometric distribution probability.
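The closed-form probability that all n balls have the same color j, given earlier in this section, offers a simple way to check the scaling property numerically. The Python sketch below uses hypothetical function names and a zero-based color index j. It compares the falling-factorial formula with a direct draw-by-draw product and assumes that at least one other color has positive total weight.

```python
def falling_factorial(a, n):
    """a * (a - 1) * ... * (a - n + 1); a may be non-integer."""
    result = 1.0
    for k in range(n):
        result *= a - k
    return result


def prob_all_same_color(j, n, m, omega):
    """Falling-factorial formula for drawing n balls, all of color j."""
    if n > m[j]:
        return 0.0  # not enough balls of color j
    base = sum(mi * wi for mi, wi in zip(m, omega)) / omega[j]
    return falling_factorial(m[j], n) / falling_factorial(base, n)


def prob_all_same_color_direct(j, n, m, omega):
    """Same probability as a product over successive draws (for checking)."""
    others = sum(mi * wi for i, (mi, wi) in enumerate(zip(m, omega)) if i != j)
    p = 1.0
    for k in range(n):
        w = (m[j] - k) * omega[j]  # remaining weight of color j
        p *= w / (w + others)
    return p


if __name__ == "__main__":
    m = [5, 3, 4]
    omega = [2.0, 1.0, 0.5]
    scaled = [3.7 * w for w in omega]  # arbitrary positive scale factor
    print(prob_all_same_color(0, 3, m, omega))
    print(prob_all_same_color_direct(0, 3, m, omega))
    print(prob_all_same_color(0, 3, m, scaled))
```

All three printed values should agree up to rounding, because only the ratios ω_i/ω_j enter the formula.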
Complementary Wallenius' noncentral hypergeometric distribution
The balls that are not taken in the urn experiment have a distribution that is different from Wallenius' noncentral hypergeometric distribution, due to a lack of symmetry. The distribution of the balls not taken can be called the complementary Wallenius' noncentral hypergeometric distribution.

Probabilities in the complementary distribution are calculated from Wallenius' distribution by replacing n with N-n, x_i with m_i - x_i, and ω_i with 1/ω_i.
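The substitution rule can be illustrated for two colors with a short Python sketch. It is a literal application of the replacement stated above, with hypothetical function names, and it carries its own small draw-by-draw routine for the ordinary Wallenius probabilities so that it is self-contained. It assumes ω > 0 and 0 ≤ n ≤ N.

```python
from math import comb


def wallenius_pmf(n, m1, m2, omega):
    """[P(X = 0), ..., P(X = n)] for the univariate Wallenius distribution."""
    probs = [1.0]  # before any draw, x = 0 with certainty
    for k in range(n):  # k balls have been drawn so far
        nxt = [0.0] * (k + 2)
        for x, p in enumerate(probs):
            if p == 0.0:
                continue  # unreachable state
            red = (m1 - x) * omega   # remaining weight of color 1
            white = m2 - (k - x)     # remaining weight of color 2
            nxt[x + 1] += p * red / (red + white)
            nxt[x] += p * white / (red + white)
        probs = nxt
    return probs


def complementary_wallenius_pmf(x, n, m1, m2, omega):
    """Apply the rule: n -> N - n, x -> m1 - x, omega -> 1/omega."""
    total = m1 + m2
    probs = wallenius_pmf(total - n, m1, m2, 1.0 / omega)
    idx = m1 - x
    return probs[idx] if 0 <= idx <= total - n else 0.0


if __name__ == "__main__":
    for x in range(4):
        print(x, complementary_wallenius_pmf(x, n=3, m1=6, m2=4, omega=2.5))
    # For omega = 1 both distributions coincide with the central
    # hypergeometric probabilities comb(6, x) * comb(4, 3 - x) / comb(10, 3).
```

For ω = 1 the complementary and the ordinary distribution are the same, which reflects the symmetry that the central hypergeometric distribution has and Wallenius' distribution lacks.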
Software available
• WalleniusHypergeometricDistribution in Mathematica.
• An implementation for the R programming language is available as the package named BiasedUrn. It includes univariate and multivariate probability mass functions, distribution functions, quantiles, random variable generating functions, mean and variance.
• An implementation in C++ is available from www.agner.org.