Terriberry extends Chan's formulae to the calculation of the third and fourth central moments, needed for example when estimating skewness and kurtosis:

: \begin{align}
M_{3,X} = M_{3,A} + M_{3,B} & {} + \delta^3\frac{n_A n_B (n_A - n_B)}{n_X^2} + 3\delta\frac{n_A M_{2,B} - n_B M_{2,A}}{n_X} \\[6pt]
M_{4,X} = M_{4,A} + M_{4,B} & {} + \delta^4\frac{n_A n_B \left(n_A^2 - n_A n_B + n_B^2\right)}{n_X^3} \\[6pt]
& {} + 6\delta^2\frac{n_A^2 M_{2,B} + n_B^2 M_{2,A}}{n_X^2} + 4\delta\frac{n_A M_{3,B} - n_B M_{3,A}}{n_X}
\end{align}

Here the M_k are again the sums of powers of differences from the mean \sum(x - \overline{x})^k, giving

: \begin{align}
& \text{skewness} = g_1 = \frac{\sqrt{n} M_3}{M_2^{3/2}}, \\[4pt]
& \text{kurtosis} = g_2 = \frac{n M_4}{M_2^2} - 3.
\end{align}

For the incremental case (i.e., B = \{x\}), this simplifies to:

: \begin{align}
\delta & = x - m \\[5pt]
m' & = m + \frac{\delta}{n} \\[5pt]
M_2' & = M_2 + \delta^2 \frac{n - 1}{n} \\[5pt]
M_3' & = M_3 + \delta^3 \frac{(n - 1)(n - 2)}{n^2} - \frac{3\delta M_2}{n} \\[5pt]
M_4' & = M_4 + \frac{\delta^4 (n - 1)(n^2 - 3n + 3)}{n^3} + \frac{6\delta^2 M_2}{n^2} - \frac{4\delta M_3}{n}
\end{align}

By preserving the value \delta / n, only one division operation is needed and the
higher-order statistics can thus be calculated for little incremental cost. An example of the online algorithm for kurtosis implemented as described is:

def online_kurtosis(data):
    n = mean = M2 = M3 = M4 = 0

    for x in data:
        n1 = n
        n = n + 1
        delta = x - mean
        delta_n = delta / n
        delta_n2 = delta_n ** 2
        term1 = delta * delta_n * n1
        mean = mean + delta_n
        M4 = M4 + term1 * delta_n2 * (n**2 - 3*n + 3) + 6 * delta_n2 * M2 - 4 * delta_n * M3
        M3 = M3 + term1 * delta_n * (n - 2) - 3 * delta_n * M2
        M2 = M2 + term1

    # Note: variance can also be calculated from M2, and skewness from M3.
    # Caution: if all the inputs are equal, M2 will be 0, resulting in a division by zero below.
    kurtosis = (n * M4) / (M2 ** 2) - 3
    return kurtosis

Pébaÿ further extends these results to arbitrary-order
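The pairwise combination formulae above can be sketched in Python as a merge of per-partition summaries. This is only a sketch: the function names, the (n, mean, M2, M3, M4) tuple layout, and the sample data are illustrative, not from any reference implementation.

```python
def moments(data):
    """Two-pass summary (n, mean, M2, M3, M4) of one partition,
    where M_k is the sum of k-th powers of deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    M2 = sum((x - mean) ** 2 for x in data)
    M3 = sum((x - mean) ** 3 for x in data)
    M4 = sum((x - mean) ** 4 for x in data)
    return (n, mean, M2, M3, M4)

def combine_moments(a, b):
    """Merge the summaries of two partitions A and B using the
    pairwise update formulae for M2, M3 and M4."""
    n_a, mean_a, M2_a, M3_a, M4_a = a
    n_b, mean_b, M2_b, M3_b, M4_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    M2 = M2_a + M2_b + delta**2 * n_a * n_b / n
    M3 = (M3_a + M3_b
          + delta**3 * n_a * n_b * (n_a - n_b) / n**2
          + 3 * delta * (n_a * M2_b - n_b * M2_a) / n)
    M4 = (M4_a + M4_b
          + delta**4 * n_a * n_b * (n_a**2 - n_a * n_b + n_b**2) / n**3
          + 6 * delta**2 * (n_a**2 * M2_b + n_b**2 * M2_a) / n**2
          + 4 * delta * (n_a * M3_b - n_b * M3_a) / n)
    return (n, mean, M2, M3, M4)

# Merging the summaries of two halves should reproduce the
# summary of the whole data set (up to floating-point error).
data = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
merged = combine_moments(moments(data[:3]), moments(data[3:]))
whole = moments(data)
```

Because the merge needs only the five summary values per partition, it suits parallel or distributed computation: each worker summarizes its shard, and the summaries are folded together at the end.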
central moments, for the incremental and the pairwise cases, and Pébaÿ et al. subsequently extend them to weighted and compound moments. Similar formulas for covariance can be found there as well.

Choi and Sweetman offer two alternative methods to compute the skewness and kurtosis, each of which can substantially reduce computer memory requirements and CPU time in certain applications. The first approach is to compute the statistical moments by separating the data into bins and then computing the moments from the geometry of the resulting histogram, which effectively becomes a one-pass algorithm for higher moments. One benefit is that the statistical moment calculations can be carried out to arbitrary accuracy, so that the computations can be tuned to the precision of, e.g., the data storage format or the original measurement hardware. A relative histogram of a
random variable can be constructed in the conventional way: the range of potential values is divided into bins, and the number of occurrences within each bin is counted and plotted such that the area of each rectangle equals the portion of the sample values within that bin:

: H(x_k) = \frac{h(x_k)}{A}

where h(x_k) and H(x_k) represent the frequency and the relative frequency at bin x_k, and

: A = \sum_{k=1}^K h(x_k) \, \Delta x_k

is the total area of the histogram. After this normalization, the n raw moments and central moments of x(t) can be calculated from the relative histogram:

: m_n^{(h)} = \sum_{k=1}^{K} x_k^n H(x_k) \, \Delta x_k = \frac{1}{A} \sum_{k=1}^K x_k^n h(x_k) \, \Delta x_k

: \theta_n^{(h)} = \sum_{k=1}^{K} \Big(x_k - m_1^{(h)}\Big)^n H(x_k) \, \Delta x_k = \frac{1}{A} \sum_{k=1}^{K} \Big(x_k - m_1^{(h)}\Big)^n h(x_k) \, \Delta x_k

where the superscript ^{(h)} indicates that the moments are calculated from the histogram. For constant bin width \Delta x_k = \Delta x these two expressions can be simplified using I = A/\Delta x:

: m_n^{(h)} = \frac{1}{I} \sum_{k=1}^K x_k^n \, h(x_k)

: \theta_n^{(h)} = \frac{1}{I} \sum_{k=1}^K \Big(x_k - m_1^{(h)}\Big)^n h(x_k)

The second approach from Choi and Sweetman is an analytical methodology to combine statistical moments from individual segments of a time-history such that the resulting overall moments are those of the complete time-history. This methodology could be used for parallel computation of statistical moments with subsequent combination of those moments, or for combination of statistical moments computed at sequential times.
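This segment-combination approach can be sketched in Python: each segment is summarized by the quantities \gamma_n = m_n \gamma_0 (taking \gamma_0 as the number of points), segments are combined by simple addition, and the combined values are converted back to mean, variance, skewness, and kurtosis via the standard raw-to-central moment identities. The function names here are illustrative only.

```python
def segment_gammas(data):
    """Summarize one segment as [gamma_0, ..., gamma_4], where
    gamma_0 is the point count and gamma_n = sum of x**n."""
    return [float(len(data))] + [sum(x**n for x in data) for n in range(1, 5)]

def combine_gammas(gamma_sets):
    """Combine any number of segment summaries by elementwise addition."""
    return [sum(g) for g in zip(*gamma_sets)]

def stats_from_gammas(g):
    """Convert combined gammas back to (mean, variance, skewness, kurtosis)."""
    m = [gn / g[0] for gn in g]  # raw moments; m[0] = 1
    mu = m[1]
    # central moments theta_n from raw moments
    theta2 = m[2] - mu**2
    theta3 = m[3] - 3 * mu * m[2] + 2 * mu**3
    theta4 = m[4] - 4 * mu * m[3] + 6 * mu**2 * m[2] - 3 * mu**4
    sigma = theta2 ** 0.5
    return (mu, theta2, theta3 / sigma**3, theta4 / sigma**4 - 3)

# Combining three segments should match processing the whole series at once.
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
segments = [data[:3], data[3:5], data[5:]]
combined = stats_from_gammas(combine_gammas([segment_gammas(s) for s in segments]))
direct = stats_from_gammas(segment_gammas(data))
```

Because the \gamma values combine by addition, there is no limit on the number of segments, and segments can be folded in as they arrive, e.g. at sequential times or from parallel workers.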
If Q sets of statistical moments are known: (\gamma_{0,q}, \mu_{q}, \sigma^2_{q}, \alpha_{3,q}, \alpha_{4,q}) for q = 1, 2, \ldots, Q, then each \gamma_n can be expressed in terms of the equivalent n raw moments:

: \gamma_{n,q} = m_{n,q} \gamma_{0,q} \qquad \text{for } n = 1, 2, 3, 4 \quad \text{and} \quad q = 1, 2, \dots, Q

where \gamma_{0,q} is generally taken to be the duration of the q^{th} time-history, or the number of points if \Delta t is constant. The benefit of expressing the statistical moments in terms of \gamma is that the Q sets can be combined by addition, and there is no upper limit on the value of Q.

: \gamma_{n,c} = \sum_{q=1}^Q \gamma_{n,q} \qquad \text{for } n = 0, 1, 2, 3, 4

where the subscript _c represents the concatenated time-history or combined \gamma. These combined values of \gamma can then be inversely transformed into raw moments representing the complete concatenated time-history:

: m_{n,c} = \frac{\gamma_{n,c}}{\gamma_{0,c}} \qquad \text{for } n = 1, 2, 3, 4

Known relationships between the raw moments (m_n) and the central moments (\theta_n = \operatorname E[(x - \mu)^n]) are then used to compute the central moments of the concatenated time-history. Finally, the statistical moments of the concatenated history are computed from the central moments:

: \mu_c = m_{1,c} \qquad \sigma^2_c = \theta_{2,c} \qquad \alpha_{3,c} = \frac{\theta_{3,c}}{\sigma_c^3} \qquad \alpha_{4,c} = \frac{\theta_{4,c}}{\sigma_c^4} - 3

==Covariance==