Quantile normalization

In statistics, quantile normalization is a technique for making two distributions identical in statistical properties. To quantile-normalize a test distribution to a reference distribution of the same length, sort the test distribution and sort the reference distribution. The highest entry in the test distribution then takes the value of the highest entry in the reference distribution, the next highest entry takes the value of the next highest entry in the reference distribution, and so on, until the test distribution is a permutation of the reference distribution.
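
The rank-matching procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `quantile_normalize_to_reference` and the sample data are hypothetical, and ties in the test distribution are simply broken by original order (an assumption; averaging over tied ranks is another common choice).

```python
def quantile_normalize_to_reference(test, reference):
    """Give each test entry the reference value of the same rank."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    # Indices of the test entries, ordered from smallest to largest value.
    # sorted() is stable, so tied entries keep their original order.
    order = sorted(range(len(test)), key=lambda i: test[i])
    sorted_reference = sorted(reference)
    normalized = [None] * len(test)
    # The k-th smallest test entry receives the k-th smallest reference value.
    for rank, index in enumerate(order):
        normalized[index] = sorted_reference[rank]
    return normalized


test = [5, 2, 3, 4]
reference = [10, 40, 20, 30]
result = quantile_normalize_to_reference(test, reference)
print(result)  # [40, 10, 20, 30]
```

The result contains exactly the reference values, rearranged so that their ordering matches the ordering of the original test entries.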

Example
A quick illustration of quantile normalization on a very small dataset, organized into columns (1-3) and rows (A-D):

\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix}

For each column, rank the entries from lowest to highest (i to iv):

\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix} \quad \longrightarrow \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] \rm iv & \rm iii & \rm i \\ \rm i & \rm i & \rm ii \\ \rm ii & \rm iii & \rm iii \\ \rm iii & \rm ii & \rm iv \end{matrix}

Set aside these rank values to use later. Go back to the first set of data. Rearrange each column's values so that each column is in order from lowest to highest. The result is:

\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix} \quad \longrightarrow \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 2 & 1 & 3 \\ 3 & 2 & 4 \\ 4 & 4 & 6 \\ 5 & 4 & 8 \end{matrix}

Now find the mean of each row of the sorted data. Because every column is in ascending order, the row means are too, so they can be ranked from lowest to highest (i to iv):

\begin{align} (2 + 1 + 3)/3 &= 2.00 \text{ (rank i)} \\ (3 + 2 + 4)/3 &= 3.00 \text{ (rank ii)} \\ (4 + 4 + 6)/3 &\approx 4.67 \text{ (rank iii)} \\ (5 + 4 + 8)/3 &\approx 5.67 \text{ (rank iv)} \end{align}

Now take the ranking order from earlier and substitute in the means according to their corresponding ranks:

\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] \rm iv & \rm iii & \rm i \\ \rm i & \rm i & \rm ii \\ \rm ii & \rm iii & \rm iii \\ \rm iii & \rm ii & \rm iv \end{matrix} \quad \longrightarrow \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & 4.67 & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & 4.67 & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix}

These are the new normalized values, except where values are tied. When values in a column are tied in rank, as in column 2, each tied entry should instead be assigned the mean of the values for the ranks the tied entries would occupy if they were distinct. In column 2, the two entries tied at rank iii occupy ranks iii and iv, so each is assigned the average of the rank iii and rank iv means, (4.67 + 5.67)/2 = 5.17. This gives the following set of normalized values:

\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & \mathbf{4.67} & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & \mathbf{4.67} & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix} \quad \longrightarrow \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & \mathbf{5.17} & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & \mathbf{5.17} & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix}

The new values now have nearly the same distribution and can be easily compared. Columns 1 and 3 are identical in distribution, while column 2 differs slightly in its median, third quartile, and maximum because of its tied values. Here are the summary statistics for each of the three columns:

\begin{array}{rr} & {} \\[6pt] & \text{Min}: \\ & \text{1st Qrt}: \\ & \text{Median}: \\ & \text{Mean}: \\ & \text{3rd Qrt}: \\ & \text{Max}: \end{array} \quad \begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 2.00 & 2.00 & 2.00 \\ 2.75 & 2.75 & 2.75 \\ 3.83 & 4.08 & 3.83 \\ 3.83 & 3.83 & 3.83 \\ 4.92 & 5.17 & 4.92 \\ 5.67 & 5.17 & 5.67 \end{matrix}
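
The worked example can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the function name `quantile_normalize` is hypothetical, the data are stored column by column, and ties are handled as in the example, by averaging the rank means over all the rank positions a tied value occupies.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns, averaging over tied ranks."""
    n_rows = len(columns[0])
    if any(len(col) != n_rows for col in columns):
        raise ValueError("all columns must have the same length")
    # Sort each column independently, lowest to highest.
    sorted_columns = [sorted(col) for col in columns]
    # Mean of the k-th smallest values across columns, for each rank k.
    rank_means = [
        sum(col[k] for col in sorted_columns) / len(columns)
        for k in range(n_rows)
    ]
    normalized = []
    for col, sorted_col in zip(columns, sorted_columns):
        new_col = []
        for value in col:
            # All rank positions this value occupies (more than one if tied).
            ranks = [k for k, v in enumerate(sorted_col) if v == value]
            new_col.append(sum(rank_means[k] for k in ranks) / len(ranks))
        normalized.append(new_col)
    return normalized


columns = [
    [5, 2, 3, 4],  # column 1, rows A-D
    [4, 1, 4, 2],  # column 2 (rows A and C are tied)
    [3, 4, 6, 8],  # column 3
]
normalized = quantile_normalize(columns)
for col in normalized:
    print([round(x, 2) for x in col])
# [5.67, 2.0, 3.0, 4.67]
# [5.17, 2.0, 5.17, 3.0]
# [2.0, 3.0, 4.67, 5.67]
```

Each printed list is one column of the final normalized table, read from row A to row D; the two tied entries in column 2 both receive 5.17.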