Quantile normalization

A quick illustration of such normalization on a very small dataset, organized into columns (1-3) and rows (A-D):

$$
\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix}
$$

For each column, rank the entries from lowest to highest (i to iv):

$$
\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix}
\quad \longrightarrow \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] \rm iv & \rm iii & \rm i \\ \rm i & \rm i & \rm ii \\ \rm ii & \rm iii & \rm iii \\ \rm iii & \rm ii & \rm iv \end{matrix}
$$

Set aside these rank values to use later. Go back to the first set of data and sort each column's values from lowest to highest. The result is:

$$
\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5 & 4 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 6 \\ 4 & 2 & 8 \end{matrix}
\quad \longrightarrow \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 2 & 1 & 3 \\ 3 & 2 & 4 \\ 4 & 4 & 6 \\ 5 & 4 & 8 \end{matrix}
$$

Now find the mean of each row of the sorted data, and rank the means from lowest to highest (i to iv):

$$
\begin{aligned}
(2 + 1 + 3)/3 &= 2.00 \text{ (rank i)} \\
(3 + 2 + 4)/3 &= 3.00 \text{ (rank ii)} \\
(4 + 4 + 6)/3 &= 4.67 \text{ (rank iii)} \\
(5 + 4 + 8)/3 &= 5.67 \text{ (rank iv)}
\end{aligned}
$$

Now take the ranks set aside earlier and substitute the mean that corresponds to each rank:

$$
\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] \rm iv & \rm iii & \rm i \\ \rm i & \rm i & \rm ii \\ \rm ii & \rm iii & \rm iii \\ \rm iii & \rm ii & \rm iv \end{matrix}
\quad \longrightarrow \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & 4.67 & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & 4.67 & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix}
$$

These are the new normalized values. However, when values are tied in rank, as in column 2, they should instead be assigned the mean of the values for the ranks they would occupy if they were distinct. In column 2, the two tied entries occupy ranks iii and iv, so each is assigned the average of the rank iii and rank iv means: (4.67 + 5.67)/2 = 5.17. This gives the following set of normalized values:

$$
\begin{matrix} & {} \\[6pt] & A: \\ & B: \\ & C: \\ & D: \end{matrix} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & \mathbf{4.67} & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & \mathbf{4.67} & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix}
\quad \longrightarrow \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 5.67 & \mathbf{5.17} & 2.00 \\ 2.00 & 2.00 & 3.00 \\ 3.00 & \mathbf{5.17} & 4.67 \\ 4.67 & 3.00 & 5.67 \end{matrix}
$$

The new values now follow essentially the same distribution and can be compared easily. Columns 1 and 3 match exactly; column 2 differs slightly in its median, third quartile, and maximum because of the tie. Here are the summary statistics for each of the three columns:

$$
\begin{array}{r} {} \\[6pt] \text{Min}: \\ \text{1st quartile}: \\ \text{Median}: \\ \text{Mean}: \\ \text{3rd quartile}: \\ \text{Max}: \end{array} \quad
\begin{matrix} \underline 1 & \underline 2 & \underline 3 \\[6pt] 2.00 & 2.00 & 2.00 \\ 2.75 & 2.75 & 2.75 \\ 3.83 & 4.08 & 3.83 \\ 3.83 & 3.83 & 3.83 \\ 4.92 & 5.17 & 4.92 \\ 5.67 & 5.17 & 5.67 \end{matrix}
$$
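
The worked example above can be reproduced with a short script. The following is a minimal sketch in plain Python, not a reference implementation: the function name `quantile_normalize` and the choice to pass the data as a list of columns are assumptions made for illustration. It sorts each column, averages across columns at each rank position, and assigns tied entries the mean of the rank means they jointly occupy, as described for column 2.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length columns (hypothetical helper).

    Tied values within a column receive the average of the rank means
    for the rank positions they jointly occupy.
    """
    n_rows = len(columns[0])
    n_cols = len(columns)

    # Sort every column, then average across columns at each rank position.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / n_cols for i in range(n_rows)
    ]

    normalized = []
    for col, srt in zip(columns, sorted_cols):
        out = []
        for value in col:
            first = srt.index(value)   # 0-based rank of the first tied entry
            count = srt.count(value)   # how many entries share this value
            tied_means = rank_means[first:first + count]
            out.append(sum(tied_means) / count)
        normalized.append(out)
    return normalized


# Columns 1-3 of the example, each listed in row order A-D.
data = [
    [5, 2, 3, 4],
    [4, 1, 4, 2],
    [3, 4, 6, 8],
]

result = [[round(v, 2) for v in col] for col in quantile_normalize(data)]
for col in result:
    print(col)
# [5.67, 2.0, 3.0, 4.67]
# [5.17, 2.0, 5.17, 3.0]
# [2.0, 3.0, 4.67, 5.67]
```

Each printed list is one column of the final normalized matrix, read from row A to row D. The repeated `index` and `count` lookups keep the sketch short but make it quadratic in the number of rows; for larger datasets a rank-based approach would be preferable.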