Interpretation of a Bland–Altman plot depends on how the plot was constructed and on the data at hand. Variations on the default plot have been introduced over the years, and each should be interpreted accordingly.
=== Original construction ===
The original plot displays a scatter plot of the differences between paired measurements. Each difference should be computed as the new system's measurement minus the gold standard's. The 95% limits of agreement can be unreliable estimates of the population parameters, especially for small sample sizes, so when comparing methods or assessing repeatability it is important to calculate confidence intervals for the 95% limits of agreement. This can be done by Bland and Altman's approximate method.
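The calculation described above can be sketched in Python. This is a minimal illustration rather than a reference implementation. It assumes the commonly cited form of the approximation, in which the standard error of each limit is taken as roughly sqrt(3s²/n), with s the standard deviation of the differences. The function name `limits_of_agreement`, its parameters, and the sample values are hypothetical. The critical value defaults to a normal quantile for simplicity; for small samples a t quantile with n − 1 degrees of freedom would normally be passed in through `crit`.

```python
import math
import statistics


def limits_of_agreement(new, reference, crit=None):
    """Bias, 95% limits of agreement, and approximate CIs for the limits.

    `new` and `reference` are paired measurements; differences are taken
    as new minus reference (gold standard).
    """
    if len(new) != len(reference):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(new, reference)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two pairs are required")

    bias = statistics.fmean(diffs)   # mean difference
    sd = statistics.stdev(diffs)     # sample standard deviation of differences
    z = 1.96                         # multiplier for 95% limits of agreement
    lower = bias - z * sd
    upper = bias + z * sd

    # Assumption: approximate standard error of each limit, sqrt(3 s^2 / n).
    se_limit = math.sqrt(3 * sd ** 2 / n)

    # Assumption: normal quantile by default; pass a t quantile
    # (n - 1 degrees of freedom) via `crit` for small samples.
    if crit is None:
        crit = statistics.NormalDist().inv_cdf(0.975)

    return {
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "lower_ci": (lower - crit * se_limit, lower + crit * se_limit),
        "upper_ci": (upper - crit * se_limit, upper + crit * se_limit),
    }


if __name__ == "__main__":
    # Hypothetical paired readings, for illustration only.
    new_system = [10.2, 11.5, 9.8, 12.1, 10.9, 11.3]
    gold_standard = [10.0, 11.0, 10.1, 11.8, 11.0, 11.0]
    for key, value in limits_of_agreement(new_system, gold_standard).items():
        print(key, value)
```

With few pairs the intervals around each limit are wide relative to the limits themselves, which is the unreliability the text warns about.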
=== Sample size and power estimation ===
Determining an adequate sample size is a key consideration in Bland–Altman analysis, as it influences the precision of the estimated limits of agreement and the statistical power to detect clinically meaningful differences between measurement methods. Historically, there has been limited formal guidance on how to perform power or sample size calculations for Bland–Altman studies. Early recommendations by Martin Bland suggested estimating sample size from the expected width of the confidence interval for the limits of agreement, an approach that does not explicitly account for Type II error and may yield insufficient sample sizes for typical study designs.

A more rigorous approach was later introduced by Lu et al. (2016), who proposed a statistical framework for assessing power and determining sample size based on the distribution of measurement differences and predefined limits of clinical agreement. Their method explicitly incorporates Type II error control and provides more accurate estimates of required sample sizes for studies targeting a given statistical power, typically 80%. Simulation studies in that work demonstrated good performance of the method under practical conditions; however, the authors did not provide publicly available software to implement the approach.

Several software packages now include implementations of the Lu et al. methodology. The commercial MedCalc statistical software provides sample size and power estimation tools for Bland–Altman analyses. In addition, an open-source implementation is available in the R package blandPower, which provides functions to estimate power curves, determine required sample sizes, and visualize confidence interval widths as a function of sample size. The blandPower package was developed to promote reproducibility and accessibility of power and sample size calculations for method comparison studies using the Bland–Altman framework.
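The idea of power in this setting can be illustrated with a small Monte Carlo sketch. It is not the analytical method of Lu et al., nor the interface of MedCalc or blandPower. It is a swapped-in simulation that assumes normally distributed differences and declares agreement when the outer confidence bounds of both limits of agreement fall inside a predefined clinical limit ±`delta`. The function name `simulate_power`, its parameters, and the example values are hypothetical. The standard error of each limit uses the approximation sqrt(3s²/n), and a normal quantile stands in for the t quantile.

```python
import math
import random
import statistics


def simulate_power(n, mu, sigma, delta, n_sim=2000, seed=0):
    """Estimate by simulation the probability that a study with n pairs
    concludes agreement within +/- delta.

    mu and sigma are the assumed true mean and standard deviation of
    the differences between the two methods.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    z = 1.96                   # multiplier for 95% limits of agreement
    # Assumption: normal quantile used in place of a t quantile.
    crit = statistics.NormalDist().inv_cdf(0.975)
    successes = 0
    for _ in range(n_sim):
        diffs = [rng.gauss(mu, sigma) for _ in range(n)]
        bias = statistics.fmean(diffs)
        sd = statistics.stdev(diffs)
        # Assumption: approximate standard error of each limit.
        se_limit = sd * math.sqrt(3 / n)
        outer_lower = bias - z * sd - crit * se_limit
        outer_upper = bias + z * sd + crit * se_limit
        if outer_lower > -delta and outer_upper < delta:
            successes += 1
    return successes / n_sim


if __name__ == "__main__":
    # Hypothetical scenario: no bias, unit SD, clinical limit of 2.5 units.
    for n in (20, 50, 100, 200):
        print(n, simulate_power(n, mu=0.0, sigma=1.0, delta=2.5))
```

Scanning over `n` in this way gives a rough power curve; the smallest `n` whose estimated power reaches the target, for example 80%, would be a candidate sample size under the stated assumptions.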
=== Visualization variations ===
If the differences grow proportionally to the magnitude of the data, the data is said to have a 'proportional bias'. There are several ways to visualize the plot, and to adapt the subsequent analysis, to accommodate it. Firstly, a linear regression can illustrate any relevant trend. If the spread of the differences is equal at all points along the regression line, the data is said to be homoscedastic and the trend is a simple proportional bias. Conversely, if the spread of the differences varies with the magnitude of the data, the differences are said to be heteroscedastic, which has further implications. Statistical tests such as the Breusch–Pagan test or the White test can provide statistical indicators of heteroscedasticity.

One typical example of a plot with heteroscedastic data is one in which the variation of the differences grows in proportion to the magnitude of the data, visualized as an expanding 'v' shape. Similarly, the plot of differences could be visualized logarithmically. In either case, the relationship between the two systems is multiplicative rather than linear. This also indicates that the accuracy of the systems varies with the magnitude of the data.

== Application ==