
One in ten rule

In statistics, the one in ten rule is a rule of thumb for how many predictor parameters can be estimated from data in a regression analysis while keeping the risk of overfitting and of finding spurious correlations low. The rule states that one predictive variable can be studied for every ten events. For logistic regression the number of events is given by the size of the smaller of the two outcome categories, and for survival analysis it is given by the number of uncensored events. In other words, roughly ten observations (labels) are needed for each feature.
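The rule above can be sketched as a simple calculation. This is only an illustration; the function name and the example counts are invented for this sketch, not taken from the literature:

```python
def max_predictors(n_events: int, n_nonevents: int, epv: int = 10) -> int:
    """Upper bound on the number of candidate predictors under an
    events-per-variable (EPV) rule of thumb.

    For logistic regression the limiting count is the size of the
    smaller outcome category; epv=10 gives the classic one in ten rule.
    """
    limiting_events = min(n_events, n_nonevents)
    return limiting_events // epv

# Example: 120 events vs. 480 non-events. The smaller category (120)
# supports at most 120 / 10 = 12 candidate predictors.
print(max_predictors(120, 480))  # → 12
```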

Improvements
A "one in 20 rule" has been suggested, indicating the need for shrinkage of regression coefficients, and a "one in 50 rule" for stepwise selection with the default p-value of 5%. Other studies, however, show that the one in ten rule may be too conservative as a general recommendation and that five to nine events per predictor can be enough, depending on the research question. More recently, a study has shown that the ratio of events per predictive variable is not a reliable statistic for estimating the minimum number of events for estimating a logistic prediction model. Instead, the number of predictor variables, the total sample size (events + non-events) and the events fraction (events / total sample size) can be used to calculate the expected prediction error of the model that is to be developed. One can then estimate the required sample size to achieve an expected prediction error that is smaller than a predetermined allowable prediction error value. The necessary sample size and number of events for model development are then given by the values that meet these requirements.
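The competing rules of thumb can be compared directly: given a number of candidate predictors and an anticipated events fraction, each EPV rule implies a minimum number of events and hence a minimum total sample size. This is a minimal sketch of that arithmetic only; it does not implement the expected-prediction-error calculation mentioned above, and all names and example figures are illustrative:

```python
import math

def required_events(n_predictors: int, epv: int) -> int:
    """Minimum number of events implied by an events-per-variable rule."""
    return n_predictors * epv

def required_sample_size(n_predictors: int, epv: int, events_fraction: float) -> int:
    """Total sample size (events + non-events) implied by an EPV rule,
    given the anticipated events fraction (events / total sample size)."""
    return math.ceil(required_events(n_predictors, epv) / events_fraction)

# Example: 8 candidate predictors, anticipated events fraction of 0.2.
for label, epv in [("one in 10", 10), ("one in 20", 20), ("one in 50", 50)]:
    print(label, required_events(8, epv), required_sample_size(8, epv, 0.2))
```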
Other modalities
For highly correlated input data the one-in-ten rule (10 observations or labels needed per feature) may not be directly applicable. For images, for example, there is a rule of thumb that 1000 examples are needed per class. For a binary classification of images (with fictive 1000 pixel × 1000 pixel images, i.e. 1 000 000 features per image), this would require only 2000 labels / 1 000 000 features = 0.002 labels per feature. This far lower ratio is possible only because of the high (spatial) correlation of the pixels.
Literature
• David A. Freedman (1983), "A Note on Screening Regression Equations", The American Statistician, 37(2), 152–155.