Missing data reduces the representativeness of the sample and can therefore distort inferences about the population. Generally speaking, there are three main approaches to handling missing data: (1) imputation, where values are filled in place of the missing data; (2) omission, where samples with invalid data are discarded from further analysis; and (3) analysis, where methods unaffected by the missing values are applied directly.

One systematic review addressing the prevention and handling of missing data for patient-centered outcomes research identified 10 standards as necessary for the prevention and handling of missing data. These include standards for study design, study conduct, analysis, and reporting.

In some practical applications, the experimenters can control the level of missingness and prevent missing values before gathering the data. For example, computer questionnaires often do not allow a question to be skipped: a question has to be answered before the respondent can continue to the next one. This type of questionnaire eliminates missing values due to the participant, though an ethics board overseeing the research may not permit this method. In survey research, it is common to make multiple efforts to contact each individual in the sample, often sending letters to persuade those who have decided not to participate to change their minds. However, such techniques can either help or hurt in terms of reducing the negative inferential effects of missing data, because the people who are willing to be persuaded to participate after initially refusing or not being home are likely to differ significantly from the people who still refuse or remain unreachable after additional effort.

Any analysis of multiply imputed data must be repeated for each of the imputed data sets and, in some cases, the relevant statistics must be combined in a relatively complicated way. Methods such as listwise deletion have been used instead of imputation, but they have been found to introduce additional bias. There is a beginner guide that provides step-by-step instructions on how to impute data.

The expectation-maximization algorithm is an approach in which the values of the statistics that would be computed if a complete dataset were available are estimated (imputed), taking into account the pattern of missing data. In this approach, values for individual missing data items are not usually imputed.
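For a single scalar quantity, the step of combining results from multiply imputed data sets is commonly done with Rubin's rules. The sketch below assumes each imputed data set has already been analysed and has produced a point estimate and its squared standard error; the function name `pool_rubin` is hypothetical. It averages the estimates, averages the within-imputation variances, and inflates the total variance by the between-imputation spread so that uncertainty about the missing values is not ignored.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed data sets (Rubin's rules).

    estimates: point estimates Q_1..Q_m, one per imputed data set.
    variances: their squared standard errors U_1..U_m.
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")
    q_bar = mean(estimates)            # pooled point estimate
    u_bar = mean(variances)            # average within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b    # total variance of the pooled estimate
    return q_bar, total, math.sqrt(total)


# Hypothetical results from analysing three imputed data sets.
print(pool_rubin([10.0, 12.0, 11.0], [1.0, 1.2, 0.8]))
```

Here the pooled estimate is 11.0 and the total variance is about 2.33 (a within-imputation variance of 1.0 plus 4/3 times a between-imputation variance of 1.0). Degrees of freedom for confidence intervals, and the corresponding rules for vector-valued estimates, are not covered by this sketch.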
===Interpolation===
In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points.

In the comparison of two paired samples with missing data, a test statistic that uses all available data without the need for imputation is the partially overlapping samples t-test. This test is valid under normality and assuming the data are missing completely at random (MCAR).
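A minimal sketch of the partially overlapping samples t statistic follows, assuming normally distributed data that are MCAR. It uses every observation in each sample for the means and standard deviations, and only the complete pairs for the correlation r. The denominator is the standard error of the difference in means, where the covariance between the two sample means is nc * r * s1 * s2 / (n1 * n2) for nc shared pairs. The function names are hypothetical, and the degrees-of-freedom approximation needed to turn the statistic into a p-value is not shown.

```python
import math
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partially_overlapping_t(pairs, only_1, only_2):
    """t statistic for two samples that share some paired observations.

    pairs:  list of (x1, x2) where both measurements were observed
    only_1: sample-1 values whose paired sample-2 value is missing
    only_2: sample-2 values whose paired sample-1 value is missing
    """
    paired_1 = [a for a, _ in pairs]
    paired_2 = [b for _, b in pairs]
    s1_all = paired_1 + list(only_1)
    s2_all = paired_2 + list(only_2)
    n1, n2, nc = len(s1_all), len(s2_all), len(pairs)
    s1, s2 = stdev(s1_all), stdev(s2_all)
    # r comes from the complete pairs only; with fewer than two pairs it is
    # undefined, and this sketch assumes r = 0 (an illustrative choice).
    r = pearson_r(paired_1, paired_2) if nc >= 2 else 0.0
    # Var(mean1 - mean2) = s1^2/n1 + s2^2/n2 - 2*Cov(mean1, mean2), and
    # Cov(mean1, mean2) = nc * r * s1 * s2 / (n1 * n2) for nc shared pairs.
    var_diff = s1 ** 2 / n1 + s2 ** 2 / n2 - 2 * r * s1 * s2 * nc / (n1 * n2)
    return (mean(s1_all) - mean(s2_all)) / math.sqrt(var_diff)


# Hypothetical example: five complete pairs plus unpaired leftovers.
t = partially_overlapping_t(
    pairs=[(5.1, 4.8), (6.0, 5.5), (5.8, 5.9), (7.2, 6.1), (6.5, 6.0)],
    only_1=[6.3, 5.7],
    only_2=[5.2],
)
print(t)
```

When every observation is paired, the statistic reduces to the ordinary paired t statistic, and when there are no pairs it reduces to Welch's t statistic; both limits are a quick way to sanity-check an implementation.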
===Partial deletion===
Methods which involve reducing the data available to a dataset having no missing values include:
• Listwise deletion/casewise deletion
• Pairwise deletion
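The difference between the two deletion strategies is easiest to see on a small table. The following sketch uses hypothetical toy data, with `None` marking a missing value, and computes the sample covariance of the first two variables; the function names are illustrative. Listwise deletion first drops every row that has any missing value, while pairwise deletion uses every row in which the two variables of interest are both observed, so different pairs of variables can be based on different subsets of rows.

```python
from statistics import mean

# Hypothetical toy data: three variables per row; None marks a missing value.
rows = [
    (1.0, 2.0, None),
    (2.0, None, 3.0),
    (3.0, 4.0, 5.0),
    (4.0, 5.0, 4.0),
    (5.0, 7.0, 6.0),
]


def listwise(rows):
    """Listwise (casewise) deletion: keep only rows with no missing value."""
    return [r for r in rows if all(v is not None for v in r)]


def pairwise_cov(rows, i, j):
    """Sample covariance of columns i and j over rows where both are observed.

    Returns (covariance, number of rows used).
    """
    pairs = [(r[i], r[j]) for r in rows if r[i] is not None and r[j] is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two rows with both values observed")
    mi = mean([a for a, _ in pairs])
    mj = mean([b for _, b in pairs])
    cov = sum((a - mi) * (b - mj) for a, b in pairs) / (len(pairs) - 1)
    return cov, len(pairs)


print(pairwise_cov(rows, 0, 1))            # pairwise deletion: 4 rows used
print(pairwise_cov(listwise(rows), 0, 1))  # listwise deletion: 3 rows used
```

Here pairwise deletion keeps four rows for this pair of variables (covariance 3.5), while listwise deletion keeps only three (covariance about 1.5) because the first row is missing the third variable. Because each entry of a pairwise covariance matrix may be based on different rows, the resulting matrix is not guaranteed to be positive semidefinite.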
===Full analysis===
Methods which take full account of all information available, without the distortion resulting from using imputed values as if they were actually observed:
• Generative approaches:
  • The expectation-maximization algorithm
  • Full information maximum likelihood estimation
• Discriminative approaches:
  • Max-margin classification of data with absent features
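The expectation-maximization algorithm can be sketched for the simplest non-trivial case: a bivariate normal model in which x is always observed and y is sometimes missing. This is an illustrative sketch rather than a general implementation; it assumes normality, that y is missing at random (MAR) given x, and a hypothetical function name. The E-step replaces the unobserved parts of the sufficient statistics (the sums of y, y^2 and x*y) with their conditional expectations under the current parameters, and the M-step recomputes the maximum likelihood estimates from those expected sums. Conditional means are used only inside these sums; no completed data set is returned.

```python
from statistics import mean


def em_bivariate_normal(xs, ys, tol=1e-10, max_iter=1000):
    """EM estimates (mu_x, mu_y, s_xx, s_xy, s_yy) for a bivariate normal.

    xs: fully observed values of x.
    ys: values of y, same length as xs, with None where y is missing.
    Variances use the n denominator (maximum likelihood estimates).
    """
    n = len(xs)
    obs_y = [y for y in ys if y is not None]
    # x is never missing, so its mean and variance are fixed from the start.
    mu_x = mean(xs)
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    # Crude starting values: observed-y moments and zero covariance.
    mu_y = mean(obs_y)
    s_yy = sum((y - mu_y) ** 2 for y in obs_y) / len(obs_y)
    s_xy = 0.0
    for _ in range(max_iter):
        beta = s_xy / s_xx               # slope of E[y | x]
        resid_var = s_yy - beta * s_xy   # Var(y | x) under current parameters
        # E-step: expected sums of y, y^2 and x*y given x and current parameters.
        sum_y = sum_yy = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if y is None:
                ey = mu_y + beta * (x - mu_x)
                sum_y += ey
                sum_yy += ey * ey + resid_var
                sum_xy += x * ey
            else:
                sum_y += y
                sum_yy += y * y
                sum_xy += x * y
        # M-step: maximum likelihood updates from the expected sums.
        new_mu_y = sum_y / n
        new_s_yy = sum_yy / n - new_mu_y ** 2
        new_s_xy = sum_xy / n - mu_x * new_mu_y
        change = max(abs(new_mu_y - mu_y), abs(new_s_yy - s_yy), abs(new_s_xy - s_xy))
        mu_y, s_yy, s_xy = new_mu_y, new_s_yy, new_s_xy
        if change < tol:
            break
    return mu_x, mu_y, s_xx, s_xy, s_yy


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 2.9, 4.2, None, 5.8, None, 8.1, 8.8]  # hypothetical data
print(em_bivariate_normal(xs, ys))
```

For this pattern, where only y has gaps, the result can be checked against a closed form: the estimated mean of y equals the complete-case regression of y on x evaluated at the mean of all x values, and the estimated covariance equals that regression slope times the variance of x.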
Partial identification methods may also be used.

==Model-based techniques==