Missing data pose challenges to real-life data analysis. Simple ad-hoc fixes, like deletion or mean imputation, only work under highly restrictive conditions, which are often not met in practice. Multiple imputation replaces each missing value by multiple plausible values. The variability between these replacements reflects our ignorance of the true (but missing) value. Each of the completed data sets is then analyzed by standard methods, and the results are pooled to obtain unbiased estimates with correct confidence intervals. Multiple imputation is a general approach that also inspires novel solutions to old problems by reformulating the task at hand as a missing-data problem. This is the second edition of a popular book on multiple imputation, focused on explaining the application of methods through detailed worked examples using the MICE package developed by the author. This new edition incorporates the recent developments in this fast-moving field. This class-tested book avoids mathematical and technical details as much as possible: formulas are accompanied by verbal statements that explain the formula in accessible terms. The book sharpens the reader’s intuition on how to think about missing data, and provides all the tools needed to execute a well-grounded quantitative analysis in the presence of missing data.
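The impute, analyze, pool workflow described above can be sketched in a few lines of Python. This is only an illustration, not the MICE package or the book's own code: the function names `pool_rubin` and `impute_and_pool` are hypothetical, the pooling step assumes Rubin's rules (the standard way to combine estimates across imputed data sets), and the imputation step is a deliberately crude stand-in that draws from a normal distribution fitted to the observed values and ignores uncertainty in its parameters.

```python
import random
import statistics


def pool_rubin(estimates, variances):
    """Pool m estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    total = u_bar + (1 + 1 / m) * b       # total variance of the pooled estimate
    return q_bar, total


def impute_and_pool(values, m=20, seed=0):
    """Estimate a mean from data with gaps (None) via m imputed copies.

    ASSUMPTION: the imputation model is a simplified placeholder, a normal
    distribution fitted to the observed values. Real software uses richer,
    properly Bayesian imputation models.
    """
    rng = random.Random(seed)
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    sd = statistics.stdev(observed)
    estimates, variances = [], []
    for _ in range(m):
        # Step 1: fill each gap with a fresh plausible value.
        completed = [v if v is not None else rng.gauss(mu, sd) for v in values]
        # Step 2: analyze the completed data set with a standard method.
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    # Step 3: pool the m results.
    return pool_rubin(estimates, variances)


if __name__ == "__main__":
    data = [4.1, None, 5.3, 4.8, None, 5.9, 5.0, 4.4]
    estimate, variance = impute_and_pool(data)
    print(f"pooled mean = {estimate:.3f}, standard error = {variance ** 0.5:.3f}")
```

The between-imputation term is what carries "our ignorance of the true (but missing) value" into the final standard error; single imputation has no such term and therefore understates uncertainty.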
This book presents a systematic and unified approach for modern nonparametric treatment of missing and modified data via examples of density and hazard rate estimation, nonparametric regression, filtering signals, and time series analysis. All basic types of missing at random and not at random, biasing, truncation, censoring, and measurement errors are discussed, and their treatment is explained. Ten chapters of the book cover basic cases of direct data, biased data, nondestructive and destructive missing, survival data modified by truncation and censoring, missing survival data, stationary and nonstationary time series and processes, and ill-posed modifications. The coverage is suitable for self-study or a one-semester course for graduate students with a prerequisite of a standard course in introductory probability. Exercises of various levels of difficulty will be helpful for the instructor and self-study. The book is primarily about practically important small samples. It explains when consistent estimation is possible, and why in some cases missing data should be ignored and why in others it must be considered. If missing data or data modification makes consistent estimation impossible, then the author explains what type of action is needed to restore the lost information. The book contains more than a hundred figures with simulated data that explain virtually every setting, claim, and development. The companion R software package allows the reader to verify, reproduce and modify every simulation and every estimator used. This makes the material fully transparent and allows one to study it interactively. Sam Efromovich is the Endowed Professor of Mathematical Sciences and the Head of the Actuarial Program at the University of Texas at Dallas. He is well known for his work on the theory and application of nonparametric curve estimation and is the author of Nonparametric Curve Estimation: Methods, Theory, and Applications.
Professor Sam Efromovich is a Fellow of the Institute of Mathematical Statistics and the American Statistical Association.
Find guidance on using SAS for multiple imputation and solving common missing data issues. Multiple Imputation of Missing Data Using SAS provides both theoretical background and constructive solutions for those working with incomplete data sets in an engaging example-driven format. It offers practical instruction on the use of SAS for multiple imputation and provides numerous examples that use a variety of public release data sets with applications to survey data. Written for users with an intermediate background in SAS programming and statistics, this book is an excellent resource for anyone seeking guidance on multiple imputation. The authors cover the MI and MIANALYZE procedures in detail, along with other procedures used for analysis of complete data sets. They guide analysts through the multiple imputation process, including evaluation of missing data patterns, choice of an imputation method, execution of the process, and interpretation of results. Topics discussed include how to deal with missing data problems in a statistically appropriate manner, how to intelligently select an imputation method, how to incorporate the uncertainty introduced by the imputation process, and how to incorporate the complex sample design (if appropriate) through use of the SAS SURVEY procedures. Discover the theoretical background and see extensive applications of the multiple imputation process in action. This book is part of the SAS Press program.
A practical guide to analysing partially observed data. Collecting, analysing and drawing inferences from data is central to research in the medical and social sciences. Unfortunately, it is rarely possible to collect all the intended data. The literature on inference from the resulting incomplete data is now huge, and continues to grow both as methods are developed for large and complex data structures, and as increasing computer power and suitable software enable researchers to apply these methods. This book focuses on a particular statistical method for analysing and drawing inferences from incomplete data, called Multiple Imputation (MI). MI is attractive because it is both practical and widely applicable. The authors' aim is to clarify the issues raised by missing data, describing the rationale for MI, the relationship between the various imputation models and associated algorithms and its application to increasingly complex data structures. Multiple Imputation and its Application: Discusses the issues raised by the analysis of partially observed data, and the assumptions on which analyses rest. Presents a practical guide to the issues to consider when analysing incomplete data from both observational studies and randomized trials. Provides a detailed discussion of the practical use of MI with real-world examples drawn from medical and social statistics. Explores handling non-linear relationships and interactions with multiple imputation, survival analysis, multilevel multiple imputation, sensitivity analysis via multiple imputation, using non-response weights with multiple imputation and doubly robust multiple imputation. Multiple Imputation and its Application is aimed at quantitative researchers and students in the medical and social sciences with the aim of clarifying the issues raised by the analysis of incomplete data, outlining the rationale for MI and describing how to consider and address the issues that arise in its application.
Missing data affect nearly every discipline by complicating the statistical analysis of collected data. But since the 1990s, there have been important developments in the statistical methodology for handling missing data. Written by renowned statisticians in this area, Handbook of Missing Data Methodology presents many methodological advances and the latest applications of missing data methods in empirical research. Divided into six parts, the handbook begins by establishing notation and terminology. It reviews the general taxonomy of missing data mechanisms and their implications for analysis and offers a historical perspective on early methods for handling missing data. The following three parts cover various inference paradigms when data are missing, including likelihood and Bayesian methods; semi-parametric methods, with particular emphasis on inverse probability weighting; and multiple imputation methods. The next part of the book focuses on a range of approaches that assess the sensitivity of inferences to alternative, routinely non-verifiable assumptions about the missing data process. The final part discusses special topics, such as missing data in clinical trials and sample surveys as well as approaches to model diagnostics in the missing data setting. In each part, an introduction provides useful background material and an overview to set the stage for subsequent chapters. Covering both established and emerging methodologies for missing data, this book sets the scene for future research. It provides the framework for readers to delve into research and practical applications of missing data methods.
An up-to-date, comprehensive treatment of a classic text on missing data in statistics. The topic of missing data has gained considerable attention in recent decades. This new edition by two acknowledged experts on the subject offers an up-to-date account of practical methodology for handling missing data problems. Blending theory and application, authors Roderick Little and Donald Rubin review historical approaches to the subject and describe simple methods for multivariate analysis with missing values. They then provide a coherent theory for analysis of problems based on likelihoods derived from statistical models for the data and the missing data mechanism, and then they apply the theory to a wide range of important missing data problems. Statistical Analysis with Missing Data, Third Edition starts by introducing readers to the subject and approaches toward solving it. It looks at the patterns and mechanisms that create the missing data, as well as a taxonomy of missing data. It then goes on to examine missing data in experiments, before discussing complete-case and available-case analysis, including weighting methods. The new edition expands its coverage to include recent work on topics such as nonresponse in sample surveys, causal inference, diagnostic methods, and sensitivity analysis, among a host of other topics. An updated “classic” written by renowned authorities on the subject. Features over 150 exercises (including many new ones). Covers recent work on important methods like multiple imputation, robust alternatives to weighting, and Bayesian methods. Revises previous topics based on past student feedback and class experience. Contains an updated and expanded bibliography. The authors were awarded The Karl Pearson Prize in 2017 by the International Statistical Institute, for a research contribution that has had profound influence on statistical theory, methodology or applications. Their work "has been no less than defining and transforming" (ISI).
Statistical Analysis with Missing Data, Third Edition is an ideal textbook for upper undergraduate and/or beginning graduate level students of the subject. It is also an excellent source of information for applied statisticians and practitioners in government and industry.
The last two decades have seen enormous developments in statistical methods for incomplete data. The EM algorithm and its extensions, multiple imputation, and Markov Chain Monte Carlo provide a set of flexible and reliable tools for inference in large classes of missing-data problems. Yet, in practical terms, those developments have had surprisingly little impact on the way most data analysts handle missing values on a routine basis. Analysis of Incomplete Multivariate Data helps bridge the gap between theory and practice, making these missing-data tools accessible to a broad audience. It presents a unified, Bayesian approach to the analysis of incomplete multivariate data, covering datasets in which the variables are continuous, categorical, or both. The focus is applied, where necessary, to help readers thoroughly understand the statistical properties of those methods, and the behavior of the accompanying algorithms. All techniques are illustrated with real data examples, with extended discussion and practical advice. All of the algorithms described in this book have been implemented by the author for general use in the statistical languages S and S-PLUS. The software is available free of charge on the Internet.
Missing data form a problem in every scientific discipline, yet the techniques required to handle them are complicated and often lacking. One of the great ideas in statistical science, multiple imputation, fills gaps in the data with plausible values, the uncertainty of which is coded in the data itself. It also solves other problems, many of which are missing data problems in disguise. Flexible Imputation of Missing Data is supported by many examples using real data taken from the author's vast experience of collaborative research, and presents a practical guide for handling missing data under the framework of multiple imputation. Furthermore, detailed guidance on implementation in R using the author’s package MICE is included throughout the book. Assuming familiarity with basic statistical concepts and multivariate methods, Flexible Imputation of Missing Data is intended for two audiences: (bio)statisticians, epidemiologists, and methodologists in the social and health sciences; and substantive researchers who do not call themselves statisticians, but who possess the necessary skills to understand the principles and to follow the recipes. This graduate-tested book avoids mathematical and technical details as much as possible: formulas are accompanied by a verbal statement that explains the formula in layperson terms. Readers less concerned with the theoretical underpinnings will be able to pick up the general idea, and technical material is available for those who desire deeper understanding. The analyses can be replicated in R using a dedicated package developed by the author.
This book provides practical guidance for statisticians, clinicians, and researchers involved in clinical trials in the biopharmaceutical industry and in medical and public health organisations. Academics and students needing an introduction to handling missing data will also find this book invaluable. The authors describe how missing data can affect the outcome and credibility of a clinical trial, show by examples how a clinical team can work to prevent missing data, and present the reader with approaches to address missing data effectively. The book is illustrated throughout with realistic case studies and worked examples, and presents clear and concise guidelines to enable good planning for missing data. The authors show how to handle missing data in a way that is transparent and easy to understand for clinicians, regulators and patients. New developments are presented to improve the choice and implementation of primary and sensitivity analyses for missing data. Many SAS code examples are included, giving the reader a toolbox for implementing analyses under a variety of assumptions.