The process of developing predictive models includes many stages. Most resources focus on the modeling algorithms but neglect other critical aspects of the modeling process. This book describes techniques for finding the best representations of predictors for modeling and for nding the best subset of predictors for improving model performance. A variety of example data sets are used to illustrate the techniques along with R programs for reproducing the results.
Forecasting is required in many situations. Stocking an inventory may require forecasts of demand months in advance. Telecommunication routing requires traffic forecasts a few minutes ahead. Whatever the circumstances or time horizons involved, forecasting is an important aid in effective and efficient planning. This textbook provides a comprehensive introduction to forecasting methods and presents enough information about each method for readers to use them sensibly.
Missing data pose challenges to real-life data analysis. Simple ad-hoc fixes, like deletion or mean imputation, only work under highly restrictive conditions, which are often not met in practice. Multiple imputation replaces each missing value by multiple plausible values. The variability between these replacements reflects our ignorance of the true (but missing) value. Each of the completed data set is then analyzed by standard methods, and the results are pooled to obtain unbiased estimates with correct confidence intervals. Multiple imputation is a general approach that also inspires novel solutions to old problems by reformulating the task at hand as a missing-data problem. This is the second edition of a popular book on multiple imputation, focused on explaining the application of methods through detailed worked examples using the MICE package as developed by the author. This new edition incorporates the recent developments in this fast-moving field. This class-tested book avoids mathematical and technical details as much as possible: formulas are accompanied by verbal statements that explain the formula in accessible terms. The book sharpens the reader’s intuition on how to think about missing data, and provides all the tools needed to execute a well-grounded quantitative analysis in the presence of missing data.
This important book describes procedures for selecting a model from a large set of competing statistical models. It includes model selection techniques for univariate and multivariate regression models, univariate and multivariate autoregressive models, nonparametric (including wavelets) and semiparametric regression models, and quasi-likelihood and robust regression models. Information-based model selection criteria are discussed, and small sample and asymptotic properties are presented. The book also provides examples and large scale simulation studies comparing the performances of information-based model selection criteria, bootstrapping, and cross-validation selection methods over a wide range of models.
Complex multivariate testing problems are frequently encountered in many scientific disciplines, such as engineering, medicine and the social sciences. As a result, modern statistics needs permutation testing for complex data with low sample size and many variables, especially in observational studies. The Authors give a general overview on permutation tests with a focus on recent theoretical advances within univariate and multivariate complex permutation testing problems, this book brings the reader completely up to date with today’s current thinking. Key Features: Examines the most up-to-date methodologies of univariate and multivariate permutation testing. Includes extensive software codes in MATLAB, R and SAS, featuring worked examples, and uses real case studies from both experimental and observational studies. Includes a standalone free software NPC Test Release 10 with a graphical interface which allows practitioners from every scientific field to easily implement almost all complex testing procedures included in the book. Presents and discusses solutions to the most important and frequently encountered real problems in multivariate analyses. A supplementary website containing all of the data sets examined in the book along with ready to use software codes. Together with a wide set of application cases, the Authors present a thorough theory of permutation testing both with formal description and proofs, and analysing real case studies. Practitioners and researchers, working in different scientific fields such as engineering, biostatistics, psychology or medicine will benefit from this book.
Linear regression with one predictor variable; Inferences in regression and correlation analysis; Diagnosticis and remedial measures; Simultaneous inferences and other topics in regression analysis; Matrix approach to simple linear regression analysis; Multiple linear regression; Nonlinear regression; Design and analysis of single-factor studies; Multi-factor studies; Specialized study designs.
Many texts are excellent sources of knowledge about individual statistical tools, but the art of data analysis is about choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem solving strategies that address the many issues arising when developing multivariable models using real data and not standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for dealing with nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. This text realistically deals with model uncertainty and its effects on inference to achieve "safe data mining".
Discover New Methods for Dealing with High-Dimensional DataA sparse statistical model has only a small number of nonzero parameters or weights; therefore, it is much easier to estimate and interpret than a dense model. Statistical Learning with Sparsity: The Lasso and Generalizations presents methods that exploit sparsity to help recover the underl
This guide to small area estimation aims to help users compile more reliable granular or disaggregated data in cost-effective ways. It explains small area estimation techniques with examples of how the easily accessible R analytical platform can be used to implement them, particularly to estimate indicators on poverty, employment, and health outcomes. The guide is intended for staff of national statistics offices and for other development practitioners. It aims to help them to develop and implement targeted socioeconomic policies to ensure that the vulnerable segments of societies are not left behind, and to monitor progress toward the Sustainable Development Goals.