Many texts are excellent sources of knowledge about individual statistical tools, but the art of data analysis is about choosing and using multiple tools. Instead of presenting isolated techniques, this text emphasizes problem-solving strategies that address the many issues arising when developing multivariable models using real data rather than standard textbook examples. It includes imputation methods for dealing with missing data effectively, methods for dealing with nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with "too many variables to analyze and not enough observations," and powerful model validation techniques based on the bootstrap. This text realistically deals with model uncertainty and its effects on inference to achieve "safe data mining".
This book presents regression modeling strategies that address many issues arising when developing multivariable models using real-data examples. It includes imputation methods for dealing with missing data effectively, methods for dealing with nonlinear relationships and for making the estimation of transformations a formal part of the modeling process, methods for dealing with “too many variables to analyze and not enough observations”, and powerful model validation techniques based on the bootstrap. The text deals with model uncertainty and its effect on inference. It also presents many graphical methods for communicating complex regression models to nonstatisticians.
Praise for the First Edition: "The attention to detail is impressive. The book is very well written and the author is extremely careful with his descriptions . . . the examples are wonderful." - The American Statistician
Fully revised to reflect the latest methodologies and emerging applications, Applied Regression Modeling, Second Edition continues to highlight the benefits of statistical methods, specifically regression analysis and modeling, for understanding, analyzing, and interpreting multivariate data in business, science, and social science applications. The author utilizes a bounty of real-life examples, case studies, illustrations, and graphics to introduce readers to the world of regression analysis using various software packages, including R, SPSS, Minitab, SAS, JMP, and S-PLUS. In a clear and careful writing style, the book introduces modeling extensions that illustrate more advanced regression techniques, including logistic regression, Poisson regression, discrete choice models, multilevel models, and Bayesian modeling. In addition, the Second Edition features clarification and expansion of challenging topics, such as: • Transformations, indicator variables, and interaction • Testing model assumptions • Nonconstant variance • Autocorrelation • Variable selection methods • Model building and graphical interpretation
Throughout the book, datasets and examples have been updated and additional problems are included at the end of each chapter, allowing readers to test their comprehension of the presented material. In addition, a related website features the book's datasets, presentation slides, detailed statistical software instructions, and learning resources including additional problems and instructional videos. With an intuitive approach that is not heavy on mathematical detail, Applied Regression Modeling, Second Edition is an excellent book for courses on statistical regression analysis at the upper-undergraduate and graduate level. 
The book also serves as a valuable resource for professionals and researchers who utilize statistical methods for decision-making in their everyday work.
From the reviews of the First Edition: "An interesting, useful, and well-written book on logistic regression models . . . Hosmer and Lemeshow have used very little mathematics, have presented difficult concepts heuristically and through illustrative examples, and have included references." - Choice "Well written, clearly organized, and comprehensive . . . the authors carefully walk the reader through the estimation and interpretation of coefficients from a wide variety of logistic regression models . . . their careful explication of the quantitative re-expression of coefficients from these various models is excellent." - Contemporary Sociology "An extremely well-written book that will certainly prove an invaluable acquisition to the practicing statistician who finds other literature on analysis of discrete data hard to follow or heavily theoretical." - The Statistician
In this revised and updated edition of their popular book, David Hosmer and Stanley Lemeshow continue to provide an amazingly accessible introduction to the logistic regression model while incorporating advances of the last decade, including a variety of software packages for the analysis of data sets. Hosmer and Lemeshow extend the discussion from biostatistics and epidemiology to cutting-edge applications in data mining and machine learning, guiding readers step-by-step through the use of modeling techniques for dichotomous data in diverse fields. Ample new topics and expanded discussions of existing material are accompanied by a wealth of real-world examples, with extensive data sets available over the Internet.
The primary focus here is on log-linear models for contingency tables, but in this second edition, greater emphasis has been placed on logistic regression. The book explores topics such as logistic discrimination and generalised linear models, and builds upon the relationships between these basic models for continuous data and the analogous log-linear and logistic regression models for discrete data. It also carefully examines the differences in model interpretations and evaluations that occur due to the discrete nature of the data. Sample commands are given for analyses in SAS, BMDP, and GLIM, while numerous data sets from fields as diverse as engineering, education, sociology, and medicine are used to illustrate procedures and provide exercises. Throughout the book, the treatment is designed for students with prior knowledge of analysis of variance and regression.
The second edition of this volume provides insight and practical illustrations on how modern statistical concepts and regression methods can be applied in medical prediction problems, including diagnostic and prognostic outcomes. Many advances have been made in statistical approaches towards outcome prediction, but a sensible strategy is needed for model development, validation, and updating, such that prediction models can better support medical practice. There is an increasing need for personalized evidence-based medicine that uses an individualized approach to medical decision-making. In this Big Data era, there is expanded access to large volumes of routinely collected data and an increased number of applications for prediction models, such as targeted early detection of disease and individualized approaches to diagnostic testing and treatment. Clinical Prediction Models presents a practical checklist that needs to be considered for development of a valid prediction model. Steps include preliminary considerations such as dealing with missing values; coding of predictors; selection of main effects and interactions for a multivariable model; estimation of model parameters with shrinkage methods and incorporation of external data; evaluation of performance and usefulness; internal validation; and presentation formatting. The text also addresses common issues that make prediction models suboptimal, such as small sample sizes, exaggerated claims, and poor generalizability. The text is primarily intended for clinical epidemiologists and biostatisticians. Including many case studies and publicly available R code and data sets, the book is also appropriate as a textbook for a graduate course on predictive modeling in diagnosis and prognosis. While practical in nature, the book also provides a philosophical perspective on data analysis in medicine that goes beyond predictive modeling. 
Updates to this new and expanded edition include: • A discussion of Big Data and its implications for the design of prediction models • Machine learning issues • More simulations with missing ‘y’ values • Extended discussion on between-cohort heterogeneity • Description of ShinyApp • Updated LASSO illustration • New case studies
This book focuses on tools and techniques for building regression models using real-world data and assessing their validity. A key theme throughout the book is that it makes sense to base inferences or conclusions only on valid models. Plots are shown to be an important tool for both building regression models and assessing their validity. We shall see that deciding what to plot and how each plot should be interpreted will be a major challenge. In order to overcome this challenge we shall need to understand the mathematical properties of the fitted regression models and associated diagnostic procedures. As such this will be an area of focus throughout the book. In particular, we shall carefully study the properties of residuals in order to understand when patterns in residual plots provide direct information about model misspecification and when they do not. The regression output and plots that appear throughout the book have been generated using R. The output from R that appears in this book has been edited in minor ways. On the book web site you will find the R code used in each example in the text.
Combining a modern, data-analytic perspective with a focus on applications in the social sciences, the Third Edition of Applied Regression Analysis and Generalized Linear Models provides in-depth coverage of regression analysis, generalized linear models, and closely related methods, such as bootstrapping and missing data. Updated throughout, this Third Edition includes new chapters on mixed-effects models for hierarchical and longitudinal data. Although the text is largely accessible to readers with a modest background in statistics and mathematics, author John Fox also presents more advanced material in optional sections and chapters throughout the book. Accompanying website resources contain all answers to the end-of-chapter exercises. Answers to odd-numbered questions, as well as datasets and other student resources, are available on the author's website. NEW! A bonus chapter on Bayesian Estimation of Regression Models is also available at the author's website.
A one-stop guide for public health students and practitioners learning the applications of classical regression models in epidemiology. This book is written for public health professionals and students interested in applying regression models in the field of epidemiology. The academic material is usually covered in public health courses including (i) Applied Regression Analysis, (ii) Advanced Epidemiology, and (iii) Statistical Computing. The book is composed of 13 chapters, including an introductory chapter that covers basic concepts of statistics and probability. Among the topics covered are the linear regression model, the polynomial regression model, weighted least squares, methods for selecting the best regression equation, and generalized linear models and their applications to different epidemiological study designs. An example is provided in each chapter that applies the theoretical aspects presented in that chapter. In addition, exercises are included and the final chapter is devoted to the solutions of these academic exercises with answers in all of the major statistical software packages, including STATA, SAS, SPSS, and R. It is assumed that readers of this book have taken basic courses in biostatistics, epidemiology, and introductory calculus. The book will be of interest to anyone looking to understand the statistical fundamentals to support quantitative research in public health. In addition, this book: • Is based on the authors’ course notes from 20 years teaching regression modeling in public health courses • Provides exercises at the end of each chapter • Contains a solutions chapter with answers in STATA, SAS, SPSS, and R • Provides real-world public health applications of the theoretical aspects contained in the chapters
Applications of Regression Models in Epidemiology is a reference for graduate students in public health and public health practitioners. 
ERICK SUÁREZ is a Professor of the Department of Biostatistics and Epidemiology at the University of Puerto Rico School of Public Health. He received a Ph.D. degree in Medical Statistics from the London School of Hygiene and Tropical Medicine. He has 29 years of experience teaching biostatistics. CYNTHIA M. PÉREZ is a Professor of the Department of Biostatistics and Epidemiology at the University of Puerto Rico School of Public Health. She received an M.S. degree in Statistics and a Ph.D. degree in Epidemiology from Purdue University. She has 22 years of experience teaching epidemiology and biostatistics. ROBERTO RIVERA is an Associate Professor at the College of Business at the University of Puerto Rico at Mayaguez. He received a Ph.D. degree in Statistics from the University of California in Santa Barbara. He has more than five years of experience teaching statistics courses at the undergraduate and graduate levels. MELISSA N. MARTÍNEZ is an Account Supervisor at Havas Media International. She holds an MPH in Biostatistics from the University of Puerto Rico and an MSBA from the National University in San Diego, California. For the past seven years, she has been performing analyses for the biomedical research and media advertising fields.
'The editors of the new SAGE Handbook of Regression Analysis and Causal Inference have assembled a wide-ranging, high-quality, and timely collection of articles on topics of central importance to quantitative social research, many written by leaders in the field. Everyone engaged in statistical analysis of social-science data will find something of interest in this book.' - John Fox, Professor, Department of Sociology, McMaster University 'The authors do a great job in explaining the various statistical methods in a clear and simple way - focussing on fundamental understanding, interpretation of results, and practical application - yet being precise in their exposition.' - Ben Jann, Executive Director, Institute of Sociology, University of Bern 'Best and Wolf have put together a powerful collection, especially valuable in its separate discussions of uses for both cross-sectional and panel data analysis.' - Tom Smith, Senior Fellow, NORC, University of Chicago
Edited and written by a team of leading international social scientists, this Handbook provides a comprehensive introduction to multivariate methods. The Handbook focuses on regression analysis of cross-sectional and longitudinal data with an emphasis on causal analysis, thereby covering a large number of different techniques including selection models, complex samples, and regression discontinuities. Each Part starts with a non-mathematical introduction to the method covered in that section, giving readers a basic knowledge of the method’s logic, scope and unique features. Next, the mathematical and statistical basis of each method is presented along with advanced aspects. Using real-world data from the European Social Survey (ESS) and the Socio-Economic Panel (GSOEP), the book provides a comprehensive discussion of each method’s application, making this an ideal text for PhD students and researchers embarking on their own data analysis.