In May of 1973 we organized an international research colloquium on the foundations of probability, statistics, and statistical theories of science at the University of Western Ontario. During the past four decades there have been striking formal advances in our understanding of logic, semantics, and algebraic structure in probabilistic and statistical theories. These advances, which include the development of the relations between semantics and metamathematics, between logics and algebras, and of the algebraic-geometrical foundations of statistical theories (especially in the sciences), have led to new insights into the formal and conceptual structure of probability and statistical theory and their applications in scientific theorizing. The foundations of statistics are in a state of profound conflict. Fisher's objections to some aspects of Neyman-Pearson statistics have long been well known. More recently, the emergence of Bayesian statistics as a radical alternative to standard views has made the conflict especially acute. In recent years the response of many practising statisticians to the conflict has been an eclectic approach to statistical inference. Many good statisticians have developed a kind of wisdom that enables them to know which problems are most appropriately handled by each of the available methods. The search for principles that would explain why each method works where it does and fails where it does offers a fruitful approach to the controversy over foundations.
Statistical Foundations of Data Science gives a thorough introduction to commonly used statistical models and to contemporary statistical machine learning techniques and algorithms, along with their mathematical insights and statistical theory. It aims to serve as a graduate-level textbook and a research monograph on high-dimensional statistics, sparsity and covariance learning, machine learning, and statistical inference, and it includes ample exercises involving both theoretical study and empirical application. The book begins with an introduction to the stylized features of big data and their impact on statistical analysis. It then introduces multiple linear regression and expands the techniques of model building via nonparametric regression and kernel tricks. It provides a comprehensive account of sparsity exploration and model selection for multiple regression, generalized linear models, quantile regression, robust regression, and hazards regression, among others. High-dimensional inference is also thoroughly addressed, as is feature screening. The book further provides a comprehensive account of high-dimensional covariance estimation and the learning of latent factors and hidden structures, as well as their applications to statistical estimation, inference, prediction, and machine learning problems. It also gives a thorough introduction to statistical machine learning theory and methods for classification, clustering, and prediction, including CART, random forests, boosting, support vector machines, clustering algorithms, sparse PCA, and deep learning.
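Among the topics listed in this blurb, sparsity exploration via penalized regression is perhaps the most concrete. As an illustrative sketch only, not material taken from the book, the following Python snippet implements a minimal coordinate-descent lasso on synthetic data; the function names lasso_cd and soft_threshold, the toy data, and the penalty level are assumptions made for this example.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator, the building block of lasso coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso: minimize (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n              # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]  # partial residual excluding j
            rho = X[:, j] @ r_j / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
    return beta

# Toy example: only the first 3 of 20 coefficients are truly nonzero.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
true_beta = np.zeros(20)
true_beta[:3] = [3.0, -2.0, 1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(100)
print(np.round(lasso_cd(X, y, lam=0.1), 2))        # most entries shrink exactly to zero
```

The l1 penalty drives most estimated coefficients exactly to zero, which is the sense in which penalized regression performs model selection in high dimensions.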
The revised second edition of this textbook provides the reader with a solid foundation in probability theory and statistics as applied to the physical sciences, engineering, and related fields. It covers a broad range of numerical and analytical methods that are essential for the correct analysis of scientific data, including probability theory, distribution functions of statistics, fits to two-dimensional data and parameter estimation, Monte Carlo methods, and Markov chains. Features new to this edition include:
• a discussion of statistical techniques employed in business science, such as multiple regression analysis of multivariate datasets;
• a new chapter on the various measures of the mean, including logarithmic averages;
• new chapters on systematic errors and intrinsic scatter, and on the fitting of data with bivariate errors;
• a new case study and additional worked examples;
• appropriate marking of mathematical derivations and theoretical background material, to improve the readability of the text;
• end-of-chapter summary boxes, for easy reference.
As in the first edition, the main pedagogical method is a theory-then-application approach, where emphasis is placed first on a sound understanding of the underlying theory of a topic, which then becomes the basis for an efficient and practical application of the material. The level is appropriate for undergraduates and beginning graduate students, and the book also serves as a reference for the experienced researcher. Basic calculus is used in some of the derivations, and no previous background in probability and statistics is required. The book includes many numerical tables of data, as well as exercises and examples to aid the readers' understanding of the topic.
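This blurb mentions fits to two-dimensional data, parameter estimation, and Monte Carlo methods. As a hedged illustration of those ideas, not code from the book, the Python sketch below fits a straight line to data with known Gaussian errors using the standard weighted least-squares formulas and checks the analytic slope uncertainty against a Monte Carlo simulation; the function name weighted_linear_fit and the toy data are assumptions made for this example.

```python
import numpy as np

def weighted_linear_fit(x, y, sigma):
    """Chi-square fit of y = a + b*x to data with known Gaussian errors sigma.
    Returns the best-fit parameters and their standard errors (textbook formulas)."""
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta       # intercept
    b = (S * Sxy - Sx * Sy) / delta         # slope
    return a, b, np.sqrt(Sxx / delta), np.sqrt(S / delta)

# Monte Carlo check: simulate many datasets and compare the scatter of the
# fitted slope with the analytic error estimate.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)
slopes = []
for _ in range(2000):
    y = 1.0 + 2.0 * x + rng.normal(0.0, sigma)
    slopes.append(weighted_linear_fit(x, y, sigma)[1])
_, _, _, sigma_b = weighted_linear_fit(x, 1.0 + 2.0 * x, sigma)
print(f"analytic sigma_b = {sigma_b:.4f}, MC scatter = {np.std(slopes):.4f}")
```

The two numbers printed should agree closely, illustrating the theory-then-application pattern the blurb describes: an analytic error formula derived first, then verified in practice by simulation.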