Kernel smoothing refers to a general methodology for recovery of underlying structure in data sets. The basic principle is that local averaging or smoothing is performed with respect to a kernel function. This book provides uninitiated readers with a feeling for the principles, applications, and analysis of kernel smoothers. This is facilitated by the authors' focus on the simplest settings, namely density estimation and nonparametric regression. They pay particular attention to the problem of choosing the smoothing parameter of a kernel smoother, and also treat the multivariate case in detail. Kernel Smoothing is self-contained and assumes only a basic knowledge of statistics, calculus, and matrix algebra. It is an invaluable introduction to the main ideas of kernel estimation for students and researchers from other disciplines and provides a comprehensive reference for those familiar with the topic.
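For concreteness, the local averaging principle described above is usually formalised through the following two estimators, one for each of the book's basic settings. The notation (sample X_1, ..., X_n with responses Y_i, kernel K, bandwidth h) is standard rather than taken from the book itself.

```latex
% Kernel density estimator: a local average of kernel weights centred at the data
\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)

% Nadaraya--Watson kernel regression estimator: a locally weighted mean of the responses
\hat{m}_h(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right) Y_i}
                    {\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)}
```

The bandwidth h is the smoothing parameter whose choice the book pays particular attention to: a small h produces a rough, data-faithful estimate, while a large h produces a smoother but potentially more biased one.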
Kernel smoothing has greatly evolved since its inception to become an essential methodology in the data science toolkit for the 21st century. Its widespread adoption is due to its fundamental role in multivariate exploratory data analysis, as well as the crucial role it plays in composite solutions to complex data challenges. Multivariate Kernel Smoothing and Its Applications offers a comprehensive overview of both aspects. It begins with a thorough exposition of the approaches to achieve the two basic goals of estimating probability density functions and their derivatives. The focus then turns to the applications of these approaches to more complex data analysis goals, many with a geometric/topological flavour, such as level set estimation, clustering (unsupervised learning), principal curves, and feature significance. Other topics, while not direct applications of density (derivative) estimation but sharing many commonalities with the previous settings, include classification (supervised learning), nearest neighbour estimation, and deconvolution for data observed with error. For the data scientist, each chapter contains illustrative open data examples that are analysed by the most appropriate kernel smoothing method. The emphasis is always placed on an intuitive understanding of the data provided by the accompanying statistical visualisations. For readers wishing to investigate the details of the underlying statistical reasoning further, a graduated exposition of a unified theoretical framework is provided. Algorithms for efficient software implementation are also discussed. José E. Chacón is an associate professor at the Department of Mathematics of the Universidad de Extremadura in Spain. Tarn Duong is a Senior Data Scientist for a start-up which provides short-distance carpooling services in France. Both authors have made important contributions to kernel smoothing research over the last couple of decades.
Comprehensive theoretical overview of kernel smoothing methods with motivating examples. Kernel smoothing is a flexible nonparametric curve estimation method that is applicable when parametric descriptions of the data are not sufficiently adequate. This book explores the theory and methods of kernel smoothing in a variety of contexts, considering independent and correlated data, e.g. with short-memory and long-memory correlations, as well as non-Gaussian data that are transformations of latent Gaussian processes. These types of data occur in many fields of research, e.g. the natural and environmental sciences, among others. Nonparametric density estimation, nonparametric and semiparametric regression, trend and surface estimation (in particular for time series and spatial data), and further topics such as rapid change points and robustness are introduced alongside a study of their theoretical properties and optimality issues, such as consistency and bandwidth selection. Addressing a variety of topics, Kernel Smoothing: Principles, Methods and Applications offers a user-friendly presentation of the mathematical content so that the reader can directly implement the formulas using any appropriate software. The overall aim of the book is to describe the methods and their theoretical backgrounds, while maintaining an analytically simple approach and including motivating examples, making it extremely useful in many sciences such as geophysics, climate research, forestry, ecology, and other natural and life sciences, as well as in finance, sociology, and engineering. It gives a simple and analytical description of kernel smoothing methods in various contexts, presents the basics as well as new developments, and includes simulated and real data examples. Kernel Smoothing: Principles, Methods and Applications is a textbook for senior undergraduate and graduate students in statistics, as well as a reference book for applied statisticians and advanced researchers.
Summary: Offers a comprehensive overview of the statistical theory and emphasizes the implementation of the presented methods in Matlab. This title contains various Matlab scripts useful for kernel smoothing of the density, cumulative distribution function, regression function, hazard function, indices of quality, and bivariate density.
This book describes computational problems related to kernel density estimation (KDE), one of the most important and widely used data smoothing techniques. A very detailed description of novel FFT-based algorithms for both KDE computation and bandwidth selection is presented. The theory of KDE appears to have matured and is now well developed and understood; however, much less progress has been made on improving computational performance, and this book is an attempt to remedy that. The book primarily addresses researchers and advanced graduate or postgraduate students who are interested in KDE and its computational aspects. It contains both background material and much more sophisticated content, so more experienced researchers in the KDE area may also find it interesting. The presented material is richly illustrated with many numerical examples using both artificial and real datasets, and a number of practical applications related to KDE are presented.
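As a rough illustration of the FFT-based idea (bin the data onto a regular grid, then obtain the estimate at every grid point with a single fast convolution), here is a minimal sketch in Python. It assumes a Gaussian kernel and simple binning; the function name, defaults, and details are illustrative assumptions and do not reproduce the book's algorithms or its bandwidth selectors.

```python
import numpy as np
from scipy.signal import fftconvolve

def binned_gaussian_kde(x, h, grid_size=512, pad=3.0):
    """Approximate a Gaussian KDE on a regular grid via binning + FFT convolution.

    Illustrative sketch only; `h` is the bandwidth, `grid_size` the number of bins.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    lo, hi = x.min() - pad * h, x.max() + pad * h
    edges = np.linspace(lo, hi, grid_size + 1)
    grid = 0.5 * (edges[:-1] + edges[1:])               # bin centres
    delta = grid[1] - grid[0]

    # 1. Bin the data: counts[l] is the number of observations in bin l.
    counts, _ = np.histogram(x, bins=edges)

    # 2. Gaussian kernel evaluated at all signed grid offsets -(M-1)*delta .. (M-1)*delta.
    offsets = delta * np.arange(-(grid_size - 1), grid_size)
    kernel = np.exp(-0.5 * (offsets / h) ** 2) / (np.sqrt(2.0 * np.pi) * h)

    # 3. One FFT-based convolution replaces the direct sum over all data points.
    density = fftconvolve(counts, kernel, mode="same") / n
    return grid, density

# Example usage on simulated data
rng = np.random.default_rng(0)
grid, dens = binned_gaussian_kde(rng.normal(size=10_000), h=0.25)
```

The payoff is that the convolution costs on the order of M log M operations for a grid of M points, instead of the n-by-M direct sum over all data points.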
The book describes the use of smoothing techniques in statistics, including both density estimation and nonparametric regression. Considerable advances in research in this area have been made in recent years. The aim of this text is to describe a variety of ways in which these methods can be applied to practical problems in statistics. The role of smoothing techniques in exploring data graphically is emphasised, but the use of nonparametric curves in drawing conclusions from data, as an extension of more standard parametric models, is also a major focus of the book. Examples are drawn from a wide range of applications. The book is intended for those who seek an introduction to the area, with an emphasis on applications rather than on detailed theory. It is therefore expected that the book will benefit those attending courses at an advanced undergraduate, or postgraduate, level, as well as researchers, both from statistics and from other disciplines, who wish to learn about and apply these techniques in practical data analysis. The text makes extensive reference to S-Plus, as a computing environment in which examples can be explored. S-Plus functions and example scripts are provided to implement many of the techniques described. These parts are, however, clearly separate from the main body of text, and can therefore easily be skipped by readers not interested in S-Plus.
A comprehensive, up-to-date textbook on nonparametric methods for students and researchers. Until now, students and researchers in nonparametric and semiparametric statistics and econometrics have had to turn to the latest journal articles to keep pace with these emerging methods of economic analysis. Nonparametric Econometrics fills a major gap by gathering together the most up-to-date theory and techniques and presenting them in a remarkably straightforward and accessible format. The empirical tests, data, and exercises included in this textbook help make it the ideal introduction for graduate students and an indispensable resource for researchers. Nonparametric and semiparametric methods have attracted a great deal of attention from statisticians in recent decades. While the majority of existing books on the subject operate from the presumption that the underlying data are strictly continuous in nature, more often than not social scientists deal with categorical data (nominal and ordinal) in applied settings. The conventional nonparametric approach to dealing with the presence of discrete variables is acknowledged to be unsatisfactory. This book is tailored to the needs of applied econometricians and social scientists. Qi Li and Jeffrey Racine emphasize nonparametric techniques suited to the rich array of data types (continuous, nominal, and ordinal) within one coherent framework. They also emphasize the properties of nonparametric estimators in the presence of potentially irrelevant variables. Nonparametric Econometrics covers all the material necessary to understand and apply nonparametric methods to real-world problems.
The author has attempted to present a book that provides a non-technical introduction to the area of nonparametric density and regression function estimation. The application of these methods is discussed in terms of the S computing environment. Smoothing in high dimensions faces the problem of data sparseness: a principal feature of smoothing, the averaging of data points in a prescribed neighborhood, is not really practicable in dimensions greater than three if we have just one hundred data points. Additive models provide a way out of this dilemma but, given their interactiveness and recursiveness, they require highly effective algorithms. For this purpose, the method of WARPing (Weighted Averaging using Rounded Points) is described in great detail.
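To convey the rounding-and-weighting idea behind WARPing, here is a minimal Python sketch for density estimation with an Epanechnikov kernel. It only illustrates how the steps fit together (round observations to bins, form discrete kernel weights, average over neighbouring bins); the function name, defaults, and kernel choice are hypothetical and it is not the S implementation the book describes.

```python
import numpy as np

def warping_density(x, h, delta=None):
    """WARPing-style density estimate: round points to bins, then weight nearby bins.

    Hypothetical sketch (Epanechnikov kernel); `h` is the bandwidth, `delta` the bin width.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if delta is None:
        delta = h / 5.0                          # several small bins per bandwidth

    # 1. "Rounded points": assign every observation to a bin of width delta.
    idx = np.floor(x / delta).astype(int)
    lo = idx.min()
    counts = np.bincount(idx - lo)               # number of points sharing each bin

    # 2. Kernel weights at the discrete offsets covered by the bandwidth.
    m = int(np.ceil(h / delta))
    u = np.arange(-m, m + 1) * delta / h
    weights = np.where(np.abs(u) <= 1, 0.75 * (1.0 - u**2), 0.0)

    # 3. Weighted averaging of the bin counts gives the estimate on the bin grid.
    full = np.convolve(counts, weights)          # full discrete convolution
    dens = full[m : m + counts.size] / (n * h)
    grid = (np.arange(lo, lo + counts.size) + 0.5) * delta
    return grid, dens
```

Because all observations rounded to the same bin share one set of kernel weights, the cost depends on the number of bins rather than on pairwise distances between data points, which is what makes this kind of binned estimator attractive for interactive use.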
Introduction to Data Science: Data Analysis and Prediction Algorithms with R introduces concepts and skills that can help you tackle real-world data analysis challenges. It covers concepts from probability, statistical inference, linear regression, and machine learning. It also helps you develop skills such as R programming, data wrangling, data visualization, predictive algorithm building, file organization with UNIX/Linux shell, version control with Git and GitHub, and reproducible document preparation. This book is a textbook for a first course in data science. No previous knowledge of R is necessary, although some experience with programming may be helpful. The book is divided into six parts: R, data visualization, statistics with R, data wrangling, machine learning, and productivity tools. Each part has several chapters meant to be presented as one lecture. The author uses motivating case studies that realistically mimic a data scientist’s experience. He starts by asking specific questions and answers these through data analysis so concepts are learned as a means to answering the questions. Examples of the case studies included are: US murder rates by state, self-reported student heights, trends in world health and economics, the impact of vaccines on infectious disease rates, the financial crisis of 2007-2008, election forecasting, building a baseball team, image processing of hand-written digits, and movie recommendation systems. The statistical concepts used to answer the case study questions are only briefly introduced, so complementing with a probability and statistics textbook is highly recommended for in-depth understanding of these concepts. If you read and understand the chapters and complete the exercises, you will be prepared to learn the more advanced concepts and skills needed to become an expert.