Modern Dimension Reduction

Author: Philip D. Waggoner

Publisher: Cambridge University Press

Published: 2021-08-05

Total Pages: 98

ISBN-10: 1108991645

Data are not only ubiquitous in society but also increasingly complex in both size and dimensionality. Dimension reduction offers researchers and scholars the ability to make such complex, high-dimensional data spaces simpler and more manageable. This Element offers readers a suite of modern unsupervised dimension reduction techniques, along with hundreds of lines of R code, for efficiently representing the original high-dimensional data space in a simplified, lower-dimensional subspace. Launching from the earliest dimension reduction technique, principal components analysis, and using real social science data, I introduce and walk readers through the application of the following techniques: locally linear embedding, t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), self-organizing maps, and deep autoencoders. The result is a well-stocked toolbox of unsupervised algorithms for tackling the complexities of the high-dimensional data so common in modern society. All code is publicly accessible on GitHub.


Dimension Reduction

Author: Christopher J. C. Burges

Publisher: Now Publishers Inc

Published: 2010

Total Pages: 104

ISBN-10: 1601983786

We give a tutorial overview of several foundational methods for dimension reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, canonical correlation analysis (CCA), kernel CCA, Fisher discriminant analysis, oriented PCA, and several techniques for sufficient dimension reduction. For the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps, and spectral clustering. Although the review focuses on foundations, we also provide pointers to some more modern techniques. We also describe the correlation dimension as one method for estimating the intrinsic dimension, and we point out that the notion of dimension can be a scale-dependent quantity. The Nyström method, which links several of the manifold algorithms, is also reviewed. We use a publicly available dataset to illustrate some of the methods. The goal is to provide a self-contained overview of key concepts underlying many of these algorithms, and to give pointers for further reading.


Machine Learning Refined

Author: Jeremy Watt

Publisher: Cambridge University Press

Published: 2020-01-09

Total Pages: 597

ISBN-10: 1108480721

An intuitive approach to machine learning covering key concepts, real-world applications, and practical Python coding exercises.


Active Subspaces

Author: Paul G. Constantine

Publisher: SIAM

Published: 2015-03-17

Total Pages: 105

ISBN-10: 1611973864

Scientists and engineers use computer simulations to study relationships between a model's input parameters and its outputs. However, thorough parameter studies are challenging, if not impossible, when the simulation is expensive and the model has several inputs. To enable studies in these instances, the engineer may attempt to reduce the dimension of the model's input parameter space. Active subspaces are an emerging set of dimension reduction tools that identify important directions in the parameter space. This book describes techniques for discovering a model's active subspace and proposes methods for exploiting the reduced dimension to enable otherwise infeasible parameter studies. Readers will find new ideas for dimension reduction, easy-to-implement algorithms, and several examples of active subspaces in action.


Generalized Principal Component Analysis

Author: René Vidal

Publisher: Springer

Published: 2016-04-11

Total Pages: 590

ISBN-10: 0387878114

This book provides a comprehensive introduction to the latest advances in the mathematical theory and computational tools for modeling high-dimensional data drawn from one or multiple low-dimensional subspaces (or manifolds) and potentially corrupted by noise, gross errors, or outliers. This challenging task requires the development of new algebraic, geometric, statistical, and computational methods for efficient and robust estimation and segmentation of one or multiple subspaces. The book also presents interesting real-world applications of these new methods in image processing, image and video segmentation, face recognition and clustering, and hybrid system identification. This book is intended to serve as a textbook for graduate students and beginning researchers in data science, machine learning, computer vision, image and signal processing, and systems theory. It contains ample illustrations, examples, and exercises, and is made largely self-contained by three appendices that survey basic concepts and principles from statistics, optimization, and algebraic geometry used in this book. René Vidal is a Professor of Biomedical Engineering and Director of the Vision Dynamics and Learning Lab at The Johns Hopkins University. Yi Ma is Executive Dean and Professor at the School of Information Science and Technology at ShanghaiTech University. S. Shankar Sastry is Dean of the College of Engineering, Professor of Electrical Engineering and Computer Science, and Professor of Bioengineering at the University of California, Berkeley.


High-Dimensional Probability

Author: Roman Vershynin

Publisher: Cambridge University Press

Published: 2018-09-27

Total Pages: 299

ISBN-10: 1108415199

An integrated package of powerful probabilistic tools and key applications in modern mathematical data science.


Statistical Methods in Molecular Biology

Author: Heejung Bang

Publisher: Humana

Published: 2016-08-23

Total Pages: 636

ISBN-13: 9781493961245

This book presents the basic principles of proper statistical analysis and then progresses to more advanced statistical methods that respond to rapidly developing technologies and methodologies in the field of molecular biology.


Unsupervised Machine Learning for Clustering in Political and Social Research

Author: Philip D. Waggoner

Publisher: Cambridge University Press

Published: 2021-01-28

Total Pages: 70

ISBN-10: 1108879837

In the age of data-driven problem-solving, applying sophisticated computational tools to explain substantive phenomena is a valuable skill. Yet applying these methods presupposes an understanding of the data, structure, and patterns that influence the broader research program. This Element offers researchers and teachers an introduction to clustering, a prominent class of unsupervised machine learning for exploring and understanding latent, non-random structure in data. It covers a suite of widely used clustering techniques and provides R code and real data to facilitate interaction with the concepts. After setting the stage for clustering, it details the following algorithms: agglomerative hierarchical clustering, k-means clustering, and Gaussian mixture models, and, at a higher level, fuzzy C-means clustering, DBSCAN, and partitioning around medoids (k-medoids) clustering.


Modern Data Science with R

Author: Benjamin S. Baumer

Publisher: CRC Press

Published: 2021-03-31

Total Pages: 830

ISBN-10: 0429575394

From a review of the first edition: "Modern Data Science with R... is rich with examples and is guided by a strong narrative voice. What’s more, it presents an organizing framework that makes a convincing argument that data science is a course distinct from applied statistics" (The American Statistician). Modern Data Science with R is a comprehensive data science textbook for undergraduates that incorporates statistical and computational thinking to solve real-world data problems. Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in the state-of-the-art R/RStudio computing environment can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling questions. The second edition is updated to reflect the growing influence of the tidyverse set of packages. All code in the book has been revised and styled to be more readable and easier to understand. New functionality from packages like sf, purrr, tidymodels, and tidytext is now integrated into the text. All chapters have been revised, and several have been split, re-organized, or re-imagined to meet the shifting landscape of best practice.


Sufficient Dimension Reduction

Author: Bing Li

Publisher: CRC Press

Published: 2018-04-27

Total Pages: 362

ISBN-10: 1351645730

Sufficient dimension reduction is a rapidly developing research field with wide applications in regression diagnostics, data visualization, machine learning, genomics, image processing, pattern recognition, and medicine, all fields that produce large datasets with many variables. Sufficient Dimension Reduction: Methods and Applications with R introduces the basic theories and the main methodologies, provides practical, easy-to-use algorithms and R code to implement these methodologies, and surveys recent advances at the frontiers of the field.

Its features include comprehensive coverage of this emerging research field; a synthesis of a wide variety of dimension reduction methods under a few unifying principles, such as projection in Hilbert spaces, kernel mapping, and the von Mises expansion; coverage of the most recent advances, such as nonlinear sufficient dimension reduction, dimension folding for tensorial data, and sufficient dimension reduction for functional data; a set of R programs that readers can easily run; and real datasets available online that illustrate the usage and power of the described methods.

Sufficient dimension reduction has undergone momentous development in recent years, partly due to the increased demand for techniques to process high-dimensional data, a hallmark of our age of Big Data. This book will serve as an ideal entry point into the field for beginning researchers and as a handy reference for advanced ones.

The author, Bing Li, obtained his Ph.D. from the University of Chicago. He is currently a Professor of Statistics at the Pennsylvania State University. His research interests cover sufficient dimension reduction, statistical graphical models, functional data analysis, machine learning, estimating equations and quasi-likelihood, and robust statistics. He is a fellow of the Institute of Mathematical Statistics and the American Statistical Association, and an Associate Editor for The Annals of Statistics and the Journal of the American Statistical Association.