Estimation and Inference in High-dimensional Models

Author: Mojtaba Sahraee Ardakan

Published: 2022

A wide variety of problems encountered in different fields can be formulated as inference problems. Common examples include estimating the parameters of a model from observations; inverse problems, in which an unobserved signal is estimated from a given model and some measurements; and combinations of the two, in which hidden signals and some model parameters are estimated jointly. For example, various machine learning tasks such as image inpainting and super-resolution can be cast as inverse problems over deep neural networks. Similarly, a common task in computational neuroscience is to estimate the parameters of a nonlinear dynamical system from neuronal activity. Despite the wide application of different models and algorithms to these problems, our theoretical understanding of how these algorithms work is often incomplete. In this work, we try to bridge the gap between theory and practice by providing theoretical analyses of three different estimation problems. First, we consider the problem of estimating the input and hidden-layer signals of a given multi-layer stochastic neural network in which all signals are matrix-valued. Various problems, such as multitask regression and classification and inverse problems that use deep generative priors, can be modeled as inference problems over multi-layer neural networks. We consider different types of estimators for such problems and exactly analyze their performance in a certain high-dimensional regime known as the large system limit. Our analysis yields the estimation error of every hidden signal in the deep neural network as an expectation over low-dimensional random variables characterized by a set of equations called the state evolution. Next, we analyze the problem of estimating a signal from convolutional observations via ridge estimation.
Such convolutional inverse problems arise naturally in several fields, such as imaging and seismology. The shared weights of the convolution operator introduce dependencies in the observations that make the analysis of such estimators difficult. By working in the Fourier domain and using results on the Fourier transform of a class of random processes, we show that this problem can be reduced to the analysis of multiple ordinary ridge estimators, one for each frequency. This allows us to write the estimation error of the ridge estimator as an integral that depends on the spectrum of the underlying random process that generates the input features. Finally, we consider the problem of estimating the parameters of a multi-dimensional autoregressive generalized linear model with discrete values. Such a process uses a linear combination of its past outputs as the mean parameter of a generalized linear model that generates its future values. The coefficients of this linear combination are the model parameters, and we seek to estimate them under the assumption that they are sparse. This model can be used, for example, to describe the spiking activity of neurons. For this problem, we obtain a high-probability upper bound on the estimation error of the parameters. Our experiments further support these theoretical results.
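
The per-frequency reduction of the ridge estimator can be illustrated in its simplest form. The sketch below assumes circular convolution (periodic boundary), so the DFT diagonalizes the convolution matrix exactly, and white-noise input features; the thesis treats a more general class of random processes and asymptotic error formulas that are not reproduced here. The names `circulant`, `ridge_direct` and `ridge_per_frequency` are hypothetical helpers. Each DFT coefficient solves its own scalar ridge problem, giving W_k = conj(X_k) Y_k / (|X_k|^2 + lam), which should match the ordinary matrix ridge solution.

```python
import numpy as np

def circulant(x):
    """C[n, m] = x[(n - m) mod N], so C @ w is the circular convolution of x and w."""
    return np.stack([np.roll(x, m) for m in range(len(x))], axis=1)

def ridge_direct(C, y, lam):
    """Ordinary ridge estimate: solve (C^T C + lam * I) w = C^T y."""
    return np.linalg.solve(C.T @ C + lam * np.eye(C.shape[1]), C.T @ y)

def ridge_per_frequency(x, y, lam):
    """Same estimate, solved as one scalar ridge problem per DFT frequency k:
    W_k = conj(X_k) * Y_k / (|X_k|^2 + lam)."""
    X, Y = np.fft.fft(x), np.fft.fft(y)
    W = np.conj(X) * Y / (np.abs(X) ** 2 + lam)
    return np.real(np.fft.ifft(W))

rng = np.random.default_rng(0)
N, lam = 64, 0.5
x = rng.standard_normal(N)         # input features: white noise (assumption)
w_true = rng.standard_normal(N)    # unknown convolution weights
C = circulant(x)
y = C @ w_true + 0.1 * rng.standard_normal(N)

w_direct = ridge_direct(C, y, lam)     # O(N^3) linear solve
w_freq = ridge_per_frequency(x, y, lam)  # O(N log N) via the FFT
```

The two estimates agree up to floating-point error; in this periodic setting, the estimation error can be read off frequency by frequency from |X_k|^2, which is the intuition behind an error formula that depends on the input spectrum.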


Financial Mathematics, Volatility and Covariance Modelling

Author: Julien Chevallier

Publisher: Routledge

Published: 2019-06-28

Total Pages: 381

ISBN-13: 1351669095

This book provides an up-to-date series of advanced chapters on applied financial econometric techniques pertaining to the fields of commodities finance, mathematics and stochastics, international macroeconomics, and financial econometrics. Financial Mathematics, Volatility and Covariance Modelling: Volume 2 provides a key repository on the current state of knowledge, the latest debates, and recent literature on financial mathematics, volatility and covariance modelling. The first section is devoted to mathematical finance, stochastic modelling and control optimization. Its chapters explore the recent financial crisis and the increase in uncertainty and volatility, and propose an alternative approach to dealing with these issues. The second section covers financial volatility and covariance modelling and explores proposals for dealing with recent developments in financial econometrics. This book will be useful to students and researchers in applied econometrics, and to academics and students seeking convenient access to an unfamiliar area. It will also be of great interest to established researchers seeking a single repository on the current state of knowledge, current debates and relevant literature.


On the Inference about the Spectral Distribution of High-Dimensional Covariance Matrix Based on High-Frequency Noisy Observations

Author: Ningning Xia

Published: 2017

In practice, observations are often contaminated by noise, making the resulting sample covariance matrix a signal-plus-noise sample covariance matrix. Aiming to make inferences about the spectral distribution of the population covariance matrix in such a situation, we establish an asymptotic relationship that describes how the limiting spectral distribution of (signal) sample covariance matrices depends on that of signal-plus-noise-type sample covariance matrices. As an application, we consider inferences about the spectral distribution of integrated covolatility (ICV) matrices of high-dimensional diffusion processes based on high-frequency data with microstructure noise. The (slightly modified) pre-averaging estimator is a signal-plus-noise sample covariance matrix, and the aforementioned result, together with a (generalized) connection between the spectral distribution of signal sample covariance matrices and that of the population covariance matrix, enables us to propose a two-step procedure to consistently estimate the spectral distribution of ICV for a class of diffusion processes. An alternative approach is further proposed, which possesses several desirable properties: it is more robust, it eliminates the effects of microstructure noise, and the asymptotic relationship that enables consistent estimation of the spectral distribution of ICV is the standard Marčenko-Pastur equation. The performance of the two approaches is examined via simulation studies under both synchronous and asynchronous observation settings.
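
Why such asymptotic relationships are needed can be seen from the classical Marčenko-Pastur law: even when the population covariance is the identity, the eigenvalues of a high-dimensional sample covariance matrix spread over an interval instead of concentrating at 1. The sketch below assumes Gaussian, noise-free, unit-variance data with aspect ratio c = p/n <= 1; it shows only the classical law, not the signal-plus-noise relationship or the ICV estimators of the thesis, and the helper names are hypothetical.

```python
import numpy as np

def mp_edges(c):
    """Support [a, b] of the Marčenko-Pastur law for aspect ratio c = p / n."""
    return (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2

def mp_density(x, c):
    """Marčenko-Pastur density for c <= 1 and unit population variance."""
    a, b = mp_edges(c)
    return np.sqrt(np.maximum((b - x) * (x - a), 0.0)) / (2 * np.pi * c * x)

def mp_cdf(t, c, grid=200_000):
    """P(eigenvalue <= t) under the MP law, via the midpoint rule on [a, min(t, b)]."""
    a, b = mp_edges(c)
    hi = min(t, b)
    if hi <= a:
        return 0.0
    h = (hi - a) / grid
    x = a + h * (np.arange(grid) + 0.5)
    return float(np.sum(mp_density(x, c)) * h)

rng = np.random.default_rng(0)
p, n = 200, 800
Z = rng.standard_normal((p, n))     # population covariance = identity (assumption)
S = Z @ Z.T / n                     # sample covariance matrix
eigs = np.linalg.eigvalsh(S)
a, b = mp_edges(p / n)              # a = 0.25, b = 2.25 for c = 0.25

print(eigs.min(), eigs.max())                     # should lie near a and b
print(np.mean(eigs <= 1.0), mp_cdf(1.0, p / n))   # empirical vs. MP fraction below 1
```

Inverting this kind of map, from an observed sample spectrum back to the population spectrum, is the step that the thesis generalizes to signal-plus-noise matrices.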


Scaling Multidimensional Inference for Big Structured Data

Author: Elad Gilboa

Published: 2014

Total Pages: 139

"In information technology, big data is a collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications" [151]. In a world of increasing sensor modalities, cheaper storage, and more data oriented questions, we are quickly passing the limits of tractable computations using traditional statistical analysis methods. Methods which often show great results on simple data have difficulties processing complicated multidimensional data. Accuracy alone can no longer justify unwarranted memory use and computational complexity. Improving the scaling properties of these methods for multidimensional data is the only way to make these methods relevant. In this work we explore methods for improving the scaling properties of parametric and nonparametric models. Namely, we focus on the structure of the data to lower the complexity of a specific family of problems. The two types of structures considered in this work are distributive optimization with separable constraints (Chapters 2-3), and scaling Gaussian processes for multidimensional lattice input (Chapters 4-5). By improving the scaling of these methods, we can expand their use to a wide range of applications which were previously intractable open the door to new research questions.


Statistical Inference for High Dimensional Models

Author: Shijie Cui

Published: 2022

Statistical inference in high-dimensional models has attracted much attention due to its wide applications in many fields. In this dissertation, I propose new methods for statistical inference in high-dimensional models from three aspects: inference in high-dimensional semiparametric models, inference for high-dimensional matrix-valued data, and inference in high-dimensional misspecified measurement error models. The first project studies statistical inference in high-dimensional partially linear single-index models. First, a profile partial penalized least squares estimator of the model parameters is proposed, and its asymptotic properties are given. Then an F-type test statistic for testing the parametric components is proposed, and its theoretical properties are established. I then propose a new test for the specification testing problem of the nonparametric components. Finally, simulation studies and an empirical analysis of a real-world data set are conducted to illustrate the performance of the proposed testing procedure. The second project proposes new testing procedures for high-dimensional matrix-valued data. Rank is an essential attribute of a matrix. A new type of statistic is proposed that can make inferences on the rank of matrix-valued data. I first give the theoretical properties of its oracle version. To overcome the problem of empirical error accumulation, a new type of sparse SVD method is proposed, and its theoretical properties are given. Based on the newly proposed sparse SVD method, I provide a sample version of the statistic and establish its theoretical properties. Simulation studies and two applications to surveillance video data illustrate the performance of the newly proposed method. The third project proposes a new testing method for misspecified measurement error models. The testing method works when there is potential model misspecification as well as measurement error in the model.
Its properties are first studied in the low-dimensional setting, and I then extend it to the high-dimensional setting. Further, I propose a method that adapts to the sparsity level of the true parameters in the high-dimensional setting. Simulation studies and an application to a clinical trial data set are given.
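
The abstract does not specify its sparse SVD method. As a generic stand-in, the sketch below computes a sparse rank-one SVD by alternating soft-thresholded power iterations, a common way to obtain sparse singular vectors; it is not the dissertation's method, and the helper names and thresholds `lam_u`, `lam_v` are hypothetical tuning choices. On a synthetic matrix with sparse singular vectors plus small noise, the thresholding should recover the true supports.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_rank_one_svd(M, lam_u, lam_v, iters=50):
    """Hypothetical helper: sparse rank-one fit d * u v^T via alternating
    soft-thresholded power iterations."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    u, v = U[:, 0], Vt[0]              # warm start from the ordinary leading pair
    for _ in range(iters):
        u = soft_threshold(M @ v, lam_u)
        if not u.any():
            return 0.0, u, np.zeros_like(v)
        u /= np.linalg.norm(u)
        v = soft_threshold(M.T @ u, lam_v)
        if not v.any():
            return 0.0, u, v
        v /= np.linalg.norm(v)
    return float(u @ M @ v), u, v

rng = np.random.default_rng(2)
u_true = np.zeros(50)
u_true[:5] = np.array([1, -1, 1, -1, 1]) / np.sqrt(5)
v_true = np.zeros(40)
v_true[10:15] = 1 / np.sqrt(5)
M = 10 * np.outer(u_true, v_true) + 0.1 * rng.standard_normal((50, 40))

d, u, v = sparse_rank_one_svd(M, lam_u=0.5, lam_v=0.5)
```

Unlike the ordinary SVD, whose singular vectors pick up small nonzero entries from every noisy coordinate, the thresholded vectors are exactly zero off the support, which limits the accumulation of estimation error across many coordinates.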


High-Frequency Financial Econometrics

Author: Yacine Aït-Sahalia

Publisher: Princeton University Press

Published: 2014-07-21

Total Pages: 683

ISBN-13: 0691161437

A comprehensive introduction to the statistical and econometric methods for analyzing high-frequency financial data. High-frequency trading is an algorithm-based computerized trading practice that allows firms to trade stocks in milliseconds. Over the last fifteen years, the use of statistical and econometric methods for analyzing high-frequency financial data has grown exponentially. This growth has been driven by the increasing availability of such data, the technological advancements that make high-frequency trading strategies possible, and the need for practitioners to analyze these data. This comprehensive book introduces readers to these emerging methods and tools of analysis. Yacine Aït-Sahalia and Jean Jacod cover the mathematical foundations of stochastic processes, describe the primary characteristics of high-frequency financial data, and present the asymptotic concepts that their analysis relies on. Aït-Sahalia and Jacod also deal with estimation of the volatility portion of the model, including methods that are robust to market microstructure noise, and address estimation and testing questions involving the jump part of the model. As they demonstrate, the practical importance and relevance of jumps in financial data are universally recognized, but only recently have econometric methods become available to rigorously analyze jump processes. Aït-Sahalia and Jacod approach high-frequency econometrics with a distinct focus on the financial side of matters while maintaining technical rigor, which makes this book invaluable to researchers and practitioners alike.
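
To make market microstructure noise concrete: if i.i.d. noise with variance omega^2 is added to observed log-prices, realized variance over n returns is biased upward by roughly 2 n omega^2, so sampling more often makes it worse. The sketch below compares it with the two-scales realized variance (TSRV) of Zhang, Mykland and Aït-Sahalia, one well-known noise-robust estimator; the book covers a broader toolkit, and the simulation settings (constant volatility, Gaussian noise, K = 300 subgrids) are assumptions chosen for illustration.

```python
import numpy as np

def realized_variance(p):
    """Sum of squared increments of a log-price path."""
    return float(np.sum(np.diff(p) ** 2))

def tsrv(p, K):
    """Two-scales realized variance: average RV over K offset subgrids,
    minus a bias correction based on the all-data RV."""
    n = len(p) - 1                                   # number of returns
    rv_all = realized_variance(p)
    rv_avg = np.mean([realized_variance(p[k::K]) for k in range(K)])
    n_bar = (n - K + 1) / K                          # average returns per subgrid
    return float(rv_avg - (n_bar / n) * rv_all)

rng = np.random.default_rng(3)
n, sigma, omega = 23_400, 0.2, 0.001    # one "day" of 1-second returns (assumption)
dt = 1.0 / n
x = np.concatenate([[0.0], np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n))])
y = x + omega * rng.standard_normal(n + 1)   # observed = efficient price + noise
iv = sigma ** 2                              # integrated variance over [0, 1]

rv_clean = realized_variance(x)   # close to iv
rv_noisy = realized_variance(y)   # close to iv + 2 * n * omega**2
tsrv_est = tsrv(y, K=300)         # close to iv despite the noise
```

The number of subgrids K trades the residual noise bias against discretization variance; the TSRV literature takes K growing like n^(2/3).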


Simultaneous Inference for High Dimensional and Correlated Data

Author: Afroza Polin

Published: 2019

Total Pages: 100

In high-dimensional data, the number of covariates is larger than the sample size, which makes estimation challenging. We consider high-dimensional longitudinal data where, at each time point, the number of covariates is much larger than the number of subjects. We consider two different settings for the longitudinal data. In the first, the samples at different time points are generated from different populations. In the second, the samples at different time points are generated from a multivariate distribution. In both cases, the number of covariates is much larger than the sample size, and standard least squares methods are not applicable. In a longitudinal study, our main focus is on the changes in the mean responses over time and how these changes are related to the explanatory variables. Thus we are interested in testing the effects of the covariates across the time points simultaneously. In the first scenario, we use the lasso at each time point to regress the response on the explanatory variables. In addition to estimating the regression coefficients, the lasso also performs dimension reduction. We use the de-biased lasso for inference. To adjust for the multiplicity effect in simultaneous testing, we apply the Bonferroni, Holm, Hochberg and coherent stepwise procedures. In the second scenario, the samples at different time points are generated from a multivariate distribution whose dimension equals the number of time points. We use the lasso and de-biased lasso for inference. To adjust for the multiplicity effect in simultaneous testing, we use the Bonferroni, Holm, Hochberg and stepwise procedures. We provide theoretical details showing that the Bonferroni, Holm step-down and coherent stepwise procedures control the family-wise error rate in the strong sense for de-biased lasso estimators.
Hochberg's procedure, in contrast, provides strong control of the family-wise error rate only for independent or positively correlated test statistics.
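
The three classical multiplicity adjustments named above differ only in how they compare sorted p-values with thresholds. The sketch below implements Bonferroni, Holm's step-down and Hochberg's step-up procedures on a list of p-values; in the dissertation the p-values come from de-biased lasso estimators, whereas here they are made-up numbers for illustration, and the coherent stepwise procedure is omitted.

```python
def bonferroni(p, alpha=0.05):
    """Reject H_i when p_i <= alpha / m."""
    m = len(p)
    return [i for i, pi in enumerate(p) if pi <= alpha / m]

def holm(p, alpha=0.05):
    """Step-down: walk up the sorted p-values, comparing the (r+1)-th smallest
    with alpha / (m - r); stop at the first failure."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    rejected = []
    for r, i in enumerate(order):
        if p[i] > alpha / (m - r):
            break
        rejected.append(i)
    return sorted(rejected)

def hochberg(p, alpha=0.05):
    """Step-up: find the largest r with p_(r+1) <= alpha / (m - r)
    and reject every hypothesis up to that rank."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    for r in range(m - 1, -1, -1):
        if p[order[r]] <= alpha / (m - r):
            return sorted(order[: r + 1])
    return []

pvals = [0.01, 0.04, 0.03, 0.005]   # illustrative p-values for m = 4 hypotheses
print(bonferroni(pvals))  # [0, 3]
print(holm(pvals))        # [0, 3]
print(hochberg(pvals))    # [0, 1, 2, 3]
```

In this example Hochberg's step-up procedure rejects all four hypotheses because the largest p-value already falls below alpha, which shows why it is more powerful than Holm's procedure but needs the extra dependence assumption for strong family-wise error control.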


Inference Methods for High-Dimensional Data

Author: Zhe Zhang

Published: 2023

This dissertation aims to develop new statistical inference procedures for high-dimensional regression models and focuses on three fundamental problems: (a) individual hypothesis testing without specification of the high-dimensional regression model, (b) high-dimensional linear hypothesis testing in linear regression models, and (c) individual hypothesis testing in partial linear models. In Chapter 3, we propose an effective model-free inference procedure for high-dimensional regression models. We first reformulate the hypothesis testing problem within the sufficient dimension reduction framework. With the aid of this reformulation, we propose a new test statistic and show that its asymptotic distribution is a $\chi^2$ distribution whose degrees of freedom do not depend on the unknown population distribution. We further conduct a power analysis under local alternative hypotheses. In addition, we study how to control the false discovery rate of the proposed chi-squared tests, which are correlated, to identify important predictors under a model-free framework. To this end, we propose a multiple testing procedure and establish its theoretical guarantees. Monte Carlo simulation studies are conducted to assess the performance of the proposed tests, and an empirical analysis of a real-world data set illustrates the proposed methodology. In Chapter 4, we present a novel transformation-based inference method for conducting linear hypothesis tests in high-dimensional linear regression models. Our method uses score functions to construct a new random vector and links high-dimensional coefficient tests to high-dimensional one-sample mean tests. We formulate a U-statistic with a kernel of order two and establish its asymptotic normality. The presence of high-dimensional nuisance parameters presents a significant challenge in our model setting; however, we show that their impact can be disregarded asymptotically under mild conditions.
Additionally, we study the influence of the power enhancement term on power performance through both theoretical analysis and simulations. The results indicate that the enhancement term does not affect the type-I error rate and can improve power in scenarios where the U-statistic may not perform well. In Chapter 5, we consider testing the treatment effect in high-dimensional partial linear models. Because nuisance function estimators from some machine learning algorithms converge slowly, we cannot directly estimate and plug in the nuisance function on the same data. To overcome this limitation, we update the estimate of the nuisance function recursively. This leads to an explicit expression for the estimators of the parameters of interest. The resulting estimator is shown to be asymptotically normal, and we assess its finite-sample performance through simulations. The results indicate that our statistic offers higher power than in cases of model misspecification.
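
The Chapter 4 idea of reducing coefficient tests to a high-dimensional one-sample mean test rests on an order-two U-statistic. A standard statistic of this type estimates ||mu||^2 without bias by averaging X_i^T X_j over pairs i != j, which avoids the diagonal terms whose expectation grows with the dimension. The sketch below applies it to raw observations; the dissertation's score-based transformation, variance estimate and power enhancement term are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def u_stat_mean(X):
    """Order-two U-statistic for H0: E[X] = 0, with X of shape (n, p):
    U = sum_{i != j} X_i^T X_j / (n (n - 1)), unbiased for ||mu||^2."""
    n = X.shape[0]
    s = X.sum(axis=0)
    off_diag = s @ s - np.sum(X * X)   # all pairs minus the i == j terms
    return float(off_diag / (n * (n - 1)))

def u_stat_bruteforce(X):
    """Direct double sum over i != j, for checking the O(n p) formula."""
    n = X.shape[0]
    total = sum(X[i] @ X[j] for i in range(n) for j in range(n) if i != j)
    return float(total / (n * (n - 1)))

rng = np.random.default_rng(4)
X_small = rng.standard_normal((12, 30))

n, p = 200, 500                    # dimension larger than sample size
X_null = rng.standard_normal((n, p))
mu = np.full(p, 0.1)               # alternative with ||mu||^2 = 5
X_alt = mu + rng.standard_normal((n, p))

u_null = u_stat_mean(X_null)       # fluctuates around 0
u_alt = u_stat_mean(X_alt)         # fluctuates around 5
```

Computing the pairwise sum as ||sum_i X_i||^2 minus sum_i ||X_i||^2 reduces the cost from O(n^2 p) to O(n p).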