Efficient Second-order Methods for Machine Learning

Author: Peng Xu

Published: 2018

Due to the large-scale nature of many modern machine learning applications, including but not limited to deep learning, much recent effort has gone into developing efficient optimization algorithms. Most of these are first-order methods, which use only gradient information. The conventional wisdom in the machine learning community holds that second-order methods, which use Hessian information, are inappropriate because they cannot be made efficient. In this thesis, we consider second-order optimization methods: we develop new sub-sampled Newton-type algorithms for both convex and non-convex optimization problems; we prove that they are efficient and scalable; and we provide a detailed empirical evaluation of their scalability and usefulness.

In the convex setting, we present a sub-sampled Newton-type algorithm (SSN) that exploits non-uniformly sub-sampled Hessians as well as inexact updates to reduce computational complexity. Theoretically, we show that our algorithms achieve a linear-quadratic convergence rate, and empirically we demonstrate their efficiency on several real datasets. In addition, we extend our methods to a distributed setting and propose a distributed Newton-type method, the Globally Improved Approximate NewTon method (GIANT). Theoretically, we show that GIANT is highly communication-efficient compared with existing distributed optimization algorithms; empirically, we demonstrate its scalability and efficiency in Spark.

In the non-convex setting, we consider two classic non-convex Newton-type methods: the Trust Region method (TR) and the Cubic Regularization method (CR). We relax the Hessian approximation condition assumed in existing work on using inexact Hessians in these algorithms, and we show that under the relaxed condition the worst-case iteration complexities for converging to an approximate second-order stationary point are retained for both methods. Using an idea similar to SSN, we present sub-sampled TR and CR methods along with the sampling complexities needed to achieve the Hessian approximation condition. To understand the empirical performance of these methods, we conduct an extensive empirical study on non-convex machine learning problems and showcase the efficiency and robustness of these Newton-type methods under various settings.
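
The sub-sampled Newton idea the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the thesis's algorithm: it uses uniform rather than non-uniform Hessian sampling, a ridge-regularized logistic loss as a stand-in objective, and a fixed number of conjugate-gradient iterations as the "inexact update"; all names and parameters are illustrative.

```python
import numpy as np


def conjugate_gradient(A, b, n_steps=20, tol=1e-8):
    """Solve A x = b approximately with at most n_steps CG iterations."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(n_steps):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def subsampled_newton(X, y, lam=0.1, n_iters=10, sample_frac=0.3, seed=0):
    """Sub-sampled Newton for ridge-regularized logistic regression:
    exact full gradient, Hessian estimated on a random subset of rows,
    and an inexact Newton step computed with a few CG iterations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) / n + lam * w
        # Hessian on a uniform subsample (the thesis uses non-uniform
        # sampling; uniform keeps this sketch short).
        idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=False)
        S = X[idx]
        ps = 1.0 / (1.0 + np.exp(-S @ w))
        H = S.T @ (S * (ps * (1 - ps))[:, None]) / len(idx) + lam * np.eye(d)
        w -= conjugate_gradient(H, grad)  # inexact Newton update
    return w
```

Because the gradient stays exact, the sub-sampled Hessian only affects the convergence rate, not the limit point, which is what makes the approach robust in practice.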


First-order and Stochastic Optimization Methods for Machine Learning

Author: Guanghui Lan

Publisher: Springer Nature

Published: 2020-05-15

Total Pages: 591

ISBN-13: 3030395685

This book covers not only foundational material but also the most recent progress made during the past few years in machine learning algorithms. Despite intensive research and development in this area, there has been no systematic treatment introducing the fundamental concepts and recent progress in machine learning algorithms, especially those based on stochastic optimization methods, randomized algorithms, nonconvex optimization, distributed and online learning, and projection-free methods. This book will benefit a broad audience across the machine learning, artificial intelligence, and mathematical programming communities by presenting these recent developments in a tutorial style, starting from the basic building blocks and moving to the most carefully designed and complicated algorithms for machine learning.


Optimization for Machine Learning

Author: Suvrit Sra

Publisher: MIT Press

Published: 2011-09-30

Total Pages: 509

ISBN-13: 0262297892

An up-to-date account of the interplay between optimization and machine learning, accessible to students and researchers in both communities. The interplay between optimization and machine learning is one of the most important developments in modern computational science. Optimization formulations and methods are proving to be vital in designing algorithms to extract essential knowledge from huge volumes of data. Machine learning, however, is not simply a consumer of optimization technology but a rapidly evolving field that is itself generating new optimization ideas. This book captures the state of the art of the interaction between optimization and machine learning in a way that is accessible to researchers in both fields. Optimization approaches have enjoyed prominence in machine learning because of their wide applicability and attractive theoretical properties. The increasing complexity, size, and variety of today's machine learning models call for the reassessment of existing assumptions. This book starts the process of reassessment. It describes the resurgence in novel contexts of established frameworks such as first-order methods, stochastic approximations, convex relaxations, interior-point methods, and proximal methods. It also devotes attention to newer themes such as regularized optimization, robust optimization, gradient and subgradient methods, splitting techniques, and second-order methods. Many of these techniques draw inspiration from other fields, including operations research, theoretical computer science, and subfields of optimization. The book will enrich the ongoing cross-fertilization between the machine learning community and these other fields, and within the broader optimization community.


Accelerated Optimization for Machine Learning

Author: Zhouchen Lin

Publisher: Springer Nature

Published: 2020-05-29

Total Pages: 286

ISBN-13: 9811529108

This book on optimization includes forewords by Michael I. Jordan, Zongben Xu and Zhi-Quan Luo. Machine learning relies heavily on optimization to fit its learning models, and first-order optimization algorithms are the mainstream approach; accelerating them is crucial for the efficiency of machine learning. Written by leading experts in the field, this book provides a comprehensive introduction to, and state-of-the-art review of, accelerated first-order optimization algorithms for machine learning. It discusses a variety of methods, deterministic and stochastic, synchronous and asynchronous, for unconstrained and constrained problems that may be convex or non-convex. Offering a rich blend of ideas, theories and proofs, the book is up to date and self-contained. It is an excellent reference for users seeking faster optimization algorithms, as well as for graduate students and researchers who want to grasp the frontiers of optimization in machine learning in a short time.
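
As a taste of the acceleration idea the book covers, here is a minimal sketch of Nesterov's accelerated gradient method for an L-smooth convex function, in the familiar FISTA-style momentum form; the function name and step schedule below are one common textbook variant, not the book's own notation.

```python
import numpy as np


def nesterov_agd(grad, x0, L, n_iters=200):
    """Nesterov's accelerated gradient descent for an L-smooth convex
    function: the gradient step is taken at an extrapolated point y,
    and the extrapolation weight follows the classic t-sequence."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(n_iters):
        x_next = y - grad(y) / L                        # gradient step at y
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + ((t - 1) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x
```

On a quadratic f(x) = x^T A x / 2 - b^T x, with grad(x) = A x - b and L the largest eigenvalue of A, this attains the accelerated O(1/k^2) objective rate rather than plain gradient descent's O(1/k).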


Stochastic Optimization for Large-Scale Machine Learning

Author: Vinod Kumar Chauhan

Published: 2021-11

ISBN-13: 9781032146140

Stochastic Optimization for Large-scale Machine Learning identifies different areas of improvement and recent research directions for tackling the challenges of large-scale learning. It explores optimisation techniques developed to improve machine learning algorithms, based both on data access and on first- and second-order optimisation methods. The book will be a valuable reference for practitioners, researchers, and students in the field of machine learning.


Optimization Methods for Structured Machine Learning Problems

Author: Nikolaos Tsipinakis

Published: 2019

Solving large-scale optimization problems lies at the core of modern machine learning applications. Unfortunately, obtaining a sufficiently accurate solution quickly is a difficult task. However, the problems we consider in many machine learning applications exhibit particular structure. In this thesis we study optimization methods and improve their convergence behavior by taking advantage of such structure. The thesis consists of two parts.

In the first part, we consider the Temporal Difference learning (TD) problem in off-line Reinforcement Learning (RL). In off-line RL, the number of samples is typically small compared to the number of features, so recent advances have focused on efficient algorithms that incorporate feature selection via ℓ1-regularization, which effectively avoids over-fitting. Unfortunately, the TD optimization problem reduces to a fixed-point problem in which convexity of the objective function cannot be assumed. Further, it remains unclear whether existing algorithms offer good approximations for the tasks of policy evaluation and improvement: they are either non-convergent or do not solve the fixed-point problem. In this part of the thesis, we attempt to solve the ℓ1-regularized fixed-point problem with the Alternating Direction Method of Multipliers (ADMM), and we argue that the proposed method is well suited to the structure of this fixed-point problem.

In the second part, we study multilevel methods for large-scale optimization and extend their theoretical analysis to self-concordant functions. In particular, we address the following issues that arise in the analysis of second-order optimization methods based on sampling, randomization, or sketching: (a) the analysis of the iterates is not scale-invariant, and (b) global fast convergence rates are lacking without restrictive assumptions. We argue that, with the analysis undertaken in this part of the thesis, the analysis of randomized second-order methods can be considered on par with that of the classical Newton method. Further, we demonstrate how our proposed method can exploit typical spectral structures of the Hessian that arise in machine learning applications to further improve convergence rates.
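
The ADMM splitting pattern the abstract leans on can be illustrated on the simpler, standard ℓ1-regularized least-squares (lasso) problem rather than the thesis's fixed-point TD objective; the names and parameters below are illustrative assumptions, not the thesis's method.

```python
import numpy as np


def soft_threshold(v, k):
    """Proximal operator of k * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)


def admm_lasso(X, y, lam, rho=1.0, n_iters=300):
    """ADMM for  min_w 0.5 * ||X w - y||^2 + lam * ||z||_1  s.t. w = z.
    Each iteration alternates a ridge-like w-update, an elementwise
    soft-thresholding z-update, and a dual ascent step on u."""
    n, d = X.shape
    z = np.zeros(d)
    u = np.zeros(d)
    M = X.T @ X + rho * np.eye(d)  # factored system reused every iteration
    Xty = X.T @ y
    for _ in range(n_iters):
        w = np.linalg.solve(M, Xty + rho * (z - u))
        z = soft_threshold(w + u, lam / rho)
        u = u + w - z
    return z
```

The appeal of the splitting is that each sub-step is cheap and closed-form: the smooth term gets a linear solve, the ℓ1 term gets shrinkage, and the constraint w = z is enforced through the dual variable u.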


Optimization Methods in Machine Learning: Theory and Applications

Author: Ankan Saha

Publisher:

Published: 2013

Total Pages: 220

ISBN-13: 9781303423444

We look at the integral role played by convex optimization in various machine learning problems. Over the last few years, many machine learning problems have emerged that have a (non)smooth convex optimization problem at their core. These problems generally call for fast first-order iterative methods, since obtaining the exact minimum is often impossible and second-order or higher methods become prohibitively expensive even on moderately sized datasets. We look at a few such optimization problems that arise in different contexts and show that a class of smoothing strategies due to Nesterov can be applied to these seemingly very different problems to obtain theoretically faster rates of convergence than existing methods. Our experimental results validate the speed and efficacy of our methods, which scale well over a broad range of datasets. This thesis also explores an often-used but understudied optimization algorithm, the cyclic coordinate descent method, and provides a novel theoretical analysis establishing the first non-asymptotic convergence rates of cyclic coordinate descent under certain assumptions. This work also sheds light on some of the recent advances in online convex optimization for minimizing regret in the presence of smooth unknown functions. We also look at online learning from the point of view of stability and provide a new unified framework that encompasses the regret analysis of all existing algorithms as special cases. We investigate related methods of analysis and the central role played by optimization in all of these seemingly different but connected domains of machine learning research.
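
The cyclic coordinate descent method the thesis analyzes can be sketched, for the plain least-squares case, as follows; this is a generic textbook form under that simplifying assumption, not the thesis's exact setting.

```python
import numpy as np


def cyclic_cd(X, y, n_sweeps=200):
    """Cyclic coordinate descent for least squares: repeatedly sweep
    through the coordinates in a fixed order, exactly minimizing the
    objective over one coordinate while holding the others fixed."""
    n, d = X.shape
    w = np.zeros(d)
    r = y - X @ w                  # running residual y - X w
    col_sq = (X ** 2).sum(axis=0)  # per-column squared norms
    for _ in range(n_sweeps):
        for j in range(d):
            delta = X[:, j] @ r / col_sq[j]  # exact 1-D minimizer shift
            w[j] += delta
            r -= delta * X[:, j]
    return w
```

Maintaining the residual r makes each coordinate update O(n) rather than O(nd), which is the main reason coordinate methods scale to wide problems.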


Multistrategy Learning

Author: Ryszard S. Michalski

Publisher: Springer Science & Business Media

Published: 1993-06-30

Total Pages: 174

ISBN-13: 9780792393740

Most machine learning research has been concerned with the development of systems that implement one type of inference within a single representational paradigm. Such systems, which can be called monostrategy learning systems, include those for empirical induction of decision trees or rules, explanation-based generalization, neural net learning from examples, genetic algorithm-based learning, and others. Monostrategy learning systems can be very effective and useful if the learning problems to which they are applied are sufficiently narrowly defined. Many real-world applications, however, pose learning problems that go beyond the capability of monostrategy learning methods. In view of this, recent years have witnessed a growing interest in developing multistrategy systems, which integrate two or more inference types and/or paradigms within one learning system. Such multistrategy systems take advantage of the complementarity of different inference types or representational mechanisms. Therefore, they have the potential to be more versatile and more powerful than monostrategy systems. On the other hand, due to their greater complexity, their development is significantly more difficult and represents a great new challenge to the machine learning community. Multistrategy Learning contains contributions characteristic of current research in this area.


Sample-Efficient Nonconvex Optimization Algorithms in Machine Learning and Reinforcement Learning

Author: Pan Xu

Published: 2021

Total Pages: 246

Machine learning and reinforcement learning have achieved tremendous success in solving problems in various real-world applications. Many modern learning problems boil down to a nonconvex optimization problem, where the objective function is the average or the expectation of some loss function over a finite or infinite dataset. Solving such nonconvex optimization problems is, in general, NP-hard, so one often tackles them through incremental goals: finding a first-order stationary point, finding a second-order stationary point (a local optimum), and finding a global optimum. With the size and complexity of machine learning datasets rapidly increasing, it has become a fundamental challenge to design machine learning algorithms that improve accuracy and save computational cost in terms of sample efficiency at the same time. Although many algorithms based on stochastic gradient descent have been developed and widely studied, both theoretically and empirically, for nonconvex optimization, it has remained an open problem whether we can achieve the optimal sample complexity for finding a first-order stationary point and for finding local optima. In this thesis, we start with the stochastic nested variance reduced gradient (SNVRG) algorithm, which builds on stochastic gradient descent and variance reduction techniques. We prove that SNVRG achieves the near-optimal convergence rate among its type for finding a first-order stationary point of a nonconvex function. We further build algorithms that efficiently find the local optimum of a nonconvex objective function by examining the curvature information at the stationary point found by SNVRG.

With the ultimate goal of finding the global optimum in nonconvex optimization, we then provide a unified framework to analyze the global convergence of stochastic gradient Langevin dynamics-based algorithms for a nonconvex objective function. In the second part of this thesis, we generalize the aforementioned sample-efficient stochastic nonconvex optimization methods to reinforcement learning problems, including policy gradient, actor-critic, and Q-learning. For these problems, we propose novel algorithms and prove that they enjoy state-of-the-art theoretical guarantees on sample complexity. The works presented in this thesis form part of the recent advances and developments in sample-efficient nonconvex optimization algorithms for both machine learning and reinforcement learning.
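
SNVRG itself uses nested reference points and is more involved; as a hedged illustration of the variance-reduction idea it refines, here is a sketch of the simpler SVRG scheme it builds on. The objective, learning rate, and epoch length below are illustrative assumptions, not the thesis's algorithm.

```python
import numpy as np


def svrg(grads, full_grad, x0, lr=0.05, n_epochs=30, m=60, seed=0):
    """Sketch of stochastic variance-reduced gradient (SVRG): each
    epoch anchors a snapshot x_snap and its full gradient mu, then
    takes m corrected stochastic steps
        x <- x - lr * (g_i(x) - g_i(x_snap) + mu),
    whose variance vanishes as x and x_snap approach the optimum."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_epochs):
        x_snap = x.copy()
        mu = full_grad(x_snap)       # one full-gradient pass per epoch
        for _ in range(m):
            i = rng.integers(n)      # sample one component gradient
            x = x - lr * (grads[i](x) - grads[i](x_snap) + mu)
    return x
```

The correction term g_i(x_snap) - mu keeps each step unbiased while shrinking its variance near the snapshot, which is what lets variance-reduced methods use a constant step size where plain SGD needs a decaying one.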