Distributed and Error Resilient Convex Optimization Formulations in Machine Learning


Author: Burak Bartan


Published: 2022



Neural networks have been very successful across many domains in machine learning. Training a neural network typically requires minimizing a high-dimensional non-convex function, and stochastic gradient descent and its variants are often used in practice. In this thesis, we describe convex optimization formulations for optimally training neural networks with polynomial activation functions. More specifically, we present a semidefinite programming formulation for training neural networks with second-degree polynomial activations and show that its solution provides a globally optimal solution to the original non-convex training problem. We then extend this strategy to train quantized neural networks with integer weights, showing that the training loss can be globally optimized over integer weights in polynomial time via semidefinite relaxations and randomized rounding. In the second part of the thesis, we describe a distributed computing and optimization framework for training models, including our convex neural networks. The second-order optimization methods proposed in this part rely on approximating the Hessian matrix via random projections. In particular, we describe how randomized sketches can reduce the problem dimensions while preserving privacy and improving straggler resilience in asynchronous distributed systems. We present novel approximation guarantees as well as closed-form expressions for debiasing the update directions of the optimization algorithm. Finally, we establish a novel connection between randomized sketching and coded computation; the proposed approach builds on polar codes for straggler-resilient distributed computing.
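The integer-weight result above relies on randomized rounding of a relaxed solution. As a generic illustration of that step (a toy sketch, not the thesis's exact scheme), an unbiased rounding rule maps each relaxed weight to an adjacent integer with probability given by its fractional part:

```python
import math
import random

def randomized_round(weights, rng=None):
    """Unbiased randomized rounding of relaxed weights to integers.

    Each weight w is rounded up to ceil(w) with probability equal to
    its fractional part, and down to floor(w) otherwise, so that
    E[rounded] = w in every coordinate.
    """
    rng = rng or random.Random(0)
    rounded = []
    for w in weights:
        lo = math.floor(w)
        frac = w - lo
        rounded.append(lo + (1 if rng.random() < frac else 0))
    return rounded
```

Unbiasedness is what lets the rounded integer solution inherit, in expectation, the objective value of the relaxation.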


Introduction to Online Convex Optimization, second edition


Author: Elad Hazan

Publisher: MIT Press

Published: 2022-09-06

Total Pages: 249

ISBN-13: 0262046989


New edition of a graduate-level textbook that focuses on online convex optimization, a machine learning framework that views optimization as a process. In many practical applications, the environment is so complex that it is not feasible to lay out a comprehensive theoretical model and use classical algorithmic theory and/or mathematical optimization. Introduction to Online Convex Optimization presents a robust machine learning approach that contains elements of mathematical optimization, game theory, and learning theory: an optimization method that learns from experience as more aspects of the problem are observed. This view of optimization as a process has led to some spectacular successes in modeling and systems that have become part of our daily lives. Based on the “Theoretical Machine Learning” course taught by the author at Princeton University, the second edition of this widely used graduate-level text features:
- Thoroughly updated material throughout
- New chapters on boosting, adaptive regret, and approachability, and expanded exposition on optimization
- Examples of applications offered throughout, including prediction from expert advice, portfolio selection, matrix completion and recommendation systems, and SVM training
- Exercises that guide students in completing parts of proofs
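The framework's core algorithm, online gradient descent, fits in a few lines. The following sketch (loss sequence and step sizes are my own illustrative choices, not the book's) plays a point in an interval, observes a gradient, steps with a decaying step size, and projects back:

```python
def online_gradient_descent(gradients, eta=0.5, x0=0.0, lo=-1.0, hi=1.0):
    """Online gradient descent over the interval [lo, hi].

    At round t the learner plays x_t, observes the gradient of that
    round's loss at x_t, takes a step of size eta / sqrt(t), and
    projects back onto the feasible interval.
    """
    x = x0
    for t, grad in enumerate(gradients, start=1):
        x -= (eta / t ** 0.5) * grad(x)
        x = max(lo, min(hi, x))   # projection onto [lo, hi]
    return x

# losses f_t(x) = (x - z_t)^2 with targets alternating 0 and 1;
# the best fixed decision in hindsight is their mean, 0.5
targets = [t % 2 for t in range(1000)]
grads = [lambda x, z=z: 2 * (x - z) for z in targets]
final = online_gradient_descent(grads)
```

With the 1/sqrt(t) step-size schedule the iterate hovers near the best fixed decision, which is the mechanism behind the sublinear-regret guarantees the book proves.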


Optimization Algorithms for Distributed Machine Learning


Author: Gauri Joshi

Publisher: Springer Nature

Published: 2022-11-25

Total Pages: 137

ISBN-13: 303119067X


This book discusses state-of-the-art stochastic optimization algorithms for distributed machine learning and analyzes their convergence speed. The book first introduces stochastic gradient descent (SGD) and its distributed version, synchronous SGD, in which the task of computing gradients is divided across several worker nodes. The author discusses several algorithms that improve the scalability and communication efficiency of synchronous SGD, such as asynchronous SGD, local-update SGD, quantized and sparsified SGD, and decentralized SGD. For each of these algorithms, the book analyzes the error-versus-iterations convergence and the runtime spent per iteration. The author shows that each of these strategies for reducing communication or synchronization delays encounters a fundamental trade-off between error and runtime.
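The local-update idea is easy to sketch: each worker takes several gradient steps on its own data shard, and models are averaged only at communication rounds. A toy one-dimensional least-squares version (shard contents and hyperparameters are illustrative, not from the book):

```python
def local_sgd(shards, rounds=50, local_steps=5, lr=0.05, w0=0.0):
    """Local-update SGD on a 1-D least-squares model y ≈ w * x.

    Each worker holds one shard of (x, y) pairs and runs
    `local_steps` full-gradient steps locally; the local models are
    averaged once per communication round.
    """
    w = w0
    for _ in range(rounds):
        local_models = []
        for shard in shards:
            w_i = w
            for _ in range(local_steps):
                grad = sum(2 * (w_i * x - y) * x for x, y in shard) / len(shard)
                w_i -= lr * grad
            local_models.append(w_i)
        w = sum(local_models) / len(local_models)   # model averaging
    return w

# two workers whose shards are both consistent with y = 3x
w = local_sgd([[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]])
```

Raising `local_steps` cuts communication rounds at the cost of local models drifting apart between averages, which is exactly the error-versus-runtime trade-off the book analyzes.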


Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers


Author: Stephen Boyd

Publisher: Now Publishers Inc

Published: 2011

Total Pages: 138

ISBN-13: 160198460X


Surveys the theory and history of the alternating direction method of multipliers, and discusses its applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others.
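In one dimension, the ADMM updates the monograph derives for the lasso reduce to three lines: a quadratic solve, a soft-threshold, and a dual ascent step. A scalar sketch (problem data chosen for illustration):

```python
def soft_threshold(v, k):
    """Proximal operator of k * |z|."""
    return max(v - k, 0.0) - max(-v - k, 0.0)

def admm_lasso_1d(a, b, lam, rho=1.0, iters=200):
    """ADMM for: minimize 0.5*(a*x - b)**2 + lam*|z|  s.t.  x = z.

    x-update: minimize the smooth term plus the quadratic penalty.
    z-update: soft-thresholding (the proximal step for the l1 term).
    u-update: scaled dual ascent on the constraint x = z.
    """
    x = z = u = 0.0
    for _ in range(iters):
        x = (a * b + rho * (z - u)) / (a * a + rho)
        z = soft_threshold(x + u, lam / rho)
        u += x - z
    return z

z = admm_lasso_1d(a=2.0, b=4.0, lam=1.0)
```

The same splitting carries over unchanged to the vector case, where the x-update becomes a linear solve and the z-update stays an elementwise soft-threshold.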


Distributed Optimization and Learning


Author: Zhongguo Li

Publisher: Elsevier

Published: 2024-08-06

Total Pages: 288

ISBN-13: 0443216371


Distributed Optimization and Learning: A Control-Theoretic Perspective illustrates the underlying principles of distributed optimization and learning. The book presents a systematic and self-contained description of distributed optimization and learning algorithms from a control-theoretic perspective. It focuses on exploring control-theoretic approaches and how those approaches can be utilized to solve distributed optimization and learning problems over network-connected, multi-agent systems. As there are strong links between optimization and learning, this book provides a unified platform for understanding distributed optimization and learning algorithms for different purposes. The book:
- Provides a series of the latest results, including but not limited to distributed cooperative and competitive optimization, machine learning, and optimal resource allocation
- Presents the most recent advances in theory and applications of distributed optimization and machine learning, including insightful connections to traditional control techniques
- Offers numerical and simulation results in each chapter to reflect engineering practice and demonstrate the main focus of the developed analysis and synthesis approaches
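A minimal sketch of the control-theoretic viewpoint: each agent's state obeys a consensus (averaging) coupling with its neighbours plus a local gradient term. The network, local costs, and step-size schedule below are my own illustrative assumptions:

```python
def decentralized_gradient_descent(c, W, steps=2000, eta=0.1):
    """Consensus-plus-gradient dynamics for n agents minimizing
    the sum of local quadratics f_i(x) = (x - c_i)^2.

    Each agent mixes its state with its neighbours through the
    doubly stochastic matrix W, then takes a local gradient step
    with a diminishing step size eta / sqrt(t).
    """
    n = len(c)
    x = [0.0] * n
    for t in range(1, steps + 1):
        mixed = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        lr = eta / t ** 0.5
        x = [mixed[i] - lr * 2 * (mixed[i] - c[i]) for i in range(n)]
    return x

# three agents on a fully mixing network; the global minimizer of
# sum_i (x - c_i)^2 is the mean of the c_i, here 2.0
W = [[1 / 3] * 3 for _ in range(3)]
x = decentralized_gradient_descent([0.0, 2.0, 4.0], W)
```

Viewed as a dynamical system, the mixing term drives the agents to consensus while the gradient term steers the consensus value toward the global minimizer, which is the stability-style analysis the book develops.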


Communication-efficient and Fault-tolerant Algorithms for Distributed Machine Learning


Author: Farzin Haddadpour


Published: 2021



Distributed computing over multiple nodes has been emerging in practical systems. Compared to classical single-node computation, distributed computing offers higher computing speeds over large data. However, the computation delay of the overall distributed system is governed by its slowest nodes, i.e., straggler nodes. Furthermore, if we want to run iterative algorithms such as gradient-descent-based algorithms, communication cost becomes a bottleneck. It is therefore important to design coded strategies that are resilient to straggler nodes and, at the same time, communication-efficient. Recent work has developed coding-theoretic approaches that add redundancy to distributed matrix-vector multiplication with the goal of speeding up the computation by mitigating the straggler effect in distributed computing. First, we consider the case where the matrix comes from a small (e.g., binary) alphabet, where a variant of a popular method called the "Four-Russians method" is known to have significantly lower computational complexity than the usual matrix-vector multiplication algorithm. We develop novel code constructions that are applicable to binary matrix-vector multiplication via a variant of the Four-Russians method called the Mailman algorithm. Specifically, in our constructions, the encoded matrices have a small alphabet, which ensures lower computational complexity as well as good straggler tolerance. We also present a trade-off between the communication and computation costs of distributed coded matrix-vector multiplication for general, possibly non-binary, matrices. Second, we provide novel coded computation strategies, called MatDot, for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers, at the cost of higher computation cost per worker and higher communication cost from each worker to the fusion node.
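The simplest instance of this coding idea is a (3, 2) scheme for a matrix split into two row blocks: a third worker computes with the sum block, so the product survives any single straggler. This toy sketch illustrates the principle only; the thesis's constructions are considerably more refined:

```python
def matvec(A, x):
    """Plain matrix-vector product over lists of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def encode(A1, A2):
    """(3, 2) coded computation: worker 2 (the third) holds the sum
    block, so A1 x and A2 x are recoverable from any two workers."""
    A_sum = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, A_sum]

def decode(results):
    """`results` maps worker index -> partial product; at most one
    worker may have straggled (be missing from the dict)."""
    if 0 in results and 1 in results:
        return results[0], results[1]
    if 0 in results:           # worker 1 straggled
        return results[0], [s - a for a, s in zip(results[0], results[2])]
    return [s - b for b, s in zip(results[1], results[2])], results[1]

A1, A2, x = [[1.0, 2.0]], [[3.0, 4.0]], [1.0, 1.0]
workers = encode(A1, A2)
# worker 1 straggles; decode from workers 0 and 2
partial = {i: matvec(workers[i], x) for i in (0, 2)}
y1, y2 = decode(partial)
```

MDS-style constructions such as Polynomial codes and MatDot generalize this sum-block trick so that any k of n worker results suffice.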
We also demonstrate a novel coding technique for multiplying n matrices (n ≥ 3) using ideas from MatDot codes. Third, we introduce the idea of cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions. The state-of-the-art approach for these iterative algorithms performs one iteration of the algorithm per round of communication among the nodes. In contrast, our approach performs multiple iterations of the underlying algorithm in a single round of communication by incorporating redundant storage and computation. Our algorithm works in the master-worker setting, with the workers storing carefully constructed linear transformations of the input matrices and using these matrices in an iterative algorithm, while the master node inverts the effect of these linear transformations. In addition to reduced communication costs, a simple generalization of our algorithm also provides resilience to stragglers and failures as well as Byzantine worker nodes. We also show a special case of our algorithm that trades off between communication and computation. The degree of redundancy of our algorithm can be tuned based on the amount of communication and straggler resilience required. Moreover, we describe a variant of our algorithm that can flexibly recover the results based on the degree of straggling in the worker nodes; this variant allows the performance to degrade gracefully as the number of successful (non-straggling) workers decreases. Communication overhead is one of the key challenges that hinders the scalability of distributed optimization algorithms for training large neural networks.
In recent years, there has been a great deal of research on alleviating communication cost by compressing the gradient vector or by using local updates and periodic model averaging. The next direction in this thesis is to advocate the use of redundancy in communication-efficient distributed stochastic algorithms for non-convex optimization. In particular, we show, both theoretically and empirically, that by properly infusing redundancy into the training data, combined with model averaging, it is possible to significantly reduce the number of communication rounds. More precisely, we show that redundancy reduces the residual error in local averaging, thereby reaching the same level of accuracy with fewer rounds of communication than previous algorithms. Empirical studies on the CIFAR10, CIFAR100, and ImageNet datasets in a distributed environment complement our theoretical results; they show that our algorithms have additional beneficial properties, including tolerance to failures and greater gradient diversity. Next, we study local distributed SGD, where data is partitioned among computation nodes, and the nodes perform local updates while periodically exchanging the model among the workers for averaging. While local SGD has been empirically shown to provide promising results, a theoretical understanding of its performance remains open. We strengthen the convergence analysis for local SGD and show that it can be far less expensive and applied far more generally than current theory suggests. Specifically, we show that for loss functions satisfying the Polyak-Łojasiewicz (PL) condition, O((pT)^{1/3}) rounds of communication suffice to achieve a linear speed-up, that is, an error of O(1/(pT)), where T is the total number of model updates at each worker. This is in contrast with previous work, which required a larger number of communication rounds and was limited to strongly convex loss functions for similar asymptotic performance.
We also develop an adaptive synchronization scheme that provides a general condition for linear speed-up, and we validate the theory with experimental results on AWS EC2 clouds and an internal GPU cluster. In the final section, we focus on federated learning, where communication cost is often the critical bottleneck in scaling distributed optimization algorithms that collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends for dealing with the communication overhead of federated algorithms are gradient compression and local computation with periodic communication. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a set of algorithms with periodic compressed (quantized or sparsified) communication and analyzing their convergence properties in both homogeneous and heterogeneous local data distribution settings. For the homogeneous setting, our analysis improves existing bounds by providing tighter convergence rates for both strongly convex and non-convex objective functions. To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and non-convex settings. We complement our theoretical results by demonstrating the effectiveness of the proposed methods on real-world datasets.


Convex Optimization for Machine Learning


Author: Changho Suh


Published: 2022-09-27


ISBN-13: 9781638280521


This book covers an introduction to convex optimization, a powerful class of tractable optimization problems that can be solved efficiently on a computer. The goal of the book is to help develop a sense of what convex optimization is and how it can be used in a widening array of practical contexts, with a particular emphasis on machine learning. The first part of the book covers the core concepts of convex sets, convex functions, and related basic definitions that serve as a foundation for understanding convex optimization and its corresponding models. The second part deals with one very useful theory, called duality, which enables us to: (1) gain algorithmic insights; and (2) obtain approximate solutions to non-convex optimization problems, which are often difficult to solve. The last part focuses on modern applications in machine learning and deep learning. A defining feature of this book is that it succinctly relates the "story" of how convex optimization plays a role, via historical examples and trending machine learning applications. Another key feature is that it includes programming implementations of a variety of machine learning algorithms inspired by optimization fundamentals, together with a brief tutorial of the programming tools used. The implementations are based on Python, CVXPY, and TensorFlow. This book does not follow a traditional textbook-style organization but is streamlined via a series of lecture notes that are intimately related, centered around coherent themes and concepts. It serves as a textbook mainly for a senior-level undergraduate course, yet is also suitable for a first-year graduate course. Readers benefit from a good background in linear algebra, some exposure to probability, and basic familiarity with Python.
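The book's implementations use CVXPY, but the central mechanics of constrained convex minimization can be shown dependency-free. A minimal projected-gradient sketch (the toy problem and step size are my own choices, not the book's):

```python
def projected_gradient(grad, project, x0=0.0, lr=0.1, steps=500):
    """Minimize a convex function over a convex set: take a gradient
    step, then project back onto the feasible set."""
    x = x0
    for _ in range(steps):
        x = project(x - lr * grad(x))
    return x

# minimize (x - 3)^2 over the interval [0, 1]; the unconstrained
# minimizer 3 is infeasible, so the solution sits at the boundary 1
x_star = projected_gradient(
    grad=lambda x: 2 * (x - 3),
    project=lambda x: max(0.0, min(1.0, x)),
)
```

The solver library replaces the hand-written projection with a general treatment of the constraint set, but the descend-then-stay-feasible picture is the same.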


On the Analysis of Data-driven and Distributed Algorithms for Convex Optimization Problems


Author: Hesamoddin Ahmadi


Published: 2016



This dissertation considers the resolution of three optimization problems. The first two problems are closely related and focus on solving optimization problems in which the problem parameters are misspecified but can be estimated through a parallel learning problem. The last problem focuses on the development of a distributed algorithm for the direct-current formulation of the optimal power flow problem arising in power systems operation. Next, we provide a short description of each part of this dissertation. The first part of this work considers a misspecified optimization problem that requires minimizing a function f(x; q*) over a closed and convex set X, where q* is an unknown vector of parameters that may be learnt by a parallel learning process. In this context, we examine the development of coupled schemes that generate iterates (x_k, q_k) such that, as k goes to infinity, x_k converges to x*, a minimizer of f(x; q*) over X, and q_k converges to q*. We consider the solution of problems where f is either smooth or nonsmooth. In smooth strongly convex regimes, we demonstrate that such schemes still display a linear rate of convergence, albeit with inferior constants. When strong convexity assumptions are weakened, it can be shown that the O(1/K) convergence rate in function values is modified by an additive factor
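The coupled scheme can be sketched in one dimension: alternate a gradient step on f(·; q_k) with a learning update that drives q_k toward q*. The quadratic loss and geometric learner below are illustrative stand-ins, not the dissertation's actual models:

```python
def coupled_scheme(steps=200, lr=0.2, q_star=5.0):
    """Alternate optimization and learning updates.

    f(x; q) = (x - q)^2 is minimized over the real line while q_k is
    produced by a stand-in learning process that contracts toward
    the unknown true parameter q*.
    """
    x, q = 0.0, 0.0
    for _ in range(steps):
        x -= lr * 2 * (x - q)      # gradient step on f(.; q_k)
        q += 0.5 * (q_star - q)    # parallel learning update
    return x, q

x, q = coupled_scheme()
```

The optimization iterate chases a moving target, so its error is governed both by its own contraction and by how fast the learner closes the gap q_k - q*, which is where the "inferior constants" in the linear rate come from.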


Distributed Algorithms to Convex Optimization Problems


Author: Peng Wang


Published: 2017

Total Pages: 112

ISBN-13: 9780355754285


This dissertation first studies a distributed algorithm for solving general convex optimization problems and then designs distributed algorithms for special optimization problems related to a system of linear equations.
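For the linear-equation setting, a classic distributed-style scheme has each agent hold one equation and project the shared iterate onto its own hyperplane in turn (a Kaczmarz-type sketch; the example system is mine, not the dissertation's):

```python
def kaczmarz(A, b, x0, sweeps=25):
    """Solve A x = b by cyclic projections: each 'agent' owns one
    row a_i and projects the iterate onto the hyperplane a_i.x = b_i."""
    x = list(x0)
    for _ in range(sweeps):
        for a, b_i in zip(A, b):
            residual = b_i - sum(a_j * x_j for a_j, x_j in zip(a, x))
            coeff = residual / sum(a_j * a_j for a_j in a)
            x = [x_j + coeff * a_j for a_j, x_j in zip(a, x)]
    return x

# two agents, one equation each: x + y = 3 and x - y = 1
x = kaczmarz([[1.0, 1.0], [1.0, -1.0]], [3.0, 1.0], [0.0, 0.0])
```

Each projection uses only one agent's private row, which is what makes this family of methods natural for distributed implementations.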