Efficient Machine Learning Software Stack from Algorithms to Compilation

Author: Zixuan Jiang

Published: 2023

Machine learning enables the extraction of knowledge from data and decision-making without explicit programming, achieving great success and revolutionizing many fields. These successes can be attributed to the continuous advancements in machine learning software and hardware, which have expanded the boundaries and facilitated breakthroughs in diverse applications. The machine learning software stack is a comprehensive collection of components used to solve problems with machine learning algorithms. It encompasses problem definitions, data processing, model and method designs, software frameworks, libraries, code optimization, and system management. This stack supports the entire life cycle of a machine learning project. The software stack allows the community to stand on the shoulders of previous great work and push the limit of machine learning, fostering innovation and enabling broader adoption of machine learning techniques in academia and industry. The software stack is usually divided into algorithm and compilation with distinct design principles. Algorithm design prioritizes task-related performance, while compilation focuses on execution time and resource consumption on hardware devices. Maintaining arithmetic equivalence is optional in algorithm design, but compulsory in compilation to ensure consistent results. The compilation is closer to hardware than algorithm design. Compilation engineers optimize for hardware specifications, while algorithm developers usually do not prioritize hardware-friendliness. Opportunities to enhance hardware efficiency exist in algorithm and compilation designs, as well as their interplay. Despite extensive innovations and improvements, efficiency in the machine learning software stack is a continuing challenge. Algorithm design proposes efficient model architectures and learning algorithms, while compilation design optimizes computation graphs and simplifies operations. 
However, there is still a gap between the demand for efficiency and the current solutions, driven by rapidly growing workloads, limited resources in specific machine learning applications, and the need for cross-layer design. Addressing these challenges requires interdisciplinary research and collaboration. Improving efficiency in the machine learning software stack will optimize performance and enhance the accessibility and applicability of machine learning technologies. In this dissertation, we focus on addressing these efficiency challenges from the perspectives of machine learning algorithms and compilation. We introduce three novel improvements that enhance the efficiency of mainstream machine learning algorithms. Firstly, effective gradient matching for dataset condensation generates a small insightful dataset, accelerating training and other related tasks. Additionally, NormSoftmax proposes to append a normalization layer to achieve fast and stable training in Transformers and classification models. Lastly, mixed precision hardware-aware neural architecture search combines mixed-precision quantization, neural architecture search, and hardware energy efficiency, resulting in significantly more efficient neural networks than using a single method. However, algorithmic efficiency alone is insufficient to fully exploit the potential in the machine learning software stack. We delve into and optimize the compilation processes with three techniques. Firstly, we simplify the layer normalization in the influential Transformers, obtaining two equivalent and efficient Transformer variants with alternative normalization types. Our proposed variants enable efficient training and inference of popular models like GPT and ViT. Secondly, we formulate and solve the scheduling problem for reversible neural architectures, finding the optimal training schedule that fully leverages the computation and memory resources on hardware accelerators. 
Lastly, optimizer fusion allows users to accelerate the training process in the eager execution mode of machine learning frameworks. It exploits better data locality on hardware and parallelism in the computation graph. Throughout the dissertation, we emphasize the integration of efficient algorithms and compilation into a cohesive machine learning software stack. We also consider hardware properties to provide hardware-friendly software designs. We demonstrate the effectiveness of the proposed methods in algorithm and compilation through extensive experiments. Our approaches effectively reduce the time and energy required for both training and inference. Ultimately, our methods have the potential to empower machine learning practitioners and researchers to build more efficient, powerful, robust, scalable, and accessible machine learning solutions.
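The abstract's requirement that compilation preserve arithmetic equivalence can be illustrated with a small numerical check. The abstract does not say which alternative normalization types the dissertation uses, so the sketch below assumes RMSNorm purely for illustration and verifies a standard identity: LayerNorm without learnable scale and shift gives the same result as RMSNorm applied to the mean-centered input. The function names are hypothetical and are not taken from the dissertation.

```python
import math

def layer_norm(x, eps=0.0):
    """LayerNorm without learnable scale/shift: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=0.0):
    """RMSNorm without learnable scale: divide by the root mean square."""
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]

def center(x):
    """Subtract the mean so the vector has zero mean."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

# Illustrative input; any vector with non-zero variance works.
x = [0.5, -1.25, 3.0, 2.0]
ln = layer_norm(x)
rn = rms_norm(center(x))

# For a zero-mean vector the mean square equals the variance,
# so the two outputs agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(ln, rn))
print(max_diff < 1e-12)
```

A rewrite of this kind changes which operations are executed without changing the values produced, which is the property the abstract says compilation must keep and algorithm design may relax.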


Deep Learning Systems

Author: Andres Rodriguez

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 245

ISBN-13: 3031017692

This book describes deep learning systems: the algorithms, compilers, and processor components to efficiently train and deploy deep learning models for commercial applications. The exponential growth in computational power is slowing at a time when the amount of compute consumed by state-of-the-art deep learning (DL) workloads is rapidly growing. Model size, serving latency, and power constraints are a significant challenge in the deployment of DL models for many applications. Therefore, it is imperative to codesign algorithms, compilers, and hardware to accelerate advances in this field with holistic system-level and algorithm solutions that improve performance, power, and efficiency. Advancing DL systems generally involves three types of engineers: (1) data scientists that utilize and develop DL algorithms in partnership with domain experts, such as medical, economic, or climate scientists; (2) hardware designers that develop specialized hardware to accelerate the components in the DL models; and (3) performance and compiler engineers that optimize software to run more efficiently on a given hardware. Hardware engineers should be aware of the characteristics and components of production and academic models likely to be adopted by industry to guide design decisions impacting future hardware. Data scientists should be aware of deployment platform constraints when designing models. Performance engineers should support optimizations across diverse models, libraries, and hardware targets. The purpose of this book is to provide a solid understanding of (1) the design, training, and applications of DL algorithms in industry; (2) the compiler techniques to map deep learning code to hardware targets; and (3) the critical hardware features that accelerate DL systems. This book aims to facilitate co-innovation for the advancement of DL systems. 
It is written for engineers working in one or more of these areas who seek to understand the entire system stack in order to better collaborate with engineers working in other parts of the system stack. The book details advancements and adoption of DL models in industry, explains the training and deployment process, describes the essential hardware architectural features needed for today's and future models, and details advances in DL compilers to efficiently execute algorithms across various hardware targets. Unique in this book is the holistic exposition of the entire DL system stack, the emphasis on commercial applications, and the practical techniques to design models and accelerate their performance. The author is fortunate to work with hardware, software, data scientist, and research teams across many high-technology companies with hyperscale data centers. These companies employ many of the examples and methods provided throughout the book.


Compiling Algorithms for Heterogeneous Systems

Author: Steven Bell

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 89

ISBN-13: 3031017587

Most emerging applications in imaging and machine learning must perform immense amounts of computation while holding to strict limits on energy and power. To meet these goals, architects are building increasingly specialized compute engines tailored for these specific tasks. The resulting computer systems are heterogeneous, containing multiple processing cores with wildly different execution models. Unfortunately, the cost of producing this specialized hardware—and the software to control it—is astronomical. Moreover, the task of porting algorithms to these heterogeneous machines typically requires that the algorithm be partitioned across the machine and rewritten for each specific architecture, which is time consuming and prone to error. Over the last several years, the authors have approached this problem using domain-specific languages (DSLs): high-level programming languages customized for specific domains, such as database manipulation, machine learning, or image processing. By giving up generality, these languages are able to provide high-level abstractions to the developer while producing high-performance output. The purpose of this book is to spur the adoption and the creation of domain-specific languages, especially for the task of creating hardware designs. In the first chapter, a short historical journey explains the forces driving computer architecture today. Chapter 2 describes the various methods for producing designs for accelerators, outlining the push for more abstraction and the tools that enable designers to work at a higher conceptual level. From there, Chapter 3 provides a brief introduction to image processing algorithms and hardware design patterns for implementing them. Chapters 4 and 5 describe and compare Darkroom and Halide, two domain-specific languages created for image processing that produce high-performance designs for both FPGAs and CPUs from the same source code, enabling rapid design cycles and quick porting of algorithms. 
The final section describes how the DSL approach also simplifies the problem of interfacing between application code and the accelerator by generating the driver stack in addition to the accelerator configuration. This book should serve as a useful introduction to domain-specialized computing for computer architecture students and as a primer on domain-specific languages and image processing hardware for those with more experience in the field.


Explainable Machine Learning Models and Architectures

Author: Suman Lata Tripathi

Publisher: John Wiley & Sons

Published: 2023-10-03

Total Pages: 277

ISBN-13: 1394185847

This cutting-edge new volume covers the hardware architecture implementation, the software implementation approach, and the efficient hardware of machine learning applications. Machine learning and deep learning modules are now an integral part of many smart and automated systems where signal processing is performed at different levels. Processing signals in the form of text, images, or video requires large-scale computation at the desired data rate and accuracy. Large data sets demand more integrated circuit (IC) area, and the embedded bulk memories they require increase that area further. Trade-offs between power consumption, delay, and IC area are always a concern of designers and researchers. New hardware architectures and accelerators are needed to explore and experiment with efficient machine-learning models. Many real-time applications, such as the processing of biomedical data in healthcare, smart transportation, satellite image analysis, and IoT-enabled systems, have considerable scope for improvement in accuracy, speed, computational power, and overall power consumption. This book deals with efficient machine and deep learning models that support high-speed processors with reconfigurable architectures such as graphics processing units (GPUs) and field-programmable gate arrays (FPGAs), or any hybrid system. Whether for the veteran engineer or scientist working in the field or laboratory, or the student or academic, this is a must-have for any library.


How Machine Learning is Innovating Today's World

Author: Arindam Dey

Publisher: John Wiley & Sons

Published: 2024-06-18

Total Pages: 489

ISBN-13: 1394214138

Provides a comprehensive understanding of the latest advancements and practical applications of machine learning techniques. Machine learning (ML), a branch of artificial intelligence, has gained tremendous momentum in recent years, revolutionizing the way we analyze data, make predictions, and solve complex problems. As researchers and practitioners in the field, the editors of this book recognize the importance of disseminating knowledge and fostering collaboration to further advance this dynamic discipline. How Machine Learning is Innovating Today's World is a timely book that presents a diverse collection of 25 chapters that delve into the remarkable ways that ML is transforming various fields and industries. It provides a comprehensive understanding of the practical applications of ML techniques.
The wide range of topics includes: an analysis of various tokenization techniques and the sequence-to-sequence model in natural language processing; the evaluation of English language readability using ML models; a detailed study of text analysis for information retrieval through natural language processing; the application of reinforcement learning approaches to supply chain management; the performance analysis of converting algorithms to source code using natural language processing in Java; an alternate approach to solving differential equations utilizing artificial neural networks with optimization techniques; a comparative study of different techniques of text-to-SQL query conversion; the classification of livestock diseases using ML algorithms; ML in image enhancement techniques; the efficient leader selection for inter-cluster flying ad-hoc networks; a comprehensive survey of applications powered by GPT-3 and DALL-E; recommender systems' domain of application; a review of mood detection, emoji generation, and classification using tokenization and CNN; variations of the exam scheduling problem using graph coloring; the intersection of software engineering and machine learning applications; ML strategies for indeterminate information systems in complex bipolar neutrosophic environments; ML applications in healthcare, in battery management systems, and the rise of AI-generated news videos; and how to enhance resource management in precision farming through AI-based irrigation optimization.
Audience: The book will be extremely useful to professionals, post-graduate research scholars, policymakers, corporate managers, and anyone with technical interests looking to understand how machine learning and artificial intelligence can benefit their work.


Machine Learning for Decision Makers

Author: Patanjali Kashyap

Publisher: Apress

Published: 2018-01-04

Total Pages: 381

ISBN-13: 1484229886

Take a deep dive into the concepts of machine learning as they apply to contemporary business and management. You will learn how machine learning techniques are used to solve fundamental and complex problems in society and industry. Machine Learning for Decision Makers serves as an excellent resource for establishing the relationship of machine learning with IoT, big data, and cognitive and cloud computing to give you an overview of how these modern areas of computing relate to each other. This book introduces a collection of the most important concepts of machine learning and sets them in context with other vital technologies that decision makers need to know about. These concepts span the process from envisioning the problem to applying machine-learning techniques to your particular situation. This discussion also provides an insight to help deploy the results to improve decision-making. The book uses case studies and jargon busting to help you grasp the theory of machine learning quickly. You'll soon gain the big picture of machine learning and how it fits with other cutting-edge IT services. This knowledge will give you confidence in your decisions for the future of your business.
What You Will Learn: Discover the machine learning, big data, and cloud and cognitive computing technology stack; gain insights into machine learning concepts and practices; understand business and enterprise decision-making using machine learning; absorb machine-learning best practices.
Who This Book Is For: Managers tasked with making key decisions who want to learn how and when machine learning and related technologies can help them.


Machine Learning and Optimization for Engineering Design

Author: Apoorva S. Shastri

Publisher: Springer Nature

Published: 2024-01-27

Total Pages: 175

ISBN-13: 9819974569

This book collects state-of-the-art scientific and technical research papers on machine learning-based algorithms for optimization and engineering design. Self-contained chapters cover theoretical and practical developments for engineering applications such as smart homes, ICT-based irrigation systems, academic success prediction, future agro-industry for crop production, disease classification in plants, dental problems and solutions, and loan eligibility processing, together with their implementation, case studies, and literature reviews. The book also highlights the importance of addressing the time and space complexity of problems and of improving accuracy, analysis, and validation for practical applications, drawing on surveys of the state-of-the-art literature. It targets a broad audience by exploring multidisciplinary research directions such as computer vision, machine learning, artificial intelligence, and modified or newly developed machine learning algorithms to enhance engineering design applications for society. The research is presented with illustrations, exercises, and pseudo-code.


Languages and Compilers for Parallel Computing

Author: James Brodman

Publisher: Springer

Published: 2015-04-30

Total Pages: 401

ISBN-13: 3319174738

This book constitutes the thoroughly refereed post-conference proceedings of the 27th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2014, held in Hillsboro, OR, USA, in September 2014. The 25 revised full papers were carefully reviewed and selected from 39 submissions. The papers are organized in topical sections on accelerator programming; algorithms for parallelism; compilers; debugging; vectorization.


Machine Learning Applications In Software Engineering

Author: Du Zhang

Publisher: World Scientific

Published: 2005-02-21

Total Pages: 367

ISBN-13: 9814481424

Machine learning deals with the issue of how to build computer programs that improve their performance at some tasks through experience. Machine learning algorithms have proven to be of great practical value in a variety of application domains. Not surprisingly, the field of software engineering turns out to be a fertile ground where many software development and maintenance tasks could be formulated as learning problems and approached in terms of learning algorithms. This book deals with the subject of machine learning applications in software engineering. It provides an overview of machine learning, summarizes the state-of-the-practice in this niche area, gives a classification of the existing work, and offers some application guidelines. Also included in the book is a collection of previously published papers in this research area.


Hardware-aware Algorithms for Efficient Machine Learning

Author: Tri Dao Phuc Quang

Published: 2023

Training of machine learning (ML) models will continue to consume more cycles, their inference will proliferate on more kinds of devices, and their capabilities will be used in more domains. Some goals central to this future are to make ML models efficient so they remain practical to train and deploy, and to unlock new application domains with new capabilities. We describe some recent developments in hardware-aware algorithms to improve the efficiency-quality tradeoff of ML models and equip them with long context. In Chapter 2, we focus on structured sparsity, a natural approach to mitigate the extensive compute and memory cost of large ML models. We describe a line of work on learnable fast transforms that, thanks to their expressiveness and efficiency, yields some of the first sparse training methods to speed up large models in wall-clock time (2x) without compromising their quality. In Chapter 3, we focus on efficient Transformer training and inference for long sequences. We describe FlashAttention, a fast and memory-efficient algorithm to compute attention with no approximation. By careful accounting of reads/writes between different levels of the memory hierarchy, FlashAttention is 2-4x faster and uses 10-20x less memory compared to the best existing attention implementations, allowing us to train higher-quality Transformers with 8x longer context. FlashAttention is now widely used in some of the largest research labs and companies. In Chapter 4, we examine state-space models, a promising architecture designed for long-range memory. As we seek to understand why early state-space models did not perform well on language modeling tasks, we propose a simple multiplicative interaction that expands their expressiveness. We also design hardware-friendly algorithms to train them. As a result, we are able to train state-space models to multi-billion parameter scale, demonstrating a new kind of model competitive with the dominant Transformers in language modeling.
We conclude with some exciting directions in ML and systems, such as software-hardware co-design, structured sparsity for scientific AI, and long context for new AI workflows and modalities.
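The claim that attention can be computed block by block "with no approximation" rests on a running-softmax rescaling trick, which can be sketched in a few lines. The code below is not the FlashAttention kernel (which operates on GPU tiles and handles many queries at once); it is a minimal single-query illustration in plain Python, with hypothetical function names, showing that visiting keys and values one block at a time yields the same output as materializing all scores.

```python
import math

def attention_naive(q, keys, values):
    """Exact softmax attention for one query, materializing every score."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def attention_blockwise(q, keys, values, block_size=2):
    """Same result, reading keys/values one block at a time.

    Only a running max, a running normalizer, and a running weighted
    sum are kept; they are rescaled whenever the running max changes.
    """
    dim = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), block_size):
        k_block = keys[start:start + block_size]
        v_block = values[start:start + block_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in k_block]
        new_max = max(running_max, max(scores))
        # Rescale earlier partial results to the new reference max.
        # On the first block this is exp(-inf) == 0.0, which is harmless.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_block):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]

# Illustrative data: one 2-d query, five keys, five 3-d values.
q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.5, -0.3], [-0.7, 2.0], [0.0, 0.0], [3.0, 1.0]]
values = [[1.0, 0.0, 2.0], [0.5, 1.5, -1.0], [2.0, 2.0, 2.0],
          [-1.0, 0.0, 1.0], [0.0, 3.0, 0.5]]

full = attention_naive(q, keys, values)
tiled = attention_blockwise(q, keys, values, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(full, tiled)))
```

Because the blockwise version never holds more than one block of scores at a time, its working memory is independent of sequence length, which is the kind of saving the abstract attributes to careful accounting of reads and writes across the memory hierarchy.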