Algorithm-accelerator Co-design for High-performance and Secure Deep Learning

Author: Weizhe Hua

Published: 2022

Deep learning has emerged as a new engine for many of today's artificial intelligence/machine learning systems, leading to several recent breakthroughs in vision and natural language processing tasks. However, as we move into the era of deep learning models with billions and even trillions of parameters, meeting the computational and memory requirements to train and serve state-of-the-art models has become extremely challenging. Optimizing the computational cost and memory footprint of deep learning models for better system performance is critical to the widespread deployment of deep learning. Moreover, a massive amount of sensitive and private user data is exposed to the deep learning system during training and serving. Therefore, it is essential to investigate potential vulnerabilities in existing deep learning hardware and then to design secure deep learning systems that provide strong privacy guarantees for user data and for the models that learn from that data. In this dissertation, we propose to co-design deep learning algorithms and hardware architectural techniques to improve both the performance and the security/privacy of deep learning systems. On high-performance deep learning, we first introduce the channel gating neural network (CGNet), which exploits input-dependent dynamic sparsity to reduce the computation of convolutional neural networks. We also co-develop an ASIC accelerator for CGNet that turns the theoretical FLOP reduction into wall-clock speedup. Second, we present Fast Linear Attention with a Single Head (FLASH), a state-of-the-art language model designed for Google's TPU that achieves transformer-level quality with linear complexity with respect to the sequence length.
Through empirical studies on masked language modeling, auto-regressive language modeling, and fine-tuning for question answering, we show that FLASH matches or exceeds the quality of the augmented transformer while being significantly faster (up to 12 times faster). On the security of deep learning, we study the side-channel vulnerabilities of existing deep learning accelerators. We then introduce GuardNN, a secure accelerator architecture for privacy-preserving deep learning. GuardNN provides a trusted execution environment (TEE) with specialized protection for deep learning, and achieves a small trusted computing base and low protection overhead at the same time. The FPGA prototype of GuardNN has a maximum performance overhead of 2.4% across four modern DNN models for ImageNet.
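The abstract does not spell out FLASH's layer design, so the following is only a minimal sketch of the generic idea behind attention that is linear in the sequence length, not FLASH itself. Assuming the softmax is replaced by a positive feature map φ (here the commonly used elu(x) + 1, an assumption of this sketch), the product (φ(Q)φ(K)ᵀ)V can be regrouped as φ(Q)(φ(K)ᵀV): the first grouping builds an n × n score matrix, while the second summarizes all keys and values into a d × d matrix once and reuses it for every query. All function names are hypothetical.

```python
import math
import random


def phi(x):
    """Positive feature map elu(x) + 1 (an assumed choice for this sketch)."""
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def quadratic_attention(Q, K, V):
    """O(n^2 d): builds one score per (query, key) pair."""
    fK = [phi(k) for k in K]
    out = []
    for q in Q:
        fq = phi(q)
        scores = [dot(fq, fk) for fk in fK]  # n scores for this query
        total = sum(scores)
        out.append([sum(s * v[c] for s, v in zip(scores, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """O(n d^2): summarizes keys and values once, then reuses the summary."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # S = sum_j phi(k_j) v_j^T
    z = [0.0] * d_k                        # z = sum_j phi(k_j)
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = dot(fq, z)
        out.append([sum(fq[a] * S[a][c] for a in range(d_k)) / denom
                    for c in range(d_v)])
    return out


random.seed(0)
n, d = 6, 4
Q = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
K = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
V = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
quad = quadratic_attention(Q, K, V)
lin = linear_attention(Q, K, V)
print(max(abs(x - y) for rq, rl in zip(quad, lin) for x, y in zip(rq, rl)))
```

Both functions produce the same outputs up to floating-point rounding; only the cost differs, O(n²d) versus O(nd²), which is what keeps long sequences affordable.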

Accelerator Architecture for Secure and Energy Efficient Machine Learning

Author: Mohammad Hossein Samavatian

Published: 2022

ML applications are driving the next computing revolution. In this context, both performance and security are crucial. We propose hardware/software co-design solutions that address both. First, we propose RNNFast, an accelerator for Recurrent Neural Networks (RNNs). RNNs are particularly well suited to machine learning problems in which context is important, such as language translation. RNNFast leverages an emerging class of non-volatile memory called domain-wall memory (DWM). We show that DWM is well suited to RNN acceleration because of its very high density and low read/write energy. RNNFast is efficient and highly scalable, with a flexible mapping of logical neurons to RNN hardware blocks. The accelerator is designed to minimize data movement by closely interleaving DWM storage and computation. Compared with a state-of-the-art GPGPU, our design delivers 21.8× higher performance with 70× lower energy. Second, we bring ML security into ML accelerator design for greater efficiency and robustness. Deep Neural Networks (DNNs) are employed in a growing number of applications, some of which are safety-critical. Unfortunately, DNNs are known to be vulnerable to so-called adversarial attacks. Existing defenses generally have high overhead, and some require attack-specific re-training of the model or careful tuning to adapt to different attacks. We show that these approaches, while successful for a range of inputs, are insufficient against stronger, high-confidence adversarial attacks. To address this, we propose HASI and DNNShield, two hardware-accelerated defenses that adapt the strength of the response to the confidence of the adversarial input. Both techniques rely on approximation or random noise deliberately introduced into the model. HASI injects noise directly into the model at inference.
DNNShield approximates inference through dynamic, random sparsification of the DNN model, which is efficient and gives fine-grained control over the approximation error. Both techniques compare characteristics of the output distribution of noisy/sparsified inference against a baseline output to detect adversarial inputs. We show adversarial detection rates of 86% on VGG16 and 88% on ResNet50, exceeding those of state-of-the-art approaches at much lower overhead. We demonstrate a software/hardware-accelerated FPGA prototype that reduces the performance impact of HASI and DNNShield relative to software-only CPU and GPU implementations.
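As a rough illustration of flagging inputs by comparing sparsified inference against a baseline, here is a minimal sketch on a toy two-class linear model. The abstract does not give DNNShield's actual detection statistic or sparsification policy, so the statistic used here (the fraction of randomly sparsified runs whose top-1 label differs from the baseline), the drop probability, the threshold, and all function names are assumptions. The toy only shows the mechanism: an input whose prediction is fragile under random weight dropping gets flagged, while a robustly classified input does not.

```python
import random


def logits(W, x, keep=None):
    """Linear layer; keep[r][c] == 0 zeroes out weight W[r][c]."""
    return [sum(w * xi * (1 if keep is None else keep[r][c])
                for c, (w, xi) in enumerate(zip(row, x)))
            for r, row in enumerate(W)]


def argmax(v):
    return max(range(len(v)), key=lambda i: v[i])  # first index wins ties


def flip_rate(W, x, drop_prob=0.3, trials=200, seed=0):
    """Fraction of randomly sparsified runs whose top-1 label differs from the baseline."""
    rng = random.Random(seed)
    baseline = argmax(logits(W, x))
    flips = 0
    for _ in range(trials):
        # Independently drop each weight with probability drop_prob (hypothetical policy).
        keep = [[0 if rng.random() < drop_prob else 1 for _ in row] for row in W]
        flips += argmax(logits(W, x, keep)) != baseline
    return flips / trials


def is_suspicious(W, x, threshold=0.1, **kwargs):
    return flip_rate(W, x, **kwargs) > threshold


W = [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]]
confident = [5.0, 5.0, 5.0, 5.0]      # dropping weights can never flip this prediction
borderline = [2.0, -2.0, 2.0, -1.9]   # logit margin of 0.2: one dropped weight can flip it
print(flip_rate(W, confident), flip_rate(W, borderline))
print(is_suspicious(W, confident), is_suspicious(W, borderline))
```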

Deep Learning for Computer Architects

Author: Brandon Reagen

Publisher: Morgan & Claypool Publishers

Published: 2017-08-22

Total Pages: 125

ISBN-13: 1627059857

This is a primer written for computer architects in the new and rapidly evolving field of deep learning. It reviews how machine learning has evolved since its inception in the 1960s and tracks the key developments leading up to the powerful deep learning techniques that emerged in the last decade. Machine learning, and specifically deep learning, has been hugely disruptive in many fields of computer science. The success of deep learning techniques in solving notoriously difficult classification and regression problems has resulted in their rapid adoption for real-world problems. The emergence of deep learning is widely attributed to a virtuous cycle whereby fundamental advancements in training deeper models were enabled by the availability of massive datasets and high-performance computer hardware. The book also reviews representative workloads, including the most commonly used datasets and seminal networks across a variety of domains. In addition to discussing the workloads themselves, it details the most popular deep learning tools and shows how aspiring practitioners can use the tools with the workloads to characterize and optimize DNNs. The remainder of the book is dedicated to the design and optimization of hardware and architectures for machine learning. Because high-performance hardware was so instrumental in making machine learning a practical solution, this part recounts a variety of recently proposed optimizations that aim to further improve future designs. Finally, it presents a review of recent research in the area, along with a taxonomy to help readers understand how the various contributions fit in context.

Algorithm-Centric Design of Reliable and Efficient Deep Learning Processing Systems

Author: Elbruz Ozen

Published: 2023

Artificial intelligence techniques driven by deep learning have advanced significantly in the past decade. The use of deep learning has increased dramatically in practical application domains such as autonomous driving, healthcare, and robotics, which often impose stringent requirements on hardware resource efficiency as well as on hardware safety and reliability. The growing computational cost of deep learning models has traditionally been tackled through model compression and domain-specific accelerator design. Because conventional fault tolerance methods are often prohibitively expensive for consumer electronics, the study of functional safety and reliability for deep learning hardware is still in its infancy. This dissertation outlines a novel approach that delivers dramatic boosts in hardware safety, reliability, and resource efficiency through a synergistic co-design paradigm. We first observe and exploit the unique algorithmic characteristics of deep neural networks, including plasticity in the design process, resiliency to small numerical perturbations, and inherent redundancy, as well as unique micro-architectural properties of deep learning accelerators such as regularity. The approach reshapes deep neural networks, strategically enhances deep neural network accelerators, prioritizes overall functional correctness, and minimizes the associated costs by exploiting the statistical nature of deep neural networks. For example, our analysis demonstrates that deep neural networks equipped with the proposed techniques maintain their accuracy, degrading only gracefully even at extreme rates of hardware errors. As a result, the described methodology can embed strong safety and reliability characteristics in mission-critical deep learning applications at negligible cost.
The proposed approach further offers a promising avenue for handling the micro-architectural challenges of deep neural network accelerators and boosting resource efficiency through the synergistic co-design of deep neural networks and hardware micro-architectures.
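To make "resiliency to small numerical perturbations" concrete, here is a minimal fault-injection sketch, assuming weights stored as two's-complement int8 values and a single-bit-flip fault model (both assumptions; the abstract does not describe the dissertation's error model). It shows that the damage a flip causes depends heavily on which bit is hit, the kind of asymmetry that can make selective protection much cheaper than protecting every bit. All function names are hypothetical.

```python
import random


def flip_bit_int8(value, bit):
    """Flip one bit (0 = LSB, 7 = sign bit) of a two's-complement int8 value."""
    raw = (value & 0xFF) ^ (1 << bit)
    return raw - 256 if raw >= 128 else raw


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def single_fault_error(weights, x, index, bit):
    """Absolute output change caused by flipping one bit of one weight."""
    faulty = list(weights)
    faulty[index] = flip_bit_int8(faulty[index], bit)
    return abs(dot(faulty, x) - dot(weights, x))


def inject_faults(weights, bit_error_rate, rng):
    """Flip every stored bit independently with probability bit_error_rate."""
    faulty = []
    for w in weights:
        for bit in range(8):
            if rng.random() < bit_error_rate:
                w = flip_bit_int8(w, bit)
        faulty.append(w)
    return faulty


weights = [3, -2, 5, 17]
x = [1, 1, 1, 1]
for bit in (0, 3, 7):
    print(bit, single_fault_error(weights, x, index=2, bit=bit))  # errors 1, 8, 128
```

Sweeping `bit_error_rate` with `inject_faults` and measuring accuracy on a real model is a common way to chart how gracefully a network degrades as hardware error rates rise.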

Data Orchestration in Deep Learning Accelerators

Author: Tushar Krishna

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 158

ISBN-13: 3031017676

This Synthesis Lecture focuses on techniques for efficient data orchestration within DNN accelerators. The end of Moore's Law, coupled with the rapid growth of deep learning and other AI applications, has led to the emergence of custom Deep Neural Network (DNN) accelerators for energy-efficient inference on edge devices. Modern DNNs have millions of parameters and involve billions of computations; this necessitates extensive data movement from memory to on-chip processing engines. It is well known that the cost of data movement today surpasses the cost of the actual computation; therefore, DNN accelerators require careful orchestration of data across on-chip compute, network, and memory elements to minimize the number of accesses to external DRAM. The book covers DNN dataflows, data reuse, buffer hierarchies, networks-on-chip, and automated design-space exploration. It concludes with data orchestration challenges for compressed and sparse DNNs and with future trends. The target audience is students, engineers, and researchers interested in designing high-performance and low-energy accelerators for DNN inference.
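The point that careful orchestration cuts external memory traffic can be checked with a small counting experiment. The sketch below assumes an on-chip buffer large enough to hold a T-row panel of A and a T-column panel of B (an assumption, as are all function names), and counts DRAM element reads for a naive matrix multiply, where every multiply-accumulate fetches both operands, and for a simple tiling that stages panels in the buffer and reuses them. For an M×K by K×N product, the naive loop reads 2MNK elements, while this tiling reads MK + MNK/T.

```python
def matmul_naive(A, B):
    """Every multiply-accumulate fetches both operands from DRAM: 2*M*N*K reads."""
    M, K, N = len(A), len(B), len(B[0])
    reads = 0
    C = [[0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            for k in range(K):
                C[i][j] += A[i][k] * B[k][j]
                reads += 2
    return C, reads


def matmul_tiled(A, B, T):
    """Stage a T-row panel of A once and reuse it across all column tiles;
    stage each T-column panel of B and reuse it across the T rows of the panel."""
    M, K, N = len(A), len(B), len(B[0])
    reads = 0
    C = [[0] * N for _ in range(M)]
    for i0 in range(0, M, T):
        rows = range(i0, min(i0 + T, M))
        a_buf = {i: A[i][:] for i in rows}  # DRAM -> on-chip buffer
        reads += len(rows) * K
        for j0 in range(0, N, T):
            cols = range(j0, min(j0 + T, N))
            b_buf = {j: [B[k][j] for k in range(K)] for j in cols}
            reads += len(cols) * K
            for i in rows:
                for j in cols:
                    C[i][j] = sum(a_buf[i][k] * b_buf[j][k] for k in range(K))
    return C, reads


M = K = N = 8
A = [[(i + 2 * k) % 5 for k in range(K)] for i in range(M)]
B = [[(k * j + 1) % 7 for j in range(N)] for k in range(K)]
C_naive, naive_reads = matmul_naive(A, B)
C_tiled, tiled_reads = matmul_tiled(A, B, T=4)
print(naive_reads, tiled_reads)  # 1024 192
```

Raising T increases reuse but also the required buffer size, which is the basic trade-off that dataflow and buffer-hierarchy design explores.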

Deep Learning Systems

Author: Andres Rodriguez

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 245

ISBN-13: 3031017692

This book describes deep learning systems: the algorithms, compilers, and processor components to efficiently train and deploy deep learning models for commercial applications. The exponential growth in computational power is slowing at a time when the amount of compute consumed by state-of-the-art deep learning (DL) workloads is rapidly growing. Model size, serving latency, and power constraints pose significant challenges to deploying DL models in many applications. Therefore, it is imperative to codesign algorithms, compilers, and hardware to accelerate advances in this field with holistic system-level and algorithm solutions that improve performance, power, and efficiency. Advancing DL systems generally involves three types of engineers: (1) data scientists who utilize and develop DL algorithms in partnership with domain experts, such as medical, economic, or climate scientists; (2) hardware designers who develop specialized hardware to accelerate the components in DL models; and (3) performance and compiler engineers who optimize software to run more efficiently on given hardware. Hardware engineers should be aware of the characteristics and components of production and academic models likely to be adopted by industry to guide design decisions that affect future hardware. Data scientists should be aware of deployment platform constraints when designing models. Performance engineers should support optimizations across diverse models, libraries, and hardware targets. The purpose of this book is to provide a solid understanding of (1) the design, training, and applications of DL algorithms in industry; (2) the compiler techniques to map deep learning code to hardware targets; and (3) the critical hardware features that accelerate DL systems. This book aims to facilitate co-innovation for the advancement of DL systems.
It is written for engineers working in one or more of these areas who seek to understand the entire system stack in order to better collaborate with engineers working in other parts of the system stack. The book details advancements and adoption of DL models in industry, explains the training and deployment process, describes the essential hardware architectural features needed for today's and future models, and details advances in DL compilers to efficiently execute algorithms across various hardware targets. Unique in this book is the holistic exposition of the entire DL system stack, the emphasis on commercial applications, and the practical techniques to design models and accelerate their performance. The author is fortunate to work with hardware, software, data scientist, and research teams across many high-technology companies with hyperscale data centers. These companies employ many of the examples and methods provided throughout the book.

Deep Learning on Edge Computing Devices

Author: Xichuan Zhou

Publisher: Elsevier

Published: 2022-02-02

Total Pages: 200

ISBN-13: 0323909272

Deep Learning on Edge Computing Devices: Design Challenges of Algorithm and Architecture focuses on hardware architecture and embedded deep learning, including neural networks. The title helps researchers maximize the performance of Edge-deep learning models for mobile computing and other applications by presenting neural network algorithms and hardware design optimization approaches for Edge-deep learning. Applications are introduced in each section, and a comprehensive example, smart surveillance cameras, is presented at the end of the book, integrating innovation in both algorithm and hardware architecture. Structured into three parts, the book covers core concepts, theories and algorithms, and architecture optimization. This book provides a solution for researchers looking to maximize the performance of deep learning models on Edge-computing devices through algorithm-hardware co-design.

- Focuses on hardware architecture and embedded deep learning, including neural networks
- Brings together neural network algorithm and hardware design optimization approaches to deep learning, alongside real-world applications
- Considers how Edge computing solves privacy, latency and power consumption concerns related to the use of the Cloud
- Describes how to maximize the performance of deep learning on Edge-computing devices
- Presents the latest research on neural network compression coding, deep learning algorithms, chip co-design and intelligent monitoring
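The blurb mentions neural network compression without detail. As one common example of compression for edge devices (chosen here as an assumption, not necessarily the book's technique), the sketch below performs symmetric per-tensor int8 post-training quantization: each float32 weight is stored as an 8-bit integer plus one shared scale, a 4× reduction in weight storage if the single scale is ignored, with a rounding error of at most half the scale per weight. The function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.8, -1.27, 0.05, 0.3333, -0.6]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```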

Towards A Private New World

Author: Mohammad Sadegh Riazi

Published: 2020

Total Pages: 269

Data privacy and security are among the grand challenges in the emerging era of massive data and collective intelligence. On the one hand, the rapid advances of several technologies, including artificial intelligence, depend directly on harnessing the full potential of data. On the other hand, such colossal collections of data inherently contain sensitive information about individuals; explicit access to the data violates the privacy of content owners. While a number of elegant cryptographic solutions have been suggested for secure storage and secure transmission of data, the ability to compute on encrypted data at scale has remained a standing challenge. Secure computation is a set of developing technologies that enable processing on an unintelligible version of the data. Secure computation can create a zero-trust platform where two or more individuals or organizations collaboratively compute on their shares of data without compromising data confidentiality. Computing on encrypted data removes several critical obstacles that prohibit scientific advances requiring collaboration between distrusting parties. Nevertheless, secure computation comes at the cost of significant computational overhead and higher communication between the pertinent parties. Currently, its high computational complexity prevents secure computation from being adopted in compute-intensive systems. This dissertation introduces several holistic algorithm-level, protocol-level, and hardware-level methodologies to enable the large-scale realization of emerging secure computing and privacy technologies. The key contributions of this dissertation are as follows: (I) Introducing a novel secure computation framework in which several secure function evaluation protocols are integrated. The integration makes it possible to choose, for each unique operation, the protocol best suited to execute it based on the underlying mathematical characteristics of the protocol.
The proposed methodology enables the secure execution of machine learning models 4-133× faster than the prior art. (II) Designing a neural network transformation and a customized secure computation protocol for secure inference on deep neural networks. The transformation translates contemporary neural network operations into Boolean operations that can be executed more efficiently in secure computation protocols. The proposed transformation, in conjunction with the customized protocol, enables privacy-preserving medical diagnosis on four medical datasets for the first time. (III) Design and end-to-end implementation of a new high-performance hardware architecture for computing on encrypted data. The proposed architecture outperforms high-end GPUs by more than 30× and modern CPUs by more than two orders of magnitude. (IV) Creating an efficient methodology based on hardware synthesis tools to produce a compact Boolean circuit representation of a given function. The Boolean representation is optimized according to the cost function of secure computation protocols. The methodology reduces the computation and communication costs by up to 4×. (V) Designing a new substring search algorithm customized for secure computation that does not require random access to the text. The proposed algorithm outperforms all state-of-the-art substring search algorithms when run within the secure computation protocol. (VI) Introducing the first secure content-addressable memory for approximate search. The design enables high-accuracy, similarity-based approximate search while keeping the underlying data private without relying on a trusted server. The construction is the first to provide post-breach data confidentiality. (VII) Proposing a new methodology to create large volumes of synthetic human fingerprints that are computationally indistinguishable from real fingerprints. The methodology enhances the security of any fingerprint-based authentication system.
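The abstract describes parties that "collaboratively compute on their shares of data." The textbook two-party additive secret sharing below (a generic illustration, not the dissertation's framework; the ring size, the hospital scenario, and all function names are assumptions) shows why this keeps inputs confidential: each share on its own is a uniformly random ring element, yet additions and multiplications by public constants can be done locally on the shares.

```python
import secrets

MOD = 2 ** 32  # shares live in the ring Z_(2^32); an assumed parameter


def share(x):
    """Split x into two additive shares; either share alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (x - r) % MOD


def reconstruct(shares):
    return sum(shares) % MOD


def add_shared(a, b):
    """Party 0 holds a[0], b[0]; party 1 holds a[1], b[1]. Each adds locally."""
    return (a[0] + b[0]) % MOD, (a[1] + b[1]) % MOD


def scale_shared(a, c):
    """Multiplying by a public constant is also purely local."""
    return (a[0] * c) % MOD, (a[1] * c) % MOD


# Hypothetical example: two hospitals learn their total case count
# without revealing their individual counts.
a = share(120)
b = share(95)
total = reconstruct(add_shared(a, b))
print(total)  # 215
```

Multiplying two secret values cannot be done locally in this scheme and needs extra protocol steps (for example, precomputed multiplication triples); that interaction is a major source of the communication overhead the abstract mentions.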

Towards Holistic Secure and Trustworthy Deep Learning

Author: Huili Chen

Published: 2022

Machine Learning (ML) models, in particular Deep Neural Networks (DNNs), have evolved exceedingly fast in the past few decades, although the idea of DNNs was proposed in the twentieth century. The success of contemporary ML models can be attributed to two key factors: (i) data of various modalities is becoming more abundant, which makes data-driven approaches such as DNNs more applicable in real-world settings; (ii) the computing power of emerging hardware platforms (e.g., GPUs, TPUs) keeps growing thanks to architectural advances. The increasing computation capability makes training large-scale DNNs practical for complex data applications. While ML has enabled a paradigm shift in fields such as autonomous driving, natural language processing, and biomedical diagnosis, training high-performance ML models can be both time- and resource-consuming. As such, commercial ML models (which typically contain a tremendous number of parameters to learn complex tasks) are trained by large tech companies and then distributed to end users or deployed on the cloud for Machine Learning as a Service (MLaaS). This supply chain of ML models raises concerns for both model designers and end users. The model developer wants proof of ownership of the trained model in order to prevent copyright infringement and preserve the commercial advantage. The end user needs to verify that the obtained ML model has not been maliciously altered before deploying it. This dissertation introduces holistic algorithm-level and hardware-level solutions to the Intellectual Property (IP) protection and security assessment challenges of ML models, thus facilitating safe and reliable ML deployment.
The key contributions of this dissertation are as follows: Devising an end-to-end collusion-secure DNN fingerprinting framework named DeepMarks that enables the model owner to prove model authorship and identify unique users in the context of Deep Learning (DL). I design a fingerprint embedding technique that combines anti-collusion codes and weight regularization to ensure that the fingerprint is robustly encoded in the marked DL model while preserving the main task accuracy. Designing a hardware-level IP protection and usage control technique for DL applications using on-device DNN attestation. The proposed framework, DeepAttest, leverages device-specific fingerprints to 'mark' authentic DNNs and verifies the legitimacy of the deployed DNN with the support of a Trusted Execution Environment (TEE). The algorithm and hardware architecture of DeepAttest are co-optimized to ensure that on-device DNN attestation is lightweight and secure. Developing a spectral-domain DNN watermarking framework named SpecMark that embeds the watermark without model re-training and is robust against transfer learning. I adapt the idea of spread-spectrum watermarking from the conventional multimedia domain to protect the IP of model designers using spectral watermarking. The effectiveness and robustness of SpecMark are corroborated on various automatic speech recognition datasets. Demonstrating a targeted Trojan attack against DNNs named ProFlip that exploits bit-flipping techniques (particularly Row Hammer attacks) for Trojan insertion. Unlike previous Neural Trojan attacks that require poisoned training to backdoor the model, ProFlip can embed the Trojan after model deployment. To this end, I develop a new layer-wise sensitivity analysis technique to pinpoint the vulnerable layer for attack and a novel critical bit search algorithm that identifies the most susceptible weight bits.
Designing a black-box Trojan detection and mitigation framework called DeepInspect that can assess a pre-trained DL model and determine whether it has been backdoored. DeepInspect identifies the footprint of Trojan insertion by learning the probability distribution of potential triggers with a conditional generative model. DeepInspect further leverages the trained generator to patch the model for higher Trojan robustness. Proposing a genetic-algorithm-based logic unlocking scheme named GenUnlock that achieves better runtime efficiency than its satisfiability (SAT)-based counterparts. GenUnlock performs fast and effective key searching through algorithm/hardware co-design and an ensemble-based method. Empirical results show that GenUnlock reduces the attack runtime by an average of 4.68× compared to SAT-based attacks. Introducing a new logic-testing-based Hardware Trojan detection framework named AdaTest that combines Reinforcement Learning (RL) and adaptive sampling. AdaTest achieves dynamic and progressive test pattern generation by defining a domain-specific reward function for circuits that characterizes both the static and dynamic properties of the circuit status. Experimental results show that AdaTest obtains higher Trojan coverage with shorter test pattern generation time than prior art.
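SpecMark is described as adapting spread-spectrum watermarking from the multimedia domain to model weights in the spectral domain. The sketch below shows only the classic spread-spectrum idea, under stated assumptions: a key-seeded ±1 chip sequence is added, scaled by a small strength alpha, to mid-frequency DCT coefficients of a flattened weight vector, and detection correlates the spectral difference between the suspect and original weights with the key's chips (informed detection, which requires the original weights). The band choice, alpha, the detection rule, and all function names are hypothetical and are not taken from SpecMark.

```python
import math
import random


def _norm(k, n):
    return math.sqrt((1 if k == 0 else 2) / n)


def dct(x):
    """Orthonormal DCT-II, written out directly (O(n^2) is fine for a sketch)."""
    n = len(x)
    return [_norm(k, n) * sum(v * math.cos(math.pi * (i + 0.5) * k / n)
                              for i, v in enumerate(x))
            for k in range(n)]


def idct(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    return [sum(_norm(k, n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                for k in range(n))
            for i in range(n)]


def chips(key, m):
    """Pseudo-random +/-1 spreading sequence derived from the owner's key."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(m)]


def band(n):
    """Mid-frequency coefficients that carry the mark (an assumed choice)."""
    return range(n // 4, 3 * n // 4)


def embed(weights, key, alpha=0.005):
    coeffs = dct(weights)
    idx = band(len(weights))
    for j, s in zip(idx, chips(key, len(idx))):
        coeffs[j] += alpha * s
    return idct(coeffs)


def detect(original, suspect, key):
    """Correlate the spectral difference with the key's chips; about alpha if marked with key."""
    diff = [a - b for a, b in zip(dct(suspect), dct(original))]
    idx = band(len(original))
    return sum(diff[j] * s for j, s in zip(idx, chips(key, len(idx)))) / len(idx)


rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(256)]
marked = embed(weights, key=1234)
print(detect(weights, marked, key=1234))  # close to alpha = 0.005
print(detect(weights, marked, key=9999))  # close to 0
```

Because the DCT is orthonormal, the total weight perturbation has an L2 norm of exactly alpha times the square root of the number of marked coefficients, so alpha directly trades detectability against distortion.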

Deep Learning: Algorithms and Applications

Author: Witold Pedrycz

Publisher: Springer Nature

Published: 2019-10-23

Total Pages: 360

ISBN-13: 3030317609

This book presents a wealth of deep-learning algorithms and demonstrates their design process. It also highlights the need to align these algorithms carefully with the essential characteristics of the learning problems encountered in practice. Intended for readers interested in acquiring practical knowledge of the analysis, design, and deployment of deep learning solutions to real-world problems, it covers a wide range of the paradigm's algorithms and their applications in diverse areas, including imaging, seismic tomography, smart grids, surveillance and security, and health care, among others. Featuring systematic and comprehensive discussions of the development processes, their evaluation, and their relevance, the book offers insights into fundamental design strategies for deep learning algorithms.
