Shared Virtual Memory for Heterogeneous Embedded Systems on Chips
Author: Pirmin Robert Vogel
Publisher:
Published: 2018
Total Pages: 197
ISBN-13: 9783866286238
Author: Andreas Dominic Kurth
Publisher: BoD – Books on Demand
Published: 2022-10-05
Total Pages: 282
ISBN-13: 3866287747
Heterogeneous systems on chip (HeSoCs) combine general-purpose, feature-rich multi-core host processors with domain-specific programmable many-core accelerators (PMCAs) to unite versatility with energy efficiency and peak performance. By virtue of their heterogeneity, HeSoCs hold the promise of increasing performance and energy efficiency compared to homogeneous multiprocessors, because applications can be executed on hardware that is designed for them. However, this heterogeneity also increases system complexity substantially. This thesis presents the first research platform for HeSoCs where all components, from accelerator cores to application programming interface, are available under permissive open-source licenses. We begin by identifying the hardware and software components that are required in HeSoCs and by designing a representative hardware and software architecture. We then design, implement, and evaluate four critical HeSoC components that have not been discussed in research at the level required for an open-source implementation: First, we present a modular, topology-agnostic, high-performance on-chip communication platform, which adheres to a state-of-the-art industry-standard protocol. We show that the platform can be used to build high-bandwidth (e.g., 2.5 GHz and 1024 bit data width) end-to-end communication fabrics with high degrees of concurrency (e.g., up to 256 independent concurrent transactions). Second, we present a modular and efficient solution for implementing atomic memory operations in highly-scalable many-core processors, which demonstrates near-optimal linear throughput scaling for various synthetic and real-world workloads and requires only 0.5 kGE per core.
Third, we present a hardware-software solution for shared virtual memory that avoids the majority of translation lookaside buffer misses with prefetching, supports parallel burst transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our work improves accelerator performance for memory-intensive kernels by up to 4×. Fourth, we present a software toolchain for mixed-data-model heterogeneous compilation and OpenMP offloading. Our work enables transparent memory sharing between a 64-bit host processor and a 32-bit accelerator at overheads below 0.7% compared to 32-bit-only execution. Finally, we combine our contributions into a research platform for state-of-the-art HeSoCs and demonstrate its performance and flexibility.
Author: Matheus Cavalcante
Publisher: BoD – Books on Demand
Published: 2023-08-24
Total Pages: 224
ISBN-13: 3866288018
In his seminal Turing Award Lecture, Backus discussed the issues stemming from the word-at-a-time style of programming inherited from the von Neumann computer. More than forty years later, computer architects must be creative to amortize the von Neumann Bottleneck (VNB) associated with fetching and decoding instructions which only keep the datapath busy for a very short period of time. In particular, vector processors promise to be one of the most efficient architectures to tackle the VNB, by amortizing the energy overhead of instruction fetching and decoding over several chunks of data. This work explores vector processing as an option to build small and efficient processing elements for large-scale clusters of cores sharing access to tightly-coupled L1 memory.
Author: Michael Stefano Fritz Schaffner
Publisher: BoD – Books on Demand
Published: 2018-10-24
Total Pages: 294
ISBN-13: 3866286244
Multiview autostereoscopic displays (MADs) make it possible to view video content in 3D without wearing special glasses, and such displays have recently become available. The main problem of MADs is that they require several (typically 8 or 9) views, while most of the 3D video content is in stereoscopic 3D today. To bridge this content-display gap, the research community started to devise automatic multiview synthesis (MVS) methods. Common MVS methods are based on depth-image-based rendering, where a dense depth map of the scene is used to reproject the image to new viewpoints. Although physically correct, this approach requires accurate depth maps and additional inpainting steps. Our work uses an alternative conversion concept based on image domain warping (IDW), which has been successfully applied to related problems such as aspect ratio retargeting for streaming video and disparity remapping for depth adjustments in stereoscopic 3D content. IDW shows promising performance in this context as it only requires robust, sparse point correspondences and no inpainting steps. However, MVS, whether using IDW or alternative approaches, is computationally demanding and requires real-time processing, yet such methods should be portable to end-user and even mobile devices to develop their full potential. To this end, this thesis investigates efficient algorithms and hardware architectures for a variety of subproblems arising in the MVS pipeline.
Author: Miguel Peón Quirós
Publisher: Springer Nature
Published: 2020-01-30
Total Pages: 214
ISBN-13: 3030374327
This book defines and explores the problem of placing the instances of dynamic data types on the components of the heterogeneous memory organization of an embedded system, with the final goal of reducing energy consumption and improving performance. It is one of the first to cover the problem of placement for dynamic data objects on embedded systems with heterogeneous memory architectures, presenting a complete methodology that can be easily adapted to real cases and work flows. The authors discuss how to improve system performance and energy consumption simultaneously. Discusses the problem of placement for dynamic data objects on embedded systems with heterogeneous memory architectures; Presents a complete methodology that can be adapted easily to real cases and work flows; Offers hints on how to improve system performance and energy consumption simultaneously.
Author: Wen-mei W. Hwu
Publisher: Morgan Kaufmann
Published: 2015-11-20
Total Pages: 207
ISBN-13: 0128008016
Heterogeneous Systems Architecture - a new compute platform infrastructure presents a next-generation hardware platform, and associated software, that allows processors of different types to work efficiently and cooperatively in shared memory from a single source program. HSA also defines a virtual ISA for parallel routines or kernels, which is vendor- and ISA-independent, thus enabling single-source programs to execute across any HSA-compliant heterogeneous processor, from those used in smartphones to supercomputers. The book begins with an overview of the evolution of heterogeneous parallel processing, associated problems, and how they are overcome with HSA. Later chapters provide a deeper perspective on topics such as the runtime, memory model, queuing, context switching, the architected queuing language, simulators, and tool chains. Finally, three real-world examples are presented, which provide an early demonstration of how HSA can deliver significantly higher performance through C++-based applications. Contributing authors are HSA Foundation members who are experts from both academia and industry. Some of these distinguished authors are listed here in alphabetical order: Yeh-Ching Chung, Benedict R. Gaster, Juan Gómez-Luna, Derek Hower, Lee Howes, Shih-Hao Hung, Thomas B. Jablin, David Kaeli, Phil Rogers, Ben Sander, I-Jui (Ray) Sung.
- Provides clear and concise explanations of key HSA concepts and fundamentals by expert HSA Specification contributors
- Explains how performance-bound programming algorithms and application types can be significantly optimized by utilizing HSA hardware and software features
- Presents HSA simply, clearly, and concisely without reading the detailed HSA Specification documents
- Demonstrates ideal mapping of processing resources from CPUs to many other heterogeneous processors that comply with HSA Specifications
Author: Florian Stefan Glaser
Publisher: BoD – Books on Demand
Published: 2022-12-02
Total Pages: 216
ISBN-13: 3866287771
An aging population and the resulting ever-rising cost of health services call for novel and innovative solutions for providing medical care and services. So far, medical care is primarily provided in the form of time-consuming in-person appointments with trained personnel and expensive, stationary instrumentation equipment. As for many current and past challenges, the advances in microelectronics are a crucial enabler and offer a plethora of opportunities. With key building blocks such as sensing, processing, and communication systems and circuits getting smaller, cheaper, and more energy-efficient, personal and wearable or even implantable point-of-care devices with medical-grade instrumentation capabilities become feasible. Device size and battery lifetime are paramount for the realization of such devices. Besides integrating the required functionality into as few individual microelectronic components as possible, the energy efficiency of these components is crucial to reduce battery size, which is usually the dominant contributor to overall device size. In this thesis, we present two major contributions to achieve the discussed goals in the context of miniaturized medical instrumentation: First, we present a synchronization solution for embedded, parallel near-threshold computing (NTC), a promising concept for enabling the required processing capabilities with an energy efficiency that is suitable for highly mobile devices with very limited battery capacity. Our proposed solution aims at increasing energy efficiency and performance for parallel NTC clusters by maximizing the effective utilization of the available cores under parallel workloads. We describe a hardware unit that enables fine-grain parallelization by greatly optimizing and accelerating core-to-core synchronization and communication and analyze the impact of those mechanisms on the overall performance and energy efficiency of an eight-core cluster.
With a range of digital signal processing (DSP) applications typical for the targeted systems, the proposed hardware unit improves performance by up to 92% (23% on average) and energy efficiency by up to 98% (39% on average). In the second part, we present an MCU processing and control subsystem (MPCS) for the integration into VivoSoC, a highly versatile single-chip solution for mobile medical instrumentation. In addition to the MPCS, VivoSoC includes a multitude of analog front-ends (AFEs) and a multi-channel power management IC (PMIC) for voltage conversion. ...
Author: David R. Kaeli
Publisher: Morgan Kaufmann
Published: 2015-06-18
Total Pages: 330
ISBN-13: 0128016493
Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units (APUs). This fully-revised edition includes the latest enhancements in OpenCL 2.0 including:
• Shared virtual memory to increase programming flexibility and reduce data transfers that consume resources
• Dynamic parallelism which reduces processor load and avoids bottlenecks
• Improved imaging support and integration with OpenGL
Designed to work on multiple platforms, OpenCL will help you more effectively program for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book explores memory spaces, optimization techniques, extensions, debugging and profiling. Multiple case studies and examples illustrate high-performance algorithms, distributing work across heterogeneous systems, embedded domain-specific languages, and will give you hands-on OpenCL experience to address a range of fundamental parallel algorithms. Updated content to cover the latest developments in OpenCL 2.0, including improvements in memory handling, parallelism, and imaging support; Explanations of principles and strategies to learn parallel programming with OpenCL, from understanding the abstraction models to thoroughly testing and debugging complete applications; Example code covering image analytics, web plugins, particle simulations, video editing, performance optimization, and more.
Author: Rainer Leupers
Publisher: Springer Science & Business Media
Published: 2010-09-15
Total Pages: 343
ISBN-13: 1441961755
Simulation of computer architectures has made rapid progress recently. The primary application areas are hardware/software performance estimation and optimization as well as functional and timing verification. Recent, innovative technologies such as retargetable simulator generation, dynamic binary translation, or sampling simulation have enabled widespread use of processor and system-on-chip (SoC) simulation tools in the semiconductor and embedded system industries. At the same time, processor and SoC simulation is still a very active research area, e.g., with regard to higher simulation speed, flexibility, and accuracy/speed trade-offs. This book presents and discusses the principal technologies and state-of-the-art in high-level hardware architecture simulation, both at the processor and the system-on-chip level.
Author: Barbara Chapman
Publisher: IOS Press
Published: 2010
Total Pages: 760
ISBN-13: 1607505290
From Multicores and GPUs to Petascale. Parallel computing technologies have brought dramatic changes to mainstream computing; the majority of today's PCs, laptops and even notebooks incorporate multiprocessor chips with up to four processors. Standard components are increasingly combined with GPUs (Graphics Processing Units), originally designed for high-speed graphics processing, and FPGAs (Field-Programmable Gate Arrays) to build parallel computers with a wide spectrum of high-speed processing functions. The scale of this powerful hardware is limited only by factors such as energy consumption and thermal control. However, in addition to