This book constitutes the refereed proceedings of the 9th International Workshop on OpenMP, held in Canberra, Australia, in September 2013. The 14 technical full papers presented were carefully reviewed and selected from various submissions. The papers are organized in topical sections on proposed extensions to OpenMP, applications, accelerators, scheduling, and tools.
This book constitutes the proceedings of the 12th International Workshop on OpenMP, IWOMP 2016, held in Nara, Japan, in October 2016. The 24 full papers presented in this volume were carefully reviewed and selected from 28 submissions. They are organized in topical sections named: applications, locality, task parallelism, extensions, tools, accelerator programming, and performance evaluations and optimization.
High Performance Computing (HPC) remains a driver that offers huge potential and benefits for science and society. However, a profound understanding of the computational matters and specialized software is needed to arrive at effective and efficient simulations. Dedicated software tools are important parts of the HPC software landscape, and support application developers. Even though a tool is by definition not a part of an application, but rather a supplemental piece of software, it can make a fundamental difference during the development of an application. Such tools aid application developers in debugging, performance analysis, and code optimization, and therefore make a major contribution to the development of robust and efficient parallel software. This book introduces a selection of the tools presented and discussed at the 9th International Parallel Tools Workshop held in Dresden, Germany, September 2-3, 2015, which offered an established forum for discussing the latest advances in parallel tools.
The only book to offer special coverage of the fundamentals of multicore DSP for implementation on the TMS320C66xx SoC. This unique book provides readers with an understanding of the TMS320C66xx SoC as well as its constraints. It offers critical analysis of each element, which not only broadens their knowledge of the subject, but aids them in gaining a better understanding of how these elements work so well together. Written by Texas Instruments’ First DSP Educator Award winner, Naim Dahnoun, the book teaches readers how to use the development tools, take advantage of the maximum performance and functionality of this processor, and understand the rich content, which spans architecture, development tools, and programming models such as OpenCL and OpenMP, through to debugging tools. It also covers various multicore audio and image applications in detail. Additionally, this one-of-a-kind book is supplemented with: a rich set of tested laboratory exercises and solutions; audio and image processing applications source code for the Code Composer Studio (integrated development environment from Texas Instruments); and multiple tables and illustrations. With no other book on the market offering any coverage at all on the subject and its rich content with twenty chapters, Multicore DSP: From Algorithms to Real-time Implementation on the TMS320C66x SoC is a rare and much-needed source of information for undergraduates and postgraduates in the field that allows them to make real-time applications work in a relatively short period of time. It is also incredibly beneficial to hardware and software engineers involved in programming real-time embedded systems.
This book offers the first comprehensive view on integrated circuit and system design for the Internet of Things (IoT), and in particular for the tiny nodes at its edge. The authors provide a fresh perspective on how the IoT will evolve based on recent and foreseeable trends in the semiconductor industry, highlighting the key challenges, as well as the opportunities for circuit and system innovation to address them. This book describes what the IoT really means from the design point of view, and how the constraints imposed by applications translate into integrated circuit requirements and design guidelines. Chapter contributions come equally from industry and academia. After providing a system perspective on IoT nodes, this book focuses on state-of-the-art design techniques for IoT applications, encompassing the fundamental sub-systems encountered in Systems on Chip for IoT: ultra-low power digital architectures and circuits; low- and zero-leakage memories (including emerging technologies); circuits for hardware security and authentication; System on Chip design methodologies; on-chip power management and energy harvesting; ultra-low power analog interfaces and analog-digital conversion; short-range radios; miniaturized battery technologies; and packaging and assembly of IoT integrated systems (on silicon and non-silicon substrates). As a common thread, all chapters conclude with a prospective view on the foreseeable evolution of the related technologies for IoT. The concepts developed throughout the book are exemplified by two IoT node system demonstrations from industry.
The unique balance between breadth and depth of this book: enables expert readers to quickly develop an understanding of the specific challenges and state-of-the-art solutions for IoT, as well as their evolution in the foreseeable future; provides non-experts with a comprehensive introduction to integrated circuit design for IoT, and serves as an excellent starting point for further learning, thanks to the broad coverage of topics and selected references; and makes it very well suited for practicing engineers and scientists working in hardware and chip design for IoT, and as a textbook for senior undergraduate, graduate and postgraduate students (familiar with analog and digital circuits).
As predicted by Gordon E. Moore in 1965, the performance of computer processors increased at an exponential rate. Nevertheless, the increases in computing speeds of single processor machines were eventually curtailed by physical constraints. This led to the development of parallel computing, and whilst progress has been made in this field, the complexities of parallel algorithm design, the deficiencies of the available software development tools and the complexity of scheduling tasks over thousands and even millions of processing nodes represent a major challenge to the construction and use of more powerful parallel systems. This book presents the proceedings of the biennial International Conference on Parallel Computing (ParCo2015), held in Edinburgh, Scotland, in September 2015. Topics covered include computer architecture and performance, programming models and methods, as well as applications. The book also includes two invited talks and a number of mini-symposia. Exascale computing holds enormous promise in terms of increasing scientific knowledge acquisition and thus contributing to the future well-being and prosperity of mankind. A number of innovative approaches to the development and use of future high-performance and high-throughput systems are to be found in this book, which will be of interest to all those whose work involves the handling and processing of large amounts of data.
Heterogeneous systems on chip (HeSoCs) combine general-purpose, feature-rich multi-core host processors with domain-specific programmable many-core accelerators (PMCAs) to unite versatility with energy efficiency and peak performance. By virtue of their heterogeneity, HeSoCs hold the promise of increasing performance and energy efficiency compared to homogeneous multiprocessors, because applications can be executed on hardware that is designed for them. However, this heterogeneity also increases system complexity substantially. This thesis presents the first research platform for HeSoCs where all components, from accelerator cores to application programming interface, are available under permissive open-source licenses. We begin by identifying the hardware and software components that are required in HeSoCs and by designing a representative hardware and software architecture. We then design, implement, and evaluate four critical HeSoC components that have not been discussed in research at the level required for an open-source implementation: First, we present a modular, topology-agnostic, high-performance on-chip communication platform, which adheres to a state-of-the-art industry-standard protocol. We show that the platform can be used to build high-bandwidth (e.g., 2.5 GHz and 1024 bit data width) end-to-end communication fabrics with high degrees of concurrency (e.g., up to 256 independent concurrent transactions). Second, we present a modular and efficient solution for implementing atomic memory operations in highly-scalable many-core processors, which demonstrates near-optimal linear throughput scaling for various synthetic and real-world workloads and requires only 0.5 kGE per core. 
Third, we present a hardware-software solution for shared virtual memory that avoids the majority of translation lookaside buffer misses with prefetching, supports parallel burst transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our work improves accelerator performance for memory-intensive kernels by up to 4×. Fourth, we present a software toolchain for mixed-data-model heterogeneous compilation and OpenMP offloading. Our work enables transparent memory sharing between a 64-bit host processor and a 32-bit accelerator at overheads below 0.7 % compared to 32-bit-only execution. Finally, we combine our contributions into a research platform for state-of-the-art HeSoCs and demonstrate its performance and flexibility.
The demand for mobile broadband will continue to increase in upcoming years, largely driven by the need to deliver ultra-high definition video. 5G is evolutionary in that it provides higher bandwidth and lower latency than the current-generation technology. More importantly, 5G is revolutionary in that it is expected to enable fundamentally new applications with much more stringent requirements in latency and bandwidth. 5G should help solve the last-mile/last-kilometer problem and provide broadband access to the next billion users on earth at a much lower cost because of its use of new spectrum and its improvements in spectral efficiency. 5G wireless access networks will need to combine several innovative aspects of decentralized and centralized allocation to maximize performance and minimize signaling load. Research is currently being conducted to understand the inspirations, requirements, and promising technical options to boost and enrich activities in 5G. Design Methodologies and Tools for 5G Network Development and Application presents the enhancement methods of 5G communication, explores the methods for faster communication, and provides a promising alternative solution that equips designers with the capability to produce high performance, scalable, and adoptable communication protocols. This book provides complete design methodologies, supporting tools for 5G communication, and innovative works. It covers the design and evaluation of different proposed 5G structures, signal integrity, reliability, low-power techniques, application mapping, testing, and future trends. This book is ideal for researchers who are working in communication, networks, design and implementations, industry personnel, engineers, practitioners, academicians, and students who are interested in the evolution, importance, usage, and technology adoption for 5G applications.
This book presents the proceedings of the 12th International Parallel Tools Workshop, held in Stuttgart, Germany, during September 17-18, 2018, and of the 13th International Parallel Tools Workshop, held in Dresden, Germany, during September 2-3, 2019. The workshops are a forum to discuss the latest advances in parallel tools for high-performance computing. High-performance computing plays an increasingly important role for numerical simulation and modeling in academic and industrial research. At the same time, using large-scale parallel systems efficiently is becoming more difficult. A number of tools addressing parallel program development and analysis have emerged from the high-performance computing community over the last decade, and what may have started as a collection of small helper scripts has now matured into production-grade frameworks. Powerful user interfaces and an extensive body of documentation together create a user-friendly environment for parallel tools.
Numerical simulation and modelling using High Performance Computing has evolved into an established technique in academic and industrial research. At the same time, the High Performance Computing infrastructure is becoming ever more complex. For instance, most of the current top systems around the world use thousands of nodes in which classical CPUs are combined with accelerator cards in order to enhance their compute power and energy efficiency. This complexity can only be mastered with adequate development and optimization tools. Key topics addressed by these tools include parallelization on heterogeneous systems, performance optimization for CPUs and accelerators, debugging of increasingly complex scientific applications, and optimization of energy usage in the spirit of green IT. This book represents the proceedings of the 8th International Parallel Tools Workshop, held October 1-2, 2014, in Stuttgart, Germany, a forum to discuss the latest advancements in parallel tools.