Hardware and Compiler-directed Cache Coherence in Large-scale Multiprocessors

Hardware and Compiler-directed Cache Coherence in Large-scale Multiprocessors

Author: Lynn Choi

Publisher:

Published: 1996

Total Pages: 40

ISBN-13:

DOWNLOAD EBOOK

Abstract: "In this paper, we study a hardware-supported, compiler-directed (HSCD) cache coherence scheme, which can be implemented on a large-scale multiprocessor using off-the-shelf microprocessors, such as the Cray T3D. The scheme can be adapted to various cache organizations, including multi-word cache lines and byte-addressable architectures. Several system related issues, including critical sections, inter-thread communication, and task migration have also been addressed. The cost of the required hardware support is minimal and proportional to the cache size. The necessary compiler algorithms, including intra- and interprocedural array data flow analysis, have been implemented on the Polaris parallelizing compiler [33]. From our simulation study using the Perfect Club benchmarks [5], we found that in spite of the conservative analysis made by the compiler, the performance of the proposed HSCD scheme can be comparable to that of a full-map hardware directory scheme. Given its comparable performance and reduced hardware cost, the proposed scheme can be a viable alternative for large-scale multiprocessors such as the Cray T3D, which rely on users to maintain data coherence."


Languages and Compilers for Parallel Computing

Languages and Compilers for Parallel Computing

Author: Siddharta Chatterjee

Publisher: Springer

Published: 2003-06-26

Total Pages: 395

ISBN-13: 3540483195

DOWNLOAD EBOOK

LCPC’98 Steering and Program Committes for their time and energy in - viewing the submitted papers. Finally, and most importantly, we thank all the authors and participants of the workshop. It is their signi cant research work and their enthusiastic discussions throughout the workshopthat made LCPC’98 a success. May 1999 Siddhartha Chatterjee Program Chair Preface The year 1998 marked the eleventh anniversary of the annual Workshop on Languages and Compilers for Parallel Computing (LCPC), an international - rum for leading research groups to present their current research activities and latest results. The LCPC community is interested in a broad range of te- nologies, with a common goal of developing software systems that enable real applications. Amongthetopicsofinteresttotheworkshoparelanguagefeatures, communication code generation and optimization, communication libraries, d- tributed shared memory libraries, distributed object systems, resource m- agement systems, integration of compiler and runtime systems, irregular and dynamic applications, performance evaluation, and debuggers. LCPC’98 was hosted by the University of North Carolina at Chapel Hill (UNC-CH) on 7 - 9 August 1998, at the William and Ida Friday Center on the UNC-CH campus. Fifty people from the United States, Europe, and Asia attended the workshop. The program committee of LCPC’98, with the help of external reviewers, evaluated the submitted papers. Twenty-four papers were selected for formal presentation at the workshop. Each session was followed by an open panel d- cussion centered on the main topic of the particular session.


A Primer on Memory Consistency and Cache Coherence

A Primer on Memory Consistency and Cache Coherence

Author: Vijay Nagarajan

Publisher: Morgan & Claypool Publishers

Published: 2020-02-04

Total Pages: 296

ISBN-13: 1681737108

DOWNLOAD EBOOK

Many modern computer systems, including homogeneous and heterogeneous architectures, support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems. This second edition reflects a decade of advancements since the first edition and includes, among other more modest changes, two new chapters: one on consistency and coherence for non-CPU accelerators (with a focus on GPUs) and one that points to formal work and tools on consistency and coherence.


Proceedings of the 1993 International Conference on Parallel Processing

Proceedings of the 1993 International Conference on Parallel Processing

Author: C.Y. Roger Chen

Publisher: CRC Press

Published: 1993-08-16

Total Pages: 392

ISBN-13: 9780849389849

DOWNLOAD EBOOK

This three-volume work presents a compendium of current and seminal papers on parallel/distributed processing offered at the 22nd International Conference on Parallel Processing, held August 16-20, 1993 in Chicago, Illinois. Topics include processor architectures; mapping algorithms to parallel systems, performance evaluations; fault diagnosis, recovery, and tolerance; cube networks; portable software; synchronization; compilers; hypercube computing; and image processing and graphics. Computer professionals in parallel processing, distributed systems, and software engineering will find this book essential to their complete computer reference library.


The Cache Coherence Problem in Shared-Memory Multiprocessors

The Cache Coherence Problem in Shared-Memory Multiprocessors

Author: Igor Tartalja

Publisher: Wiley-IEEE Computer Society Press

Published: 1996-02-13

Total Pages: 368

ISBN-13:

DOWNLOAD EBOOK

The book illustrates state-of-the-art software solutions for cache coherence maintenance in shared-memory multiprocessors. It begins with a brief overview of the cache coherence problem and introduces software solutions to the problem. The text defines and details static and dynamic software schemes, techniques for modeling performance evaluation mechanisms, and performance evaluation studies.


Encyclopedia of Parallel Computing

Encyclopedia of Parallel Computing

Author: David Padua

Publisher: Springer Science & Business Media

Published: 2011-09-08

Total Pages: 2211

ISBN-13: 0387097651

DOWNLOAD EBOOK

Containing over 300 entries in an A-Z format, the Encyclopedia of Parallel Computing provides easy, intuitive access to relevant information for professionals and researchers seeking access to any aspect within the broad field of parallel computing. Topics for this comprehensive reference were selected, written, and peer-reviewed by an international pool of distinguished researchers in the field. The Encyclopedia is broad in scope, covering machine organization, programming languages, algorithms, and applications. Within each area, concepts, designs, and specific implementations are presented. The highly-structured essays in this work comprise synonyms, a definition and discussion of the topic, bibliographies, and links to related literature. Extensive cross-references to other entries within the Encyclopedia support efficient, user-friendly searchers for immediate access to useful information. Key concepts presented in the Encyclopedia of Parallel Computing include; laws and metrics; specific numerical and non-numerical algorithms; asynchronous algorithms; libraries of subroutines; benchmark suites; applications; sequential consistency and cache coherency; machine classes such as clusters, shared-memory multiprocessors, special-purpose machines and dataflow machines; specific machines such as Cray supercomputers, IBM’s cell processor and Intel’s multicore machines; race detection and auto parallelization; parallel programming languages, synchronization primitives, collective operations, message passing libraries, checkpointing, and operating systems. Topics covered: Speedup, Efficiency, Isoefficiency, Redundancy, Amdahls law, Computer Architecture Concepts, Parallel Machine Designs, Benmarks, Parallel Programming concepts & design, Algorithms, Parallel applications. This authoritative reference will be published in two formats: print and online. The online edition features hyperlinks to cross-references and to additional significant research. Related Subjects: supercomputing, high-performance computing, distributed computing


Scalable Shared-Memory Multiprocessing

Scalable Shared-Memory Multiprocessing

Author: Daniel E. Lenoski

Publisher: Elsevier

Published: 2014-06-28

Total Pages: 364

ISBN-13: 1483296016

DOWNLOAD EBOOK

Dr. Lenoski and Dr. Weber have experience with leading-edge research and practical issues involved in implementing large-scale parallel systems. They were key contributors to the architecture and design of the DASH multiprocessor. Currently, they are involved with commercializing scalable shared-memory technology.


Distributed Sparse Gaussian Elimination and Orthogonal Factorization

Distributed Sparse Gaussian Elimination and Orthogonal Factorization

Author: Padma Raghavan

Publisher:

Published: 1993

Total Pages: 40

ISBN-13:

DOWNLOAD EBOOK

Abstract: "We consider the solution of a linear system Ax = b on a distributed memory machine when the matrix A has full rank and is large, sparse and nonsymmetric. We use our Cartesian Nested Dissection algorithm to compute a fill-reducing column ordering of the matrix. We develop algorithms that use the associated separator tree to estimate the structure of the factor and to distribute and perform numeric computations. When the matrix is nonsymmetric but square, the numeric computations involve Gaussian elimination with row pivoting; when the matrixis overdetermined, row-oriented Householder transforms are applied to compute the triangular factor of an orthogonal factorization. We compare the fill incurred by our approach to that incurred by well known sequential methods and report on the performance of our implementation on the Intel iPSC/860."