The Evolution of Fault-Tolerant Computing

Author: A. Avizienis

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 467

ISBN-10: 3709188717

For the editors of this book, as well as for many other researchers in the area of fault-tolerant computing, Dr. William Caswell Carter is one of the key figures in the formation and development of this important field. We felt that the meeting of IFIP Working Group 10.4 at Baden, Austria, in June 1986, which coincided with an important step in Bill's career, was an appropriate occasion to honor Bill's contributions and achievements by organizing a one-day "Symposium on the Evolution of Fault-Tolerant Computing" in honor of William C. Carter. The Symposium, held on June 30, 1986, brought together a group of eminent scientists from all over the world to discuss the evolution, the state of the art, and the future perspectives of the field of fault-tolerant computing. Historic developments in academia and industry were presented by individuals who have themselves been actively involved in bringing them about. The Symposium proved to be a unique historic event, and these Proceedings, which contain the final versions of the papers presented at Baden, are an authentic reference document.


Fault-Tolerant Computing Symposium (FTCS-22)

Author: IEEE Computer Society Press

Publisher:

Published: 1992

Total Pages: 560

ISBN-13: 9780818628757

The July 1992 conference, held in Boston, Massachusetts, featured 59 papers on architecture, recovery, communication protocols, self-checking and diagnosis, modeling and simulation, fault-tolerant hypercubes and meshes, scheduling, and fault classification, among other related topics. No index.


Fault-Tolerant Systems

Author: Israel Koren

Publisher: Morgan Kaufmann

Published: 2020-09-01

Total Pages: 418

ISBN-10: 0128181060

Fault-Tolerant Systems, Second Edition, is the first book on fault tolerance design utilizing a systems approach to both hardware and software. No other text takes this approach or offers the comprehensive and up-to-date treatment that Koren and Krishna provide. The book comprehensively covers the design of fault-tolerant hardware and software, use of fault-tolerance techniques to improve manufacturing yields, and design and analysis of networks. Incorporating case studies that highlight more than ten different computer systems with fault-tolerance techniques implemented in their design, the book includes critical material on methods to protect against threats to encryption subsystems used for security purposes. The text's updated content will help students and practitioners in electrical and computer engineering and computer science learn how to design reliable computing systems, and how to analyze fault-tolerant computing systems.

- Delivers the first book on fault tolerance design with a systems approach
- Offers comprehensive coverage of both hardware and software fault tolerance, as well as information and time redundancy
- Features fully updated content plus new chapters on failure mechanisms and fault-tolerance in cyber-physical systems
- Provides a complete ancillary package, including an on-line solutions manual for instructors and PowerPoint slides


Fehlertolerierende Rechensysteme / Fault-tolerant Computing Systems

Author: Winfried Görke

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 400

ISBN-10: 3642750028

This book contains the contributions to the 4th GI/ITG/GMA Conference on Fault-Tolerant Computing Systems (Fehlertolerierende Rechensysteme), held in September 1989 as part of a series of conferences that previously took place in München (1982), Bonn (1984), and Bremerhaven (1987). The 31 contributions, 4 of them invited, are written partly in German but predominantly in English. Taken together, they document the development of the design and implementation of fault-tolerant systems over the preceding two years, above all in Europe. All contributions report new research or development results.


Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design

Author: Xiaowei Li

Publisher: Springer Nature

Published: 2023-03-01

Total Pages: 318

ISBN-10: 9811985510

With the end of Dennard scaling and Moore’s law, IC chips, especially large-scale ones, now face more reliability challenges, and reliability has become one of the mainstay merits of VLSI designs. In this context, this book presents a built-in on-chip fault-tolerant computing paradigm that seeks to combine fault detection, fault diagnosis, and error recovery in large-scale VLSI design in a unified manner so as to minimize resource overhead and performance penalties. Following this computing paradigm, we propose a holistic solution based on three key components: self-test, self-diagnosis and self-repair, or “3S” for short. We then explore the use of 3S for general IC designs, general-purpose processors, network-on-chip (NoC) and deep learning accelerators, and present prototypes to demonstrate how 3S responds to in-field silicon degradation and recovery under various runtime faults caused by aging, process variations, or radical particles. Moreover, we demonstrate that 3S not only offers a powerful backbone for various on-chip fault-tolerant designs and implementations, but also has farther-reaching implications such as maintaining graceful performance degradation, mitigating the impact of verification blind spots, and improving chip yield. This book is the outcome of extensive fault-tolerant computing research pursued at the State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences over the past decade. The proposed built-in on-chip fault-tolerant computing paradigm has been verified in a broad range of scenarios, from small processors in satellite computers to large processors in HPCs. Hopefully, it will provide an alternative yet effective solution to the growing reliability challenges for large-scale VLSI designs.


Middleware 2006

Author: Maarten van Steen

Publisher: Springer Science & Business Media

Published: 2006-11-10

Total Pages: 436

ISBN-10: 354049023X

This book constitutes the refereed proceedings of the ACM/IFIP/USENIX 7th International Middleware Conference 2006, held in Melbourne, Australia, in November/December 2006. The 21 revised full papers presented were carefully reviewed and selected from 122 submissions. The papers are organized in topical sections on performance, composition, management, publish/subscribe technology, databases, mobile and ubiquitous computing, security, and data mining techniques.


Fault-Tolerant Real-Time Systems

Author: Stefan Poledna

Publisher: Springer Science & Business Media

Published: 2007-11-23

Total Pages: 161

ISBN-10: 0585295808

Real-time computer systems are very often subject to dependability requirements because of their application areas. Fly-by-wire airplane control systems, control of power plants, industrial process control systems, and others are required to continue their function despite faults. Fault-tolerance and real-time requirements thus constitute a kind of natural combination in process control applications. Systematic fault-tolerance is based on redundancy, which is used to mask failures of individual components. The problem of replica determinism is thereby to ensure that replicated components show consistent behavior in the absence of faults. It might seem trivial that, given an identical sequence of inputs, replicated computer systems will produce consistent outputs. Unfortunately, this is not the case. The problem of replica non-determinism and the presentation of its possible solutions is the subject of Fault-Tolerant Real-Time Systems: The Problem of Replica Determinism. The field of automotive electronics is an important application area of fault-tolerant real-time systems. Systems like anti-lock braking, engine control, active suspension, or vehicle dynamics control have demanding real-time and fault-tolerance requirements. These requirements have to be met even in the presence of very limited resources, since cost is extremely important. Because of these interesting properties, Fault-Tolerant Real-Time Systems gives an introduction to the application area of automotive electronics. The requirements of automotive electronics are a topic of discussion in the remainder of this work and are used as a benchmark to evaluate solutions to the problem of replica determinism.


Encyclopedia of Software Engineering Three-Volume Set (Print)

Author: Phillip A. Laplante

Publisher: CRC Press

Published: 2010-11-22

Total Pages: 1872

ISBN-10: 1351249258

Software engineering requires specialized knowledge of a broad spectrum of topics, including the construction of software and the platforms, applications, and environments in which the software operates as well as an understanding of the people who build and use the software. Offering an authoritative perspective, the two volumes of the Encyclopedia of Software Engineering cover the entire multidisciplinary scope of this important field. More than 200 expert contributors and reviewers from industry and academia across 21 countries provide easy-to-read entries that cover software requirements, design, construction, testing, maintenance, configuration management, quality control, and software engineering management tools and methods. Editor Phillip A. Laplante uses the most universally recognized definition of the areas of relevance to software engineering, the Software Engineering Body of Knowledge (SWEBOK®), as a template for organizing the material. Also available in an electronic format, this encyclopedia supplies software engineering students, IT professionals, researchers, managers, and scholars with unrivaled coverage of the topics that encompass this ever-changing field.

Also Available Online: This Taylor & Francis encyclopedia is also available through online subscription, offering a variety of extra benefits for researchers, students, and librarians, including citation tracking and alerts, active reference linking, saved searches and marked lists, and HTML and PDF format options. Contact Taylor and Francis for more information or to inquire about subscription options and print/online combination packages. US: (Tel) 1.888.318.2367. International: (Tel) +44 (0) 20 7017 6062.


Fault Tolerant Computer Architecture

Author: Daniel Sorin

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 103

ISBN-10: 3031017234

For many years, most computer architects have pursued one primary goal: performance. Architects have translated the ever-increasing abundance of ever-faster transistors provided by Moore's law into remarkable increases in performance. Recently, however, the bounty provided by Moore's law has been accompanied by several challenges that have arisen as devices have become smaller, including a decrease in dependability due to physical faults. In this book, we focus on the dependability challenge and the fault tolerance solutions that architects are developing to overcome it. The two main purposes of this book are to explore the key ideas in fault-tolerant computer architecture and to present the current state-of-the-art - over approximately the past 10 years - in academia and industry. Table of Contents: Introduction / Error Detection / Error Recovery / Diagnosis / Self-Repair / The Future


Reliable Computer Systems

Author: Daniel Siewiorek

Publisher: Digital Press

Published: 2014-06-28

Total Pages: 929

ISBN-10: 1483297438

Enhance your hardware/software reliability. Enhancement of system reliability has been a major concern of computer users and designers, and this major revision of the 1982 classic meets users' continuing need for practical information on this pressing topic. Included are case studies of reliable systems from manufacturers such as Tandem, Stratus, IBM, and Digital, as well as coverage of special systems such as the Galileo Orbiter fault protection system and AT&T telephone switching processors.