Fault-tolerant Process Restoration
Author: John A. McPherson
Publisher:
Published: 1981
Total Pages: 424
ISBN-13:
DOWNLOAD EBOOKRead and Download eBook Full
Author: John A. McPherson
Publisher:
Published: 1981
Total Pages: 424
ISBN-13:
DOWNLOAD EBOOKAuthor: Pankaj Jalote
Publisher: Prentice Hall
Published: 1994
Total Pages: 456
ISBN-13:
DOWNLOAD EBOOKFault tolerance is an approach by which reliability of a computer system can be increased beyond what can be achieved by traditional methods. Comprehensive and self-contained, this book explores the information available on software supported fault tolerance techniques, with a focus on fault tolerance in distributed systems.
Author: Thomas Herault
Publisher: Springer
Published: 2015-07-01
Total Pages: 325
ISBN-13: 3319209434
DOWNLOAD EBOOKThis timely text presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as ABFT. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Features: provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems; reviews the spectrum of techniques that can be applied to design a fault-tolerant MPI; investigates different approaches to replication; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems.
Author: Peter A. Lee
Publisher: Springer Science & Business Media
Published: 2012-12-06
Total Pages: 326
ISBN-13: 370918990X
DOWNLOAD EBOOKThe production of a new version of any book is a daunting task, as many authors will recognise. In the field of computer science, the task is made even more daunting by the speed with which the subject and its supporting technology move forward. Since the publication of the first edition of this book in 1981 much research has been conducted, and many papers have been written, on the subject of fault tolerance. Our aim then was to present for the first time the principles of fault tolerance together with current practice to illustrate those principles. We believe that the principles have (so far) stood the test of time and are as appropriate today as they were in 1981. Much work on the practical applications of fault tolerance has been undertaken, and techniques have been developed for ever more complex situations, such as those required for distributed systems. Nevertheless, the basic principles remain the same.
Author: Robert S. Hanmer
Publisher: John Wiley & Sons
Published: 2013-07-12
Total Pages: 272
ISBN-13: 1118351541
DOWNLOAD EBOOKSoftware patterns have revolutionized the way developer’s and architects think about how software is designed, built and documented. This new title in Wiley’s prestigious Series in Software Design Patterns presents proven techniques to achieve patterns for fault tolerant software. This is a key reference for experts seeking to select a technique appropriate for a given system. Readers are guided from concepts and terminology, through common principles and methods, to advanced techniques and practices in the development of software systems. References will provide access points to the key literature, including descriptions of exemplar applications of each technique. Organized into a collection of software techniques, specific techniques can be easily found with sufficient detail to allow appropriate choices for the system being designed.
Author: Mario Dal Cin
Publisher: Springer Science & Business Media
Published: 2012-12-06
Total Pages: 436
ISBN-13: 3642769306
DOWNLOAD EBOOK5th International GI/ITG/GMA Conference, Nürnberg, September 25-27, 1991. Proceedings
Author: Mengfei Yang
Publisher: John Wiley & Sons
Published: 2017-05-01
Total Pages: 370
ISBN-13: 111910727X
DOWNLOAD EBOOKComprehensive coverage of all aspects of space application oriented fault tolerance techniques • Experienced expert author working on fault tolerance for Chinese space program for almost three decades • Initiatively provides a systematic texts for the cutting-edge fault tolerance techniques in spacecraft control computer, with emphasis on practical engineering knowledge • Presents fundamental and advanced theories and technologies in a logical and easy-to-understand manner • Beneficial to readers inside and outside the area of space applications
Author: Dimiter R. Avresky
Publisher: Springer Science & Business Media
Published: 2012-12-06
Total Pages: 396
ISBN-13: 1461554497
DOWNLOAD EBOOKThe most important use of computing in the future will be in the context of the global "digital convergence" where everything becomes digital and every thing is inter-networked. The application will be dominated by storage, search, retrieval, analysis, exchange and updating of information in a wide variety of forms. Heavy demands will be placed on systems by many simultaneous re quests. And, fundamentally, all this shall be delivered at much higher levels of dependability, integrity and security. Increasingly, large parallel computing systems and networks are providing unique challenges to industry and academia in dependable computing, espe cially because of the higher failure rates intrinsic to these systems. The chal lenge in the last part of this decade is to build a systems that is both inexpensive and highly available. A machine cluster built of commodity hardware parts, with each node run ning an OS instance and a set of applications extended to be fault resilient can satisfy the new stringent high-availability requirements. The focus of this book is to present recent techniques and methods for im plementing fault-tolerant parallel and distributed computing systems. Section I, Fault-Tolerant Protocols, considers basic techniques for achieving fault-tolerance in communication protocols for distributed systems, including synchronous and asynchronous group communication, static total causal order ing protocols, and fail-aware datagram service that supports communications by time.
Author: David Powell
Publisher: Springer Science & Business Media
Published: 2013-04-17
Total Pages: 249
ISBN-13: 1475733534
DOWNLOAD EBOOKThe design of computer systems to be embedded in critical real-time applications is a complex task. Such systems must not only guarantee to meet hard real-time deadlines imposed by their physical environment, they must guarantee to do so dependably, despite both physical faults (in hardware) and design faults (in hardware or software). A fault-tolerance approach is mandatory for these guarantees to be commensurate with the safety and reliability requirements of many life- and mission-critical applications. This book explains the motivations and the results of a collaborative project', whose objective was to significantly decrease the lifecycle costs of such fault tolerant systems. The end-user companies participating in this project already deploy fault-tolerant systems in critical railway, space and nuclear-propulsion applications. However, these are proprietary systems whose architectures have been tailored to meet domain-specific requirements. This has led to very costly, inflexible, and often hardware-intensive solutions that, by the time they are developed, validated and certified for use in the field, can already be out-of-date in terms of their underlying hardware and software technology.
Author: Jan Vytopil
Publisher: Springer Science & Business Media
Published: 2012-12-06
Total Pages: 213
ISBN-13: 1461532205
DOWNLOAD EBOOKFormal Techniques in Real-Time and Fault-Tolerant Systems focuses on the state of the art in formal specification, development and verification of fault-tolerant computing systems. The term `fault-tolerance' refers to a system having properties which enable it to deliver its specified function despite (certain) faults of its subsystem. Fault-tolerance is achieved by adding extra hardware and/or software which corrects the effects of faults. In this sense, a system can be called fault-tolerant if it can be proved that the resulting (extended) system under some model of reliability meets the reliability requirements. The main theme of Formal Techniques in Real-Time and Fault-Tolerant Systems can be formulated as follows: how do the specification, development and verification of conventional and fault-tolerant systems differ? How do the notations, methodology and tools used in design and development of fault-tolerant and conventional systems differ? Formal Techniques in Real-Time and Fault-Tolerant Systems is divided into two parts. The chapters in Part One set the stage for what follows by defining the basic notions and practices of the field of design and specification of fault-tolerant systems. The chapters in Part Two represent the `how-to' section, containing examples of the use of formal methods in specification and development of fault-tolerant systems. The book serves as an excellent reference for researchers in both academia and industry, and may be used as a text for advanced courses on the subject.