Computer systems have become an important element of the world economy, with billions of dollars spent each year on development, manufacture, operation, and maintenance. Combining coverage of computer system reliability, safety, usability, and other related topics into a single volume, Computer System Reliability: Safety and Usability eliminates th
With computers becoming embedded as controllers in everything from network servers to the routing of subway schedules to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks. Market: Systems and Networking Engineers, Computer Programmers, IT Professionals.
Computing systems are of growing importance because of their wide use in many areas including those in safety-critical systems. This book describes the basic models and approaches to the reliability analysis of such systems. An extensive review is provided and models are categorized into different types. Some Markov models are extended to the analysis of some specific computing systems such as combined software and hardware, imperfect debugging processes, failure correlation, multi-state systems, heterogeneous subsystems, etc. One of the aims of the presentation is that based on the sound analysis and simplicity of the approaches, the use of Markov models can be better implemented in the computing system reliability.
Computer software reliability has never been so important. Computers are used in areas as diverse as air traffic control, nuclear reactors, real-time military, industrial process control, security system control, biometric scan-systems, automotive, mechanical and safety control, and hospital patient monitoring systems. Many of these applications require critical functionality as software applications increase in size and complexity. This book is an introduction to software reliability engineering and a survey of the state-of-the-art techniques, methodologies and tools used to assess the reliability of software and combined software-hardware systems. Current research results are reported and future directions are signposted. This text will interest: graduate students as a course textbook introducing reliability engineering software; reliability engineers as a broad, up-to-date survey of the field; and researchers and lecturers in universities and research institutions as a one-volume reference.
Performance and Reliability Analysis of Computer Systems: An Example-Based Approach Using the SHARPE Software Package provides a variety of probabilistic, discrete-state models used to assess the reliability and performance of computer and communication systems. The models included are combinatorial reliability models (reliability block diagrams, fault trees and reliability graphs), directed, acyclic task precedence graphs, Markov and semi-Markov models (including Markov reward models), product-form queueing networks and generalized stochastic Petri nets. A practical approach to system modeling is followed; all of the examples described are solved and analyzed using the SHARPE tool. In structuring the book, the authors have been careful to provide the reader with a methodological approach to analytical modeling techniques. These techniques are not seen as alternatives but rather as an integral part of a single process of assessment which, by hierarchically combining results from different kinds of models, makes it possible to use state-space methods for those parts of a system that require them and non-state-space methods for the more well-behaved parts of the system. The SHARPE (Symbolic Hierarchical Automated Reliability and Performance Evaluator) package is the `toolchest' that allows the authors to specify stochastic models easily and solve them quickly, adopting model hierarchies and very efficient solution techniques. All the models described in the book are specified and solved using the SHARPE language; its syntax is described and the source code of almost all the examples discussed is provided. Audience: Suitable for use in advanced level courses covering reliability and performance of computer and communications systems and by researchers and practicing engineers whose work involves modeling of system performance and reliability.
This book presents current methods for dealing with software reliability, illustrating the advantages and disadvantages of each method. The description of the techniques is intended for a non-expert audience with some minimal technical background. It also describes some advanced techniques, aimed at researchers and practitioners in software engineering. This reference will serve as an introduction to formal methods and techniques and will be a source for learning about various ways to enhance software reliability. Various projects and exercises give readers hands-on experience with the various formal methods and tools.
A holistic approach to service reliability and availability of cloud computing Reliability and Availability of Cloud Computing provides IS/IT system and solution architects, developers, and engineers with the knowledge needed to assess the impact of virtualization and cloud computing on service reliability and availability. It reveals how to select the most appropriate design for reliability diligence to assure that user expectations are met. Organized in three parts (basics, risk analysis, and recommendations), this resource is accessible to readers of diverse backgrounds and experience levels. Numerous examples and more than 100 figures throughout the book help readers visualize problems to better understand the topic—and the authors present risks and options in bulleted lists that can be applied directly to specific applications/problems. Special features of this book include: Rigorous analysis of the reliability and availability risks that are inherent in cloud computing Simple formulas that explain the quantitative aspects of reliability and availability Enlightening discussions of the ways in which virtualized applications and cloud deployments differ from traditional system implementations and deployments Specific recommendations for developing reliable virtualized applications and cloud-based solutions Reliability and Availability of Cloud Computing is the guide for IS/IT staff in business, government, academia, and non-governmental organizations who are moving their applications to the cloud. It is also an important reference for professionals in technical sales, product management, and quality management, as well as software and quality engineers looking to broaden their expertise.
Recent Advances in System Reliability Engineering describes and evaluates the latest tools, techniques, strategies, and methods in this topic for a variety of applications. Special emphasis is put on simulation and modelling technology which is growing in influence in industry, and presents challenges as well as opportunities to reliability and systems engineers. Several manufacturing engineering applications are addressed, making this a particularly valuable reference for readers in that sector. - Contains comprehensive discussions on state-of-the-art tools, techniques, and strategies from industry - Connects the latest academic research to applications in industry including system reliability, safety assessment, and preventive maintenance - Gives an in-depth analysis of the benefits and applications of modelling and simulation to reliability
Computing systems are of growing importance because of their wide use in many areas including those in safety-critical systems. This book describes the basic models and approaches to the reliability analysis of such systems. An extensive review is provided and models are categorized into different types. Some Markov models are extended to the analysis of some specific computing systems such as combined software and hardware, imperfect debugging processes, failure correlation, multi-state systems, heterogeneous subsystems, etc. One of the aims of the presentation is that based on the sound analysis and simplicity of the approaches, the use of Markov models can be better implemented in the computing system reliability.
This book provides the latest research advances in the field of system reliability assurance and engineering. It contains reference material for applications of reliability in system engineering, offering a theoretical sound background with adequate numerical illustrations. Included are concepts pertaining to reliability analysis, assurance techniques and methodologies, tools, and practical applications of system reliability modeling and allocation. The collection discusses various soft computing techniques like artificial intelligence and particle swarm optimization approach for reliability assessment. Importance of differentiating between the optimal release time and testing stop time of the software has been explicitly discussed and presented in the book. Features: Creates understanding of the costs associated with complex systems Covers reliability measurement of engineering systems Incorporates an efficient effort-based expenditure policy incorporating cost and reliability criteria Provides information for optimal testing stop and release time of software system Presents software performance and security layout Addresses reliability prediction and its maintenance through advanced analytics techniques Overall, System Reliability Management: Solutions and Techniques is a collaborative and interdisciplinary approach for better communication of problems and solutions to increase the performance of the system for better utilization and resource management.