Deals constructively with recognized software problems. Focuses on the unreliability of computer programs and offers state-of-the-art solutions. Covers—software development, software testing, structured programming, composite design, language design, proofs of program correctness, and mathematical reliability models. Written in an informal style for anyone whose work is affected by the unreliability of software. Examples illustrate key ideas, over 180 references.
The overwhelming majority of a software system’s lifespan is spent in use, not in design or implementation. So, why does conventional wisdom insist that software engineers focus primarily on the design and development of large-scale computing systems? In this collection of essays and articles, key members of Google’s Site Reliability Team explain how and why their commitment to the entire lifecycle has enabled the company to successfully build, deploy, monitor, and maintain some of the largest software systems in the world. You’ll learn the principles and practices that enable Google engineers to make systems more scalable, reliable, and efficient—lessons directly applicable to your organization. This book is divided into four sections: Introduction—Learn what site reliability engineering is and why it differs from conventional IT industry practices Principles—Examine the patterns, behaviors, and areas of concern that influence the work of a site reliability engineer (SRE) Practices—Understand the theory and practice of an SRE’s day-to-day work: building and operating large distributed computing systems Management—Explore Google's best practices for training, communication, and meetings that your organization can use
Computer systems, whether hardware or software, are subject to failure. Precisely, what is a failure? It is defined as: The inability of a system or system component to perform a required function within specified limits. Afailure may be produced when a fault is encountered and a loss of the expected service to the user results [IEEE/AIAA P1633]. This brings us to the question of what is a fault? A fault is defect in the hardware or computer code that can be the cause of one or more failures. Software-based systems have become the dominant player in the computer systems world. Since it is imperative that computer systems operate reliably, considering the criticality of software, particularly in safety critical systems, the IEEE and AIAA commissioned the development of the Recommended Practice on Software Reliability. This tutorial serves as a companion document with the purpose of elaborating on key software reliability process practices in more detail than can be specified in the Recommended Practice. However, since other subjects like maintainability and availability are also covered, the tutorial can be used as a stand-alone document. While the focus of the Recommended Practice is software reliability, software and hardware do not operate in a vacuum. Therefore, both software and hardware are addressed in this tutorial in an integrated fashion. The narrative of the tutorial is augmented with illustrative solved problems. The recommended practice [IEEE P1633] is a composite of models and tools and describes the "what and how" of software reliability engineering. It is important for an organization to have a disciplined process if it is to produce high reliability software. This process uses a life cycle approach to software reliability that takes into account the risk to reliability due to requirements changes. A requirements change may induce ambiguity and uncertainty in the development process that cause errors in implementing the changes. Subsequently, these errors may propagate through later phases of development and maintenance. In view of the life cycle ramifications of the software reliability process, maintenance is included in this tutorial. Furthermore, because reliability and maintainability determine availability, the latter is also included.
This book presents current methods for dealing with software reliability, illustrating the advantages and disadvantages of each method. The description of the techniques is intended for a non-expert audience with some minimal technical background. It also describes some advanced techniques, aimed at researchers and practitioners in software engineering. This reference will serve as an introduction to formal methods and techniques and will be a source for learning about various ways to enhance software reliability. Various projects and exercises give readers hands-on experience with the various formal methods and tools.
Authoritative resource providing step-by-step guidance for producing reliable software to be tailored for specific projects Software Reliability Techniques for Real-World Applications is a practical, up to date, go-to source that can be referenced repeatedly to efficiently prevent software defects, find and correct defects if they occur, and create a higher level of confidence in software products. From content development to software support and maintenance, the author creates a depiction of each phase in a project such as design and coding, operation and maintenance, management, product production, and concept development and describes the activities and products needed for each. Software Reliability Techniques for Real-World Applications introduces clear ways to understand each process of software reliability and explains how it can be managed effectively and reliably. The book is supported by a plethora of detailed examples and systematic approaches, covering analogies between hardware and software reliability to ensure a clear understanding. Overall, this book helps readers create a higher level of confidence in software products. In Software Reliability Techniques for Real-World Applications, readers will find specific information on: Defects, including where defects enter the project system, effects, detection, and causes of defects, and how to handle defects Project phases, including concept development and planning, requirements and interfaces, design and coding, and integration, verification, and validation Roadmap and practical guidelines, including at the start of a project, as a member of an organization, and how to handle troubled projects Techniques, including an introduction to techniques in general, plus techniques by organization (systems engineering, software, and reliability engineering) Software Reliability Techniques for Real-World Applications is a practical text on software reliability, providing over sixty-five different techniques and step-by-step guidance for producing reliable software. It is an essential and complete resource on the subject for software developers, software maintainers, and producers of software.
Revised and updated for professional software engineers, systems analysts and project managers, this highly acclaimed book provides key concepts of software reliability and practical solutions for measuring reliability.
There are many books on computers, networks, and software engineering but none that integrate the three with applications. Integration is important because, increasingly, software dominates the performance, reliability, maintainability, and availability of complex computer and systems. Books on software engineering typically portray software as if it exists in a vacuum with no relationship to the wider system. This is wrong because a system is more than software. It is comprised of people, organizations, processes, hardware, and software. All of these components must be considered in an integrative fashion when designing systems. On the other hand, books on computers and networks do not demonstrate a deep understanding of the intricacies of developing software. In this book you will learn, for example, how to quantitatively analyze the performance, reliability, maintainability, and availability of computers, networks, and software in relation to the total system. Furthermore, you will learn how to evaluate and mitigate the risk of deploying integrated systems. You will learn how to apply many models dealing with the optimization of systems. Numerous quantitative examples are provided to help you understand and interpret model results. This book can be used as a first year graduate course in computer, network, and software engineering; as an on-the-job reference for computer, network, and software engineers; and as a reference for these disciplines.
A one-stop reference guide to design for safety principles and applications Design for Safety (DfSa) provides design engineers and engineering managers with a range of tools and techniques for incorporating safety into the design process for complex systems. It explains how to design for maximum safe conditions and minimum risk of accidents. The book covers safety design practices, which will result in improved safety, fewer accidents, and substantial savings in life cycle costs for producers and users. Readers who apply DfSa principles can expect to have a dramatic improvement in the ability to compete in global markets. They will also find a wealth of design practices not covered in typical engineering books—allowing them to think outside the box when developing safety requirements. Design Safety is already a high demand field due to its importance to system design and will be even more vital for engineers in multiple design disciplines as more systems become increasingly complex and liabilities increase. Therefore, risk mitigation methods to design systems with safety features are becoming more important. Designing systems for safety has been a high priority for many safety-critical systems—especially in the aerospace and military industries. However, with the expansion of technological innovations into other market places, industries that had not previously considered safety design requirements are now using the technology in applications. Design for Safety: Covers trending topics and the latest technologies Provides ten paradigms for managing and designing systems for safety and uses them as guiding themes throughout the book Logically defines the parameters and concepts, sets the safety program and requirements, covers basic methodologies, investigates lessons from history, and addresses specialty topics within the topic of Design for Safety (DfSa) Supplements other books in the series on Quality and Reliability Engineering Design for Safety is an ideal book for new and experienced engineers and managers who are involved with design, testing, and maintenance of safety critical applications. It is also helpful for advanced undergraduate and postgraduate students in engineering. Design for Safety is the second in a series of “Design for” books. Design for Reliability was the first in the series with more planned for the future.
Software engineering requires specialized knowledge of a broad spectrum of topics, including the construction of software and the platforms, applications, and environments in which the software operates as well as an understanding of the people who build and use the software. Offering an authoritative perspective, the two volumes of the Encyclopedia of Software Engineering cover the entire multidisciplinary scope of this important field. More than 200 expert contributors and reviewers from industry and academia across 21 countries provide easy-to-read entries that cover software requirements, design, construction, testing, maintenance, configuration management, quality control, and software engineering management tools and methods. Editor Phillip A. Laplante uses the most universally recognized definition of the areas of relevance to software engineering, the Software Engineering Body of Knowledge (SWEBOK®), as a template for organizing the material. Also available in an electronic format, this encyclopedia supplies software engineering students, IT professionals, researchers, managers, and scholars with unrivaled coverage of the topics that encompass this ever-changing field. Also Available Online This Taylor & Francis encyclopedia is also available through online subscription, offering a variety of extra benefits for researchers, students, and librarians, including: Citation tracking and alerts Active reference linking Saved searches and marked lists HTML and PDF format options Contact Taylor and Francis for more information or to inquire about subscription options and print/online combination packages. US: (Tel) 1.888.318.2367; (E-mail) [email protected] International: (Tel) +44 (0) 20 7017 6062; (E-mail) [email protected]
With computers becoming embedded as controllers in everything from network servers to the routing of subway schedules to NASA missions, there is a critical need to ensure that systems continue to function even when a component fails. In this book, bestselling author Martin Shooman draws on his expertise in reliability engineering and software engineering to provide a complete and authoritative look at fault tolerant computing. He clearly explains all fundamentals, including how to use redundant elements in system design to ensure the reliability of computer systems and networks. Market: Systems and Networking Engineers, Computer Programmers, IT Professionals.