This book describes cloud computing as a service that is "highly scalable" and operates in "a resilient environment". The authors emphasize architectural layers and models, but also business and security factors.
"Carefully distinguishing between big data and open data, and exploring various data infrastructures, Kitchin vividly illustrates how the data landscape is rapidly changing and calls for a revolution in how we think about data." - Evelyn Ruppert, Goldsmiths, University of London "Deconstructs the hype around the ‘data revolution’ to carefully guide us through the histories and the futures of ‘big data.’ The book skilfully engages with debates from across the humanities, social sciences, and sciences in order to produce a critical account of how data are enmeshed into enormous social, economic, and political changes that are taking place." - Mark Graham, University of Oxford Traditionally, data has been a scarce commodity which, given its value, has been either jealously guarded or expensively traded. In recent years, technological developments and political lobbying have turned this position on its head. Data now flow as a deep and wide torrent, are low in cost and supported by robust infrastructures, and are increasingly open and accessible. A data revolution is underway, one that is already reshaping how knowledge is produced, business conducted, and governance enacted, as well as raising many questions concerning surveillance, privacy, security, profiling, social sorting, and intellectual property rights. In contrast to the hype and hubris of much media and business coverage, The Data Revolution provides a synoptic and critical analysis of the emerging data landscape. Accessible in style, the book provides: A synoptic overview of big data, open data and data infrastructures An introduction to thinking conceptually about data, data infrastructures, data analytics and data markets Acritical discussion of the technical shortcomings and the social, political and ethical consequences of the data revolution An analysis of the implications of the data revolution to academic, business and government practices
Every enterprise application creates data, whether it’s log messages, metrics, user activity, outgoing messages, or something else. How to move all of this data becomes nearly as important as the data itself. If you’re an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds. Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you’ll learn Kafka’s design principles, reliability guarantees, key APIs, and architecture details, including the replication protocol, the controller, and the storage layer.
- Understand publish-subscribe messaging and how it fits into the big data ecosystem
- Explore Kafka producers and consumers for writing and reading messages
- Understand Kafka patterns and use-case requirements to ensure reliable data delivery
- Get best practices for building data pipelines and applications with Kafka
- Manage Kafka in production, and learn to perform monitoring, tuning, and maintenance tasks
- Learn the most critical metrics among Kafka’s operational measurements
- Explore how Kafka’s stream-delivery capabilities make it a perfect source for stream-processing systems
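To make the publish-subscribe model concrete, here is a minimal sketch using the confluent-kafka Python client; the broker address, topic name, and group id are illustrative placeholders, not examples from the book.

```python
# A minimal publish-subscribe sketch using the confluent-kafka Python client.
# The broker address, topic name, and group id are illustrative placeholders.
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("user-events", key="user-42", value='{"action": "login"}')
producer.flush()                          # block until the broker acknowledges

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "example-group",          # consumers in a group share partitions
    "auto.offset.reset": "earliest",      # start from the beginning if no offset is stored
})
consumer.subscribe(["user-events"])
msg = consumer.poll(timeout=5.0)          # fetch one message, or None on timeout
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```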
This handbook brings together a variety of approaches to the uses of big data in multiple fields, primarily science, medicine, and business. This single resource features contributions from researchers around the world, who share their findings and experience. The book is intended to help spur further innovation in big data. The research is presented in a way that allows readers, regardless of their field of study, to learn how applications have proven successful and how similar applications could be used in their own fields. Contributions stem from researchers in fields such as physics, biology, energy, healthcare, and business. The contributors also discuss important topics such as fraud detection, privacy implications, legal perspectives, and the ethical handling of big data.
Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice. Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how:
- Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures
- Analyze the landscape's underlying characteristics and failure modes
- Get a complete introduction to data mesh principles and their constituents
- Learn how to design a data mesh architecture
- Move beyond a monolithic data lake to a distributed data mesh
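As a rough illustration of the "data as a product" principle, the sketch below models a hypothetical data product descriptor; the fields and values are invented for illustration and are not drawn from the book.

```python
# A hypothetical sketch of "data as a product": a descriptor that makes a
# domain's dataset discoverable, owned, and governed. Every field name and
# value here is invented for illustration.
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    name: str                                         # discoverable identity
    domain: str                                       # owning domain, a first-class concern
    owner: str                                        # accountable domain team, not a central data group
    output_ports: list = field(default_factory=list)  # how consumers read the data
    policies: list = field(default_factory=list)      # governance rules, applied computationally

orders = DataProduct(
    name="orders.daily_summary",
    domain="sales",
    owner="sales-data-team",
    output_ports=["parquet://warehouse/sales/orders_daily"],
    policies=["pii-masked", "retention-365d"],
)
print(orders)
```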
Use ACI fabrics to drive unprecedented value from your data center environment. With the Cisco Application Centric Infrastructure (ACI) software-defined networking platform, you can achieve dramatic improvements in data center performance, redundancy, security, visibility, efficiency, and agility. In Deploying ACI, three leading Cisco experts introduce this breakthrough platform and walk network professionals through all facets of design, deployment, and operation. The authors demonstrate how ACI changes data center networking, security, and management, and offer multiple field-proven configurations. Deploying ACI is organized to follow the key decision points associated with implementing data center network fabrics. After a practical introduction to ACI concepts and design, the authors show how to bring your fabric online, integrate virtualization and external connections, and efficiently manage your ACI network. You’ll master new techniques for improving visibility, control, and availability; managing multitenancy; and seamlessly inserting service devices into application data flows. The authors conclude with expert advice for troubleshooting and automation, helping you deliver data center services with unprecedented efficiency.
- Understand the problems ACI solves, and how it solves them
- Design your ACI fabric, build it, and interface with devices to bring it to life
- Integrate virtualization technologies with your ACI fabric
- Perform networking within an ACI fabric (and understand how ACI changes data center networking)
- Connect external networks and devices at Layer 2 and Layer 3
- Coherently manage unified ACI networks with tenants and application policies
- Migrate to granular policies based on applications and their functions
- Establish multitenancy, and evolve networking, security, and services to support it
- Integrate L4–7 services: device types, design scenarios, and implementation
- Use multisite designs to meet rigorous requirements for redundancy and business continuity
- Troubleshoot and monitor ACI fabrics
- Improve operational efficiency through automation and programmability
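As a small taste of the automation and programmability the book concludes with, the sketch below authenticates to an APIC controller's REST API and lists tenants; the host and credentials are placeholders, and the book's own automation examples may differ.

```python
# A sketch of ACI programmability: log in to the APIC controller's REST API
# and list tenants. The host and credentials are placeholders; adapt them
# to your environment.
import requests

APIC = "https://apic.example.com"        # hypothetical controller address
session = requests.Session()
session.verify = False                   # lab-only: skip TLS verification

# Authenticate; APIC returns a session cookie that the Session object keeps.
session.post(
    f"{APIC}/api/aaaLogin.json",
    json={"aaaUser": {"attributes": {"name": "admin", "pwd": "password"}}},
)

# Class query: fetch every object of class fvTenant (ACI's tenant object).
resp = session.get(f"{APIC}/api/class/fvTenant.json")
for obj in resp.json()["imdata"]:
    print(obj["fvTenant"]["attributes"]["name"])
```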
Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll learn how to analyze data at scale and derive insights from large datasets efficiently. Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.
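As a minimal illustration of the interactive analysis BigQuery enables, the sketch below aggregates a public dataset with the google-cloud-bigquery client; it assumes credentials and a default project are already configured, and it is not an example from the book.

```python
# Interactive analysis with the google-cloud-bigquery client library.
# Assumes application-default credentials and a default project are already
# configured; the query runs against a well-known BigQuery public dataset.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""
for row in client.query(query).result():   # result() waits for the job to finish
    print(row.name, row.total)
```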
The intensification and spread of data analytics offers a significant new way of understanding contemporary capitalism. This text is about the powerful promises and visions that have led to the expansion of data analytics and data-led forms of social ordering. It is centrally concerned with examining the types of knowledge associated with data analytics, and it shows that how these analytics are envisioned is central to the emergence and prominence of data at various scales of social life. The text aims to understand the powerful role of the data analytics industry and how this industry facilitates the spread and intensification of data-led processes. As such, The Data Gaze is concerned with understanding how data-led, data-driven, and data-reliant forms of capitalism pervade organisational and everyday life. Using a clear theoretical approach derived from Foucault and critical data studies, the text develops the concept of the data gaze and shows how powerful and persuasive it is. It is an essential and subversive guide to data analytics and data capitalism.
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov walks developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases; these resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines:
- Storage engines: explore storage classification and taxonomy, and dive into B-Tree-based and immutable log-structured storage engines, with the differences and use cases for each
- Storage building blocks: learn how database files are organized to build efficient storage, using auxiliary data structures such as the page cache, buffer pool, and write-ahead log
- Distributed systems: learn step by step how nodes and processes connect and build complex communication patterns
- Database clusters: discover which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency
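To illustrate one storage building block the book covers, here is a toy write-ahead log: an intentionally simplified sketch of the durability technique, not code from the book.

```python
# A toy write-ahead log (WAL): every change is appended and fsync'ed to a
# log file *before* it is applied in memory, so a crash can be recovered
# by replaying the log. A deliberately simplified sketch.
import json
import os

class TinyWAL:
    def __init__(self, path="wal.log"):
        self.data = {}
        if os.path.exists(path):              # recovery: replay the log on startup
            with open(path) as f:
                for line in f:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]
        self.log = open(path, "a")

    def put(self, key, value):
        record = json.dumps({"key": key, "value": value})
        self.log.write(record + "\n")         # 1. append the intent to the log
        self.log.flush()
        os.fsync(self.log.fileno())           # 2. force it onto durable storage
        self.data[key] = value                # 3. only then apply it in memory

store = TinyWAL()
store.put("user-42", "logged-in")
print(store.data)
```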
This Handbook is intended to show Data Providers and researchers how to provide privacy-protected access to administrative data, how to handle and analyze such data, and how to link them with existing resources, such as a database of data use agreements (DUAs) and templates. Available publicly, the Handbook will provide guidance on data access requirements and procedures, data privacy, data security, property rights, regulations for public data use, data architecture, data use and storage, cost structure and recovery, ethics and privacy protection, making data accessible for research, and dissemination for restricted-access use. The knowledge base will serve as a resource for all researchers looking to work with administrative data and for Data Providers looking to make such data available.