Designing Cloud Data Platforms

Author: Danil Zburivsky

Publisher: Simon and Schuster

Published: 2021-04-20

Total Pages: 334

ISBN-10: 1617296449

Centralized data warehouses, the long-time de facto standard for housing data for analytics, are rapidly giving way to multi-faceted cloud data platforms. Companies that embrace modern cloud data platforms benefit from an integrated view of their business using all of their data and can take advantage of advanced analytic practices to drive predictions and as yet unimagined data services. Designing Cloud Data Platforms is a hands-on guide to envisioning and designing a modern scalable data platform that takes full advantage of the flexibility of the cloud. As you read, you'll learn the core components of a cloud data platform design, along with the role of key technologies like Spark and Kafka Streams. You'll also explore setting up processes to manage cloud-based data, keep it secure, and use advanced analytic and BI tools to analyse it.

About the technology
Access to affordable, dependable, serverless cloud services has revolutionized the way organizations can approach data management, and companies both big and small are raring to migrate to the cloud. But without a properly designed data platform, data in the cloud can remain just as siloed and inaccessible as it is today for most organizations. Designing Cloud Data Platforms lays out the principles of a well-designed platform that uses the scalable resources of the public cloud to manage all of an organization's data and present it as useful business insights.

About the book
In Designing Cloud Data Platforms, you'll learn how to integrate data from multiple sources into a single, cloud-based, modern data platform. Drawing on their real-world experiences designing cloud data platforms for dozens of organizations, cloud data experts Danil Zburivsky and Lynda Partner take you through a six-layer approach to creating cloud data platforms that maximizes flexibility and manageability and reduces costs. Starting with foundational principles, you'll learn how to get data into your platform from different databases, files, and APIs, the essential practices for organizing and processing that raw data, and how to best take advantage of the services offered by major cloud vendors. As you progress past the basics, you'll take a deep dive into advanced topics to get the most out of your data platform, including real-time data management, machine learning analytics, schema management, and more.

What's inside
The tools of different public clouds for implementing data platforms
Best practices for managing structured and unstructured data sets
Machine learning tools that can be used on top of the cloud
Cost optimization techniques

About the reader
For data professionals familiar with the basics of cloud computing and distributed data processing systems like Hadoop and Spark.

About the authors
Danil Zburivsky has over 10 years of experience designing and supporting large-scale data infrastructure for enterprises across the globe. Lynda Partner is the VP of Analytics-as-a-Service at Pythian and has been on the business side of data for over 20 years.


Data Pipelines with Apache Airflow

Author: Bas P. Harenslak

Publisher: Simon and Schuster

Published: 2021-04-27

Total Pages: 478

ISBN-10: 1617296902

This book teaches you how to build and maintain effective data pipelines. You'll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment.


Architecting Modern Data Platforms

Author: Jan Kunigk

Publisher: O'Reilly Media, Inc.

Published: 2018-12-05

Total Pages: 636

ISBN-10: 1491969229

There’s a lot of information about big data technologies, but splicing these technologies into an end-to-end enterprise data platform is a daunting task not widely covered. With this practical book, you’ll learn how to build big data infrastructure both on-premises and in the cloud and successfully architect a modern data platform. Ideal for enterprise architects, IT managers, application architects, and data engineers, this book shows you how to overcome the many challenges that emerge during Hadoop projects. You’ll explore the vast landscape of tools available in the Hadoop and big data realm in a thorough technical primer before diving into:

Infrastructure: Look at all component layers in a modern data platform, from the server to the data center, to establish a solid foundation for data in your enterprise.
Platform: Understand aspects of deployment, operation, security, high availability, and disaster recovery, along with everything you need to know to integrate your platform with the rest of your enterprise IT.
Taking Hadoop to the cloud: Learn the important architectural aspects of running a big data platform in the cloud while maintaining enterprise security and high availability.


Building Cloud Data Platforms Solutions

Author: Anouar BEN ZAHRA

Publisher: Anouar BEN ZAHRA

Published:

Total Pages: 339

ISBN-13:

"Building Cloud Data Platforms Solutions: An End-to-End Guide for Designing, Implementing, and Managing Robust Data Solutions in the Cloud" comprehensively covers a wide range of topics related to building data platforms in the cloud. This book provides a deep exploration of the essential concepts, strategies, and best practices involved in designing, implementing, and managing end-to-end data solutions. The book begins by introducing the fundamental principles and benefits of cloud computing, with a specific focus on its impact on data management and analytics. It covers various cloud services and architectures, enabling readers to understand the foundation upon which cloud data platforms are built. Next, the book dives into key considerations for building cloud data solutions, aligning business needs with cloud data strategies, and ensuring scalability, security, and compliance. It explores the process of data ingestion, discussing various techniques for acquiring and ingesting data from different sources into the cloud platform. The book then delves into data storage and management in the cloud. It covers different storage options, such as data lakes and data warehouses, and discusses strategies for organizing and optimizing data storage to facilitate efficient data processing and analytics. It also addresses data governance, data quality, and data integration techniques to ensure data integrity and consistency across the platform. A significant portion of the book is dedicated to data processing and analytics in the cloud. It explores modern data processing frameworks and technologies, such as Apache Spark and serverless computing, and provides practical guidance on implementing scalable and efficient data processing pipelines. The book also covers advanced analytics techniques, including machine learning and AI, and demonstrates how these can be integrated into the data platform to unlock valuable insights. 
Furthermore, the book addresses aspects of data platform monitoring, security, and performance optimization. It explores techniques for monitoring data pipelines, ensuring data security, and optimizing performance to meet the demands of real-time data processing and analytics. Throughout the book, real-world examples, case studies, and best practices illustrate the concepts discussed, helping readers apply the knowledge gained to their own data platform projects.


Data Mesh

Author: Zhamak Dehghani

Publisher: O'Reilly Media, Inc.

Published: 2022-03-08

Total Pages: 387

ISBN-10: 1492092363

Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice. Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how.

Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures
Analyze the landscape's underlying characteristics and failure modes
Get a complete introduction to data mesh principles and its constituents
Learn how to design a data mesh architecture
Move beyond a monolithic data lake to a distributed data mesh


Cloud Data Management

Author: Liang Zhao

Publisher: Springer

Published: 2014-07-08

Total Pages: 216

ISBN-10: 3319047655

In practice, the design and architecture of a cloud vary among cloud providers. We present a generic evaluation framework for the performance, availability and reliability characteristics of various cloud platforms. We describe a generic benchmark architecture for cloud databases, specifically NoSQL database as a service. It measures replication delay and monetary cost. A Service Level Agreement (SLA) is the contract that captures the agreed-upon guarantees between a service provider and its customers. The specifications of existing SLAs for cloud services are not designed to flexibly handle even relatively straightforward performance and technical requirements of consumer applications. We present a novel approach for SLA-based management of cloud-hosted databases from the consumer perspective and an end-to-end framework for consumer-centric SLA management of cloud-hosted databases. The framework facilitates adaptive and dynamic provisioning of the database tier of software applications based on application-defined policies for satisfying their own SLA performance requirements, avoiding the cost of any SLA violation and controlling the monetary cost of the allocated computing resources. In this framework, the SLAs of the consumer applications are declaratively defined in terms of goals that are subject to a number of constraints specific to the application requirements. The framework continuously monitors the application-defined SLA and automatically triggers the execution of necessary corrective actions (scaling the database tier out or in) when required. The framework is database platform-agnostic, uses virtualization-based database replication mechanisms and requires zero source code changes to the cloud-hosted software applications.


Big Data Platforms and Applications

Author: Florin Pop

Publisher: Springer Nature

Published: 2021-09-28

Total Pages: 300

ISBN-10: 3030388360

This book provides a review of advanced topics relating to the theory, research, analysis and implementation in the context of big data platforms and their applications, with a focus on methods, techniques, and performance evaluation. The explosive growth in the volume, speed, and variety of data being produced every day requires a continuous increase in the processing speeds of servers and of entire network infrastructures, as well as new resource management models. This poses significant challenges (and provides striking development opportunities) for data intensive and high-performance computing, i.e., how to efficiently turn extremely large datasets into valuable information and meaningful knowledge. The task of context data management is further complicated by the variety of sources such data derives from, resulting in different data formats, with varying storage, transformation, delivery, and archiving requirements. At the same time rapid responses are needed for real-time applications. With the emergence of cloud infrastructures, achieving highly scalable data management in such contexts is a critical problem, as the overall application performance is highly dependent on the properties of the data management service.


Data Engineering with Google Cloud Platform

Author: Adi Wijaya

Publisher: Packt Publishing Ltd

Published: 2022-03-31

Total Pages: 440

ISBN-10: 1800565062

Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineer

Key Features
Understand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solution
Learn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelines
Discover tips to prepare for and pass the Professional Data Engineer exam

Book Description
With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.

What you will learn
Load data into BigQuery and materialize its output for downstream consumption
Build data pipeline orchestration using Cloud Composer
Develop Airflow jobs to orchestrate and automate a data warehouse
Build a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc cluster
Leverage Pub/Sub for messaging and ingestion for event-driven systems
Use Dataflow to perform ETL on streaming data
Unlock the power of your data with Data Studio
Estimate GCP costs for your end-to-end data solutions

Who this book is for
This book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. Beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing, in general, will help you make the most out of this book.


Google Cloud Platform for Architects

Author: Vitthal Srinivasan

Publisher: Packt Publishing Ltd

Published: 2018-06-26

Total Pages: 355

ISBN-10: 1788833074

Get acquainted with GCP and manage robust, highly available, and dynamic solutions to drive business objectives

Key Features
Identify the strengths, weaknesses and ideal use-cases for individual services offered on the Google Cloud Platform
Make intelligent choices about which cloud technology works best for your use-case
Leverage Google Cloud Platform to analyze and optimize technical and business processes

Book Description
Using a public cloud platform was considered risky a decade ago, and unconventional even just a few years ago. Today, however, use of the public cloud is completely mainstream - the norm, rather than the exception. Several leading technology firms, including Google, have built sophisticated cloud platforms, and are locked in a fierce competition for market share. The main goal of this book is to enable you to get the best out of the GCP, and to use it with confidence and competence. You will learn why cloud architectures take the forms that they do, and this will help you become a skilled high-level cloud architect. You will also learn how individual cloud services are configured and used, so that you are never intimidated by having to build it yourself. You will also learn the right way and the right situation in which to use the important GCP services. By the end of this book, you will be able to make the most out of Google Cloud Platform design.

What you will learn
Set up a GCP account and utilize GCP services using the cloud shell, web console, and client APIs
Harness the power of App Engine, Compute Engine, Containers on the Kubernetes Engine, and Cloud Functions
Pick the right managed service for your data needs, choosing intelligently between Datastore, Bigtable, and BigQuery
Migrate existing Hadoop, Spark, and Pig workloads with minimal disruption to your existing data infrastructure, by using Dataproc intelligently
Derive insights about the health, performance, and availability of cloud-powered applications with the help of monitoring, logging, and diagnostic tools in Stackdriver

Who this book is for
If you are a cloud architect who is responsible for designing and managing robust cloud solutions with Google Cloud Platform, then this book is for you. System engineers and enterprise architects will also find this book useful. A basic understanding of distributed applications would be helpful, although not strictly necessary. Some working experience on other public cloud platforms would help too.


Data Science on the Google Cloud Platform

Author: Valliappa Lakshmanan

Publisher: O'Reilly Media, Inc.

Published: 2017-12-12

Total Pages: 403

ISBN-10: 1491974532

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science. You’ll learn how to:

Automate and schedule data ingest, using an App Engine application
Create and populate a dashboard in Google Data Studio
Build a real-time analysis pipeline to carry out streaming analytics
Conduct interactive data exploration with Google BigQuery
Create a Bayesian model on a Cloud Dataproc cluster
Build a logistic regression machine-learning model with Spark
Compute time-aggregate features with a Cloud Dataflow pipeline
Create a high-performing prediction model with TensorFlow
Use your deployed model as a microservice you can access from both batch and real-time pipelines