BigQuery for Data Warehousing

Author: Mark Mucchetti

Publisher: Apress

Published: 2020-12-20

Total Pages: 400

ISBN-13: 9781484261859

Create a data warehouse, complete with reporting and dashboards, using Google’s BigQuery technology. This book takes you from the basic concepts of data warehousing through the design, build, load, and maintenance phases. You will build capabilities to capture data from the operational environment, and then mine and analyze that data for insight into making your business more successful. You will gain practical knowledge about how to use BigQuery to solve data challenges in your organization. BigQuery is a managed cloud platform from Google that provides enterprise data warehousing and reporting capabilities.

Part I of this book shows you how to design and provision a data warehouse in the BigQuery platform. Part II teaches you how to load and stream your operational data into the warehouse to make it ready for analysis and reporting. Parts III and IV cover querying and maintaining, helping you keep your information relevant with other Google Cloud Platform services and advanced BigQuery. Part V takes reporting to the next level by showing you how to create dashboards to provide at-a-glance visual representations of your business situation. Part VI provides an introduction to data science with BigQuery, covering machine learning and Jupyter notebooks.

What You Will Learn

Design a data warehouse for your project or organization
Load data from a variety of external and internal sources
Integrate other Google Cloud Platform services for more complex workflows
Maintain and scale your data warehouse as your organization grows
Analyze, report, and create dashboards on the information in the warehouse
Become familiar with machine learning techniques using BigQuery ML

Who This Book Is For

Developers who want to provide business users with fast, reliable, and insightful analysis from operational data, and data analysts interested in a cloud-based solution that avoids the pain of provisioning their own servers.


Google BigQuery: The Definitive Guide

Author: Valliappa Lakshmanan

Publisher: O'Reilly Media

Published: 2019-10-23

Total Pages: 522

ISBN-10: 1492044431

Work with petabyte-scale datasets while building a collaborative, agile workplace in the process. This practical book is the canonical reference to Google BigQuery, the query engine that lets you conduct interactive analysis of large datasets. BigQuery enables enterprises to efficiently store, query, ingest, and learn from their data in a convenient framework. With this book, you’ll examine how to analyze data at scale to derive insights from large datasets efficiently. Valliappa Lakshmanan, tech lead for Google Cloud Platform, and Jordan Tigani, engineering director for the BigQuery team, provide best practices for modern data warehousing within an autoscaled, serverless public cloud. Whether you want to explore parts of BigQuery you’re not familiar with or prefer to focus on specific tasks, this reference is indispensable.


Learning Google BigQuery

Author: Eric Brown

Publisher: Packt Publishing Ltd

Published: 2017-12-22

Total Pages: 255

ISBN-10: 1787286290

Get a fundamental understanding of how Google BigQuery works by analyzing and querying large datasets

About This Book

Get started with the BigQuery API and write custom applications using it
Learn how the BigQuery API can be used for storing, managing, and querying massive datasets with ease
A practical guide with examples and use cases to teach you everything you need to know about Google BigQuery

Who This Book Is For

If you are a developer, data analyst, or data scientist looking to run complex queries over thousands of records in seconds, this book will help you. No prior experience of working with BigQuery is assumed.

What You Will Learn

Get a hands-on introduction to Google Cloud Platform and its services
Understand the different data types supported by Google BigQuery
Migrate your enterprise data to BigQuery and query it using the legacy and standard SQL techniques
Use partitioned tables in your project and query external data sources and wildcard tables
Create tables and datasets dynamically using the BigQuery API
Perform real-time inserting of records for analytics using Python and C#
Visualize your BigQuery data by connecting it to third-party tools such as Tableau and R
Master Google Cloud Pub/Sub for implementing real-time reporting and analytics of your Big Data

In Detail

Google BigQuery is a popular cloud data warehouse for large-scale data analytics. This book will serve as a comprehensive guide to mastering BigQuery, and how you can utilize it to quickly and efficiently get useful insights from your Big Data. You will begin with a quick overview of the Google Cloud Platform and the various services it supports. Then, you will be introduced to the Google BigQuery API and how it fits within the framework of GCP. The book covers useful techniques to migrate your existing data from your enterprise to Google BigQuery, as well as readying and optimizing it for analysis. You will perform basic as well as advanced data querying using BigQuery, and connect the results to various third-party tools, such as R and Tableau, for reporting and visualization purposes. If you're looking to implement real-time reporting of the streaming data running in your enterprise, this book will also help you. This book also provides tips, best practices, and mistakes to avoid while working with Google BigQuery and services that interact with it. By the time you're done with it, you will have set a solid foundation in working with BigQuery to solve even the trickiest of data problems.

Style and Approach

This book follows a step-by-step approach to teach readers the concepts of Google BigQuery using SQL. To explain various data querying processes, large-scale datasets are used wherever required.


Google BigQuery Analytics

Author: Jordan Tigani

Publisher: John Wiley & Sons

Published: 2014-05-21

Total Pages: 529

ISBN-10: 1118824792

How to effectively use BigQuery, avoid common mistakes, and execute sophisticated queries against large datasets

Google BigQuery Analytics is the perfect guide for business and data analysts who want the latest tips on running complex queries and writing code to communicate with the BigQuery API. The book uses real-world examples to demonstrate current best practices and techniques, and also explains and demonstrates streaming ingestion, transformation via Hadoop in Google Compute Engine, App Engine Datastore integration, and using GViz with Tableau to generate charts of query results. In addition to the mechanics of BigQuery, the book also covers the architecture of the underlying Dremel query engine, providing a thorough understanding that leads to better query results.

Features a companion website that includes all code and data sets from the book
Uses real-world examples to explain everything analysts need to know to effectively use BigQuery
Includes web application examples coded in Python


Data Pipelines Pocket Reference

Author: James Densmore

Publisher: O'Reilly Media

Published: 2021-02-10

Total Pages: 277

ISBN-10: 1492087807

Data pipelines are the foundation for success in data analytics. Moving data from numerous diverse sources and transforming it to provide context is the difference between having data and actually gaining value from it. This pocket reference defines data pipelines and explains how they work in today's modern data stack. You'll learn common considerations and key decision points when implementing pipelines, such as batch versus streaming data ingestion and build versus buy. This book addresses the most common decisions made by data professionals and discusses foundational concepts that apply to open source frameworks, commercial products, and homegrown solutions.

You'll learn:

What a data pipeline is and how it works
How data is moved and processed on modern data infrastructure, including cloud platforms
Common tools and products used by data engineers to build pipelines
How pipelines support analytics and reporting needs
Considerations for pipeline maintenance, testing, and alerting


Machine Learning with BigQuery ML

Author: Alessandro Marrandino

Publisher: Packt Publishing Ltd

Published: 2021-06-11

Total Pages: 344

ISBN-10: 1800562187

Manage different business scenarios with the right machine learning technique using Google's highly scalable BigQuery ML

Key Features

Gain a clear understanding of AI and machine learning services on GCP, learn when to use these, and find out how to integrate them with BigQuery ML
Leverage SQL syntax to train, evaluate, test, and use ML models
Discover how BigQuery works and understand the capabilities of BigQuery ML using examples

Book Description

BigQuery ML enables you to easily build machine learning (ML) models with SQL without much coding. This book will help you to accelerate the development and deployment of ML models with BigQuery ML. The book starts with a quick overview of Google Cloud and BigQuery architecture. You'll then learn how to configure a Google Cloud project, understand the architectural components and capabilities of BigQuery, and find out how to build ML models with BigQuery ML. The book teaches you how to use ML using SQL on BigQuery. You'll analyze the key phases of an ML model's lifecycle and get to grips with the SQL statements used to train, evaluate, test, and use a model. As you advance, you'll build a series of practical use cases by applying different ML techniques such as linear regression, binary and multiclass logistic regression, k-means, ARIMA time series, deep neural networks, and XGBoost. Moving on, you'll cover matrix factorization and deep neural networks using BigQuery ML's capabilities. Finally, you'll explore the integration of BigQuery ML with other Google Cloud Platform components such as AI Platform Notebooks and TensorFlow, along with discovering best practices and tips and tricks for hyperparameter tuning and performance enhancement. By the end of this BigQuery book, you'll be able to build and evaluate your own ML models with BigQuery ML.

What you will learn

Discover how to prepare datasets to build an effective ML model
Forecast business KPIs by leveraging various ML models and BigQuery ML
Build and train a recommendation engine to suggest the best products for your customers using BigQuery ML
Develop, train, and share a BigQuery ML model from previous parts with AI Platform Notebooks
Find out how to invoke a trained TensorFlow model directly from BigQuery
Get to grips with BigQuery ML best practices to maximize your ML performance

Who this book is for

This book is for data scientists, data analysts, data engineers, and anyone looking to get started with Google's BigQuery ML. You'll also find this book useful if you want to accelerate the development of ML models or if you are a business user who wants to apply ML in an easy way using SQL. Basic knowledge of BigQuery and SQL is required.


The Data Warehouse Toolkit

Author: Ralph Kimball

Publisher: John Wiley & Sons

Published: 2011-08-08

Total Pages: 464

ISBN-10: 1118082141

This old edition was published in 2002. The current and final edition of this book is The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling, 3rd Edition, which was published in 2013 under ISBN: 9781118530801.

The authors begin with fundamental design recommendations and gradually progress step-by-step through increasingly complex scenarios. Clear-cut guidelines for designing dimensional models are illustrated using real-world data warehouse case studies drawn from a variety of business application areas and industries, including:

Retail sales and e-commerce
Inventory management
Procurement
Order management
Customer relationship management (CRM)
Human resources management
Accounting
Financial services
Telecommunications and utilities
Education
Transportation
Health care and insurance

By the end of the book, you will have mastered the full range of powerful techniques for designing dimensional databases that are easy to understand and provide fast query response. You will also learn how to create an architected framework that integrates the distributed data warehouse using standardized dimensions and facts.


Data Science on the Google Cloud Platform

Author: Valliappa Lakshmanan

Publisher: O'Reilly Media, Inc.

Published: 2017-12-12

Total Pages: 403

ISBN-10: 1491974532

Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.

You’ll learn how to:

Automate and schedule data ingest, using an App Engine application
Create and populate a dashboard in Google Data Studio
Build a real-time analysis pipeline to carry out streaming analytics
Conduct interactive data exploration with Google BigQuery
Create a Bayesian model on a Cloud Dataproc cluster
Build a logistic regression machine-learning model with Spark
Compute time-aggregate features with a Cloud Dataflow pipeline
Create a high-performing prediction model with TensorFlow
Use your deployed model as a microservice you can access from both batch and real-time pipelines


SAP HANA 2.0

Author: Denys Van Kempen

Publisher: SAP PRESS

Published: 2019

Total Pages: 438

ISBN-13: 9781493218387

Enter the fast-paced world of SAP HANA 2.0 with this introductory guide. Begin with an exploration of the technological backbone of SAP HANA as a database and platform. Then, step into key SAP HANA user roles and discover core capabilities for administration, application development, advanced analytics, security, data integration, and more. No matter how SAP HANA 2.0 fits into your business, this book is your starting point.

In this book, you'll learn about:

a. Technology
Discover what makes an in-memory database platform. Learn about SAP HANA's journey from version 1.0 to 2.0, take a tour of your technology options, and walk through deployment scenarios and implementation requirements.

b. Tools
Unpack your SAP HANA toolkit. See essential tools in action, from SAP HANA cockpit and SAP HANA studio, to the SAP HANA Predictive Analytics Library and SAP HANA smart data integration.

c. Key Roles
Understand how to use SAP HANA as a developer, administrator, data scientist, data center architect, and more. Explore key tasks like backend programming with SQLScript, security setup with roles and authorizations, data integration with the SAP HANA Data Management Suite, and more.

Highlights include:

1) Architecture
2) Administration
3) Application development
4) Analytics
5) Security
6) Data integration
7) Data architecture
8) Data center


Data Engineering with Google Cloud Platform

Author: Adi Wijaya

Publisher: Packt Publishing Ltd

Published: 2022-03-31

Total Pages: 440

ISBN-10: 1800565062

Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineer

Key Features

Understand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solution
Learn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelines
Discover tips to prepare for and pass the Professional Data Engineer exam

Book Description

With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.

What you will learn

Load data into BigQuery and materialize its output for downstream consumption
Build data pipeline orchestration using Cloud Composer
Develop Airflow jobs to orchestrate and automate a data warehouse
Build a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc cluster
Leverage Pub/Sub for messaging and ingestion for event-driven systems
Use Dataflow to perform ETL on streaming data
Unlock the power of your data with Data Studio
Calculate the GCP cost estimation for your end-to-end data solutions

Who this book is for

This book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. Beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing, in general, will help you make the most out of this book.