DuckDB in Action

Author: Mark Needham

Publisher: Simon and Schuster

Published: 2024-09-10

Total Pages: 310

ISBN-10: 1638355592

Dive into DuckDB and start processing gigabytes of data with ease, all with no data warehouse. DuckDB is a cutting-edge SQL database that makes it incredibly easy to analyze big data sets right from your laptop. In DuckDB in Action you’ll learn everything you need to know to get the most out of this awesome tool, keep your data secure on prem, and save hundreds on your cloud bill. From data ingestion to advanced data pipelines, every topic is taught through hands-on examples.

Open up DuckDB in Action and learn how to:
• Read and process data from CSV, JSON and Parquet sources, both local and remote
• Write analytical SQL queries, including aggregations, common table expressions, window functions, special types of joins, and pivot tables
• Use DuckDB from Python, both with SQL and its "Relational" API, interacting with databases as well as data frames
• Prepare, ingest and query large datasets
• Build cloud data pipelines
• Extend DuckDB with custom functionality

Pragmatic and comprehensive, DuckDB in Action introduces the DuckDB database and shows you how to use it to solve common data workflow problems. You won’t need to read through pages of documentation; you’ll learn as you work. Get to grips with DuckDB's unique SQL dialect, learning to seamlessly load, prepare, and analyze data using SQL queries. Extend DuckDB with both Python and built-in tools such as MotherDuck, and gain practical insights into building robust and automated data pipelines.

About the technology
DuckDB makes data analytics fast and fun! You don’t need to set up a Spark cluster or run a cloud data warehouse just to process a few hundred gigabytes of data. DuckDB is easily embeddable in any data analytics application, runs on a laptop, and processes data from almost any source, including JSON, CSV, Parquet, SQLite and Postgres.

About the book
DuckDB in Action guides you example-by-example from setup, through your first SQL query, to advanced topics like building data pipelines and embedding DuckDB as a local data store for a Streamlit web app. You’ll explore DuckDB’s handy SQL extensions, get to grips with aggregation, analysis, and data without persistence, and use Python to customize DuckDB. A hands-on project accompanies each new topic, so you can see DuckDB in action.

What's inside
• Prepare, ingest and query large datasets
• Build cloud data pipelines
• Extend DuckDB with custom functionality
• Fast-paced SQL recap: From simple queries to advanced analytics

About the reader
For data pros comfortable with Python and CLI tools.

About the authors
Mark Needham is a blogger and video creator at @LearnDataWithMark. Michael Hunger leads product innovation for the Neo4j graph database. Michael Simons is a Java Champion, author, and Engineer at Neo4j.


ScyllaDB in Action

Author: Bo Ingram

Publisher: Simon and Schuster

Published: 2024-11-12

Total Pages: 390

ISBN-10: 1638356122

Build, maintain, and run databases that are easy to scale and quick to query, all with ScyllaDB. ScyllaDB in Action is your guide to everything you need to know about ScyllaDB, from your very first queries to running it in a production environment. It starts you with the basics of creating, reading, and deleting data and expands your knowledge from there. You’ll soon have mastered everything you need to build, maintain, and run an effective and efficient database.

Inside ScyllaDB in Action you’ll learn how to:
• Read, write, and delete data in ScyllaDB
• Design database schemas for ScyllaDB
• Write performant queries against ScyllaDB
• Connect and query a ScyllaDB cluster from an application
• Configure, monitor, and operate ScyllaDB in production

This book teaches you ScyllaDB the best way: through hands-on examples. Dive into the node-based architecture of ScyllaDB to understand how its distributed systems work, how you can troubleshoot problems, and how you can constantly improve performance.

About the technology
ScyllaDB is a versatile NoSQL database that can move large volumes of data fast. Very, very, very fast. This drop-in replacement for Cassandra takes full advantage of modern multi-core hardware and scales to handle large real-time data workloads with incredibly low latency. It features built-in monitoring and management tools, and its efficient use of computing resources can save a lot of money on high-volume applications.

About the book
ScyllaDB in Action demonstrates how to integrate ScyllaDB into data-intensive applications. You’ll work through a hands-on project step by step as you use ScyllaDB to store data and learn to configure, monitor, and safely operate a distributed database. Along the way, you’ll discover how ScyllaDB’s unique “shard per core” approach helps you deliver impressive performance in real-time systems.

What's inside
• Design schemas for ScyllaDB
• Write performant queries
• Get an instant speed boost over Cassandra

About the reader
For backend and infrastructure engineers who know the basics of SQL.

About the author
Bo Ingram is a staff software engineer at Discord working in database infrastructure. He has extensive experience working with ScyllaDB as an operator and developer. The technical editor on this book was Piotr Wiktor Sarna.

Table of Contents
Part 1
1 Introducing ScyllaDB
2 Touring ScyllaDB
Part 2
3 Data modeling in ScyllaDB
4 Data types in ScyllaDB
5 Tables in ScyllaDB
Part 3
6 Writing data to ScyllaDB
7 Reading data from ScyllaDB
Part 4
8 ScyllaDB’s architecture
9 Running ScyllaDB in production
10 Application development with ScyllaDB
11 Monitoring ScyllaDB
12 Moving data in bulk with ScyllaDB
Appendix Docker


Getting Started with DuckDB

Author: Simon Aubury

Publisher: Packt Publishing Ltd

Published: 2024-06-24

Total Pages: 382

ISBN-10: 1803232536

Analyze and transform data efficiently with DuckDB, a versatile, modern, in-process SQL database

Key Features
• Use DuckDB to rapidly load, transform, and query data across a range of sources and formats
• Gain practical experience using SQL, Python, and R to effectively analyze data
• Learn how open source tools and cloud services in the broader data ecosystem complement DuckDB’s versatile capabilities
• Purchase of the print or Kindle book includes a free PDF eBook

Book Description
DuckDB is a fast in-process analytical database. Getting Started with DuckDB offers a practical overview of its usage. You'll learn to load, transform, and query various data formats, including CSV, JSON, and Parquet. The book covers DuckDB's optimizations, SQL enhancements, and extensions for specialized applications. Working with examples in SQL, Python, and R, you'll explore analyzing public datasets and discover tools enhancing DuckDB workflows. This guide suits both experienced and new data practitioners, quickly equipping you to apply DuckDB's capabilities in analytical projects. You'll gain proficiency in using DuckDB for diverse tasks, enabling effective integration into your data workflows.

What you will learn
• Understand the properties and applications of a columnar in-process database
• Use SQL to load, transform, and query a range of data formats
• Discover DuckDB's rich extensions and learn how to apply them
• Use nested data types to model semi-structured data and extract and model JSON data
• Integrate DuckDB into your Python and R analytical workflows
• Effectively leverage DuckDB's convenient SQL enhancements
• Explore the wider ecosystem and pathways for building DuckDB-powered data applications

Who this book is for
If you’re interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts wanting to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, along with data scientists needing a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.


Quantum Computing in Action

Author: Johan Vos

Publisher: Simon and Schuster

Published: 2022-02-08

Total Pages: 262

ISBN-10: 1617296325

Quantum computing is on the horizon, ready to impact everything from scientific research to encryption and security. But you don't need a physics degree to get started in quantum computing. Quantum Computing in Action shows you how to leverage your existing Java skills into writing your first quantum software so you're ready for the revolution. Rather than a hardware manual or academic theory guide, this book is focused on practical implementations of quantum computing algorithms. Using Strange, a Java-based quantum computer simulator, you'll go hands-on with quantum computing's core components including qubits and quantum gates as you write your very first quantum code. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.


Spark: The Definitive Guide

Author: Bill Chambers

Publisher: "O'Reilly Media, Inc."

Published: 2018-02-08

Total Pages: 594

ISBN-10: 1491912294

Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of the open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique goals. You’ll explore the basic operations and common functions of Spark’s structured APIs, as well as Structured Streaming, a new high-level API for building end-to-end streaming applications. Developers and system administrators will learn the fundamentals of monitoring, tuning, and debugging Spark, and explore machine learning techniques and scenarios for employing MLlib, Spark’s scalable machine-learning library.

• Get a gentle overview of big data and Spark
• Learn about DataFrames, SQL, and Datasets (Spark’s core APIs) through worked examples
• Dive into Spark’s low-level APIs, RDDs, and execution of SQL and DataFrames
• Understand how Spark runs on a cluster
• Debug, monitor, and tune Spark clusters and applications
• Learn the power of Structured Streaming, Spark’s stream-processing engine
• Learn how you can apply MLlib to a variety of problems, including classification or recommendation


Learning Spark

Author: Holden Karau

Publisher: "O'Reilly Media, Inc."

Published: 2015-01-28

Total Pages: 289

ISBN-10: 1449359051

Data in all domains is getting bigger. How can you work with it efficiently? Recently updated for Spark 1.3, this book introduces Apache Spark, the open source cluster computing system that makes data analytics fast to write and fast to run. With Spark, you can tackle big datasets quickly through simple APIs in Python, Java, and Scala. This edition includes new information on Spark SQL, Spark Streaming, setup, and Maven coordinates. Written by the developers of Spark, this book will have data scientists and engineers up and running in no time. You’ll learn how to express parallel jobs with just a few lines of code, and cover applications from simple batch jobs to stream processing and machine learning.

• Quickly dive into Spark capabilities such as distributed datasets, in-memory caching, and the interactive shell
• Leverage Spark’s powerful built-in libraries, including Spark SQL, Spark Streaming, and MLlib
• Use one programming paradigm instead of mixing and matching tools like Hive, Hadoop, Mahout, and Storm
• Learn how to deploy interactive, batch, and streaming applications
• Connect to data sources including HDFS, Hive, JSON, and S3
• Master advanced topics like data partitioning and shared variables


Analytics Engineering with SQL and dbt

Author: Rui Pedro Machado

Publisher: "O'Reilly Media, Inc."

Published: 2023-12-08

Total Pages: 324

ISBN-10: 1098142349

With the shift from data warehouses to data lakes, data now lands in repositories before it's been transformed, enabling engineers to model raw data into clean, well-defined datasets. dbt (data build tool) helps you take data further. This practical book shows data analysts, data engineers, BI developers, and data scientists how to create a true self-service transformation platform through the use of dynamic SQL. Authors Rui Machado from Monstarlab and Hélder Russa from Jumia show you how to quickly deliver new data products by focusing more on value delivery and less on architectural and engineering aspects. If you know your business well and have the technical skills to model raw data into clean, well-defined datasets, you'll learn how to design and deliver data models without any technical influence.

With this book, you'll learn:
• What dbt is and how a dbt project is structured
• How dbt fits into the data engineering and analytics worlds
• How to collaborate on building data models
• The main tools and architectures for building useful, functional data models
• How to fit dbt into data warehousing and laking architecture
• How to build tests for data transformations


Spring Data

Author: Mark Pollack

Publisher: "O'Reilly Media, Inc."

Published: 2012-10-24

Total Pages: 315

ISBN-10: 1449323952

You can choose several data access frameworks when building Java enterprise applications that work with relational databases. But what about big data? This hands-on introduction shows you how Spring Data makes it relatively easy to build applications across a wide range of new data access technologies such as NoSQL and Hadoop. Through several sample projects, you’ll learn how Spring Data provides a consistent programming model that retains NoSQL-specific features and capabilities, and helps you develop Hadoop applications across a wide range of use-cases such as data analysis, event stream processing, and workflow. You’ll also discover the features Spring Data adds to Spring’s existing JPA and JDBC support for writing RDBMS-based data access layers.

• Learn about Spring’s template helper classes to simplify the use of database-specific functionality
• Explore Spring Data’s repository abstraction and advanced query functionality
• Use Spring Data with Redis (key/value store), HBase (column-family), MongoDB (document database), and Neo4j (graph database)
• Discover the GemFire distributed data grid solution
• Export Spring Data JPA-managed entities to the Web as RESTful web services
• Simplify the development of HBase applications, using a lightweight object-mapping framework
• Build example big-data pipelines with Spring Batch and Spring Integration


Practical SQL, 2nd Edition

Author: Anthony DeBarros

Publisher: No Starch Press

Published: 2022-01-25

Total Pages: 466

ISBN-10: 1718501072

Analyze data like a pro, even if you’re a beginner. Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. Anthony DeBarros, a journalist and data analyst, focuses on using SQL to find the story within your data. The examples and code use the open-source database PostgreSQL and its companion pgAdmin interface, and the concepts you learn will apply to most database management systems, including MySQL, Oracle, SQLite, and others.* You’ll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from real-world datasets such as US Census demographics, New York City taxi rides, and earthquakes from the US Geological Survey. Each chapter includes exercises and examples that teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently.

You’ll learn how to:
• Create databases and related tables using your own data
• Aggregate, sort, and filter data to find patterns
• Use functions for basic math and advanced statistical operations
• Identify errors in data and clean them up
• Analyze spatial data with a geographic information system (PostGIS)
• Create advanced queries and automate tasks

This updated second edition has been thoroughly revised to reflect the latest in SQL features, including additional advanced query techniques for wrangling data. This edition also has two new chapters: an expanded set of instructions for setting up your system plus a chapter on using PostgreSQL with the popular JSON data interchange format. Learning SQL doesn’t have to be dry and complicated. Practical SQL delivers clear examples with an easy-to-follow approach to teach you the tools you need to build and manage your own databases.

* Microsoft SQL Server employs a variant of the language called T-SQL, which is not covered by Practical SQL.


arc42 by Example

Author: Dr. Gernot Starke

Publisher: Packt Publishing Ltd

Published: 2019-10-07

Total Pages: 236

ISBN-10: 1839219262

Document the architecture of your software easily with this highly practical, open-source template.

Key Features
• Get to grips with leveraging the features of arc42 to create insightful documents
• Learn the concepts of software architecture documentation through real-world examples
• Discover techniques to create compact, helpful, and easy-to-read documentation

Book Description
When developers document the architecture of their systems, they often invent their own specific ways of articulating structures, designs, concepts, and decisions. What they need is a template that enables simple and efficient software architecture documentation. arc42 by Example shows how it's done through several real-world examples. Each example in the book, whether it is a chess engine, a huge CRM system, or a cool web system, starts with a brief description of the problem domain and the quality requirements. Then, you'll discover the system context with all the external interfaces. You'll dive into an overview of the solution strategy to implement the building blocks and runtime scenarios. The later chapters also explain various cross-cutting concerns and how they affect other aspects of a program.

What you will learn
• Utilize arc42 to document a system's physical infrastructure
• Learn how to identify a system's scope and boundaries
• Break a system down into building blocks and illustrate the relationships between them
• Discover how to describe the runtime behavior of a system
• Know how to document design decisions and their reasons
• Explore the risks and technical debt of your system

Who this book is for
This book is for software developers and solutions architects who are looking for an easy, open-source tool to document their systems. It is a useful reference for those who are already using arc42. If you are new to arc42, this book is a great learning resource. Those of you who want to write better technical documentation will also benefit from the general concepts covered in this book.