The Journey Continues: From Data Lake to Data-Driven Organization

Author: Mandy Chessell

Publisher: IBM Redbooks

Published: 2018-02-19

Total Pages: 30

ISBN-10: 0738456667

This IBM Redguide™ publication looks back on the key decisions that made the data lake successful and looks forward to the future. It proposes that the metadata management and governance approaches developed for the data lake can be adopted more broadly to increase the value that an organization gets from its data. Delivering this broader vision, however, requires a new generation of data catalogs and governance tools built on open standards that are adopted by a multi-vendor ecosystem of data platforms and tools. Work is already underway to define and deliver this capability, and there are multiple ways to engage. This guide covers the reasons why this new capability is critical for modern businesses and how you can get value from it.


Data Management at Scale

Author: Piethein Strengholt

Publisher: O'Reilly Media, Inc.

Published: 2020-07-29

Total Pages: 404

ISBN-10: 1492054739

As data management and integration continue to evolve rapidly, storing all your data in one place, such as a data warehouse, is no longer scalable. In the very near future, data will need to be distributed and available for several technological solutions. With this practical book, you’ll learn how to migrate your enterprise from a complex and tightly coupled data landscape to a more flexible architecture ready for the modern world of data consumption. Executives, data architects, analytics teams, and compliance and governance staff will learn how to build a modern scalable data landscape using the Scaled Architecture, which you can introduce incrementally without a large upfront investment. Author Piethein Strengholt provides blueprints, principles, observations, best practices, and patterns to get you up to speed.

- Examine data management trends, including technological developments, regulatory requirements, and privacy concerns
- Go deep into the Scaled Architecture and learn how the pieces fit together
- Explore data governance and data security, master data management, self-service data marketplaces, and the importance of metadata


Data Mesh

Author: Zhamak Dehghani

Publisher: O'Reilly Media, Inc.

Published: 2022-03-08

Total Pages: 387

ISBN-10: 1492092363

Many enterprises are investing in a next-generation data lake, hoping to democratize data at scale to provide business insights and ultimately make automated intelligent decisions. In this practical book, author Zhamak Dehghani reveals that, despite the time, money, and effort poured into them, data warehouses and data lakes fail when applied at the scale and speed of today's organizations. A distributed data mesh is a better choice. Dehghani guides architects, technical leaders, and decision makers on their journey from monolithic big data architecture to a sociotechnical paradigm that draws from modern distributed architecture. A data mesh considers domains as a first-class concern, applies platform thinking to create self-serve data infrastructure, treats data as a product, and introduces a federated and computational model of data governance. This book shows you why and how.

- Examine the current data landscape from the perspective of business and organizational needs, environmental challenges, and existing architectures
- Analyze the landscape's underlying characteristics and failure modes
- Get a complete introduction to data mesh principles and its constituents
- Learn how to design a data mesh architecture
- Move beyond a monolithic data lake to a distributed data mesh


Introduction to Ethics

Author: Chhanda Chakraborti

Publisher: Springer Nature

Published: 2023-09-17

Total Pages: 783

ISBN-10: 9819907071

The book introduces the reader to Western ethics as a subject, along with its three standard subdivisions. Although it is written with university students, policymakers, and professionals in mind, it is lucid enough to be accessible to most adult readers. The book begins with introductions to the basics of ethics. These chapters are meant to provide the reader with the background knowledge necessary for understanding the more technical chapters on metaethics, normative ethical theories, and applied ethics, the three well-known subdivisions within ethics. The chapters that follow take up core ethical issues from each of these areas. The sections focus on explanation and a critical understanding of the ethical issue. The chapters also have examples, cases, and exercises to encourage critical thinking and to enable the reader to grasp the issue better. The book brings in contemporary issues, such as the ethics of human organ transplantation, and contemporary theories, such as Amartya Sen’s concept of justice and Martha Nussbaum’s Capabilities Approach, to engage readers with ethics in the real world. The book concludes with applied ethics, using the ethics of artificial intelligence as its example. The aim is to keep ethics as a future-driven activity and to emphasize the need to understand the real-world ethical situations and dilemmas that will affect stakeholders around the world in the coming years as artificial intelligence and data-driven technologies change our everyday lives.


Data Lake for Enterprises

Author: Tomcy John

Publisher: Packt Publishing Ltd

Published: 2017-05-31

Total Pages: 585

ISBN-10: 1787282651

A practical guide to implementing your enterprise data lake using Lambda Architecture as the base

About This Book

- Build a full-fledged data lake for your organization with popular big data technologies using the Lambda architecture as the base
- Delve into the big data technologies required to meet modern-day business strategies
- A highly practical guide to implementing enterprise data lakes with lots of examples and real-world use-cases

Who This Book Is For

Java developers and architects who would like to implement a data lake for their enterprise will find this book useful. If you want to get hands-on experience with the Lambda Architecture and big data technologies by implementing a practical solution using these technologies, this book will also help you.

What You Will Learn

- Build an enterprise-level data lake using the relevant big data technologies
- Understand the core of the Lambda architecture and how to apply it in an enterprise
- Learn the technical details around Sqoop and its functionalities
- Integrate Kafka with Hadoop components to acquire enterprise data
- Use Flume with streaming technologies for stream-based processing
- Understand stream-based processing with reference to Apache Spark Streaming
- Incorporate Hadoop components and know the advantages they provide for enterprise data lakes
- Build fast, streaming, and high-performance applications using ElasticSearch
- Make your data ingestion process consistent across various data formats with configurability
- Process your data to derive intelligence using machine learning algorithms

In Detail

The term "Data Lake" has recently emerged as a prominent term in the big data industry. Data scientists can make use of it in deriving meaningful insights that can be used by businesses to redefine or transform the way they operate. Lambda architecture is also emerging as one of the most prominent patterns in the big data landscape, as it not only helps to derive useful information from historical data but also correlates real-time data to enable businesses to make critical decisions. This book tries to bring these two important aspects, data lake and lambda architecture, together. This book is divided into three main sections. The first introduces you to the concept of data lakes and their importance in enterprises, and gets you up to speed with the Lambda architecture. The second section delves into the principal components of building a data lake using the Lambda architecture. It introduces you to popular big data technologies such as Apache Hadoop, Spark, Sqoop, Flume, and ElasticSearch. The third section is a highly practical demonstration of putting it all together, and shows you how an enterprise data lake can be implemented, along with several real-world use-cases. It also shows you how other peripheral components can be added to the lake to make it more efficient. By the end of this book, you will be able to choose the right big data technologies using the lambda architectural patterns to build your enterprise data lake.

Style and approach

The book takes a pragmatic approach, showing ways to leverage big data technologies and lambda architecture to build an enterprise-level data lake.


The Self-Service Data Roadmap

Author: Sandeep Uttamchandani

Publisher: O'Reilly Media, Inc.

Published: 2020-09-10

Total Pages: 297

ISBN-10: 1492075205

Data-driven insights are a key competitive advantage for any industry today, but deriving insights from raw data can still take days or weeks. Most organizations can’t scale data science teams fast enough to keep up with the growing amounts of data to transform. What’s the answer? Self-service data. With this practical book, data engineers, data scientists, and team managers will learn how to build a self-service data science platform that helps anyone in your organization extract insights from data. Sandeep Uttamchandani provides a scorecard to track and address bottlenecks that slow down time to insight across data discovery, transformation, processing, and production. This book bridges the gap between data scientists bottlenecked by engineering realities and data engineers unclear about ways to make self-service work.

- Build a self-service portal to support data discovery, quality, lineage, and governance
- Select the best approach for each self-service capability using open source cloud technologies
- Tailor self-service for the people, processes, and technology maturity of your data platform
- Implement capabilities to democratize data and reduce time to insight
- Scale your self-service portal to support a large number of users within your organization


Data Lakes For Dummies

Author: Alan R. Simon

Publisher: John Wiley & Sons

Published: 2021-07-14

Total Pages: 391

ISBN-10: 1119786169

Take a dive into data lakes

“Data lakes” is the latest buzzword in the world of data storage, management, and analysis. Data Lakes For Dummies decodes and demystifies the concept and helps you get a straightforward answer to the question: “What exactly is a data lake and do I need one for my business?” Written for an audience of technology decision makers tasked with keeping up with the latest and greatest data options, this book provides the perfect introductory survey of these novel and growing features of the information landscape. It explains how they can help your business, what they can (and can’t) achieve, and what you need to do to create the lake that best suits your particular needs. With a minimum of jargon, prolific tech author and business intelligence consultant Alan Simon explains how data lakes differ from other data storage paradigms. Once you’ve got the background picture, he maps out ways you can add a data lake to your business systems; migrate existing information and switch on the fresh data supply; clean up the product; and open channels to the best intelligence software for interpreting what you’ve stored.

- Understand and build data lake architecture
- Store, clean, and synchronize new and existing data
- Compare the best data lake vendors
- Structure raw data and produce usable analytics

Whatever your business, data lakes are going to form ever more prominent parts of the information universe every business should have access to. Dive into this book to start exploring the deep competitive advantage they make possible, and make sure your business isn’t left standing on the shore.


Statistical Process Control and Data Analytics

Author: John Oakland

Publisher: Taylor & Francis

Published: 2024-09-02

Total Pages: 387

ISBN-10: 1040104983

The business, commercial and public-sector world has changed dramatically since John Oakland wrote the first edition of Statistical Process Control in the mid-1980s. Then, people were rediscovering statistical methods of ‘quality control,’ and the book responded to an often desperate need to find out about the techniques and use them on data. Pressure over time from organizations supplying directly to the consumer, typically in the automotive and high technology sectors, forced those in charge of the supplying, production and service operations to think more about preventing problems than how to find and fix them. Subsequent editions retained the ‘tool kit’ approach of the first but included some of the ‘philosophy’ behind the techniques and their use. Now entitled Statistical Process Control and Data Analytics, this revised and updated eighth edition retains its focus on processes that require understanding, have variation, must be properly controlled, have a capability and need improvement – as reflected in the five sections of the book. In this book the authors not only provide an instructional guide for the tools but also communicate the management practices which have become so vital to success in organizations throughout the world. The book is supported by the authors' extensive consulting work with thousands of organizations worldwide. A new chapter on data governance and data analytics reflects the increasing importance of big data in today’s business environment. Fully updated to include real-life case studies, new research based on client work from an array of industries and integration with the latest computer methods and software, the book also retains its valued textbook quality through clear learning objectives and online end-of-chapter discussion questions. It can still serve as a textbook for both students and practicing engineers, scientists, technologists, managers and anyone wishing to understand or implement modern statistical process control techniques and data analytics.


Software Engineering at Google

Author: Titus Winters

Publisher: O'Reilly Media

Published: 2020-02-28

Total Pages: 602

ISBN-10: 1492082767

Today, software engineers need to know not only how to program effectively but also how to develop proper engineering practices to make their codebase sustainable and healthy. This book emphasizes this difference between programming and software engineering. How can software engineers manage a living codebase that evolves and responds to changing requirements and demands over the length of its life? Based on their experience at Google, software engineers Titus Winters and Hyrum Wright, along with technical writer Tom Manshreck, present a candid and insightful look at how some of the world’s leading practitioners construct and maintain software. This book covers Google’s unique engineering culture, processes, and tools and how these aspects contribute to the effectiveness of an engineering organization. You’ll explore three fundamental principles that software organizations should keep in mind when designing, architecting, writing, and maintaining code:

- How time affects the sustainability of software and how to make your code resilient over time
- How scale affects the viability of software practices within an engineering organization
- What trade-offs a typical engineer needs to make when evaluating design and development decisions


Data Mesh

Author: Zhamak Dehghani

Publisher: O'Reilly Media, Inc.

Published: 2022-03-08

Total Pages: 379

ISBN-10: 1492092347

We're at an inflection point in data, where our data management solutions no longer match the complexity of organizations, the proliferation of data sources, and the scope of our aspirations to get value from data with AI and analytics. In this practical book, author Zhamak Dehghani introduces data mesh, a decentralized sociotechnical paradigm drawn from modern distributed architecture that provides a new approach to sourcing, sharing, accessing, and managing analytical data at scale. Dehghani guides practitioners, architects, technical leaders, and decision makers on their journey from traditional big data architecture to a distributed and multidimensional approach to analytical data management. Data mesh treats data as a product, considers domains as a primary concern, applies platform thinking to create self-serve data infrastructure, and introduces a federated computational model of data governance.

- Get a complete introduction to data mesh principles and its constituents
- Design a data mesh architecture
- Guide a data mesh strategy and execution
- Navigate organizational design to a decentralized data ownership model
- Move beyond traditional data warehouses and lakes to a distributed data mesh