Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps. Create analytics applications by using the agile big data development methodology Build value from your data in a series of agile sprints, using the data-value stack Gain insight by using several data structures to extract multiple features from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future, and translate predictions into action Get feedback from users after each sprint to keep your project on track
Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and affect meaningful change in your organization. Build value from your data in a series of agile sprints, using the data-value pyramid Extract features for statistical models from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future via classification and regression Translate predictions into actions Get feedback from users after each sprint to keep your project on track
Gain a practical introduction to DataOps, a new discipline for delivering data science at scale inspired by practices at companies such as Facebook, Uber, LinkedIn, Twitter, and eBay. Organizations need more than the latest AI algorithms, hottest tools, and best people to turn data into insight-driven action and useful analytical data products. Processes and thinking employed to manage and use data in the 20th century are a bottleneck for working effectively with the variety of data and advanced analytical use cases that organizations have today. This book provides the approach and methods to ensure continuous rapid use of data to create analytical data products and steer decision making. Practical DataOps shows you how to optimize the data supply chain from diverse raw data sources to the final data product, whether the goal is a machine learning model or other data-orientated output. The book provides an approach to eliminate wasted effort and improve collaboration between data producers, data consumers, and the rest of the organization through the adoption of lean thinking and agile software development principles. This book helps you to improve the speed and accuracy of analytical application development through data management and DevOps practices that securely expand data access, and rapidly increase the number of reproducible data products through automation, testing, and integration. The book also shows how to collect feedback and monitor performance to manage and continuously improve your processes and output. What You Will LearnDevelop a data strategy for your organization to help it reach its long-term goals Recognize and eliminate barriers to delivering data to users at scale Work on the right things for the right stakeholders through agile collaboration Create trust in data via rigorous testing and effective data management Build a culture of learning and continuous improvement through monitoring deployments and measuring outcomes Create cross-functional self-organizing teams focused on goals not reporting lines Build robust, trustworthy, data pipelines in support of AI, machine learning, and other analytical data products Who This Book Is For Data science and advanced analytics experts, CIOs, CDOs (chief data officers), chief analytics officers, business analysts, business team leaders, and IT professionals (data engineers, developers, architects, and DBAs) supporting data teams who want to dramatically increase the value their organization derives from data. The book is ideal for data professionals who want to overcome challenges of long delivery time, poor data quality, high maintenance costs, and scaling difficulties in getting data science output and machine learning into customer-facing production.
You have to make sense of enormous amounts of data, and while the notion of "agile data warehousing might sound tricky, it can yield as much as a 3-to-1 speed advantage while cutting project costs in half. Bring this highly effective technique to your organization with the wisdom of agile data warehousing expert Ralph Hughes. Agile Data Warehousing Project Management will give you a thorough introduction to the method as you would practice it in the project room to build a serious "data mart. Regardless of where you are today, this step-by-step implementation guide will prepare you to join or even lead a team in visualizing, building, and validating a single component to an enterprise data warehouse. - Provides a thorough grounding on the mechanics of Scrum as well as practical advice on keeping your team on track - Includes strategies for getting accurate and actionable requirements from a team's business partner - Revolutionary estimating techniques that make forecasting labor far more understandable and accurate - Demonstrates a blends of Agile methods to simplify team management and synchronize inputs across IT specialties - Enables you and your teams to start simple and progress steadily to world-class performance levels
Using Agile methods, you can bring far greater innovation, value, and quality to any data warehousing (DW), business intelligence (BI), or analytics project. However, conventional Agile methods must be carefully adapted to address the unique characteristics of DW/BI projects. In Agile Analytics, Agile pioneer Ken Collier shows how to do just that. Collier introduces platform-agnostic Agile solutions for integrating infrastructures consisting of diverse operational, legacy, and specialty systems that mix commercial and custom code. Using working examples, he shows how to manage analytics development teams with widely diverse skill sets and how to support enormous and fast-growing data volumes. Collier's techniques offer optimal value whether your projects involve "back-end" data management, "front-end" business analysis, or both. Part I focuses on Agile project management techniques and delivery team coordination, introducing core practices that shape the way your Agile DW/BI project community can collaborate toward success Part II presents technical methods for enabling continuous delivery of business value at production-quality levels, including evolving superior designs; test-driven DW development; version control; and project automation Collier brings together proven solutions you can apply right now--whether you're an IT decision-maker, data warehouse professional, database administrator, business intelligence specialist, or database developer. With his help, you can mitigate project risk, improve business alignment, achieve better results--and have fun along the way.
Leverage DataRobot's enterprise AI platform and automated decision intelligence to extract business value from data Key FeaturesGet well-versed with DataRobot features using real-world examplesUse this all-in-one platform to build, monitor, and deploy ML models for handling the entire production life cycleMake use of advanced DataRobot capabilities to programmatically build and deploy a large number of ML modelsBook Description DataRobot enables data science teams to become more efficient and productive. This book helps you to address machine learning (ML) challenges with DataRobot's enterprise platform, enabling you to extract business value from data and rapidly create commercial impact for your organization. You'll begin by learning how to use DataRobot's features to perform data prep and cleansing tasks automatically. The book then covers best practices for building and deploying ML models, along with challenges faced while scaling them to handle complex business problems. Moving on, you'll perform exploratory data analysis (EDA) tasks to prepare your data to build ML models and ways to interpret results. You'll also discover how to analyze the model's predictions and turn them into actionable insights for business users. Next, you'll create model documentation for internal as well as compliance purposes and learn how the model gets deployed as an API. In addition, you'll find out how to operationalize and monitor the model's performance. Finally, you'll work with examples on time series forecasting, NLP, image processing, MLOps, and more using advanced DataRobot capabilities. By the end of this book, you'll have learned to use DataRobot's AutoML and MLOps features to scale ML model building by avoiding repetitive tasks and common errors. What you will learnUnderstand and solve business problems using DataRobotUse DataRobot to prepare your data and perform various data analysis tasks to start building modelsDevelop robust ML models and assess their results correctly before deploymentExplore various DataRobot functions and outputs to help you understand the models and select the one that best solves the business problemAnalyze a model's predictions and turn them into actionable insights for business usersUnderstand how DataRobot helps in governing, deploying, and maintaining ML modelsWho this book is for This book is for data scientists, data analysts, and data enthusiasts looking for a practical guide to building and deploying robust machine learning models using DataRobot. Experienced data scientists will also find this book helpful for rapidly exploring, building, and deploying a broader range of models. The book assumes a basic understanding of machine learning.
Agile Data Warehouse Design is a step-by-step guide for capturing data warehousing/business intelligence (DW/BI) requirements and turning them into high performance dimensional models in the most direct way: by modelstorming (data modeling + brainstorming) with BI stakeholders. This book describes BEAM✲, an agile approach to dimensional modeling, for improving communication between data warehouse designers, BI stakeholders and the whole DW/BI development team. BEAM✲ provides tools and techniques that will encourage DW/BI designers and developers to move away from their keyboards and entity relationship based tools and model interactively with their colleagues. The result is everyone thinks dimensionally from the outset! Developers understand how to efficiently implement dimensional modeling solutions. Business stakeholders feel ownership of the data warehouse they have created, and can already imagine how they will use it to answer their business questions. Within this book, you will learn: ✲ Agile dimensional modeling using Business Event Analysis & Modeling (BEAM✲) ✲ Modelstorming: data modeling that is quicker, more inclusive, more productive, and frankly more fun! ✲ Telling dimensional data stories using the 7Ws (who, what, when, where, how many, why and how) ✲ Modeling by example not abstraction; using data story themes, not crow's feet, to describe detail ✲ Storyboarding the data warehouse to discover conformed dimensions and plan iterative development ✲ Visual modeling: sketching timelines, charts and grids to model complex process measurement - simply ✲ Agile design documentation: enhancing star schemas with BEAM✲ dimensional shorthand notation ✲ Solving difficult DW/BI performance and usability problems with proven dimensional design patterns Lawrence Corr is a data warehouse designer and educator. As Principal of DecisionOne Consulting, he helps clients to review and simplify their data warehouse designs, and advises vendors on visual data modeling techniques. He regularly teaches agile dimensional modeling courses worldwide and has taught dimensional DW/BI skills to thousands of students. Jim Stagnitto is a data warehouse and master data management architect specializing in the healthcare, financial services, and information service industries. He is the founder of the data warehousing and data mining consulting firm Llumino.
Build resilient applied machine learning teams that deliver better data products through adapting the guiding principles of the Agile Manifesto. Bringing together talented people to create a great applied machine learning team is no small feat. With developers and data scientists both contributing expertise in their respective fields, communication alone can be a challenge. Agile Machine Learning teaches you how to deliver superior data products through agile processes and to learn, by example, how to organize and manage a fast-paced team challenged with solving novel data problems at scale, in a production environment. The authors’ approach models the ground-breaking engineering principles described in the Agile Manifesto. The book provides further context, and contrasts the original principles with the requirements of systems that deliver a data product. What You'll Learn Effectively run a data engineering team that is metrics-focused, experiment-focused, and data-focused Make sound implementation and model exploration decisions based on the data and the metrics Know the importance of data wallowing: analyzing data in real time in a group setting Recognize the value of always being able to measure your current state objectively Understand data literacy, a key attribute of a reliable data engineer, from definitions to expectations Who This Book Is For Anyone who manages a machine learning team, or is responsible for creating production-ready inference components. Anyone responsible for data project workflow of sampling data; labeling, training, testing, improving, and maintaining models; and system and data metrics will also find this book useful. Readers should be familiar with software engineering and understand the basics of machine learning and working with data.
A field guide for the unique challenges of data science leadership, filled with transformative insights, personal experiences, and industry examples. In How To Lead in Data Science you will learn: Best practices for leading projects while balancing complex trade-offs Specifying, prioritizing, and planning projects from vague requirements Navigating structural challenges in your organization Working through project failures with positivity and tenacity Growing your team with coaching, mentoring, and advising Crafting technology roadmaps and championing successful projects Driving diversity, inclusion, and belonging within teams Architecting a long-term business strategy and data roadmap as an executive Delivering a data-driven culture and structuring productive data science organizations How to Lead in Data Science is full of techniques for leading data science at every seniority level—from heading up a single project to overseeing a whole company's data strategy. Authors Jike Chong and Yue Cathy Chang share hard-won advice that they've developed building data teams for LinkedIn, Acorns, Yiren Digital, large asset-management firms, Fortune 50 companies, and more. You'll find advice on plotting your long-term career advancement, as well as quick wins you can put into practice right away. Carefully crafted assessments and interview scenarios encourage introspection, reveal personal blind spots, and highlight development areas. About the technology Lead your data science teams and projects to success! To make a consistent, meaningful impact as a data science leader, you must articulate technology roadmaps, plan effective project strategies, support diversity, and create a positive environment for professional growth. This book delivers the wisdom and practical skills you need to thrive as a data science leader at all levels, from team member to the C-suite. About the book How to Lead in Data Science shares unique leadership techniques from high-performance data teams. It’s filled with best practices for balancing project trade-offs and producing exceptional results, even when beginning with vague requirements or unclear expectations. You’ll find a clearly presented modern leadership framework based on current case studies, with insights reaching all the way to Aristotle and Confucius. As you read, you’ll build practical skills to grow and improve your team, your company’s data culture, and yourself. What's inside How to coach and mentor team members Navigate an organization’s structural challenges Secure commitments from other teams and partners Stay current with the technology landscape Advance your career About the reader For data science practitioners at all levels. About the author Dr. Jike Chong and Yue Cathy Chang build, lead, and grow high-performing data teams across industries in public and private companies, such as Acorns, LinkedIn, large asset-management firms, and Fortune 50 companies. Table of Contents 1 What makes a successful data scientist? PART 1 THE TECH LEAD: CULTIVATING LEADERSHIP 2 Capabilities for leading projects 3 Virtues for leading projects PART 2 THE MANAGER: NURTURING A TEAM 4 Capabilities for leading people 5 Virtues for leading people PART 3 THE DIRECTOR: GOVERNING A FUNCTION 6 Capabilities for leading a function 7 Virtues for leading a function PART 4 THE EXECUTIVE: INSPIRING AN INDUSTRY 8 Capabilities for leading a company 9 Virtues for leading a company PART 5 THE LOOP AND THE FUTURE 10 Landscape, organization, opportunity, and practice 11 Leading in data science and a future outlook
Building upon his earlier book that detailed agile data warehousing programming techniques for the Scrum master, Ralph's latest work illustrates the agile interpretations of the remaining software engineering disciplines: - Requirements management benefits from streamlined templates that not only define projects quickly, but ensure nothing essential is overlooked. - Data engineering receives two new "hyper modeling" techniques, yielding data warehouses that can be easily adapted when requirements change without having to invest in ruinously expensive data-conversion programs. - Quality assurance advances with not only a stereoscopic top-down and bottom-up planning method, but also the incorporation of the latest in automated test engines. Use this step-by-step guide to deepen your own application development skills through self-study, show your teammates the world's fastest and most reliable techniques for creating business intelligence systems, or ensure that the IT department working for you is building your next decision support system the right way. - Learn how to quickly define scope and architecture before programming starts - Includes techniques of process and data engineering that enable iterative and incremental delivery - Demonstrates how to plan and execute quality assurance plans and includes a guide to continuous integration and automated regression testing - Presents program management strategies for coordinating multiple agile data mart projects so that over time an enterprise data warehouse emerges - Use the provided 120-day road map to establish a robust, agile data warehousing program