Practical Data Analysis Cookbook

Practical Data Analysis Cookbook

Author: Tomasz Drabas

Publisher: Packt Publishing Ltd

Published: 2016-04-29

Total Pages: 384

ISBN-13: 1783558512

DOWNLOAD EBOOK

Over 60 practical recipes on data exploration and analysis About This Book Clean dirty data, extract accurate information, and explore the relationships between variables Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn Find and extract the most important features from your dataset using the most efficient Python libraries Who This Book Is For If you are a beginner or intermediate-level professional who is looking to solve your day-to-day, analytical problems with Python, this book is for you. Even with no prior programming and data analytics experience, you will be able to finish each recipe and learn while doing so. What You Will Learn Read, clean, transform, and store your data usng Pandas and OpenRefine Understand your data and explore the relationships between variables using Pandas and D3.js Explore a variety of techniques to classify and cluster outbound marketing campaign calls data of a bank using Pandas, mlpy, NumPy, and Statsmodels Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy Predict the output of a power plant with regression models and forecast water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi Scrape Internet web pages using urlib and BeautifulSoup and get to know natural language processing techniques to classify movies ratings using NLTK Study simulation techniques in an example of a gas station with agent-based modeling In Detail Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the sector of information and communication technologies. It is an employee asset in almost all economy sectors. This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interactions analysis, and more. First, you will learn some easy-to-follow practical techniques on how to read, write, clean, reformat, explore, and understand your data—arguably the most time-consuming (and the most important) tasks for any data scientist. In the second section, different independent recipes delve into intermediate topics such as classification, clustering, predicting, and more. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be expanded to solve other real-life problems such as building recommendation engines or predictive models. In the third section, you will explore more advanced topics: from the field of graph theory through natural language processing, discrete choice modeling to simulations. You will also get to expand your knowledge on identifying fraud origin with the help of a graph, scrape Internet websites, and classify movies based on their reviews. By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer. Style and approach This hands-on recipe guide is divided into three sections that tackle and overcome real-world data modeling problems faced by data analysts/scientist in their everyday work. Each independent recipe is written in an easy-to-follow and step-by-step fashion.


Practical Data Science Cookbook

Practical Data Science Cookbook

Author: Prabhanjan Tattar

Publisher: Packt Publishing Ltd

Published: 2017-06-29

Total Pages: 428

ISBN-13: 178712326X

DOWNLOAD EBOOK

Over 85 recipes to help you complete real-world data science projects in R and Python About This Book Tackle every step in the data science pipeline and use it to acquire, clean, analyze, and visualize your data Get beyond the theory and implement real-world projects in data science using R and Python Easy-to-follow recipes will help you understand and implement the numerical computing concepts Who This Book Is For If you are an aspiring data scientist who wants to learn data science and numerical programming concepts through hands-on, real-world project examples, this is the book for you. Whether you are brand new to data science or you are a seasoned expert, you will benefit from learning about the structure of real-world data science projects and the programming examples in R and Python. What You Will Learn Learn and understand the installation procedure and environment required for R and Python on various platforms Prepare data for analysis by implement various data science concepts such as acquisition, cleaning and munging through R and Python Build a predictive model and an exploratory model Analyze the results of your model and create reports on the acquired data Build various tree-based methods and Build random forest In Detail As increasing amounts of data are generated each year, the need to analyze and create value out of it is more important than ever. Companies that know what to do with their data and how to do it well will have a competitive advantage over companies that don't. Because of this, there will be an increasing demand for people that possess both the analytical and technical abilities to extract valuable insights from data and create valuable solutions that put those insights to use. Starting with the basics, this book covers how to set up your numerical programming environment, introduces you to the data science pipeline, and guides you through several data projects in a step-by-step format. By sequentially working through the steps in each chapter, you will quickly familiarize yourself with the process and learn how to apply it to a variety of situations with examples using the two most popular programming languages for data analysis—R and Python. Style and approach This step-by-step guide to data science is full of hands-on examples of real-world data science tasks. Each recipe focuses on a particular task involved in the data science pipeline, ranging from readying the dataset to analytics and visualization


Practical Data Analysis

Practical Data Analysis

Author: Hector Cuesta

Publisher: Packt Publishing Ltd

Published: 2016-09-30

Total Pages: 330

ISBN-13: 1785286668

DOWNLOAD EBOOK

A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark About This Book Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data Apply Machine Learning algorithms to different kinds of data such as social networks, time series, and images A hands-on guide to understanding the nature of data and how to turn it into insight Who This Book Is For This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed. What You Will Learn Acquire, format, and visualize your data Build an image-similarity search engine Generate meaningful visualizations anyone can understand Get started with analyzing social network graphs Find out how to implement sentiment text analysis Install data analysis tools such as Pandas, MongoDB, and Apache Spark Get to grips with Apache Spark Implement machine learning algorithms such as classification or forecasting In Detail Beyond buzzwords like Big Data or Data Science, there are a great opportunities to innovate in many businesses using data analysis to get data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, Images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark. Style and approach This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.


Access Data Analysis Cookbook

Access Data Analysis Cookbook

Author: Ken Bluttman

Publisher: "O'Reilly Media, Inc."

Published: 2007-05-14

Total Pages: 369

ISBN-13: 0596555210

DOWNLOAD EBOOK

If you have large quantities of data in a Microsoft Access database, and need to study that data in depth, this book is a data cruncher's dream. Access Data Analysis Cookbook offers practical recipes to solve a variety of common problems that users have with extracting Access data and performing calculations on it. Each recipe includes a discussion on how and why the solution works. Whether you use Access 2007 or an earlier version, this book will teach you new methods to query data, different ways to move data in and out of Access, how to calculate answers to financial and investment issues, and more. Learn how to apply statistics to summarize business information, how to jump beyond SQL by manipulating data with VBA, how to process dates and times, and even how to reach into the Excel data analysis toolkit. Recipes demonstrate ways to: Develop basic and sophisticated queries Apply aggregate functions, custom functions, regular expressions, and crosstabs Apply queries to perform non-passive activities such as inserting, updating, and deleting data Create and manipulate tables and queries programmatically Manage text-based data, including methods to isolate parts of a string and ways to work with numbers that are stored as text Use arrays, read and write to the Windows registry, encrypt data, and use transaction processing Use the FileSystemObject, use XML with XSLT, communicate with SQL Server, and exchange data with other Office products Find answers from time-based data, such as how to add time, count elapsed time, work with leap years, and how to manage time zones in your calculations Deal with business and finance problems, including methods for calculating depreciation, loan paybacks, and Return on Investment (ROI) Explore statistical techniques, such as frequency, variance, kurtosis, linear regression, combinations and permutations Access Data Analysis Cookbook is a one-stop-shop for extracting nuggets of valuable information from your database, and anyone with Access experience will benefit from these tips and techniques, including seasoned developers. If you want to use your data, and not just store it, you'll find this guide indispensable.


Python Data Analysis Cookbook

Python Data Analysis Cookbook

Author: Ivan Idris

Publisher: Packt Publishing Ltd

Published: 2016-07-22

Total Pages: 462

ISBN-13: 1785283855

DOWNLOAD EBOOK

Over 140 practical recipes to help you make sense of your data with ease and build production-ready data apps About This Book Analyze Big Data sets, create attractive visualizations, and manipulate and process various data types Packed with rich recipes to help you learn and explore amazing algorithms for statistics and machine learning Authored by Ivan Idris, expert in python programming and proud author of eight highly reviewed books Who This Book Is For This book teaches Python data analysis at an intermediate level with the goal of transforming you from journeyman to master. Basic Python and data analysis skills and affinity are assumed. What You Will Learn Set up reproducible data analysis Clean and transform data Apply advanced statistical analysis Create attractive data visualizations Web scrape and work with databases, Hadoop, and Spark Analyze images and time series data Mine text and analyze social networks Use machine learning and evaluate the results Take advantage of parallelism and concurrency In Detail Data analysis is a rapidly evolving field and Python is a multi-paradigm programming language suitable for object-oriented application development and functional design patterns. As Python offers a range of tools and libraries for all purposes, it has slowly evolved as the primary language for data science, including topics on: data analysis, visualization, and machine learning. Python Data Analysis Cookbook focuses on reproducibility and creating production-ready systems. You will start with recipes that set the foundation for data analysis with libraries such as matplotlib, NumPy, and pandas. You will learn to create visualizations by choosing color maps and palettes then dive into statistical data analysis using distribution algorithms and correlations. You'll then help you find your way around different data and numerical problems, get to grips with Spark and HDFS, and then set up migration scripts for web mining. In this book, you will dive deeper into recipes on spectral analysis, smoothing, and bootstrapping methods. Moving on, you will learn to rank stocks and check market efficiency, then work with metrics and clusters. You will achieve parallelism to improve system performance by using multiple threads and speeding up your code. By the end of the book, you will be capable of handling various data analysis techniques in Python and devising solutions for problem scenarios. Style and Approach The book is written in “cookbook” style striving for high realism in data analysis. Through the recipe-based format, you can read each recipe separately as required and immediately apply the knowledge gained.


R Cookbook

R Cookbook

Author: Paul Teetor

Publisher: "O'Reilly Media, Inc."

Published: 2011-03-03

Total Pages: 438

ISBN-13: 1449307264

DOWNLOAD EBOOK

With more than 200 practical recipes, this book helps you perform data analysis with R quickly and efficiently. The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an experienced data programmer, it will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process. Create vectors, handle variables, and perform other basic functions Input and output data Tackle data structures such as matrices, lists, factors, and data frames Work with probability, probability distributions, and random variables Calculate statistics and confidence intervals, and perform statistical tests Create a variety of graphic displays Build statistical models with linear regressions and analysis of variance (ANOVA) Explore advanced statistical techniques, such as finding clusters in your data "Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language—one practical example at a time."—Jeffrey Ryan, software consultant and R package author


R Cookbook

R Cookbook

Author: JD Long

Publisher: "O'Reilly Media, Inc."

Published: 2019-06-21

Total Pages: 625

ISBN-13: 1492040630

DOWNLOAD EBOOK

Perform data analysis with R quickly and efficiently with more than 275 practical recipes in this expanded second edition. The R language provides everything you need to do statistical work, but its structure can be difficult to master. These task-oriented recipes make you productive with R immediately. Solutions range from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem and includes a discussion that explains the solution and provides insight into how it works. If you’re a beginner, R Cookbook will help get you started. If you’re an intermediate user, this book will jog your memory and expand your horizons. You’ll get the job done faster and learn more about R in the process. Create vectors, handle variables, and perform basic functions Simplify data input and output Tackle data structures such as matrices, lists, factors, and data frames Work with probability, probability distributions, and random variables Calculate statistics and confidence intervals and perform statistical tests Create a variety of graphic displays Build statistical models with linear regressions and analysis of variance (ANOVA) Explore advanced statistical techniques, such as finding clusters in your data


R Graphics Cookbook

R Graphics Cookbook

Author: Winston Chang

Publisher: "O'Reilly Media, Inc."

Published: 2013

Total Pages: 414

ISBN-13: 1449316956

DOWNLOAD EBOOK

"Practical recipes for visualizing data"--Cover.


Python for Data Analysis

Python for Data Analysis

Author: Wes McKinney

Publisher: "O'Reilly Media, Inc."

Published: 2017-09-25

Total Pages: 553

ISBN-13: 1491957611

DOWNLOAD EBOOK

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process. Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub. Use the IPython shell and Jupyter notebook for exploratory computing Learn basic and advanced features in NumPy (Numerical Python) Get started with data analysis tools in the pandas library Use flexible tools to load, clean, transform, merge, and reshape data Create informative visualizations with matplotlib Apply the pandas groupby facility to slice, dice, and summarize datasets Analyze and manipulate regular and irregular time series data Learn how to solve real-world data analysis problems with thorough, detailed examples


Practical Data Science with R

Practical Data Science with R

Author: Nina Zumel

Publisher: Manning Publications

Published: 2014-04-10

Total Pages: 416

ISBN-13: 9781617291562

DOWNLOAD EBOOK

Summary Practical Data Science with R lives up to its name. It explains basic principles without the theoretical mumbo-jumbo and jumps right to the real use cases you'll face as you collect, curate, and analyze the data crucial to the success of your business. You'll apply the R programming language and statistical analysis techniques to carefully explained examples based in marketing, business intelligence, and decision support. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the Book Business analysts and developers are increasingly collecting, curating, analyzing, and reporting on crucial business data. The R language and its associated tools provide a straightforward way to tackle day-to-day data science tasks without a lot of academic theory or advanced mathematics. Practical Data Science with R shows you how to apply the R programming language and useful statistical techniques to everyday business situations. Using examples from marketing, business intelligence, and decision support, it shows you how to design experiments (such as A/B tests), build predictive models, and present results to audiences of all levels. This book is accessible to readers without a background in data science. Some familiarity with basic statistics, R, or another scripting language is assumed. What's Inside Data science for the business professional Statistical analysis using the R language Project lifecycle, from planning to delivery Numerous instantly familiar use cases Keys to effective data presentations About the Authors Nina Zumel and John Mount are cofounders of a San Francisco-based data science consulting firm. Both hold PhDs from Carnegie Mellon and blog on statistics, probability, and computer science at win-vector.com. Table of Contents PART 1 INTRODUCTION TO DATA SCIENCE The data science process Loading data into R Exploring data Managing data PART 2 MODELING METHODS Choosing and evaluating models Memorization methods Linear and logistic regression Unsupervised methods Exploring advanced methods PART 3 DELIVERING RESULTS Documentation and deployment Producing effective presentations