The book is styled as a cookbook, containing recipes, combined with free datasets, that will turn readers into proficient OpenRefine users as quickly as possible. This book is targeted at anyone who works with or handles large amounts of data. No prior knowledge of OpenRefine is required, as we start from the very beginning and gradually reveal more advanced features. You don't even need your own dataset, as we provide example data to try out the book's recipes.
Over 60 practical recipes on data exploration and analysis.
About This Book
- Clean dirty data, extract accurate information, and explore the relationships between variables
- Forecast the output of an electric plant and the water flow of American rivers using pandas, NumPy, Statsmodels, and scikit-learn
- Find and extract the most important features from your dataset using the most efficient Python libraries
Who This Book Is For
If you are a beginner or intermediate-level professional who is looking to solve your day-to-day analytical problems with Python, this book is for you. Even with no prior programming or data analytics experience, you will be able to finish each recipe and learn while doing so.
What You Will Learn
- Read, clean, transform, and store your data using pandas and OpenRefine
- Understand your data and explore the relationships between variables using pandas and D3.js
- Explore a variety of techniques to classify and cluster data from a bank's outbound marketing campaign calls using pandas, mlpy, NumPy, and Statsmodels
- Reduce the dimensionality of your dataset and extract the most important features with pandas, NumPy, and mlpy
- Predict the output of a power plant with regression models and forecast the water flow of American rivers with time series methods using pandas, NumPy, Statsmodels, and scikit-learn
- Explore social interactions and identify fraudulent activities with graph theory concepts using NetworkX and Gephi
- Scrape web pages using urllib and BeautifulSoup, and get to know natural language processing techniques to classify movie ratings using NLTK
- Study simulation techniques with agent-based modeling, using the example of a gas station
In Detail
Data analysis is the process of systematically applying statistical and logical techniques to describe and illustrate, condense and recap, and evaluate data. Its importance has been most visible in the information and communication technologies sector. It is a valuable skill for employees in almost every sector of the economy.
This book provides a rich set of independent recipes that dive into the world of data analytics and modeling using a variety of approaches, tools, and algorithms. You will learn the basics of data handling and modeling, and will build your skills gradually toward more advanced topics such as simulations, raw text processing, social interaction analysis, and more. First, you will learn some easy-to-follow practical techniques for reading, writing, cleaning, reformatting, exploring, and understanding your data, arguably the most time-consuming (and the most important) tasks for any data scientist. In the second section, independent recipes delve into intermediate topics such as classification, clustering, and prediction. With the help of these easy-to-follow recipes, you will also learn techniques that can easily be extended to solve other real-life problems, such as building recommendation engines or predictive models. In the third section, you will explore more advanced topics, from graph theory through natural language processing and discrete choice modeling to simulations. You will also learn how to identify the origin of fraud with the help of a graph, scrape websites, and classify movies based on their reviews. By the end of this book, you will be able to efficiently use the vast array of tools that the Python environment has to offer.
Style and approach
This hands-on recipe guide is divided into three sections that tackle and overcome real-world data modeling problems faced by data analysts and scientists in their everyday work. Each independent recipe is written in an easy-to-follow, step-by-step fashion.
This practical textbook offers a hands-on introduction to big data analytics, helping you to develop the skills required to hit the ground running as a data professional. It complements theoretical foundations with an emphasis on the application of big data analytics, illustrated by real-life examples and datasets. Containing comprehensive coverage of all the key topics in this area, this book uses open-source technologies, with examples in Python and Apache Spark. Learning features include:
- Ethics by Design encourages you to consider data ethics at every stage.
- Industry Insights facilitate a deeper understanding of the link between what you are studying and how it is applied in industry.
- Datasets, questions, and exercises give you the opportunity to apply your learning.
Dr Funmi Obembe is the Head of Technology at the Faculty of Arts, Science and Technology, University of Northampton. Dr Ofer Engel is a Data Scientist at the University of Groningen.
A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark.
About This Book
- Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
- Apply machine learning algorithms to different kinds of data, such as social networks, time series, and images
- A hands-on guide to understanding the nature of data and how to turn it into insight
Who This Book Is For
This book is for developers who want to implement data analysis and data-driven algorithms in a practical way. It is also suitable for those without a background in data analysis or data processing. Basic knowledge of Python programming, statistics, and linear algebra is assumed.
What You Will Learn
- Acquire, format, and visualize your data
- Build an image-similarity search engine
- Generate meaningful visualizations anyone can understand
- Get started with analyzing social network graphs
- Find out how to implement text sentiment analysis
- Install data analysis tools such as pandas, MongoDB, and Apache Spark
- Get to grips with Apache Spark
- Implement machine learning algorithms such as classification or forecasting
In Detail
Beyond buzzwords like Big Data or Data Science, there are great opportunities for many businesses to innovate by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will perform data-driven processing on several types of data, such as text, images, social network graphs, documents, and time series, showing you how to implement large-scale data processing with MongoDB and Apache Spark.
Style and approach
This is a hands-on guide to data analysis and data processing. The concrete examples are explained with simple code and accessible data.
A practical, skill-based introduction to data analysis and literacy. We are swimming in a world of data, and this handy guide will keep you afloat while you learn to make sense of it all. In Data Literacy: A User’s Guide, David Herzog, a journalist with a decade of experience using data analysis to transform information into captivating storytelling, introduces students and professionals to the fundamentals of data literacy, a key skill in today’s world. Assuming the reader has no advanced knowledge of data analysis or statistics, this book shows how to create insight from publicly available data through exercises using simple Excel functions. Extensively illustrated, step-by-step instructions within a concise, yet comprehensive, reference will help readers identify, obtain, evaluate, clean, analyze, and visualize data. A concluding chapter introduces more sophisticated data analysis methods and tools, including database managers such as Microsoft Access and MySQL and standalone statistical programs such as SPSS, SAS, and R.
One of the most influential data visualization books, updated with new techniques, technologies, and examples. Visualize This demonstrates how to explain data visually, so that you can present and communicate information in a way that is appealing and easy to understand. Today, there is a continuous flow of data available to answer almost any question. Thoughtful charts, maps, and analysis can help us make sense of this data. But the data does not speak for itself. As leading data expert Nathan Yau explains in this book, graphics provide little value unless they are built upon a firm understanding of the data behind them. Visualize This teaches you a data-first approach from a practical point of view. You'll start by exploring what your data has to say, and then you'll design visualizations that are both remarkable and meaningful. With this book, you'll discover what tools are available to you without becoming overwhelmed with options. You'll be exposed to a variety of software and code and jump right into real-world datasets so that you can learn visualization by doing. You'll learn to ask and answer questions with data, so that you can make charts that are both beautiful and useful. Visualize This also provides you with opportunities to apply what you learn to your own data.
This completely updated, full-color second edition:
- Presents a unique approach to visualizing and telling stories with data, from data visualization expert Nathan Yau
- Offers step-by-step tutorials and practical design tips for creating statistical graphics, geographical maps, and information design
- Details tools that can be used to visualize data graphics for reports, presentations, and stories, for the web or for print, with major updates for the latest R packages, Python libraries, JavaScript libraries, illustration software, and point-and-click applications
- Contains numerous examples and descriptions of patterns and outliers, and explains how to show them
Information designers, analysts, journalists, statisticians, and data scientists, as well as anyone studying for careers in these fields, will gain a valuable background in the concepts and techniques of data visualization, thanks to this legendary book.
In studying pre-modern religions in Asia, we encounter unique scripts, number systems, calendars, and naming conventions. These can make Western-built technologies, even tools specifically developed for digital humanities, an ill fit for our needs. The present volume explores this struggle and the limitations and potential opportunities of applying a digital humanities approach to pre-modern Asian religions. The authors cover Buddhism, Christianity, Daoism, Islam, Jainism, Judaism, and Shintoism, with chapters categorized according to their focus on: 1) temples, 2) manuscripts, 3) texts, and 4) social media. Thus, the volume guides readers through specific methodologies and practical examples while also providing a critical reflection on the state of the field, pushing the interface between digital humanities and pre-modern Asian religions into new territory.
The SAGE Handbook of Social Media Research Methods offers a step-by-step guide to overcoming the challenges inherent in research projects that deal with ‘big and broad data’, from the formulation of research questions through to the interpretation of findings. The handbook includes chapters on specific social media platforms such as Twitter, Sina Weibo and Instagram, as well as a series of critical chapters. The holistic approach is organised into the following sections:
- Conceptualising & Designing Social Media Research
- Collection & Storage
- Qualitative Approaches to Social Media Data
- Quantitative Approaches to Social Media Data
- Diverse Approaches to Social Media Data
- Analytical Tools
- Social Media Platforms
This handbook is the single most comprehensive resource for any scholar or graduate student embarking on a social media project.
The world around us is full of data that holds unique insights and valuable stories, and this book will help you uncover them. Whether you already work with data or want to learn more about its possibilities, the examples and techniques in this practical book will help you more easily clean, evaluate, and analyze data so that you can generate meaningful insights and compelling visualizations. Complementing foundational concepts with expert advice, author Susan E. McGregor provides the resources you need to extract, evaluate, and analyze a wide variety of data sources and formats, along with the tools to communicate your findings effectively. This book delivers a methodical, jargon-free way for data practitioners at any level, from true novices to seasoned professionals, to harness the power of data.
- Use Python 3.8+ to read, write, and transform data from a variety of sources
- Understand and use programming basics in Python to wrangle data at scale
- Organize, document, and structure your code using best practices
- Collect data from structured data files, web pages, and APIs
- Perform basic statistical analyses to make meaning from datasets
- Visualize and present data in clear and compelling ways
Tell your story and show it with data, using free and easy-to-learn tools on the web. This introductory book teaches you how to design interactive charts and customized maps for your website, beginning with simple drag-and-drop tools such as Google Sheets, Datawrapper, and Tableau Public. You'll also gradually learn how to edit open source code templates like Chart.js, Highcharts, and Leaflet on GitHub. Hands-On Data Visualization takes you step-by-step through tutorials, real-world examples, and online resources. This practical guide is ideal for students, nonprofit organizations, small business owners, local governments, journalists, academics, and anyone who wants to take data out of spreadsheets and turn it into lively interactive stories. No coding experience is required.
- Build interactive charts and maps and embed them in your website
- Understand the principles for designing effective charts and maps
- Learn key data visualization concepts to help you choose the right tools
- Convert and transform tabular and spatial data to tell your data story
- Edit and host Chart.js, Highcharts, and Leaflet map code templates on GitHub
- Learn how to detect bias in charts and maps produced by others