In the age of Big Data, efficient algorithms are in high demand. It is also essential that these algorithms be scalable. This book surveys a family of algorithmic techniques for the design of scalable algorithms, including local network exploration, advanced sampling, sparsification, and geometric partitioning.
If you are ready to dive into the MapReduce framework for processing large datasets, this practical book takes you step by step through the algorithms and tools you need to build distributed MapReduce applications with Apache Hadoop or Apache Spark. Each chapter provides a recipe for solving a massive computational problem, such as building a recommendation system. You’ll learn how to implement the appropriate MapReduce solution with code that you can use in your projects. Dr. Mahmoud Parsian covers basic design patterns, optimization techniques, and data mining and machine learning solutions for problems in bioinformatics, genomics, statistics, and social network analysis. This book also includes an overview of MapReduce, Hadoop, and Spark. Topics include: market basket analysis for a large set of transactions; data mining algorithms (K-means, KNN, and Naive Bayes); using huge genomic data to sequence DNA and RNA; Naive Bayes theorem and Markov chains for data and market prediction; recommendation algorithms and pairwise document similarity; linear regression, Cox regression, and Pearson correlation; allelic frequency and mining DNA; and social network analysis (recommendation systems, counting triangles, sentiment analysis).
This book constitutes the refereed proceedings of the 23rd International Conference on Computing and Combinatorics, COCOON 2017, held in Hong Kong, China, in August 2017. The 56 full papers presented in this book were carefully reviewed and selected from 119 submissions. The papers cover various topics, including algorithms and data structures, complexity theory and computability, algorithmic game theory, computational learning theory, cryptography, computational biology, computational geometry and number theory, graph theory, and parallel and distributed computing.
This open access book surveys the progress in addressing selected challenges related to the growth of big data in combination with increasingly complicated hardware. It emerged from a research program established by the German Research Foundation (DFG) as priority program SPP 1736 on Algorithmics for Big Data, in which researchers from theoretical computer science worked together with application experts to tackle problems in domains such as networking, genomics research, and information retrieval. Such domains are unthinkable without substantial hardware and software support, and these systems acquire, process, exchange, and store data at an exponential rate. The chapters of this volume summarize the results of projects realized within the program and survey related work.
Drawing examples from real-world networks, this essential book traces the methods behind network analysis and explains how network data is first gathered, then processed and interpreted. The text will equip you with a toolbox of diverse methods and data modelling approaches, allowing you to quickly start making your own calculations on a huge variety of networked systems. This book sets you up to succeed, addressing the questions of what you need to know and what to do with it when beginning to work with network data. The hands-on approach adopted throughout means that beginners quickly become capable practitioners, guided by a wealth of interesting examples that demonstrate key concepts. Exercises using real-world data extend and deepen your understanding, and develop effective working patterns in network calculations and analysis. Suitable for both graduate students and researchers across a range of disciplines, this novel text provides a fast-track to network data expertise.
"This book presents, discusses, shares ideas, results and experiences on the recent important advances and future challenges on enabling technologies for achieving higher performance"--Provided by publisher.
This book highlights the different types of data architecture and illustrates the many possibilities hidden behind the term "Big Data", from the usage of NoSQL databases to the deployment of stream analytics architecture, machine learning, and governance. Scalable Big Data Architecture covers real-world, concrete industry use cases that leverage complex distributed applications, which involve web applications, RESTful APIs, and a high throughput of large amounts of data stored in highly scalable NoSQL data stores such as Couchbase and Elasticsearch. This book demonstrates how data processing can be done at scale, from the usage of NoSQL datastores to the combination of Big Data distributions. When the data processing is too complex and involves different processing topologies such as long-running jobs, stream processing, multiple data source correlation, and machine learning, it is often necessary to delegate the load to Hadoop or Spark and use NoSQL stores to serve processed data in real time. This book shows you how to choose a relevant combination of big data technologies available within the Hadoop ecosystem. It focuses on processing long jobs, architecture, stream data patterns, log analysis, and real-time analytics. Every pattern is illustrated with practical examples that use different open source projects such as Logstash, Spark, Kafka, and so on. Traditional data infrastructures are built for digesting and rendering data synthesis and analytics from large amounts of data. This book helps you understand why you should consider using machine learning algorithms early on in the project, before being overwhelmed by the constraints imposed by dealing with the high throughput of Big Data. Scalable Big Data Architecture is for developers, data architects, and data scientists looking for a better understanding of how to choose the most relevant pattern for a Big Data project and which tools to integrate into that pattern.
This book highlights cutting-edge research in the field of network science, offering scientists, researchers, students and practitioners a unique update on the latest advances in theory, together with a wealth of applications. It presents the peer-reviewed proceedings of the VII International Conference on Complex Networks and their Applications (COMPLEX NETWORKS 2018), which was held in Cambridge on December 11–13, 2018. The carefully selected papers cover a wide range of theoretical topics such as network models and measures; community structure and network dynamics; diffusion, epidemics and spreading processes; and resilience and control; as well as all the main network applications, including social and political networks; networks in finance and economics; biological and neuroscience networks; and technological networks.