Understanding sequence data, and the ability to utilize this hidden knowledge, will create a significant impact on many aspects of our society. Examples of sequence data include DNA, protein, customer purchase history, web surfing history, and more. This book provides thorough coverage of the existing results on sequence data mining as well as pattern types and associated pattern mining methods. It offers balanced coverage on data mining and sequence data analysis, allowing readers to access the state-of-the-art results in one place.
"This book provides a comprehensive view of sequence mining techniques, and present current research and case studies in Pattern Discovery in Sequential data authored by researchers and practitioners"--
In many applications, e.g., bioinformatics, web access traces, system u- lization logs, etc., the data is naturally in the form of sequences. It has been of great interests to analyze the sequential data to find their inherent char- teristics. The sequential pattern is one of the most widely studied models to capture such characteristics. Examples of sequential patterns include but are not limited to protein sequence motifs and web page navigation traces. In this book, we focus on sequential pattern mining. To meet different needs of various applications, several models of sequential patterns have been proposed. We do not only study the mathematical definitions and application domains of these models, but also the algorithms on how to effectively and efficiently find these patterns. The objective of this book is to provide computer scientists and domain - perts such as life scientists with a set of tools in analyzing and understanding the nature of various sequences by : (1) identifying the specific model(s) of - quential patterns that are most suitable, and (2) providing an efficient algorithm for mining these patterns. Chapter 1 INTRODUCTION Data Mining is the process of extracting implicit knowledge and discovery of interesting characteristics and patterns that are not explicitly represented in the databases. The techniques can play an important role in understanding data and in capturing intrinsic relationships among data instances. Data mining has been an active research area in the past decade and has been proved to be very useful.
Written especially for computer scientists, all necessary biology is explained. Presents new techniques on gene expression data mining, gene mapping for disease detection, and phylogenetic knowledge discovery.
Probabilistic models are becoming increasingly important in analysing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analysing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally to probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it aims to be accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time present the state-of-the-art in this new and highly important field.
Data Mining: Concepts and Techniques provides the concepts and techniques in processing gathered data or information, which will be used in various applications. Specifically, it explains data mining and the tools used in discovering knowledge from the collected data. This book is referred as the knowledge discovery from data (KDD). It focuses on the feasibility, usefulness, effectiveness, and scalability of techniques of large data sets. After describing data mining, this edition explains the methods of knowing, preprocessing, processing, and warehousing data. It then presents information about data warehouses, online analytical processing (OLAP), and data cube technology. Then, the methods involved in mining frequent patterns, associations, and correlations for large data sets are described. The book details the methods for data classification and introduces the concepts and methods for data clustering. The remaining chapters discuss the outlier detection and the trends, applications, and research frontiers in data mining. This book is intended for Computer Science students, application developers, business professionals, and researchers who seek information on data mining. - Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects - Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields - Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of your data
Like a data-guzzling turbo engine, advanced data mining has been powering post-genome biological studies for two decades. Reflecting this growth, Biological Data Mining presents comprehensive data mining concepts, theories, and applications in current biological and medical research. Each chapter is written by a distinguished team of interdisciplin
This open access book provides innovative methods and original applications of sequence analysis (SA) and related methods for analysing longitudinal data describing life trajectories such as professional careers, family paths, the succession of health statuses, or the time use. The applications as well as the methodological contributions proposed in this book pay special attention to the combined use of SA and other methods for longitudinal data such as event history analysis, Markov modelling, and sequence network. The methodological contributions in this book include among others original propositions for measuring the precarity of work trajectories, Markov-based methods for clustering sequences, fuzzy and monothetic clustering of sequences, network-based SA, joint use of SA and hidden Markov models, and of SA and survival models. The applications cover the comparison of gendered occupational trajectories in Germany, the study of the changes in women market participation in Denmark, the study of typical day of dual-earner couples in Italy, of mobility patterns in Togo, of internet addiction in Switzerland, and of the quality of employment career after a first unemployment spell. As such this book provides a wealth of information for social scientists interested in quantitative life course analysis, and all those working in sociology, demography, economics, health, psychology, social policy, and statistics.; Provides new perspectives and methods for sequence analysis Focusses on the link between sequence analysis and other methods for longitudinal data, especially event history analysis and Markov models Stresses the complementarity of sequence analysis and other models for longitudinal data Applications of sequence analysis in a whole range of different domains This work was published by Saint Philip Street Press pursuant to a Creative Commons license permitting commercial use. All rights not granted by the work's license are retained by the author or authors.
A Fruitful Field for Researching Data Mining Methodology and for Solving Real-Life ProblemsContrast Data Mining: Concepts, Algorithms, and Applications collects recent results from this specialized area of data mining that have previously been scattered in the literature, making them more accessible to researchers and developers in data mining and
This book constitutes the refereed proceedings of the 5th International Conference on Modeling Decisions for Artificial Intelligence, MDAI 2008, held in Sabadell, Spain, in October 2008. The 19 revised full papers presented together with 2 invited lectures were thoroughly reviewed and selected from 43 submissions; they are devoted to theory and tools for modeling decisions, as well as applications that encompass decision making processes and information fusion techniques. The papers are organized in topical sections on aggregation operators, decision making, clustering and similarity, computational intelligence and optimization, as well as data privacy.