Combinatorial data analysis (CDA) refers to a wide class of methods for the study of relevant data sets in which the arrangement of a collection of objects is absolutely central. This monograph focuses on the identification of such arrangements, restricted further to cases in which the combinatorial search is carried out by a recursive optimization process based on the general principles of dynamic programming (DP).
This book provides clear explanatory text, illustrative mathematics and algorithms, demonstrations of the iterative process, pseudocode, and well-developed examples for applications of the branch-and-bound paradigm to important problems in combinatorial data analysis. Supplementary material, such as computer programs, is provided on the World Wide Web. Dr. Brusco is an editorial board member for the Journal of Classification, and a member of the Board of Directors for the Classification Society of North America.
Analytic combinatorics aims to enable precise quantitative predictions of the properties of large combinatorial structures. The theory has emerged over recent decades as essential both for the analysis of algorithms and for the study of scientific models in many disciplines, including probability theory, statistical physics, computational biology, and information theory. With a careful combination of symbolic enumeration methods and complex analysis, drawing heavily on generating functions, results of sweeping generality emerge that can be applied in particular to fundamental structures such as permutations, sequences, strings, walks, paths, trees, graphs and maps. This account is the definitive treatment of the topic. The authors give full coverage of the underlying mathematics and a thorough treatment of both classical and modern applications of the theory. The text is complemented with exercises, examples, appendices and notes to aid understanding. The book can be used for an advanced undergraduate or a graduate course, or for self-study.
Decision trees and decision rule systems are widely used in different applications as algorithms for problem solving, as predictors, and as a means of knowledge representation. Reducts play a key role in the problem of attribute (feature) selection. The aims of this book are (i) the consideration of the sets of decision trees, rules and reducts; (ii) the study of relationships among these objects; (iii) the design of algorithms for construction of trees, rules and reducts; and (iv) obtaining bounds on their complexity. Applications to supervised machine learning, discrete optimization, analysis of acyclic programs, fault diagnosis, and pattern recognition are also considered. This is a mixture of research monograph and lecture notes. It contains many unpublished results. However, proofs are carefully selected to be understandable for students. The results considered in this book can be useful for researchers in machine learning, data mining and knowledge discovery, especially for those who are working in rough set theory, test theory and logical analysis of data. The book can be used in the creation of courses for graduate students.
Dynamic programming is an efficient technique for solving optimization problems. It is based on breaking the initial problem down into simpler ones and solving these sub-problems, beginning with the simplest ones. A conventional dynamic programming algorithm returns an optimal object from a given set of objects. This book develops extensions of dynamic programming, enabling us to (i) describe the set of objects under consideration; (ii) perform a multi-stage optimization of objects relative to different criteria; (iii) count the number of optimal objects; (iv) find the set of Pareto optimal points for bi-criteria optimization problems; and (v) study relationships between two criteria. It considers various applications, including optimization of decision trees and decision rule systems as algorithms for problem solving, as ways for knowledge representation, and as classifiers; optimization of element partition trees for rectangular meshes, which are used in finite element methods for solving PDEs; and multi-stage optimization for such classic combinatorial optimization problems as matrix chain multiplication, binary search trees, global sequence alignment, and shortest paths. The results presented are useful for researchers in combinatorial optimization, data mining, knowledge discovery, machine learning, and finite element methods, especially those working in rough set theory, test theory, logical analysis of data, and PDE solvers. This book can be used as the basis for graduate courses.
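As a rough illustration of extension (iii), counting optimal objects, the following minimal sketch applies the idea to matrix chain multiplication, one of the classic problems named above. It is not taken from the book: the function name `matrix_chain` and the `dims` convention (matrix i has shape dims[i-1] x dims[i]) are assumptions made for this example. Alongside the usual minimum cost, each sub-problem also records how many parenthesizations attain that cost.

```python
from functools import lru_cache


def matrix_chain(dims):
    """Return (min_cost, num_optimal) for the chain A_1 ... A_n.

    Assumed convention: matrix A_i has shape dims[i-1] x dims[i], and
    multiplying a (p x q) by a (q x r) matrix costs p*q*r scalar products.
    """
    if len(dims) < 2:
        raise ValueError("dims must describe at least one matrix")
    n = len(dims) - 1

    @lru_cache(maxsize=None)
    def solve(i, j):
        # Simplest sub-problem: a single matrix costs nothing, in one way.
        if i == j:
            return 0, 1
        best, count = None, 0
        for k in range(i, j):  # last multiplication splits the chain at k
            left_cost, left_count = solve(i, k)
            right_cost, right_count = solve(k + 1, j)
            cost = left_cost + right_cost + dims[i - 1] * dims[k] * dims[j]
            ways = left_count * right_count
            if best is None or cost < best:
                best, count = cost, ways   # strictly better: reset the count
            elif cost == best:
                count += ways              # tie: accumulate optimal objects
        return best, count

    return solve(1, n)


if __name__ == "__main__":
    print(matrix_chain([10, 30, 5, 60]))
    print(matrix_chain([2, 2, 2, 2]))
```

The only change relative to a conventional dynamic program is the second component of each sub-problem's value: ties are accumulated rather than discarded, so the count of optimal objects comes at no extra asymptotic cost.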
This monograph offers an original, broad and very diverse exploration of the seriation domain in data analysis, together with building a specific relation to clustering. Relative to a data table crossing a set of objects and a set of descriptive attributes, the search for orders which correspond respectively to these two sets is formalized mathematically and statistically. State-of-the-art methods are created and compared with classical methods, and a thorough understanding of the mutual relationships between these methods is clearly expressed. The authors distinguish two families of methods: geometric representation methods, and algorithmic and combinatorial methods. Original and accurate methods are provided in the framework for both families. Their basis and comparison are established on both theoretical and experimental levels. The experimental analysis is very varied and very comprehensive. Seriation in Combinatorial and Statistical Data Analysis has a unique character in the literature falling within the fields of Data Analysis, Data Mining and Knowledge Discovery. It will be a valuable resource for students and researchers in the latter fields.
"This book covers research topics of data mining on bioinformatics presenting the basics and problems of bioinformatics and applications of data mining technologies pertaining to the field"--Provided by publisher.
Geometric Data Analysis (GDA) is the name suggested by P. Suppes (Stanford University) to designate the approach to Multivariate Statistics initiated by Benzécri as Correspondence Analysis, an approach that has become more and more used and appreciated over the years. This book presents the full formalization of GDA in terms of linear algebra - the most original and far-reaching consequential feature of the approach - and shows also how to integrate the standard statistical tools such as Analysis of Variance, including Bayesian methods. Chapter 9, Research Case Studies, is nearly a book in itself; it presents the methodology in action on three extensive applications, one from medicine, one from political science, and one from education (data borrowed from the Stanford computer-based Educational Program for Gifted Youth). The book is thus addressed both to mathematicians interested in the applications of mathematics and to researchers willing to master an exceptionally powerful approach to statistical data analysis.
The combinatorial theory of species, introduced by Joyal in 1980, provides a unified understanding of the use of generating functions for both labelled and unlabelled structures and as a tool for the specification and analysis of these structures. Of particular importance is their capacity to transform recursive definitions of tree-like structures into functional or differential equations, and vice versa. The goal of this book is to present the basic elements of the theory and to give a unified account of its developments and applications. It offers a modern introduction to the use of various generating functions, with applications to graphical enumeration, Polya theory and analysis of data structures in computer science, and to other areas such as special functions, functional equations, asymptotic analysis and differential equations. This book will be a valuable reference to graduate students and researchers in combinatorics, analysis, and theoretical computer science.
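As a brief illustration of how a recursive definition becomes a functional equation (a standard textbook example, not quoted from the book): a rooted labelled tree is a root together with a set of rooted subtrees. The symbols $\mathcal{A}$ (rooted trees), $X$ (a single vertex) and $E$ (sets) are notation assumed for this sketch; the species equation translates directly into an equation for the exponential generating function $A(x)$, whose coefficients are given by Cayley's formula.

```latex
\mathcal{A} = X \cdot E(\mathcal{A})
\quad\Longrightarrow\quad
A(x) = x\, e^{A(x)},
\qquad
A(x) = \sum_{n \ge 1} n^{\,n-1}\, \frac{x^{n}}{n!}.
```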
This volume was born from the experience of the authors as researchers and educators, which suggests that many students of data mining are handicapped in their research by the lack of a formal, systematic education in its mathematics. The data mining literature contains many excellent titles that address the needs of users with a variety of interests ranging from decision making to pattern investigation in biological data. However, these books do not deal with the mathematical tools that are currently needed by data mining researchers and doctoral students. We felt it timely to produce a book that integrates the mathematics of data mining with its applications. We emphasize that this book is about mathematical tools for data mining and not about data mining itself; despite this, a substantial amount of applications of mathematical concepts in data mining are presented. The book is intended as a reference for the working data miner. In our opinion, three areas of mathematics are vital for data mining: set theory, including partially ordered sets and combinatorics; linear algebra, with its many applications in principal component analysis and neural networks; and probability theory, which plays a foundational role in statistics, machine learning and data mining. This volume is dedicated to the study of set-theoretical foundations of data mining. Two further volumes are contemplated that will cover linear algebra and probability theory. The first part of this book, dedicated to set theory, begins with a study of functions and relations. Applications of these fundamental concepts to such issues as equivalences and partitions are discussed. Also, we prepare the ground for the following volumes by discussing indicator functions, fields and σ-fields, and other concepts.