High-dimensional Microarray Data Analysis

Author: Shuichi Shinmura

Publisher: Springer

Published: 2019-05-14

Total Pages: 437

ISBN-10: 9811359989

This book shows how to decompose high-dimensional microarrays into small subspaces (Small Matryoshkas, SMs), analyze them statistically, and perform cancer gene diagnosis. It is useful as a practical textbook for genetics experts, anyone who analyzes genetic data, and students. Discriminant analysis is the best approach for microarrays consisting of normal and cancer classes. Microarrays are linearly separable data (LSD, Fact 3). However, because most linear discriminant functions (LDFs) cannot theoretically discriminate LSD and their error rates are high, no one had discovered Fact 3 until now. Hard-margin SVM (H-SVM) and Revised IP-OLDF (RIP) can find Fact 3 easily. LSD has the Matryoshka structure and is easily decomposed into many SMs (Fact 4). Because all SMs are small samples and LSD, statistical methods can analyze them easily, yet they yield no useful results. H-SVM and RIP, on the other hand, can discriminate the two classes in each SM completely. RatioSV is the ratio of the SV distance to the discriminant range. The maximum RatioSVs of the six microarrays are over 11.67%. This shows that the SVs separate the two classes by a window width of 11.67%. Such an easy discrimination problem has remained unresolved since 1970. The facts presented here reveal the reason, so this book can be read and enjoyed like a mystery novel. Many studies point out that it is difficult to separate signal and noise in a high-dimensional gene space. However, the definition of the signal is not clear. Convincing evidence is presented that LSD is a signal. Statistical analysis of the genes contained in an SM cannot provide useful information, but it shows that the discriminant scores (DSs) obtained by RIP or H-SVM are themselves LSD. For example, the Alon microarray has 2,000 genes, which can be divided into 66 SMs. If the 66 DSs are used as variables, the result is a 66-dimensional dataset. These signal data can be analyzed by principal component analysis and cluster analysis to find malignancy indicators.
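The RatioSV idea in the description above can be illustrated with a minimal sketch. It is not the book's implementation: the function name `ratio_sv`, the toy scores, and the reading of "SV distance" as the gap between the lowest score of one class and the highest score of the other are all assumptions made here for illustration.

```python
def ratio_sv(scores, labels):
    """Return the separating gap as a percentage of the discriminant-score range.

    Assumption (not taken from the book): the "SV distance" is the gap between
    the smallest score of class +1 and the largest score of class -1, and the
    "discriminant range" is max(score) - min(score).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    gap = min(pos) - max(neg)
    if gap <= 0:
        # The scores overlap, so they are not linearly separable.
        raise ValueError("scores are not linearly separable")
    score_range = max(scores) - min(scores)
    return 100.0 * gap / score_range


# Hypothetical discriminant scores with support vectors at -1 and +1.
toy_scores = [-6.0, -3.5, -1.0, 1.0, 2.5, 4.0]
toy_labels = [-1, -1, -1, 1, 1, 1]
print(ratio_sv(toy_scores, toy_labels))  # 20.0
```

Under these assumptions, a larger value means the two classes are separated by a wider window relative to the spread of the scores.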


Discriminant Analysis and Applications

Author: T. Cacoullos

Publisher: Academic Press

Published: 2014-05-10

Total Pages: 455

ISBN-10: 1483268713

Discriminant Analysis and Applications comprises the proceedings of the NATO Advanced Study Institute on Discriminant Analysis and Applications, held in Kifissia, Athens, Greece, in June 1972. The book presents the theory and applications of discriminant analysis, one of the most important areas of multivariate statistical analysis. This volume contains chapters that cover the historical development of discriminant analysis methods; logistic and quasi-linear discrimination; and distance functions. Medical and biological applications, computer graphical analysis, and graphical techniques for multidimensional data are likewise discussed. Statisticians, mathematicians, and biomathematicians will find the book very interesting.


Error Estimation for Pattern Recognition

Author: Ulisses M. Braga Neto

Publisher: John Wiley & Sons

Published: 2015-07-07

Total Pages: 336

ISBN-10: 1118999738

This book is the first of its kind to discuss error estimation with a model-based approach. From the basics of classifiers and error estimators to distributional and Bayesian theory, it covers important topics and essential issues pertaining to the scientific validity of pattern classification. Error Estimation for Pattern Recognition focuses on error estimation, which is a broad and poorly understood topic that reaches all research areas using pattern classification. It includes model-based approaches and discussions of newer error estimators such as bolstered and Bayesian estimators. This book was motivated by the application of pattern recognition to high-throughput data with limited replicates, which is a basic problem now appearing in many areas. The first two chapters cover basic issues in classification error estimation, such as definitions, test-set error estimation, and training-set error estimation. The remaining chapters in this book cover results on the performance and representation of training-set error estimators for various pattern classifiers. Additional features of the book include:

• The latest results on the accuracy of error estimation
• Performance analysis of re-substitution, cross-validation, and bootstrap error estimators using analytical and simulation approaches
• Highly interactive computer-based exercises and end-of-chapter problems

This is the first book exclusively about error estimation for pattern recognition.

Ulisses M. Braga Neto is an Associate Professor in the Department of Electrical and Computer Engineering at Texas A&M University, USA. He received his PhD in Electrical and Computer Engineering from The Johns Hopkins University. Dr. Braga Neto received an NSF CAREER Award for his work on error estimation for pattern recognition with applications in genomic signal processing. He is an IEEE Senior Member.

Edward R. Dougherty is a Distinguished Professor, Robert F. Kennedy ’26 Chair, and Scientific Director at the Center for Bioinformatics and Genomic Systems Engineering at Texas A&M University, USA. He is a fellow of both the IEEE and SPIE, and he has received the SPIE President's Award. Dr. Dougherty has authored several books including Epistemology of the Cell: A Systems Perspective on Biological Knowledge and Random Processes for Image and Signal Processing (Wiley-IEEE Press).
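As a rough illustration of two of the training-set error estimators named in the description above, the sketch below contrasts resubstitution with leave-one-out cross-validation. It is not taken from the book: the one-dimensional nearest-mean classifier, the function names, and the toy data are assumptions chosen only to keep the example small.

```python
def nearest_mean_predict(train_x, train_y, x):
    """Assign x to the class (0 or 1) whose training mean is closest; ties go to class 0."""
    means = {}
    for label in (0, 1):
        values = [v for v, y in zip(train_x, train_y) if y == label]
        means[label] = sum(values) / len(values)
    return 0 if abs(x - means[0]) <= abs(x - means[1]) else 1


def resubstitution_error(xs, ys):
    """Error rate when the classifier is tested on the same data it was trained on."""
    errors = sum(nearest_mean_predict(xs, ys, x) != y for x, y in zip(xs, ys))
    return errors / len(xs)


def leave_one_out_error(xs, ys):
    """Error rate when each point is classified by a model trained on all other points."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        errors += nearest_mean_predict(train_x, train_y, xs[i]) != ys[i]
    return errors / len(xs)


# Hypothetical small sample: two points sit close to the class boundary.
xs = [1.0, 2.0, 4.8, 5.4, 8.0, 9.0]
ys = [0, 0, 0, 1, 1, 1]
print(resubstitution_error(xs, ys))
print(leave_one_out_error(xs, ys))
```

On this toy sample the resubstitution estimate is zero, while leave-one-out misclassifies the two boundary points, which shows in miniature why resubstitution tends to be optimistic on small samples.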


High-Dimensional Data Analysis in Cancer Research

Author: Xiaochun Li

Publisher: Springer Science & Business Media

Published: 2008-12-19

Total Pages: 164

ISBN-10: 0387697659

Multivariate analysis is a mainstay of statistical tools in the analysis of biomedical data. It is concerned with associating data matrices of n rows by p columns, with rows representing samples (or patients) and columns representing attributes of samples, with some response variables, e.g., patient outcome. Classically, the sample size n is much larger than p, the number of variables. The properties of statistical models have mostly been discussed under the assumption of fixed p and infinite n. The advance of biological sciences and technologies has revolutionized the process of investigating cancer. Biomedical data collection has become more automatic and more extensive. We are in the era of p as a large fraction of n, and even much larger than n. Take proteomics as an example. Although proteomic techniques have been researched and developed for many decades to identify proteins or peptides uniquely associated with a given disease state, until recently this has been mostly a laborious process, carried out one protein at a time. The advent of high-throughput proteome-wide technologies such as liquid chromatography-tandem mass spectroscopy makes it possible to generate proteomic signatures that facilitate rapid development of new strategies for proteomics-based detection of disease. This poses new challenges and calls for scalable solutions to the analysis of such high-dimensional data. In this volume, we present systematic and analytical approaches and strategies from both biostatistics and bioinformatics for the analysis of correlated and high-dimensional data.


New Theory of Discriminant Analysis After R. Fisher

Author: Shuichi Shinmura

Publisher: Springer

Published: 2016-12-27

Total Pages: 221

ISBN-10: 9811021643

This is the first book to compare eight LDFs on different types of datasets: Fisher’s iris data, medical data with collinearities, Swiss banknote data that are linearly separable data (LSD), student pass/fail determination using student attributes, 18 pass/fail determinations using exam scores, Japanese automobile data, and six microarray datasets (the datasets) that are LSD. We developed the 100-fold cross-validation for small samples method (Method 1) instead of the LOO method. We proposed a simple model selection procedure to choose the best model having minimum M2, and Revised IP-OLDF based on the MNM criterion was found to be better than the other M2s in the above datasets. We compared two statistical LDFs and six MP-based LDFs: Fisher’s LDF, logistic regression, three SVMs, Revised IP-OLDF, and another two OLDFs. Only a hard-margin SVM (H-SVM) and Revised IP-OLDF could discriminate LSD theoretically (Problem 2). We solved the defect of the generalized inverse matrices (Problem 3). For more than 10 years, many researchers have struggled to analyze the microarray dataset that is LSD (Problem 5). If we call the linearly separable model "Matroska," the dataset consists of numerous smaller Matroskas. We developed the Matroska feature selection method (Method 2). It reveals the surprising structure of the dataset, which is the disjoint union of several small Matroskas. Our theory and methods reveal new facts of gene analysis.


COMPSTAT 2006 - Proceedings in Computational Statistics

Author: Alfredo Rizzi

Publisher: Springer Science & Business Media

Published: 2007-12-03

Total Pages: 530

ISBN-10: 3790817090

The International Association for Statistical Computing (IASC) is a Section of the International Statistical Institute. The objectives of the Association are to foster world-wide interest in effective statistical computing and to exchange technical knowledge through international contacts and meetings between statisticians, computing professionals, organizations, institutions, governments and the general public. The IASC organises its own Conferences, IASC World Conferences, and COMPSTAT in Europe. The 17th Conference of ERS-IASC, the biennial meeting of the European Regional Section of the IASC, was held in Rome, August 28 - September 1, 2006. This conference took place in Rome exactly 20 years after the 7th COMPSTAT symposium, which was held in Rome in 1986. Previous COMPSTAT conferences were held in: Vienna (Austria, 1974); West-Berlin (Germany, 1976); Leiden (The Netherlands, 1978); Edinburgh (UK, 1980); Toulouse (France, 1982); Prague (Czechoslovakia, 1984); Rome (Italy, 1986); Copenhagen (Denmark, 1988); Dubrovnik (Yugoslavia, 1990); Neuchâtel (Switzerland, 1992); Vienna (Austria, 1994); Barcelona (Spain, 1996); Bristol (UK, 1998); Utrecht (The Netherlands, 2000); Berlin (Germany, 2002); Prague (Czech Republic, 2004).


Journeys to Data Mining

Author: Mohamed Medhat Gaber

Publisher: Springer Science & Business Media

Published: 2012-07-20

Total Pages: 241

ISBN-10: 3642280471

Data mining, an interdisciplinary field combining methods from artificial intelligence, machine learning, statistics and database systems, has grown tremendously over the last 20 years and produced core results for applications like business intelligence, spatio-temporal data analysis, bioinformatics, and stream data processing. The fifteen contributors to this volume are successful and well-known data mining scientists and professionals. Although by no means an exhaustive list, all of them have helped the field to gain the reputation and importance it enjoys today, through the many valuable contributions they have made. Mohamed Medhat Gaber has asked them (and many others) to write down their journeys through the data mining field, trying to answer the following questions:

1. What are your motives for conducting research in the data mining field?
2. Describe the milestones of your research in this field.
3. What are your notable success stories?
4. How did you learn from your failures?
5. Have you encountered unexpected results?
6. What are the current research issues and challenges in your area?
7. Describe your research tools and techniques.
8. How would you advise a young researcher to make an impact?
9. What do you predict for the next two years in your area?
10. What are your expectations in the long term?

In order to maintain the informal character of their contributions, they were given complete freedom as to how to organize their answers. This narrative presentation style provides PhD students and novices who are eager to find their way to successful research in data mining with valuable insights into career planning. In addition, everyone else interested in the history of computer science may be surprised about the stunning successes and possible failures computer science careers (still) have to offer.


Discriminant Analysis and Statistical Pattern Recognition

Author: Geoffrey J. McLachlan

Publisher: John Wiley & Sons

Published: 2005-02-25

Total Pages: 553

ISBN-10: 0471725285

The Wiley-Interscience Paperback Series consists of selected books that have been made more accessible to consumers in an effort to increase global appeal and general circulation. With these new unabridged softcover volumes, Wiley hopes to extend the lives of these works by making them available to future generations of statisticians, mathematicians, and scientists. "For both applied and theoretical statisticians as well as investigators working in the many areas in which relevant use can be made of discriminant techniques, this monograph provides a modern, comprehensive, and systematic account of discriminant analysis, with the focus on the more recent advances in the field." –SciTech Book News ". . . a very useful source of information for any researcher working in discriminant analysis and pattern recognition." –Computational Statistics Discriminant Analysis and Statistical Pattern Recognition provides a systematic account of the subject. While the focus is on practical considerations, both theoretical and practical issues are explored. Among the advances covered are regularized discriminant analysis and bootstrap-based assessment of the performance of a sample-based discriminant rule, and extensions of discriminant analysis motivated by problems in statistical image analysis. The accompanying bibliography contains over 1,200 references.