This book is the first of its kind to provide a large collection of bioinformatics problems with accompanying solutions. Notably, the problem set includes all of the problems offered in Biological Sequence Analysis, by Durbin et al. (Cambridge, 1998), widely adopted as a required text for bioinformatics courses at leading universities worldwide. Although many of the problems included in Biological Sequence Analysis as exercises for its readers have been repeatedly used for homework and tests, no detailed solutions for the problems were available. Bioinformatics instructors had therefore frequently expressed a need for fully worked solutions and a larger set of problems for use on courses. This book provides just that: following the same structure as Biological Sequence Analysis and significantly extending the set of workable problems, it will facilitate a better understanding of the contents of the chapters in BSA and will help its readers develop problem-solving skills that are vitally important for conducting successful research in the growing field of bioinformatics. All of the material has been class-tested by the authors at Georgia Tech, where the first ever MSc degree program in Bioinformatics was held.
Probabilistic models are becoming increasingly important in analysing the huge amount of data being produced by large-scale DNA-sequencing efforts such as the Human Genome Project. For example, hidden Markov models are used for analysing biological sequences, linguistic-grammar-based probabilistic models for identifying RNA secondary structure, and probabilistic evolutionary models for inferring phylogenies of sequences from different organisms. This book gives a unified, up-to-date and self-contained account, with a Bayesian slant, of such methods, and more generally of probabilistic methods of sequence analysis. Written by an interdisciplinary team of authors, it aims to be accessible to molecular biologists, computer scientists, and mathematicians with no formal knowledge of the other fields, and at the same time to present the state of the art in this new and highly important field.
Biology is in the midst of an era yielding many significant discoveries and promising many more. Unique to this era is the exponential growth in the size of information-packed databases. Inspired by a pressing need to analyze that data, Introduction to Computational Biology explores a new area of expertise that emerged from this fertile field: the combination of biological and information sciences. This introduction describes the mathematical structure of biological data, especially from sequences and chromosomes. After a brief survey of molecular biology, it studies restriction maps of DNA, rough landmark maps of the underlying sequences, and clones and clone maps. It examines problems associated with reading DNA sequences and comparing sequences to find common patterns. The author then considers the statistics of pattern counts in sequences, RNA secondary structure, and the inference of evolutionary history of related sequences. Introduction to Computational Biology exposes the reader to the fascinating structure of biological data and explains how to treat related combinatorial and statistical problems. Written to describe mathematical formulation and development, this book helps set the stage for even more truly interdisciplinary work in biology.
Bioinformatics, a field devoted to the interpretation and analysis of biological data using computational techniques, has evolved tremendously in recent years due to the explosive growth of biological information generated by the scientific community. Soft computing is a consortium of methodologies that work synergistically and provide, in one form or another, flexible information processing capabilities for handling real-life ambiguous situations. Several research articles dealing with the application of soft computing tools to bioinformatics have been published in the recent past; however, they are scattered in different journals, conference proceedings and technical reports, thus causing inconvenience to readers, students and researchers. This book, unique in its nature, is aimed at providing a treatise in a unified framework, with both theoretical and experimental results, describing the basic principles of soft computing and demonstrating the various ways in which they can be used for analyzing biological data in an efficient manner. Interesting research articles from eminent scientists around the world are brought together in a systematic way such that the reader will be able to understand the issues and challenges in this domain, the existing ways of tackling them, recent trends, and future directions. This book is the first of its kind to bring together two important research areas, soft computing and bioinformatics, in order to demonstrate how the tools and techniques in the former can be used for efficiently solving several problems in the latter.
Contents: Overview: Bioinformatics: Mining the Massive Data from High Throughput Genomics Experiments (H Tang & S Kim); An Introduction to Soft Computing (A Konar & S Das); Biological Sequence and Structure Analysis: Reconstructing Phylogenies with Memetic Algorithms and Branch-and-Bound (J E Gallardo et al.); Classification of RNA Sequences with Support Vector Machines (J T L Wang & X Wu); Beyond String Algorithms: Protein Sequence Analysis Using Wavelet Transforms (A Krishnan & K-B Li); Filtering Protein Surface Motifs Using Negative Instances of Active Sites Candidates (N L Shrestha & T Ohkawa); Distill: A Machine Learning Approach to Ab Initio Protein Structure Prediction (G Pollastri et al.); In Silico Design of Ligands Using Properties of Target Active Sites (S Bandyopadhyay et al.); Gene Expression and Microarray Data Analysis: Inferring Regulations in a Genomic Network from Gene Expression Profiles (N Noman & H Iba); A Reliable Classification of Gene Clusters for Cancer Samples Using a Hybrid Multi-Objective Evolutionary Procedure (K Deb et al.); Feature Selection for Cancer Classification Using Ant Colony Optimization and Support Vector Machines (A Gupta et al.); Sophisticated Methods for Cancer Classification Using Microarray Data (S-B Cho & H-S Park); Multiobjective Evolutionary Approach to Fuzzy Clustering of Microarray Data (A Mukhopadhyay et al.). Readership: Graduate students and researchers in computer science, bioinformatics, computational and molecular biology, artificial intelligence, data mining, machine learning, electrical engineering, system science; researchers in pharmaceutical industries.
Sequence - Evolution - Function is an introduction to the computational approaches that play a critical role in the emerging new branch of biology known as functional genomics. The book provides the reader with an understanding of the principles and approaches of functional genomics and of the potential and limitations of computational and experimental approaches to genome analysis. Sequence - Evolution - Function should help bridge the "digital divide" between biologists and computer scientists, allowing biologists to better grasp the peculiarities of the emerging field of Genome Biology and to learn how to benefit from the enormous amount of sequence data available in the public databases. The book is non-technical with respect to the computer methods for genome analysis and discusses these methods from the user's viewpoint, without addressing mathematical and algorithmic details. Prior practical familiarity with the basic methods for sequence analysis is a major advantage, but a reader without such experience will be able to use the book as an introduction to these methods. This book is perfect for introductory level courses in computational methods for comparative and functional genomics.
This introductory text offers a clear exposition of the algorithmic principles driving advances in bioinformatics. Accessible to students in both biology and computer science, it strikes a unique balance between rigorous mathematics and practical techniques, emphasizing the ideas underlying algorithms rather than offering a collection of apparently unrelated problems. The book introduces biological and algorithmic ideas together, linking issues in computer science to biology and thus capturing the interest of students in both subjects. It demonstrates that relatively few design techniques can be used to solve a large number of practical problems in biology, and presents this material intuitively. An Introduction to Bioinformatics Algorithms is one of the first books on bioinformatics that can be used by students at an undergraduate level. It includes a dual table of contents, organized by algorithmic idea and biological idea; discussions of biologically relevant problems, including a detailed problem formulation and one or more solutions for each; and brief biographical sketches of leading figures in the field. These interesting vignettes offer students a glimpse of the inspirations and motivations for real work in bioinformatics, making the concepts presented in the text more concrete and the techniques more approachable. PowerPoint presentations, practical bioinformatics problems, sample code, diagrams, demonstrations, and other materials can be found at the author's website.
Advances in computers and biotechnology have had a profound impact on biomedical research, and as a result complex data sets can now be generated to address extremely complex biological questions. Correspondingly, advances in the statistical methods necessary to analyze such data are following closely behind the advances in data generation methods. The statistical methods required by bioinformatics present many new and difficult problems for the research community. This book provides an introduction to some of these new methods. The main biological topics treated include sequence analysis, BLAST, microarray analysis, gene finding, and the analysis of evolutionary processes. The main statistical techniques covered include hypothesis testing and estimation, Poisson processes, Markov models and Hidden Markov models, and multiple testing methods. The second edition features new chapters on microarray analysis and on statistical inference, including a discussion of ANOVA, and discussions of the statistical theory of motifs and methods based on the hypergeometric distribution. Much material has been clarified and reorganized. The book is written so as to appeal to biologists and computer scientists who wish to know more about the statistical methods of the field, as well as to trained statisticians who wish to become involved with bioinformatics. The earlier chapters introduce the concepts of probability and statistics at an elementary level, but with an emphasis on material relevant to later chapters and often not covered in standard introductory texts. Later chapters should be immediately accessible to the trained statistician. Sufficient mathematical background consists of introductory courses in calculus and linear algebra. The basic biological concepts that are used are explained, or can be understood from the context, and standard mathematical concepts are summarized in an Appendix. 
Problems are provided at the end of each chapter, allowing the reader to develop aspects of the theory outlined in the main text. Warren J. Ewens holds the Christopher H. Brown Distinguished Professorship at the University of Pennsylvania. He is the author of two books, Population Genetics and Mathematical Population Genetics. He is a senior editor of Annals of Human Genetics and has served on the editorial boards of Theoretical Population Biology, GENETICS, Proceedings of the Royal Society B and SIAM Journal in Mathematical Biology. He is a fellow of the Royal Society and the Australian Academy of Science. Gregory R. Grant is a senior bioinformatics researcher in the University of Pennsylvania Computational Biology and Informatics Laboratory. He obtained his Ph.D. in number theory from the University of Maryland in 1995 and his Masters in Computer Science from the University of Pennsylvania in 1999. Comments on the first edition: "This book would be an ideal text for a postgraduate course...[and] is equally well suited to individual study.... I would recommend the book highly." (Biometrics) "Ewens and Grant have given us a very welcome introduction to what is behind those pretty [graphical user] interfaces." (Naturwissenschaften) "The authors do an excellent job of presenting the essence of the material without getting bogged down in mathematical details." (Journal of the American Statistical Association) "The authors have restructured classical material to a great extent and the new organization of the different topics is one of the outstanding services of the book." (Metrika)
Calculations for Molecular Biology and Biotechnology: A Guide to Mathematics in the Laboratory, Second Edition, provides an introduction to the myriad of laboratory calculations used in molecular biology and biotechnology. The book begins by discussing the use of scientific notation and metric prefixes, which require the use of exponents and an understanding of significant digits. It explains the mathematics involved in making solutions; the characteristics of cell growth; the multiplicity of infection; and the quantification of nucleic acids. It includes chapters that deal with the mathematics involved in the use of radioisotopes in nucleic acid research; the synthesis of oligonucleotides; the polymerase chain reaction (PCR) method; and the development of recombinant DNA technology. Protein quantification and the assessment of protein activity are also discussed, along with the centrifugation method and applications of PCR in forensics and paternity testing.
- Topics range from basic scientific notations to complex subjects like nucleic acid chemistry and recombinant DNA technology
- Each chapter includes a brief explanation of the concept and covers necessary definitions, theory and rationale for each type of calculation
- Recent applications of the procedures and computations in clinical, academic, industrial and basic research laboratories are cited throughout the text
New to this Edition:
- Updated and increased coverage of real time PCR and the mathematics used to measure gene expression
- More sample problems in every chapter for readers to practice concepts
Supervised sequence labelling is a vital area of machine learning, encompassing tasks such as speech, handwriting and gesture recognition, protein secondary structure prediction and part-of-speech tagging. Recurrent neural networks are powerful sequence learning tools (robust to input noise and distortion, and able to exploit long-range contextual information) that would seem ideally suited to such problems. However, their role in large-scale sequence labelling systems has so far been auxiliary. The goal of this book is a complete framework for classifying and transcribing sequential data with recurrent neural networks only. Three main innovations are introduced in order to realise this goal. Firstly, the connectionist temporal classification output layer allows the framework to be trained with unsegmented target sequences, such as phoneme-level speech transcriptions; this is in contrast to previous connectionist approaches, which were dependent on error-prone prior segmentation. Secondly, multidimensional recurrent neural networks extend the framework in a natural way to data with more than one spatio-temporal dimension, such as images and videos. Thirdly, the use of hierarchical subsampling makes it feasible to apply the framework to very large or high resolution sequences, such as raw audio or video. Experimental validation is provided by state-of-the-art results in speech and handwriting recognition.
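To give a rough sense of why a connectionist temporal classification (CTC) output layer can work with unsegmented targets, the sketch below shows the many-to-one mapping usually described for CTC: the network emits one label (or a blank) per input frame, and the final labelling is obtained by merging runs of repeated labels and then dropping blanks. This is a minimal illustration and is not taken from the book; the function name `ctc_collapse` and the `"-"` blank symbol are assumptions made for the example.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level label path to an output labelling.

    Step 1: merge each run of identical consecutive labels into one label.
    Step 2: remove all blank symbols.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]


# Two different frame-level paths that collapse to the same labelling.
print("".join(ctc_collapse("--hh-e-ll-l-oo")))
print("".join(ctc_collapse("h-ee-l-l-o----")))
```

Because many frame-level paths collapse to the same labelling, training can score a target sequence by summing over all paths that map to it, so no frame-by-frame alignment has to be supplied in advance. The blank is also what lets a genuinely repeated label (the double "l" here) survive the merge step.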