Computational Methods for Transcriptome-based Cellular Phenotyping

Computational Methods for Transcriptome-based Cellular Phenotyping

Author: Matthew Nathan Bernstein

Publisher:

Published: 2019

Total Pages: 160

ISBN-13:

DOWNLOAD EBOOK

Although the basic chemical mechanisms of cellular biology are now well-known, we are still a long way from understanding how phenotypes emerge from these basic mechanisms. Within the last decade, RNA-sequencing (RNA-seq) has become a ubiquitous technology for measuring the transcriptome, which provides a snapshot of gene expression across the entire genome. An improvement in our ability to predict how phenotypes emerge from the complex patterns of gene expression, a task we refer to as transcriptome-based cellular phenotyping (TBCP), would lead to considerable medical and technological advancements. Machine learning promises to be an apt approach for TBCP due to its ability to overcome noise inherent in RNA-seq data and because it does not require a priori knowledge regarding the rules and patterns that lead from gene expression to phenotype. Furthermore, there exist large, public databases of RNA-seq data that promise to be a valuable source of training data for developing machine learning algorithms to perform TBCP. Unfortunately, this opportunity is impeded by a number of challenges inherent in these databases including poorly structured metadata and data heterogeneity. In this thesis, I present three projects that push the state-of-the-art in the ability to leverage the trove of publicly available gene expression data for TBCP. In the first project, we address the problem of poorly structured metadata that exist in public genomics databases. We specifically focus on the Sequence Read Archive (SRA), which is the premiere repository of raw RNA-seq data curated by the National Institutes of Health; however, our work generalizes to other databases. Existing approaches treat metadata normalization as a named entity recognition problem where the goal is to tag metadata with terms from controlled vocabularies when that term is mentioned in the metadata. We reframe this problem as an inference task, in which we tag the metadata with only those terms that describe the underlying biology of the described sample rather than with all mentioned terms. By doing so, we achieve much higher precision than that achieved by existing methods, and maintain a competitive recall. In the second project, we leverage the normalized metadata produced by the first project in order to train predictive models of phenotype from RNA-seq derived gene expression data. We specifically focus on the cell type prediction task: given an RNA-seq sample, we wish to predict the cell type from which the sample was derived. Cell type prediction is an important step in many transcriptomic analyses, including that of annotating cell types in single-cell RNA-seq datasets. This work represents the first effort towards a cell type prediction task that utilizes the full potential of publicly available RNA-seq data. Finally, in the third project, we build on the second project in order to address the task of cell type prediction on sparse single-cell RNA-seq data (scRNA-seq) produced by novel droplet-based technologies. These droplet-based scRNA-seq technologies are enabling the sequencing of higher numbers of cells at the cost of a lower read-depth per cell. Such low read-depths result in fewer genes with detected expression per cell. We explore the effects of applying cell type classifiers trained on dense, bulk RNA-seq data to sparse scRNA-seq data and propose a novel probabilistic generative model for adapting the bulk-trained classifiers to sparse input data.


Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing

Computational Methods for Studying Cellular Differentiation Using Single-cell RNA-sequencing

Author: Hui Ting Grace Yeo

Publisher:

Published: 2020

Total Pages: 176

ISBN-13:

DOWNLOAD EBOOK

Single-cell RNA-sequencing (scRNA-seq) enables transcriptome-wide measurements of single cells at scale. As scRNA-seq datasets grow in complexity and size, more complex computational methods are required to distill raw data into biological insight. In this thesis, we introduce computational methods that enable analysis of novel scRNA-seq perturbational assays. We also develop computational models that seek to move beyond simple observations of cell states toward more complex models of underlying biological processes. In particular, we focus on cellular differentiation, which is the process by which cells acquire some specific form or function. First, we introduce barcodelet scRNA-seq (barRNA-seq), an assay which tags individual cells with RNA ‘barcodelets’ to identify them based on the treatments they receive. We apply barRNA-seq to study the effects of the combinatorial modulation of signaling pathways during early mESC differentiation toward germ layer and mesodermal fates. Using a data-driven analysis framework, we identify combinatorial signaling perturbations that drive cells toward specific fates. Second, we describe poly-adenine CRISPR gRNA-based scRNA-seq (pAC-seq), a method that enables the direct observation of guide RNAs (gRNAs) in scRNA-seq. We apply it to assess the phenotypic consequences of CRISPR/Cas9-based alterations of gene cis-regulatory regions. We find that power to detect transcriptomic effects depend on factors such as rate of mono/biallelic loss, baseline gene expression, and the number of cells per target gRNA. Third, we propose a generative model for analyzing scRNA-seq containing unwanted sources of variation. Using only weak supervision from a control population, we show that the model enables removal of nuisance effects from the learned representation without prior knowledge of the confounding factors. Finally, we develop a generative modeling framework that learns an underlying differentiation landscape from population-level time-series data. We validate the modeling framework on an experimental lineage tracing dataset, and show that it is able to recover the expected effects of known modulators of cell fate in hematopoiesis.


Evolution of Translational Omics

Evolution of Translational Omics

Author: Institute of Medicine

Publisher: National Academies Press

Published: 2012-09-13

Total Pages: 354

ISBN-13: 0309224187

DOWNLOAD EBOOK

Technologies collectively called omics enable simultaneous measurement of an enormous number of biomolecules; for example, genomics investigates thousands of DNA sequences, and proteomics examines large numbers of proteins. Scientists are using these technologies to develop innovative tests to detect disease and to predict a patient's likelihood of responding to specific drugs. Following a recent case involving premature use of omics-based tests in cancer clinical trials at Duke University, the NCI requested that the IOM establish a committee to recommend ways to strengthen omics-based test development and evaluation. This report identifies best practices to enhance development, evaluation, and translation of omics-based tests while simultaneously reinforcing steps to ensure that these tests are appropriately assessed for scientific validity before they are used to guide patient treatment in clinical trials.


Computational Methods for Single-Cell Data Analysis

Computational Methods for Single-Cell Data Analysis

Author: Guo-Cheng Yuan

Publisher: Humana Press

Published: 2019-02-14

Total Pages: 271

ISBN-13: 9781493990566

DOWNLOAD EBOOK

This detailed book provides state-of-art computational approaches to further explore the exciting opportunities presented by single-cell technologies. Chapters each detail a computational toolbox aimed to overcome a specific challenge in single-cell analysis, such as data normalization, rare cell-type identification, and spatial transcriptomics analysis, all with a focus on hands-on implementation of computational methods for analyzing experimental data. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Computational Methods for Single-Cell Data Analysis aims to cover a wide range of tasks and serves as a vital handbook for single-cell data analysis.


Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics

Statistical and Computational Methods for Single-cell Transcriptome Sequencing and Metagenomics

Author: Fanny Perraudeau

Publisher:

Published: 2018

Total Pages: 246

ISBN-13:

DOWNLOAD EBOOK

I propose statistical methods and software for the analysis of single-cell transcriptome sequencing (scRNA-seq) and metagenomics data. Specifically, I present a general and flexible zero-inflated negative binomial-based wanted variation extraction (ZINB-WaVE) method, which extracts low-dimensional signal from scRNA-seq read counts, accounting for zero inflation (dropouts), over-dispersion, and the discrete nature of the data. Additionally, I introduce an application of the ZINB-WaVE method that identifies excess zero counts and generates gene and cell-specific weights to unlock bulk RNA-seq differential expression pipelines for zero-inflated data, boosting performance for scRNA-seq analysis. Finally, I present a method to estimate bacterial abundances in human metagenomes using full-length 16S sequencing reads.


Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data

Statistical and Computational Methods for Analysis of Spatial Transcriptomics Data

Author: Dylan Maxwell Cable

Publisher:

Published: 2020

Total Pages: 39

ISBN-13:

DOWNLOAD EBOOK

Spatial transcriptomic technologies measure gene expression at increasing spatial resolution, approaching individual cells. One limitation of current technologies is that spatial measurements may contain contributions from multiple cells, hindering the discovery of cell type-specific spatial patterns of localization and expression. In this thesis, I will explore the development of Robust Cell Type Decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA sequencing data to decompose mixtures, such as those observed in spatial transcriptomic technologies. Our RCTD approach accounts for platform effects introduced by systematic technical variability inherent to different sequencing modalities. We demonstrate RCTD provides substantial improvement in cell type assignment in Slide-seq data by accurately reproducing known cell type and subtype localization patterns in the cerebellum and hippocampus. We further show the advantages of RCTD by its ability to detect mixtures and identify cell types on an assessment dataset. Finally, we show how RCTD’s recovery of cell type localization uniquely enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD has the potential to enable the definition of spatial components of cellular identity, uncovering new principles of cellular organization in biological tissue.


Next Steps for Functional Genomics

Next Steps for Functional Genomics

Author: National Academies of Sciences, Engineering, and Medicine

Publisher: National Academies Press

Published: 2020-12-18

Total Pages: 201

ISBN-13: 0309676738

DOWNLOAD EBOOK

One of the holy grails in biology is the ability to predict functional characteristics from an organism's genetic sequence. Despite decades of research since the first sequencing of an organism in 1995, scientists still do not understand exactly how the information in genes is converted into an organism's phenotype, its physical characteristics. Functional genomics attempts to make use of the vast wealth of data from "-omics" screens and projects to describe gene and protein functions and interactions. A February 2020 workshop was held to determine research needs to advance the field of functional genomics over the next 10-20 years. Speakers and participants discussed goals, strategies, and technical needs to allow functional genomics to contribute to the advancement of basic knowledge and its applications that would benefit society. This publication summarizes the presentations and discussions from the workshop.


Transcriptome Analysis

Transcriptome Analysis

Author: Miroslav Blumenberg

Publisher: BoD – Books on Demand

Published: 2019-11-20

Total Pages: 110

ISBN-13: 1789843278

DOWNLOAD EBOOK

Transcriptome analysis is the study of the transcriptome, of the complete set of RNA transcripts that are produced under specific circumstances, using high-throughput methods. Transcription profiling, which follows total changes in the behavior of a cell, is used throughout diverse areas of biomedical research, including diagnosis of disease, biomarker discovery, risk assessment of new drugs or environmental chemicals, etc. Transcriptome analysis is most commonly used to compare specific pairs of samples, for example, tumor tissue versus its healthy counterpart. In this volume, Dr. Pyo Hong discusses the role of long RNA sequences in transcriptome analysis, Dr. Shinichi describes the next-generation single-cell sequencing technology developed by his team, Dr. Prasanta presents transcriptome analysis applied to rice under various environmental factors, Dr. Xiangyuan addresses the reproductive systems of flowering plants and Dr. Sadovsky compares codon usage in conifers.


Introduction to Single Cell Omics

Introduction to Single Cell Omics

Author: Xinghua Pan

Publisher: Frontiers Media SA

Published: 2019-09-19

Total Pages: 129

ISBN-13: 2889459209

DOWNLOAD EBOOK

Single-cell omics is a progressing frontier that stems from the sequencing of the human genome and the development of omics technologies, particularly genomics, transcriptomics, epigenomics and proteomics, but the sensitivity is now improved to single-cell level. The new generation of methodologies, especially the next generation sequencing (NGS) technology, plays a leading role in genomics related fields; however, the conventional techniques of omics require number of cells to be large, usually on the order of millions of cells, which is hardly accessible in some cases. More importantly, harnessing the power of omics technologies and applying those at the single-cell level are crucial since every cell is specific and unique, and almost every cell population in every systems, derived in either vivo or in vitro, is heterogeneous. Deciphering the heterogeneity of the cell population hence becomes critical for recognizing the mechanism and significance of the system. However, without an extensive examination of individual cells, a massive analysis of cell population would only give an average output of the cells, but neglect the differences among cells. Single-cell omics seeks to study a number of individual cells in parallel for their different dimensions of molecular profile on genome-wide scale, providing unprecedented resolution for the interpretation of both the structure and function of an organ, tissue or other system, as well as the interaction (and communication) and dynamics of single cells or subpopulations of cells and their lineages. Importantly single-cell omics enables the identification of a minor subpopulation of cells that may play a critical role in biological process over a dominant subpolulation such as a cancer and a developing organ. It provides an ultra-sensitive tool for us to clarify specific molecular mechanisms and pathways and reveal the nature of cell heterogeneity. Besides, it also empowers the clinical investigation of patients when facing a very low quantity of cell available for analysis, such as noninvasive cancer screening with circulating tumor cells (CTC), noninvasive prenatal diagnostics (NIPD) and preimplantation genetic test (PGT) for in vitro fertilization. Single-cell omics greatly promotes the understanding of life at a more fundamental level, bring vast applications in medicine. Accordingly, single-cell omics is also called as single-cell analysis or single-cell biology. Within only a couple of years, single-cell omics, especially transcriptomic sequencing (scRNA-seq), whole genome and exome sequencing (scWGS, scWES), has become robust and broadly accessible. Besides the existing technologies, recently, multiplexing barcode design and combinatorial indexing technology, in combination with microfluidic platform exampled by Drop-seq, or even being independent of microfluidic platform but using a regular PCR-plate, enable us a greater capacity of single cell analysis, switching from one single cell to thousands of single cells in a single test. The unique molecular identifiers (UMIs) allow the amplification bias among the original molecules to be corrected faithfully, resulting in a reliable quantitative measurement of omics in single cells. Of late, a variety of single-cell epigenomics analyses are becoming sophisticated, particularly single cell chromatin accessibility (scATAC-seq) and CpG methylation profiling (scBS-seq, scRRBS-seq). High resolution single molecular Fluorescence in situ hybridization (smFISH) and its revolutionary versions (ex. seqFISH, MERFISH, and so on), in addition to the spatial transcriptome sequencing, make the native relationship of the individual cells of a tissue to be in 3D or 4D format visually and quantitatively clarified. On the other hand, CRISPR/cas9 editing-based In vivo lineage tracing methods enable dynamic profile of a whole developmental process to be accurately displayed. Multi-omics analysis facilitates the study of multi-dimensional regulation and relationship of different elements of the central dogma in a single cell, as well as permitting a clear dissection of the complicated omics heterogeneity of a system. Last but not the least, the technology, biological noise, sequence dropout, and batch effect bring a huge challenge to the bioinformatics of single cell omics. While significant progress in the data analysis has been made since then, revolutionary theory and algorithm logics for single cell omics are expected. Indeed, single-cell analysis exert considerable impacts on the fields of biological studies, particularly cancers, neuron and neural system, stem cells, embryo development and immune system; other than that, it also tremendously motivates pharmaceutic RD, clinical diagnosis and monitoring, as well as precision medicine. This book hereby summarizes the recent developments and general considerations of single-cell analysis, with a detailed presentation on selected technologies and applications. Starting with the experimental design on single-cell omics, the book then emphasizes the consideration on heterogeneity of cancer and other systems. It also gives an introduction of the basic methods and key facts for bioinformatics analysis. Secondary, this book provides a summary of two types of popular technologies, the fundamental tools on single-cell isolation, and the developments of single cell multi-omics, followed by descriptions of FISH technologies, though other popular technologies are not covered here due to the fact that they are intensively described here and there recently. Finally, the book illustrates an elastomer-based integrated fluidic circuit that allows a connection between single cell functional studies combining stimulation, response, imaging and measurement, and corresponding single cell sequencing. This is a model system for single cell functional genomics. In addition, it reports a pipeline for single-cell proteomics with an analysis of the early development of Xenopus embryo, a single-cell qRT-PCR application that defined the subpopulations related to cell cycling, and a new method for synergistic assembly of single cell genome with sequencing of amplification product by phi29 DNA polymerase. Due to the tremendous progresses of single-cell omics in recent years, the topics covered here are incomplete, but each individual topic is excellently addressed, significantly interesting and beneficial to scientists working in or affiliated with this field.


Gene Quantification

Gene Quantification

Author: Francois Ferre

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 379

ISBN-13: 1461241642

DOWNLOAD EBOOK

Geneticists and molecular biologists have been interested in quantifying genes and their products for many years and for various reasons (Bishop, 1974). Early molecular methods were based on molecular hybridization, and were devised shortly after Marmur and Doty (1961) first showed that denaturation of the double helix could be reversed - that the process of molecular reassociation was exquisitely sequence dependent. Gillespie and Spiegelman (1965) developed a way of using the method to titrate the number of copies of a probe within a target sequence in which the target sequence was fixed to a membrane support prior to hybridization with the probe - typically a RNA. Thus, this was a precursor to many of the methods still in use, and indeed under development, today. Early examples of the application of these methods included the measurement of the copy numbers in gene families such as the ribosomal genes and the immunoglo bulin family. Amplification of genes in tumors and in response to drug treatment was discovered by this method. In the same period, methods were invented for estimating gene num bers based on the kinetics of the reassociation process - the so-called Cot analysis. This method, which exploits the dependence of the rate of reassociation on the concentration of the two strands, revealed the presence of repeated sequences in the DNA of higher eukaryotes (Britten and Kohne, 1968). An adaptation to RNA, Rot analysis (Melli and Bishop, 1969), was used to measure the abundance of RNAs in a mixed population.