An Integrated Approach to Reconstructing Genome-scale Transcriptional Regulatory Networks

Published: 2015

Transcriptional regulatory networks (TRNs) program cells to dynamically alter their gene expression in response to changing internal or environmental conditions. In this study, we develop a novel workflow for generating large-scale TRN models that integrates comparative genomics data, global gene expression analyses, and intrinsic properties of transcription factors (TFs). An assessment of this workflow using benchmark datasets for the well-studied γ-proteobacterium Escherichia coli showed that it outperforms expression-based inference approaches, having a significantly larger area under the precision-recall curve. Further analysis indicated that this integrated workflow captures different aspects of the E. coli TRN than expression-based approaches, potentially making the two highly complementary. We leveraged this new workflow and these observations to build a large-scale TRN model for the α-proteobacterium Rhodobacter sphaeroides that comprises 120 gene clusters, 1211 genes (including 93 TFs), 1858 predicted protein-DNA interactions and 76 DNA binding motifs. We found that ~67% of the predicted gene clusters in this TRN are enriched for functions ranging from photosynthesis or central carbon metabolism to environmental stress responses. We also found that members of many of the predicted gene clusters were consistent with prior knowledge in R. sphaeroides and/or other bacteria. Experimental validation of predictions from this R. sphaeroides TRN model showed that high precision and recall were also obtained for TFs involved in photosynthesis (PpsR), carbon metabolism (RSP_0489) and iron homeostasis (RSP_3341). In addition, this integrative approach enabled generation of TRNs with increased information content relative to R. sphaeroides TRN models built via other approaches. We also show how this approach can be used to simultaneously produce TRN models for each related organism used in the comparative genomics analysis. Our results highlight the advantages of integrating comparative genomics of closely related organisms with gene expression data to assemble large-scale TRN models with high-quality predictions.
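
The abstract compares methods by the area under the precision-recall curve. As a rough illustration only (this is not the authors' evaluation code), the sketch below scores a ranked list of predicted TF-target edges against a gold-standard set using average precision, a common step-wise estimate of that area. The function name and the toy edges are hypothetical.

```python
def average_precision(ranked_edges, gold_edges):
    """Step-wise estimate of the area under the precision-recall curve.

    ranked_edges: predicted (TF, target) pairs, most confident first.
    gold_edges:   known true (TF, target) pairs (the benchmark).
    """
    gold = set(gold_edges)
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in gold:
            hits += 1
            # Precision measured at each rank where recall increases.
            precision_sum += hits / rank
    return precision_sum / len(gold)


# Toy data (invented for illustration, not from the study).
ranked = [("tfA", "g1"), ("tfA", "g2"), ("tfB", "g1"), ("tfB", "g3")]
gold = {("tfA", "g1"), ("tfB", "g1")}
score = average_precision(ranked, gold)
```

A method that ranks more true interactions near the top of its list obtains a score closer to 1.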


Integrated Approach to Reconstruction of Microbial Regulatory Networks

Published: 2013

This project aimed to develop an integrated bioinformatics platform for genome-scale inference and visualization of transcriptional regulatory networks (TRNs) in bacterial genomes. The work was done at Sanford-Burnham Medical Research Institute (SBMRI, P.I. D.A. Rodionov) and Lawrence Berkeley National Laboratory (LBNL, co-P.I. P.S. Novichkov). The developed computational resources include: (1) the RegPredict web platform for TRN inference and regulon reconstruction in microbial genomes, and (2) the RegPrecise database for collection, visualization and comparative analysis of transcriptional regulons reconstructed by comparative genomics. These analytical resources were selected as key components in the DOE Systems Biology KnowledgeBase (SBKB). The high-quality data accumulated in RegPrecise will provide essential datasets of reference regulons in diverse microbes to enable automatic reconstruction of draft TRNs in newly sequenced genomes. We outline our progress toward the three aims of this grant proposal, which were to: (1) develop an integrated platform for genome-scale regulon reconstruction; (2) infer regulatory annotations in several groups of bacteria and build reference collections of microbial regulons; and (3) develop a KnowledgeBase on microbial transcriptional regulation.


Computational Methods for Integrative Inference of Genome-scale Gene Regulatory Networks

Author: Alireza Fotuhi Siahpirani

Published: 2019

Inference of transcriptional regulatory networks is an important field of research in systems biology, and many computational methods have been developed to infer regulatory networks from different types of genomic data. One of the most popular classes of computational network inference methods is expression-based network inference. Given the mRNA levels of genes, these methods reconstruct a network between regulatory genes (called transcription factors) and potential target genes that best explains the input data. However, it has been shown that networks inferred using expression alone have low agreement with experimentally validated physical regulatory interactions. In recent years, many methods have been developed to improve the accuracy of these computational methods by incorporating additional data types. In this dissertation, we describe our contributions towards advancing the state of the art in this field. Our first contribution is a prior-based network inference method, MERLIN-P. MERLIN-P uses both the expression of genes and prior knowledge of interactions between regulatory genes and their potential targets, and infers a network that is supported by both expression and prior knowledge. Using a logistic function, MERLIN-P can incorporate and combine multiple sources of prior knowledge. The networks inferred in yeast outperform those of state-of-the-art expression-based network inference methods, and perform better than or on par with those of state-of-the-art prior-based methods. Our second contribution is a method for estimating transcription factor activity from a noisy prior network, NCA+LASSO. Network Component Analysis (NCA) is a computational method that, given the expression of target genes and a (potentially incomplete and noisy) network structure that describes the connection of regulatory genes to these target genes, estimates the unobserved activity of the regulators (transcription factor activities, TFA). It has been shown that using TFA can improve the quality of inferred networks. However, our prior knowledge in new contexts could be incomplete and noisy, and we do not know to what extent the presence of noise in the input network affects the quality of the estimated TFA. We first show how the presence of noise in the input prior network can decrease the quality of the estimated TFA, and then show that adding a regularization term improves it. We show that using estimated TFA instead of just the expression of TFs in network inference improves the agreement of inferred networks with experimentally validated physical interactions for all state-of-the-art methods, including MERLIN-P.
Our final contribution is a multi-task inference method, Dynamic Regulatory Module Network (DRMN), that simultaneously infers regulatory networks for related cell lines while taking into account the expected similarity of the cell lines. Many biological contexts are hierarchically related, and leveraging the similarity of these contexts could help us infer more accurate regulatory programs in each context. However, the small number of measurements in each context makes the inference of regulatory networks challenging. By inferring regulatory programs at the module level (groups of co-expressed genes), DRMN is able to handle the small number of measurements, while the use of multi-task learning allows it to incorporate the hierarchical relationship of the contexts. DRMN first infers modules of co-expressed genes in each cell line, then infers a regulatory network for each module, and iteratively updates the inferred modules to reflect both co-expression and co-regulation, and updates the inferred networks to reflect the updated modules. We assess the accuracy of the inferred networks by predicting the expression of held-out genes, and show that the resulting modules and networks provide insight into the process of differentiation between these related cell lines.
For all the developed methods, we validate our results by comparing them to known experimentally validated networks, and show that our results provide useful insight into the biological processes under consideration. Specifically, in chapter 2, we evaluated our inferred networks based on both network structure and predictive power, identified TFs whose target sets all tested methods fail to recover, and explored potential reasons for this failure. Additionally, we used our method to infer stress-specific networks, and evaluated predictions using stress-specific knock-down experiments. In chapter 3, we evaluated our inferred networks based on both network structure and predictive power, and furthermore used our inferred networks to identify potential regulators that could be important for the pluripotency state in mESCs. We tested the effect of these regulators using shRNA experiments, and experimentally validated some of their predicted targets. Finally, in chapter 4, we evaluated our inferred models based on their predictive power and ability to predict gene expression in held-out data.
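
The abstract states that MERLIN-P uses a logistic function to combine multiple sources of prior knowledge. The sketch below shows one generic way such a combination could look: a weighted sum of per-source evidence passed through a logistic function to give an edge prior between 0 and 1. It is an assumption-laden illustration, not MERLIN-P's actual parameterisation; the function name, source names, weights and bias are all hypothetical.

```python
import math


def edge_prior(evidence, weights, bias=-1.0):
    """Combine evidence from several prior sources into one edge prior.

    evidence: dict mapping source name -> evidence strength for this edge
              (sources with no evidence are simply omitted).
    weights:  dict mapping source name -> how much that source is trusted.
    bias:     score of an edge with no prior support (hypothetical default).
    """
    score = bias + sum(weights[source] * value
                       for source, value in evidence.items())
    # The logistic function maps the combined score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical prior sources and weights, chosen only for illustration.
weights = {"motif": 2.0, "chip": 3.0}
no_support = edge_prior({}, weights)
motif_only = edge_prior({"motif": 1.0}, weights)
both = edge_prior({"motif": 1.0, "chip": 1.0}, weights)
```

Edges supported by more sources receive a higher prior, so the inference step would need less expression evidence to include them.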


Elucidating Mechanisms of Transcriptional Regulation at the Genome-scale

Author: Stephen A. Federowicz

Published: 2014

Total Pages: 162

ISBN-13: 9781321516531

Throughout the course of evolution, almost all organisms have generated complex, hierarchical, and robust regulatory systems. One major component of these biological regulatory systems is the transcriptional activation or repression of gene expression. This regulation is carried out simultaneously across a genome by thousands of biological components at thousands of individual promoters. The sum total of all of the regulatory events and their interconnections or overlaps is commonly referred to as the transcriptional regulatory network. The focus of this thesis is to determine the mechanisms or guiding principles behind these transcriptional regulatory networks and to provide a basis upon which predictive mathematical models of these networks can be built. In the first section, a reconstruction of the full transcriptional regulatory network for a model organism is presented along with the OME software framework developed to handle the full complexity of genome-scale datasets and models. In the second section, the mechanisms of individual regulatory events are elucidated in a massively parallel fashion using ChIP-exonuclease and the OME framework. This leads to fundamental insights into the nature of transcriptional initiation complexes for canonical regulators. Finally, in the third section, an effort is undertaken to determine the systems-level mechanisms that dictate the coordinated regulation of hundreds of simultaneous regulatory events in response to major physiological and metabolic perturbations. Here we show that the two principal dimensions of a metabolic system, growth and the production of energy, drive not only the organization of the metabolic network, but also the organization of the transcriptional regulatory network.


Systems Biology

Author: Bernhard Palsson

Publisher: Cambridge University Press

Published: 2015-01-26

Total Pages: 551

ISBN-13: 9781107038851

The first comprehensive single-authored textbook on genome-scale models and the bottom-up approach to systems biology.


Enhancing Comparative Genomics of Transcriptional Regulatory Networks Through Data Collection, Transfer and Integration

Author: Sefa Kilic

Published: 2016

Total Pages: 320

Comparative genomics has proven to be an invaluable approach for the characterization of transcriptional regulatory networks in Bacteria and the evolutionary analysis of transcriptional regulatory elements: transcription factors, their binding motifs and the regulons they control. The growing influx of high-throughput experimental data, however, introduces challenges for each step of the comparative genomics pipeline: the collection of transcription factor binding site data, the transfer of available information on the regulatory network to the species under analysis, and the integration of binding site search results from multiple genomes across orthologs. This dissertation addresses issues at each step of the workflow and describes a platform for the analysis of transcriptional regulatory networks in the Bacteria domain. First, CollecTF, a transcription factor binding site database across the Bacteria domain, was developed to compile experimentally validated transcription factor binding sites through manual curation. CollecTF provides fully customizable access to high-quality curated data and integrates it with major biological resources such as RefSeq, UniProtKB and the Gene Ontology Consortium. Secondly, different methods for transferring known information from reference to target species were systematically evaluated for the first time using a large catalog of known binding sites in Bacteria. Methods assuming conservation of the binding motif were shown to outperform those assuming conservation of regulon composition. Lastly, a complete comparative genomics platform (CGB) was built for the analysis of transcriptional regulation on any annotated bacterial genome. It combines binding evidence from multiple sources using phylogeny, reports the probability of TF-regulation for each gene through a Bayesian framework, and performs formal ancestral state reconstruction for each group of orthologous genes across the species under analysis to reconstruct the evolutionary history of TF-regulation of the gene. CGB was benchmarked by replicating a comparative genomics analysis of LexA regulation in Gram-positive Bacteria, and was later used to characterize the LexA regulon in Verrucomicrobia, a recently established Gram-negative phylum predominant in many soil bacterial communities.
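
The abstract says CGB reports a probability of TF-regulation for each gene through a Bayesian framework, without giving the model. The sketch below is a generic application of Bayes' rule to a single promoter binding-site score, shown only to make the idea concrete. It is not CGB's implementation: the two Gaussian score distributions, their parameters and the prior are invented for illustration.

```python
from statistics import NormalDist


def posterior_regulated(score, prior=0.01,
                        regulated=NormalDist(mu=12.0, sigma=3.0),
                        background=NormalDist(mu=0.0, sigma=3.0)):
    """Posterior probability that a gene is regulated, given a site score.

    score:      best binding-site score in the gene's promoter.
    prior:      assumed fraction of genes regulated by the TF.
    regulated:  assumed score distribution for truly regulated promoters.
    background: assumed score distribution for unregulated promoters.
    """
    weighted_reg = regulated.pdf(score) * prior
    weighted_bg = background.pdf(score) * (1.0 - prior)
    # Bayes' rule: P(regulated | score).
    return weighted_reg / (weighted_reg + weighted_bg)


low = posterior_regulated(0.0)
high = posterior_regulated(15.0)
```

With these made-up parameters, a strong site raises the posterior well above the 1% prior, while a weak site leaves it near zero. Scores far outside both distributions would make both densities underflow to zero, so a production version would work in log space.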


Systems Metabolic Engineering

Author: Hal S. Alper

Publisher: Humana Press

Published: 2013-02-16

ISBN-13: 9781627032988

With the ultimate goal of systematically and robustly defining the specific perturbations necessary to alter a cellular phenotype, systems metabolic engineering has the potential to lead to a complete cell model capable of simulating cell and metabolic function as well as predicting phenotypic response to changes in media, gene knockouts/overexpressions, or the incorporation of heterologous pathways. In Systems Metabolic Engineering: Methods and Protocols, experts in the field describe the methodologies and approaches in the area of systems metabolic engineering and provide a step-by-step guide for their implementation. Four major tenets of this approach are addressed, including modeling and simulation, multiplexed genome engineering, 'omics technologies, and large data-set incorporation and synthesis, all elucidated through the use of model host organisms. Written in the highly successful Methods in Molecular Biology™ series format, chapters include introductions on their respective topics, lists of the necessary materials and reagents, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Comprehensive and cutting-edge, Systems Metabolic Engineering: Methods and Protocols serves as an ideal guide for metabolic engineers, molecular biologists, and microbiologists aiming to implement the most recent approaches available in the field.


Systematic Approaches for Modelling and Visualising Responses to Perturbation of Transcriptional Regulatory Networks

Author: Nam Shik Han

Published: 2013

One of the greatest challenges in modern biology is to understand quantitatively the mechanisms underlying messenger ribonucleic acid (mRNA) transcription within the cell. To this end, integrated functional genomics attempts to use the vast wealth of data produced by modern large-scale genomic projects to understand how the genome is deployed to create a diversity of tissues and species. In these projects, the expression levels of tens or hundreds of thousands of genes are profiled at multiple time points or under different experimental conditions. The profiling results are deposited in large-scale quantitative data files that cannot be analysed without systematic computational methods. In particular, it is much more difficult to experimentally measure the concentration of transcription factor proteins and their affinity for the promoter regions of genes than to measure the result of transcription using experimental techniques such as microarrays. In the absence of such biological experiments, it becomes necessary to use in silico techniques to determine transcription factor regulatory activities from existing gene expression profile data. This presents significant challenges and opportunities to the computer science community. This PhD project made use of one such in silico technique to determine the differences (if any) in transcription factor regulatory activities between different experimental conditions and time points. The research aim of the project was to understand the transcriptional regulatory mechanism that controls the sophisticated process of gene expression in cells, and in particular how differences in the downstream signalling from transcription factors can play a role in predisposition to diseases such as parasitic disease, cancer, and neuroendocrine disease. To address this question I have had access to large integrated genomics datasets generated in studies on parasitic disease, lung cancer, and endocrine (hormone) disease. The current state of the art takes existing knowledge and asks "How do these data relate to what we already know?" By applying machine learning approaches, the project explored the role that such data can play in uncovering new biological knowledge.


Uncovering the System-level Regulatory Architecture of Gene Expression in Humans

Author: Junhong Luo

Published: 2011

Total Pages: 512

Uncovering the system-level transcriptional regulatory architecture of gene expression in humans has been a major focus of modern systems biology research. Human genes are commonly under coordinated and combinatorial transcriptional regulation mediated by a class of proteins known as transcription factors. Recent technological advancements have enabled comprehensive mapping of transcription factor binding motifs in the human genome through both experimental (e.g. ChIP-seq) and computational (e.g. comparative sequence analysis) methods. Collections of defined transcription factor binding motifs and their corresponding target genes in a given species can be used as a "dictionary" to define groups of genes that have the potential of being under coordinated control. Utilising large-scale transcription factor binding motif location data for the human genome, this study has defined a comprehensive genome-wide binding motif dictionary of human transcription factors. Instead of inferring transcriptional regulatory networks from co-expression, this study took the opposite approach: using simple multivariate data analysis methods and graph theory, groups of genes under putative transcriptional co-regulation were defined based on the similarity of promoter motif content between genes. These defined networks of genes could be used as a platform for incorporating other biological datasets to clarify the system-level regulatory architecture of human genes. In this thesis, independent gene function annotation, gene expression and ChIP-seq datasets were employed to bring in functional and biological insights, and also to provide independent validation of the regulatory networks defined based on promoter motif content similarity. The results showed that the regulatory networks defined in this work were of biological significance. More specifically, groups of genes under putative co-regulation, defined in this study on the basis of sharing common transcription factor binding motifs, were likely to share common function and expression. Overall, the affirmative findings described in this thesis demonstrated the feasibility of identifying putative gene regulatory networks by using large-scale motif dictionaries.
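
The approach described above groups genes by the similarity of their promoter motif content and then applies graph theory. The thesis abstract does not name the similarity measure or the grouping rule, so the sketch below makes two explicit assumptions: Jaccard similarity between each pair of genes' motif sets, and connected components of the thresholded similarity graph as the putative co-regulated groups. Gene names, motif names and the threshold are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared motifs divided by all motifs found in either promoter."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coregulation_groups(motifs_by_gene, threshold=0.5):
    """Group genes whose promoter motif sets are similar.

    Builds a graph with an edge between two genes when their Jaccard
    similarity reaches the threshold, then returns its connected
    components as sorted lists of gene names.
    """
    genes = sorted(motifs_by_gene)
    neighbours = {gene: set() for gene in genes}
    for g1, g2 in combinations(genes, 2):
        if jaccard(motifs_by_gene[g1], motifs_by_gene[g2]) >= threshold:
            neighbours[g1].add(g2)
            neighbours[g2].add(g1)

    groups, seen = [], set()
    for gene in genes:
        if gene in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups


# Toy motif dictionary (invented for illustration).
motifs = {
    "g1": {"M1", "M2"},
    "g2": {"M1", "M2", "M3"},
    "g3": {"M2", "M3"},
    "g4": {"M9"},
}
groups = coregulation_groups(motifs)
```

Here g1 and g3 end up in the same group through their shared neighbour g2, even though their direct similarity is below the threshold. Stricter grouping rules, such as requiring every pair in a group to be similar, would split them.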