Introduction
Our laboratory focuses on two closely related fundamental aspects of mammalian genomics and cancer: (i) transcriptional regulation and (ii) epigenetic modifications. Our research program is interdisciplinary, complementing computational work with collaborative experimental investigation. We develop computational models and bioinformatics tools for systems biology research to study cancer and other diseases, with the eventual aim of improving biomarker and drug discovery.
Research Summary
We are beginning to appreciate the increasing complexity of mammalian gene structure. A phenomenon that adds an important dimension to this complexity is the use of alternative gene promoters and first exons, which produce multiple pre-mRNA isoforms and drive widespread cell type-, tissue type- and/or developmental stage-specific gene regulation. To determine the cellular state, genes require guidance cues that enable them to express precise isoforms in the right cell types at appropriate times. Such cues are partly provided by the use of alternative promoters and by the chromatin state of the corresponding genomic regions, both of which are altered in disease settings.
Recent annotations of mammalian genomes suggest that almost half of the protein-coding genes contain alternative promoters, including many disease-associated genes. Aberrant use of one promoter over another has been found to be associated with various diseases, including cancer. Whether the alternative promoters are normally regulated by different pathways, or whether their expression is tightly linked to one another, is an important question that remains to be fully explored. Therefore, determining the activity of alternative promoters in different cellular conditions and dissecting the genetic and epigenetic regulatory mechanisms across alternative promoters is imperative to understanding a diversity of developmental processes in both normal and diseased states. Computational modeling coupled with recent high-throughput technologies, such as chromatin immunoprecipitation (ChIP) followed by microarray analysis (ChIP-chip) and ChIP coupled with massively parallel sequencing (ChIP-seq), enables the genome-wide identification of alternative promoters and associated chromatin modifications. This integrative approach will help us to understand the use or misuse of alternative promoters in a wide variety of cell types, developmental stages and disease conditions, and help to address key outstanding issues in mammalian genome research.
The research projects in our laboratory are sponsored by:
- National Human Genome Research Institute (NIH Grant R01 HG 003362-01: Genome-wide discovery of alternate promoters in human and mouse genomes)
- American Cancer Society (Research Scholar Grant RSG-06-268-01: Modeling epigenetic changes across alternative promoters in cancer genomes)
- National Cancer Institute (NIH R01 Grant, subcontract to MD Anderson Cancer Center, Houston, TX; PI: Dr. George Calin)
- Philadelphia Health Care Trust
Ongoing Studies
- Genome-wide analysis of alternative promoters in mouse cerebellum developmental stages: The Davuluri laboratory is collaborating with the Dahmane and Showe laboratories at Wistar to identify and model alternative promoters that are active in different developmental stages of the mouse cerebellum.
Key Personnel: - Shramistha Pal (Postdoc, Molecular Biology)
- Hyunsoo Kim (Staff Scientist, Computational Biology)
- Ravi Gupta (Postdoc, Computational Biology)
- Switching of alternative promoter usage in cancer genomes: There is growing evidence linking aberrant use of multiple promoters to cancer formation: several oncogenes and tumor-suppressor genes (e.g., MYC, CYP19, BRCA1, P73, MID1, Cathepsin B, SRC, kallikrein 6 and TGF-β3) have multiple promoters, and the aberrant preference of one promoter over another in some of these genes is directly linked to cancerous cell growth. We reason that regional epigenetic modifications in alternative promoters are tissue-specific and that disruptions to these processes occur during cancer initiation and progression. These abnormal events may lead to the eventual silencing or activation of critical promoters of cancer genes. We hypothesize that regional epigenetic deregulation of one promoter over another alters the transcriptional profile of the target gene, resulting in the initiation and promotion of neoplastic outgrowth of cancer cells. We use a combination of computational, statistical and high-throughput experimental (ChIP-seq) approaches to study and model the use of alternative promoters of human genes in cancer cells. We expect this approach to contribute to improved biomarkers and drug discovery for cancer treatment. This research is supported by a four-year Research Scholar Grant (RSG-06-268-01) from the American Cancer Society. We are collaborating with the Dahmane and Showe laboratories at Wistar to perform the experimental part of this project.
Key Personnel: - Shramistha Pal (Postdoc, Molecular Biology)
- Hyunsoo Kim (Staff Scientist, Computational Biology)
- Ravi Gupta (Postdoc, Computational Biology)
- Combinatorial control of gene regulation by transcription factor splice variants: To determine the cellular state, genes require guidance cues that enable them to express precise isoforms in the correct cell types at appropriate times. It is the present dogma that the guidance cues for cell type-specific gene expression are primarily provided by groups of transcription factors (TFs) that interact in a combinatorial fashion and bind to synergistically occurring TF binding sites (cis-regulatory modules) in the target gene promoters. Recent observations have challenged this view by showing that alternative splicing occurs in most mammalian genes, including a majority of TF genes. These TFs are expressed in a cell type- and developmental stage-specific manner, and differential use of protein isoforms in healthy versus diseased tissues has been demonstrated for a number of TFs (e.g., LEF1, TP73, HNF4A, RASSF1, BCL2L1). These advances suggest that mammalian organisms have evolved cell type-specific TF isoforms as a mechanism to accommodate more complex programs of cell type- and developmental stage-specific transcription. Dysregulation of the developmental programs governed by specific TF isoforms has been implicated in a number of human diseases, but its overall contribution to disease etiology remains unexplored. We hypothesize that a majority of TF genes encode functionally distinct cell type-specific protein isoforms, and that regulatory reprogramming through alternatively spliced transcription factors underlies numerous cases of cell type-specific gene expression and the development of disease states. We propose a combination of high-throughput sequencing methods and novel computational modeling approaches to address these hypotheses (collaboration with the Dahmane and Showe laboratories).
Key Personnel: - Shramistha Pal (Postdoc, Molecular Biology)
- Hyunsoo Kim (Staff Scientist, Computational Biology)
- Anirban Bhattacharyya (Postdoc, Computational Biology)
- An integrated computational pipeline for analyzing massively parallel sequencing data: The introduction of high-throughput DNA sequencing technologies, such as Applied Biosystems' SOLiD3 and Illumina GA/Solexa sequencing technology, has opened new avenues, including determining how genomic and epigenomic variation and phenotypes relate to disease. These technologies are particularly well suited to small RNA discovery because they can produce millions of short reads, each around 30-75 base pairs long, in a relatively short time. However, the large amounts (terabases) of data generated overwhelm existing computational resources and analytic methods, so new tools are urgently needed to translate this rich source of genomic information into medical benefit. In this research direction, we are developing an automated and easy-to-use bioinformatics toolkit for integrative analysis of the small RNA sequences obtained from next-generation sequencers (collaboration with the Kazuko laboratory, Wistar Institute).
Key Personnel: - Ravi Gupta (Postdoc, Computational Biology)
- Hyunsoo Kim (Staff Scientist, Computational Biology)
- Francisco Agosto Perez (Graduate Student, Computational Biology)
- Anirban Bhattacharyya (Postdoc, Computational Biology)
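One early step that a small RNA sequencing analysis of the kind described in the pipeline item above commonly includes is collapsing identical reads into counted unique sequences. The sketch below is only an illustration of that idea, not the laboratory's toolkit: the function name `collapse_reads`, the length window of 18 to 30 nucleotides, and the assumption that adapters have already been trimmed are all hypothetical choices made for this example.

```python
from collections import Counter


def collapse_reads(reads, min_len=18, max_len=30):
    """Count identical read sequences (illustrative helper, not a published API).

    Assumes adapter-trimmed reads. Reads outside the hypothetical length
    window, or containing characters other than A, C, G, T, are skipped.
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        if min_len <= len(seq) <= max_len and set(seq) <= set("ACGT"):
            counts[seq] += 1
    return counts


if __name__ == "__main__":
    # Toy reads invented for this example.
    toy_reads = [
        "TGAGGTAGTAGGTTGTATAGTT",
        "TGAGGTAGTAGGTTGTATAGTT",
        "tagcttatcagactgatgttga\n",
        "ACGT",                      # too short, skipped
        "TGAGGTAGNAGGTTGTATAGTT",    # ambiguous base, skipped
    ]
    for seq, n in collapse_reads(toy_reads).most_common():
        print(seq, n)
```

Collapsing first keeps later steps, such as genome mapping or annotation lookups, proportional to the number of distinct sequences rather than the number of raw reads.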
Recent Scientific Advances
- Genome-wide analysis of alternative promoters in different mouse tissues using ChIP-seq technology: The Davuluri laboratory collaborated with the Huang laboratory to perform ChIP with an antibody against RNA Pol II, pulling down genomic regions bound by the RNA polymerase II pre-initiation complex (PIC). The PIC-bound DNA was then sequenced on an Illumina Solexa high-throughput sequencer, which produced millions of 35-mer sequence tags. By computationally mapping these 35-mer tags to the mouse genome, the Davuluri laboratory built a genome-wide map of active promoters and alternative promoter usage in five mouse tissues (brain, liver, lung, spleen and kidney). The map shows global promoter activity in the different tissues and identifies novel alternative promoters that are used selectively in particular tissues. The Davuluri laboratory is developing bioinformatics tools and a pipeline to analyze the high-throughput sequence data.
Key Personnel: - Hao Sun (former lab member, currently Assistant Professor, The Chinese University of Hong Kong, Hong Kong)
- Jeijun Wu (Postdoc in Dr. Huang's laboratory, Human Cancer Genetics Program, OSU, Columbus, OH)
- Priyankara Wikramasinghe (former lab member, currently Bioinformatics Analyst, Bioinformatics Facility, Wistar Cancer Center)
- Development of computational models to infer Transcription Factor Binding Site (TFBS) modules: Transcription initiation in eukaryotic cells is a complex process that typically involves multiple transcription factors binding to the DNA and interacting both with each other and with the RNA polymerase II complex. Because TF recognition sites are short and highly degenerate, computational methods for identifying TFBSs in the genome suffer from high error rates. However, when combined with experimental data (either expression data to identify co-regulated genes or ChIP-chip data to identify real TFBSs), the accuracy of these predictions increases substantially. The Davuluri group has evaluated two machine learning algorithms (classification trees and random forests) for their ability to identify related promoters in the human genome and to predict sets of TFBSs that apparently work in concert. These algorithms are being implemented as GenePattern modules that can be easily integrated into an analysis pipeline. In addition, a web interface is available.
Key Personnel: - Gregory Singer (former Postdoc, currently with the Biodiversity Institute of Ontario)
- Hyunsoo Kim (Staff Scientist, Computational Biology)
- Anirban Bhattacharyya (Postdoc, Computational Biology)
- Shramistha Pal (Postdoc, Molecular Biology)
- Single nucleotide polymorphisms inside microRNA target sites influence tumor susceptibility: MicroRNAs (miRNAs) are small, non-coding RNAs that base pair imperfectly with complementary sequences in target messenger RNAs (mRNAs) and negatively control gene expression. Single nucleotide polymorphisms (SNPs) are the most common genetic variants in the human genome, and an immense source of information for localizing and identifying disease susceptibility genes. In collaboration with the George Calin laboratory at MD Anderson Cancer Center (Houston, TX), the Davuluri group is investigating how SNPs located in transcribed regions of protein-coding genes affect the miR-mRNA interaction by altering the minimum free energy (MFE) of the miR-mRNA duplex, thereby destroying existing miR target sites or creating new ones, and consequently influencing gene expression or activity. A bioinformatics pipeline was developed to predict target SNPs that can potentially influence the miR-mRNA interaction, based on each SNP's ability to alter the MFE of the miR-mRNA duplex. The annotations of target SNPs, miRs and target genes were integrated into a database called miR-SNPDB, a public resource for the research community. The two laboratories are currently extending this study to breast cancer, on the premise that disruption of miRNA gene regulation by SNPs may contribute to tumor susceptibility.
Key Personnel: - Hao Sun (former lab member, currently Assistant Professor, The Chinese University of Hong Kong, Hong Kong)
- Priyankara Wikramasinghe (former lab member, currently Bioinformatics Analyst, Bioinformatics Facility, Wistar Cancer Center)
- Modeling SMAD Regulatory Modules by Integrative Data Mining of ChIP-chip and Gene Expression Profiles: While the molecular mechanisms of the TGF-β/SMAD signaling pathway have been studied in detail, the global networks downstream of SMAD remain largely unknown. To address this question, the Davuluri and Huang (OSU) laboratories simultaneously performed chromatin immunoprecipitation followed by microarray analysis (ChIP-chip) and mRNA expression profiling to identify TGF-β/SMAD-regulated and synchronously co-expressed gene sets in ovarian surface epithelium. Intersecting the ChIP-chip and gene expression data yielded 150 direct targets, of which 141 were grouped into 3 co-expressed gene sets (sustained up-regulated, transient up-regulated and down-regulated), based on their temporal changes in expression after TGF-β activation. The Davuluri group developed a data-mining method driven by the Random Forest algorithm to model SMAD transcriptional modules in the target sequences. The predicted SMAD modules contain a SMAD binding element and up to 2 of 7 other transcription factor binding sites (E2F, P53, LEF1, ELK1, COUPTF, PAX4 and DR1). Together, the computational results further the understanding of the interactions between SMAD and other transcription factors at specific target promoters, and provide the basis for more targeted experimental verification of the co-regulatory modules.
Key Personnel: - Hauxia Qin (former Postdoc, currently Statistician, JP Morgan Chase Bank, Columbus, OH)
- Michael Chan (former Postdoc in Dr. Huang's laboratory, OSU, currently Assistant Professor, National Chung Cheng University, Taiwan ROC)
- Sandya Liyanarachchi (former lab member, currently Research Statistician, OSU CCC)
- Mammalian promoter databases: The Davuluri laboratory is developing two mammalian promoter databases, MPromDb and OMGProm, by integrating UCSC known gene annotations and full-length cDNA data from the FANTOM project. The two databases currently contain 85,843 promoters (35,977 of human, 49,322 of mouse and 850 of rat protein-coding genes). Computationally identified transcription factor binding sites (TFBSs) are mapped to all the annotated promoter regions. A set of Perl modules and APIs (Application Programming Interfaces) that automatically retrieve data from MPromDb and OMGProm has been implemented to (i) identify over-represented TFBSs in a list of genes and (ii) find the genes that are regulated by a specific transcription factor. Publicly available alternative promoter annotations and methylation status data from ChIP-seq experiments on mouse embryonic stem cells, neural progenitor cells, embryonic fibroblasts and human CD4+ T cells are regularly updated in MPromDb. The ChIP-seq data from 5 different mouse tissues (brain, kidney, liver, lung and spleen) have also been added to MPromDb. Further, MPromDb has been updated with computationally annotated miRNA Pol II promoters for nearly 400 human and 300 mouse miRs.
Key Personnel: - Hao Sun (former lab member, currently Assistant Professor, The Chinese University of Hong Kong, Hong Kong)
- Saranyan Palaniswamy (former lab member, currently ---)
- Anirban Bhattacharyya (Postdoc, Computational Biology)
- Francisco Agosto Perez (Graduate Student, Computational Biology)
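The over-representation analysis mentioned for the promoter databases above can be illustrated with a one-sided hypergeometric test, a common choice for this kind of question. The source does not say which statistic the Perl modules use, so the test itself is an assumption here, and the Python names `hypergeom_enrichment_p` and `site_enrichment`, along with the gene-to-TFBS dictionary format, are hypothetical and unrelated to the MPromDb or OMGProm APIs.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, genes_with_site, list_size, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    X is the number of genes carrying a given TFBS when list_size genes are
    drawn without replacement from total_genes, of which genes_with_site
    carry the site.
    """
    upper = min(genes_with_site, list_size)
    tail = sum(
        comb(genes_with_site, k) * comb(total_genes - genes_with_site, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total_genes, list_size)


def site_enrichment(annotations, gene_list):
    """Return {site: p-value} for every TFBS seen in the background.

    annotations: dict mapping gene name -> set of TFBS names (hypothetical
    input format). Genes in gene_list that are missing from annotations
    are ignored.
    """
    background = set(annotations)
    selected = background & set(gene_list)
    all_sites = set().union(*annotations.values())
    results = {}
    for site in all_sites:
        with_site = {g for g in background if site in annotations[g]}
        hits = len(with_site & selected)
        results[site] = hypergeom_enrichment_p(
            len(background), len(with_site), len(selected), hits
        )
    return results


if __name__ == "__main__":
    # Toy annotations invented for this example.
    toy = {
        "g1": {"E2F", "SMAD"}, "g2": {"E2F"}, "g3": {"E2F", "P53"},
        "g4": {"P53"}, "g5": {"SMAD"}, "g6": set(),
    }
    for site, p in sorted(site_enrichment(toy, ["g1", "g2", "g3"]).items()):
        print(site, round(p, 3))
```

In practice the resulting p-values would need correction for multiple testing across all sites examined, which this sketch leaves out.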