We work in a highly interdisciplinary environment at the interface of computer science and biology. Members of the group come from a primarily computational background and share a strong passion for understanding biological systems. We maintain several research partnerships with biological and experimental collaborators at MIT, Harvard, the Broad Institute, Harvard Medical School, and other universities, and within the ENCODE, modENCODE, GTEx, and Epigenomics Roadmap consortia.
Our group at MIT aims to further our understanding of the human genome by computational integration of large-scale functional and comparative genomics datasets.
(1) Using alignments of multiple closely related species, we have defined evolutionary signatures for the systematic discovery and characterization of diverse classes of functional elements, including protein-coding genes, RNA structures, microRNAs, developmental enhancers, regulatory motifs, and biological networks.
(2) Using epigenomics datasets of multiple chromatin marks across the complete genome, we have defined chromatin signatures that reveal numerous classes of promoter, enhancer, transcribed, and repressed regions, each with distinct functional properties.
(3) Using diverse functional datasets across many cell types, we have defined multi-cell activity signatures for chromatin states, regulator expression, motif enrichment, and target gene expression, and have used their correlations to link candidate enhancers to their putative target genes, to infer cell type-specific activators and repressors, and to predict and validate functional regulator binding in specific chromatin states.
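As a rough illustration of the correlation idea in (3), the sketch below links hypothetical enhancers to hypothetical genes by correlating their activity profiles across cell types. It is a toy example, not the group's actual pipeline: the names `pearson` and `link_enhancers`, the input values, and the 0.8 threshold are all assumptions made for illustration, and a realistic analysis would also account for factors such as genomic distance and statistical significance.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def link_enhancers(enhancer_activity, gene_expression, threshold=0.8):
    """Return (enhancer, gene, r) triples whose correlation is >= threshold.

    Both inputs map a name to a list of values, one value per cell type,
    with cell types in the same order. The threshold is an arbitrary,
    illustrative choice.
    """
    links = []
    for enhancer, activity in enhancer_activity.items():
        for gene, expression in gene_expression.items():
            r = pearson(activity, expression)
            if r >= threshold:
                links.append((enhancer, gene, r))
    # Strongest candidate links first.
    return sorted(links, key=lambda link: link[2], reverse=True)


# Hypothetical toy data: five cell types, two enhancers, two genes.
enhancer_activity = {
    "enhA": [1, 0, 1, 0, 1],  # 1 = enhancer chromatin state present
    "enhB": [0, 1, 0, 1, 0],
}
gene_expression = {
    "gene1": [5.0, 1.0, 4.0, 0.5, 6.0],
    "gene2": [0.2, 3.0, 0.1, 2.5, 0.3],
}

links = link_enhancers(enhancer_activity, gene_expression)
for enhancer, gene, r in links:
    print(f"{enhancer} -> {gene} (r = {r:.2f})")
```

In this toy data, each enhancer is paired only with the gene whose expression rises and falls with it across cell types; anti-correlated pairs fall below the threshold and are dropped.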
We have used these evolutionary, chromatin, and activity signatures to elucidate the function and regulatory circuitry of the human and fly genomes and to reveal many new insights into animal gene regulation and development, including abundant translational read-through in neuronal proteins, functionality of anti-sense microRNA transcripts, and thousands of novel large intergenic non-coding RNAs.
We have also used these signatures to revisit previously uncharacterized single-nucleotide polymorphisms (SNPs) associated with several diseases and phenotypes in genome-wide association studies, which has enabled us to provide mechanistic insights into their likely molecular roles.
Overall, our genomic signatures dramatically expand the annotation of the non-coding genome, providing a systematic annotation of chromatin functions, yielding new insights into diverse regulatory mechanisms, and shedding new light on previously uncharacterized disease-associated variants. We have also developed methods to study systematic differences between the species compared, and uncovered important evolutionary mechanisms for the emergence of new functions.