
Ross C. Hardison

T. Ming Chu Professor of Biochemistry and Molecular Biology

304 Wartik Laboratory, University Park, PA 16802
Phone: (814) 863-0113
Fax: (814) 863-7024
E-mail: rch8@psu.edu

B.A. in chemistry, Vanderbilt University
Ph.D. in biochemistry, University of Iowa

Hardison Lab Web Site

Functional genomics of noncoding DNA: Regulation of mammalian erythroid genes

The determination of nearly complete genomic DNA sequences from many species, including humans, is revolutionizing the biological sciences. While the first wave of post-genomic analysis is focused on identifying the protein-coding portions of the genome, the results of that effort will assign a function to less than 2% of the genomic DNA in mammals (and other species). Another 40-45% of the human genome is repetitive DNA generated by retrotransposition. That leaves about half the genome (noncoding, nonrepetitive DNA) to analyze. Some people have considered this DNA (mostly within introns or between genes) to be “junk”, but it is clear that many localized regions are involved in regulating the expression of genes. Understanding the regulation of expression is important both for fundamental issues in developmental biology and for exploring novel avenues for therapeutic advances, improvements in agriculture, etc. Hence it is critical to find the regions of noncoding, nonrepetitive DNA that are likely to be involved in an important function, such as gene regulation.

In a collaboration that began around 1990, Dr. Webb Miller (Biology and Computer Science and Engineering) and I have developed computer tools to align long sequences of genomic DNA and to find strong candidates for functional sequences in the alignments of noncoding DNA. We have analyzed globin gene complexes from mammals extensively, and have used the output of our programs to identify and experimentally test candidate novel regulatory elements. These biochemical studies have demonstrated the efficacy of the bioinformatics tools. Since 2002, we have collaborated with Dr. David Haussler (HHMI and University of California at Santa Cruz) to use our software to align the sequenced genomes of human, mouse, rat, chicken and other species (28 vertebrates as of 2007). We participated in the landmark studies of the International Mouse Genome Sequencing Consortium that showed, among other things, that the amount of human DNA under selection is at least three times greater than the coding capacity and that the rates of evolution vary markedly along chromosomes. Building from these studies, we and our collaborators compute scores based on multiple alignments that estimate the probability that a genomic segment is under selection and/or serves as a cis-regulatory module. These scores are computed across the human and mouse genomes and are made available to the public via genome browsers (see “Web sites” below or link to our Lab Web Site above).

Building on the strong foundation of Dr. Miller’s alignment software, we have been able to recruit and build a strong Center for Comparative Genomics and Bioinformatics (CCGB) within the intercollege Huck Institute for Genomics, one of the Huck Institutes for Life Sciences. Faculty from the Departments of Biochemistry and Molecular Biology, Biology, Computer Science and Engineering, and Statistics work together on many aspects of comparative and functional genomics. In all our activities, we remain strongly dedicated to making our software and data repositories easily accessible. Multi-species whole-genome alignments of bacteria (enterics), yeast, flies, worms and vertebrates are available at genome browsers. The alignments can be extracted, and large datasets of genomic information can be imported and analyzed, using the Galaxy workspace (see additional information at the Nekrutenko lab). Custom alignments can be generated on our PipMaker/MultiPipMaker server or on other servers running the same alignment software. Hemoglobin variants and thalassemia mutations are available in HbVar, and a union of the data from many (soon to be most or all) locus-specific databases is accessible through PhenCode.

The expertise and collaborative environment of the CCGB have led to the participation of its members in several international consortia for genome analysis and for functional genomics. Members of the Center and the Hardison lab have been involved in the analysis of the mouse, rat, chimpanzee, rhesus macaque, chicken and platypus genomes. Several groups were deeply involved in the data analysis and integration for the pilot phase of the ENCODE project, an international endeavor to annotate all functional sequences in the human genome.

Experimental work in our laboratory tests the effectiveness of alignment-based predictions of erythroid cis-regulatory regions. One important avenue applies two predictive measures to loci known to be up- or down-regulated during late erythroid maturation. The first measure, called regulatory potential or RP, results from a statistical machine-learning procedure that scores aligned DNA segments for the similarity of their patterns (strings of alignment columns) to those in a training set of alignments in known regulatory regions and for their difference from patterns in alignments in (likely) neutral DNA. This methodology resulted from a collaboration headed by Dr. Francesca Chiaromonte. The second measure is conservation of matches to the binding site motif for a transcription factor; in this case it is the GATA-1 factor required for late erythroid maturation. By testing almost 100 DNA segments in two different enhancer assays (transient transfection and site-directed stable integration of reporter gene constructs), we validated over half of the predicted cis-regulatory modules, and the validation rate increased with higher RP scores (see Figure below). In current collaborations, we are applying this approach with good success to additional erythroid protein-coding genes (e.g., Ahsp, c-kit) and microRNA genes.
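
The idea behind the RP score can be illustrated with a deliberately simplified sketch in Python. It is not the published RP method: it treats each alignment column independently and scores a segment by the mean log-odds of its columns under frequencies learned from a regulatory training set versus a neutral training set. The function names, the pseudocount, and the toy two-species alignments are assumptions made only for illustration.

```python
import math
from collections import Counter

# Two aligned species over the symbols A, C, G, T and gap give 5 * 5 column types.
N_COLUMN_TYPES = 25

def count_columns(alignments):
    """Count alignment columns (one tuple of characters per position) in a training set."""
    counts = Counter()
    for rows in alignments:
        counts.update(zip(*rows))
    return counts

def column_probability(column, counts, pseudocount=1.0):
    """Smoothed frequency of one column type, so unseen columns never get probability 0."""
    total = sum(counts.values())
    return (counts[column] + pseudocount) / (total + pseudocount * N_COLUMN_TYPES)

def mean_log_odds(rows, regulatory_counts, neutral_counts):
    """Mean per-column log-odds; positive values resemble the regulatory training set."""
    columns = list(zip(*rows))
    total = sum(
        math.log(column_probability(c, regulatory_counts)
                 / column_probability(c, neutral_counts))
        for c in columns
    )
    return total / len(columns)

# Toy training data (hypothetical): matched columns stand in for regulatory DNA,
# mismatched columns for neutral DNA.
regulatory = count_columns([("ACGT", "ACGT")])
neutral = count_columns([("ACGT", "TGCA")])

print(mean_log_odds(("AACC", "AACC"), regulatory, neutral))  # conserved segment
print(mean_log_odds(("AACC", "TTGG"), regulatory, neutral))  # diverged segment
```

In this toy setting the conserved segment scores above 0 and the diverged segment below 0. The actual RP scores are computed from patterns spanning strings of alignment columns, so the sketch conveys only the log-odds comparison between the two training sets.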

A second experimental approach is agnostic to the sequence alignments. In collaboration with Dr. Mitch Weiss at Children’s Hospital of Philadelphia and Dr. Roland Green at NimbleGen, we have used chromatin immunoprecipitation (ChIP) with an antibody against the GATA-1 protein, analyzed by hybridization to high-density tiling arrays (NimbleGen), to identify sites occupied by GATA-1 across 67 million base pairs of mouse chromosome 7. Further validation by quantitative PCR and functional assays shows that this study has exquisite specificity and good sensitivity. These new results enable further studies on the determinants of occupancy and on the role of these regulatory regions in multiple aspects of differentiation.

Future work will expand the ChIP analyses genome-wide and to multiple factors, integrating the results with data from community resources such as the new full-scale ENCODE project. We are moving toward using the analysis of multiple sequences to improve understanding of the evolution and mechanism of regulatory regions, while improving both the sensitivity and specificity of our predictions. Our long-term goal is a full understanding of the many regulatory interactions and their dynamics during erythropoiesis. Our hope is that such studies will yield insights with real therapeutic potential for the very large number of people affected by blood disorders.


Figure Legend. Prediction of erythroid cis-regulatory modules (CRMs) and validation rate. Panel a shows tracks of predicted CRMs, gene models, regulatory potential (RP) and conservation in vertebrate species around the mouse gene Gata2, which encodes a transcription factor related to GATA-1 and is under the control of GATA-1. The predicted erythroid CRMs are available throughout the mouse and human genomes, and they show a good validation rate around erythroid genes, as summarized in panel b. The distribution of mean RP scores for all DNA fragments in eight loci (a total of about 1 Mb of DNA) is shown with the yellow bars. Most fragments have a mean score less than 0, and hence are not predicted to be CRMs. All noncoding DNA fragments with an RP score above 0.05 that also contain a conserved match to a GATA-1 binding site were tested for enhancer activity, and fragments with lower RP scores (and conserved matches to a GATA-1 binding site) were sampled for testing (numbers tested are given above the panel). Most of the predicted CRMs (preCRMs) were validated in experimental tests (red bars), with a substantial increase in validation rate for higher RP. Details are in Wang et al. (2006) Genome Research 16:1480-1492.
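
The selection rule and the per-bin validation rate described in the legend can be sketched in Python as follows. This is an illustrative sketch, not the pipeline used in Wang et al. (2006): the fragment records are made up, the GATA-1 binding site is assumed to be the consensus WGATAR searched on both strands, the alignment blocks are assumed to be gap-free, and the bin width of 0.05 is an arbitrary choice.

```python
import math
import re
from collections import defaultdict

# Assumed GATA-1 consensus WGATAR (W = A/T, R = A/G) and its reverse complement.
# The lookahead lets overlapping matches be reported.
GATA_PATTERNS = (re.compile(r"(?=[AT]GATA[AG])"), re.compile(r"(?=[CT]TATC[AT])"))

def motif_starts(sequence):
    """Start positions of motif matches on either strand of one sequence."""
    sequence = sequence.upper()
    return {m.start() for pattern in GATA_PATTERNS for m in pattern.finditer(sequence)}

def has_conserved_match(rows):
    """True if a motif match starts at the same column in every row of a gap-free block."""
    return bool(set.intersection(*(motif_starts(row) for row in rows)))

def select_for_testing(fragments, rp_threshold=0.05):
    """Keep fragments whose mean RP exceeds the threshold and that carry a conserved match."""
    return [f for f in fragments
            if f["mean_rp"] > rp_threshold and has_conserved_match(f["rows"])]

def validation_rate_by_bin(tested, bin_width=0.05):
    """Fraction of tested fragments that validated, keyed by integer RP bin index."""
    totals, hits = defaultdict(int), defaultdict(int)
    for f in tested:
        index = math.floor(f["mean_rp"] / bin_width)
        totals[index] += 1
        hits[index] += int(f["validated"])
    return {index: hits[index] / totals[index] for index in totals}

# Hypothetical fragments: two-species alignment rows, mean RP, and assay outcome
# (None means the fragment was not tested).
fragments = [
    {"name": "a", "mean_rp": 0.12, "rows": ["CCAGATAAGG", "TCTGATAGGC"], "validated": True},
    {"name": "b", "mean_rp": 0.07, "rows": ["CCAGATAAGG", "TCTGCTAGGC"], "validated": None},
    {"name": "c", "mean_rp": -0.03, "rows": ["CCAGATAAGG", "TCTGATAGGC"], "validated": False},
    {"name": "d", "mean_rp": 0.13, "rows": ["CCAGATAAGG", "TCTGATAGGC"], "validated": False},
]

selected = select_for_testing(fragments)
tested = [f for f in fragments if f["validated"] is not None]
print([f["name"] for f in selected])
print(validation_rate_by_bin(tested))
```

Fragment b is excluded because the motif match is not conserved in its second row, and fragment c stands in for the lower-RP fragments that were sampled for testing rather than selected by the threshold.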

Web sites:

  • Center for Comparative Genomics and Bioinformatics - http://www.bx.psu.edu/
  • Galaxy workspace - http://main.g2.bx.psu.edu/
  • Genome-wide predictions of erythroid cis-regulatory modules in mouse and human - http://www.bx.psu.edu/~ying/dataset/prediction/
  • Genome-wide predictions of any cis-regulatory module in human - http://www.bx.psu.edu/~ross/share/RP(0.05,200)NoKGexons.txt
  • PhenCode - http://globin.bx.psu.edu/phencode/
  • HbVar - http://globin.bx.psu.edu/hbvar/menu.html
  • Servers for custom alignments - http://www.bx.psu.edu/
  • UCSC Genome Browser - http://genome.ucsc.edu/

     

Publications, with links to full text for the most recent ones:

  • Hardison publication list - http://www.bx.psu.edu/~ross/pubs/PublicationList.html
  • PubMed - search for “hardison r NOT hardison rm”

Recent Publications:

As of 2007, there are 150 publications in total, which have been cited more than 8900 times.

  • Waterston, R.H., K. Lindblad-Toh, many authors including R.C. Hardison … E.S. Lander (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420: 520-562.
  • Hardison, R.C., K.M. Roskin, S. Yang, M. Diekhans, W.J. Kent, R. Weber, L. Elnitski, J. Li, M. O'Connor, D. Kolbe, S. Schwartz, T.S. Furey, S. Whelan, N. Goldman, A. Smit, W. Miller, F. Chiaromonte and D. Haussler (2003) Co-variation in frequencies of substitution, deletion, transposition and recombination during eutherian evolution. Genome Res. 13: 13-26.
  • Elnitski, L., R.C. Hardison, J. Li, S. Yang, D. Kolbe, P. Eswara, M.J. O'Connor, S. Schwartz, W. Miller and F. Chiaromonte (2003) Distinguishing regulatory DNA from neutral sites. Genome Res. 13: 64-72.
  • Schwartz, S., W.J. Kent, A. Smit, Z. Zhang, R. Baertsch, R.C. Hardison, D. Haussler and W. Miller (2003) Human-mouse alignments with Blastz. Genome Res. 13: 103-105.
  • Giardine, B.M., L. Elnitski, C. Riemer, I. Makalowska, S. Schwartz, W. Miller and R.C. Hardison (2003) GALA, a database for genomic sequence alignments and annotations. Genome Res. 13: 732-741.
  • Bulger, M., D. Schubeler, M.A. Bender, J. Hamilton, C.M. Farrell, R.C. Hardison, and M. Groudine (2003) A complex chromatin "landscape" revealed by patterns of nuclease sensitivity and histone modification within the mouse beta-globin locus. Mol. Cell. Biol. 23: 5234-5244.
  • Gibbs, R.A., G.M. Weinstock, many authors including R.C. Hardison … F. Collins (2004) Genome Sequence of the Brown Norway Rat Yields Insights into Mammalian Evolution. Nature 428: 493-521.
  • Yang, S., A.F. Smit, S. Schwartz, F. Chiaromonte, K.M. Roskin, D. Haussler, W. Miller, and R.C. Hardison (2004) Patterns of insertions and their covariation with substitutions in the rat, mouse and human genomes. Genome Res. 14: 517-527.
  • Kolbe, D., J. Taylor, L. Elnitski, P. Eswara, J. Li, W. Miller, R. Hardison, and F. Chiaromonte (2004) Regulatory potential scores from genome-wide three-way alignments of human, mouse and rat. Genome Res. 14: 700-707.
  • Miller, W., K.D. Makova, A. Nekrutenko, and R.C. Hardison (2004) Comparative Genomics. Annual Review of Genomics and Human Genetics 5: 15-56.
  • Welch, J.J., J.A. Watts, C.R. Vakoc, Y. Yao, H. Wang, R.C. Hardison, G.A. Blobel, L.A. Chodosh and M.J. Weiss (2004) Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104: 3136-3147.
  • Feingold, E.A., P.J. Good, many authors including R. Hardison … (2004) The ENCODE (ENCyclopedia of DNA Elements) Project. Science 306: 636-640.
  • Hillier, L.W., W. Miller, E. Birney, W. Warren, R.C. Hardison, many authors … R.K. Wilson (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432: 695-716.
  • King, David C., James Taylor, Laura Elnitski, Francesca Chiaromonte, Webb Miller, and Ross C. Hardison (2005) Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences. Genome Res. 15: 1051-1060.
  • Giardine, Belinda, Cathy Riemer, Ross C. Hardison, Richard Burhans, Laura Elnitski, Prachi Shah, Yi Zhang, Daniel Blankenberg, Istvan Albert, Webb Miller, W. James Kent, and Anton Nekrutenko (2005) Galaxy: A platform for interactive large-scale genome analysis. Genome Res. 15: 1451-1455.
  • Wang, H., Y. Zhang, Y. Cheng, Y. Zhou, D.C. King, J. Taylor, F. Chiaromonte, J. Kasturi, H. Petrykowska, B. Gibb, C. Dorman, W. Miller, L.C. Dore, J. Welch, M.J. Weiss, R.C. Hardison (2006) Experimental Validation of Predicted Mammalian Erythroid Cis-Regulatory Modules. Genome Res. 16: 1480-1492.
  • Taylor, J., S. Tyekucheva, D.C. King, R.C. Hardison, W. Miller and F. Chiaromonte (2006) ESPERR: Learning strong and weak signals in genomic sequence alignments to identify functional elements. Genome Res. 16: 1596-1604.
  • Giardine, B., C. Riemer, T. Hefferon, D.J. Thomas, F. Hsu, multiple authors, J. Kent, W. Miller, R.C. Hardison (2007) PhenCode: Connecting ENCODE Data with Mutations and Phenotype. Human Mutation 28: 554-562.
  • The Rhesus Macaque Genome Sequencing and Analysis Consortium (2007) Evolutionary and biomedical insights from the rhesus macaque genome. Science 316: 222-234.
  • Margulies, E.H., G.M. Cooper, G. Asimenos, D.J. Thomas, many other authors including R.C. Hardison, W. Miller … (2007) Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome. Genome Res. 17: 760-774.
  • King, D.C., J. Taylor, Y. Zhang, Y. Cheng, H.A. Lawson, J. Martin, ENCODE groups for Transcriptional Regulation and Multispecies Alignment, F. Chiaromonte, W. Miller, and R.C. Hardison (2007) Finding cis-regulatory elements using comparative genomics: some lessons from ENCODE data. Genome Res. 17: 775-786.
  • Blankenberg, D., J. Taylor, I. Schenck, J. He, Y. Zhang, M. Ghent, N. Veeraraghavan, I. Albert, W. Miller, K. Makova, R.C. Hardison, and A. Nekrutenko (2007) A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly. Genome Res. 17: 960-964.
  • The ENCODE Project Consortium (2007) The ENCODE pilot project: identification and analysis of functional elements in 1% of the human genome. Nature 447: 799-816.
  • Webb Miller, Kate Rosenbloom, Ross C. Hardison, Minmei Hou, James Taylor, …several authors … Richard A. Gibbs, Eric S. Lander, Adam Siepel, David Haussler, W. James Kent (2007) 28-way vertebrate alignment and conservation track in the UCSC Genome Browser. Genome Res., in press.
