Bioinformatics Seminar: Annotating the Non-Protein Coding Regions of the Human Genome
| When | Mar 27, 2007 from 11:00 am to 12:00 pm |
|---|---|
Speaker
Dr. Deyou Zheng
Yale University
Abstract
Coding exons of the estimated 21,000 protein-coding genes occupy ~2% of the human genome, while the majority of the DNA lies in regions between exons or genes. One interesting component of these vast non-exonic regions is pseudogenes: sequences that show sequence and structural similarity to functional genes but are disabled by various mutations. Identifying them is an important problem in computational genomics, and is also critical for obtaining an accurate picture of the human genome's structure and function. We have developed a computational pipeline for distinguishing pseudogenic sequences from functional genes. Our analyses identified about 20,000 pseudogenes or pseudogene fragments that arose from either retrotransposition or DNA duplication. Comparative genomics demonstrated that a significant fraction of the processed pseudogenes are primate-specific sequences, highlighting the increased retrotransposition activity in primates. Furthermore, an integrated analysis of multiple lines of evidence derived from recent functional genomics data demonstrated that a sizable percentage of human pseudogenes are likely transcriptionally active. Our discovery indicates that pseudogene transcription can contribute to the complexity of the human transcriptome, and suggests that some pseudogenes might be a source of non-coding RNAs.

