Bioinformatics Seminar: Computational Phenotype Prediction by Querying Public Microarray Data Repositories

What
  • Bioinformatics Seminar
  • talk
When Mar 20, 2007
from 11:00 am to 12:00 pm
Where LT-415

Speaker

Jianjun Hu
Postdoc Research Fellow
Molecular and Computational Biology Section
University of Southern California

Abstract

One fundamental goal of genomics research is to elucidate the molecular underpinnings of complex phenotypic traits such as human diseases and to develop novel methods for diagnosing diseases. High-throughput genomic databases, gene expression microarray data repositories, and biomedical knowledge bases are currently growing rapidly. However, few methods are available to effectively integrate these sources of information across different biological scales for the purposes of phenotype prediction and genotype-phenotype relationship discovery.

In this talk, we present a framework that exploits the large-scale National Center for Biotechnology Information Gene Expression Omnibus (NCBI GEO) microarray dataset repository and the Unified Medical Language System (UMLS) biomedical ontology to predict phenotypes from gene expression profiles of unknown traits or diseases. For each reference dataset in NCBI GEO, we extract descriptive keywords from the MeSH headings of its related publication and from its GEO dataset summary, and map them to UMLS concepts. The phenotype of a dataset is thus encoded as a list of UMLS concepts. We then define a cellular-state-switch signature as the list of genes ranked by their expression change between two conditions or states. Given a query gene expression dataset, we search the NCBI GEO microarray datasets by signature similarity, calculating ordered-list similarity scores between the cellular-state-switch signatures of two profile pairs. We then use kernel logistic regression to predict the phenotypes of the query dataset in terms of UMLS concepts; this approach has the advantage of considering the relationships among all the reference datasets. To link phenotypic concepts to their underlying genes, we propose an order-statistics-based ranking algorithm to screen for highly differentially expressed genes related to a given UMLS concept. Our experiments on a selected set of GEO datasets demonstrate the utility of public high-throughput biological datasets for phenotype prediction and for elucidating genotype-phenotype relationships.
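
As a rough illustration of the signature and similarity steps described in the abstract, the sketch below ranks genes by their mean expression change between two conditions and then compares two such rankings. The abstract does not specify the ordered-list similarity score used in the talk, so Spearman rank correlation is substituted here as a stand-in. The function names, the dictionary input format, and the toy gene names are all hypothetical.

```python
from statistics import mean


def state_switch_signature(expr_a, expr_b):
    """Rank genes by mean expression change from condition A to condition B.

    expr_a, expr_b: dicts mapping gene name -> list of expression values
    (hypothetical input format). Returns gene names ordered from the
    largest increase to the largest decrease; ties break alphabetically.
    """
    change = {g: mean(expr_b[g]) - mean(expr_a[g]) for g in expr_a if g in expr_b}
    return sorted(change, key=lambda g: (-change[g], g))


def spearman_similarity(sig_x, sig_y):
    """Spearman rank correlation over the genes shared by two signatures.

    This is an assumed stand-in for the ordered-list similarity score,
    not the score used in the talk.
    """
    shared = set(sig_x) & set(sig_y)
    xs = [g for g in sig_x if g in shared]
    ys = [g for g in sig_y if g in shared]
    n = len(xs)
    if n < 2:
        return 0.0  # too few shared genes to compare orderings
    rank_y = {g: i for i, g in enumerate(ys)}
    d2 = sum((i - rank_y[g]) ** 2 for i, g in enumerate(xs))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy data: three genes measured twice in each of two conditions.
query_a = {"G1": [1.0, 1.2], "G2": [5.0, 5.2], "G3": [3.0, 3.1]}
query_b = {"G1": [4.0, 4.2], "G2": [2.0, 2.1], "G3": [3.0, 3.2]}

query_sig = state_switch_signature(query_a, query_b)
reference_sig = ["G2", "G3", "G1"]  # a made-up reference signature
print(query_sig, spearman_similarity(query_sig, reference_sig))
```

In a retrieval setting, the query signature would be scored against the signature of every reference dataset, and the resulting similarity values could then serve as inputs to a downstream classifier such as the kernel logistic regression mentioned in the abstract.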