University of Michigan
560 MSRB II
1150 W. Medical Center Dr., SPC 5676
48109 Ann Arbor
USA
Project Leader
Prof. Matthias Kretzler
Phone: + 1 734 615 5757
Fax: + 1 734 763 0982
Contact
Project Staff
Assistant Prof. Yuanfang Guan
Co-Investigator
Contact
Institute Presentation
Our translational systems biology research pipeline supports genome-scale data analysis from genes to function. EURenOmics will provide a unique opportunity to analyze prospective, well-phenotyped cohorts of patients with rare renal diseases. Matching biosamples of kidney tissue, blood and urine will continue to be procured using Standard Operating Procedures. A large number of projects initiated by EURenOmics investigators have already generated genome-scale data sets. The European Renal cDNA Bank (ERCB) has provided samples for large-scale analyses by the international renal research community. We have generated expression profiles from micro-dissected human and murine renal tissues, yielding comprehensive genome-wide data sets from the glomerular and tubulo-interstitial compartments across a wide spectrum of rare renal diseases and control samples. We have also generated high-throughput RNA-seq transcriptomics data, metabolomics data and whole-genome SNP data for a subset of these samples. This extensive collection of expression profiles, along with high-quality, well-documented morphometric and longitudinal clinical data, will serve as the basis for our research.
Using standardized and semi-structured workflows involving GenePattern (Fig. 2), we ensure efficient and transparent data pre-processing, quality control and data integration.
Multi-scalar data integration: EURenOmics is specifically designed to enable the generation of multiple layers of genome-wide data in the same individual with kidney disease. In close collaboration with us, our SME bioinformatics partner Genomatix (www.genomatix.com) has established an analysis pipeline capable of combining diverse genome-wide data sets for comprehensive molecular disease definitions. Fig. 3 summarizes the data analysis and mining pipeline. Integrating these data sets across the different genome-scale data domains, referred to here as multi-scalar analysis, has the potential to identify key drivers of human disease. Such integration will move these analyses from the level of pure association to one of causal inference. Systems genetics approaches will be employed to define regulatory dependencies between genotypic risk, molecular profiles and phenotypic parameters.
This complex, interactive data integration demands a web-based, shared data mining portal. The National Center for Integrative Bioinformatics (NCIBI) data portal (http://portal.ncibi.org/gateway/) provides an effective communication and training platform and also meets the data storage requirements. A secure web portal allows investigators involved in the project to store and share data, manuscripts and presentations. Because the portal is a web-based platform for both collaboration and analysis, many of the limitations of locally managed data analysis solutions do not apply. The portal also hosts a suite of bioinformatic analysis tools employed by the ASBC team to serve our distributed research team. The portal structure has proven essential for real-time online assistance through the available web-based communication channels.
To facilitate the use of very rich renal data sets by kidney research teams worldwide, we developed a web-based research interface for analyzing complex, disease-specific gene expression data sets using a predefined analysis algorithm. The Nephroseq system (www.nephroseq.org) is a kidney-specific version of the highly successful cancer-specific data portal ‘Oncomine’ (www.oncomine.org) and has been developed in collaboration with Drs. Rhodes and Chinnaiyan. Nephroseq is a fully automated, web-based systems biology search engine for context-specific mining of renal disease gene expression data (Fig. 4). Nephroseq contains human renal gene expression data sets from nearly 2,000 samples, encompassing over 59 million gene expression measurements across 15 chronic kidney diseases. Nephroseq is freely available to the worldwide academic community through a web-based interface that allows interrogation of the data sets with two distinct tools. First, Nephroseq allows retrieval of disease- and tissue-specific gene expression values for defined molecules of interest. Second, it enables advanced analysis of entire gene expression data sets for a given disease entity by linking to systems biology tools in a predefined, automated manner. Nephroseq computes differential expression using uniform analysis tools; performs meta-analysis of gene expression across studies, including concept analysis using gene set enrichment tools; and displays molecular interactions of co-regulated genes. The automated integration of different external resources into the data warehouse, which runs in the software background (Fig. 4), allows user-friendly data export and analysis covering a broad spectrum of functionalities. Data sources in Nephroseq include those from the public domain (GEO at NCBI and ArrayExpress at EBI), studies generated in the framework of the O’Brien Renal Center, and data sets obtained by request from ongoing or published studies.
Data generated in the UM O’Brien Renal Center are often placed into Nephroseq immediately and no later than 3 years after generation. Nephroseq has been rapidly adopted by kidney researchers.
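As an illustration of the differential expression step described above, the following minimal Python sketch ranks genes by a Welch t-statistic between disease and control samples. It is not Nephroseq's actual implementation: the choice of a Welch t-test, the function names, the gene labels and the expression values are all assumptions made for this example, and the values are assumed to be on a log2 scale.

```python
import math
from statistics import mean, variance


def differential_expression(disease, control):
    """Return (log2 fold change, Welch t-statistic) for one gene.

    Assumes both inputs are lists of log2-scale expression values
    with at least two samples each (hypothetical input format).
    """
    log2_fc = mean(disease) - mean(control)
    # Standard error of the difference, allowing unequal variances (Welch).
    se = math.sqrt(variance(disease) / len(disease)
                   + variance(control) / len(control))
    if se == 0:
        # Degenerate case: no within-group variation at all.
        t_stat = 0.0 if log2_fc == 0 else math.copysign(math.inf, log2_fc)
    else:
        t_stat = log2_fc / se
    return log2_fc, t_stat


def rank_genes(expr_disease, expr_control):
    """Rank genes by absolute t-statistic, largest first."""
    results = []
    for gene, values in expr_disease.items():
        log2_fc, t_stat = differential_expression(values, expr_control[gene])
        results.append((gene, log2_fc, t_stat))
    return sorted(results, key=lambda r: abs(r[2]), reverse=True)


# Made-up example values for two placeholder genes (not real data).
expr_disease = {
    "GENE_A": [8.0, 8.2, 7.9, 8.1],
    "GENE_B": [5.0, 5.2, 4.9, 5.1],
}
expr_control = {
    "GENE_A": [6.0, 6.1, 5.9, 6.2],
    "GENE_B": [5.1, 5.0, 5.2, 4.9],
}

ranked = rank_genes(expr_disease, expr_control)
for gene, log2_fc, t_stat in ranked:
    print(f"{gene}: log2FC={log2_fc:.2f}, t={t_stat:.2f}")
```

A real analysis would also need multiple-testing correction across all genes and platform-specific normalization, both of which this sketch omits.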
Expertise of key personnel
Matthias Kretzler, MD, has more than 14 years of experience in shared, integrated data mining of molecular renal data sets. He conceived and implemented the most comprehensive multi-center, international study of gene expression in renal disease, the European Renal cDNA Bank (ERCB); has led the ASBC over the past five years; and initiated the Nephrotic Syndrome Study Network for integrated systems biology of glomerular diseases. He recruited and trained a research team for kidney disease-focused gene expression data mining. He integrated the kidney-specific research focus into the unique bioinformatics environment at the University of Michigan, drawing on the extensive expertise in bioinformatics database management, genome-wide expression data set analysis and data mining available in CCMB and NCIBI. His systems biology research focuses on the definition of molecular markers and transcriptional networks in renal disease. In addition, he has long-standing expertise in podocyte cell biology. As a faculty member in the Department of Computational Medicine and Bioinformatics, he is responsible for the systems biology segment of the Introduction to Bioinformatics and Computational Biology course in the University of Michigan Bioinformatics doctoral program.
Dr. Kretzler will be responsible for coordinating the diverse aspects of bioinformatics data integration in EURenOmics. He will use his extensive experience integrating bioinformatics, molecular and clinical approaches, which has served more than 70 collaborative research teams, to maintain a standardized data processing pipeline, continuously update the available data mining tools, and ensure that the research tools are well suited to the goals of the EURenOmics network.
Yuanfang Guan, PhD, Assistant Professor, Department of Computational Medicine and Bioinformatics and Internal Medicine/Nephrology, Co-I (1.2 month effort): Dr. Guan’s long-standing research interest is the development of bioinformatics tools for integrating heterogeneous functional genomics data, which can help elucidate the protein functions and molecular mechanisms behind renal diseases. She develops statistical and machine learning tools for inferring protein functions and phenotypes through heterogeneous data integration (Guan et al., 2008, Genome Biology; Guan et al., 2010, PLoS Comp. Biol.; Guan et al., in press, PLoS Comp. Biol.). The software and tools she has developed (mouseNET, mouseMAP, KOMPute) are used intensively by the genetics community.
Since her recruitment to U-M in 2011 with a joint appointment in the Division of Nephrology and the Department of Computational Medicine and Bioinformatics, she has focused on predicting activated pathways in renal disease, modeling expression changes across the disease course, integrating context-specific networks relevant to kidney diseases and predicting isoform functions related to genes causing kidney diseases.
She will use her expertise in the ASBC to adapt her previous tools for genome-wide functional genomic data integration to renal disease-specific data integration and mining. To this end, she has established a locally maintained cluster capable of processing all publicly available functional genomic data (including RNAseq) and the forthcoming renal-specific functional genomic data generated by the network. She will apply the Bayesian network approach developed previously for global and tissue-specific networks to generate renal disease-specific networks. She will employ an algorithm that prioritizes disease-associated genes and pathway components by mining large-scale, heterogeneous data sets in the ASBC for functional prediction and pathway association of novel disease genes. In addition, she will be responsible for functional analysis of transcripts identified through RNAseq and proteomic data, to associate them with their potential roles in renal diseases. Finally, she will be responsible for integrating the above machine learning and statistical tools into user-friendly interfaces in the context of Nephroseq.