Biomedical Computing: Generating, Analyzing, and Visualizing Complex High-Dimensional Data

Abella J, Antunes D, Devaurs D, Kavraki L, Kim S, Moll M, Novinskaya A

Department of Computer Science, Rice University, Houston, Texas

Conformational Analysis of Proteins

Molecular docking: generating conformations of a protein-ligand complex

Generating large datasets of protein conformations: for a large protein (1,500 amino acids, modeled using 3,000 degrees of freedom), 10,000,000 conformations amount to roughly 100 GB of data
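The order of magnitude of this figure can be checked with a short calculation. The sketch below assumes each degree of freedom is stored as a 4-byte single-precision value in a dense array; the poster does not state the encoding, so `bytes_per_value` and the function name are illustrative assumptions.

```python
def dataset_size_bytes(n_conformations: int, n_dof: int, bytes_per_value: int = 4) -> int:
    """Raw size of a conformation dataset stored as a dense array.

    bytes_per_value=4 is an assumption (single-precision floats).
    """
    return n_conformations * n_dof * bytes_per_value


# Figures from the poster: 10,000,000 conformations, 3,000 degrees of freedom.
size = dataset_size_bytes(10_000_000, 3_000)
print(f"{size / 1e9:.0f} GB")
```

Under that assumption the raw array is on the order of 100 GB, consistent with the figure quoted above; double precision would double it.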

Iteratively grow ligand in binding site:

Analysis of generated conformations based on a dimensionality reduction method:

C3: low-energy conformational path

N.B.: Accounting for protein flexibility considerably increases the complexity of the problem

Analyzing complex high-dimensional data using dimensionality-reduction methods:
- projection based on flexibility analysis
- Principal Component Analysis (PCA)
- Isomap (non-linear method)
- ...

PCA
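As an illustration of the PCA entry in the list above, here is a minimal textbook sketch that finds the leading principal component by power iteration on the sample covariance matrix. It is not the lab's implementation: the function names and the toy "conformations" are hypothetical, and a real dataset of this size would call for an optimized linear-algebra library.

```python
import math
import random


def first_principal_component(data, n_iter=200, seed=0):
    """Return (mean, unit direction) of the leading principal component.

    `data` is a list of equal-length rows, one row per conformation.
    """
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(data, mean, direction):
    """One-dimensional coordinate of each row along `direction`."""
    return [sum((row[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for row in data]


# Hypothetical toy data: 3 degrees of freedom, all variation along (1, 2, 0).
conformations = [[t, 2.0 * t, 0.0] for t in range(-5, 6)]
mean, direction = first_principal_component(conformations)
coords = project(conformations, mean, direction)
```

The recovered direction is parallel to (1, 2, 0) up to sign, so a single coordinate describes the toy dataset; this is the sense in which PCA reduces dimensionality when most of the variance lies in a few linear directions.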

flexibility analysis

Analysis of Biomolecular Interactions

Exploiting parallelism on high-performance systems:

Isomap

Functional Annotation of Proteins

Analysis of Metabolic Pathways

Semi-supervised learning framework to predict functional annotations from subtle structural variations within a (super)family of proteins

Input: specialized databases for metabolic data
- KEGG (17,000 compounds, 10,000 reactions, 4,000 organisms)
- MetaCyc (12,000 compounds, 13,000 reactions, 3,000 organisms)

one clustering (out of 3,000) of the kinase structure dataset (2,000 structures)

kinase ATP-binding site: 27 residues; 3,000 position subsets (triplets of residues); each position subset yields one clustering
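The count of position subsets can be reproduced directly, assuming the subsets are all unordered triplets drawn from the 27 binding-site residues (the poster suggests this but does not spell it out). The residue indices below are placeholders, not actual residue numbers.

```python
from itertools import combinations
from math import comb

N_RESIDUES = 27  # binding-site size quoted on the poster

# Number of unordered triplets of residues.
n_triplets = comb(N_RESIDUES, 3)

# Enumerate the triplets themselves (placeholder indices 0..26).
triplets = list(combinations(range(N_RESIDUES), 3))

print(n_triplets)
```

Under this assumption the exact count is 2,925, which the poster rounds to 3,000.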

binding affinity predicted with SVM

Lysine: pathways clustered using agglomerative hierarchical clustering
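A minimal sketch of agglomerative hierarchical clustering applied to pathways follows. The poster does not say how pathways are compared, so everything specific here is an assumption: each pathway is represented as a set of reaction identifiers, dissimilarity is the Jaccard distance, and clusters are merged with average linkage. The reaction IDs are made up for illustration.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |intersection| / |union|; assumes both sets are non-empty."""
    return 1.0 - len(a & b) / len(a | b)


def agglomerative(items, distance, n_clusters):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per item and repeatedly merges the closest pair
    until `n_clusters` remain. Returns a list of lists of item indices.
    """
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(distance(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical pathways, each given as a set of made-up reaction IDs.
pathways = [
    {"r1", "r2", "r3"},
    {"r1", "r2", "r4"},
    {"r7", "r8"},
    {"r7", "r8", "r9"},
]
groups = agglomerative(pathways, jaccard_distance, n_clusters=2)
```

Pathways that share most of their reactions end up in the same group. Stopping at a fixed number of clusters is one design choice; recording the merge order instead would give the full dendrogram.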

E. coli metabolic network

Acknowledgment

The Kavraki Lab is supported in part by NSF NRI grant #1317849, NSF ABI grant #1262491, NSF ExCAPE grant #1139011, NSF AF grant #1423304, and NSF ABI grant #0960612.