Biomedical Computing: Generating, Analyzing, and Visualizing Complex High-Dimensional Data
Abella J, Antunes D, Devaurs D, Kavraki L, Kim S, Moll M, Novinskaya A
Department of Computer Science, Rice University, Houston, Texas
Conformational Analysis of Proteins
Molecular docking: generating conformations of a protein-ligand complex
Generating large datasets of protein conformations: for a large protein (containing 1,500 amino acids and modeled using 3,000 degrees of freedom), 10,000,000 conformations represent 100 GB of data
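The storage figure above can be checked with a short back-of-the-envelope calculation. This is only a sketch: the poster does not state how conformations are encoded, so the one-value-per-degree-of-freedom layout and the byte widths below are assumptions, and `dataset_size_bytes` is a hypothetical helper.

```python
def dataset_size_bytes(n_conformations: int, n_dof: int, bytes_per_value: int) -> int:
    """Raw size of a dense array holding one value per degree of freedom."""
    return n_conformations * n_dof * bytes_per_value

n_conformations = 10_000_000  # from the poster
n_dof = 3_000                 # from the poster

# Assumed encodings: single precision (4 bytes) or double precision (8 bytes).
size_f32 = dataset_size_bytes(n_conformations, n_dof, 4)
size_f64 = dataset_size_bytes(n_conformations, n_dof, 8)

print(f"float32: {size_f32 / 1e9:.0f} GB")  # float32: 120 GB
print(f"float64: {size_f64 / 1e9:.0f} GB")  # float64: 240 GB
```

Under the single-precision assumption the result is of the same order as the 100 GB quoted above.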
Iteratively grow ligand in binding site:
Analysis of generated conformations based on a dimensionality reduction method:
C3: low-energy conformational path
N.B.: Accounting for protein flexibility considerably increases the complexity of the problem.
Analyzing complex high-dimensional data using dimensionality-reduction methods:
- projection based on flexibility analysis
- Principal Component Analysis (PCA)
- Isomap (non-linear method)
- ...
Figure panels: PCA; flexibility analysis
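As an illustration of the PCA option listed above, the following minimal sketch projects a set of conformations onto their first two principal components using NumPy's SVD. The data are synthetic stand-ins, and `pca_project` is a hypothetical helper name, not code from the poster.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projected = centered @ components.T
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]

# Synthetic stand-in (assumption): 200 conformations with 30 degrees of
# freedom whose variance is concentrated in two latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2)) * np.array([5.0, 2.0])
mixing = rng.normal(size=(2, 30))
conformations = latent @ mixing + 0.01 * rng.normal(size=(200, 30))

projected, explained = pca_project(conformations, n_components=2)
print(projected.shape)  # (200, 2)
print(explained.sum())  # fraction of variance kept by the two components
```

PCA is a linear method, so it would not unfold curved conformational manifolds; that is the case the non-linear Isomap option in the list addresses.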
Analysis of Biomolecular Interactions
Exploiting parallelism on high-performance systems:
Isomap
Functional Annotation of Proteins
Analysis of Metabolic Pathways
Semi-supervised learning framework to predict functional annotations from subtle structural variations within a (super)family of proteins
Input: specialized databases for metabolic data
- KEGG (17,000 compounds, 10,000 reactions, 4,000 organisms)
- MetaCyc (12,000 compounds, 13,000 reactions, 3,000 organisms)
one clustering (out of 3,000) of the kinase structure dataset (2,000 structures)
ATP-binding site of kinases: 27 residues; 3,000 position subsets (triplets of residues); each position subset leads to a clustering
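The count of position subsets can be checked directly. This sketch assumes the subsets are all unordered triplets drawn from the 27 binding-site residues, which the text suggests but does not state outright. The residue indices are placeholders.

```python
from itertools import combinations
from math import comb

n_residues = 27  # binding-site residues, from the poster

# Number of unordered triplets of residues: C(27, 3).
n_triplets = comb(n_residues, 3)
print(n_triplets)  # 2925

# Enumerating the triplets explicitly gives the same count; each one
# would define the residue positions used for one clustering.
triplets = list(combinations(range(n_residues), 3))
print(len(triplets) == n_triplets)  # True
```

Under that assumption the exact count is 2,925, consistent with the rounded figure of 3,000.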
binding affinity predicted with SVM
Lysine: pathways clustered using agglomerative hierarchical clustering
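Agglomerative hierarchical clustering of pathways can be sketched as follows. The poster does not specify how pathways are represented, which distance is used, or which linkage is applied, so this sketch assumes pathways are sets of reaction identifiers, compared with Jaccard distance and merged with average linkage. The reaction IDs are invented for illustration.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of reaction identifiers."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def agglomerative(items, distance, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(items))]

    def linkage(c1, c2):
        total = sum(distance(items[i], items[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # combinations yields index pairs with a < b, so deleting b is safe.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical pathways, each a set of made-up reaction IDs.
pathways = [
    {"r1", "r2", "r3", "r4"},
    {"r1", "r2", "r3", "r5"},
    {"r6", "r7", "r8"},
    {"r6", "r7", "r9"},
    {"r1", "r2", "r4", "r5"},
]

clusters = agglomerative(pathways, jaccard_distance, n_clusters=2)
print(sorted(sorted(c) for c in clusters))  # [[0, 1, 4], [2, 3]]
```

Pathways that share most of their reactions end up in the same cluster. Stopping at a distance threshold instead of a fixed cluster count would be an equally reasonable choice.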
E. coli metabolic network
Acknowledgment
The Kavraki Lab is supported in part by NSF NRI grant #1317849, NSF ABI grant #1262491, NSF ExCAPE grant #1139011, NSF AF grant #1423304, and NSF ABI grant #0960612.