Introduction
The goal of this project is to identify and distinguish biomolecules,
and develop clustering methods based on certain signatures or fingerprints.
The fingerprints are computed from both shapes and properties such as electrostatic potentials.
The MolFinger database stores MACTs as unique metadata for each of 494
protein chains. The MACTs support queries for cluster-based similarity.
Given a choice of norm, a similarity metric can be defined for molecular
properties such as electron density and electrostatic potential. We use
MOBIOS (Molecular Biological Information System)
(http://www.cs.utexas.edu/users/mobios/) for our MolFinger database. MOBIOS
uses metric space indexing techniques and provides a database query language.
The MolFinger database can be searched with two inputs: a PDB ID and a
distance value. The database takes the distance value and computes
similarity against the stored chains. By using metric space indexing, this
method reduces the number of iterations from O(n^2) to O(n*log n). MOBIOS
also supports range queries, which extract from the 494 protein chains those
that fall within a given similarity range. Similarity ranges from 1 to 0,
where a value of 0 indicates that the molecules are dissimilar. The database,
however, indexes by distance, where a value of 0 represents the highest
similarity (i.e. a similarity value of 1).
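The relationship between similarity and distance, and the idea behind a range query, can be sketched as follows. This is a minimal illustration and not the MOBIOS implementation: the conversion distance = 1 - similarity, the toy fingerprint vectors, the chain names, and all function names are assumptions introduced here. A single pivot and the triangle inequality are used to skip candidates that cannot lie within the query radius, which is the basic principle that metric space indexes build on.

```python
import math


def similarity_to_distance(similarity):
    """Map a similarity in [0, 1] to a distance in [0, 1] (assumed convention)."""
    return 1.0 - similarity


def euclidean(a, b):
    """Euclidean distance between two equal-length fingerprint vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_pivot_index(fingerprints, pivot_id, dist=euclidean):
    """Precompute the distance from every stored entry to one pivot entry."""
    pivot = fingerprints[pivot_id]
    return {cid: dist(fp, pivot) for cid, fp in fingerprints.items()}


def range_query(fingerprints, index, query_id, radius, dist=euclidean):
    """Return (sorted hits, number of full distance evaluations)."""
    query = fingerprints[query_id]
    d_query_pivot = index[query_id]
    hits, evaluated = [], 0
    for cid, fp in fingerprints.items():
        # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x).
        # If this lower bound already exceeds the radius, skip the entry.
        if abs(d_query_pivot - index[cid]) > radius:
            continue
        evaluated += 1
        if dist(query, fp) <= radius:
            hits.append(cid)
    return sorted(hits), evaluated


# Hypothetical two-dimensional fingerprints for four chains.
fingerprints = {
    "chainA": (0.0, 0.0),
    "chainB": (0.1, 0.0),
    "chainC": (0.9, 0.9),
    "chainD": (0.05, 0.05),
}
index = build_pivot_index(fingerprints, "chainA")
radius = similarity_to_distance(0.85)
hits, evaluated = range_query(fingerprints, index, "chainB", radius)
print(hits, evaluated)
```

In this toy data set chainC is rejected using only the precomputed pivot distances, so its full distance to the query is never evaluated. A real metric index organizes many such pivots hierarchically instead of scanning every entry.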
We work with data from the PDB, mainly proteins and nucleic acids.
Two different representations of biomolecules are used:
(i) the Flexible Chain Complex (FCC) and (ii) the blobby model. The FCC contains
bone-level structures and the blobby model contains blurred structures. Since
the FCC contains too much information, a reduced FCC representation is useful
for defining bone-level signatures.
The specific steps in the project are to
(i) compute various signatures or fingerprints of biomolecules, mainly proteins and nucleic acids,
(ii) define a distance (similarity) metric based on the fingerprints (metadata),
(iii) cluster a large number of biomolecules,
(iv) compare clustering methods (CATH, DALI, SCOP, CE, Pfam, etc.), and
(v) use combinations of fingerprints for improved clustering.
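Steps (ii), (iii), and (v) above can be sketched in a few lines. This is only an illustration under stated assumptions: the fingerprints are made-up numeric vectors, the combined distance is a weighted sum of per-fingerprint Euclidean distances, and the clustering is plain single-linkage with a distance threshold. None of these choices, nor the function names, come from the project itself.

```python
import math
from itertools import combinations


def fingerprint_distance(a, b):
    """Euclidean distance between two fingerprint vectors (step ii)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def combined_distance(fps_a, fps_b, weights):
    """Weighted sum of per-fingerprint distances (step v, assumed form)."""
    return sum(
        w * fingerprint_distance(fa, fb)
        for w, fa, fb in zip(weights, fps_a, fps_b)
    )


def threshold_clusters(items, dist, threshold):
    """Single-linkage clustering: link every pair within `threshold` (step iii)."""
    parent = {key: key for key in items}

    def find(key):
        # Union-find lookup with path halving.
        while parent[key] != key:
            parent[key] = parent[parent[key]]
            key = parent[key]
        return key

    for a, b in combinations(items, 2):
        if dist(items[a], items[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for key in items:
        groups.setdefault(find(key), []).append(key)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical molecules, each with two fingerprints of different lengths.
molecules = {
    "m1": ((1.0, 2.0), (0.0,)),
    "m2": ((1.1, 2.0), (0.0,)),
    "m3": ((5.0, 5.0), (3.0,)),
}
weights = (0.5, 0.5)
clusters = threshold_clusters(
    molecules, lambda a, b: combined_distance(a, b, weights), threshold=0.5
)
print(clusters)
```

The loop over all pairs is the brute-force O(n^2) baseline. It is adequate for a sketch, but it is exactly the cost that an index over the distance metric is meant to avoid for large collections.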
The geometric, topological, and combinatorial properties of a biomolecule define
its fingerprints under a volumetric representation.
For example, the distributions of area, volume, and gradient integral
over isosurfaces characterize its geometric properties.
The contour tree and Betti numbers provide both a topological and a combinatorial
characterization. Fingerprints based on these properties define distance
metrics, which are used for clustering biomolecules. We build a database that
stores protein metadata and use it to support queries for clustering based on
similarity of the fingerprints.
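As a small illustration of a topological fingerprint, the sketch below counts connected components (the zeroth Betti number) of the region where a sampled volume is at or above an isovalue, and records that count over several isovalues. It is a simplified stand-in and not the contour tree computation used in the project: the voxel grid, the 6-connectivity rule, the sample values, and the function names are all assumptions.

```python
from collections import deque

# Face neighbours of a voxel (6-connectivity, an assumed choice).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def betti0(volume, isovalue):
    """Count 6-connected components of voxels with value >= isovalue."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    components = 0
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if volume[z][y][x] < isovalue or (z, y, x) in seen:
                    continue
                # New component found: flood-fill it with a breadth-first search.
                components += 1
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBOURS:
                        n = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= n[0] < nz and 0 <= n[1] < ny and 0 <= n[2] < nx
                        if inside and n not in seen and volume[n[0]][n[1]][n[2]] >= isovalue:
                            seen.add(n)
                            queue.append(n)
    return components


def betti0_signature(volume, isovalues):
    """Component counts over a list of isovalues, usable as a fingerprint vector."""
    return [betti0(volume, v) for v in isovalues]


# Hypothetical 1 x 1 x 6 density samples.
volume = [[[0.9, 0.5, 0.1, 0.6, 0.2, 0.9]]]
print(betti0_signature(volume, [0.05, 0.4, 0.8, 1.0]))
```

Two molecules sampled on comparable grids could then be compared by applying any vector norm to the difference of their signatures.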
We develop accelerated visualization techniques for each biomolecular
representation (FCC and blobby models) using programmable graphics hardware.
Millions of atoms, bonds, cylinders, and helices are rendered at interactive rates.
- PDB -> FCC
(description)
Structure and Skeletal Graph Representation (Atomic level approach)
- PDB -> Rawiv, RawV
Volume Representation (Blobby level approach)
In this model, a molecule is represented as an electron density map. We can
control the feature resolution of the model through a blobbyness parameter.
A level set of the density map approximates the molecular
surface. The geometry (e.g. contour spectrum) and topology (e.g. contour tree,
Morse graph, ...) of the density map are useful for capturing and comparing the
structures of molecules.
- PDB -> Raw, Rawn, Rawc, and Rawnc
Surface Representation
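For the volume (blobby) representation above, the role of the blobbyness parameter can be illustrated with a sum of Gaussian kernels centred on the atoms. The text does not state which kernel is used, so the form exp(B * (d^2 / r^2 - 1)) with a negative blobbyness B is an assumption here, as are the atom coordinates, radii, and function names. With this form each isolated atom contributes exactly 1 at its own radius, so the level set at isovalue 1 approximates the surface.

```python
import math


def blobby_density(point, atoms, blobbyness=-2.0):
    """Sum of Gaussian kernels; `atoms` holds (x, y, z, radius) tuples."""
    total = 0.0
    for cx, cy, cz, radius in atoms:
        d2 = (point[0] - cx) ** 2 + (point[1] - cy) ** 2 + (point[2] - cz) ** 2
        # Assumed kernel: equals 1 when the point lies at distance `radius`.
        total += math.exp(blobbyness * (d2 / radius ** 2 - 1.0))
    return total


def sample_density(atoms, origin, spacing, shape, blobbyness=-2.0):
    """Sample the density on a regular grid, indexed as volume[z][y][x]."""
    nz, ny, nx = shape
    return [
        [
            [
                blobby_density(
                    (origin[0] + x * spacing, origin[1] + y * spacing, origin[2] + z * spacing),
                    atoms,
                    blobbyness,
                )
                for x in range(nx)
            ]
            for y in range(ny)
        ]
        for z in range(nz)
    ]


# One hypothetical atom of radius 1.5 at the origin.
atoms = [(0.0, 0.0, 0.0, 1.5)]
print(blobby_density((1.5, 0.0, 0.0), atoms))        # on the atom's radius
print(blobby_density((3.0, 0.0, 0.0), atoms, -2.0))  # sharper falloff
print(blobby_density((3.0, 0.0, 0.0), atoms, -0.5))  # blurrier falloff
```

A blobbyness value closer to zero makes the density decay more slowly away from each atom, which merges neighbouring atoms into a smoother, lower-resolution shape.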
Additional information about the Molecular Signature Database.