Biomolecular Object Network Databank: Difference between revisions

Line 13:
SMID can be queried by entering a protein GI, domain identifier, PDB ID or SMID ID. The results of a search provide small molecule, protein, and domain information for each interaction identified in the database. Interactions with non-biological contacts are normally screened out by default.
SMID-BLAST is a tool developed to annotate known small-molecule binding sites as well as to predict binding sites in proteins whose [[crystal structures]] have not yet been determined. The prediction is based on extrapolation of known interactions, found in the PDB, to interactions between an uncrystallized protein and a small molecule of interest. SMID-BLAST was validated against a test set of known small-molecule interactions from the PDB. It was shown to be an accurate predictor of protein-small molecule interactions: 60% of predicted interactions identically matched the PDB-annotated binding site, and of these 73% had more than 80% of the binding residues of the protein correctly identified. Hogue, C ''et al.'' estimated that 45% of predictions that were not observed in the PDB data do in fact represent true positives.<ref>Snyder, K, ''et al''. Domain-based small molecule binding site annotation. ''BMC Bioinformatics'' 7: 152 (2006).</ref>
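
The two validation percentages above compound. As a rough illustration, assuming the 73% figure applies only to the 60% of predictions that matched the annotated site, the share of all predictions that satisfy both criteria can be worked out as follows. The variable names are illustrative and are not taken from SMID-BLAST itself.

```python
# Validation figures quoted in the text (Snyder et al., 2006).
matched_site_fraction = 0.60         # predictions matching the PDB-annotated binding site
high_residue_fraction = 0.73         # of those, share with >80% of binding residues correct

# Assumption: the second figure is conditional on the first, so the two multiply.
both_criteria_fraction = matched_site_fraction * high_residue_fraction

print(f"Matched site and >80% of residues correct: {both_criteria_fraction:.1%}")
```

Under that reading, a little under half of all predictions both match the annotated site and recover most of its binding residues.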
 
==Biomolecular Interaction Network Database (BIND)==
Line 26:
# The second type of BIND entry is the molecular complex. A molecular complex is defined as an aggregate of molecules that are stable and have a function when bound to each other. The record may also contain some information on the role of the complex in various interactions. A molecular complex entry links data from two or more interaction records.
# The third component of BIND is the pathway record section. A pathway consists of a network of interactions that are involved in the regulation of cellular processes. This section may also contain information on phenotypes and diseases related to the pathway.
<br />The minimum amount of information needed to create an entry in BIND is a [[PubMed]] publication reference and an entry in another database (e.g. [[GenBank]]). Each entry within the database provides references/authors for the data. As BIND is a constantly growing database, all components of BIND track updates and changes.<ref>Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001).</ref>
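
The record types and the minimum-information rule described above can be sketched as plain data classes. This is only an illustrative model, not the ASN.1 specification that BIND actually uses; every class and field name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InteractionRecord:
    """Hypothetical model of a single interaction between two molecules."""
    molecule_a: str
    molecule_b: str
    pubmed_id: str        # publication reference required for every entry
    external_db_id: str   # entry in another database, e.g. GenBank

    def __post_init__(self) -> None:
        # Minimum information rule: a PubMed reference and an external entry.
        if not self.pubmed_id or not self.external_db_id:
            raise ValueError("an entry needs a PubMed reference and an external database entry")


@dataclass
class MolecularComplexRecord:
    """Hypothetical model of a complex linking two or more interaction records."""
    interactions: List[InteractionRecord]

    def __post_init__(self) -> None:
        if len(self.interactions) < 2:
            raise ValueError("a complex links data from two or more interaction records")


@dataclass
class PathwayRecord:
    """Hypothetical model of a pathway: a network of interactions plus annotations."""
    interactions: List[InteractionRecord]
    phenotypes: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)
```

Putting the validation in `__post_init__` means an incomplete record cannot be constructed at all, which mirrors the idea that BIND does not accept an entry without its supporting references.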
 
BIND is based on a data specification written in the Abstract Syntax Notation 1 ([[ASN.1]]) language. ASN.1 is also used by [[NCBI]] when storing data for their [[Entrez]] system, and because of this BIND uses the same standards as NCBI for data representation. The ASN.1 language is preferred because it can be easily translated into other data specification languages (e.g. [[XML]]), can easily handle complex data and can be applied to all biological interactions, not just proteins.<ref>Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001).</ref> Bader and Hogue (2000) have prepared a detailed manuscript on the ASN.1 data specification used by BIND.<ref>Bader, GD, Hogue, CWV. BIND- a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. ''Bioinformatics'' 16(5): 465-477 (2000).</ref>
 
===Data submission and curation===
User submission to the database is encouraged. To contribute to the database, one must submit contact information, a [[PubMed]] identifier and the two molecules that interact. The person who submits a record is its owner. All records are validated before being made public, and BIND is curated for quality assurance. BIND curation has two tracks: high-throughput (HTP) and low-throughput (LTP). HTP records are from papers that have reported more than 40 interaction results from one experimental methodology. HTP curators typically have a [[bioinformatics]] background. They are responsible for the collection and storage of experimental data, and they also create scripts to update BIND based on new publications. LTP records are curated by individuals with either an MSc or PhD and laboratory experience in interaction research. LTP curators are given further training through the [[Canadian Bioinformatics Workshops]]. Information on small molecule chemistry is curated separately by chemists to ensure the curator is knowledgeable about the subject. The priority for BIND curation is to focus on LTP to collect information as it is published. Although HTP studies provide more information at once, there are more LTP studies being reported, and similar numbers of interactions are being reported by both tracks. In 2004, BIND collected data from 110 journals.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref>
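
The submission requirements and the HTP/LTP split described above can be sketched as two small checks. This is a minimal illustration, assuming that "more than 40 interaction results from one experimental methodology" means any single methodology in a paper exceeds 40 results. The function and field names are hypothetical and do not come from BIND's own software.

```python
from typing import Dict, List

HTP_MIN_RESULTS = 40  # threshold quoted in the text: "more than 40 interaction results"

# Hypothetical field names for the items a submitter must provide.
REQUIRED_FIELDS = ("contact_info", "pubmed_id", "molecule_a", "molecule_b")


def missing_submission_fields(submission: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or empty in a submission."""
    return [name for name in REQUIRED_FIELDS if not submission.get(name)]


def curation_track(results_per_methodology: Dict[str, int]) -> str:
    """Classify a paper as 'HTP' or 'LTP' from its per-methodology result counts."""
    if any(count > HTP_MIN_RESULTS for count in results_per_methodology.values()):
        return "HTP"
    return "LTP"


# Usage: a paper reporting 250 results from one method goes to the HTP track.
print(curation_track({"yeast two-hybrid": 250}))
print(missing_submission_fields({"contact_info": "a@example.org", "pubmed_id": "123"}))
```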
 
===Database growth===
BIND has grown significantly since its conception; in fact, the database saw a 10-fold increase in entries between 2003 and 2004. By September 2004, there were over 100,000 interaction records (including 58,266 protein-protein, 4,225 genetic, 874 protein-small molecule, 25,857 protein-DNA, and 19,348 biopolymer interactions). The database also contains sequence information for 31,972 proteins, 4,560 DNA samples and 759 RNA samples. These entries have been collected from 11,649 publications; therefore, the database represents an important amalgamation of data. The organisms with entries in the database include ''[[Saccharomyces cerevisiae]]'', ''[[Drosophila melanogaster]]'', ''[[Homo sapiens]]'', ''[[Mus musculus]]'', ''[[Caenorhabditis elegans]]'', ''[[Helicobacter pylori]]'', ''[[Bos taurus]]'', [[HIV-1]], ''[[Gallus gallus]]'' and ''[[Arabidopsis thaliana]]'', as well as others. In total, 901 [[taxa]] were included by September 2004, and BIND has been split up into BIND-Metazoa, BIND-Fungi, and BIND-Taxroot.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref>
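
As a quick check on the "over 100,000" figure, the per-type counts quoted above can be summed directly. The dictionary below simply restates the numbers from the text; the variable names are illustrative.

```python
# Interaction record counts by type, as quoted for September 2004.
interaction_counts = {
    "protein-protein": 58_266,
    "genetic": 4_225,
    "protein-small molecule": 874,
    "protein-DNA": 25_857,
    "biopolymer": 19_348,
}

total_records = sum(interaction_counts.values())
print(f"Total interaction records: {total_records:,}")  # 108,570

# Share of each interaction type in the total.
for kind, count in interaction_counts.items():
    print(f"{kind}: {count / total_records:.1%}")
```

The listed categories add up to 108,570 records, consistent with the stated total of over 100,000.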
 
Not only is the information contained within the database continually updated, but the software itself has also gone through several revisions. Version 1.0 of BIND was released in 1999; based on user feedback, it was modified to include additional detail on the experimental conditions required for binding and a hierarchical description of the cellular location of the interaction. Version 2.0 was released in 2001 and included the capability to link to information available in other databases.<ref>Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001).</ref> Version 3.0 (2002) expanded the database from physical/biochemical interactions to also include genetic interactions.<ref>Bader, GD, ''et al''. BIND: the Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 31: 248-250 (2003).</ref> Version 3.5 (2004) included a refined user interface that aimed to simplify information retrieval.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref> In 2006, BIND was incorporated into the Biomolecular Object Network Database (BOND), where it continues to be updated and improved.
 
===Special features===
BIND was the first database of its kind to contain information on biomolecular interactions, reactions and pathways in one schema. It is also the first to base its [[ontology]] on chemistry, which allows 3D representation of molecular interactions. The underlying chemistry allows molecular interactions to be described down to the atomic level of resolution.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref>
 
PreBIND is an associated data-mining system for locating biomolecular interaction information in the scientific literature. The name or [[accession number]] of a protein can be entered, and PreBIND will scan the literature and return a list of potentially interacting proteins. BIND [[BLAST]] is also available to find interactions with proteins that are similar to the one specified in the query.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref>
 
BIND offers several features that many other proteomics databases do not include. The authors of this program have created an extension to traditional [[IUPAC]] nomenclature to help describe [[post-translational modifications]] that occur to amino acids. These modifications include [[acetylation]], [[formylation]], [[methylation]], [[palmitoylation]], etc. The extension of the traditional IUPAC codes allows these amino acids to be represented in sequence form as well.<ref>Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001).</ref> BIND also utilizes a unique visualization tool known as [[OntoGlyphs]]. The OntoGlyphs were developed based on [[Gene Ontology]] (GO) and provide a link back to the original GO information. A number of GO terms have been grouped into categories, each one representing a specific function, binding specificity, or localization in the cell. There are 83 OntoGlyph characters in total. There are 34 functional OntoGlyphs, which contain information about the role of the molecule (e.g. cell physiology, ion transport, signaling). There are 25 binding OntoGlyphs, which describe what the molecule binds (e.g. ligands, DNA, ions). The other 24 OntoGlyphs provide information about the location of the molecule within a cell (e.g. nucleus, cytoskeleton). The OntoGlyphs can be selected and manipulated to include or exclude certain characteristics from search results. The visual nature of the OntoGlyphs also facilitates pattern recognition when looking at search results.<ref>Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).</ref> [[ProteoGlyphs]] are graphical representations of the structural and binding properties of proteins at the level of conserved domains. The protein is diagrammed as a straight horizontal line, and glyphs are inserted to represent conserved domains. Each glyph is displayed to represent the relative position and length of its alignment in the protein sequence.
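
The include/exclude behaviour of OntoGlyph selection described above can be illustrated with simple set operations. This is a hedged sketch and not BIND's actual search implementation: the record layout, the `filter_by_glyphs` function and the sample data are hypothetical, and the glyph labels are taken from the examples in the text.

```python
from typing import Dict, FrozenSet, Iterable, List

# OntoGlyph counts per category, as stated in the text.
ONTOGLYPH_COUNTS = {"functional": 34, "binding": 25, "location": 24}
assert sum(ONTOGLYPH_COUNTS.values()) == 83  # matches the stated total


def filter_by_glyphs(
    records: Iterable[Dict],
    include: FrozenSet[str] = frozenset(),
    exclude: FrozenSet[str] = frozenset(),
) -> List[Dict]:
    """Keep records carrying every glyph in `include` and none in `exclude`."""
    return [
        record
        for record in records
        if include <= record["glyphs"] and not (exclude & record["glyphs"])
    ]


# Hypothetical search results, each tagged with a set of glyph labels.
results = [
    {"name": "protein A", "glyphs": frozenset({"ion transport", "ions", "nucleus"})},
    {"name": "protein B", "glyphs": frozenset({"signaling", "DNA", "nucleus"})},
    {"name": "protein C", "glyphs": frozenset({"signaling", "ligands", "cytoskeleton"})},
]

selected = filter_by_glyphs(
    results,
    include=frozenset({"signaling"}),
    exclude=frozenset({"cytoskeleton"}),
)
print([record["name"] for record in selected])  # ['protein B']
```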
 
===Accessing the database===
Line 51:
 
==User statistics==
The number of Unleashed Registrants has increased 10-fold since the integration of BIND. As of December 2006, registration fell just short of 10,000. Subscribers to the commercial versions of BOND fall into six general categories: [[agriculture]] and [[food]], [[biotechnology]], [[pharmaceuticals]], [[informatics]], [[materials]] and other. The biotechnology sector is the largest of these groups, holding 28% of subscriptions. Pharmaceuticals and informatics follow with 22% and 18% respectively. The [[United States]] holds the bulk of these subscriptions, at 69%. Other countries with access to the commercial versions of BOND include [[Canada]], the [[United Kingdom]], [[Japan]], [[China]], [[Korea]], [[Germany]], [[France]], [[India]] and [[Australia]]. All of these countries fall below 6% in user share.<ref>[http://bond.unleashedinformatics.com BOND at Unleashed Informatics]</ref>
==References==
<references />
 
 
3. Snyder, K, ''et al''. Domain-based small molecule binding site annotation. ''BMC Bioinformatics'' 7: 152 (2006).
 
4. Bader, GD, ''et al.'' BIND- The Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 29: 242-245 (2001).
 
5. Bader, GD, Hogue, CWV. BIND- a data specification for storing and describing biomolecular interactions, molecular complexes and pathways. ''Bioinformatics'' 16(5): 465-477 (2000).
 
6. Alfarano, C, ''et al.'' The Biomolecular Interaction Network Database and related tools 2005 update. ''Nucleic Acids Research'' 33: D418-D424 (2005).
 
7. Bader, GD, ''et al''. BIND: the Biomolecular Interaction Network Database. ''Nucleic Acids Research'' 31: 248-250 (2003).
 
[[Category: Bioinformatics databases]]