m →Biomolecular Interaction Network Database (BIND): Copyedit. Correct caps in section header.
→References: Category change
(50 intermediate revisions by 33 users not shown)
Line 1:
The '''Biomolecular Object Network Databank''' is a [[bioinformatics]] [[databank]] containing information on [[small molecule]] structures and interactions. The databank integrates a number of existing databases to provide a comprehensive overview of the information currently available for a given molecule.
==Background==
{{Infobox
The [[Blueprint Initiative]] started as a research program in the lab of [[Dr. Christopher Hogue]] at the [[Samuel Lunenfeld Research Institute]] at [[Mount Sinai Hospital in Toronto]]. On December 14, 2005, [[Unleashed Informatics Limited]] acquired the commercial rights to the Blueprint Initiative's [[intellectual property]]. This included rights to the protein interaction database BIND, the small molecule interaction database SMID, and the data warehouse SeqHound. Unleashed Informatics is a data management service provider and oversees the management and curation of the Blueprint Initiative under the guidance of Dr. Hogue[1].
==Biomolecular Object Network Databank (BOND)==
The Biomolecular Object Network Databank integrates the original Blueprint Initiative databases as well as other databases, such as [[GenBank]], and combines them with many of the tools required to analyze these data. Annotation links for sequences, including [[taxon identifiers]], [[redundant sequences]], [[Gene Ontology]] descriptions, [[Online Mendelian Inheritance in Man]] identifiers, [[conserved domains]], database cross-references, [[LocusLink Identifiers]] and complete genomes, are also available. BOND facilitates cross-database queries and is an [[open access]] resource which integrates interaction and sequence data[2].
==Construction==
==Small Molecule Interaction Database (SMID)==
The Small Molecule Interaction Database is a database containing protein domain-small molecule interactions. It uses a [[domain-based approach]] to identify [[domain families]], found in the [[Conserved Domain Database (CDD)]], which interact with a [[query]] small molecule. The CDD from [[NCBI]] amalgamates data from several different sources: [[Protein FAMilies (PFAM)]], [[Simple Modular Architecture Research Tool (SMART)]], [[Cluster of Orthologous Genes (COGs)]], and NCBI's own curated sequences. The data in SMID are derived from the Protein Data Bank (PDB), a database of known protein crystal structures.
SMID can be queried by entering a [[protein GI]], [[domain identifier]], [[PDB ID]] or [[SMID ID]]. The results of a search provide small molecule, protein, and domain information for each interaction identified in the database. Interactions with non-biological contacts are screened out by default.
SMID-BLAST is a tool developed to annotate known small-molecule binding sites as well as to predict binding sites in proteins whose [[crystal structures]] have not yet been determined. The prediction is based on extrapolation of known interactions, found in the PDB, to interactions between an uncrystallized protein and a small molecule of interest. SMID-BLAST was validated against a test set of known small molecule interactions from the PDB. It was shown to be an accurate predictor of protein-small molecule interactions: 60% of predicted interactions identically matched the PDB-annotated binding site, and of these, 73% had greater than 80% of the binding residues of the protein correctly identified. Hogue et al. estimated that 45% of predictions that were not observed in the PDB data do in fact represent true positives.
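The validation percentages above are nested (73% is a share of the 60%), so it can help to express them as shares of all predictions. The following is a minimal arithmetic sketch, not part of SMID-BLAST itself. The function name is hypothetical, and the second figure rests on an assumption that the source does not state: that the predictions "not observed in the PDB" are exactly the 40% that did not match a PDB-annotated site.

```python
# Figures quoted in the paragraph above, expressed as fractions.
SITE_MATCH = 0.60           # predictions identical to the PDB-annotated binding site
RESIDUE_RECALL_HIGH = 0.73  # of those, share with >80% of binding residues correct
UNOBSERVED_TRUE = 0.45      # estimated true positives among predictions not seen in the PDB


def combined_fractions(site_match, residue_high, unobserved_true):
    """Return two shares of all predictions: (matched with >80% residues, true but unobserved)."""
    matched_and_high = site_match * residue_high
    # ASSUMPTION (not stated in the source): the unobserved predictions are
    # exactly the ones that did not match a PDB-annotated binding site.
    true_but_unobserved = (1.0 - site_match) * unobserved_true
    return matched_and_high, true_but_unobserved


matched_and_high, true_but_unobserved = combined_fractions(
    SITE_MATCH, RESIDUE_RECALL_HIGH, UNOBSERVED_TRUE
)
print(f"matched site with >80% of residues: {matched_and_high:.1%}")
print(f"estimated true but unobserved:      {true_but_unobserved:.1%}")
```

Under that assumption, roughly 44% of all predictions both match the annotated site and recover more than 80% of its residues, and a further 18% may be correct predictions that the PDB does not yet document.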
==Biomolecular Interaction Network Database (BIND)==
===Introduction===
The idea of a database to document all known molecular interactions was originally put forth by [[Anthony Pawson|Tony Pawson]] in the
The major goals of the BIND project are: to create a public proteomics resource that is available to all; to create a platform to enable [[datamining]] from other sources (PreBIND); and to create a platform capable of presenting visualizations of complex molecular interactions. From the beginning, BIND has been [[Open access (publishing)|open access]], and its software can be freely distributed and modified. Currently, BIND includes a data specification, a database and associated data mining and visualization tools. Eventually, it is hoped that BIND will be a collection of all the interactions occurring in each of the major model organisms.
===Database structure===
BIND contains information on three types of data: interactions, molecular complexes and pathways.
# Interactions are the basic component of BIND and describe how 2 or more objects (A and B) interact with each other. The objects can be a variety of things: [[DNA]], [[RNA]], [[
# The second type of BIND entries are the molecular complexes. Molecular complexes are defined as an aggregate of molecules that are stable
# The third component of BIND is the pathway record section. A pathway consists of a network of interactions that are involved in the regulation of cellular processes. This section may also contain information on phenotypes and diseases related to the pathway.
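The three record types listed above can be pictured as a small data model. The following is an illustrative sketch only: the class and field names are hypothetical and do not reproduce the BIND data specification, which is written in ASN.1 rather than Python.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class MolecularObject:
    """A participant such as a protein, DNA or RNA molecule (labels are illustrative)."""
    name: str
    kind: str


@dataclass
class Interaction:
    """Basic record: two or more objects that interact with each other."""
    record_id: int
    objects: List[MolecularObject]

    def __post_init__(self):
        # The text describes an interaction as involving two or more objects.
        if len(self.objects) < 2:
            raise ValueError("an interaction needs at least two objects")


@dataclass
class MolecularComplex:
    """An aggregate of molecules treated as one unit."""
    record_id: int
    members: List[MolecularObject]


@dataclass
class Pathway:
    """A network of interactions, optionally annotated with phenotypes and diseases."""
    record_id: int
    interactions: List[Interaction]
    phenotypes: List[str] = field(default_factory=list)
    diseases: List[str] = field(default_factory=list)

    def participants(self):
        """Return the distinct objects across all interactions, in first-seen order."""
        seen = []
        for interaction in self.interactions:
            for obj in interaction.objects:
                if obj not in seen:
                    seen.append(obj)
        return seen


a = MolecularObject("A", "protein")
b = MolecularObject("B", "DNA")
c = MolecularObject("C", "protein")
pathway = Pathway(1, [Interaction(10, [a, b]), Interaction(11, [b, c])])
print([obj.name for obj in pathway.participants()])
```

Modelling a pathway as a list of interactions mirrors the description of a pathway as a network of interactions; how the actual specification links these records is not covered by this sketch.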
<br />The minimum amount of information needed to create an entry in BIND is a [[PubMed]] publication reference and an entry in another database (e.g. [[GenBank]]). Each entry
BIND is based on a data specification written in Abstract Syntax Notation 1 ([[ASN.1]]). ASN.1 is also used by [[National Center for Biotechnology Information|NCBI]] when storing data for their [[Entrez]] system, and because of this BIND uses the same standards as NCBI for data representation. ASN.1 is preferred because it can be easily translated into other data specification languages (e.g. [[XML]]), can easily handle complex data and can be applied to all biological interactions, not just proteins [4]. Bader and Hogue (2000) have prepared a detailed manuscript on the ASN.1 data specification used by BIND [5].
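The claim that a hierarchical specification translates easily into XML can be illustrated with a short sketch. This is not the BIND or NCBI toolchain: the record below is a made-up nested structure standing in for an ASN.1 value, the function name and field names are hypothetical, and only Python's standard <code>xml.etree.ElementTree</code> module is used.

```python
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Recursively convert nested dicts, lists and scalars into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        # Each key becomes a child element named after the key.
        for key, child in value.items():
            elem.append(to_xml(key, child))
    elif isinstance(value, list):
        # List entries become repeated <item> children (an arbitrary choice).
        for child in value:
            elem.append(to_xml("item", child))
    else:
        elem.text = str(value)
    return elem


# Hypothetical record; the field names are NOT taken from the BIND specification.
record = {
    "id": 1,
    "objects": [
        {"kind": "protein", "name": "A"},
        {"kind": "DNA", "name": "B"},
    ],
}

xml_text = ET.tostring(to_xml("interaction", record), encoding="unicode")
print(xml_text)
```

Because both notations describe nested, typed structures, the mapping is mechanical: sequences become repeated child elements and named fields become tags.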
===Data submission and curation===
User submission to the database is encouraged. To contribute to the database, one
===Database growth===
BIND has grown significantly since its conception; the database saw a 10-fold increase in entries between 2003 and 2004. By September 2004, there were over 100,000 interaction records (including 58,266 protein-protein, 4,225 genetic, 874 protein-small molecule, 25,857 protein-DNA, and 19,348 biopolymer interactions). The database also contains sequence information for 31,972 proteins, 4,560 DNA samples and 759 RNA samples. These entries have been collected from 11,649 publications; therefore, the database represents an important amalgamation of data. The organisms with entries in the database include: ''[[Saccharomyces cerevisiae]]'', ''[[Drosophila melanogaster]]'', ''[[Homo sapiens]]'', ''[[Mus musculus]]'', ''[[Caenorhabditis elegans]]'', ''[[Helicobacter pylori]]'', ''[[Bos taurus]]'', [[HIV-1]], ''[[Gallus gallus]]'', ''[[Arabidopsis thaliana]]'', as well as others. In total, 901 [[taxa]] were included by September 2004, and BIND has been split up into BIND-Metazoa, BIND-Fungi, and BIND-Taxroot.
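The September 2004 figures can be checked against the "over 100,000" total with a short tally. This is only an arithmetic sketch using the numbers quoted above; the variable names are illustrative, and the five listed categories are introduced with "including", so they may not be the complete breakdown.

```python
# Interaction counts by type, as quoted for September 2004.
interaction_counts = {
    "protein-protein": 58_266,
    "genetic": 4_225,
    "protein-small molecule": 874,
    "protein-DNA": 25_857,
    "biopolymer": 19_348,
}
# Sequence counts by molecule type, as quoted in the same paragraph.
sequence_counts = {"protein": 31_972, "DNA": 4_560, "RNA": 759}
publications = 11_649

total_interactions = sum(interaction_counts.values())
total_sequences = sum(sequence_counts.values())

# Share of each listed type within the listed total.
shares = {name: count / total_interactions for name, count in interaction_counts.items()}

# Average number of listed interaction records per source publication.
per_publication = total_interactions / publications

print(f"listed interaction records: {total_interactions:,}")
print(f"listed sequences:           {total_sequences:,}")
for name, share in shares.items():
    print(f"  {name}: {share:.1%}")
print(f"records per publication:    {per_publication:.1f}")
```

The listed categories sum to 108,570 records, consistent with the stated total, and the sequence entries sum to 37,291.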
Not only is the information contained within the database continually updated, but the software itself has also gone through several revisions. Version 1.0 of BIND was released in 1999 and, based on user feedback, was modified to include additional detail on the experimental conditions required for binding and a hierarchical description of the cellular location of the interaction. Version 2.0 was released in 2001 and included the capability to link to information available in other databases.
===Special features===
BIND was the first database of its kind to contain information on biomolecular interactions, reactions and pathways in one schema. It is also the first to base its [[ontology]] on chemistry, which allows 3D representation of molecular interactions. The underlying chemistry allows molecular interactions to be described down to the atomic level of resolution.
PreBIND is an associated data mining system that locates biomolecular interaction information in the scientific literature. The name or [[Accession number (bioinformatics)|accession number]] of a protein can be entered, and PreBIND will scan the literature and return a list of potentially interacting proteins. BIND [[BLAST (biotechnology)|BLAST]] is also available to find interactions with proteins that are similar to the one specified in the query.
BIND offers several features that many other proteomics databases do not include. The authors of this program have created an extension to traditional [[IUPAC]] nomenclature to help describe [[post-translational modifications]] that occur to amino acids. These modifications include [[acetylation]], [[formylation]], [[methylation]] and [[palmitoylation]], among others. The extension of the traditional IUPAC codes allows these amino acids to be represented in sequence form as well.
===Accessing the database===
[[Image:Copy of BIND Screen.JPG|thumb|Figure 1: Screen shot of sequence results obtained using BOND
The database user interface is web-based and can be queried using text or accession numbers/identifiers. Since its integration with the other components of BOND, sequences have been added to interactions, molecular complexes and pathways in the results. Records include information on: BIND ID, description of the interaction/complex/pathway, publications, update records, organism, OntoGlyphs, ProteoGlyphs, and links to other databases where additional information can be found. BIND records include various viewing formats (e.g. [[HTML]], [[ASN.1]], [[XML]], [[FASTA]]), various formats for exporting results (e.g.
==User
The number of [[Unleashed Registrants]] has increased 10-fold since the integration of BIND. As of December 2006, registration fell just short of 10,000. Subscribers to the commercial versions of BOND fall into six general categories: [[agriculture and food]], [[biotechnology]], [[pharmaceuticals]], [[informatics]], [[materials]] and other. The biotechnology sector is the largest of these groups, holding 28% of subscriptions. Pharmaceuticals and informatics follow with 22% and 18% respectively. The [[United States]] holds the bulk of these subscriptions, 69%. Other countries with access to the commercial versions of BOND include [[Canada]], the [[United Kingdom]], [[Japan]], [[China]], [[Korea]], [[Germany]], [[France]], [[India]] and [[Australia]]. Each of these countries holds less than 6% of the user share[2].
==References==
{{reflist}}
[[Category: Bioinformatics databases]]