Revision as of 11:10, 15 March 2007 by Rich Farmbrough (→Biomolecular Interaction Network Database (BIND): Copyedit. Correct caps in section header.)
Revision as of 11:14, 15 March 2007 by Rich Farmbrough (Wikify dates (where month and day both present). Copyedit.)
==Background==
{{Infobox Software|name=BOND|developer=Christopher Hogue et al., Samuel Lunenfeld Research Institute, Mount Sinai. Commercial rights: Unleashed Informatics|latest_release_version=BIND 4.0, SMIDsuite|genre=Bioinformatics tool|license=Open Access|website=[http://bond.unleashedinformatics.com/index.jsp?pg=0]}}
The [[Blueprint Initiative]] started as a research program in the lab of [[Dr. Christopher Hogue]] at the [[Samuel Lunenfeld Research Institute]] at [[Mount Sinai Hospital in Toronto]]. On [[December 14]], [[2005]], [[Unleashed Informatics Limited]] acquired the commercial rights to The Blueprint Initiative [[intellectual property]]. This included rights to the protein interaction database BIND, the small molecule interaction database SMID, and the data warehouse SeqHound. Unleashed Informatics is a data management service provider and oversees the management and curation of The Blueprint Initiative under the guidance of Dr. Hogue.<ref>[http://www.blueprint.org Blueprint.org]</ref>

==Biomolecular Object Network Databank (BOND)==
The Biomolecular Object Network Databank (BOND) integrates the original Blueprint Initiative databases with other databases, such as [[GenBank]], and combines them with many of the tools required to analyze these data. Annotation links for sequences, including [[taxon identifiers]], [[redundant sequences]], [[Gene Ontology]] descriptions, [[Online Mendelian Inheritance in Man]] identifiers, [[conserved domains]], database cross-references, [[LocusLink Identifiers]] and complete genomes, are also available. BOND facilitates cross-database queries and is an [[open access]] resource that integrates interaction and sequence data[2].

==Small Molecule Interaction Database (SMID)==
The Small Molecule Interaction Database is a database of interactions between protein domains and small molecules.
It uses a [[domain-based approach]] to identify [[domain families]], found in the [[Conserved Domain Database (CDD)]], which interact with a [[query]] small molecule. The CDD from [[NCBI]] amalgamates data from several different sources: [[Protein FAMilies (PFAM)]], [[Simple Modular Architecture Research Tool (SMART)]], [[Clusters of Orthologous Groups (COGs)]], and NCBI’s own curated sequences. The data in SMID are derived from the Protein Data Bank (PDB), a database of known protein crystal structures.

SMID can be queried by entering a [[protein GI]], [[domain identifier]], [[PDB ID]] or [[SMID ID]]. The results of a search provide small molecule, protein, and domain information for each interaction identified in the database. Interactions with non-biological contacts are screened out by default.

SMID-BLAST is a tool developed to annotate known small-molecule binding sites and to predict binding sites in proteins whose [[crystal structures]] have not yet been determined. The prediction extrapolates known interactions, found in the PDB, to interactions between an uncrystallized protein and a small molecule of interest. SMID-BLAST was validated against a test set of known small molecule interactions from the PDB. It was shown to be an accurate predictor of protein-small molecule interactions: 60% of predicted interactions exactly matched the PDB-annotated binding site, and of these, 73% had more than 80% of the binding residues of the protein correctly identified. Hogue ''et al.'' estimated that 45% of predictions that were not observed in the PDB data do in fact represent true positives[3].

==Biomolecular Interaction Network Database (BIND)==
===Introduction===
The idea of a database to document all known molecular interactions was originally put forth by Tony Pawson in the 1990s and was later developed by scientists at the [[University of Toronto]] in collaboration with the [[University of British Columbia]].
The development of the Biomolecular Interaction Network Database (BIND) has been supported by grants from the Canadian Institutes of Health Research ([[CIHR]]), Genome Canada, the Canada Foundation for Innovation and the Ontario Research and Development Fund. BIND was originally designed to be a constantly growing repository for information regarding biomolecular interactions, molecular complexes and pathways. As [[proteomics]] is a rapidly advancing field, there is a need to have information from scientific journals readily available to researchers. BIND facilitates the understanding of molecular interactions and pathways involved in cellular processes and is intended to give scientists a better understanding of developmental processes and disease pathogenesis.

The major goals of the BIND project are: to create a public proteomics resource that is available to all; to create a platform to enable [[datamining]] from other sources (PreBIND); and to create a platform capable of presenting visualizations of complex molecular interactions. From the beginning, BIND has been [[open access]], and its software can be freely distributed and modified. Currently, BIND includes a data specification, a database and associated data mining and visualization tools. Eventually, it is hoped that BIND will be a collection of all the interactions occurring in each of the major model organisms.

===Database structure===
BIND is based on a data specification written in the Abstract Syntax Notation 1 ([[ASN.1]]) language. ASN.1 is also used by [[NCBI]] when storing data for their [[Entrez]] system, and because of this BIND uses the same standards as NCBI for data representation. The ASN.1 language is preferred because it can be easily translated into other data specification languages (e.g. [[XML]]), can easily handle complex data and can be applied to all biological interactions, not just proteins [4].
Bader and Hogue (2000) have prepared a detailed manuscript on the ASN.1 data specification used by BIND [5].

===Data submission and curation===
User submission to the database is encouraged. To contribute to the database, one must submit contact information, a [[PubMed]] identifier and the two molecules that interact. The person who submits a record is its owner. All records are validated before being made public, and BIND is curated for quality assurance.

BIND curation has two tracks: high-throughput (HTP) and low-throughput (LTP). HTP records are from papers that have reported more than 40 interaction results from one experimental methodology. HTP curators typically have a [[bioinformatics]] background. The HTP curators are responsible for the collection and storage of experimental data, and they also create scripts to update BIND based on new publications. LTP records are curated by individuals with either an MSc or PhD and laboratory experience in interaction research. LTP curators are given further training through the [[Canadian Bioinformatics Workshops]]. Information on small molecule chemistry is curated separately by chemists to ensure the curator is knowledgeable about the subject.

The priority for BIND curation is to focus on LTP, to collect information as it is published. Although HTP studies provide more information at once, more LTP studies are being reported, and similar numbers of interactions are being reported by both tracks. In 2004, BIND collected data from 110 journals [6].

===Database growth===
The database user interface is web-based and can be queried using text or accession numbers/identifiers. Since its integration with the other components of BOND, sequences have been added to interactions, molecular complexes and pathways in the results.
Records include information on: BIND ID, description of the interaction/complex/pathway, publications, update records, organism, OntoGlyphs, ProteoGlyphs, and links to other databases where additional information can be found. BIND records include various viewing formats (e.g. [[HTML]], [[ASN.1]], [[XML]], [[FASTA]]), various formats for exporting results (e.g. [[ASN.1]], [[XML]], [[GI list]], [[PDF]]), and visualizations (e.g. [[Cytoscape]]). The exact viewing and exporting options vary depending on what type of data has been retrieved.

==User statistics==
The number of [[Unleashed Registrants]] has increased 10-fold since the integration of BIND. As of December 2006, registration fell just short of 10,000. Subscribers to the commercial versions of BOND fall into six general categories: [[agriculture and food]], [[biotechnology]], [[pharmaceuticals]], [[informatics]], [[materials]] and other. The biotechnology sector is the largest of these groups, holding 28% of subscriptions. Pharmaceuticals and informatics follow with 22% and 18% respectively. The [[United States]] holds the bulk of these subscriptions, 69%. Other countries with access to the commercial versions of BOND include [[Canada]], the [[United Kingdom]], [[Japan]], [[China]], [[Korea]], [[Germany]], [[France]], [[India]] and [[Australia]]. Each of these countries accounts for less than 6% of users[2].

==References==
<references />
2. [http://bond.unleashedinformatics.com BOND at unleashedinformatics.com]
3. Snyder, K, ''et al''. Domain-based small molecule binding site annotation. ''BMC Bioinformatics'' 7: 152 (2006).

Biomolecular Object Network Databank: Difference between revisions