Protein–protein interaction prediction: Difference between revisions

Displaying results can be problematic [117] because of the volume of data generated: a raw interactome often resembles a "hairball". The results should therefore be organised hierarchically, as an interaction "tree". The two best approaches to date are, first, to display only one or two levels of the hierarchy at a time [108], and second, to assign the highly connected (hub) proteins as the roots of the interaction trees [108]. This creates better groupings of functionally and spatially related proteins, making for a more easily interpreted interactome.
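The hub-rooted display strategy can be sketched as a breadth-first traversal that picks the most connected protein as the root and truncates the tree after a fixed number of interaction links. The adjacency list and protein names below are illustrative, not from the article:

```python
from collections import deque

def interaction_tree(adjacency, max_depth=2):
    """Build a display tree for an interaction network, rooted at the
    most highly connected (hub) protein, truncated at max_depth links."""
    # Choose the hub: the protein with the most interaction partners.
    root = max(adjacency, key=lambda p: len(adjacency[p]))
    tree = {root: []}
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        protein, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the display depth
        for partner in adjacency[protein]:
            if partner not in seen:
                seen.add(partner)
                tree.setdefault(protein, []).append(partner)
                queue.append((partner, depth + 1))
    return root, tree

# Toy network: P1 is the hub with three partners.
network = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1", "P5"],
    "P3": ["P1"],
    "P4": ["P1"],
    "P5": ["P2"],
}
root, tree = interaction_tree(network, max_depth=2)
# root is "P1"; the tree shows P2, P3, P4 under P1 and P5 under P2
```

Rooting at the hub means functionally related partners cluster under the same branch, which is what makes the display easier to interpret.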
The main goal of proteomics is to predict the structures, interactions and functions of proteins [29]. Specific function is only revealed through interactions, and because structures are primarily used to help find interactions, the prediction of protein–protein interactions is of vital interest in proteomics.
 
===Future possibilities===
All the methods reviewed are plagued by false positives and false negatives because the algorithms used are not accurate enough, possibly because their training sets or underlying assumptions are tainted by not considering multiple interactions. For example, water molecules, which are present at active sites, are normally excluded from the models, yet they can change the properties of the active sites involved in the interactions. Normally, only protein pairs are considered, because combinations of multiple proteins can be more quickly (and arguably more accurately) predicted from a set of binary interactions [49]. This assumption breaks down for soft-body interactions, such as protein activators or repressors, where protein–protein binding requires the presence or absence of a third body to effect a shape change.
Currently, PPIP algorithms can be implemented in either a stringent or a flexible manner. A stringent implementation, as with many docking methods, yields fewer false positives but more false negatives (because the influence of a possible third body is ignored). A flexible implementation keeps more of the true positives that biologically rely on a third body for binding, but also introduces more false positives.
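The stringent/flexible trade-off can be illustrated by varying a confidence cutoff over predictor scores. All pairs, scores, and cutoffs below are invented for illustration:

```python
def classify(scores, threshold):
    """Predict an interaction when a method's confidence score meets the cutoff."""
    return {pair: score >= threshold for pair, score in scores.items()}

def error_counts(predictions, truth):
    """Count false positives and false negatives against known interactions."""
    fp = sum(predictions[p] and not truth[p] for p in truth)
    fn = sum(not predictions[p] and truth[p] for p in truth)
    return fp, fn

# Hypothetical confidence scores for candidate protein pairs (not real data).
scores = {"A-B": 0.95, "A-C": 0.70, "B-C": 0.55, "C-D": 0.45}
truth  = {"A-B": True,  "A-C": True,  "B-C": True,  "C-D": False}

# Stringent cutoff: no false positives, but two real interactions missed.
strict_fp, strict_fn = error_counts(classify(scores, 0.9), truth)
# Flexible cutoff: all real interactions kept, at the cost of one false positive.
loose_fp, loose_fn = error_counts(classify(scores, 0.4), truth)
```

Pairs like A-C and B-C here stand in for third-body-dependent interactions whose scores fall below a stringent cutoff.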
The methods also suffer because they are incapable of continuously updating themselves: they only "learn" before they generate predictions, when a training set (a list of interactions) is pushed through. In addition, many methods use only one specific approach, although it has been demonstrated [94] that combining two or more approaches increases true positives and reduces false positives.
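One simple way to combine two or more approaches is majority voting over independent predictors; a pair is accepted only when enough methods agree, which tends to cut false positives. This is a minimal sketch with made-up method outputs, not the scheme of [94]:

```python
def combined_prediction(votes, min_agreement=2):
    """Predict an interaction only if at least min_agreement
    independent methods (e.g. docking, phylogenetic profile, SVM) agree."""
    return sum(votes) >= min_agreement

# Hypothetical outputs of three independent methods for two candidate pairs.
pair_x = [True, True, False]   # two methods agree: predicted to interact
pair_y = [True, False, False]  # only one method fires: rejected
```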
A method that addresses the weaknesses described above could use a machine learning algorithm for initial selection, and a slower algorithm, combining folding and docking methods, for verification of random and boundary cases. By staying up to date with the literature, verification could also draw on known evolutionary interactome relationships. Although protein localization could be accounted for in machine-learning PPIP training, it might be possible to improve PPIP by considering localization separately; current localization techniques are 83.6% accurate [44]. To maximize the accuracy of protein function prediction, more computation is needed on less data, and less computation is needed on more data.
These refined results would continuously [114] teach the machine learning algorithm to modify its pattern, resulting in a more accurate PPIP. Just as people learn from their mistakes, an algorithm can be programmed to learn from its mistakes, or even to update itself.
It is difficult to include third-body interactions in quick methods (SVM- or graph-based); slower verification methods (protein folding and docking) should therefore try to compensate by considering them. The "quick" methods should be inclusive, so that the "slow" methods have a reduced data set to work with. To aid interaction prediction, it would be wise for interactomes to include identification of the active site used in each interaction; this can help with the prediction of third-body interactions.
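The quick-then-slow division of labour might be sketched as a two-stage filter: an inclusive, fast scorer keeps most candidates, and the expensive verifier runs only on the survivors. The function names, scores, and cutoff here are hypothetical:

```python
def two_stage_ppip(candidates, quick_score, slow_verify, quick_cutoff=0.3):
    """Two-stage prediction: an inclusive, fast scorer (e.g. SVM- or
    graph-based) shortlists candidates; a slow verifier (standing in for
    folding/docking) runs only on the reduced set."""
    shortlist = [p for p in candidates if quick_score(p) >= quick_cutoff]
    return [p for p in shortlist if slow_verify(p)]

# Toy quick-stage scores for three candidate pairs (invented values).
quick = {"A-B": 0.9, "A-C": 0.5, "B-C": 0.1}
# The slow verifier only ever sees the shortlist, so "B-C" (filtered out
# by the quick stage) never needs an entry here.
slow_ok = {"A-B": True, "A-C": False}

result = two_stage_ppip(quick, quick.get, lambda p: slow_ok[p])
# result == ["A-B"]
```

Keeping the quick cutoff low (inclusive) matches the text: the quick stage should rarely discard a true interaction, since the slow stage can still reject false ones.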
The combination of methods described above addresses the four problems inherent in using single methods for PPIP, and might push accuracy past the as yet unsurpassed 90% mark. It is theoretically possible to design a vector learning method with 100% accuracy, but even at current accuracy rates the computational methods provide significant insight for speeding up the biological methods of PPIP. Because of conserved proteins and domains, it will become progressively easier to make protein interaction maps of each genome. The advantage of this approach is that interaction maps can be produced quickly and then improved more slowly with a smaller dataset, in contrast to most implementations, which are a one-shot affair. However, it would be desirable to analyze whole libraries of genomes on an ongoing basis as they become available, despite the apparent difficulty of performing on the order of 10<sup>12</sup> interaction tests. The algorithms would need to be run periodically, but if the system is to be used as a PPIP server, this is to be expected anyway. There are implementations that use data from known interactions as well as multiple prediction methods, such as [http://mysql5.mbi.ucla.edu/cgi-bin/functionator/pronav Prolinks].
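The scale of that all-against-all screen follows from counting unordered pairs. Assuming, purely for illustration, about 1.5 million proteins across the analysed genomes (a figure not taken from the article):

```python
from math import comb

# An all-against-all screen over n proteins tests n(n-1)/2 unordered pairs.
n_proteins = 1_500_000  # illustrative proteome count across many genomes
pairwise_tests = comb(n_proteins, 2)
# About 1.12e12 tests, i.e. on the order of 10**12
```

Because the count grows quadratically, even a modest increase in the number of genomes analysed pushes the workload well past what a one-shot computation can handle, which is why periodic, incremental runs are suggested above.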
 
 
 
 
 
*[http://string.embl.de/ String]
 
==References==
*[1] Albert, István & Albert, Réka (2004). Conserved network motifs allow protein–protein interaction prediction. ''Bioinformatics'', 20(18), 3346–3352.
*[2] Aloy, Patrick & Russell, Robert B. (2003). InterPreTS: protein interaction prediction through tertiary structure. ''Bioinformatics'', 19(1), 161–162.