Protein–protein interaction prediction: Difference between revisions

OAbot (talk | contribs)
m Open access bot: doi updated in citation with #oabot.
 
(219 intermediate revisions by more than 100 users not shown)
Line 1:
{{Short description|Prediction by observation and computation}}'''Protein–protein interaction prediction''' is a field combining [[bioinformatics]] and [[structural biology]] in an attempt to identify and catalog physical interactions between pairs or groups of proteins. Understanding [[protein–protein interaction]]s is important for the investigation of intracellular signaling pathways, modelling of protein complex structures and for gaining insights into various biochemical processes.
'''The prediction of an interaction or binding between proteins.'''
 
''Experimentally'', physical interactions between pairs of proteins can be inferred from a variety of techniques, including yeast [[two-hybrid screening|two-hybrid]] systems, [[Protein-fragment complementation assay|protein-fragment complementation assays]] (PCA), affinity purification/[[mass spectrometry]], [[protein microarray]]s, fluorescence resonance energy transfer (FRET), and [[Microscale Thermophoresis]] (MST). Efforts to experimentally determine the [[interactome]] of numerous species are ongoing. Experimentally determined interactions usually provide the basis for ''computational methods'' to predict interactions, e.g. using [[Homology (biology)|homologous]] protein sequences across species. However, there are also methods that predict interactions ''de novo'', without prior knowledge of existing interactions.
(computational multi genome assays are of primary interest)
 
== Methods ==
The acronym '''PPIP''' stands for '''P'''rotein–'''P'''rotein '''I'''nteraction '''P'''rediction.
Proteins that interact are more likely to co-evolve,{{r|Dandekar}}{{r|Enright}}{{r|Marcotte}}{{r|Pazos}} so it is possible to make inferences about interactions between pairs of proteins based on their phylogenetic distances. It has also been observed in some cases that pairs of interacting proteins have fused orthologues in other organisms. In addition, a number of bound protein complexes have been structurally solved and can be used to identify the residues that mediate the interaction so that similar motifs can be located in other organisms.
 
=== Phylogenetic profiling ===
==Protein Interaction Prediction Literature Review, 2005==
[[File:Phylogenetic Profiling Method.png|thumb|''Figure A.'' The phylogenetic profiles of four genes (A, B, C and D) are shown on the right. A '1' denotes presence of the gene in the genome and a '0' denotes absence. The two identical profiles of genes A and B are highlighted in yellow.<ref name=":0" />]]
''[[Phylogenetic profiling|The phylogenetic profile]] method'' is based on the hypothesis that if two or more proteins are concurrently present or absent across several genomes, then they are likely functionally related.<ref name=":0">{{Cite journal|last=Raman|first=Karthik|date=2010-02-15|title=Construction and analysis of protein–protein interaction networks|journal=Automated Experimentation|volume=2|issue=1|pages=2|doi=10.1186/1759-4499-2-2|issn=1759-4499|pmc=2834675|pmid=20334628 |doi-access=free }}</ref> ''Figure A'' illustrates a hypothetical situation in which proteins A and B are identified as functionally linked due to their identical phylogenetic profiles across 5 different genomes. The Joint Genome Institute provides an Integrated Microbial Genomes and Microbiomes database ([https://img.jgi.doe.gov JGI IMG]) that has a phylogenetic profiling tool for single genes and gene cassettes.
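The profile comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular tool: the function names, the Hamming-distance criterion and the toy profiles are assumptions made for the example, and the profiles are not taken from ''Figure A''.

```python
from itertools import combinations

def hamming(p, q):
    """Number of genomes in which two presence/absence profiles differ."""
    return sum(a != b for a, b in zip(p, q))

def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        if hamming(p1, p2) <= max_distance:
            pairs.append((g1, g2))
    return pairs

# Hypothetical profiles across five genomes (1 = gene present, 0 = absent).
profiles = {
    "A": (1, 0, 1, 1, 0),
    "B": (1, 0, 1, 1, 0),
    "C": (0, 1, 1, 0, 1),
    "D": (1, 1, 0, 0, 1),
}

print(linked_pairs(profiles))  # [('A', 'B')]
```

Raising <code>max_distance</code> relaxes the requirement of identical profiles, at the likely cost of more false positives.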
 
=== Prediction of co-evolved protein pairs based on similar phylogenetic trees===
===Why PPIP===
It was observed that the phylogenetic trees of ligands and receptors were often more similar than expected by random chance.{{r|Pazos}} This is likely because they faced similar selection pressures and co-evolved. This method{{r|Tan}} uses the phylogenetic trees of protein pairs to determine if interactions exist. To do this, homologs of the proteins of interest are found (using a sequence search tool such as [[BLAST (biotechnology)|BLAST]]) and multiple-sequence alignments are done (with alignment tools such as [[Clustal]]) to build distance matrices for each of the proteins of interest.{{r|Pazos}} The distance matrices should then be used to build phylogenetic trees. However, comparisons between phylogenetic trees are difficult, and current methods circumvent this by simply comparing distance matrices.{{r|Pazos}} The distance matrices of the proteins are used to calculate a correlation coefficient, in which a larger value corresponds to co-evolution. The benefit of comparing distance matrices instead of phylogenetic trees is that the results do not depend on the method of tree building that was used. The downside is that distance matrices are not perfect representations of phylogenetic trees, and inaccuracies may result from using such a shortcut.{{r|Pazos}} Another factor worthy of note is that there are background similarities between the phylogenetic trees of any proteins, even ones that do not interact. If left unaccounted for, this could lead to a high false-positive rate. For this reason, certain methods construct a background tree using 16S rRNA sequences which they use as the canonical tree of life.
The distance matrix constructed from this tree of life is then subtracted from the distance matrices of the proteins of interest.{{r|PazosRanea}} However, because RNA distance matrices and DNA distance matrices have different scales, presumably because RNA and DNA have different mutation rates, the RNA matrix needs to be rescaled before it can be subtracted from the DNA matrices.{{r|PazosRanea}} By using molecular clock proteins, the scaling coefficient for protein distance/RNA distance can be calculated.{{r|PazosRanea}} This coefficient is used to rescale the RNA matrix.
In order to understand how organisms function, we need to shed more light on our deepest inner workings (i.e. chemical interactions). The most complex of these are protein interactions. For any given genome we want a comprehensive list, numbering in the millions, of protein interactions and their functions. For an organism the complete protein interaction network is called an interactome. Unfortunately, directly sensing what happens at this level is not currently feasible, although intuitively, it is the first thing to consider.
[[File:Alignment of the Rosette Stone protein Human Succinyl-CoA-Transferase with Acetate-CoA-Transferase subunits alpha and beta.png|thumb|''Figure B.'' The Human succinyl-CoA-Transferase enzyme is represented by the two joined blue and green bars at the top of the image. The alpha subunit of the Acetate-CoA-Transferase enzyme is homologous with the first half of the enzyme, represented by the blue bar. The beta subunit of the Acetate-CoA-Transferase enzyme is homologous with the second half of the enzyme, represented by the green bar. This image was adapted from Uetz, P. & Pohl, E. (2018) ''Protein–Protein and Protein–DNA Interactions''. In: Wink, M. (ed.), Introduction to Molecular Biotechnology, 3rd ed. Wiley-VCH, ''in press''.]]
 
=== Rosetta stone (gene fusion) method ===
===Overview of methodologies===
''The Rosetta Stone or Domain Fusion method'' is based on the hypothesis that interacting proteins are sometimes fused into a single protein.<ref name="Marcotte" /> For instance, two or more separate proteins in a genome may be identified as fused into one single protein in another genome. The separate proteins are likely to interact and thus are likely functionally related. An example of this is the ''Human Succinyl coA Transferase'' enzyme, which is found as one protein in humans but as two separate proteins, ''[[Acetate CoA-transferase|Acetate coA Transferase]] alpha'' and ''Acetate coA Transferase beta'', in ''Escherichia coli''.<ref name="Marcotte" /> In order to identify these sequences, a sequence similarity algorithm such as the one used by ''[[BLAST (biotechnology)|BLAST]]'' is necessary. For example, if we had the amino acid sequences of proteins A and B and the amino acid sequences of all proteins in a certain genome, we could check each protein in that genome for non-overlapping regions of sequence similarity to both proteins A and B. ''Figure B'' depicts the BLAST sequence alignment of Succinyl coA Transferase with its two separate homologs in ''E. coli''. The two subunits have non-overlapping regions of sequence similarity with the human protein, indicated by the pink regions, with the alpha subunit similar to the first half of the protein and the beta similar to the second half. One limitation of this method is that not all proteins that interact can be found fused in another genome, and therefore cannot be identified by this method. On the other hand, the fusion of two proteins does not necessitate that they physically interact. For instance, the [[SH2 domain|SH2]] and [[SH3 domain|SH3]] domains in the [[Src family kinase|src protein]] are known to interact. However, many proteins possess homologs of these domains and they do not all interact.<ref name="Marcotte" />
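The check for non-overlapping regions of similarity can be illustrated with a short sketch. It assumes that a sequence search (for example a BLAST-like tool) has already been run and that its hits have been reduced to one interval per candidate protein; the data structure, the function names and the coordinates below are hypothetical.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) intervals share any position."""
    return a[0] <= b[1] and b[0] <= a[1]

def rosetta_candidates(hits_a, hits_b):
    """Find candidate fusion proteins for query proteins A and B.

    hits_a and hits_b map a candidate protein id to the (start, end)
    region of that candidate which is similar to A or to B.
    A candidate qualifies when it is hit by both queries in
    non-overlapping regions.
    """
    shared = sorted(set(hits_a) & set(hits_b))
    return [c for c in shared if not overlaps(hits_a[c], hits_b[c])]

# Hypothetical hit coordinates on three proteins of another genome.
hits_a = {"fusedX": (1, 240), "protY": (10, 200)}
hits_b = {"fusedX": (250, 480), "protY": (150, 300), "protZ": (5, 100)}

print(rosetta_candidates(hits_a, hits_b))  # ['fusedX']
```

Here <code>protY</code> is rejected because the two similar regions overlap, and <code>protZ</code> because only one of the two query proteins matches it.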
The technologies for direct sensing reside in the realm of physics and all rely on the four fundamental forces (Strong, Electromagnetic, Weak, and Gravity). Currently, the best technologies do not have sufficient resolution and are either too disruptive, too slow to track or resolve protein Brownian movement, or have too narrow a field of view for genome wide PPIP. The highly advanced Stanford University optical trap microscope is only just capable of protein observation [141]. But even if direct observation and identification were possible, the sheer complexity of biological systems would present a challenge to our understanding.
[[File:Trp operon organization across three different bacterial species.png|thumb|''Figure C.'' Organization of the trp operon in three different species of bacteria: ''Escherichia coli'', ''Haemophilus influenzae'', ''Helicobacter pylori''. Only the trpA and trpB genes are adjacent across all three organisms and are thus predicted to interact by the conserved gene neighborhood method. This image was adapted from Dandekar, T., Snel, B., Huynen, M., & Bork, P. (1998). Conservation of gene order: a fingerprint of proteins that physically interact. ''Trends in biochemical sciences'', ''23''(9), 324-328.<ref name="Dandekar" />]]
Because direct sensing of protein interaction is currently ruled out, various genome-wide methods of PPIP are being developed. Some strictly biological methods that have been developed are: Yeast Two-Hybrid (Y2H) [60, 92], Correlated mRNA Expression Profiles, Genetic Interaction Data, and Mass Spectrometry Protein Complex Purification.
Biological experiments are prohibitively expensive and have large errors induced by their inherent limitations. For example, it is estimated that it would take a costly 10,000 pull-down assays to discover 90% of the human interactome. This can be overcome by using computational methods, which reduce the number of potential interactions that must be tested experimentally by many orders of magnitude. If they ever become sufficiently accurate, computational methods could be used on their own, without biological verification, to predict protein interactions. To date, computational prediction accuracy is in the 80% range at the best of times. Computer algorithms rely on accurate genome sequencing as well as knowledge of physics, chemistry, and biology.
Genome sequencing is becoming faster and more accurate with microfabricated high-density picolitre reactors [90]. Hypothetically, it is possible to predict any biological form or function from the DNA sequence; however, to date experimentation has had only limited success.
Simulation predictions also rely on good programming. To accurately and quickly perform PPIP, speed optimization is imperative, as it is all too easy to make an algorithm that uses more time than the universe offers. It is of no relevance how accurate a prediction is if the result is never produced. Similarly, it is of no relevance how fast an algorithm is if it produces inaccurate results.
The accuracy of a PPIP algorithm is measured by:
 
=== Conserved gene neighborhood ===
Accuracy = (TP+TN)/(TP+FP+TN+FN),
The conserved neighborhood method is based on the hypothesis that if genes encoding two proteins are neighbors on a chromosome in many genomes, then they are likely functionally related. The method is based on an observation by Bork et al. of gene pair conservation across nine bacterial and archaeal genomes. The method is most effective in prokaryotes with operons as the organization of genes in an operon is generally related to function.<ref name=":1">{{Cite journal|date=1998-09-01|title=Conservation of gene order: a fingerprint of proteins that physically interact|journal=Trends in Biochemical Sciences|language=en|volume=23|issue=9|pages=324–328|doi=10.1016/S0968-0004(98)01274-2|issn=0968-0004|last1=Dandekar|first1=T.|pmid=9787636}}</ref> For instance, the ''trpA'' and ''trpB'' genes in ''[[Escherichia coli]]'' encode the two subunits of the ''[[tryptophan synthase]]'' enzyme known to interact to catalyze a single reaction. The adjacency of these two genes was shown to be conserved across nine different bacterial and archaeal genomes.<ref name=":1" />
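As a rough illustration of the conserved neighborhood idea, the sketch below intersects the sets of adjacent gene pairs found in several genomes. It assumes that orthologous genes carry the same label in every genome and that each genome is a single linear gene order, ignoring strand and circular chromosomes; the gene labels and orders are invented for the example.

```python
def adjacent_pairs(gene_order):
    """Unordered pairs of genes that sit next to each other."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}

def conserved_neighbors(genomes):
    """Gene pairs that are adjacent in every genome of a non-empty dict."""
    pair_sets = [adjacent_pairs(order) for order in genomes.values()]
    common = set.intersection(*pair_sets)
    return sorted(tuple(sorted(pair)) for pair in common)

# Hypothetical gene orders in three genomes.
genomes = {
    "genome1": ["a", "b", "c", "d"],
    "genome2": ["c", "a", "b", "d"],
    "genome3": ["d", "b", "a", "c"],
}

print(conserved_neighbors(genomes))  # [('a', 'b')]
```

Requiring adjacency in every genome is the strictest possible criterion; a practical variant might instead count in how many genomes a pair is adjacent and apply a threshold.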
Precision (called specificity in some older bioinformatics literature) = TP/(TP+FP),
Sensitivity = TP/(TP+FN),
where TP = true positive (prediction),
FP = false positive,
TN = true negative,
and FN = false negative.
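The three scores defined above can be computed directly from the four counts. The following is a minimal sketch; the function name and the example counts are made up for illustration.

```python
def ppip_scores(tp, fp, tn, fn):
    """Accuracy, precision and sensitivity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
    }

# Hypothetical benchmark: 40 TP, 10 FP, 45 TN, 5 FN.
scores = ppip_scores(tp=40, fp=10, tn=45, fn=5)
print(scores["accuracy"], scores["precision"])  # 0.85 0.8
```

Because non-interacting pairs vastly outnumber interacting ones in a real proteome, accuracy alone can look high even for a poor predictor, which is one reason to report precision and sensitivity alongside it.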
 
=== Classification methods ===
This standard scoring method will prove useful when comparing PPIP algorithms.
Classification methods use data to train a program (classifier) to distinguish positive examples of interacting protein/domain pairs from negative examples of non-interacting pairs. Popular classifiers used are Random Forest Decision (RFD) and Support Vector Machines. RFD produces results based on the domain composition of interacting and non-interacting protein pairs. When given a protein pair to classify, RFD first creates a representation of the protein pair in a vector.{{r|Chen}} The vector contains all the domain types used to train RFD, and for each domain type the vector also contains a value of 0, 1, or 2. If the protein pair does not contain a certain domain, then the value for that domain is 0. If one of the proteins of the pair contains the domain, then the value is 1. If both proteins contain the domain, then the value is 2.{{r|Chen}} Using training data, RFD constructs a decision forest, consisting of many decision trees. Each decision tree evaluates several domains, and based on the presence or absence of interactions in these domains, makes a decision as to whether the protein pair interacts. The vector representation of the protein pair is evaluated by each tree to determine if they are an interacting pair or a non-interacting pair. The forest tallies up all the input from the trees to come up with a final decision.{{r|Chen}} The strength of this method is that it does not assume that domains interact independently of each other. This allows multiple domains in proteins to be used in the prediction.{{r|Chen}} This is a big step up from previous methods which could only predict based on a single domain pair. The limitation of this method is that it relies on the training dataset to produce results. Thus, usage of different training datasets could influence the results.
A caveat of most methods is the lack of negative data (i.e. known non-interacting protein pairs), which can be mitigated using topology-driven negative sampling.<ref>{{Citation |last=Chatterjee |first=Ayan |title=Topology-Driven Negative Sampling Enhances Generalizability in Protein-Protein Interaction Prediction |date=2024-04-29 |url=https://www.biorxiv.org/content/10.1101/2024.04.27.591478v1 |access-date=2024-05-04 |language=en |doi=10.1101/2024.04.27.591478 |last2=Ravandi |first2=Babak |last3=Philip |first3=Naomi H. |last4=Abdelmessih |first4=Mario |last5=Mowrey |first5=William R. |last6=Ricchiuto |first6=Piero |last7=Liang |first7=Yupu |last8=Ding |first8=Wei |last9=Mobarec |first9=Juan C.|doi-access=free }}</ref>
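The 0/1/2 vector representation and the final tally of tree decisions described above can be sketched as follows. Only the encoding and the vote are shown; training the decision trees is left out, and the function names, the domain list and the example votes are assumptions made for illustration rather than the published implementation.

```python
def encode_pair(domains_a, domains_b, domain_types):
    """Encode a protein pair as one value per domain type.

    0 = neither protein has the domain, 1 = exactly one has it,
    2 = both have it.
    """
    a, b = set(domains_a), set(domains_b)
    return [int(d in a) + int(d in b) for d in domain_types]

def majority_vote(tree_votes):
    """Combine per-tree decisions (True = interacting) by simple majority."""
    votes = list(tree_votes)
    return sum(votes) > len(votes) / 2

# Hypothetical domain vocabulary and two proteins.
domain_types = ["SH2", "SH3", "kinase", "PDZ"]
vector = encode_pair({"SH2", "SH3"}, {"SH3", "kinase"}, domain_types)
print(vector)  # [1, 2, 1, 0]

# Hypothetical decisions from five trained trees for this vector.
print(majority_vote([True, True, False, True, False]))  # True
```

Ties are treated here as "non-interacting", which is an arbitrary choice for the sketch.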
Computer simulation for prediction of protein function, which is our primary goal, is accomplished using a series of steps. First of all, the genome is sequenced; then Open Reading Frames (ORFs) are found; then pre-processing of mRNA is simulated; followed by PPIP which produces protein-protein interaction maps; from which, using known functional information, unknown protein function can be determined. When the interaction maps become sufficiently accurate more proteins will have more of their functions determined.
As algorithms, equations, and template solutions have many applications, it is natural to inquire across disciplines for solutions to similar types of problems. The successful algorithms used in proteomics use methods that have previously been in use in other fields. In the face of Occam's razor, computational algorithms rely on complexity to improve speed and accuracy.
 
=== Inference of interactions from homologous structures===
===Method Analysis===
This group of methods{{r|Aloy}}{{r|Chen}}{{r|Fukuhara}}{{r|Kittichotirat}}{{r|Ibis}} makes use of known protein complex structures to predict and structurally model interactions between query protein sequences. The prediction process generally starts by employing a sequence based method (e.g. [[Interolog]]) to search for protein complex structures that are homologous to the query sequences. These known complex structures are then used as templates to structurally model the interaction between query sequences. This method has the advantage of not only inferring protein interactions but also suggesting models of how proteins interact structurally, which can provide some insights into the atomic level mechanism of that interaction. On the other hand, the ability of these methods to make a prediction is constrained by a limited number of known protein complex structures.
The '''Dynamics Method''' is the simplest brute-force approach suggested by Occam’s razor. It performs PPIP using the same rules as the real system by simulating the dynamics of every force on every atom in two proteins of interest in order to predict first folding, and then interaction. It then does the same for every potential protein pair combination in the genome. Although hypothetically accurate, the Dynamics Method is impossible in practice due to its massive computational requirements, which would take an impractically long time to satisfy.
However, the unworkable Dynamics Method can be broken up into two smaller sub-problems, Folding and Docking. The most effective Folding Prediction Method predicts protein folding structures using a reasonable amount of computational time by using statistical substitution, followed by tweaking. Statistical substitution involves folding a small number of amino acids or residues by using the previously observed statistically dominant folding configuration. Tweaking is similar to heating the structure in that it introduces small random changes and selects those that have the lowest energy states. Computing the energy state of a protein has proven to be accurate and computationally intensive but reasonable for folding verification purposes [93]. The disadvantage of this Folding method is that it is still too slow to run on a genome wide scale, and it is not accurate with atypical structures. The advantage is that it will be improved as more folding conformations are verified. Related articles include [3, 8, 22, 23, 32, 34, 53, 62, 65, 84, 85, 93, 119, 124, 131, and 133].
Once protein folding has been successfully modeled, '''Docking Prediction Method'''s are the next logical step. To simplify the dynamics of docking, Binary docking methods find potentially active sites on a single folded protein structure and match them to active sites on a second protein using pattern recognition software or geometric hashing algorithms. Conserved domains are observed [52] and used to imply potential binding partners because surface complementarity between interacting protein sites is high. It is of interest that an average of 18 water molecules per interface is observed; this should be considered when designing a complementarity or docking test [4]. Multiple protein dockings are also being accurately predicted by current methods. CAPRI (Critical Assessment of PRediction of Interactions) lists successful competing methods [135]. Docking methods are not ideal as a genome-wide solution because they rely on folding information that is not available for much of the genome. Also, they are computationally fast enough only for single-interaction prediction, and even then reliable accuracy has not been achieved. Related articles include [2, 4, 21, 39, 40, 41, 42, 47, 52, 63, 68, 75, 76, 78, 89, 94, 101, 109, 128, 135, and 144].
'''Sequence Method'''s attempt to avoid the modelling of folding and docking altogether by using direct pattern recognition of the binding sequences. Instead of relying on active sites, these methods use sequence domains and their organization for PPIP. The domains of both proteins are compared to observed domain interactions in order to calculate interaction probability. Alternatively, direct comparison of domains with a BLAST-like technique is moderately successful. Conservation of domain is a good (82% accurate) indicator of active site location [23]. Although sequence methods are fast enough to be used as genome-wide tools, they suffer from problems identifying accurate patterns because the use of domains to represent the protein is possibly an oversimplification. This results in oversimplified patterns in the implementations. Related articles include [25, 29, 35, 49, 69, 87, 97, 98, 111, 125, 126, 136, and 153].
The '''Graph Learning Method''' improves on the sequence method and its problems by programming a computer to learn what attributes are important for PPIP by identifying patterns in observed interactions. It then uses these attribute patterns for PPIP. The graph learning method creates a decision tree which deals with the complex pattern rules by identifying relevant sequence information and making an additive or RMSD-weighted graph from a training set. The result is similar to a regular expression or automaton. Relatively little of the sequence variation of a protein is important for PPIP [119]; therefore, flexibility must be built into the graph, often using generalities such as domain or residue charges instead of atomic properties. Defining the base attributes to use is of vital importance. Then, based on the learned rules encoded in the graph, a probability is assigned to new potential interactions. The use of graphs can reduce prediction time dramatically by only checking information relevant to the already analyzed data. It can be implemented similarly to a binary search tree, with an Ω(log ''n'') search time and an ''mn''<sup>2</sup> creation time, where ''n'' is the number of significant attributes and ''m'' is the data set size. The use of random sampling is intuitively appropriate to reduce the computational demands of the graph creation. Random decision trees, Adaptive Boosting or other classification algorithms are used to find relevant features. Unfortunately, the graphs can get very complex and pattern detail might be missed by the simplicity of base attribute choices or uneven random sampling. The use of sequence domains as a base attribute can lead to incorrectly identified domain fragments and often overlooks relevant information between interactions.
The Graph Learning Method can use amino acids or amino acid attributes (polarity, character, size, shape and charge) as base attributes but sometimes will simplify protein representation by using residues as a base attribute. Related articles include [7, 28, 29].
The '''Vector Learning Method''' is an alternative to the Graph Learning Method and is currently competing for the title of most efficient method. Both machine learning methods are probably of equal potential. A training set is mapped to an n-dimensional space where successful combinations of residues or amino acids are represented in a hyperspace. Each piece of the pattern or residue attribute is mapped to a separate dimension (“vectorization”). Unlike normal two dimensional (latitude and longitude) city maps, protein pattern maps are most effective when using more than 20 dimensions. If a potential protein pair lies within the space identified as successful, an interaction is predicted. (Similarly, if an address is mapped to a residential zone, it is likely to be a residence.) Support Vector Machines (SVMs), clustering, and other spatial approaches are often used as successful implementations of n-space mapping. The vector learning method leans towards a parallel approach but is often implemented on a regular CPU: this hints at Ω(1) PPIP n-space. It is not intuitive for most people to think in more than three dimensions; therefore, it becomes difficult to grasp why a particular interaction is probable. Advantageously, the complex pattern rules are “learned” in an inclusive and possibly exhaustive way. SVMs are interesting because when faced with an additional problem or complication they just add another dimension to handle it; however, the number of dimensions is directly related to computational cost. The vector learning tools available are widely popular and are implemented in the methods of docking, folding, and many other pattern recognition problems. Related articles include [6, 7, 9, 11, 20, 26, 28, 36, 37, 44, 50, 58, 59, 64, 79, 81, 82, 83, 98, 100, 106, 107, 114, 126, 132, 138, 139, 145, 146, 147, 148, and 152].
Because a large amount of work has been done on interactomes, the '''Evolutionary Method''' is becoming a practical speedup method. It uses the data from PPIPs or experimentally verified interaction maps to infer protein interaction for evolutionarily related organisms. This is a speedy method to create new interactomes and its accuracy is relatively high because many organisms are highly related. Unfortunately, it relies on good databases and knowledge of orthologs, neither of which is widely available at the present time. Related articles include [23, 25, 26, 37, 45, 49, 50, 52, 94, 111, 118, 119, 136, 139, 147, and 152].
 
=== Association methods ===
===Result Presentation===
Association methods look for characteristic sequences or motifs that can help distinguish between interacting and non-interacting pairs. A classifier is trained by looking for sequence-signature pairs where one protein contains one sequence-signature, and its interacting partner contains another sequence-signature.{{r|Sprinzak}} They look specifically for sequence-signatures that are found together more often than by chance. This uses a log-odds score which is computed as log<sub>2</sub>(''P<sub>ij</sub>''/''P<sub>i</sub>P<sub>j</sub>''), where ''P<sub>ij</sub>'' is the observed frequency of domains ''i'' and ''j'' occurring in one protein pair; ''P<sub>i</sub>'' and ''P<sub>j</sub>'' are the background frequencies of domains ''i'' and ''j'' in the data. Predicted domain interactions are those with positive log-odds scores and also having several occurrences within the database.{{r|Sprinzak}} The downside of this method is that it looks at each pair of interacting domains separately, and it assumes that they interact independently of each other.
Displaying results can be problematic [117] because of the volumes of data generated (note resemblance to a hairball); therefore, it should be organised in a hierarchical manner, or interaction "tree". The two best approaches to date are, first, to display only one or two interaction links deep of a hierarchy at a time [108] and, second, to assign the highly interactive (hub) proteins to be the roots of the interaction trees [108]. This creates better groupings of functionally and spatially related proteins, making for a more easily interpreted interactome.
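The log-odds score can be made concrete with a small sketch. The exact way the frequencies are estimated is an assumption here: the joint frequency is taken as the fraction of interacting protein pairs in which the two signatures occur on opposite partners, and the background frequency as the fraction of proteins in the data set that carry a signature. The function name, the <code>min_count</code> parameter and the toy data are hypothetical.

```python
import math
from collections import Counter

def log_odds_scores(protein_pairs, min_count=1):
    """Score signature pairs by log2(P_ij / (P_i * P_j)).

    protein_pairs is a list of (signatures_of_A, signatures_of_B) for
    known interacting proteins A and B. Pairs of signatures seen in
    fewer than min_count protein pairs are skipped.
    """
    n_pairs = len(protein_pairs)
    n_proteins = 2 * n_pairs
    single = Counter()
    joint = Counter()
    for sigs_a, sigs_b in protein_pairs:
        single.update(set(sigs_a))
        single.update(set(sigs_b))
        # Count each unordered signature pair once per protein pair.
        joint.update({tuple(sorted((i, j))) for i in sigs_a for j in sigs_b})
    scores = {}
    for (i, j), count in joint.items():
        if count < min_count:
            continue
        p_ij = count / n_pairs
        p_i = single[i] / n_proteins
        p_j = single[j] / n_proteins
        scores[(i, j)] = math.log2(p_ij / (p_i * p_j))
    return scores

# Hypothetical interacting pairs, each protein carrying one signature.
pairs = [({"X"}, {"Y"}), ({"X"}, {"Y"}), ({"X"}, {"Z"}), ({"W"}, {"Z"})]
scores = log_odds_scores(pairs, min_count=2)
predicted = {pair for pair, s in scores.items() if s > 0}
print(predicted)  # {('X', 'Y')}
```

The <code>min_count</code> filter stands in for the requirement of several occurrences: without it, a signature pair seen only once, such as W with Z in this toy data, can receive the highest score.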
The main goal of proteomics is to predict the structures, interactions and functions of the proteins [29]. Specific function is only found through interactions. Because structures are primarily used to help find interactions, the prediction of protein-protein interactions is of vital interest in proteomics.
 
=== Identification of structural patterns ===
===Future possibilities===
This method{{r|Aytuna}}{{r|Ogmen}} builds a library of known protein–protein interfaces from the [[Protein Data Bank|PDB]], where the interfaces are defined as pairs of polypeptide fragments that lie within a distance threshold slightly larger than the [[Van der Waals radius]] of the atoms involved. The sequences in the library are then clustered based on structural alignment and redundant sequences are eliminated. The residues that have a high (generally >50%) level of frequency for a given position are considered hotspots.{{r|Keskin}} This library is then used to identify potential interactions between pairs of targets, provided that they have a known structure (i.e. present in the [[Protein Data Bank|PDB]]).
All the methods reviewed are plagued with false positives and false negatives because the algorithms used are not accurate enough, possibly because their training sets or concepts are being tainted by not considering multiple interactions. For example, water molecules, which are present at active sites, are normally excluded from the models, yet they have the potential to change the properties of the active sites of the interactions. Normally, only protein pairs are considered because combinations of multiple proteins can be more quickly (and arguably more accurately) predicted from a set of binary interactions [49]. This assumption could also collapse in the cases of soft body interactions, such as in the case of protein activators or repressors, where protein-protein binding requires the presence or absence of a third body for shape change.
Currently, PPIP algorithms can be implemented either in a stringent or in a flexible manner. If stringent, as with many docking methods, then you will get fewer false positives but more false negatives (because of ignoring the influences of a possible third body). If flexible, then you will keep more true positives that biologically rely on the third body for bonding, but you also introduce more false positives.
The methods also suffer because they are incapable of continuously updating themselves: they only "learn" before they generate predictions, when a training set (a list of interactions) is pushed through. In addition, many methods only use one specific approach, although it has been demonstrated [94] that combining two or more approaches increases true positives and reduces false positives.
A method that addresses the weaknesses described above could use a machine learning algorithm for initial selection, and a slower algorithm, consisting of both folding and docking methods, for random and boundary case verifications. By staying up to date on the literature, verifications could also be input from known evolutionary interactome relationships. Although protein localization could be accounted for in machine learning PPIP training, it might be possible to improve PPIP by considering localization separately; current localization techniques are 83.6% accurate [44]. To maximize the accuracy of protein function prediction, more computation is needed on less data, and less computation is needed on more data.
These refined results would continuously [114] teach the machine learning algorithm to modify the pattern, resulting in a more accurate PPIP. As people learn from their mistakes, so we can program an algorithm to learn from its mistakes or even update itself.
It is difficult to include third body interaction in quick methods (SVM or Graphs based) methods, therefore slower verification methods (protein folding and docking) should try to compensate by considering third body interactions. The "quick" methods should be inclusive so that the "slow" methods have a reduced data set to work with. To aid interaction prediction it would be wise for interactomes to include identification of the active site used for each interaction. This can help with the prediction of third body interactions.
The combination of methods described above addresses the four problems inherent in using single methods for PPIP. This approach might push accuracy past the as yet unsurpassed 90% mark. It is theoretically possible to design a vector learning method that achieves 100% accuracy, but even at current accuracy rates the computational methods provide significant insight for speeding up the biological methods of PPIP. Because of conserved proteins and domains, it will become progressively easier to make protein interaction maps of each genome. The advantage of this approach is that interaction maps can be produced quickly and then improved more slowly with a smaller dataset, in contrast to most implementations, which are a one-shot affair. However, it would be desirable to analyze whole libraries of genomes on an ongoing basis as they become available, despite the apparent difficulty of performing on the order of 10<sup>12</sup> interaction tests. The algorithms would need to be run periodically, but if they are to be used as a PPIP server, this is to be expected anyway. There are implementations that use data from known interactions as well as multiple prediction methods, such as [http://mysql5.mbi.ucla.edu/cgi-bin/functionator/pronav Prolinks].
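The two-stage scheme discussed above (an inclusive quick method, followed by slow verification of boundary and randomly sampled cases, with the disagreements fed back for retraining) can be sketched as follows. This is only an illustrative outline under stated assumptions, not an implementation of any published PPIP system: the function names, the <code>threshold</code>, <code>boundary</code> and <code>sample_rate</code> parameters, and the scoring callables are all hypothetical placeholders.

```python
import random


def two_stage_predict(pairs, quick_score, slow_verify,
                      threshold=0.3, boundary=0.1, sample_rate=0.1, rng=None):
    """Quick inclusive filter, then slow checks on boundary and random cases.

    quick_score(pair) -> float in [0, 1]   (hypothetical fast classifier)
    slow_verify(pair) -> bool              (hypothetical folding/docking check)
    Returns the accepted pairs and a feedback list of
    (pair, quick_prediction, slow_verdict) tuples that could be used
    to retrain the quick method.
    """
    rng = rng or random.Random(0)
    accepted, feedback = [], []
    for pair in pairs:
        score = quick_score(pair)
        predicted = score >= threshold
        near_boundary = abs(score - threshold) <= boundary
        # Slow verification is reserved for boundary cases and a random sample.
        if near_boundary or rng.random() < sample_rate:
            verdict = slow_verify(pair)
            feedback.append((pair, predicted, verdict))
            predicted = verdict
        if predicted:
            accepted.append(pair)
    return accepted, feedback
```

A low <code>threshold</code> keeps the quick stage inclusive, as the text recommends; the <code>feedback</code> list is the point at which a continuously updated learner would be retrained.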
 
=== Bayesian network modelling ===
[[Bayesian method]]s{{r|Jansen}} integrate data from a wide variety of sources, including both experimental results and prior computational predictions, and use these features to assess the likelihood that a particular potential protein interaction is a true positive result. These methods are useful because experimental procedures, particularly the yeast two-hybrid experiments, are extremely noisy and produce many false positives, while the previously mentioned computational methods can only provide circumstantial evidence that a particular pair of proteins might interact.<ref>{{cite journal|last1=Zhang|first1=QC|last2=Petrey|first2=D|last3=Deng|first3=L|last4=Qiang|first4=L|last5=Shi|first5=Y|last6=Thu|first6=CA|last7=Bisikirska|first7=B|last8=Lefebvre|first8=C|last9=Accili|first9=D|last10=Hunter|first10=T|last11=Maniatis|first11=T|last12=Califano|first12=A|last13=Honig|first13=B|year=2012|title=Structure-based prediction of protein–protein interactions on a genome-wide scale|journal=Nature|volume=490|issue=7421|pages=556–60|doi=10.1038/nature11503|pmid=23023127|pmc=3482288 |bibcode=2012Natur.490..556Z}}</ref>
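As a rough illustration of how evidence from several sources can be combined, the sketch below uses a naive-Bayes simplification: it assumes the evidence sources are conditionally independent, so that the posterior odds of interaction are the prior odds multiplied by one likelihood ratio per source. The function name and all numerical values are invented for the example and are not taken from the cited studies.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with independent likelihood ratios (naive Bayes).

    prior             -- prior probability that a protein pair interacts
    likelihood_ratios -- one value per evidence source:
                         P(evidence | interacting) / P(evidence | not interacting)
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical pair: 1% prior, supported by two independent evidence sources.
p = posterior_probability(0.01, [10.0, 5.0])
```

Under this simplification, a noisy source contributes a likelihood ratio close to 1 and therefore changes the posterior only slightly, which is how weak or circumstantial evidence is down-weighted.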
 
=== Domain-pair exclusion analysis ===
The domain-pair exclusion analysis{{r|Shoemaker}} detects specific domain interactions that are hard to detect using Bayesian methods. Bayesian methods are good at detecting nonspecific promiscuous interactions but not very good at detecting rare specific interactions. The method calculates an E-score, which measures whether two domains interact. It is calculated as log(probability that the two proteins interact given that the domains interact / probability that the two proteins interact given that the domains do not interact). The probabilities required in the formula are estimated using an Expectation Maximization procedure, a method for estimating parameters in statistical models. High E-scores indicate that the two domains are likely to interact, while low scores indicate that other domains from the protein pair are more likely to be responsible for the interaction. The drawback of this method is that it does not take into account false positives and false negatives in the experimental data.
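The E-score formula above can be written as a short function. This sketch assumes the two conditional probabilities have already been estimated (for example by Expectation Maximization, which is not shown) and uses the natural logarithm; the text does not specify the base, so that choice and the function name are assumptions.

```python
import math


def e_score(p_interact_given_domains, p_interact_without_domains):
    """Log ratio of the two conditional probabilities described in the text.

    p_interact_given_domains   -- P(proteins interact | the domain pair interacts)
    p_interact_without_domains -- P(proteins interact | the domain pair does not)
    Both must be strictly positive.
    """
    return math.log(p_interact_given_domains / p_interact_without_domains)


# Illustrative values only: the domain pair accounts for most of the signal.
score = e_score(0.8, 0.2)
```

A score of zero means the domain pair makes no difference to the predicted protein interaction; positive scores point to the domain pair as the likely mediator.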
 
=== Supervised learning problem ===
The problem of PPI prediction can be framed as a supervised learning problem. In this paradigm the known protein interactions supervise the estimation of a function that can predict whether an interaction exists or not between two proteins given data about the proteins (e.g., expression levels of each gene in different experimental conditions, location information, phylogenetic profile, etc.).
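A minimal sketch of this framing is shown below, assuming each protein pair has already been reduced to a numeric feature vector. It trains a small logistic-regression classifier by stochastic gradient descent; the two features (an expression correlation and a shared-compartment flag), the toy data and the hyperparameters are invented for illustration, and real studies use far richer features and other learners such as SVMs or random forests.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on the log-loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log-loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Estimated probability that the pair described by x interacts."""
    return _sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Toy training set: [expression correlation, same compartment (1) or not (0)]
X = [[0.9, 1], [0.8, 1], [0.7, 1], [0.1, 0], [0.2, 0], [0.3, 1]]
y = [1, 1, 1, 0, 0, 0]  # 1 = known interaction, 0 = assumed non-interaction
weights, bias = train_logistic(X, y)
```

The known interactions play the role of the labels <code>y</code>; choosing reliable negative examples is itself a modelling decision that this sketch does not address.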
 
== Relationship to docking methods ==
The field of protein–protein interaction prediction is closely related to the field of [[protein-protein docking|protein–protein docking]], which attempts to use geometric and steric considerations to fit two proteins of known structure into a bound complex. This is a useful mode of inquiry in cases where both proteins in the pair have known structures and are known (or at least strongly suspected) to interact, but since so many proteins do not have experimentally determined structures, sequence-based interaction prediction methods are especially useful in conjunction with experimental studies of an organism's [[interactome]].
 
== See also ==
===Further information===
*[[Interactome]]
In the absence of an international or concise index of protein–protein interaction prediction, multiple sources can be monitored for information. To find further information about protein interactions, a cross-journal search can be done most effectively at Scholars Portal, or to a more limited degree at [http://www.ncbi.nlm.nih.gov/ PubMed]. Scholars Portal indexes articles that are not medically related, such as those on SVM training techniques and on algorithmic and mathematical optimizations. For hyperspace-related programming tools, visit [http://www.kernel-machines.org/software.html kernel machines]. [http://desktop.google.com/ Google Desktop] is a useful tool for searching the contents of personal PDFs. Because an up-to-date working knowledge of mathematics, programming, chemistry and biology is required for a full understanding of PPIP, it is noteworthy that Wikipedia proved most helpful for terminology definitions and concept clarification. The main journals of interest for PPIP are Nature, Proteins: Structure, Function, and Bioinformatics, and Protein Science.
*[[Protein–protein interaction]]
*[[Protein function prediction]]
*[[Protein structure prediction]]
*[[Protein structure prediction software]]
*[[Gene prediction]]
*[[Macromolecular docking]]
*[[Protein–DNA interaction site predictor]]
*[[Two-hybrid screening]]
*[[FastContact]]
 
== References ==
{{reflist|refs=
<ref name="Dandekar">Dandekar T., Snel B., Huynen M. and Bork P. (1998) "Conservation of gene order: a fingerprint of proteins that physically interact." ''Trends Biochem. Sci.'' (23), 324-328</ref>
<ref name="Enright">Enright A.J., Iliopoulos I., Kyrpides N.C. and Ouzounis C.A. (1999) "Protein interaction maps for complete genomes based on gene fusion events." ''Nature'' (402), 86-90</ref>
<ref name="Marcotte">Marcotte E.M., Pellegrini M., Ng H.L., Rice D.W., Yeates T.O., Eisenberg D. (1999) "Detecting protein function and protein–protein interactions from genome sequences." ''Science'' (285), 751-753</ref>
<ref name="Pazos">{{cite journal|last1=Pazos|first1=F.|last2=Valencia|first2=A.|year=2001|title=Similarity of phylogenetic trees as indicator of protein–protein interaction|journal=Protein Engineering|volume=14|issue=9|pages=609–614 |doi=10.1093/protein/14.9.609|pmid=11707606|doi-access=free}}</ref>
<!-- not being used in article
<ref name="Pellegrini">{{cite journal|last1=Pellegrini|first1=M|last2=Marcotte|first2=EM|last3=Thompson|first3=MJ|last4=Eisenberg|first4=D|last5=Yeates|first5=TO|year=1999|title=Assigning protein functions by comparative genome analysis: protein phylogenetic profiles|journal=Proc Natl Acad Sci U S A|volume=96|pages=4285–8|doi=10.1073/pnas.96.8.4285|pmid=10200254|pmc=16324|bibcode=1999PNAS...96.4285P}}</ref>-->
<ref name="Tan">Tan S.H., Zhang Z., Ng S.K. (2004) "ADVICE: Automated Detection and Validation of Interaction by Co-Evolution." ''Nucleic Acids Res.'', ''32'' (Web Server issue):W69-72.</ref>
<ref name="Aytuna">{{cite journal|last1=Aytuna|first1=A. S.|last2=Keskin|first2=O.|last3=Gursoy|first3=A.|year=2005|title=Prediction of protein–protein interactions by combining structure and sequence conservation in protein interfaces|journal=Bioinformatics|volume=21|issue=12|pages=2850–2855|doi=10.1093/bioinformatics/bti443|pmid=15855251|doi-access=free}}</ref>
<ref name="Ogmen">{{cite journal|last1=Ogmen|first1=U.|last2=Keskin|first2=O.|last3=Aytuna|first3=A.S.|last4=Nussinov|first4=R.|last5=Gursoy|first5=A.|year=2005|title=PRISM: protein interactions by structural matching|journal=Nucleic Acids Res.|volume=33|issue=Web Server issue|pages=W331–336|doi=10.1093/nar/gki585|pmid=15991339|pmc=1160261|doi-access=free}}</ref>
<ref name="Keskin">{{cite journal|last1=Keskin|first1=O.|last2=Ma|first2=B.|last3=Nussinov|first3=R.|year=2004|title=Hot regions in protein–protein interactions: The organization and contribution of structurally conserved hot spot residues|journal=J. Mol. Biol.|volume=345|issue=5|pages=1281–1294|doi=10.1016/j.jmb.2004.10.077|pmid=15644221}}</ref>
<ref name="Jansen">{{cite journal|last1=Jansen|first1=R|last2=Yu|first2=H|last3=Greenbaum|first3=D|last4=Kluger|first4=Y|last5=Krogan|first5=NJ|last6=Chung|first6=S|last7=Emili|first7=A|last8=Snyder|first8=M|last9=Greenblatt|first9=JF|last10=Gerstein|first10=M|year=2003|title=A Bayesian networks approach for predicting protein–protein interactions from genomic data|journal=Science|volume=302|issue=5644|pages=449–53|doi=10.1126/science.1087361|pmid=14564010|bibcode=2003Sci...302..449J|citeseerx=10.1.1.217.8151|s2cid=5293611}}</ref>
<ref name="Aloy">{{cite journal|last1=Aloy|first1=P.|last2=Russell|first2=R. B.|year=2003|title=InterPreTS: protein Interaction Prediction through Tertiary Structure|journal=Bioinformatics|volume=19|issue=1|pages=161–162|doi=10.1093/bioinformatics/19.1.161|pmid=12499311|doi-access=free}}</ref>
<ref name="Chen">{{cite journal|last1=Chen|first1=XW|last2=Liu|first2=M|year=2005|title=Prediction of protein–protein interactions using random decision forest framework|journal=Bioinformatics|volume=21|issue=24|pages=4394–4400|doi=10.1093/bioinformatics/bti721|pmid=16234318|doi-access=free}}</ref>
<ref name="Fukuhara">Fukuhara, Naoshi, and Takeshi Kawabata. (2008) "HOMCOS: a server to predict interacting protein pairs and interacting sites by homology modeling of complex structures" ''Nucleic Acids Research'', ''36'' (S2): 185-.</ref>
<ref name="Kittichotirat">Kittichotirat W, M Guerquin, RE Bumgarner, and R Samudrala (2009) "Protinfo PPC: a web server for atomic level prediction of protein complexes" ''Nucleic Acids Research'', ''37'' (Web Server issue): 519-25.</ref>
<ref name="PazosRanea">{{cite journal|last1=Pazos|first1=F|last2=Ranea|first2=JA|last3=Juan|first3=D|last4=Sternberg|first4=MJ|year=2005|title=Assessing protein coevolution in the context of the tree of life assists in the prediction of the interactome|journal=J Mol Biol|volume=352|issue=4|pages=1002–1015|doi=10.1016/j.jmb.2005.07.005|pmid=16139301}}</ref>
<ref name="Sprinzak">{{cite journal|last1=Sprinzak|first1=E|last2=Margalit|first2=H|year=2001|title=Correlated sequence-signatures as markers of protein–protein interaction|journal=J Mol Biol|volume=311|issue=4|pages=681–692|doi=10.1006/jmbi.2001.4920|pmid=11518523}}</ref>
<ref name="Shoemaker">{{cite journal|last1=Shoemaker|first1=BA|last2=Panchenko|first2=AR|year=2007|title=Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners|journal=PLOS Comput Biol|volume=3|issue=4|page=e43|doi=10.1371/journal.pcbi.0030043|pmid=17465672|pmc=1857810|bibcode=2007PLSCB...3...43S |doi-access=free }}</ref>
<ref name="Ibis">{{cite journal|last1=Shoemaker|first1=BA|last2=Zhang|first2=D|last3=Thangudu|first3=RR|last4=Tyagi|first4=M|last5=Fong|first5=JH|last6=Marchler-Bauer|first6=A|last7=Bryant|first7=SH|last8=Madej|first8=T|last9=Panchenko|first9=AR|date=Jan 2010|title=Inferred Biomolecular Interaction Server--a web server to analyze and predict protein interacting partners and binding sites|journal=Nucleic Acids Res|volume=38|issue=Database issue|pages=D518–24|pmid=19843613|doi=10.1093/nar/gkp842|pmc=2808861}}</ref>
<!-- not being used
<ref name="Marsh">{{cite journal|last1=Marsh|first1=J|last2=Hernandez|first2=H|last3=Hall|first3=Z|last4=Ahnert|first4=S|last5=Perica|first5=T|last6=Robinson|first6=C|last7=Teichmann|first7=S|year=2013|title=Protein complexes are under evolutionary selection to assemble via ordered pathways|journal=Cell|volume=153|issue=2|pages=461–70|doi=10.1016/j.cell.2013.02.044|pmid=23582331|pmc=4009401}}</ref>-->
}}
 
==External links==
===Online PPIP services===
*[http://openwetware.org/wiki/Protein–protein_interaction_databases Overview of protein interaction databases]
*[http://advice.i2r.a-star.edu.sg/search/pair.php advice]
*[http://www-appn.comp.nus.edu.sg/~bioinfo/bayesprot/bayesprot.htm Bayesian Protein Prediction]
*[http://interdom.lit.org.sg/validate/index_inter.php InterDom]
*[http://www.russell.embl-heidelberg.de/people/patrick/interprets/interprets.html InterPreTS]
*[http://interweaver.i2r.a-star.edu.sg/report/demo.php InterWeaver]
*[http://cbi.labri.fr/outils/ippred/ Ippred]
*[http://ophid.utoronto.ca/ophid/ppi.html OPHID]
*[http://gordion.hpc.eng.ku.edu.tr/prism/predictions.php#online PRISM]
*[http://mysql5.mbi.ucla.edu/cgi-bin/functionator/pronav Prolinks]
*[http://www.protsuggest.org/main.html Protsuggest]
*[http://point.bioinformatics.tw/ POINT]
*[http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi SVMProt]
*[http://string.embl.de/ String]
 
{{Prone to spam|date=December 2015}}
==Bibliography==
<!-- {{No more links}}
*[1] Albert, István. & Albert, Réka. (2004). Conserved Network Motifs Allow Protein-Protein Interaction. Bioinformatics., 20 (18), 3346-3352.
 
*[2] Aloy, Patrick & Russell, Robert B. (2003). InterPreTs Protein Interaction Prediction through Tertiary Structure. Bioinformatics., 19(1), 161-162.
Please be cautious adding more external links.
*[3] Anne Imberty, Veronique Piller, Friedrich Piller and Christelle Breton (1997). Fold recognition and molecular modeling of a lectin-like domain in UDP–GalNAc:polypeptide N-acetylgalactosaminyltransferases. Protein Engineering., 10 (12),1353–1356
 
*[4] Ansari, Sam & Helms, Volkhard. (2005). Statistical Analysis of Predominantly Transient Protein-Protein Interfaces. Proteins: Structure, Function and Bioinformatics., 64, 344-355.
Wikipedia is not a collection of links and should not be used for advertising.
*[5] Boer, D. Roeland., Kroon, Jan., Cole, Jason C., Smith, Barry & Verdonk, Marcel L. (2001). Superstar comparison of CSD and PDB-based interaction fields as a basis for the prediction of protein-ligand interactions. J. Mol. Biol., 312, 275-287.
 
*[6] Bordner, Andrew J. & Abagyan, Ruben. (2005). Statistical analysis and Prediction of Protein-Protein Interfaces. Proteins: Structure, Function and Bioinformatics., 60, 353-366.
Excessive or inappropriate links will be removed.
*[7] Borgwardt, Karsten M., Ong, Cheng Soon., Schönauer, Stefan., Vishwanathan, S. V. N., Smola, Alex J. & Kriegel, Hans-Peter. (2005). Protein function prediction via graph kernels. Bioinformatics., 21(1), i47-i56.
 
*[8] Bowie, James U. (2005). Solving the Membrane Protein- Folding Problem. Nature. 438, 581-589.
See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details.
*[9] Bradford, James R. & Westhead, David R. (2005). Improved Prediction of Protein-Protein Binding Sites Using a Support Vector machines Approach. Bioinformatics., 21(8), 1487-1494.
 
*[10] Brun, Christine., Chevenet, François., Martin, David., Wojcik, Jérôme., Guénoche, Alain & Jacq, Bernard. (2003).Functional Classification of Proteins for the Prediction of Cellular Function from a Protein-Protein Interaction Network. Genome Biology., 5(1)
If there are already suitable links, propose additions or replacements on
*[11] Bunescu, Razvan., Ge, Ruifang., Kate, Rohit J., Marcotte, Edward M., Mooney, Raymond J., Ramani, Arun K. & Wong, Yuk Wah. (2005). Comparative Experiments on Learning Information Extractors for Proteins and Their Interactions. Artificial Intelligence in Medicine., 33, 139-155.
the article's talk page, or submit your link to the relevant category at
*[12] Burguete, Alondra Schweizer., Harbury, Pehr B. & Pfeffer, Suzanne R. (2004). In Vitro Selection and Prediction of TIP47 Protein- Interaction Interfaces. Nature Methods., 1(1), 1-6.
DMOZ (dmoz.org) and link there using {{Dmoz}}.
*[13] Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X. & Chen, Y.Z. (2003). SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Research., 31(13), 3692-3697.
 
*[14] Cai, C.Z., Wang, W.L.,Sun, L.Z & Chen, Y.Z. (2003). Protein Function Classification via Support Vector machine approach. Mathematical Biosciences., 185, 111-122.
-->
*[15] Cai, Yu-Dong & Lin, Shuo Liang. (2003). Support Vector Machines for Predicting rRNA-, RNA-, and DNA-binding proteins from amino acid sequence. Biochimica et Biophysica Acta., 1648, 127-133.
{{Protein methods}}
*[16] Cai, Yu-Dong., Lin, Shuo-Liang & Chou, Kuo-Chen. (2003). Support Vectors Machines for Prediction of Protein Signal Sequences and their Cleavage Sites. Peptides., 24, 159-161.
 
*[17] Cai, Yu-Dong., Liu, Xiao-Jun., Xu, Xue-biao & Chou, Kuo-Chen. (2000). Support Vector machines For Prediction of Protein Subcellular Location. Molecular Cell Biology Research Communications., 4, 230-233.
{{DEFAULTSORT:Protein-protein interaction prediction}}
*[18] Cai, Yu-Dong., Liu, Xiao-Jun., Xu, Xue-Biao & Chou, Kuo-Chen. (2003). Support Vector Machines for Prediction of Protein Domain Structural Class. J. theory. Biol., 221, 115-120.
[[Category:Proteomics]]
*[19] Cai, Yu-Dong., Liu, Xiao-Jun., Xu, Xue-biao., Chou, Kuo-Chen. Prediction of Protein Structural Classes by support vector machines. Computers and Chemistry., 26, 293-296.
*[20] Capriotti, Emidio., Fariselli, Piero., Calabrese, Remo & Casadio, Rita.(2005).Predicting protein stability changes from sequences using support vector machines. Bioinformatics. 21(2), ii54-ii58.
*[21] Carugo, Oliviero & Franzot, Giacomo. (2004). Prediction of Protein-Protein Interactions Based on Surface Patch Comparison. Proteomics., 4, 1727-1736.
*[22] Chattopadhyaya, Rajagopal. & Ghose, Asoke Chandra. (2002). Model of Vibrio cholerae toxin coregulated pilin capable of filament formation. Protein Engineering., 15(4), 297-304.
*[23] Chelliah, Vijayalakshmi., Blundell, Tom & Mizuguchi, Kenji. (2005). Functional Restraints on the Patterns of Amino Acid Substitutions Application to Sequence–Structure Homology Recognition. Proteins: Structure, Function and Bioinformatics., 61, 722-731.
*[24] Chen, Pai-Hsuen., Lin, Chih-Jen & Scholkopf, Bernhard. (2003). A Tutorial on v-Support Vector Machines. Department of Computer Science and Information Engineering, National Taiwan University. 1-29.
*[25] Chen, Xue-wen & Liu, Mei. (2005). Prediction of Protein-Protein Interactions Using Random Decision Forest Framework. Bioinformatics., 1-4.
*[26] Chen, Yu-ching & Hwang, Jenn-Kang. (2005).Prediction of Disulfide Connectivity From Protein Sequences. Proteins: Structure, Function and Bioinformatics., 61, 507-512.
*[27] Chen, Yu-Ching., Lin, Yeong-Shin., Lin, Chih-Jen & Hwang, Jenn-Kang. (2004). Prediction of the Bonding States of Cysteines Using the Support Vector machines Based on Multiple Feature Vectors and Cysteine State Sequences. Proteins: Structure, Function and Bioinformatics., 55, 1036-1042.
*[28] Cheng, Betty Yee Man., Carbonell, Jaime G. & Klein-Seetharaman, Judith. (2005).Protein Classification Based on Text Document Classification Techniques. Proteins: Structure, Function and Bioinformatics., 58, 955-970.
*[29] Chinnasamy, Arunkumar., Mittal, Ankush & Sung, Wing-Kin. (2005). Probabilistic prediction of protein–protein interactions from the protein sequences. Computers in Biology and Medicine., 1-12.
*[30] Chmiel, Agnieszka., Radlinska, Monika., Pawlak, Sebastion D., Krowarsch, Daniel., Bujnicki, Janusz M. & Skowronek, Krzysztof J. (2005). A Theoretical Model of Restriction Endonuclease NiaIV in Complex with DNA, Predicted by Fold Recognition and Validated by Site-Directed Mutagenesis and Circular Dichroism Spectroscopy. Protein Engineering, Design & Selection., 18(4), 181-189.
*[31] Choulier, Laurence., Andersson, Karl., Hämäläinen, Markku D., Regenmortel, Marc H.V. van., Malmqvist, Magnus & Altschuh, Danièle. (2002). QSAR studies applied to the prediction of antigen–antibody interaction kinetics as measured by BIACORE. Protein Engineering., 15 (5), 373-382.
*[32] Chung-Jung Tsai and Ruth Nussinov (2001). The building block folding model and the kinetics of protein Folding. Protein Engineering., 14(10), 723–733
*[33] Craig, P.O., Berguer, P.M., Ainciart, N., Zylberman, V., Thomas, M.G., Martinez Tosar, L.J., Bulloj, A., Boccaccio, G.L. & Goldbaum, F.A. (2005). Multiple Display of Protein Domain on a Bacterial Polymeric Scaffold. Proteins: Structure, Function and Bioinformatics., 61, 1089-1100.
*[34] Deprez, Paola & Inestrosa, Nibaldo. (2000). Molecular Modeling of the Collagen-like Tail of Asymmetric Acetylcholinesterase. Protein Engineering.,13(1), 27-34.
*[35] Ding, Chris H.Q. & Dubchak, Inna. (2001). Multi-class Protein Fold Recognition using Support Vector Machines and Neural Networks. Bioinformatics., 17(4), 349-358.
*[36] Dobson, Paul D. & Doig, Andrew J. (2005). Predicting Enzyme Class from Protein Structure Without Alignments. J. Mol. Biol., 345, 187-199.
*[37] Dubey, Anshul., Realff, Matthew J., Lee, Jay H. & Bommarius, Andreas S. (2005).Support vector machines for learning to identify the critical positions of a protein. Journal of Theoretical Biology., 234, 351-361.
*[38] Eisenhaber, Birgit., Eisenhaber, Frank., Maurer-Stroh, Sebastian & Neuberger, Georg. (2004). Prediction of sequence signals for lipid post-translational modifications: Insights from case studies. Proteomics., 4, 1614-1625.
*[39] English, Andrew C., Groom, Colin R. & Hubbard, Roderick E. (2001). Experimental and Computational Mapping of the Binding Surface of a Crystalline Protein. Protein Engineering., 14(1), 47-59.
*[40] Fariselli, Piero., Pazos, Florencio., Valencia, Alfonso & Casadio, Rita. (2002). Prediction of Protein-Protein Interaction Sites in Heterocomplexes with Neural Networks. Eur. J. Biochem., 269, 1356-1361.
*[41] Fernández- Recio, Juan., Totrov, Maxim & Abagyan, Ruben. (2004). Identification of Protein-Protein Interaction Sites from Docking Energy Landscapes. Journal of Molecular Biology., 335, 843- 865.
*[42] Fernández-Recio, Juan., Totrov, Max., Skorodumov, Constantin & Abagyan, Ruben. (2005). Optimal Docking Area: A New Method For Predicting Protein-Protein Interaction Sites. Proteins: Structure, Function and Bioinformatics., 58, 134-143.
*[43] Franzot, Giacomo & Carugo, Oliviero.(2004). Computational Approaches to Protein-Protein Interaction. Journal of Structural and Functional Genomics., 4, 245-255.
*[44] Gao, Qing-Bin & Wang, Zheng-Zhi. (2005). Using Nearest Feature Line and Tunable Nearest Neighbor methods for prediction of protein subcellular locations. Computational Biology and Chemistry., 29, 388-392.
*[45] Gomez, Manuel., Alonso-Allende, Ramón., Pazos, Florencio., Grana, Osvaldo., Juan, David.& Valencia, Alfonso. (2004). Accessible Protein Interaction Data for Network Modeling. Structure of the information and available repositories. Structural Bioinformatics group, Imperial College.
*[46] Gomez, Shawn M. & Rzhetsky, Andrey. (2002).Towards the Prediction of Complete Protein- Protein Interaction Networks. Columbia Genome Center, Department of Medical Informatics, Columbia University.
*[47] Gottschalk, Kay-Eberhard., Neuvirth, Hani & Schreiber, Gideon. (2004). A Novel Method for Scoring of Docked Protein Complexes Using Predicted Protein-Protein Binding Sites. Protein Engineering, Design & Selection., 17(2), 183-189.
*[48] Guo,Ting., Shi, Yanxin & Sun, Zhirong. (2005). A novel statistical ligand-binding site predictor: application to ATP-binding sites. Protein Engineering, Design & Selection., 18(2), 65-70.
*[49] Han, Dong-soo., Kim, Hong-soo., Jang, Wong-Hyuk., Lee, Sung-Doke & Suh, Jung-Keun. (2004). PreSPI: a domain combination based prediction system for protein–protein interaction. Nucleic Acids Research., 32(21), 6312-6320.
*[50] Han, Sangjo., Lee, Byung-chui, Yu, Seung Taek., Jeong, Chan-seok., Lee, Soyoung & Dongsup, Kim. (2005). Fold Recognition by combining profile-profile alignment and support vector machine. Bioinformatics., 21(11), 2667-2673.
*[51] Hermjakob, Henning., Motecchi- Palazzi, Luisa., Bader, Gary., Wojcik, Jérôme., Salwinski, Lukasz., Ceol, Arnaud et al. (2004). The HUPO PSI's Molecular Interaction format—a community standard for the representation of protein interaction data. Nature Biotechnology., 22(2) 177-183.
*[52] Heuser, Phillipp., Baù, Davide., Benkert, Pascal & Schomberg, Dietmar. (2005). Refinement of Unbound Protein Docking Studies Using Biological Knowledge. Proteins: Structure, Function and Bioinformatics., 61, 1059-1067.
*[53] Hirokawa, Takatsugu., Uechi, Junichi., Sasamoto, Hiroyuki., Suwa, Makiko & Mitaku, Shigeki. (2000). A Triangle Lattice Model that Predicts Transmembrane Helix Configuration using a Polar Jigsaw Puzzle. Protein Engineering., 13(11), 771-778.
*[54] Ho, Tin Kam. (1998). The Random Subspace Method for Constructing Decision Forests. IEEE Transactions on Pattern Analysis and Machine Intelligence., 20(8), 832- 842.
*[55] Horváth, Gábor V., Pettkó- Szandtner, Aladár., Nikovics, Krisztina., Bilgin, Metin., Boulton, Margaret., Davies, Jeffery W., Gutiérrez, Crisanto & Dudits, Dénes. (1998). Prediction of functional regions of the maize streak virus replication-associated proteins by protein-protein interaction analysis. Plant Molecular Biology., 38, 699-712.
*[56] Hsu, Chih-Wei., Chang, Chih-Chung & Lin, Chih-Jen. (2003). Practical guide to Support Vector Classification. Department of Computer science and Information Engineering, National Taiwan University, 1-12.
*[57] Hu, Hai., Columbus, John., Zhang, Yi., Wu, Dongying., Lian, Lubing., Yang, Song., Goodwin, Jennifer., Luczak, Christine., Carter, Mark., Chen, Lin., James, Michael., Davis, Roger., Sudol, Marius., Rodwell, John & Herrero, Juan J. (2004). A Map of WW Domain Family Interactions. Proteomics., 4,643-655.
*[58] Huang, Jing & Shi, Feng. (2004). Support vector machines for predicting Apoptosis Proteins Types. Acta Biotheoretica., 53, 39-47.
*[59] Huang, Ni., Chen, Hu & Sun, Zhirong. (2005). CTKPred: an SVM-based method for the prediction and classification of the cytokine superfamily. Protein Engineering, Design & Selection. 18(8), 365-368.
*[60] Ito, Takashi., Chiba, Tomoko., Ozawa, Ritsuko., Yoshida, Mikio., Hattori, Masahira & Sakaki, Yoshiyuki. (2001). A Comprehensive Two- Hybrid Analysis to Explore the Yeast Protein Interactome. Proceedings of National Academy of Science., 98(8), 4569-4574.
*[61] Jaffe, Jacob D., Berg, Howard C. & Church, George M. (2003). Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics, 4, 59-77.
*[62] Jiang, Fan. (2003). Prediction of Protein Secondary Structure with a Reliability Score Estimated by Local Sequence Clustering. Protein Engineering., 16(9), 651-657.
*[63] Jones, S. & Thornton, JM. (1997). Prediction of Protein-Protein Interaction Sites using Patch Analysis. J. Mol. Biol., 272(1), 133-143.
*[64] Kikuchi, Tomonori & Abe, Shigeo. (2005). Comparison Between Error Correcting output Codes and Fuzzy Support Vector Machines. Pattern Recognition Letters., 26, 1937-1945.
*[65] Kim, Hyunsoo & Park, Haesum. (2004). Prediction of Protein Relative Solvent Accessibility with Support Vector Machines and Long-Range Interaction 3D Local Descriptor. Proteins: Structure, Function, and Bioinformatics., 54, 557-562.
*[67] Kim, Moon Kyu., Kim, Eun Sook., Kim, Dong Soo., Choi, In-Hong., Moon, Taesung, Yoon, Chang No & Shin, Jeon-Soo. (2004). Two novel mutations of Wiskott Aldrich syndrome the molecular prediction of interaction between the mutated WASP L101P with WASP interacting protein by molecular modeling. Biochimica et Biophysica Acta., 1690, 134-140.
*[68] Kim, Wan Kyu & Ison, Jon C. (2005). Survey of the Geometric Association of Domain-Domain Interfaces. Proteins., 61(4), 1075- 1088.
*[69] Kim, Wan Kyu., Park, Jong & Suh, Jung, Keun. (2002). Large Scale Statistical Prediction of Protein-Protein Interaction by Potentially Interacting Domain (PID) Pair. Genome Informatics., 13, 42-50.
*[70] Koike, Asako & Toshihisa, Takagi. (2004). Prediction of Protein-Protein Interaction Sites using Support Vector Machines. Protein Engineering, Design & Selection., 17(2), 165-173.
*[71] Kortemme, Tanja & Baker, David. (2003). Computational Design of Protein-Protein Interactions. Current Opinion in Chemical Biology., 8, 91-97.
*[72] Kortemme, Tanja., Joachimiak, Lukasz A., Bullock, Alex N., Schuler, Aaron D., Stoddard, Barry L. & Baker, David. (2004). Computational Redesign of Protein-Protein Interaction Specificity. Nature Structural & Molecular Biology., 11(4), 371-379.
*[73] Küster, Bernhard., Mortensen, Peter., Andersen, Jens S. & Mann, Matthias. (2001). Mass spectrometry allows direct identification of proteins in large genomes. Proteomics., 1, 641-650.
*[74] Lappe, Michael & Holm, Liisa. (2004).Unravelling Protein Interaction Networks with Near Optimal Efficiency. Nature Biotechnology., 22(1), 98-103.
*[75] Lee, CY., Yang, PK.,Tzou, WS. & Hwang, MJ. (1998). Estimates of Relative Binding Free Energies for HIV Protease Inhibitors Using Different Levels of Approximations. Protein Engineering., 11(6), 429-437.
*[76] Li, Chun Hua., Ma, Xiao Hui., Chen, Wei Zu & Wang, Cun Xin. (2003). A Protein-Protein Docking Algorithm Dependent on the Type of Complexes. Protein Engineering., 16(4), 265-269.
*[77] Li, Hui., Robertson, Andrew D. & Jensen, Jan H. (2005). Very Fast Empirical Prediction and Rationalization of Protein pKa Values. Proteins: Structure, Function and Bioinformatics., 61, 704- 721.
*[78] Liang, Shide., Zhang, Jian., Zhang, Shicui and Guo, Huarong. (2004).Prediction of the Interaction Site on the Surface of an Isolated Protein Structure by Analysis of Side Chain Energy Scores. Proteins: Structure, Function and Bioinformatics., 57, 548-557.
*[79] Lin, Yi., Lee, Yoonkyung & Wahba, Grace. (2002). Support Vector Machines for Classification in Nonstandard Situations. Machine Learning., 46, 191-202.
*[80] Lindauer, Klaus., Loerting, Thomas., Liedl, Klaus R. Kroemer, Romano T. (2001). Prediction of the Structure of Human Janus Kinase 2 (JAK2) Comprising of two Carboxy-terminal Reveals a Mechanism for Autoregulation. Protein Engineering., 14(1), 27-37.
*[81] Ling Lo, Siaw., Cai Zhong, Cong., Chen, Yu Zong & Chung, Maxey C. M. (2005). Effect of training datasets on support vector machine prediction of protein-protein interactions. Proteomics., 5, 876-884.
*[82] Lu, Wencong., Dong, Ning & Gábor Náray-Szabó. (2005). Predicting Anti-HIV-1 Activities of HEPT-analog Compounds by Using Support Vector Classification. QSAR Comb. Sci., 24, 1021-1025.
*[83] Lubec, Gert., Afjehi-Sadat, Leila., Yang, Jae-Won & John, Julius Paul Pradeep. (2005). Searching For Hypothetical Proteins: Theory and Practice Based Upon Original Data and Literature. Progress in Neurobiology., 77, 90-127.
*[84] Gromiha, Michael M., Oobatake, Motohisa., Kono, Hidetoshi., Uedaira, Hatsuho & Sarai, Akinori. (1999). Role of structural and sequence information in the prediction of protein stability changes: comparison between buried and partially buried mutations. Protein Engineering., 12(7), 549–555.
*[85] Gromiha, Michael M. & Selvaraj, S. (1998). Protein secondary structure prediction in different structural classes. Protein Engineering., 11(4), 249–251.
*[86] Mahn, Andrea & Asenjo, Juan A. (2005). Prediction of Protein Retention in Hydrophobic Interaction Chromatography. Biotechnology Advances., 23, 359-368.
*[87] Mamitsuka, Hiroshi. (2004). Essential Latent Knowledge for Protein-Protein Interactions: Analysis by an Unsupervised Learning Approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics., 2(2), 119-130.
*[88] Mandel-Gutfreund, Yael & Margalit, Hannah. (1998). Quantitative parameters for amino acid–base interaction: implications for prediction of protein–DNA binding sites. Nucleic Acids Research., 26(10), 2306-2312.
*[89] Mandell, Jeffery G., Roberts, Victoria A., Pique, Michael E., Kotlovyi, Vladimir., Mitchell, Julie C., Nelson, Erik., Tsigelny, Igor & Ten Eyck, Lynn F. (2001). Protein Docking Using Continuum Electrostatics and Geometric Fit. Protein Engineering., 14(2), 105-113.
*[90] Margulies, Marcel., Egholm, Michael., Altman, William E., Attiya, Said., Bader, Joel S., Bemben, Lisa A., Berka, Jan., Braverman, Michael S., Chen, Yi-Ju., Chen, Zhoutao., Dewell, Scott B., Du, Lei., Fierro, Joseph M. et al. (2005). Genome sequencing in microfabricated high-density picolitre reactors. Nature., 437, 376-380.
*[91] Markowetz, Florian., Edler, Lutz & Vingron, Martin. (2003). Support Vector Machines For Protein Fold Class Prediction. Biometrical Journal., 45(3), 377-389.
*[92] Mayordomo, Isabel & Sanz, Pascual. (2002). The Saccharomyces cerevisiae 14-3-3 protein Bmh2 is Required for Regulation of Phosphorylation Status of Fin1, a Novel Intermediate Protein. Biochemical Journal., 365, 51-56.
*[93] Mönnigmann, M. & Floudas, C.A. (2005). Protein Loop Structure Prediction With Flexible Stem Geometries. Proteins: Structure, Function and Bioinformatics., 61, 748-762.
*[94] Mooney, Sean D., Liang, Mike Hsing-Ping., Deconde, Rob & Altman, Ross B. (2005). Structural Characterization of Proteins Using Residue Environments. Proteins: Structure, Function and Bioinformatics., 61, 741-747.
*[95] Nabieva, Elena., Jim, Kam., Agarwal, Amit., Chazelle, Bernard & Singh, Mona. (2005). Whole-proteome Prediction of Protein Function Via Graph-Theoretic Analysis of Interaction Maps. Bioinformatics., 21, 302-310.
*[96] Nagl, Sylvia B., Das, Sudeshna & Smith, Temple F. (2000). Prediction of Interaction Partners for Orphan Nuclear Receptors by Prior-based Protein Sequence Profiles. Journal of Molecular Recognition., 13, 117-126.
*[97] Nanni, Loris. (2005). Fusion of Classifiers for Predicting Protein-Protein Interactions. Neurocomputing., 68, 289-296.
*[98] Nanni, Loris. (2005). Hyperplanes for Predicting Protein-Protein Interactions. Neurocomputing., 69, 257-263.
*[99] Nanni, Loris. (2005). Hyperplanes for Predicting Protein-Protein Interactions. Neurocomputing., 69, 257-263.
*[100] Nguyen, Minh N. & Rajapakse, Jagath C. (2005). Prediction of Protein Relative Solvent Accessibility With a Two-Stage SVM Approach. Proteins: Structure, Function and Bioinformatics., 59, 30-37.
*[101] Palma, Nuno P., Krippahl, Ludwig., Wampler, John E. & Moura, José J.G. (2000). A New (Soft) Docking Algorithm for Predicting Protein Interactions. Proteins: Structure, Function and Genetics., 39(4), 372-384.
*[102] Pazos, Florencio & Valencia, Alfonso. (2001). Similarity in phylogenetic trees as indicator of protein-protein interaction. Protein Engineering., 14(9), 609-614.
*[103] Permyakov, Serge E., Makhatadze, George I., Owenius, Rikard., Uversky, Vladimir N., Brooks, Charles L., Permyakov, Eugene A. & Berliner, Lawrence J. (2005). How to Improve Nature: Study of the Electrostatic Properties of the Surface of α-lactalbumin. Protein Engineering, Design & Selection., 18(9), 425-433.
*[104] Prusis, Peteris., Lundstedt, Torbjörn & Wikberg, Jarl E.S. (2002). Proteo-chemometrics analysis of MSH peptide binding to melanocortin receptors. Protein Engineering., 14(4), 305-311.
*[105] Qi, Yanjun., Klein-Seetharaman, Judith. & Bar-Joseph, Ziv. (2005). Random Forest Similarity for Protein-Protein Interaction prediction from Multiple Sources. School of Computer Science, Carnegie Mellon University.
*[107] Rausch, Christian., Weber, Tilmann., Kohlbacher, Oliver., Wohlleben, Wolfgang & Huson, Daniel H. (2005). Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs). Nucleic Acids Research., 33(18), 5799-5808.
*[108] Rhodes, David R., Tomlins, Scott A., Varambally, Sooryanarayana., Mahavisno, Vasudeva., Barrette, Terrence., Kalyana-Sundaram, Shanker., Ghosh, Debashis., Pandey, Akhilesh & Chinnaiyan, Arul M. (2005). Probabilistic model of the Human Protein-Protein Interaction Network. Nature Biotechnology., 23(8), 951-959.
*[109] Russell, Robert B., Alber, Frank., Aloy, Patrick., David, Fred P., Korkin, Dmitry., Pichaud, Matthieu., Topf, Maya & Sali, Andrej. (2004). A Structural Perspective on Protein-Protein Interactions. Current Opinion in Structural Biology., 14, 313-324.
*[110] Vajda, Sandor., Vakser, Ilya A., Sternberg, Michael J.E & Janin, Joël. (2001). First community wide experiment on the comparative evaluation of protein: Modeling of Protein Interactions in Genomes. Biomedical Engineering., Boston University.
*[111] Saraf, Manish C., Moore, Gregory L. & Maranas, Costas D. (2003). Using multiple sequence correlation analysis to characterize functionally important protein regions. Protein Engineering., 16(6), 397-406.
*[112] Sato, Tetsuya., Yamanishi, Yoshihiro., Kanehisa, Minoru & Toh, Hiroyuki. (2000). Prediction of Protein-Protein Interactions Based on Real-Valued Phylogenetic Profiles Using Partial Correlation Coefficient. Institute for Chemical Research, Kyoto University.
*[113] Seeger, Matthias. (2004). Gaussian Processes For Machine Learning. University of California.
*[114] Shilton, Alistair., Palaniswami, M., Ralph, Daniel & Tsoi, Ah Chung. (2005). Incremental Training of Support Vector Machines. IEEE Transactions on Neural Networks., 16(1), 114-131.
*[115] Shionyu-Mitsuyama, Clara., Shirai, Tsuyoshi., Ishida, Hirokazu & Yamane, Takashi. (2003). An empirical approach for structure-based prediction of carbohydrate-binding sites on proteins. Protein Engineering., 16(7), 467-478.
*[116] Wodak, Shoshana J. & Méndez, Raúl. (2004). Prediction of protein–protein interactions: the CAPRI experiment, its evaluation and implications. Current Opinion in Structural Biology., 14, 242–249.
*[117] Schwikowski, Benno., Uetz, Peter & Fields, Stanley. (2000). A network of protein-protein interactions in yeast. Nature Biotechnology., 18, 1257-1261.
*[118] Soares, Dinesh C., Gerloff, Dietlind L., Syme, Neil R., Coulson, Andrew F.W., Parkinson, John & Barlow, Paul N. (2005). Large-scale modelling as a route to multiple surface comparisons of the CCP module family. Protein Engineering, Design & Selection., 18(8), 379-388.
*[119] Socolich, Michael., Lockless, Steve W., Russ, William P., Lee, Heather., Gardner, Kevin H. & Ranganathan, Rana. (2005). Evolutionary information for specifying a protein fold. Nature., 437, 512-518.
*[120] Song, Jie & Tang, Huanwen. (2005). Support vector Machines for Classification of Homo-oligomeric Proteins by incorporating Subsequence Distributions. Journal of Molecular structure: THEOCHEM., 722, 97-101.
*[121] Srinivasan, N., Antonelli, Marcelo., Jacob, Germaine., Korn, Iris., R-Sayed, Muhammed F., Blundell, Tom L., Allende, Catherine C. & Allende, Jorge C. (1999). Structural interpretation of site-directed mutagenesis and specificity of the catalytic subunit of protein kinase CK2 using comparative modelling. Protein Engineering., 12(2), 119-127.
*[122] Su, Zhengchang., Dam, Phuongan., Chen, Xin., Olman, Victor., Jiang, Tao., Palenik, Brian & Xu, Ying. (2003). Computational Inference of Regulatory Pathways in Microbes: an Application of Phosphorus Assimilation Pathways in Synechococcus sp WH8102. Genome Informatics., 14, 3-13.
*[123] Sudarsanam, Sucha & Srinivasan, Subhashini. (1997). Sequence-dependent conformational sampling using a database of φi+1 and ψi angles for predicting polypeptide backbone conformations. Protein Engineering., 10(10), 1155-1162.
*[124] Sussman, Fredy., Villaverde, M. Carmen & Martinez, Luis. (2002). Modified Solvent Accessibility Free Energy Prediction Analysis of Cyclic Urea inhibitors binding to the HIV-1 protease. Protein Engineering., 15(9), 707-711.
*[125] Pitre, Sylvain., Chan, A., Cheetham, Jim., Dehne, Frank., Duong, Alex., Emili, Andrew., Greenblatt, Jack., Krogan, Nevan., Luo, Xuemei & Golshani, Ashkan. (2005). PIPE: A Protein-Protein Interaction Prediction Engine Based on the Re-occurring Short Amino Acid Sequences Between Known Interacting Protein Pairs.
*[126] Tang, Yuchun., Jin, Bo & Zhang, Yan-Qing. (2005). Granular Support Vector Machines with Association Rules Mining For Protein Homology Prediction. Artificial Intelligence in Medicine., 35, 121-134.
*[127] Taroni, Chiara., Jones, Susan. & Thornton, Janet M. (2000). Analysis and prediction of carbohydrate binding sites. Protein Engineering., 13(2), 89-98.
*[128] Terashi, Genki., Takeda-Shitaka, Mayuko., Takaya, Daisuke., Komatsu, Katsuichiro & Umeyama, Hideaki. (2005). Searching for Protein-Protein Interaction Sites and Docking by Methods of Molecular Dynamics, Grid Scoring, and the Pairwise Interaction Potential of Amino Acid Residues. Proteins: Structure, Function and Bioinformatics., 60, 289-295.
*[129] Thierry-Mieg, Nicolas. (2000). Protein-Protein Interaction Prediction for C. elegans. Laboratoire LSR-IMAG, France.
*[130] Tong, Amy Hin Yang., Drees, Becky., Nardelli, Giuliano., Bader, Gary D., Brannetti, Barbara., Castagnoli, Luisa., Evangelista, Marie., Ferracuti, Silvia et al. (2002). A Combined Experimental and Computational Strategy to Define Protein Interaction Networks for Peptide Recognition Modules. Science., 295, 321-324.
*[131] Tropsha, Alexander & Edelsbrunner, Herbert. Biogeometry Applications of Computational Geometry to Molecular Structure. School of Pharmacy, University of North Carolina.
*[132] Tsuda, Koji., Shin, HyunJung & Schölkopf, Bernhard. (2005). Fast Protein Classification with Multiple Networks. Bioinformatics., 21(2), ii59-ii65.
*[133] Tuffery, Pierre & Derreumaux, Philippe. (2005). Dependency Between Consecutive Local Conformations Helps Assemble Protein Structures From Secondary Structures Using Go Potential and Greedy Algorithm. Proteins: Structure, Function and Bioinformatics., 61, 732-740.
*[134] Uetz, Peter & Vollert, Carolina S. (2005). Protein-Protein Interactions. Encyclopedic References of Genomics and Proteomics in Molecular Medicine.
*[135] Vajda, Sandor & Camacho, Carlos J. (2004). Protein-Protein Docking: Is the Glass Half-Empty? Trends in Biotechnology., 22(3), 110-116.
*[136] Valencia, Alfonso & Pazos, Florencio. (2002). Computational Methods for the Prediction of Protein Interactions. Current Opinion in Structural Biology., 12, 368-373.
*[137] Vazquez, Alexei., Flammini, Alessandro., Maritan, Amos & Vespignani, Alessandro. (2003). Global Protein Function Prediction From Protein-Protein Interaction Networks. Nature Biotechnology., 21(6), 697-700.
*[138] Wang, Meng., Yang, Jie., Liu, Guo-Ping., Xu, Zhi-Jie & Chou, Kuo-Chen. (2004). Weighted-Support vector Machines for Predicting Membrane Protein Types based on Pseudo-amino acid Composition. Protein Engineering, Design & Selection., 17(6), 509-516.
*[139] Webb-Robertson, Bobbie-Jo., Oehmen, Christopher & Matzke, Melissa. (2005). SVM-BALSA: Remote homology detection based on Bayesian sequence alignment. Computational Biology and Chemistry., 29, 440-443.
*[140] Weber, Irene T. & Harrison, Robert W. (1999). Molecular mechanics analysis of drug-resistant mutants of HIV. Protein Engineering., 12(6), 469-474.
*[141] Greenleaf, William J., Woodside, Michael T., Abbondanzieri, Elio A. & Block, Steven M. (2005). Passive All-Optical Force Clamp for High-Resolution Laser Trapping. Phys. Rev. Lett., 95, 208102.
*[142] Wodak, Shoshana J. & Mendez, Raul. (2004). Prediction of Protein-Protein Interactions: the CAPRI Experiment, its evaluation and implications. Current Opinion in Structural Biology., 14, 242-249.
*[143] Wojcik, Jérôme., Boneca, Ivo. & Legrain, Pierre. (2002). Prediction, Assessment and Validation of Protein Interaction Maps in Bacteria. J. Mol. Biol., 323, 763-770.
*[144] Wright, JD & Lim, C. (1998). Prediction of an anti-IgE binding site on IgE. Protein Engineering., 11(6), 421-427.
*[145] Xie, Dan., Li, Ao., Wang, Minghui., Fan, Zhewan & Feng, Huanqing. (2005). LOCSVMPSI: a web server for subcellular localization of eukaryotic proteins using SVM and profile of PSI-BLAST. Nucleic Acids Research., 33, W105-W110.
*[146] Yang, Zheng Rong. (2005). Orthogonal Kernel Machine for the Prediction of Functional Sites in Proteins. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics., 35(1), 100-106.
*[147] Yu, Chenggang., Zavaljevski, Nela., Stevens, Fred J., Yackovich, Kelly & Reifman, Jaques. (2005). Classifying Noisy Protein Sequence Data: A case study of immunoglobulin light Chains. Bioinformatics., 21(1), i495-i501.
*[148] Yu, Hui., Gao, Lei., Tu, Kang & Guo, Zheng. (2005). Broadly Predicting Specific Gene Functions with Expression Similarity and Taxonomy Similarity. Gene., 352, 75-81.
*[149] Yuan, Zheng & Huang, Bixing. (2004). Prediction of Protein Accessible Surface Areas by Support Vector Regression. Proteins: Structure, Function and Bioinformatics., 57, 558-564.
*[150] Zavaljevski, Nela., Stevens, Fred J. & Reifman, Jaques. (2002). Support Vector Machines with Selective Kernel Scaling for Protein Classification and identification of Key Amino acid positions. Bioinformatics., 18(5), 689-696.
*[151] Zeng, Jun., Nheu, Thao., Zorzet, Anna., Catimel, Bruno., Nice, Ed., Maruta, Hiroshi., Burgess, Antony W. & Treutlein, Herbert R. (2001). Design of Inhibitors of Ras-Raf interaction using a Computational Combinatorial Algorithm. Protein Engineering., 14(1), 39-45.
*[152] Zhao, Xing-Ming., Cheung, Yiu-Ming & Huang, De-Shuang. (2005). A Novel Approach to Extracting Features from Motif Content and Protein Composition for Protein Sequence Classification. Neural Networks., 18, 1019-1028.
*[153] Zhou, Huan-Xiang & Shan, Yibing. (2001). Prediction of Protein Interaction Sites from Sequence Profile and Residue Neighbor List. Proteins: Structure, Function and Genetics., 44, 336-343.
[[Category:Bioinformatics]]