Structural alignment: Difference between revisions

Revision as of 14:41, 23 September 2020 by Cems2 (336 edits), edit summary: →Evaluating Similarity. Latest revision as of 20:17, 27 June 2025 by Citation bot (Bots, 5,867,164 edits), edit summary: Alter: journal, pages. Add: pages, pmid. Formatted dashes. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 728/823
(36 intermediate revisions by 22 users not shown)
Line 1: {{for\|structural alignment in cognitive science\|Analogy#Structural alignment}} {{short description\|Aligning molecular sequences using sequence and structural information}} <div style="background: var(--background-color-transparent); color: inherit">[[Image:Alignment of thioredoxins2.png\|thumb\|300px\|right\|Structural alignment of [[thioredoxin]]s from humans and the fly [[Drosophila melanogaster]]. The proteins are shown as ribbons, with the human protein in red, and the fly protein in yellow. Generated from PDB [http://www.rcsb.org/pdb/explore.do?structureId=3TRX 3TRX] and [http://www.rcsb.org/pdb/explore.do?structureId=1XWC 1XWC].]]</div> '''Structural alignment''' attempts to establish [[Sequence homology\|homology]] between two or more [[polymer]] structures based on their shape and three-dimensional [[tertiary structure\|conformation]]. This process is usually applied to [[protein]] [[tertiary structure]]s but can also be used for large [[RNA]] molecules. In contrast to simple structural superposition, where at least some equivalent residues of the two structures are known, structural alignment requires no ''a priori'' knowledge of equivalent positions. Structural alignment is a valuable tool for the comparison of proteins with low sequence similarity, where evolutionary relationships between proteins cannot be easily detected by standard [[sequence alignment]] techniques. Structural alignment can therefore be used to imply [[evolution]]ary relationships between proteins that share very little common sequence. However, caution should be used in using the results as evidence for shared evolutionary ancestry because of the possible confounding effects of [[convergent evolution]] by which multiple unrelated [[amino acid]] sequences converge on a common [[tertiary structure]]. 
Line 18 ⟶ 19: The most basic possible comparison between protein structures makes no attempt to align the input structures and requires a precalculated alignment as input to determine which of the residues in the sequence are intended to be considered in the RMSD calculation. Structural superposition is commonly used to compare multiple conformations of the same protein (in which case no alignment is necessary, since the sequences are the same) and to evaluate the quality of alignments produced using only sequence information between two or more sequences whose structures are known. This method traditionally uses a simple least-squares fitting algorithm, in which the optimal rotations and translations are found by minimizing the sum of the squared distances among all structures in the superposition.<ref name="martin"/> More recently, maximum likelihood and Bayesian methods have greatly increased the accuracy of the estimated rotations, translations, and covariance matrices for the superposition.<ref name="theobald"/><ref name="theobald2"/> Algorithms based on multidimensional rotations and modified [[quaternion]]s have been developed to identify topological relationships between protein structures without the need for a predetermined alignment. 
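As a minimal sketch of the least-squares fitting step described above, the following Python code superposes two coordinate sets whose residue correspondence is already known and reports the resulting RMSD. It uses the SVD-based Kabsch procedure, which is one common way to solve this least-squares problem and is an assumption of this example rather than the specific method of the cited work. The function name <code>superpose_rmsd</code> and the toy coordinates are illustrative, and NumPy is assumed to be available.

```python
import numpy as np


def superpose_rmsd(reference, mobile):
    """Least-squares fit of `mobile` onto `reference` (both N x 3).

    Row i of each array is assumed to describe the same (pre-aligned) atom.
    Returns the rotation matrix, the fitted copy of `mobile`, and the RMSD.
    """
    ref = np.asarray(reference, dtype=float)
    mob = np.asarray(mobile, dtype=float)
    if ref.shape != mob.shape or ref.ndim != 2 or ref.shape[1] != 3:
        raise ValueError("both inputs must be N x 3 arrays of equal length")

    # Remove the translational component by centring both sets.
    ref_center = ref.mean(axis=0)
    ref_c = ref - ref_center
    mob_c = mob - mob.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T

    fitted = mob_c @ rotation.T + ref_center
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - ref) ** 2, axis=1))))
    return rotation, fitted, rmsd


# Demonstration: rotate and shift a toy structure, then recover the fit.
rng = np.random.default_rng(0)
toy = rng.normal(size=(10, 3))
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = toy @ rot_z.T + np.array([1.0, -2.0, 3.0])
_, _, demo_rmsd = superpose_rmsd(toy, moved)
print(f"RMSD after superposition: {demo_rmsd:.2e}")
```

For noise-free input such as the demonstration, the fitted coordinates coincide with the reference to within rounding error. Note that this sketch presupposes a known residue correspondence, unlike the multidimensional-rotation and quaternion-based algorithms just mentioned.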
Such algorithms have successfully identified canonical folds such as the [[helix bundle\|four-helix bundle]].<ref name="Diederichs"/> The [http://wishart.biology.ualberta.ca/SuperPose/ SuperPose] {{Webarchive\|url=https://web.archive.org/web/20151031151001/http://wishart.biology.ualberta.ca/SuperPose/ \|date=2015-10-31 }} method is sufficiently extensible to correct for relative domain rotations and other structural pitfalls.<ref name="Maiti"/>
===Evaluating ~~Similarity~~similarity===
Often the purpose of seeking a structural superposition is not so much the superposition itself as an evaluation of the similarity of two structures or of the confidence in a remote alignment.<ref name="casp11"/><ref name="Malmstrom" /><ref name="robetta"/> A subtle but important distinction from maximal structural superposition is the conversion of an alignment to a meaningful similarity score.<ref name="Mammoth" /><ref name="ZhangTMscore"/> ~~Many~~Most methods output some sort of "score" indicating the quality of the superposition.<ref name="zemla" /><ref name="fischer"/><ref name="poleksic"/><ref name="Mammoth"/><ref name="ZhangTMscore"/> However, what one actually wants is ''not'' merely an ''estimated'' "Z-score" or an ''estimated'' E-value for seeing the observed superposition by chance, but an ''estimated'' E-value that is tightly ~~correlation~~correlated with the true E-value.
Critically, even if a method's estimated E-value is precisely correct ''on average'', if it lacks a low standard deviation on its estimated value generation process, then the rank ordering of the relative similarities of a query protein to a comparison set will rarely agree with the "true" ordering.<ref name="Mammoth"/><ref name="ZhangTMscore"/> Different methods will superimpose different numbers of residues because they use different quality assurances and different definitions of "overlap"; some only include residues meeting multiple local and global superposition criteria and others are more greedy, flexible, and promiscuous. A greater number of atoms superposed can mean more similarity but it may not always produce the best E-value quantifying the unlikeliness of the superposition and thus not as useful for assessing similarity, especially in remote homologs.<ref name="casp11"/><ref name="Malmstrom" /><ref name="robetta" /><ref name="skolnick" /> Line 40 ⟶ 41: ===DALI=== [[Image:Ssap-vectors.png\|frame\|class=skin-invert-image\|Illustration of the atom-to-atom vectors calculated in SSAP. From these vectors a series of vector differences, e.g., between (FA) in Protein 1 and (SI) in Protein 2 would be constructed. The two sequences are plotted on the two dimensions of a matrix to form a difference matrix between the two proteins. 
Dynamic programming is applied to all possible difference matrices to construct a series of optimal local alignment paths that are then summed to form the summary matrix, on which a second round of dynamic programming is performed.]]A common and popular structural alignment method is the DALI, or Distance-matrix ALIgnment method, which breaks the input structures into hexapeptide fragments and calculates a distance matrix by evaluating the contact patterns between successive fragments.<ref name="holm"/> [[Secondary structure]] features that involve residues that are contiguous in sequence appear on the matrix's [[main diagonal]]; other diagonals in the matrix reflect spatial contacts between residues that are not near each other in the sequence. When these diagonals are parallel to the main diagonal, the features they represent are parallel; when they are perpendicular, their features are antiparallel. This representation is memory-intensive because the features in the square matrix are symmetrical (and thus redundant) about the main diagonal. When two proteins' distance matrices share the same or similar features in approximately the same positions, they can be said to have similar folds with similar-length loops connecting their secondary structure elements. DALI's actual alignment process requires a similarity search after the two proteins' distance matrices are built; this is normally conducted via a series of overlapping submatrices of size 6x6. Submatrix matches are then reassembled into a final alignment via a standard score-maximization algorithm — the original version of DALI used a [[Monte Carlo method\|Monte Carlo]] simulation to maximize a structural similarity score that is a function of the distances between putative corresponding atoms. 
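The distance-matrix representation described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not DALI itself: the function names are hypothetical, the input is assumed to be a list of C-alpha coordinates, and the plain mean absolute difference used to compare two matrices is a simplification chosen for this example.

```python
import math


def distance_matrix(coords):
    """Return the full N x N matrix of pairwise Euclidean distances."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def upper_triangle(matrix):
    """Keep only the entries with j > i; the rest is redundant by symmetry."""
    n = len(matrix)
    return {(i, j): matrix[i][j] for i in range(n) for j in range(i + 1, n)}


def mean_abs_difference(matrix_a, matrix_b):
    """Average |dA[i][j] - dB[i][j]| over the upper triangle.

    Assumes both matrices describe the same number of residues, listed in
    the same order (an assumption of this sketch, not of DALI).
    """
    if len(matrix_a) != len(matrix_b):
        raise ValueError("matrices must have the same size")
    tri_a, tri_b = upper_triangle(matrix_a), upper_triangle(matrix_b)
    if not tri_a:
        return 0.0
    return sum(abs(tri_a[k] - tri_b[k]) for k in tri_a) / len(tri_a)


# Toy example: four points and a rigidly translated copy of them.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in square]
d_square = distance_matrix(square)
d_shifted = distance_matrix(shifted)
print(len(upper_triangle(d_square)), "non-redundant distances")
print("difference between matrices:", mean_abs_difference(d_square, d_shifted))
```

Because intramolecular distances do not change when a structure is rotated or translated, two such matrices can be compared without superposing the structures first. DALI's similarity score is likewise built from differences between corresponding intramolecular distances, although with a more elaborate weighting than the unweighted mean used here.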
In particular, more distant atoms within corresponding features are exponentially downweighted to reduce the effects of noise introduced by loop mobility, helix torsions, and other minor structural variations.<ref name="Mount" /> Because DALI relies on an all-to-all distance matrix, it can account for the possibility that structurally aligned features might appear in different orders within the two sequences being compared. The DALI method has also been used to construct a database known as [[Families of structurally similar proteins\|FSSP]] (Fold classification based on Structure-Structure alignment of Proteins, or Families of Structurally Similar Proteins) in which all known protein structures are aligned with each other to determine their structural neighbors and fold classification. There is ~~an~~a [http://ekhidna.biocenter.helsinki.fi/dali searchable database] based on DALI as well as a [http://ekhidna.biocenter.helsinki.fi/dali/README.v5.html downloadable program] and [http://ekhidna.biocenter.helsinki.fi/dali web search] based on a standalone version known as DaliLite.
===Combinatorial extension===
Line 62 ⟶ 63: }}</ref> A number of similarity metrics are possible; the original definition of the CE method included only structural superpositions and inter-residue distances, but the method has since been expanded to include local environmental properties such as secondary structure, solvent exposure, hydrogen-bonding patterns, and [[dihedral angle]]s.<ref name="shindyalov" /> An alignment path is calculated as the optimal path through the similarity matrix by linearly progressing through the sequences and extending the alignment with the next possible high-scoring AFP pair. The initial AFP pair that nucleates the alignment can occur at any point in the sequence matrix. Extensions then proceed with the next AFP that meets given distance criteria restricting the alignment to low gap sizes.
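The path-extension idea described above can be illustrated with a deliberately simplified Python sketch. It is not the CE algorithm: the fragment score (mean absolute difference of intra-fragment C-alpha distances), the greedy choice of the best-scoring next fragment pair, the rule that a gap opens in only one chain at a time, and the 1.0 Å acceptance threshold are all assumptions made for this example, and every function name is hypothetical. The defaults for fragment size and maximum gap are the values 8 and 30 reported for CE.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between all points of one chain."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def afp_score(da, db, i, j, m):
    """Mean absolute difference between the intra-fragment distances of the
    fragment starting at i in chain A and at j in chain B (lower is better)."""
    total, count = 0.0, 0
    for k in range(m):
        for l in range(k + 1, m):
            total += abs(da[i + k][i + l] - db[j + k][j + l])
            count += 1
    return total / count


def next_afp(da, db, i, j, m, max_gap, threshold):
    """Best-scoring fragment pair reachable from (i, j) with a gap of at
    most max_gap residues in one chain; None if none passes the threshold."""
    best, best_score = None, math.inf
    for gap in range(max_gap + 1):
        # With gap == 0 both candidates coincide, which is harmless.
        for ci, cj in ((i + gap, j), (i, j + gap)):
            if ci + m > len(da) or cj + m > len(db):
                continue
            score = afp_score(da, db, ci, cj, m)
            if score <= threshold and score < best_score:
                best, best_score = (ci, cj), score
    return best


def chain_afps(coords_a, coords_b, m=8, max_gap=30, threshold=1.0):
    """Try every fragment pair as a seed, extend greedily, keep the longest
    path. Returns a list of (start_in_a, start_in_b) fragment pairs."""
    da, db = distance_matrix(coords_a), distance_matrix(coords_b)
    best_path = []
    for si in range(len(da) - m + 1):
        for sj in range(len(db) - m + 1):
            if afp_score(da, db, si, sj, m) > threshold:
                continue
            path = [(si, sj)]
            while True:
                i, j = path[-1][0] + m, path[-1][1] + m
                nxt = next_afp(da, db, i, j, m, max_gap, threshold)
                if nxt is None:
                    break
                path.append(nxt)
            if len(path) > len(best_path):
                best_path = path
    return best_path


# Toy input: an irregular 24-point chain and a copy with 3 inserted points.
chain_a = [(1.9 * i + 2.0 * math.sin(1.3 * i),
            4.0 * math.cos(2.1 * i),
            3.0 * math.sin(0.7 * i * i)) for i in range(24)]
insert = [(100.0, 50.0, -30.0), (-40.0, 80.0, 10.0), (5.0, -60.0, 70.0)]
chain_b = chain_a[:8] + insert + chain_a[8:]
print(chain_afps(chain_a, chain_b))
```

In the toy input the second chain is a copy of the first with three extra points inserted after position 8, so the second and third fragment pairs of the resulting path are shifted by three positions in the second chain. This shows how a gap is absorbed between fragments.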
The size of each AFP and the maximum gap size are required input parameters but are usually set to empirically determined values of 8 and 30 respectively.<ref name="shindyalov" /> Like DALI and SSAP, CE has been used to construct an all-to-all fold classification [http://cl.sdsc.edu/ database] {{Webarchive\|url=https://web.archive.org/web/19981203071023/http://cl.sdsc.edu/ \|date=1998-12-03 }} from the known protein structures in the PDB. The [[Protein Data Bank\|RCSB PDB]] has recently released an updated version of CE, Mammoth, and FATCAT as part of the [http://www.rcsb.org/pdb/workbench/workbench.do RCSB PDB Protein Comparison Tool]. It provides a new variation of CE that can detect [[Circular Permutation Proteins\|circular permutations]] in protein structures.<ref name="prlic"/> Line 78 ⟶ 79: \|volume=11 \| issue=11 \|pages=~~2606-2621~~2606–2621 \|doi=10.1110/ps.0215902 \| pmc= 2373724 \|doi-access=free }}</ref> approaches the alignment problem from a different objective than almost all other methods. Rather than trying to find an alignment that maximally superimposes the largest number of residues, it seeks the subset of the structural alignment least likely to occur by chance. To do this it marks a local motif alignment with flags to indicate which residues simultaneously satisfy more stringent criteria: 1) Local structure overlap 2) regular secondary structure 3) 3D-superposition 4) same ordering in primary sequence. It converts the statistics of the number of residues with high-confidence matches and the size of the protein to compute an Expectation value for the outcome by chance. 
It excels at matching remote homologs, particularly at matching structures generated by ab initio structure prediction to structure families such as SCOP, because it emphasizes extracting a statistically reliable sub-alignment rather than achieving the maximal sequence alignment or maximal 3D superposition.<ref name="Malmstrom"/><ref name="robetta">{{cite journal \|journal=Nucleic Acids Research \|year= 2004 Line 88 ⟶ 90: \|pmid= 15215442 \|title=Protein structure prediction and analysis using the Robetta server \|~~authors~~author1=David E. Kim, \|author2=Dylan Chivian~~, and~~ \|author3=David Baker \|issue= Web Server issue \|pages= W526–W531 \|pmc= 441606 \|doi-access= free }}</ref> For every overlapping window of 7 consecutive residues it computes the set of displacement direction unit vectors between adjacent C-alpha atoms. All-against-all local motifs are compared based on the URMS score. These values become the pair alignment score entries for dynamic programming, which produces a seed pairwise residue alignment. The second phase uses a modified MaxSub algorithm: a single aligned pair of 7-residue windows, one from each protein, is used to orient the two full-length protein structures so as to maximally superimpose just these 7 C-alpha atoms; then, in this orientation, it scans for any additional aligned pairs that are close in 3D. It re-orients the structures to superimpose this expanded set and iterates ~~till~~until no more pairs coincide in 3D. This process is restarted for every 7-residue window in the seed alignment. The output is the maximal number of atoms found from any of these initial seeds. This statistic is converted to a calibrated E-value for the similarity of the proteins. Mammoth makes no attempt to re-iterate the initial alignment or extend the high-quality subset. Therefore, the seed alignment it displays cannot be fairly compared to DALI or TM-align, as ~~its~~it was formed simply as a heuristic to prune the search space.
(It can be used if one wants an alignment based solely on local structure-motif similarity, agnostic of long-range rigid-body atomic alignment.) Because of that same parsimony, it is well over ten times faster than DALI, CE and TM-align. <ref name="foldclass">{{cite journal \|title=Efficient SCOP-fold classification and retrieval using index-based protein substructure alignments \|~~authors~~author1=Pin-Hao Chi, \|author2=Bin Pang, \|author3=Dmitry Korkin, \|author4=Chi-Ren Shyu \|journal=Bioinformatics \|volume=25 Line 102 ⟶ 108: \|pages=2559–2565 \|doi=10.1093/bioinformatics/btp474 \|pmid=19667079 \|doi-access=free }}</ref> It is often used in conjunction with these slower tools to pre-screen large databases and extract just the structures with the best E-values for more exhaustive superposition or expensive calculations. <ref name="grishin04">{{cite journal \|journal=BMC Bioinformatics Line 110 ⟶ 117: \|issue= 197 \| doi=10.1186/1471-2105-5-197 \|~~PMID~~pmid= 15598351 \|title=SCOPmap: Automated assignment of protein structures to evolutionary superfamilies \|~~authors~~author1=Sara Cheek, \|author2=Yuan Qi, \|author3=Sri Krishna, \|author4=Lisa N Kinch~~, and~~ \|author5=Nick V Grishin \|page= 197 \|pmc= 544345 \|doi-access=free }}</ref> <ref name="fssa">{{cite journal \|title=FSSA: a novel method for identifying functional signatures from structural alignments \|~~authors~~author1=Kai Wang, \|author2=Ram Samudrala \|journal=Bioinformatics \|year=2005 \|volume=21 \|issue=13 \|pages=2969–2977 \|doi=10.1093/bioinformatics/bti471 \|pmid=15860561 \|doi-access=free }}</ref> It has been particularly successful at analyzing "decoy" structures from ab initio structure prediction.<ref name="casp11">{{cite journal \|~~authors~~vauthors=Kryshtafovych A, Monastyrskyy B, Fidelis K. \|title=CASP11 statistics and the prediction center evaluation system.
\|journal=Proteins \|year= 2016 \|volume=84 \|~~pages~~issue=(Suppl 1~~):15-19.~~ \|pages=(Suppl 1):15–19 \| doi=10.1002/prot.25005 \|pmid=26857434 \|pmc=5479680 \|doi-access=free }}</ref><ref name="Malmstrom"/><ref name="robetta"/> These decoys are notorious for getting local fragment motif structure correct and forming some kernels of correct 3D tertiary structure, but getting the full-length tertiary structure wrong. In this twilight remote homology regime, Mammoth's E-values for the CASP<ref name="casp11"/> protein structure prediction evaluation have been ~~show~~shown to be significantly more correlated with human ranking than SSAP or DALI.<ref name="Mammoth"/> Mammoth's ability to extract the multi-criteria partial overlaps with proteins of known structure and rank these with proper E-values, combined with its speed, facilitates scanning vast numbers of decoy models against the PDB database to identify the most likely correct decoys based on their remote homology to known proteins.
<ref name="Malmstrom">{{cite journal \|title=Superfamily Assignments for the Yeast Proteome through Integration of Structure Prediction with the Gene Ontology \| ~~authors~~author1=Lars Malmström Michael Riffle, \|author2=Charlie EM Strauss, \|author3=Dylan Chivian, \|author4=Trisha N Davis, \|author5=Richard Bonneau, \|author6=David Baker \|year=2007 \|journal=~~PLoS~~PLOS Biol \| volume=5 \|issue=4 \|pages= e76 \|doi=10.1371/journal.pbio.0050076 \| pmid=17373854 \| pmc=1828141 \|doi-access=free }}</ref>
Line 150 ⟶ 168: SSAP originally produced only pairwise alignments but has since been extended to multiple alignments as well.<ref name="taylor"/> It has been applied in an all-to-all fashion to produce a hierarchical fold classification scheme known as [[CATH]] (Class, Architecture, Topology, Homology),<ref name="orengo"/> which has been used to construct the [https://web.archive.org/web/20070517161248/http://www.cathdb.info/latest/index.html CATH Protein Structure Classification] database.
===Alphabet methods ===
A special class of protein structural alignment programs converts the input structure into a sequence of letters describing the structure. This allows methods from [[sequence alignment]] to be carried over into this field to enable more efficient searching and, in some implementations, also to align and superimpose in real 3D space.
* The simplest method only considers the position of the backbone. The input is divided into groups of four residues and each group is described by the closest one-letter descriptor. To further reuse protein-based tools, 20 letters are chosen.<ref>{{cite journal \|last1=Le \|first1=Q \|last2=Pollastri \|first2=G \|last3=Koehl \|first3=P \|title=Structural alphabets for protein structure classification: a comparison study.
\|journal=Journal of Molecular Biology \|date=27 March 2009 \|volume=387 \|issue=2 \|pages=431–50 \|doi=10.1016/j.jmb.2008.12.044 \|pmid=19135454\|pmc=2772874 }}</ref>
* Foldseek uses the 3D interaction (3Di) alphabet, which classifies the relationship between one residue's Cα atom and its spatially closest residue into 20 letters. Each residue of the input structure receives one letter. The similarity between letters is defined by a substitution matrix. Foldseek provides a sensitivity similar to that of typical structural alignment while being hundreds of times faster. It is able to search, align, and superimpose.<ref>{{cite journal \|last1=van Kempen \|first1=Michel \|last2=Kim \|first2=Stephanie S. \|last3=Tumescheit \|first3=Charlotte \|last4=Mirdita \|first4=Milot \|last5=Lee \|first5=Jeongjae \|last6=Gilchrist \|first6=Cameron L. M. \|last7=Söding \|first7=Johannes \|last8=Steinegger \|first8=Martin \|title=Fast and accurate protein structure search with Foldseek \|journal=Nature Biotechnology \|date=February 2024 \|volume=42 \|issue=2 \|pages=243–246 \|doi=10.1038/s41587-023-01773-0\|pmid=37156916 \|pmc=10869269 }}</ref>
* Reseek represents each residue and its structural context in a discrete feature vector, effectively creating an alphabet of 10<sup>11</sup> letters. The similarity between feature vectors is defined component-wise using pre-collected data.
This method also allows multiple structure alignment (MUSCLE-3D).<ref>{{cite journal \|last1=Edgar \|first1=Robert C \|title=Protein structure alignment by Reseek improves sensitivity to remote homologs \|journal=Bioinformatics \|date=1 November 2024 \|volume=40 \|issue=11 \|pages=btae687 \|doi=10.1093/bioinformatics/btae687\|pmid=39546374 \|pmc=11601161 }}</ref> ==Recent developments== Improvements in structural alignment methods constitute an active area of research, and new or modified methods are often proposed that are claimed to offer advantages over the older and more widely distributed techniques. A recent example, TM-align, uses a novel method for weighting its distance matrix, to which standard [[dynamic programming]] is then applied.<ref name="ZhangTMalign"/><ref name="ZhangTMscore"/> The weighting is proposed to accelerate the convergence of dynamic programming and correct for effects arising from alignment lengths. In a benchmarking study, TM-align has been reported to improve in both speed and accuracy over DALI and CE.<ref name="ZhangTMalign"/> Other promising methods of structural alignment are local structural alignment methods. These provide comparison of pre-selected parts of proteins (e.g. 
binding sites, user-defined structural motifs) <ref>{{cite journal\|author1=Stefano Angaran \|author2=[[Mary Ellen Bock]] \|author3=Claudio Garutti \|author4=Concettina Guerra1 \|title=MolLoc: a web tool for the local structural alignment of molecular surfaces\|journal=Nucleic Acids Research\|date=2009\|volume=37 \|issue=Web Server issue \|pmc=2703929 \|pmid=19465382 \|doi=10.1093/nar/gkp405 \|pages=W565–70}}</ref><ref>{{cite journal\|author1=Gaëlle Debret \|author2=Arnaud Martel \|author3=Philippe Cuniasse \|title=RASMOT-3D PRO: a 3D motif search webserver\|journal=Nucleic Acids Research\|date=2009\|volume=37 \|issue=Web Server issue \|pmc=2703991 \|pmid=19417073 \|doi=10.1093/nar/gkp304 \|pages=W459–64}}</ref><ref name=Shulman2008>{{cite journal\|author1=Alexandra Shulman-Peleg \|author2=Maxim Shatsky \|author3=Ruth Nussinov \|author4=Haim J. Wolfson \|title=MultiBind and MAPPIS: webservers for multiple alignment of protein 3D-binding sites and their interactions\|journal=Nucleic Acids Research\|date=2008\|volume=36 \|issue=Web Server issue \|pmc=2447750 \|pmid=18467424 \|doi=10.1093/nar/gkn185 \|pages=W260–4}}</ref> against binding sites or whole-protein structural databases. The MultiBind and MAPPIS servers <ref name=Shulman2008 /><ref name=Shulman2007>{{cite journal\|author1=Alexandra Shulman-Peleg \|author2=Maxim Shatsky \|author3=Ruth Nussinov \|author4=Haim J Wolfson \|title=Spatial chemical conservation of hot spot interactions in protein-protein complexes \|journal=BMC Biology\|date=2007\|volume=5 \|issue=43 \|pages=43 \|doi=10.1186/1741-7007-5-43 \|pmid=17925020 \|pmc=2231411 \|doi-access=free }}</ref> allow the identification of common spatial arrangements of physicochemical properties such as H-bond donor, acceptor, aliphatic, aromatic or hydrophobic in a set of user provided protein binding sites defined by interactions with small molecules (MultiBind) or in a set of user-provided protein–protein interfaces (MAPPIS). 
Others provide comparison of entire protein structures <ref>{{cite journal\|author1=Gabriele Ausiello \|author2=Pier Federico Gherardini \|author3=Paolo Marcatili \|author4=Anna Tramontano \|author5=Allegra Via \|author6=Manuela Helmer-Citterich \|title=FunClust: a web server for the identification of structural motifs in a set of non-homologous protein structures \|journal=BMC Biology\|date=2008\|volume=9\|issue=Suppl 2 \|pages=S2 \|doi=10.1186/1471-2105-9-S2-S2 \|pmid=18387204 \|pmc=2323665 \|doi-access=free }}</ref> against a number of user submitted structures or against a large database of protein structures in reasonable time ([[ProBiS]]<ref>{{cite journal \|author1=Janez Konc \|author2=Dušanka Janežič \|title=ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment \|journal=Bioinformatics \|volume=26 \|issue=9 \|pages=1160–1168 \|year=2010 \| url =~~http://bioinformatics.oxfordjournals.org/content/26/9/1160.full.pdf+html~~ \|doi=10.1093/bioinformatics/btq100 \|pmid=20305268 \|pmc=2859123}}</ref>). Unlike global alignment approaches, local structural alignment approaches are suited to detection of locally conserved patterns of functional groups, which often appear in binding sites and have significant involvement in ligand binding.<ref name=Shulman2007 /> As an example, comparing G-Losa,<ref>{{cite journal \|author1=Hui Sun Lee \|author2=Wonpil Im \|title=Identification of Ligand Templates using Local Structure Alignment for Structure-Based Drug Design \|journal=Journal of Chemical Information and Modeling \|volume=52 \|issue=10 \|pages=2784–2795 \|year=2012 \|doi=10.1021/ci300178e\|pmid=22978550 \|pmc=3478504 }}</ref> a local structure alignment tool, with TM-align, a global structure alignment based method. 
While G-Losa predicts drug-like ~~ligands’~~ligands' positions in single-chain protein targets more precisely than TM-align, the overall success rate of TM-align is better.<ref>{{cite journal \|author1=Hui Sun Lee \|author2=Wonpil Im \|title=Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity \|journal=Journal of Chemical Information and Modeling \|volume=53 \|issue=9 \|pages=2462–2470 \|year=2013 \|doi=10.1021/ci4003602\|pmid=23957286 \|pmc=3821077 }}</ref> However, as algorithmic improvements and computer performance have erased purely technical deficiencies in older approaches, it has become clear that there is no one universal criterion for the 'optimal' structural alignment. TM-align, for instance, is particularly robust in quantifying comparisons between sets of proteins with great disparities in sequence lengths, but it only indirectly captures hydrogen bonding or secondary structure order conservation which might be better metrics for alignment of evolutionarily related proteins. Thus recent developments have focused on optimizing particular attributes such as speed, quantification of scores, correlation to alternative gold standards, or tolerance of imperfection in structural data or ab initio structural models. An alternative methodology that is gaining popularity is to use the ''consensus'' of various methods to ascertain proteins structural similarities.<ref name="Bartheletal"/> Line 177 ⟶ 201: * [[Multiple sequence alignment]] * [[List of sequence alignment software]] * [[Sequence alignment]] * [[Structural Classification of Proteins]] * [[SuperPose]] Line 185 ⟶ 208: ==References== {{reflist\|35em\|refs= <ref name="Bartheletal">{{cite journal\|author=Barthel D., Hirst J.D., Blazewicz J., Burke E.K. 
and Krasnogor N.\|year= 2007\|title= ProCKSI: a decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information\|journal=BMC Bioinformatics\|volume= 8\|pages=416\|doi=10.1186/1471-2105-8-416\|pmid=17963510\|pmc=2222653\|doi-access= free}}</ref> <ref name="cech">{{cite journal\|vauthors=Cech P, Svozil D, Hoksza D \|year=2012\|title= SETTER: web server for RNA structure comparison\|journal=Nucleic Acids Research\|volume= 40\|issue=W1\|pages=W42–W48\|doi=10.1093/nar/gks560\|pmid=22693209\|pmc=3394248}}</ref> <ref name="Diederichs">{{cite journal\|author=Diederichs K. \|year=1995\|title= Structural superposition of proteins with unknown alignment and detection of topological similarity using a six-dimensional search algorithm\|journal=Proteins\|volume= 23\|issue=2\|pages=187–95\|doi=10.1002/prot.340230208\|pmid=8592700\|s2cid=3469775\|url=http://nbn-resolving.de/urn:nbn:de:bsz:352-opus-40461}}</ref> <ref name="fischer">{{cite journal\|vauthors=Siew N, Elofsson A, Rychlewsk L, Fischer D \|year=2000\|title= MaxSub: an automated measure for the assessment of protein structure prediction quality\|journal=Bioinformatics\|volume= 16\|pages=776–85\|doi=10.1093/bioinformatics/16.9.776\|pmid=11108700\|issue=9\|doi-access=free}}</ref> Line 197 ⟶ 220: <ref name="havgaard">{{cite journal\|vauthors=Havgaard JH, Lyngso RB, Stormo GD, Gorodkin J \|year=2005\|title= Pairwise local structural alignment of RNA sequences with sequence similarity less than 40%\|journal=Bioinformatics\|volume= 21\|issue=9\|pages=1815–24\|doi=10.1093/bioinformatics/bti279\|pmid=15657094\|doi-access=free}}</ref> <ref name="hoksza">{{cite journal\|vauthors=Hoksza D, Svozil D \|year=2012\|title= Efficient RNA pairwise structure comparison by SETTER method\|journal=Bioinformatics\|volume= 28\|issue=14\|pages=1858–1864\|doi=10.1093/bioinformatics/bts301\|pmid=22611129\|url=https://zenodo.org/record/890287~~/files/article.pdf~~\|doi-access=free}}</ref> <ref name="holm">{{cite 
journal\|vauthors=Holm L, Sander C \|pmid=8662544\|year=1996\|title=Mapping the protein universe\|volume=273\|issue=5275\|pages=595–603\|journal=Science\|doi=10.1126/science.273.5275.595\|bibcode=1996Sci...273..595H \|s2cid=7509134}}</ref> <ref name="kolodny">{{cite journal\|vauthors=Kolodny R, Linial N \|year=2004\|title= Approximate protein structural alignment in polynomial time \|journal=PNAS\|volume= 101\|issue=33\|pages= 12201–12206\|doi=10.1073/pnas.0404383101\|pmid=15304646\|pmc=514457\|doi-access=free}}</ref> <ref name="lathrop">{{cite journal\|author=Lathrop RH. \|year=1994\|title= The protein threading problem with sequence amino acid interaction preferences is NP-complete\|journal=Protein Eng\|volume= 7\|issue=9\|pages=1059–68\|doi=10.1093/protein/7.9.1059\|pmid=7831276\|citeseerx=10.1.1.367.9081}}</ref> <ref name="lovo1">{{cite journal\|author=Martinez L, Andreani, R, Martinez, JM. \|year=2007\|title= Convergent algorithms for protein structural alignment \|journal=BMC Bioinformatics\|volume= 8\|pages=306\|doi=10.1186/1471-2105-8-306\|pmid=17714583\|pmc=1995224 \|doi-access=free }}</ref> <ref name="Maiti">{{cite journal\|author4-link=David S. 
Wishart\|vauthors=Maiti R, Van Domselaar GH, Zhang H, Wishart DS \|year=2004\|title= SuperPose: a simple server for sophisticated structural superposition\|journal=Nucleic Acids Res\|volume= 32\|pages=W590–4\|doi=10.1093/nar/gkh477\|pmid=15215457\|issue=Web Server issue\|pmc=441615}}</ref> <ref name="martin">{{cite journal\|author=Martin ACR\|year=1982\|title= Rapid Comparison of Protein Structures\|journal=Acta Crystallogr A\|volume= 38\|pages= 871–873\|doi=10.1107/S0567739482001806\|issue=6\|bibcode=1982AcCrA..38..871M }}</ref> <ref name="Mathews">{{cite journal\|vauthors=Mathews DH, Turner DH \|year=2006\|title= Prediction of RNA secondary structure by free energy minimization\|journal=Curr Opin Struct Biol\|volume= 16\|issue=3\|pages=270–8\|doi=10.1016/j.sbi.2006.05.010\|pmid=16713706}}</ref> <ref name="orengo">{{cite journal\|vauthors=Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM \|year=1997\|title= CATH: A hierarchical classification of protein domain structures\|journal=Structure\|volume= 5\|issue=8\|pages= 1093–1108\|doi=10.1016/S0969-2126(97)00260-8\|pmid=9309224\|doi-access=free}}</ref> <ref name="poleksic">{{cite journal\|author=Poleksic A \|year=2009\|title= Algorithms for optimal protein structure alignment\|journal=Bioinformatics\|volume= 25\|pages= 2751–2756\|doi=10.1093/bioinformatics/btp530\|pmid=19734152\|issue=21\|doi-access=free}}</ref> Line 219 ⟶ 242: <ref name="prlic">{{cite journal\|vauthors=Prlic A, Bliven S, Rose PW, Bluhm WF, Bizon C, Godzik A, Bourne PE \|year=2010\|title= Pre-calculated protein structure alignments at the RCSB PDB website \|pmid=20937596\|pages= 2983–2985\|volume=26\|issue=23\|doi=10.1093/bioinformatics/btq572\|pmc=3003546\|journal=Bioinformatics}}</ref> <ref name="skolnick">{{cite journal\|vauthors=Zhang Y, Skolnick J \|year=2005\|title= The protein structure prediction problem could be solved using the current PDB library\|journal= Proc Natl Acad Sci 
USA\|pmid=15653774\|doi=10.1073/pnas.0407152101 \|volume=102\|issue=4\|pages=1029–34\|pmc=545829\|bibcode=2005PNAS..102.1029Z \|doi-access=free}}</ref>
<ref name="taylor">{{cite journal\|vauthors=Taylor WR, Flores TP, Orengo CA \|year=1994\|title= Multiple protein structure alignment\|journal=Protein Sci\|volume= 3\|issue=10\|pages=1858–70\|doi=10.1002/pro.5560031025\|pmid=7849601\|pmc=2142613}}</ref>
<ref name="theobald">{{cite journal\|vauthors=Theobald DL, Wuttke DS \|year=2006\|title= Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem\|journal=Proceedings of the National Academy of Sciences\|volume= 103\|pages=18521–18527\|doi=10.1073/pnas.0508445103\|issue=49\|pmid=17130458\|pmc=1664551\|bibcode=2006PNAS..10318521T \|doi-access=free}}</ref>
<ref name="theobald2">{{cite journal\|vauthors=Theobald DL, Wuttke DS \|year=2006\|journal=Bioinformatics\|volume= 22\|pages=2171–2172\|doi=10.1093/bioinformatics/btl332\|title=THESEUS: Maximum likelihood superpositioning and analysis of macromolecular structures\|issue=17\|pmid=16777907\|pmc=2584349}}</ref>
<ref name="Torarinsson">{{cite journal\|vauthors=Torarinsson E, Sawera M, Havgaard JH, Fredholm M, Gorodkin J \|year=2006\|title= Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure\|journal=Genome Res\|volume= 16\|issue=7\|pages=885–9\|doi=10.1101/gr.5226606\|pmid=16751343\|pmc=1484455}}</ref>
<ref name="wang">{{cite journal \|last1=Wang \|first1=Lusheng \|last2=Jiang \|first2=Tao \|title=On the complexity of multiple sequence alignment. \|journal=Journal of Computational Biology \|date=1994 \|volume=1 \|issue=4 \|pages=337–48 \|doi=10.1089/cmb.1994.1.337 \|pmid=8790475 \|name-list-style=vanc\|citeseerx=10.1.1.408.894 }}</ref>
<ref name="zemla">{{cite journal\|author=Zemla A. 
\|year=2003\|title= LGA — A Method for Finding 3-D Similarities in Protein Structures\|journal=Nucleic Acids Research\|volume= 31\|issue=13\|pages=3370–3374\|doi=10.1093/nar/gkg571\|pmid=12824330\|pmc=168977}}</ref>
<ref name="ZhangTMalign">{{cite journal\|vauthors=Zhang Y, Skolnick J \|year=2005\|journal=Nucleic Acids Research\|volume= 33\|pages= 2302–2309\|doi=10.1093/nar/gki524\|pmid=15849316\|title=TM-align: A protein structure alignment algorithm based on the TM-score\|issue=7\|pmc=1084323}}</ref>
<ref name="ZhangTMscore">{{cite journal\|vauthors=Zhang Y, Skolnick J \|year=2004\|journal=Proteins\|volume= 57\|pages= 702–710\|doi=10.1002/prot.20264\|pmid=15476259\|title=Scoring function for automated assessment of protein structure template quality\|issue=4\|s2cid=7954787}}</ref>
}}
* Bourne PE, Shindyalov IN. (2003): ''Structure Comparison and Alignment''. In: Bourne, P.E., Weissig, H. (Eds): ''Structural Bioinformatics''. Hoboken NJ: Wiley-Liss. {{ISBN\|0-471-20200-2}}
* Yuan X, Bystroff C. (2004) "Non-sequential Structure-based Alignments Reveal Topology-independent Core Packing Arrangements in Proteins", ''Bioinformatics''. 
Nov 5, 2004
* {{cite journal \|vauthors=Jung J, Lee B \| year = 2000 \| title = Protein structure alignment using environmental profiles \| journal = Protein Eng \| volume = 13 \| issue = 8\| pages = 535–543 \| doi=10.1093/protein/13.8.535\| pmid = 10964982 \| doi-access = free }}
* {{cite journal \|vauthors=Ye Y, Godzik A \| year = 2005 \| title = Multiple flexible structure alignment using partial order graphs \| journal = Bioinformatics \| volume = 21 \| issue = 10\| pages = 2362–2369 \| doi=10.1093/bioinformatics/bti353\| pmid = 15746292 \| doi-access = free }}
* {{cite journal \|vauthors=Sippl M, Wiederstein M \| year = 2008 \| title = A note on difficult structure alignment problems \| journal = Bioinformatics \| volume = 24 \| issue = 3\| pages = 426–427 \| doi=10.1093/bioinformatics/btm622\| pmid = 18174182 \| doi-access = free }}
{{Protein methods}}
{{DEFAULTSORT:Structural Alignment}}
[[Category:Protein methods]]
[[Category:NP-complete problems]]