Structural alignment: Difference between revisions

OAbot (talk | contribs)
m Open access bot: pmc updated in citation with #oabot.
m convert special characters found by Wikipedia:Typo Team/moss (via WP:JWB)
Line 172:
A special class of protein structural alignment programs converts the input structure into a sequence of letters describing the structure. This allows methods from [[sequence alignment]] to be carried over into this field to enable more efficient searching and, in some implementations, to also align and superimpose in real 3D space.
* The simplest method only considers the position of the backbone. The input is divided into groups of four residues and each group is described by the closest one-letter descriptor. To further reuse protein-based tools, 20 letters are chosen.<ref>{{cite journal |last1=Le |first1=Q |last2=Pollastri |first2=G |last3=Koehl |first3=P |title=Structural alphabets for protein structure classification: a comparison study. |journal=Journal of molecular biology |date=27 March 2009 |volume=387 |issue=2 |pages=431-50 |doi=10.1016/j.jmb.2008.12.044 |pmid=19135454|pmc=2772874 }}</ref>
* Foldseek uses the 3D interaction (3Di) alphabet, which classifies the relationship between one residue's C&alpha; atom and its spatially closest residue into 20 letters. Each residue of the input structure receives one letter. The similarity between letters is defined by a substitution matrix. Foldseek is reported to provide sensitivity similar to that of typical structural alignment while being hundreds of times faster. It is able to search, align, and superimpose.<ref>{{cite journal |last1=van Kempen |first1=Michel |last2=Kim |first2=Stephanie S. |last3=Tumescheit |first3=Charlotte |last4=Mirdita |first4=Milot |last5=Lee |first5=Jeongjae |last6=Gilchrist |first6=Cameron L. M. |last7=Söding |first7=Johannes |last8=Steinegger |first8=Martin |title=Fast and accurate protein structure search with Foldseek |journal=Nature Biotechnology |date=February 2024 |volume=42 |issue=2 |pages=243–246 |doi=10.1038/s41587-023-01773-0|pmc=10869269 }}</ref>
* Reseek represents each residue and its structural context as a discrete feature vector, effectively creating an alphabet of 10<sup>11</sup> letters. The similarity between two feature vectors is defined component-wise using pre-collected data. This method also allows multiple structure alignment (MUSCLE-3D).<ref>{{cite journal |last1=Edgar |first1=Robert C |title=Protein structure alignment by Reseek improves sensitivity to remote homologs |journal=Bioinformatics |date=1 November 2024 |volume=40 |issue=11 |doi=10.1093/bioinformatics/btae687|pmc=11601161 }}</ref>
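
The general idea of turning a backbone into a string of letters can be illustrated with a deliberately simplified sketch. It is not the 20-letter alphabet of Le et al., nor Foldseek's 3Di or Reseek's feature vectors: it assumes a toy six-letter alphabet obtained by binning the pseudo-dihedral angle of each window of four consecutive C&alpha; atoms into 60° bins. The function names, the alphabet, and the bin width are all hypothetical choices made for illustration.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def pseudo_dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def encode_backbone(ca_coords, alphabet="ABCDEF"):
    """Toy structural alphabet (an assumption, not a published one):
    one letter per window of four consecutive C-alpha atoms."""
    width = 360.0 / len(alphabet)
    letters = []
    for i in range(len(ca_coords) - 3):
        angle = pseudo_dihedral(*ca_coords[i:i + 4])
        # Shift [-180, 180] to [0, 360]; the modulo folds 360 back into bin 0.
        index = int((angle + 180.0) // width) % len(alphabet)
        letters.append(alphabet[index])
    return "".join(letters)

# A flat zigzag of five points yields two windows, both in the same bin.
zigzag = [(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)]
encoded = encode_backbone(zigzag)
```

Once two structures are encoded this way, the resulting strings can be handed to ordinary sequence alignment code, with a substitution matrix over the structural letters playing the role that an amino acid substitution matrix plays for protein sequences.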
 
Line 178:
Improvements in structural alignment methods constitute an active area of research, and new or modified methods are often proposed that are claimed to offer advantages over the older and more widely distributed techniques. A recent example, TM-align, uses a novel method for weighting its distance matrix, to which standard [[dynamic programming]] is then applied.<ref name="ZhangTMalign"/><ref name="ZhangTMscore"/> The weighting is proposed to accelerate the convergence of dynamic programming and correct for effects arising from alignment lengths. In a benchmarking study, TM-align has been reported to improve in both speed and accuracy over DALI and CE.<ref name="ZhangTMalign"/>
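
The length correction mentioned above can be sketched with the TM-score formula from the cited Zhang and Skolnick work: each aligned residue pair at distance ''d'' contributes 1/(1 + (''d''/''d''<sub>0</sub>)<sup>2</sup>), where ''d''<sub>0</sub> = 1.24·(''L'' − 15)<sup>1/3</sup> − 1.8 depends on the target length ''L''. The code below is a minimal illustration rather than the TM-align implementation: it assumes the residue pairing and superposition are already given, and the lower bound of 0.5 Å on ''d''<sub>0</sub> for short chains is a safeguard chosen for this sketch.

```python
def tm_d0(l_target):
    """Length-dependent distance scale d0 in angstroms."""
    if l_target <= 15:
        # Assumption of this sketch: avoid a negative cube-root argument.
        return 0.5
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    # Assumption of this sketch: keep d0 from becoming too small or negative.
    return max(d0, 0.5)

def tm_score(distances, l_target):
    """TM-score for a fixed alignment and superposition.

    distances: distances in angstroms between aligned residue pairs.
    l_target:  length of the protein used for normalization.
    """
    d0 = tm_d0(l_target)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / l_target

# 50 aligned pairs, each exactly d0 apart, in a 100-residue target.
example = tm_score([tm_d0(100)] * 50, 100)
```

Because the sum is divided by the target length rather than by the number of aligned pairs, a short but tight alignment cannot score as highly as one that covers most of the target.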
 
Local structural alignment methods are another promising approach. Some compare pre-selected parts of proteins (e.g. binding sites, user-defined structural motifs)<ref>{{cite journal|author1=Stefano Angaran |author2=[[Mary Ellen Bock]] |author3=Claudio Garutti |author4=Concettina Guerra |title=MolLoc: a web tool for the local structural alignment of molecular surfaces|journal=Nucleic Acids Research|date=2009|volume=37 |issue=Web Server issue |pmc=2703929 |pmid=19465382 |doi=10.1093/nar/gkp405 |pages=W565–70}}</ref><ref>{{cite journal|author1=Gaëlle Debret |author2=Arnaud Martel |author3=Philippe Cuniasse |title=RASMOT-3D PRO: a 3D motif search webserver|journal=Nucleic Acids Research|date=2009|volume=37 |issue=Web Server issue |pmc=2703991 |pmid=19417073 |doi=10.1093/nar/gkp304 |pages=W459–64}}</ref><ref name=Shulman2008>{{cite journal|author1=Alexandra Shulman-Peleg |author2=Maxim Shatsky |author3=Ruth Nussinov |author4=Haim J. Wolfson |title=MultiBind and MAPPIS: webservers for multiple alignment of protein 3D-binding sites and their interactions|journal=Nucleic Acids Research|date=2008|volume=36 |issue=Web Server issue |pmc=2447750 |pmid=18467424 |doi=10.1093/nar/gkn185 |pages=W260–4}}</ref> against binding sites or whole-protein structural databases.
The MultiBind and MAPPIS servers<ref name=Shulman2008 /><ref name=Shulman2007>{{cite journal|author1=Alexandra Shulman-Peleg |author2=Maxim Shatsky |author3=Ruth Nussinov |author4=Haim J Wolfson |title=Spatial chemical conservation of hot spot interactions in protein-protein complexes |journal=BMC Biology|date=2007|volume=5 |issue=43 |pages=43 |doi=10.1186/1741-7007-5-43 |pmid=17925020 |pmc=2231411 |doi-access=free }}</ref> identify common spatial arrangements of physicochemical properties, such as H-bond donor, acceptor, aliphatic, aromatic or hydrophobic, in a set of user-provided protein binding sites defined by interactions with small molecules (MultiBind) or in a set of user-provided protein–protein interfaces (MAPPIS). Others compare entire protein structures<ref>{{cite journal|author1=Gabriele Ausiello |author2=Pier Federico Gherardini |author3=Paolo Marcatili |author4=Anna Tramontano |author5=Allegra Via |author6=Manuela Helmer-Citterich |title=FunClust: a web server for the identification of structural motifs in a set of non-homologous protein structures |journal=BMC Bioinformatics|date=2008|volume=9|issue=Suppl 2 |pages=S2 |doi=10.1186/1471-2105-9-S2-S2 |pmid=18387204 |pmc=2323665 |doi-access=free }}</ref> against a number of user-submitted structures or against a large database of protein structures in reasonable time ([[ProBiS]]<ref>{{cite journal |author1=Janez Konc |author2=Dušanka Janežič |title=ProBiS algorithm for detection of structurally similar protein binding sites by local structural alignment |journal=Bioinformatics |volume=26 |issue=9 |pages=1160–1168 |year=2010 | url= |doi=10.1093/bioinformatics/btq100 |pmid=20305268 |pmc=2859123}}</ref>).
Unlike global alignment approaches, local structural alignment approaches are suited to detection of locally conserved patterns of functional groups, which often appear in binding sites and have significant involvement in ligand binding.<ref name=Shulman2007 /> As an example, G-Losa,<ref>{{cite journal |author1=Hui Sun Lee |author2=Wonpil Im |title=Identification of Ligand Templates using Local Structure Alignment for Structure-Based Drug Design |journal=Journal of Chemical Information and Modeling |volume=52 |issue=10 |pages=2784–2795 |year=2012 |doi=10.1021/ci300178e|pmid=22978550 |pmc=3478504 }}</ref> a local structure alignment tool, has been compared with TM-align, a global structure alignment method: while G-Losa predicts drug-like ligands' positions in single-chain protein targets more precisely than TM-align, the overall success rate of TM-align is better.<ref>{{cite journal |author1=Hui Sun Lee |author2=Wonpil Im |title=Ligand Binding Site Detection by Local Structure Alignment and Its Performance Complementarity |journal=Journal of Chemical Information and Modeling |volume=53 |issue=9 |pages=2462–2470 |year=2013 |doi=10.1021/ci4003602|pmid=23957286 |pmc=3821077 }}</ref>
 
However, as algorithmic improvements and computer performance have erased purely technical deficiencies in older approaches, it has become clear that there is no one universal criterion for the 'optimal' structural alignment. TM-align, for instance, is particularly robust in quantifying comparisons between sets of proteins with great disparities in sequence lengths, but it only indirectly captures hydrogen bonding or conservation of secondary structure order, which might be better metrics for alignment of evolutionarily related proteins. Thus, recent developments have focused on optimizing particular attributes such as speed, quantification of scores, correlation to alternative gold standards, or tolerance of imperfection in structural data or ab initio structural models. An alternative methodology that is gaining popularity is to use the ''consensus'' of various methods to ascertain proteins' structural similarities.<ref name="Bartheletal"/>