Structural alignment: Difference between revisions

OAbot (talk | contribs)
m Open access bot: pmc updated in citation with #oabot.
Line 171:
===Alphabet methods===
A special class of protein structural alignment programs converts the input structure into a sequence of letters that describe the structure. This allows methods from [[sequence alignment]] to be carried over to structures for more efficient searching and, in some implementations, for alignment and superposition in real 3D space.
* The simplest method considers only the backbone positions. The input is divided into fragments of four residues, and each fragment is assigned the letter of its closest descriptor. An alphabet of 20 letters is chosen so that tools built for protein sequences can be reused.<ref>{{cite journal |last1=Le |first1=Q |last2=Pollastri |first2=G |last3=Koehl |first3=P |title=Structural alphabets for protein structure classification: a comparison study. |journal=Journal of Molecular Biology |date=27 March 2009 |volume=387 |issue=2 |pages=431–450 |doi=10.1016/j.jmb.2008.12.044 |pmid=19135454|pmc=2772874 }}</ref>
* Foldseek uses the 3D interaction (3Di) alphabet, which classifies the relationship between one residue's C&alpha; atom and its spatially closest residue into 20 letters. Each residue of the input structure receives one letter. The similarity between letters is defined by a substitution matrix. Foldseek achieves sensitivity similar to that of typical structural alignment while being hundreds of times faster. It can search, align, and superimpose structures.<ref>{{cite journal |last1=van Kempen |first1=Michel |last2=Kim |first2=Stephanie S. |last3=Tumescheit |first3=Charlotte |last4=Mirdita |first4=Milot |last5=Lee |first5=Jeongjae |last6=Gilchrist |first6=Cameron L. M. |last7=Söding |first7=Johannes |last8=Steinegger |first8=Martin |title=Fast and accurate protein structure search with Foldseek |journal=Nature Biotechnology |date=February 2024 |volume=42 |issue=2 |pages=243–246 |doi=10.1038/s41587-023-01773-0|pmc=10869269 }}</ref>
* Reseek represents each residue and its structural context as a discrete feature vector, effectively creating an alphabet of 10<sup>11</sup> letters. The similarity between two feature vectors is defined component-wise using pre-collected data. This method also supports multiple structure alignment (MUSCLE-3D).<ref>{{cite journal |last1=Edgar |first1=Robert C |title=Protein structure alignment by Reseek improves sensitivity to remote homologs |journal=Bioinformatics |date=1 November 2024 |volume=40 |issue=11 |doi=10.1093/bioinformatics/btae687|pmc=11601161 }}</ref>
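
The fragment-based encoding in the first item can be illustrated with a minimal Python sketch. It is not the published alphabet: the three-letter <code>PROTOTYPES</code> table, its approximate distance values, the choice of three internal C&alpha; distances as the descriptor, and the overlapping window (<code>step=1</code>) are all assumptions made for illustration.

```python
import math

# Illustrative prototypes only (NOT a published structural alphabet).
# Each letter maps to rough C-alpha distances (i,i+2), (i+1,i+3), (i,i+3)
# in angstroms; the values are approximate and chosen for this sketch.
PROTOTYPES = {
    "H": (5.4, 5.4, 5.0),   # helix-like fragment
    "E": (6.5, 6.5, 10.0),  # strand-like (extended) fragment
    "C": (6.0, 6.0, 7.5),   # intermediate, coil-like fragment
}

def descriptor(fragment):
    """Describe a four-residue fragment by three internal C-alpha distances."""
    a, b, c, d = fragment
    return (math.dist(a, c), math.dist(b, d), math.dist(a, d))

def encode(ca_coords, step=1):
    """Give each four-residue window the letter of its closest prototype."""
    letters = []
    for i in range(0, len(ca_coords) - 3, step):
        desc = descriptor(ca_coords[i:i + 4])
        closest = min(PROTOTYPES, key=lambda k: math.dist(desc, PROTOTYPES[k]))
        letters.append(closest)
    return "".join(letters)

# Toy input: six C-alpha atoms on a straight line, 3.8 angstroms apart.
extended_chain = [(3.8 * i, 0.0, 0.0) for i in range(6)]
print(encode(extended_chain))
```

A real structural alphabet would learn its descriptors from a large set of known structures and would use 20 letters, as described above; the nearest-prototype assignment is the only part this sketch is meant to show.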
 
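Once structures are written as letter strings, they can be compared with standard sequence-alignment machinery and a substitution matrix over the structural letters. The following Python sketch computes a Smith–Waterman local alignment score. The three-letter alphabet, the values in <code>SCORES</code>, and the linear gap penalty are hypothetical; they are not the 3Di substitution matrix and do not reproduce the search pipeline of Foldseek or Reseek.

```python
# Hypothetical substitution scores for a toy three-letter structural alphabet.
SCORES = {
    ("H", "H"): 4, ("E", "E"): 4, ("C", "C"): 2,
    ("H", "E"): -4, ("H", "C"): -1, ("E", "C"): -1,
}

def sub(a, b):
    """Look up a substitution score, treating the matrix as symmetric."""
    return SCORES.get((a, b), SCORES.get((b, a)))

def local_alignment_score(s, t, gap=-3):
    """Best Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)   # previous row of the dynamic-programming table
    best = 0
    for a in s:
        curr = [0]
        for j, b in enumerate(t, start=1):
            cell = max(
                0,                        # start a new local alignment
                prev[j - 1] + sub(a, b),  # align letter a with letter b
                prev[j] + gap,            # letter a aligned to a gap
                curr[j - 1] + gap,        # letter b aligned to a gap
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Two toy structure strings that differ by one inserted coil letter.
print(local_alignment_score("HHHHEEEE", "HHHHCEEEE"))
```

Only two rows of the table are kept because the sketch returns a score rather than the alignment itself; recovering the aligned residue pairs, which is needed for superposition, would require storing the full table and tracing back from the best cell.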
==Recent developments==