Artoria2e5 (talk | contribs) |
m Open access bot: pmc updated in citation with #oabot. |
Line 171:
===Alphabet methods ===
A special class of protein structural alignment programs converts the input structure into a sequence of letters describing the structure. This allows methods from [[sequence alignment]] to be carried over to structure comparison, enabling more efficient searching and, in some implementations, alignment and superposition in real 3D space.
* The simplest method considers only the position of the backbone. The input is divided into groups of four residues, and each group is assigned the letter of its closest descriptor. The alphabet has 20 letters so that tools built for protein sequences can be reused.<ref>{{cite journal |last1=Le |first1=Q |last2=Pollastri |first2=G |last3=Koehl |first3=P |title=Structural alphabets for protein structure classification: a comparison study. |journal=Journal of molecular biology |date=27 March 2009 |volume=387 |issue=2 |pages=431-50 |doi=10.1016/j.jmb.2008.12.044 |pmid=19135454|pmc=2772874 }}</ref>
* Foldseek uses the 3D interaction (3Di) alphabet, which classifies the relationship between one residue's Cα atom and its spatially closest residue into 20 letters. Each residue of the input structure receives one letter. The similarity between letters is defined by a substitution matrix. Foldseek achieves a sensitivity similar to that of typical structural alignment while being hundreds of times faster. It is able to search, align, and superimpose.<ref>{{cite journal |last1=van Kempen |first1=Michel |last2=Kim |first2=Stephanie S. |last3=Tumescheit |first3=Charlotte |last4=Mirdita |first4=Milot |last5=Lee |first5=Jeongjae |last6=Gilchrist |first6=Cameron L. M. |last7=Söding |first7=Johannes |last8=Steinegger |first8=Martin |title=Fast and accurate protein structure search with Foldseek |journal=Nature Biotechnology |date=February 2024 |volume=42 |issue=2 |pages=243–246 |doi=10.1038/s41587-023-01773-0|pmc=10869269 }}</ref>
* Reseek represents each residue and its structural context as a discrete feature vector, effectively creating an alphabet of 10<sup>11</sup> letters. The similarity between two feature vectors is defined component-wise using pre-collected data. This method also allows multiple structure alignment (MUSCLE-3D).<ref>{{cite journal |last1=Edgar |first1=Robert C |title=Protein structure alignment by Reseek improves sensitivity to remote homologs |journal=Bioinformatics |date=1 November 2024 |volume=40 |issue=11 |doi=10.1093/bioinformatics/btae687|pmc=11601161 }}</ref>
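The general idea behind the methods above can be illustrated with a minimal sketch: each local piece of structure is mapped to the letter of its nearest prototype descriptor, and the resulting strings are compared with an ordinary local sequence alignment. Everything specific here is an assumption made for illustration and is not taken from any of the cited programs: the three-letter alphabet, the prototype values (a single made-up pseudo-torsion angle per four-residue fragment), the match/mismatch scores, and the gap penalty. Real tools use 20 or more letters and a substitution matrix derived from data.

```python
# Hypothetical 3-letter structural alphabet. Each letter is a prototype
# descriptor; the values are invented pseudo-torsion angles in degrees.
PROTOTYPES = {"H": 50.0, "E": -170.0, "C": 120.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def encode(descriptors):
    """Map each fragment descriptor to the letter of its closest prototype."""
    return "".join(
        min(PROTOTYPES, key=lambda k: angular_distance(PROTOTYPES[k], x))
        for x in descriptors
    )


def substitution(a, b):
    """Toy substitution score (assumed): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def local_alignment_score(s, t, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(t) + 1)
    best = 0
    for i in range(1, len(s) + 1):
        cur = [0]
        for j in range(1, len(t) + 1):
            score = max(
                0,
                prev[j - 1] + substitution(s[i - 1], t[j - 1]),  # align letters
                prev[j] + gap,      # gap in t
                cur[j - 1] + gap,   # gap in s
            )
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


if __name__ == "__main__":
    query = encode([48.0, 55.0, -175.0, 118.0])
    target = encode([-160.0, 125.0])
    print(query, target, local_alignment_score(query, target))
```

Because the structures are reduced to strings, any sequence-search machinery (indexing, prefiltering, dynamic programming) can be applied unchanged; only the alphabet and the substitution scores are specific to structure.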
==Recent developments==