Simplified Molecular Input Line Entry System: Difference between revisions

WikiCleanerBot (talk | contribs)
m v2.03b - Bot 22 LintError/bogus-image-options - WP:WCW project (Bogus image options)
Monkbot (talk | contribs)
m Terminology: Task 17: replace to-be-deprecated: |name-list-format= (1× replaced; usage: 1 of 9);
Line 31:
Typically, a number of equally valid SMILES strings can be written for a molecule. For example, <code>CCO</code>, <code>OCC</code> and <code>C(O)C</code> all specify the structure of [[ethanol]]. Algorithms have been developed to generate the same SMILES string for a given molecule; of the many possible strings, they choose exactly one. This string is termed the canonical SMILES. It is unique for each structure, but it depends on the [[canonicalization]] algorithm used to generate it, so different algorithms may produce different canonical strings for the same molecule. These algorithms first convert the SMILES to an internal representation of the molecular structure, then examine that structure and produce a unique SMILES string. Various algorithms for generating canonical SMILES have been developed, including those by [[Daylight Chemical Information Systems]], [[OpenEye Scientific Software]], [[MEDIT]], [[Chemical Computing Group]], [[MolSoft LLC]], and the [[Chemistry Development Kit]]. A common application of canonical SMILES is indexing and ensuring uniqueness of molecules in a [[Chemical database|database]].
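
The two-step scheme described above (parse to an internal structure, then emit one chosen string) can be illustrated with a minimal Python sketch. It is a toy under stated assumptions, not any of the named vendors' algorithms and not CANGEN: it handles only acyclic molecules built from the atoms <code>C</code>, <code>N</code> and <code>O</code> with single bonds and parenthesised branches, and it picks the canonical string by brute force over every possible starting atom. The function names <code>parse_smiles</code>, <code>rooted_string</code> and <code>canonical_smiles</code> are hypothetical.

```python
def parse_smiles(smiles):
    """Parse a toy SMILES subset (C, N, O, single bonds, branches, no rings)
    into a list of atom symbols and an adjacency list."""
    symbols, neighbors = [], []
    prev = None      # index of the atom the next atom will bond to
    stack = []       # branch points saved at each "("
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch in "CNO":
            idx = len(symbols)
            symbols.append(ch)
            neighbors.append([])
            if prev is not None:
                neighbors[prev].append(idx)
                neighbors[idx].append(prev)
            prev = idx
        else:
            raise ValueError(f"unsupported character in toy subset: {ch!r}")
    return symbols, neighbors


def rooted_string(symbols, neighbors, atom, parent):
    """Write the subtree below `atom` with its branches in a fixed order."""
    children = sorted(
        (rooted_string(symbols, neighbors, n, atom)
         for n in neighbors[atom] if n != parent),
        key=lambda s: (len(s), s),
    )
    branches = "".join(f"({c})" for c in children[:-1])
    tail = children[-1] if children else ""
    return symbols[atom] + branches + tail


def canonical_smiles(smiles):
    """Try every atom as the starting atom and keep one string by a fixed rule
    (shortest first, then alphabetical). The rule is an arbitrary choice."""
    symbols, neighbors = parse_smiles(smiles)
    candidates = [
        rooted_string(symbols, neighbors, root, None)
        for root in range(len(symbols))
    ]
    return min(candidates, key=lambda s: (len(s), s))


for s in ("CCO", "OCC", "C(O)C"):
    print(s, "->", canonical_smiles(s))
```

All three ethanol strings map to the same output, while a different molecule such as dimethyl ether (<code>COC</code>) maps to a different one. Changing the ordering rule would change which string is chosen, which is the sense in which a canonical SMILES depends on the algorithm that generated it.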
 
The original paper that described the CANGEN<ref name="Weininger-1989" /> algorithm claimed to generate unique SMILES strings for graphs representing molecules, but the algorithm fails for a number of simple cases (e.g. [[cuneane]], 1,2-dicyclopropylethane) and cannot be considered a correct method for representing a graph canonically.<ref>{{cite book |publisher=Springer |location=Berlin |isbn=978-3-540-27967-9 |volume=3615 |pages=145–157 | editor-first = Bertram | editor-last=Ludäscher | last1 = Hutchison | first1 = David | first2 = Takeo | last2 = Kanade | first3 = Josef | last3 = Kittler | first4 = Jon M. | last4 = Kleinberg | author-link4 = Jon Kleinberg | first5 = Friedemann | last5 = Mattern | first6 = John C. | last6 = Mitchell | first7 = Moni | last7 = Naor | author-link7 = Moni Naor | first8 = Oscar | last8 = Nierstrasz | first9 = C. Pandu | last9 = Rangan | first10 = Bernhard | last10 = Steffen | author-link10 = Bernhard Steffen (computer scientist) | first11 = Madhu | last11 = Sudan | author-link11 = Madhu Sudan | first12 = Demetri | last12 = Terzopoulos | first13 = Doug | last13 = Tygar | first14 = Moshe Y. | last14 = Vardi | author-link14 = Moshe Y. Vardi | first15 = Gerhard | last15 = Weikum | first16 = Louiqa | last16 = Raschid |author16-link=Louiqa Raschid | first17 = Greeshma | last17 = Neglur | first18 = Robert L. | last18 = Grossman | first19 = Bing | last19 = Liu | name-list-style = vanc | series = Lecture Notes in Computer Science |title=Data Integration in the Life Sciences |chapter=Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples |accessdate=2013-02-12 |year=2005 |chapterurl=https://doi.org/10.1007%2F11530084_13 |doi=10.1007/11530084_13 }}</ref> There is currently no systematic comparison across commercial software to test whether such flaws exist in those packages.
 
SMILES notation allows the specification of [[molecular configuration|configuration at tetrahedral centers]] and double bond geometry. These are structural features that cannot be specified by connectivity alone, and therefore SMILES strings that encode this information are termed isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. The term isomeric SMILES is also applied to SMILES in which [[isomer]]s are specified.
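
As a rough illustration of why connectivity alone is not enough, the following Python sketch removes the stereo markers from a SMILES string. It is plain text manipulation under a stated assumption (that <code>/</code>, <code>\</code> and <code>@</code> are the only stereo markers present), not a chemistry toolkit call, and the function name <code>strip_stereo</code> is hypothetical. The two difluoroethene strings differ only in double bond geometry, and the third string is one enantiomer of alanine.

```python
import re


def strip_stereo(smiles):
    """Drop the SMILES stereo markers '/', '\\' and '@' from a string.

    Toy text-level operation: it assumes these characters are used only
    as stereo markers and does not validate the SMILES in any way.
    """
    return re.sub(r"[/\\@]", "", smiles)


trans_form = "F/C=C/F"    # substituents on opposite sides of the double bond
cis_form = "F/C=C\\F"     # substituents on the same side
alanine = "N[C@@H](C)C(=O)O"

print(strip_stereo(trans_form))
print(strip_stereo(cis_form))
print(strip_stereo(alanine))
```

Once the markers are removed, both difluoroethene strings collapse to the same connectivity-only string, so the distinction between the two isomers is carried entirely by the isomeric part of the notation.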