Simplified Molecular Input Line Entry System: Difference between revisions

m Examples: Citework
Since there were different cite formats used, migrate the five "further reading" sources to more current inline references. Since date formats were mixed with no clear MOS:RETAIN, choosing {Use mdy dates} per MOS:STRONGTIES.
Line 1:
{{Redirect|SMILES|other uses|Smiles (disambiguation)}}
{{Use mdy dates|date=July 2020}}
{{Infobox file format
| name = SMILES
Line 19 ⟶ 20:
 
==History==
The original SMILES specification was initiated by David Weininger at the USEPA Mid-Continent Ecology Division Laboratory in [[Duluth, Minnesota|Duluth]] in the 1980s.<ref name="SMILES1Weininger-1988">{{cite journal| last1=Weininger| first1=David| title=SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules| journal=Journal of Chemical Information and Computer Sciences| volume=28| issue= 1|pages=31–6|date=February 1988|doi=10.1021/ci00057a005 }}</ref><ref name="SMILES2Weininger-1989">{{cite journal| last1=Weininger| first1=David| last2=Weininger| first2=Arthur| last3=Weininger| first3=Joseph L.| title=SMILES. 2. Algorithm for generation of unique SMILES notation| journal=Journal of Chemical Information and Modeling| volume=29| issue=2| pages=97–101|date=May 1989|doi=10.1021/ci00062a008 }}</ref><ref name="SMILES3Weininger-1990">{{cite journal| last1=Weininger| first1=David| title=SMILES. 3. DEPICT. Graphical depiction of chemical structures| journal=Journal of Chemical Information and Modeling| volume=30| issue= 3|pages=237–43|date=August 1990|doi=10.1021/ci00067a005 }}</ref><ref name="Swanson-2004">{{cite book|author1=Swanson, Richard Pommier|editor1-last=Rayward|editor1-first=W. [Warden] Boyd|editor2-last=Bowden|editor2-first=Mary Ellen|title=The History and Heritage of Scientific and Technological Information Systems: Proceedings of the 2002 Conference of the American Society of Information Science and Technology and the Chemical Heritage Foundation|date=2004|publisher=Information Today|location=Medford, NJ|isbn=1-57387-229-6|page=205|ref=ASIST monograph series 2002|chapter=The Entrance of Informatics into Combinatorial Chemistry|quote=https://wayback.archive-it.org/2118/20100925010036/http://64.251.202.97/pubs/asist2002/17-swanson.pdf}}</ref> Acknowledged for their parts in the early development were "Gilman Veith and Rose Russo (USEPA) and Albert Leo and [[Corwin Hansch]] (Pomona College) for supporting the work, and Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system."<ref name="Weininger-1998">{{cite web|last=Weininger|first=Dave|title=Acknowledgements on Daylight Tutorial smiles-etc page|url=http://www.daylight.com/meetings/summerschool98/course/dave/smiles-etc.html|accessdate=24 June 2013 |date=1998 }}</ref> The [[Environmental Protection Agency]] funded the initial project to develop SMILES.<ref name="Anderson-1987">{{cite book |year=1987| title= SMILES: A line notation and computerized interpreter for chemical structures |id=Report No. EPA/600/M-87/021 |publisher= U.S. EPA, Environmental Research Laboratory-Duluth |location=Duluth, MN |last1=Anderson |first1=E. |last2=Veith |first2=G. D. |last3=Weininger |first3=D. }}</ref><ref name="SMILES Tutorial: What is SMILES?">{{Cite web|url=http://www.epa.gov/med/Prods_Pubs/smiles.htm |title=SMILES Tutorial: What is SMILES? |publisher=U.S. Environmental Protection Agency |accessdate=2012-09-23 |work=}}</ref>
 
It has since been modified and extended by others, most notably by [[Daylight Chemical Information Systems]]. In 2007, an [[open standard]] called "OpenSMILES" was developed by the [[Blue Obelisk]] open-source chemistry community. Other 'linear' notations include the [[Wiswesser Line Notation]] (WLN), [[ROSDAL]] and [[SYBYL Line Notation|SLN]] (Tripos Inc).
Line 26 ⟶ 27:
 
== Terminology ==
 
The term SMILES refers to a line notation for encoding molecular structures; specific instances should strictly be called SMILES strings. However, the term is also commonly used for both a single SMILES string and a collection of SMILES strings, and the exact meaning is usually apparent from context. The terms "canonical" and "isomeric" can cause confusion when applied to SMILES: they describe different attributes of SMILES strings and are not mutually exclusive.
 
Typically, a number of equally valid SMILES strings can be written for a molecule. For example, <code>CCO</code>, <code>OCC</code> and <code>C(O)C</code> all specify the structure of [[ethanol]]. Algorithms have been developed to generate the same SMILES string for a given molecule regardless of how it was entered: of the many possible strings, they select exactly one. This string is unique for each structure, although it depends on the [[canonicalization]] algorithm used to generate it, and is termed the canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure, then examine that structure and produce a unique SMILES string. Algorithms for generating canonical SMILES include those by [[Daylight Chemical Information Systems]], [[OpenEye Scientific Software]], [[MEDIT]], [[Chemical Computing Group]], [[MolSoft LLC]], and the [[Chemistry Development Kit]]. A common application of canonical SMILES is indexing molecules in a [[Chemical database|database]] and ensuring their uniqueness.
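
The idea that several strings denote one structure, and that a canonical form makes them comparable, can be illustrated with a minimal Python sketch. It is not the algorithm of any package named above: the parser (the hypothetical <code>parse_simple_smiles</code>) handles only one-letter atoms, single bonds and branches, and the canonical form (the hypothetical <code>canonical_key</code>) is found by brute force over all atom orderings, which is only practical for very small molecules.

```python
from itertools import permutations


def parse_simple_smiles(smiles):
    """Parse a tiny SMILES subset: one-letter atoms, single bonds, branches.

    Illustrative only; rings, charges, bond symbols and bracket atoms
    are not supported.
    """
    atoms, bonds, stack, previous = [], set(), [], None
    for char in smiles:
        if char == "(":
            stack.append(previous)        # remember the branch point
        elif char == ")":
            previous = stack.pop()        # return to the branch point
        else:
            atoms.append(char)
            current = len(atoms) - 1
            if previous is not None:
                bonds.add((previous, current))
            previous = current
    return atoms, bonds


def canonical_key(atoms, bonds):
    """Return the smallest relabelled (atoms, bonds) pair over all orderings.

    Brute force (factorial time), used here only to show what
    "one representative per structure" means.
    """
    best = None
    for order in permutations(range(len(atoms))):
        # order[old] is the new index assigned to atom `old`
        labels = [None] * len(atoms)
        for old, new in enumerate(order):
            labels[new] = atoms[old]
        edges = sorted(tuple(sorted((order[a], order[b]))) for a, b in bonds)
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best


# The three ethanol strings share one key; dimethyl ether (COC) does not.
ethanol_keys = {canonical_key(*parse_simple_smiles(s)) for s in ("CCO", "OCC", "C(O)C")}
ether_key = canonical_key(*parse_simple_smiles("COC"))
print(len(ethanol_keys), ether_key in ethanol_keys)
```

Because the key is the same for every way of writing ethanol, it could serve as a lookup key in the way canonical SMILES are used for database indexing.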
 
The original paper that described the CANGEN<ref name="SMILES2Weininger-1989" /> algorithm claimed to generate unique SMILES strings for graphs representing molecules, but the algorithm fails for a number of simple cases (e.g. [[cuneane]], 1,2-dicyclopropylethane) and cannot be considered a correct method for representing a graph canonically.<ref>{{cite book |publisher=Springer |location=Berlin |isbn=978-3-540-27967-9 |volume=3615 |pages=145–157 | editor-first = Bertram | editor-last=Ludäscher | last1 = Hutchison | first1 = David | first2 = Takeo | last2 = Kanade | first3 = Josef | last3 = Kittler | first4 = Jon M. | last4 = Kleinberg | author-link4 = Jon Kleinberg | first5 = Friedemann | last5 = Mattern | first6 = John C. | last6 = Mitchell | first7 = Moni | last7 = Naor | author-link7 = Moni Naor | first8 = Oscar | last8 = Nierstrasz | first9 = C. Pandu | last9 = Rangan | first10 = Bernhard | last10 = Steffen | author-link10 = Bernhard Steffen (computer scientist) | first11 = Madhu | last11 = Sudan | author-link11 = Madhu Sudan | first12 = Demetri | last12 = Terzopoulos | first13 = Doug | last13 = Tygar | first14 = Moshe Y. | last14 = Vardi | author-link14 = Moshe Y. Vardi | first15 = Gerhard | last15 = Weikum | first16 = Louiqa | last16 = Raschid |author16-link=Louiqa Raschid | first17 = Greeshma | last17 = Neglur | first18 = Robert L. | last18 = Grossman | first19 = Bing | last19 = Liu | name-list-format = vanc | series = Lecture Notes in Computer Science |title=Data Integration in the Life Sciences |chapter=Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples |accessdate=2013-02-12 |year=2005 |chapterurl=https://doi.org/10.1007%2F11530084_13 |doi=10.1007/11530084_13 }}</ref> There is currently no systematic comparison across commercial software to test whether such flaws exist in those packages.
 
SMILES notation allows the specification of [[molecular configuration|configuration at tetrahedral centers]] and of double bond geometry. These structural features cannot be specified by connectivity alone, so SMILES strings that encode this information are termed isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. The term isomeric SMILES is also applied to SMILES in which [[isomer]]s are specified.
 
== Graph-based definition ==
 
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a [[depth-first search|depth-first]] [[tree traversal]] of a [[chemical graph]]. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a [[spanning tree (mathematics)|spanning tree]]. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
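
The procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name <code>write_smiles</code> and its input format are hypothetical, hydrogens are assumed to be already removed, all bonds are treated as single, and ring-closure labels are limited to single digits.

```python
def write_smiles(atoms, bonds, start=0):
    """Write a SMILES string for a hydrogen-suppressed, single-bonded graph.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs
    Hypothetical helper for illustration; assumes a connected graph with
    fewer than ten rings.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Pass 1: depth-first search. Edges that reach an unvisited atom form
    # the spanning tree; edges that reach a visited atom are the broken
    # cycle bonds (ring closures).
    visited = set()
    tree_children = {i: [] for i in neighbors}
    closures = []

    def explore(atom, parent):
        visited.add(atom)
        for nxt in neighbors[atom]:
            if nxt == parent:
                continue
            if nxt in visited:
                if (nxt, atom) not in closures:   # record each closure once
                    closures.append((atom, nxt))
            else:
                tree_children[atom].append(nxt)
                explore(nxt, atom)

    explore(start, None)

    # Give both ends of every broken cycle bond the same numeric suffix.
    ring_digits = {i: [] for i in neighbors}
    for number, (a, b) in enumerate(closures, start=1):
        ring_digits[a].append(str(number))
        ring_digits[b].append(str(number))

    # Pass 2: print the tree; all children but the last go in parentheses.
    def emit(atom):
        text = atoms[atom] + "".join(ring_digits[atom])
        children = tree_children[atom]
        for child in children[:-1]:
            text += "(" + emit(child) + ")"
        if children:
            text += emit(children[-1])
        return text

    return emit(start)


ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])
cyclohexane = (["C"] * 6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
print(write_smiles(*ethanol))            # CCO
print(write_smiles(*ethanol, start=1))   # C(C)O
print(write_smiles(*cyclohexane))        # C1CCCCC1
```

Starting the traversal at a different atom yields a different but equally valid string for the same graph.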
 
Line 231 ⟶ 230:
 
== Conversion ==
SMILES can be converted back to two-dimensional representations using structure diagram generation (SDG) algorithms.<ref name="Helson-1999">{{cite book |last=Helson |first=H. E. |year=1999 |chapter=Structure Diagram Generation |title= Rev. Comput. Chem. |editor1-last=Lipkowitz |editor1-first=K. B. |editor2-last=Boyd |editor2-first=D. B. |location=New York |pages=313–398 |publisher=Wiley-VCH |doi=10.1002/9780470125908.ch6 |volume=13 }}</ref> This conversion is not always unambiguous. Conversion to three-dimensional representation is achieved by energy-minimization approaches. There are many downloadable and web-based conversion utilities.
 
== See also ==
Line 244 ⟶ 242:
== References ==
{{Reflist|33em}}
 
== External links ==