Simplified Molecular Input Line Entry System: Difference between revisions

Extensions: Included description and link to BigSMILES, an extension of SMILES for describing polymers
consistent citation formatting
Line 21:
 
==History==
The original SMILES specification was initiated by [[David Weininger]] at the USEPA Mid-Continent Ecology Division Laboratory in [[Duluth, Minnesota|Duluth]] in the 1980s.<ref name="Weininger-1988">{{cite journal | vauthors = Weininger D | title=SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules| journal=Journal of Chemical Information and Computer Sciences| volume=28| issue= 1|pages=31–6|date=February 1988|doi=10.1021/ci00057a005 }}</ref><ref name="Weininger-1989">{{cite journal | vauthors = Weininger D, Weininger A, Weininger JL | title=SMILES. 2. Algorithm for generation of unique SMILES notation| journal=Journal of Chemical Information and Modeling| volume=29| issue=2| pages=97–101|date=May 1989|doi=10.1021/ci00062a008 }}</ref><ref name="Weininger-1990">{{cite journal | vauthors = Weininger D | title=SMILES. 3. DEPICT. Graphical depiction of chemical structures| journal=Journal of Chemical Information and Modeling| volume=30| issue= 3|pages=237–43|date=August 1990|doi=10.1021/ci00067a005 }}</ref><ref name="Swanson-2004">{{cite book | vauthors = Swanson RP | veditors = Rayward WB, Bowden ME |title=The History and Heritage of Scientific and Technological Information Systems: Proceedings of the 2002 Conference of the American Society of Information Science and Technology and the Chemical Heritage Foundation |date=2004 |publisher=[[Information Today]] |location=Medford, NJ |isbn=9781573872294 |page=205 |url=https://books.google.com/books?id=76OOQannpBgC&pg=PA205 |ref=ASIST monograph series 2002 |chapter=The Entrance of Informatics into Combinatorial Chemistry |chapter-url=https://wayback.archive-it.org/2118/20100925010036/http://64.251.202.97/pubs/asist2002/17-swanson.pdf }}</ref> Acknowledged for their parts in the early development were "Gilman Veith and Rose Russo (USEPA) and Albert Leo and [[Corwin Hansch]] (Pomona College) for supporting the work, and Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system."<ref name="Weininger-1998">{{cite web | vauthors = Weininger D |title=Acknowledgements on Daylight Tutorial smiles-etc page|url=http://www.daylight.com/meetings/summerschool98/course/dave/smiles-etc.html|access-date=24 June 2013 |date=1998 }}</ref> The [[Environmental Protection Agency]] funded the initial project to develop SMILES.<ref name="Anderson-1987">{{cite book | vauthors = Anderson E, Veith GD, Weininger D |year=1987 |title= SMILES: A line notation and computerized interpreter for chemical structures |id=Report No. EPA/600/M-87/021 |publisher=[[United States Environmental Protection Agency|U.S. EPA]], Environmental Research Laboratory-Duluth |location=Duluth, MN |url=https://nepis.epa.gov/Exe/ZyPDF.cgi/2000CAUR.PDF?Dockey=2000CAUR.PDF }}</ref><ref name="SMILES Tutorial: What is SMILES?">{{Cite web|url=http://www.epa.gov/med/Prods_Pubs/smiles.htm |title=SMILES Tutorial: What is SMILES? |publisher=[[United States Environmental Protection Agency|U.S. EPA]] |access-date=2012-09-23 }}</ref>
 
It has since been modified and extended by others, most notably by [[Daylight Chemical Information Systems]]. In 2007, an [[open standard]] called "OpenSMILES" was developed by the [[Blue Obelisk]] open-source chemistry community. Other 'linear' notations include the [[Wiswesser Line Notation]] (WLN), [[ROSDAL]] and [[SYBYL Line Notation|SLN]] (Tripos Inc).
Line 32:
Typically, a number of equally valid SMILES strings can be written for a molecule. For example, <code>CCO</code>, <code>OCC</code> and <code>C(O)C</code> all specify the structure of [[ethanol]]. Algorithms have been developed to generate the same SMILES string for a given molecule; of the many possible strings, these algorithms choose only one of them. This SMILES is unique for each structure, although dependent on the [[canonicalization]] algorithm used to generate it, and is termed the canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure; an algorithm then examines that structure and produces a unique SMILES string. Various algorithms for generating canonical SMILES have been developed and include those by Daylight Chemical Information Systems, [[OpenEye Scientific Software]], [[MEDIT]], [[Chemical Computing Group]], [[MolSoft]] LLC, and the [[Chemistry Development Kit]]. A common application of canonical SMILES is indexing and ensuring uniqueness of molecules in a [[Chemical database|database]].
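The idea of mapping several equivalent strings to one representative can be illustrated with a minimal sketch. It assumes a toy subset of SMILES (single-letter atoms, implicit single bonds, branches, no rings) and is not CANGEN or any of the vendor algorithms named above. The function names <code>parse_simple_smiles</code> and <code>toy_canonical</code> are hypothetical. The sketch parses the string into a graph, writes the graph out from every possible starting atom with branches in sorted order, and keeps the lexicographically smallest result.

```python
from typing import Dict, List, Optional, Tuple


def parse_simple_smiles(smiles: str) -> Tuple[List[str], Dict[int, List[int]]]:
    """Parse a toy SMILES subset into (atom symbols, adjacency lists).

    Assumption: only single-letter uppercase atoms, implicit single bonds
    and parenthesised branches are supported; rings are not.
    """
    atoms: List[str] = []
    adj: Dict[int, List[int]] = {}
    stack: List[Optional[int]] = []   # atoms at which open branches started
    prev: Optional[int] = None        # atom the next atom will bond to
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isalpha() and ch.isupper():
            idx = len(atoms)
            atoms.append(ch)
            adj[idx] = []
            if prev is not None:
                adj[idx].append(prev)
                adj[prev].append(idx)
            prev = idx
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, adj


def _write(atoms: List[str], adj: Dict[int, List[int]],
           node: int, parent: Optional[int]) -> str:
    """Write the subtree rooted at `node`, with child strings sorted."""
    children = sorted(_write(atoms, adj, n, node)
                      for n in adj[node] if n != parent)
    if not children:
        return atoms[node]
    *branches, main = children        # last child continues the main chain
    return atoms[node] + "".join(f"({b})" for b in branches) + main


def toy_canonical(smiles: str) -> str:
    """Return one representative string for an acyclic toy SMILES."""
    atoms, adj = parse_simple_smiles(smiles)
    return min(_write(atoms, adj, start, None) for start in range(len(atoms)))


if __name__ == "__main__":
    forms = {toy_canonical(s) for s in ("CCO", "OCC", "C(O)C")}
    print(len(forms))  # 1: all three ethanol strings collapse to one form
```

The representative chosen here depends entirely on this sketch's ordering rule, which mirrors the point made above that a canonical SMILES is unique only with respect to the algorithm that produced it.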
 
The original paper that described the CANGEN<ref name="Weininger-1989" /> algorithm claimed to generate unique SMILES strings for graphs representing molecules, but the algorithm fails for a number of simple cases (e.g. [[cuneane]], 1,2-dicyclopropylethane) and cannot be considered a correct method for representing a graph canonically.<ref>{{cite book | vauthors = Neglur G, Grossman RL, Liu B | veditors = Ludäscher B |publisher=Springer |location=Berlin |isbn=978-3-540-27967-9 |volume=3615 |pages=145–157 | series = Lecture Notes in Computer Science |title=Data Integration in the Life Sciences |chapter=Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples |access-date=2013-02-12 |year=2005 |chapter-url=https://doi.org/10.1007%2F11530084_13 |doi=10.1007/11530084_13 }}</ref> There is currently no systematic comparison across commercial software to test if such flaws exist in those packages.
 
SMILES notation allows the specification of [[molecular configuration|configuration at tetrahedral centers]] and of double bond geometry. These are structural features that cannot be specified by connectivity alone, and therefore SMILES that encode this information are termed isomeric SMILES. A notable feature of these rules is that they allow rigorous partial specification of chirality. The term isomeric SMILES is also applied to SMILES in which [[isomer]]s are specified.
Line 200:
|<code>CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C</code>
|-----
|(2''S'',5''R'')-[[Chalcogran]]: a [[pheromone]] of the [[Scolytinae|bark beetle]] ''[[Pityogenes chalcographus]]''<ref>{{cite journal | vauthors = Byers JA, Birgersson G, Löfqvist J, Appelgren M, Bergström G | title = Isolation of pheromone synergists of bark beetle, ''Pityogenes chalcographus'', from complex insect-plant odors by fractionation and subtractive-combination bioassay | journal = Journal of Chemical Ecology | volume = 16 | issue = 3 | pages = 861–876 | date = March 1990 | pmid = 24263601 | doi = 10.1007/BF01016496 | s2cid = 226090 | url = http://www.chemical-ecology.net/pdf/Byersetal1990a.pdf }}</ref>
|[[Image:2S,5R-chalcogran-skeletal.svg|130px|(2''S'',5''R'')-2-ethyl-1,6-dioxaspiro[4.4]nonane]]
|<code>CC[C@H](O1)CC[C@@]12CCCO2</code>
Line 234 ⟶ 233:
{{anchor|SMIRKS}}SMIRKS, a superset of "reaction SMILES" and a subset of "reaction SMARTS", is a line notation for specifying reaction transforms. The general syntax for the reaction extensions is <code>REACTANT&gt;AGENT&gt;PRODUCT</code> (without spaces), where any of the fields can either be left blank or filled with multiple molecules delimited with a dot (<code>.</code>), and other descriptions dependent on the base language. Atoms can additionally be identified with a number (e.g. <code>[C:1]</code>) for mapping.<ref>{{cite web |title=SMIRKS Tutorial |url=http://daylight.com/dayhtml_tutorials/languages/smirks/ |website=Daylight |access-date=29 October 2018}}</ref><ref>{{cite web |title=Reaction SMILES and SMIRKS |url=http://www.daylight.com/meetings/summerschool01/course/basics/smirks.html |access-date=29 October 2018}}</ref>
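The three-field layout described above can be sketched in a few lines of Python. This is a simplified illustration rather than a full reaction SMILES or SMIRKS parser: it assumes that <code>&gt;</code> and <code>.</code> occur only as field and molecule separators, and the function names <code>split_reaction</code> and <code>atom_map_numbers</code> as well as the example reaction are hypothetical.

```python
import re
from typing import Dict, List


def split_reaction(rxn: str) -> Dict[str, List[str]]:
    """Split 'REACTANT>AGENT>PRODUCT' into lists of molecule strings.

    Assumption: '>' appears only as the field separator and '.' only as
    the molecule separator. Blank fields yield empty lists.
    """
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError("expected exactly two '>' separators")
    keys = ("reactants", "agents", "products")
    return {key: [mol for mol in part.split(".") if mol]
            for key, part in zip(keys, parts)}


def atom_map_numbers(fragment: str) -> List[int]:
    """Collect atom-map numbers from bracket atoms such as [C:1]."""
    return [int(n) for n in re.findall(r"\[[^\]]*:(\d+)\]", fragment)]


if __name__ == "__main__":
    # Illustrative input only: an esterification with an acid agent.
    fields = split_reaction("CC(=O)O.OCC>[H+]>CC(=O)OCC.O")
    print(fields["reactants"])   # ['CC(=O)O', 'OCC']
    print(fields["agents"])      # ['[H+]']
    print(fields["products"])    # ['CC(=O)OCC', 'O']
    print(atom_map_numbers("[C:1][O:2]"))  # [1, 2]
```

An input such as <code>C=C&gt;&gt;CC</code> leaves the agent field blank, which this sketch returns as an empty list.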
 
SMILES corresponds to discrete molecular structures. However, many materials are macromolecules, which are too large (and often stochastic) for SMILES to be generated conveniently. BigSMILES is an extension of SMILES that aims to provide an efficient representation system for macromolecules.<ref>{{cite journal | vauthors = Lin TS, Coley CW, Mochigase H, Beech HK, Wang W, Wang Z, Woods E, Craig SL, Johnson JA, Kalow JA, Jensen KF, Olsen BD | display-authors = 6 | title = BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules | journal = ACS Central Science | volume = 5 | issue = 9 | pages = 1523–1531 | date = September 2019 | pmid = 31572779 | pmc = 6764162 | doi = 10.1021/acscentsci.9b00476 }}</ref>
 
== Conversion ==
SMILES can be converted back to two-dimensional representations using structure diagram generation (SDG) algorithms.<ref name="Helson-1999">{{cite book | vauthors = Helson HE |year=1999 |chapter=Structure Diagram Generation |title= Rev. Comput. Chem. |series=Reviews in Computational Chemistry | veditors = Lipkowitz KB, Boyd DB |location=New York |pages=313–398 |publisher=Wiley-VCH |doi=10.1002/9780470125908.ch6 |volume=13 |isbn=9780470125908 }}</ref> This conversion is not always unambiguous. Conversion to three-dimensional representation is achieved by energy-minimization approaches. There are many downloadable and web-based conversion utilities.
 
== See also ==