Simplified Molecular Input Line Entry System: Difference between revisions

No edit summary
References: var cols
 
(6 intermediate revisions by 4 users not shown)
Line 12:
| extended to =
}}
[[Image:SMILES.png|thumb|class=skin-invert-image|300px|SMILES generation algorithm for [[ciprofloxacin]]: break cycles, then write as branches off a main backbone]]
 
The '''Simplified Molecular Input Line Entry System''' ('''SMILES''') is a specification in the form of a [[line notation]] for describing the structure of [[chemical species]] using short [[ASCII]] [[string (computer science)|strings]]. SMILES strings can be imported by most [[molecule editor]]s for conversion back into [[two-dimensional]] drawings or [[dimension|three-dimensional]] models of the molecules.
Line 19:
 
==History==
The original SMILES specification was initiated by [[David Weininger]] at the USEPA Mid-Continent Ecology Division Laboratory in [[Duluth, Minnesota|Duluth]] in the 1980s.<ref name="Weininger-1988">{{cite journal| vauthors = Weininger D | title=SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules| journal=Journal of Chemical Information and Computer Sciences| volume=28| issue= 1|pages=31–6|date=February 1988|doi=10.1021/ci00057a005 }}</ref><ref name="Weininger-1989">{{cite journal| vauthors = Weininger D, Weininger A, Weininger JL | title=SMILES. 2. Algorithm for generation of unique SMILES notation| journal=Journal of Chemical Information and Modeling| volume=29| issue=2| pages=97–101|date=May 1989|doi=10.1021/ci00062a008 }}</ref><ref name="Weininger-1990">{{cite journal| vauthors = Weininger D | title=SMILES. 3. DEPICT. Graphical depiction of chemical structures| journal=Journal of Chemical Information and Modeling| volume=30| issue= 3|pages=237–43|date=August 1990|doi=10.1021/ci00067a005 }}</ref><ref name="Swanson-2004">{{cite book | vauthors = Swanson RP | veditors = Rayward WB, Bowden ME |title=The History and Heritage of Scientific and Technological Information Systems: Proceedings of the 2002 Conference of the American Society of Information Science and Technology and the Chemical Heritage Foundation |date=2004 |publisher=[[Information Today]] |location=Medford, NJ |isbn=978-1-57387-229-4 |page=205 |url=https://books.google.com/books?id=76OOQannpBgC&pg=PA205 |ref=ASIST monograph series 2002 |chapter=The Entrance of Informatics into Combinatorial Chemistry |chapter-url=https://wayback.archive-it.org/2118/20100925010036/http://64.251.202.97/pubs/asist2002/17-swanson.pdf }}</ref> Acknowledged for their parts in the early development were "Gilman Veith and Rose Russo (USEPA) and Albert Leo and [[Corwin Hansch]] ([[Pomona College]]) for supporting the work, and Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system."<ref name="Weininger-1998">{{cite web| vauthors = Weininger D |title=Acknowledgements on Daylight Tutorial smiles-etc page|url=http://www.daylight.com/meetings/summerschool98/course/dave/smiles-etc.html|access-date=24 June 2013 |date=1998 }}</ref> The [[United States Environmental Protection Agency|Environmental Protection Agency]] funded the initial project to develop SMILES.<ref name="Anderson-1987">{{cite book |year=1987 |title= SMILES: A line notation and computerized interpreter for chemical structures |id=Report No. EPA/600/M-87/021 |publisher=[[United States Environmental Protection Agency|U.S. EPA]], Environmental Research Laboratory-Duluth |location=Duluth, MN |url=https://nepis.epa.gov/Exe/ZyPDF.cgi/2000CAUR.PDF?Dockey=2000CAUR.PDF | vauthors = Anderson E, Veith GD, Weininger D }}</ref><ref name="SMILES Tutorial: What is SMILES?">{{Cite web|url=http://www.epa.gov/med/Prods_Pubs/smiles.htm | archive-url = https://web.archive.org/web/20080328080430/https://www.epa.gov/med/Prods_Pubs/smiles.htm | archive-date = 28 March 2008 |title=SMILES Tutorial: What is SMILES? |publisher=[[United States Environmental Protection Agency|U.S. EPA]] |access-date=2012-09-23 }}</ref>
 
It has since been modified and extended by others, most notably by [[Daylight Chemical Information Systems]]. In 2007, an [[open standard]] called "OpenSMILES" was developed by the [[Blue Obelisk]] open-source chemistry community. Other 'linear' notations include the [[Wiswesser Line Notation]] (WLN), [[ROSDAL]] and [[SYBYL Line Notation|SLN]] (Tripos Inc).
Line 90:
[[Aromaticity|Aromatic]] rings such as [[benzene]] may be written in one of three forms:
# In [[August Kekulé|Kekulé]] form with alternating single and double bonds, e.g. <code>C1=CC=CC=C1</code>,
# Using the aromatic bond symbol <code>:</code>, e.g. <code>C:1:C:C:C:C:C1</code>,{{Citation needed|date=June 2025|reason=Not mentioned in www.daylight.com/dayhtml/doc/theory/theory.smiles.html, probably SMARTS related.}} or
# Most commonly, by writing the constituent B, C, N, O, P and S atoms in lower-case forms <code>b</code>, <code>c</code>, <code>n</code>, <code>o</code>, <code>p</code> and <code>s</code>, respectively.
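The difference between the Kekulé and lower-case forms can be seen by inspecting the strings directly. The following is a minimal sketch, not a full SMILES parser: it assumes only organic-subset atoms (no bracket atoms), and the function names are hypothetical helpers introduced for illustration.

```python
import re

# Organic-subset atoms only; bracket atoms are deliberately not handled
# (an assumption made to keep the sketch short).
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def aromatic_atom_count(smiles: str) -> int:
    """Count atoms written in the lower-case (aromatic) form."""
    return sum(1 for m in ATOM.finditer(smiles) if m.group().islower())


def explicit_double_bonds(smiles: str) -> int:
    """Count explicit '=' bond symbols in the string."""
    return smiles.count("=")


if __name__ == "__main__":
    for form in ("C1=CC=CC=C1", "c1ccccc1"):
        print(form, aromatic_atom_count(form), explicit_double_bonds(form))
```

The Kekulé string carries three explicit double bonds and no aromatic atoms, while the lower-case string carries six aromatic atoms and no explicit bond symbols; both describe benzene.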
 
Line 101:
The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.
 
[[Image:3-cyanoanisole SMILES.svg|right|thumb|class=skin-invert-image|350px|Visualization of 3-cyanoanisole as <code>COc(c1)cccc1C#N</code>.]]
 
=== Branching ===
Line 115:
 
=== Stereochemistry ===
{{See also|Skeletal formula}}[[File:Trans-1,2-difluoroethylene.svg|thumb|right|class=skin-invert-image|upright=0.5|''trans''-1,2-difluoroethylene]]
<!--[[File:Cis-1,2-difluoroethylene.svg|thumb|right|class=skin-invert-image|upright=0.5|''cis''-1,2-difluoroethylene]]-->
SMILES permits, but does not require, specification of [[stereoisomer]]s.
 
Line 123:
Bond direction symbols always come in groups of at least two, of which the first is arbitrary. That is, <code>F\C=C\F</code> is the same as <code>F/C=C/F</code>. When alternating single-double bonds are present, the groups are larger than two, with the middle directional symbols being adjacent to two double bonds. For example, the (''E'',''E'') isomer of 2,4-hexadiene is written <code>C/C=C/C=C/C</code>.
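Because only the relative orientation of the direction symbols matters, swapping every <code>/</code> and <code>\</code> in a string leaves the described geometry unchanged. The sketch below illustrates this with plain string operations. Both function names are hypothetical, and the classifier assumes a single double bond written linearly as <code>X/C=C/Y</code> with no branches or ring closures.

```python
def flip_bond_directions(smiles: str) -> str:
    """Swap every '/' with '\\' and vice versa."""
    return smiles.translate(str.maketrans("/\\", "\\/"))


def simple_double_bond_geometry(smiles: str) -> str:
    """Classify one linear X/C=C/Y pattern as 'trans' or 'cis'.

    Assumption: exactly one '=' with one direction symbol before it and
    one after it, and no branches. Anything else is rejected.
    """
    marks = [(i, ch) for i, ch in enumerate(smiles) if ch in "/\\"]
    eq = smiles.find("=")
    if (len(marks) != 2 or smiles.count("=") != 1 or "(" in smiles
            or not marks[0][0] < eq < marks[1][0]):
        raise ValueError("only a linear X/C=C/Y pattern is supported")
    # Matching symbols on both sides mean the substituents are on opposite sides.
    return "trans" if marks[0][1] == marks[1][1] else "cis"


if __name__ == "__main__":
    print(flip_bond_directions("F\\C=C\\F"))
    print(simple_double_bond_geometry("F/C=C/F"))
    print(simple_double_bond_geometry("F/C=C\\F"))
```

Flipping is its own inverse, so applying it twice returns the original string.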
 
[[File:Beta-Carotene_conjugation.svg|thumb|right|class=skin-invert-image|upright=0.866|[[Beta-carotene]], with the eleven double bonds highlighted.]]
As a more complex example, [[beta-carotene]] has a very long backbone of alternating single and double bonds, which may be written <code>CC1(C)CCC/C(C)=C1/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)/CCCC2(C)C</code>.
 
Line 138:
 
===Isotopes===
[[Isotopes]] are specified with a number equal to the integer isotopic mass preceding the atomic symbol. [[Benzene]] in which one atom is [[carbon-14]] is written as <code>[14cH]1ccccc1</code> and [[deuterochloroform]] is <code>[2H]C(Cl)(Cl)Cl</code>.
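Isotope labels can be pulled out of strings such as <code>[14cH]1ccccc1</code> with a regular expression. This is a rough sketch rather than a validating parser: the pattern and the function name are assumptions introduced here, and only the leading isotope number and element symbol of each bracket atom are read.

```python
import re

# A bracket atom starts with an optional isotope number, then the element
# symbol; aromatic atoms use the lower-case symbol.
BRACKET_ATOM = re.compile(
    r"\[(?P<isotope>\d+)?(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"
)


def isotope_labels(smiles: str) -> list[tuple[int, str]]:
    """Return (mass number, symbol) for every bracket atom with an isotope."""
    return [
        (int(m["isotope"]), m["symbol"])
        for m in BRACKET_ATOM.finditer(smiles)
        if m["isotope"]
    ]


if __name__ == "__main__":
    print(isotope_labels("[14cH]1ccccc1"))    # carbon-14 labelled benzene
    print(isotope_labels("[2H]C(Cl)(Cl)Cl"))  # deuterochloroform
```

Bracket atoms without a leading number, such as <code>[nH]</code>, are skipped.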
 
=== Examples ===
Line 151:
|-----
|[[Methyl isocyanate]] (MIC)
|[[File:Methyl isocyanate.svg|frameless|120px|class=skin-invert-image]]
|CH<sub>3</sub>−N=C=O
|<code>CN=C=O</code>
|-----
Line 159:
|-----
|[[Vanillin]]
|[[Image:Vanillin.svg|class=skin-invert-image|70px|Molecular structure of vanillin]]
|<code>O=Cc1ccc(O)c(OC)c1</code><br/><code>COc1cc(C=O)ccc1O</code>
|-----
|[[Melatonin]] (C<sub>13</sub>H<sub>16</sub>N<sub>2</sub>O<sub>2</sub>)
|[[Image:Melatonin2.svg|class=skin-invert-image|160px|Molecular structure of melatonin]]
|<code>CC(=O)NCCC1=CNc2c1cc(OC)cc2</code><br/><code>CC(=O)NCCc1c[nH]c2ccc(OC)cc12</code>
|-----
|[[Flavopereirin]] (C<sub>17</sub>H<sub>15</sub>N<sub>2</sub><sup>+</sup>)
|[[Image:Flavopereirine.svg|class=skin-invert-image|160px|Molecular structure of flavopereirin]]
|<code>CCc(c1)ccc2[n+]1ccc3c2[nH]c4c3cccc4</code><br/><code>CCc1c[n+]2ccc3c4ccccc4[nH]c3c2cc1</code>
|-----
|[[Nicotine]] (C<sub>10</sub>H<sub>14</sub>N<sub>2</sub>)
|[[Image:Nicotine.svg|class=skin-invert-image|80px|Molecular structure of nicotine]]
|<code>CN1CCC[C@H]1c2cccnc2</code>
|-----
|[[Oenanthotoxin]] (C<sub>17</sub>H<sub>22</sub>O<sub>2</sub>)
|[[Image:Oenanthotoxin-structure.png|class=skin-invert-image|180px|Molecular structure of oenanthotoxin]]
|<code>CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO</code><br/><code>CCC[C@@H](O)CC/C=C/C=C/C#CC#C/C=C/CO</code>
|-----
|[[Pyrethrin]] II (C<sub>22</sub>H<sub>28</sub>O<sub>5</sub>)
|[[Image:Pyrethrin-II-2D-skeletal.svg|class=skin-invert-image|180px|Molecular structure of pyrethrin II]]
|<code>CC1=C(C(=O)C[C@@H]1OC(=O)[C@@H]2[C@H](C2(C)C)/C=C(\C)/C(=O)OC)C/C=C\C=C</code>
|-----
|[[Aflatoxin]] B<sub>1</sub> (C<sub>17</sub>H<sub>12</sub>O<sub>6</sub>)
|[[Image:Aflatoxin B1.svg|class=skin-invert-image|130px|Molecular structure of aflatoxin B<sub>1</sub>]]
|<code>O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5</code>
|-----
|[[Glucose]] (β-<small>D</small>-glucopyranose) (C<sub>6</sub>H<sub>12</sub>O<sub>6</sub>)
|[[Image:Beta-D-Glucose.svg|class=skin-invert-image|140px|Molecular structure of glucopyranose]]
|<code>OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1</code>
|-----
|[[Bergenin]] (cuscutin, a [[resin]]) (C<sub>14</sub>H<sub>16</sub>O<sub>9</sub>)
|[[Image:Cuscutine.svg|class=skin-invert-image|130px|Molecular structure of cuscutine (bergenin)]]
|<code>OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H]2[C@@H]1c3c(O)c(OC)c(O)cc3C(=O)O2</code>
|-----
|A [[pheromone]] of the Californian [[scale insect]]
|[[Image:Pheromone cochenille californienne.svg|class=skin-invert-image|180px|(3''Z'',6''R'')-3-methyl-6-(prop-1-en-2-yl)deca-3,9-dien-1-yl acetate]]
|<code>CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C</code>
|-----
|(2''S'',5''R'')-[[Chalcogran]]: a [[pheromone]] of the [[Scolytinae|bark beetle]] ''[[Pityogenes chalcographus]]''<ref>{{cite journal | vauthors = Byers JA, Birgersson G, Löfqvist J, Appelgren M, Bergström G | title = Isolation of pheromone synergists of bark beetle, Pityogenes chalcographus, from complex insect-plant odors by fractionation and subtractive-combination bioassay | journal = Journal of Chemical Ecology | volume = 16 | issue = 3 | pages = 861–876 | date = March 1990 | pmid = 24263601 | doi = 10.1007/BF01016496 | bibcode = 1990JCEco..16..861B | s2cid = 226090 }}</ref>
|[[Image:2S,5R-chalcogran-skeletal.svg|class=skin-invert-image|130px|(2''S'',5''R'')-2-ethyl-1,6-dioxaspiro[4.4]nonane]]
|<code>CC[C@H](O1)CC[C@@]12CCCO2</code>
|-----
|[[Thujone|α-Thujone]] (C<sub>10</sub>H<sub>16</sub>O)
|[[Image:Alpha-thujone.svg|class=skin-invert-image|100px|Molecular structure of thujone]]
|<code>CC(C)[C@@]12C[C@@H]1[C@@H](C)C(=O)C2</code>
|-----
|[[Thiamine]] (vitamin B<sub>1</sub>, C<sub>12</sub>H<sub>17</sub>N<sub>4</sub>OS<sup>+</sup>)
|[[Image:Thiamin.svg|class=skin-invert-image|150px|Molecular structure of thiamin]]
|<code>OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N</code>
|}
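Several rows of the table give two different SMILES strings for the same molecule. One quick consistency check is to compare their heavy-atom counts against the listed formula. The sketch below is an illustration only: the tokenizer and function name are assumptions, hydrogens are ignored entirely, and no valence or syntax checking is done.

```python
import re
from collections import Counter

# Either a bracket atom (optional isotope, symbol, then anything up to ']')
# or an organic-subset atom written outside brackets.
TOKEN = re.compile(
    r"\[\d*(?P<bracket>[A-Z][a-z]?|se|as|[bcnops])[^\]]*\]"
    r"|(?P<organic>Cl|Br|[BCNOPSFI]|[bcnops])"
)


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms by element, folding aromatic symbols to upper case."""
    counts = Counter()
    for m in TOKEN.finditer(smiles):
        symbol = (m["bracket"] or m["organic"]).capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


if __name__ == "__main__":
    # Nicotine, C10H14N2
    print(heavy_atom_counts("CN1CCC[C@H]1c2cccnc2"))
    # Two spellings of vanillin should agree
    print(heavy_atom_counts("O=Cc1ccc(O)c(OC)c1") == heavy_atom_counts("COc1cc(C=O)ccc1O"))
```

Agreement of heavy-atom counts is a necessary condition for two strings to describe the same molecule, not a sufficient one; comparing canonical SMILES from a cheminformatics toolkit is the more reliable test.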
Line 213:
To illustrate a molecule with more than 9 rings, consider [[cephalostatin]]-1,<ref name="PubChem-183413">{{cite web |title=CID 183413 |url=https://pubchem.ncbi.nlm.nih.gov/compound/183413 |website=[[PubChem]] |access-date=May 12, 2012 |language=en}}</ref> a steroidal 13-ringed [[pyrazine]] with the [[Chemical formula|molecular formula]] C<sub>54</sub>H<sub>74</sub>N<sub>2</sub>O<sub>10</sub> isolated from the [[Indian Ocean]] [[hemichordate]] ''[[Cephalodiscus gilchristi]]'':
{{Clear}}
:[[Image:Cephalostatine-1.svg|class=skin-invert-image|360px|Molecular structure of cephalostatin-1]]
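Ring-closure labels above 9 are written as <code>%</code> followed by two digits, so a reader of such strings has to distinguish them from single-digit labels and from digits inside bracket atoms. The following sketch shows one way to do that; the function names are hypothetical, and the short test strings are illustrative rather than taken from cephalostatin-1.

```python
import re

# Skip bracket atoms entirely (their digits are isotopes, hydrogen counts or
# charges), then try a '%nn' two-digit label before a single digit.
RING_TOKEN = re.compile(r"\[[^\]]*\]|%(\d\d)|(\d)")


def ring_closure_labels(smiles: str) -> list[int]:
    """Return ring-closure labels in the order they appear."""
    labels = []
    for m in RING_TOKEN.finditer(smiles):
        label = m.group(1) or m.group(2)
        if label is not None:
            labels.append(int(label))
    return labels


def count_ring_closures(smiles: str) -> int:
    """Count closed ring bonds; a label may be reused once it has been closed."""
    open_labels = set()
    closed = 0
    for label in ring_closure_labels(smiles):
        if label in open_labels:
            open_labels.remove(label)
            closed += 1
        else:
            open_labels.add(label)
    if open_labels:
        raise ValueError(f"unclosed ring labels: {sorted(open_labels)}")
    return closed


if __name__ == "__main__":
    print(ring_closure_labels("C%10CCCCC%10"))
    print(count_ring_closures("CCc(c1)ccc2[n+]1ccc3c2[nH]c4c3cccc4"))
```

Since labels can be reused after they are closed, a molecule with more than nine rings does not always need two-digit labels; they are required only when more than nine rings are open at the same point in the string.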
 
Starting with the left-most methyl group in the figure:
Line 245:
 
== References ==
{{Reflist|33em}}
 
{{Molecular visualization}}