{{Use mdy dates|date=July 2020}}
{{Infobox file format
| name = SMILES
| extension = .smi
| owner =
| genre = [[chemical file format]]
| container for =
| contained by =
| extended from =
| extended to =
}}
[[Image:SMILES.png|thumb|class=skin-invert-image|300px|SMILES generation algorithm for [[ciprofloxacin]]: break cycles, then write as branches off a main backbone]]
The '''Simplified Molecular Input Line Entry System''' ('''SMILES''') is a specification in the form of a [[line notation]] for describing the structure of [[chemical species]] using short [[ASCII]] [[string (computer science)|strings]]. SMILES strings can be imported by most [[molecule editor]]s for conversion back into [[two-dimensional]] drawings or [[dimension|three-dimensional]] models of the molecules.
==History==
The original SMILES specification was initiated by [[David Weininger]] at the USEPA Mid-Continent Ecology Division Laboratory in [[Duluth, Minnesota|Duluth]] in the 1980s.<ref name="Weininger-1988">{{cite journal| vauthors = Weininger D | title=SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules| journal=Journal of Chemical Information and Computer Sciences| volume=28| issue= 1|pages=31–6|date=February 1988|doi=10.1021/ci00057a005 }}</ref><ref name="Weininger-1989">{{cite journal| vauthors = Weininger D, Weininger A, Weininger JL | title=SMILES. 2. Algorithm for generation of unique SMILES notation| journal=Journal of Chemical Information and Modeling| volume=29| issue=2| pages=97–101|date=May 1989|doi=10.1021/ci00062a008 }}</ref><ref name="Weininger-1990">{{cite journal| vauthors = Weininger D | title=SMILES. 3. DEPICT. Graphical depiction of chemical structures| journal=Journal of Chemical Information and Modeling| volume=30| issue= 3|pages=237–43|date=August 1990|doi=10.1021/ci00067a005 }}</ref><ref name="Swanson-2004">{{cite book | vauthors = Swanson RP | veditors = Rayward WB, Bowden ME |title=The History and Heritage of Scientific and Technological Information Systems: Proceedings of the 2002 Conference of the American Society of Information Science and Technology and the Chemical Heritage Foundation |date=2004 |publisher=[[Information Today]] |location=Medford, NJ |isbn=978-1-57387-229-4 |page=205 |url=https://books.google.com/books?id=76OOQannpBgC&pg=PA205 |ref=ASIST monograph series 2002 |chapter=The Entrance of Informatics into Combinatorial Chemistry |chapter-url=https://wayback.archive-it.org/2118/20100925010036/http://64.251.202.97/pubs/asist2002/17-swanson.pdf }}</ref> Acknowledged for their parts in the early development were "Gilman Veith and Rose Russo (USEPA) and Albert Leo and [[Corwin Hansch]] ([[Pomona College]]) for supporting the work, and Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system."<ref name="Weininger-1998">{{cite web| vauthors = Weininger D |title=Acknowledgements on Daylight Tutorial smiles-etc page|url=http://www.daylight.com/meetings/summerschool98/course/dave/smiles-etc.html|access-date=24 June 2013 |date=1998 }}</ref> The [[United States Environmental Protection Agency|Environmental Protection Agency]] funded the initial project to develop SMILES.<ref name="Anderson-1987">{{cite book |year=1987 |title= SMILES: A line notation and computerized interpreter for chemical structures |id=Report No. EPA/600/M-87/021 |publisher=[[United States Environmental Protection Agency|U.S. EPA]], Environmental Research Laboratory-Duluth |location=Duluth, MN |url=https://nepis.epa.gov/Exe/ZyPDF.cgi/2000CAUR.PDF?Dockey=2000CAUR.PDF | vauthors = Anderson E, Veith GD, Weininger D }}</ref><ref name="SMILES Tutorial: What is SMILES?">{{Cite web|url=http://www.epa.gov/med/Prods_Pubs/smiles.htm | archive-url = https://web.archive.org/web/20080328080430/https://www.epa.gov/med/Prods_Pubs/smiles.htm | archive-date = 28 March 2008 |title=SMILES Tutorial: What is SMILES? |publisher=[[United States Environmental Protection Agency|U.S. EPA]] |access-date=2012-09-23 }}</ref>
It has since been modified and extended by others, most notably by [[Daylight Chemical Information Systems]]. In 2007, an [[open standard]] called "OpenSMILES" was developed by the [[Blue Obelisk]] open-source chemistry community. Other 'linear' notations include the [[Wiswesser Line Notation]] (WLN), [[ROSDAL]] and [[SYBYL Line Notation|SLN]] (Tripos Inc).
[[Aromaticity|Aromatic]] rings such as [[benzene]] may be written in one of three forms:
# In [[August Kekulé|Kekulé]] form with alternating single and double bonds, e.g. <code>C1=CC=CC=C1</code>,
# Using the aromatic bond symbol <code>:</code>, e.g. <code>C:1:C:C:C:C:C1</code>,{{Citation needed|date=June 2025|reason=Not mentioned in www.daylight.com/dayhtml/doc/theory/theory.smiles.html, probably SMARTS related.}} or
# Most commonly, by writing the constituent B, C, N, O, P and S atoms in lower-case forms <code>b</code>, <code>c</code>, <code>n</code>, <code>o</code>, <code>p</code> and <code>s</code>, respectively.
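The three writing styles listed above can be told apart by a purely textual check. The following Python sketch is illustrative only: the function name <code>aromatic_notation</code> is hypothetical, it inspects the string without perceiving aromaticity, and it therefore cannot distinguish a Kekulé form from a genuinely non-aromatic structure.

```python
import re

# Split a SMILES string into bracket atoms and single characters.
_PIECES = re.compile(r"\[[^\]]*\]|.")


def aromatic_notation(smiles: str) -> str:
    """Report which aromatic writing style a SMILES string uses (text check only)."""
    has_lower = False
    has_colon = False
    for piece in _PIECES.findall(smiles):
        if piece.startswith("["):
            # Inside brackets, a lower-case element symbol marks an aromatic atom.
            symbol = re.match(r"\[\d*([A-Za-z])", piece)
            if symbol and symbol.group(1).islower():
                has_lower = True
        elif piece in "bcnops":
            has_lower = True
        elif piece == ":":
            # A colon outside brackets is the aromatic bond symbol.
            has_colon = True
    if has_colon:
        return "aromatic_bond"
    if has_lower:
        return "lower_case"
    return "kekule_or_nonaromatic"


for text in ("C1=CC=CC=C1", "C:1:C:C:C:C:C1", "c1ccccc1"):
    print(text, aromatic_notation(text))
```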
The Daylight and OpenEye algorithms for generating canonical SMILES differ in their treatment of aromaticity.
[[Image:3-cyanoanisole SMILES.svg|right|thumb|class=skin-invert-image|350px|Visualization of 3-cyanoanisole as <code>COc(c1)cccc1C#N</code>.]]
=== Branching ===
Branches are described with parentheses, as in <code>CCC(=O)O</code> for [[propionic acid]] and <code>FC(F)F</code> for [[fluoroform]]. The first atom within the parentheses, and the first atom after the parenthesized group, are both bonded to the same branch point atom. The bond symbol must appear inside the parentheses; outside (
Substituted rings can be written with the branching point in the ring, as illustrated by the SMILES <code>COc(c1)cccc1C#N</code> ([https://web.archive.org/web/20130522091354/http://www.daylight.com/daycgi/depict?434f6328633129636363633143234e see depiction]) and <code>COc(cc1)ccc1C#N</code> ([https://web.archive.org/web/20130522074308/http://www.daylight.com/daycgi/depict?434f6328636331296363633143234e see depiction]), which encode the 3- and 4-cyanoanisole isomers, respectively. Writing SMILES for substituted rings in this way can make them more human-readable.
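How parentheses and ring-closure digits translate into bonds can be sketched with a small parser. The Python code below is a minimal illustration under stated assumptions, not a full SMILES reader: the names <code>parse_smiles</code> and <code>bond_distance</code> are hypothetical, only bracket atoms, the common unbracketed atoms, bond symbols, branches and ring closures are recognised, and disconnected structures, implicit hydrogens and aromaticity are ignored.

```python
import re
from collections import deque

# Tokens: bracket atoms, unbracketed atoms, ring-closure labels, bonds, parentheses.
TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]|%\d\d|\d|[-=#:/\\]|[()]")
BOND_SYMBOLS = set("-=#:/\\")


def parse_smiles(smiles):
    """Return (atoms, bonds); bonds are (atom_index, atom_index, bond_symbol) tuples."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported SMILES: {smiles!r}")
    atoms, bonds = [], []
    prev = None      # atom that the next atom will be bonded to
    pending = None   # explicit bond symbol waiting for its second atom
    stack = []       # branch points saved at each "("
    rings = {}       # open ring-closure label -> (atom index, bond symbol)
    for tok in tokens:
        if tok == "(":
            if pending is not None:
                raise ValueError("bond symbol must appear inside the parentheses")
            stack.append(prev)
        elif tok == ")":
            prev = stack.pop()  # return to the branch point
        elif tok in BOND_SYMBOLS:
            pending = tok
        elif tok[0].isdigit() or tok[0] == "%":
            label = int(tok.lstrip("%"))
            if label in rings:  # second occurrence closes the ring
                other, symbol = rings.pop(label)
                bonds.append((other, prev, pending or symbol or ""))
            else:               # first occurrence opens the ring
                rings[label] = (prev, pending)
            pending = None
        else:                   # an atom
            atoms.append(tok)
            if prev is not None:
                bonds.append((prev, len(atoms) - 1, pending or ""))
            pending = None
            prev = len(atoms) - 1
    return atoms, bonds


def bond_distance(bonds, start, goal):
    """Number of bonds on the shortest path between two atoms (breadth-first search)."""
    neighbours = {}
    for a, b, _ in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# In both strings, atom 2 carries the methoxy group and atom 7 the nitrile.
for text in ("COc(c1)cccc1C#N", "COc(cc1)ccc1C#N"):
    atoms, bonds = parse_smiles(text)
    print(text, bond_distance(bonds, 2, 7))
```

Under these assumptions the two substituted ring atoms are two bonds apart in the first string and three bonds apart in the second, which matches the 3- and 4-substitution patterns.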
=== Stereochemistry ===
{{See also|Skeletal formula}}[[File:Trans-1,2-difluoroethylene.svg|thumb|right|class=skin-invert-image|upright=0.5|''trans''-1,2-difluoroethylene]]
<!--[[File:Cis-1,2-difluoroethylene.svg|thumb|right|class=skin-invert-image|upright=0.5|''cis''-1,2-difluoroethylene]]-->
SMILES permits, but does not require, specification of [[stereoisomer]]s.
Bond direction symbols always come in groups of at least two, of which the first is arbitrary. That is, <code>F\C=C\F</code> is the same as <code>F/C=C/F</code>. When alternating single-double bonds are present, the groups are larger than two, with the middle directional symbols being adjacent to two double bonds. For example, the common form of (2,4)-hexadiene is written <code>C/C=C/C=C/C</code>.
[[File:Beta-Carotene_conjugation.svg|thumb|right|class=skin-invert-image|upright=0.866|[[Beta-carotene]], with the eleven double bonds highlighted.]]
As a more complex example, [[beta-carotene]] has a very long backbone of alternating single and double bonds, which may be written <code>CC1CCC/C(C)=C1/C=C/C(C)=C/C=C/C(C)=C/C=C/C=C(C)/C=C/C=C(C)/C=C/C2=C(C)/CCCC2(C)C</code>.
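The statement that the first direction symbol is arbitrary means that swapping every <code>/</code> with <code>\</code> leaves the described geometry unchanged. The Python sketch below illustrates this; the function names are hypothetical, and the geometry check is deliberately limited to plain <code>C=C</code> units written linearly with a direction symbol immediately on each side (branches on the double-bond atoms are not handled).

```python
import re

# Translation table that swaps the two bond direction symbols.
_SWAP = str.maketrans("/\\", "\\/")


def flip_directions(smiles: str) -> str:
    """Swap every '/' with '\\' and vice versa."""
    return smiles.translate(_SWAP)


# A direction symbol, a plain C=C unit, and (as lookahead) the next direction symbol.
# The lookahead lets one symbol serve two adjacent double bonds in a conjugated chain.
_UNIT = re.compile(r"([/\\])C=C(?=([/\\]))")


def double_bond_geometry(smiles: str):
    """Classify each simple C=C unit: equal symbols mean trans, different mean cis."""
    return ["trans" if left == right else "cis"
            for left, right in _UNIT.findall(smiles)]


print(flip_directions(r"F\C=C\F"))
print(double_bond_geometry("F/C=C/F"))
print(double_bond_geometry("C/C=C/C=C/C"))
```

Because flipping changes both symbols around every double bond at once, <code>double_bond_geometry</code> returns the same list before and after <code>flip_directions</code> is applied.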
===Isotopes===
[[Isotopes]] are specified with a number equal to the integer isotopic mass preceding the atomic symbol. [[Benzene]] in which one atom is [[carbon-14]] is written as <code>[
=== Examples ===
|-----
|[[Methyl isocyanate]] (MIC)
|[[File:Methyl isocyanate.svg|frameless|120px|class=skin-invert-image]]
|<code>CN=C=O</code>
|-----
|-----
|[[Vanillin]]
|[[Image:Vanillin.svg|class=skin-invert-image|70px|Molecular structure of vanillin]]
|<code>O=Cc1ccc(O)c(OC)c1</code><br/><code>COc1cc(C=O)ccc1O</code>
|-----
|[[Melatonin]] (C<sub>13</sub>H<sub>16</sub>N<sub>2</sub>O<sub>2</sub>)
|[[Image:Melatonin2.svg|class=skin-invert-image|160px|Molecular structure of melatonin]]
|<code>CC(=O)NCCC1=CNc2c1cc(OC)cc2</code><br/><code>CC(=O)NCCc1c[nH]c2ccc(OC)cc12</code>
|-----
|[[Flavopereirin]] (C<sub>17</sub>H<sub>15</sub>N<sub>2</sub>)
|[[Image:Flavopereirine.svg|class=skin-invert-image|160px|Molecular structure of flavopereirin]]
|<code>CCc(c1)ccc2[n+]1ccc3c2[nH]c4c3cccc4</code><br/><code>CCc1c[n+]2ccc3c4ccccc4[nH]c3c2cc1</code>
|-----
|[[Nicotine]] (C<sub>10</sub>H<sub>14</sub>N<sub>2</sub>)
|[[Image:Nicotine.svg|class=skin-invert-image|80px|Molecular structure of nicotine]]
|<code>CN1CCC[C@H]1c2cccnc2</code>
|-----
|[[Oenanthotoxin]] (C<sub>17</sub>H<sub>22</sub>O<sub>2</sub>)
|[[Image:Oenanthotoxin-structure.png|class=skin-invert-image|180px|Molecular structure of oenanthotoxin]]
|<code>CCC[C@@H](O)CC\C=C\C=C\C#CC#C\C=C\CO</code><br/><code>CCC[C@@H](O)CC/C=C/C=C/C#CC#C/C=C/CO</code>
|-----
|[[Pyrethrin]] II (C<sub>22</sub>H<sub>28</sub>O<sub>5</sub>)
|[[Image:Pyrethrin-II-2D-skeletal.svg|class=skin-invert-image|180px|Molecular structure of pyrethrin II]]
|<code>CC1=C(C(=O)C[C@@H]1OC(=O)[C@@H]2[C@H](C2(C)C)/C=C(\C)/C(=O)OC)C/C=C\C=C</code>
|-----
|[[Aflatoxin]] B<sub>1</sub> (C<sub>17</sub>H<sub>12</sub>O<sub>6</sub>)
|[[Image:Aflatoxin B1.svg|class=skin-invert-image|130px|Molecular structure of aflatoxin B<sub>1</sub>]]
|<code>O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5</code>
|-----
|[[Glucose]] (β-<small>D</small>-glucopyranose) (C<sub>6</sub>H<sub>12</sub>O<sub>6</sub>)
|[[Image:Beta-D-Glucose.svg|class=skin-invert-image|140px|Molecular structure of glucopyranose]]
|<code>OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@H](O)1</code>
|-----
|[[Bergenin]] (cuscutin, a [[resin]]) (C<sub>14</sub>H<sub>16</sub>O<sub>9</sub>)
|[[Image:Cuscutine.svg|class=skin-invert-image|130px|Molecular structure of cuscutine (bergenin)]]
|<code>OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H]2[C@@H]1c3c(O)c(OC)c(O)cc3C(=O)O2</code>
|-----
|A [[pheromone]] of the Californian [[scale insect]]
|[[Image:Pheromone cochenille californienne.svg|class=skin-invert-image|180px|(3''Z'',6''R'')-3-methyl-6-(prop-1-en-2-yl)deca-3,9-dien-1-yl acetate]]
|<code>CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C</code>
|-----
|(2''S'',5''R'')-[[Chalcogran]]: a [[pheromone]] of the [[Scolytinae|bark beetle]] ''[[Pityogenes chalcographus]]''<ref>{{cite journal | vauthors = Byers JA, Birgersson G, Löfqvist J, Appelgren M, Bergström G | title = Isolation of pheromone synergists of bark beetle, Pityogenes chalcographus, from complex insect-plant odors by fractionation and subtractive-combination bioassay | journal = Journal of Chemical Ecology | volume = 16 | issue = 3 | pages = 861–876 | date = March 1990 | pmid = 24263601 | doi = 10.1007/BF01016496 | bibcode = 1990JCEco..16..861B | s2cid = 226090 }}</ref>
|[[Image:2S,5R-chalcogran-skeletal.svg|class=skin-invert-image|130px|(2''S'',5''R'')-2-ethyl-1,6-dioxaspiro[4.4]nonane]]
|<code>CC[C@H](O1)CC[C@@]12CCCO2</code>
|-----
|[[Thujone|α-Thujone]] (C<sub>10</sub>H<sub>16</sub>O)
|[[Image:Alpha-thujone.svg|class=skin-invert-image|100px|Molecular structure of thujone]]
|<code>CC(C)[C@@]12C[C@@H]1[C@@H](C)C(=O)C2</code>
|-----
|[[Thiamine]] (vitamin B<sub>1</sub>, C<sub>12</sub>H<sub>17</sub>N<sub>4</sub>OS<sup>+</sup>)
|[[Image:Thiamin.svg|class=skin-invert-image|150px|Molecular structure of thiamin]]
|<code>OCCc1c(C)[n+](cs1)Cc2cnc(C)nc2N</code>
|}
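Several rows of the table give two different SMILES strings for one compound, and several use bracket atoms such as <code>[C@@H]</code>, <code>[nH]</code> and <code>[n+]</code>. As a rough consistency check, the non-hydrogen atoms of each string can be counted and compared with the formula in the first column. The Python sketch below is illustrative only: the function names are hypothetical, the bracket-atom pattern is a simplified assumption (optional isotope, element symbol, chirality, hydrogen count, single charge), and implicit hydrogens are not computed.

```python
import re
from collections import Counter

# Simplified bracket atom: [isotope? symbol chirality? hydrogen-count? charge?]
BRACKET = re.compile(
    r"\[(?P<isotope>\d+)?(?P<symbol>[A-Z][a-z]?|se|as|[bcnops])"
    r"(?P<chiral>@{1,2})?(?P<hcount>H\d?)?(?P<charge>[+-]\d?)?\]"
)
# Any atom token: a bracket atom or an unbracketed atom.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def parse_bracket_atom(token: str) -> dict:
    """Split a bracket atom into its fields (simplified grammar, see above)."""
    match = BRACKET.fullmatch(token)
    if match is None:
        raise ValueError(f"unsupported bracket atom: {token!r}")
    hcount, charge = match["hcount"], match["charge"]
    return {
        "isotope": int(match["isotope"]) if match["isotope"] else None,
        "element": match["symbol"].capitalize(),
        "aromatic": match["symbol"].islower(),
        "chirality": match["chiral"],
        # "H" alone means one hydrogen; "H2" means two.
        "hcount": 0 if not hcount else int(hcount[1:] or 1),
        # "+" alone means +1; "-2" means -2.
        "charge": 0 if not charge else int(charge[0] + (charge[1:] or "1")),
    }


def heavy_atom_counts(smiles: str) -> dict:
    """Count non-hydrogen atoms per element."""
    counts = Counter()
    for token in ATOM.findall(smiles):
        if token.startswith("["):
            element = parse_bracket_atom(token)["element"]
        else:
            element = token.capitalize()
        if element != "H":
            counts[element] += 1
    return dict(counts)


print(parse_bracket_atom("[C@@H]"))
print(heavy_atom_counts("O=Cc1ccc(O)c(OC)c1"))
print(heavy_atom_counts("COc1cc(C=O)ccc1O"))
```

Two strings with equal counts are not necessarily the same molecule, since isomers share a composition; the check only catches strings that cannot describe the same compound.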
To illustrate a molecule with more than 9 rings, consider [[cephalostatin]]-1,<ref name="PubChem-183413">{{cite web |title=CID 183413 |url=https://pubchem.ncbi.nlm.nih.gov/compound/183413 |website=[[PubChem]] |access-date=May 12, 2012 |language=en}}</ref> a steroidal 13-ringed [[pyrazine]] with the [[molecular formula]] C<sub>54</sub>H<sub>74</sub>N<sub>2</sub>O<sub>10</sub>, isolated from the [[Indian Ocean]] [[hemichordate]] ''[[Cephalodiscus gilchristi]]'':
{{Clear}}
:[[Image:Cephalostatine-1.svg|class=skin-invert-image|360px|Molecular structure of cephalostatin-1]]
Starting with the left-most methyl group in the figure:
== References ==
{{Reflist}}
{{Molecular visualization}}