Simplified Molecular Input Line Entry System: Difference between revisions

deleted broken links
Tags: Reverted Visual edit
WP:REDLINK, at least some of these look okay. Undid revision 1032553911 by 112.135.0.62 (talk)
Line 23:
The original SMILES specification was initiated by [[David Weininger]] at the USEPA Mid-Continent Ecology Division Laboratory in [[Duluth, Minnesota|Duluth]] in the 1980s.<ref name="Weininger-1988">{{cite journal| last1=Weininger| first1=David| title=SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules| journal=Journal of Chemical Information and Computer Sciences| volume=28| issue= 1|pages=31–6|date=February 1988|doi=10.1021/ci00057a005 }}</ref><ref name="Weininger-1989">{{cite journal| last1=Weininger| first1=David| last2=Weininger| first2=Arthur| last3=Weininger| first3=Joseph L.| title=SMILES. 2. Algorithm for generation of unique SMILES notation| journal=Journal of Chemical Information and Modeling| volume=29| issue=2| pages=97–101|date=May 1989|doi=10.1021/ci00062a008 }}</ref><ref name="Weininger-1990">{{cite journal| last1=Weininger| first1=David| title=SMILES. 3. DEPICT. Graphical depiction of chemical structures| journal=Journal of Chemical Information and Modeling| volume=30| issue= 3|pages=237–43|date=August 1990|doi=10.1021/ci00067a005 }}</ref><ref name="Swanson-2004">{{cite book |author1=Swanson, Richard Pommier |editor1-last=Rayward |editor1-first=W. [Warden] Boyd |editor2-last=Bowden |editor2-first=Mary Ellen |title=The History and Heritage of Scientific and Technological Information Systems: Proceedings of the 2002 Conference of the American Society of Information Science and Technology and the Chemical Heritage Foundation |date=2004 |publisher=[[Information Today]] |location=Medford, NJ |isbn=9781573872294 |page=205 |url=https://books.google.com/books?id=76OOQannpBgC&pg=PA205 |ref=ASIST monograph series 2002 |chapter=The Entrance of Informatics into Combinatorial Chemistry |chapter-url=https://wayback.archive-it.org/2118/20100925010036/http://64.251.202.97/pubs/asist2002/17-swanson.pdf }}</ref> Acknowledged for their parts in the early development were "Gilman Veith and Rose Russo (USEPA) and Albert Leo and [[Corwin Hansch]] (Pomona College) for supporting the work, and Arthur Weininger (Pomona; Daylight CIS) and Jeremy Scofield (Cedar River Software, Renton, WA) for assistance in programming the system."<ref name="Weininger-1998">{{cite web|last=Weininger|first=Dave|title=Acknowledgements on Daylight Tutorial smiles-etc page|url=http://www.daylight.com/meetings/summerschool98/course/dave/smiles-etc.html|access-date=24 June 2013 |date=1998 }}</ref> The [[Environmental Protection Agency]] funded the initial project to develop SMILES.<ref name="Anderson-1987">{{cite book |year=1987 |title= SMILES: A line notation and computerized interpreter for chemical structures |id=Report No. EPA/600/M-87/021 |publisher=[[United States Environmental Protection Agency|U.S. EPA]], Environmental Research Laboratory-Duluth |location=Duluth, MN |url=https://nepis.epa.gov/Exe/ZyPDF.cgi/2000CAUR.PDF?Dockey=2000CAUR.PDF |last1=Anderson |first1=E. |last2=Veith |first2=G. D. |last3=Weininger |first3=D. }}</ref><ref name="SMILES Tutorial: What is SMILES?">{{Cite web|url=http://www.epa.gov/med/Prods_Pubs/smiles.htm |title=SMILES Tutorial: What is SMILES? |publisher=[[United States Environmental Protection Agency|U.S. EPA]] |access-date=2012-09-23 }}</ref>
 
It has since been modified and extended by others, most notably by [[Daylight Chemical Information Systems]]. In 2007, an [[open standard]] called "OpenSMILES" was developed by the [[Blue Obelisk]] open-source chemistry community. Other 'linear' notations include the [[Wiswesser Line Notation]] (WLN), [[ROSDAL]] and [[SYBYL Line Notation|SLN]] (Tripos Inc).
 
In July 2006, the [[International Union of Pure and Applied Chemistry|IUPAC]] introduced the [[International Chemical Identifier|InChI]] as a standard for formula representation. SMILES is generally considered more human-readable than InChI; it also has wide software support and extensive theoretical backing (such as [[graph theory]]).
Line 30:
The term SMILES refers to a line notation for encoding molecular structures, and specific instances should strictly be called SMILES strings. However, the term SMILES is also commonly used to refer to both a single SMILES string and a number of SMILES strings; the exact meaning is usually apparent from the context. The terms "canonical" and "isomeric" can cause confusion when applied to SMILES: they describe different attributes of SMILES strings and are not mutually exclusive.
 
Typically, a number of equally valid SMILES strings can be written for a molecule. For example, <code>CCO</code>, <code>OCC</code> and <code>C(O)C</code> all specify the structure of [[ethanol]]. Algorithms have been developed to generate the same SMILES string for a given molecule; of the many possible strings, these algorithms choose exactly one. This string is unique for each structure, although it depends on the [[canonicalization]] algorithm used to generate it, and is termed the canonical SMILES. These algorithms first convert the SMILES to an internal representation of the molecular structure and then examine that structure to produce a unique SMILES string. Various algorithms for generating canonical SMILES have been developed, including those by [[Daylight Chemical Information Systems]], [[OpenEye Scientific Software]], [[MEDIT]], [[Chemical Computing Group]], [[MolSoft LLC]], and the [[Chemistry Development Kit]]. A common application of canonical SMILES is indexing and ensuring uniqueness of molecules in a [[Chemical database|database]].
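
The two-step idea described above (parse to an internal structure, then write one agreed-upon string) can be sketched in a few lines of Python. This is a toy illustration only, not CANGEN or any vendor's algorithm. It assumes an acyclic molecule written with just the atoms C, N and O, implicit single bonds, and parentheses for branches; the function names <code>parse_smiles</code>, <code>rooted_string</code> and <code>canonical_smiles</code> are invented for this example. The tie-breaking rule (shortest string, then alphabetical) is likewise an arbitrary choice made here.

```python
def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, adjacency lists).

    Assumed subset: atoms C, N, O; implicit single bonds; branches in
    parentheses; no rings, charges, or bracket atoms.
    """
    atoms = []   # element symbol for each atom index
    adj = []     # adj[i] lists the neighbours of atom i
    stack = []   # atoms to return to when a branch closes
    prev = None  # atom that the next atom will be bonded to
    for ch in smiles:
        if ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch in "CNO":
            atoms.append(ch)
            adj.append([])
            idx = len(atoms) - 1
            if prev is not None:
                adj[prev].append(idx)
                adj[idx].append(prev)
            prev = idx
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, adj


def rooted_string(atoms, adj, node, parent=None):
    """Write the tree below `node` with its subtrees in sorted order."""
    subtrees = sorted(
        rooted_string(atoms, adj, nbr, node)
        for nbr in adj[node]
        if nbr != parent
    )
    # All subtrees except the last become parenthesised branches.
    branches = "".join(f"({s})" for s in subtrees[:-1])
    tail = subtrees[-1] if subtrees else ""
    return atoms[node] + branches + tail


def canonical_smiles(smiles):
    """Pick one string per molecule: try every atom as the start atom,
    then keep the shortest candidate (ties broken alphabetically)."""
    atoms, adj = parse_smiles(smiles)
    candidates = [rooted_string(atoms, adj, i) for i in range(len(atoms))]
    return min(candidates, key=lambda s: (len(s), s))


for s in ("CCO", "OCC", "C(O)C"):
    print(s, "->", canonical_smiles(s))
```

Under these assumptions, all three ethanol strings map to <code>CCO</code>, while dimethyl ether (<code>COC</code>) keeps a different string. Real canonicalization algorithms must also handle rings, bond orders, charges, aromaticity and stereochemistry, which this sketch deliberately ignores.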
 
The original paper that described the CANGEN<ref name="Weininger-1989" /> algorithm claimed to generate unique SMILES strings for graphs representing molecules, but the algorithm fails for a number of simple cases (e.g. [[cuneane]], 1,2-dicyclopropylethane) and cannot be considered a correct method for representing a graph canonically.<ref>{{cite book |publisher=Springer |location=Berlin |isbn=978-3-540-27967-9 |volume=3615 |pages=145–157 | editor-first = Bertram | editor-last=Ludäscher | last1 = Hutchison | first1 = David | first2 = Takeo | last2 = Kanade | first3 = Josef | last3 = Kittler | first4 = Jon M. | last4 = Kleinberg | author-link4 = Jon Kleinberg | first5 = Friedemann | last5 = Mattern | first6 = John C. | last6 = Mitchell | first7 = Moni | last7 = Naor | author-link7 = Moni Naor | first8 = Oscar | last8 = Nierstrasz | first9 = C. Pandu | last9 = Rangan | author-link10 = Bernhard Steffen (computer scientist) | first10 = Bernhard | last10 = Steffen | first11 = Madhu | last11 = Sudan | author-link11 = Madhu Sudan | first12 = Demetri | last12 = Terzopoulos | first13 = Doug | last13 = Tygar | first14 = Moshe Y. | last14 = Vardi | author-link14 = Moshe Y. Vardi | first15 = Gerhard | last15 = Weikum | first16 = Louiqa | last16 = Raschid |author16-link=Louiqa Raschid | first17 = Greeshma | last17 = Neglur | first18 = Robert L. | last18 = Grossman | first19 = Bing | last19 = Liu | name-list-style = vanc | series = Lecture Notes in Computer Science |title=Data Integration in the Life Sciences |chapter=Assigning Unique Keys to Chemical Compounds for Data Integration: Some Interesting Counter Examples |access-date=2013-02-12 |year=2005 |chapter-url=https://doi.org/10.1007%2F11530084_13 |doi=10.1007/11530084_13 }}</ref> There is currently no systematic comparison across commercial software to test whether such flaws exist in those packages.
Line 168:
|<code>CC(=O)NCCC1=CNc2c1cc(OC)cc2</code><br/><code>CC(=O)NCCc1c[nH]c2ccc(OC)cc12</code>
|-----
|[[Flavopereirin]] (C<sub>17</sub>H<sub>15</sub>N<sub>2</sub>)
|[[Image:Flavopereirine.svg|160px|Molecular structure of flavopereirin]]
|<code>CCc(c1)ccc2[n+]1ccc3c2[nH]c4c3cccc4</code><br/><code>CCc1c[n+]2ccc3c4ccccc4[nH]c3c2cc1</code>
Line 200:
|<code>CC(=O)OCCC(/C)=C\C[C@H](C(C)=C)CCC=C</code>
|-----
|(2''S'',5''R'')-[[Chalcogran]]: a [[pheromone]] of the [[Scolytinae|bark beetle]] ''[[Pityogenes chalcographus]]''<ref>{{cite journal|last1=Byers|first1=JA|last2=Birgersson|first2=G|last3=Löfqvist|first3=J
|last4=Appelgren|first4=M|last5=Bergström|first5=G| title = Isolation of pheromone synergists of bark beetle, ''Pityogenes chalcographus'', from complex insect-plant odors by fractionation and subtractive-combination bioassay | journal = Journal of Chemical Ecology | volume = 16 | issue = 3 | pages = 861–76 | date = Mar 1990 | pmid = 24263601 | doi = 10.1007/BF01016496 |s2cid=226090| url = http://www.chemical-ecology.net/pdf/Byersetal1990a.pdf}}</ref>
|[[Image:2S,5R-chalcogran-skeletal.svg|130px|(2''S'',5''R'')-2-ethyl-1,6-dioxaspiro[4.4]nonane]]
Line 214:
|}
{{Clear}}
To illustrate a molecule with more than 9 rings, consider [[cephalostatin]]-1,<ref name="PubChem-183413">{{cite web |title=CID 183413 |url=https://pubchem.ncbi.nlm.nih.gov/compound/183413 |website=[[PubChem]] |access-date=May 12, 2012 |language=en}}</ref> a steroidal 13-ringed [[pyrazine]] with the [[chemical formula|molecular formula]] C<sub>54</sub>H<sub>74</sub>N<sub>2</sub>O<sub>10</sub>, isolated from the [[Indian Ocean]] [[hemichordate]] ''[[Cephalodiscus gilchristi]]'':
{{Clear}}
:[[Image:Cephalostatine-1.svg|360px|Molecular structure of cephalostatin-1]]
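
A molecule with this many rings can need ring-closure labels above 9, which SMILES writes as a percent sign followed by two digits (for example <code>%10</code>). The following minimal sketch shows how a reader of SMILES might pull out those labels. The helper name <code>ring_closure_labels</code> is hypothetical, and the function assumes well-formed input in which digits inside square brackets (isotopes, hydrogen counts, charges) are not ring closures.

```python
def ring_closure_labels(smiles):
    """Return the ring-closure labels of a SMILES string, in order.

    A single digit is a label from 0 to 9; '%' followed by exactly two
    digits is a two-digit label. Bracket atoms such as [13CH4] are
    skipped because their digits have other meanings.
    """
    digits = "0123456789"
    labels = []
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Jump past the closing bracket of a bracket atom.
            i = smiles.index("]", i) + 1
        elif ch == "%":
            two = smiles[i + 1:i + 3]
            if len(two) != 2 or any(c not in digits for c in two):
                raise ValueError(f"'%' must be followed by two digits at position {i}")
            labels.append(int(two))
            i += 3
        elif ch in digits:
            labels.append(int(ch))
            i += 1
        else:
            i += 1
    return labels


print(ring_closure_labels("C1CCCCC1"))      # cyclohexane, label 1 opened and closed
print(ring_closure_labels("C%10CCCCC%10"))  # same ring written with label 10
```

Each label appears twice, once where the ring opens and once where it closes, so a label can be reused after its ring has been closed.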
Line 226:
=== Other examples of SMILES ===
 
The SMILES notation is described extensively in the SMILES theory manual provided by [[Daylight Chemical Information Systems]], which also presents a number of illustrative examples. Daylight's depict utility lets users check their own SMILES examples and is a valuable educational tool.
 
== Extensions ==