General Architecture for Text Engineering

{{Infobox software
| name = GATE
| screenshot = [[Image:GATE5 main window.png|250px]]
| screenshot size = 250px
| caption = GATE Developer v5 main window
| developer = [https://gate.ac.uk/people GATE research team], [http://www.dcs.shef.ac.uk/ Dept. Computer Science, University of Sheffield]
| released = {{start date and age |1995}}
| frequently_updated = yes<!-- Release version update? Don't edit this page, just click on the version number! -->
| programming language = [[Java (programming language)|Java]]
| operating system = [[Cross-platform]]
| language = English
| genre = [[Text mining]], [[Information extraction]]
| license = [[LGPL]]
| website = {{url|https://gate.ac.uk}}
}}
'''General Architecture for Text Engineering''' ('''GATE''') is a [[Java (programming language)|Java]] suite of [[natural language processing]] (NLP) tools for many tasks, including [[information extraction]] in many languages.<ref>Languages mentioned on https://gate.ac.uk/gate/plugins/ include Arabic, Bulgarian, Cebuano, Chinese, French, German, Hindi, Italian, Romanian and Russian.</ref> It is used worldwide by a wide community of scientists, companies, teachers and students. It was originally developed at the [[University of Sheffield]], starting in 1995.
 
As of May 28, 2011, 881 people were on the gate-users mailing list at SourceForge.net, and 111,932 downloads from [[SourceForge]] had been recorded since the project moved to SourceForge in 2005.<ref>{{cite web|url=https://sourceforge.net/projects/gate/|title=GATE|publisher=|access-date=17 December 2016}}</ref> The paper "GATE: A framework and graphical development environment for robust NLP tools and applications"<ref>[https://www.aclweb.org/anthology/P02-1022/ "GATE: A framework and graphical development environment for robust NLP tools and applications"], by Cunningham H., [[Diana Maynard|Maynard D.]], Bontcheva K. and Tablan V. (In proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002)</ref> has received over 2,000 citations (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide,<ref>{{cite web|url=https://gate.ac.uk/userguide/|title=GATE.ac.uk - sale/tao/split.html|publisher=|access-date=17 December 2016}}</ref> include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady,<ref>Konchady, Manu. [https://books.google.com/books?id=mcM-OAAACAAJ&q=Building+Search+Applications:+Lucene,+LingPipe,+and+Gate&hl=en&ei=avbDTczPJITqrQfk1IXQBA&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDEQ6AEwAA Building Search Applications: Lucene, LingPipe, and Gate]. Mustru Publishing. 2008.</ref> and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.<ref>{{cite book|url=https://books.google.com/books?id=TDQJb1UgVywC&q=Introduction%20to%20Linguistic%20Annotation%20and%20Text%20Analytics&printsec=frontcover&source=bl&ots=bAF26ZQSTx&sig=TbxZ_-3tRy3IeDBKFofeVN6bAIc&hl=en&ei=vc0gS7PlLo-64QaSgqnfCQ&sa=X&oi=book_result&ct=result&resnum=2&ved=0CBcQ6AEwAQ#v=onepage&q=&f=false|title=Introduction to Linguistic Annotation and Text Analytics|first=Graham|last=Wilcock|date=1 January 2009|publisher=Morgan & Claypool Publishers|isbn=9781598297386|access-date=17 December 2016|via=Google Books}}</ref>
GATE has been compared to [[NLTK]], [[R (programming language)|R]] and [[RapidMiner]].<ref>{{cite web|url=http://www.b-eye-network.com/view/9516|title=Open Source Text Analytics by Seth Grimes - BeyeNETWORK|publisher=|access-date=17 December 2016}}</ref> As well as being widely used in its own right, it forms the basis of the KIM semantic platform.<ref>{{cite journal|url=https://www.cambridge.org/core/journals/natural-language-engineering/article/div-classtitlekim-a-semantic-platform-for-information-extraction-and-retrievaldiv/7249CC61F5AB25CBC7AAE182509DFEDE|title=KIM – a semantic platform for information extraction and retrieval|first1=Borislav|last1=Popov|first2=Atanas|last2=Kiryakov|first3=Damyan|last3=Ognyanoff|first4=Dimitar|last4=Manov|first5=Angel|last5=Kirilov|date=1 September 2004|publisher=|volume=10|issue=3–4|pages=375–392|access-date=17 December 2016|via=Cambridge Core|doi=10.1017/S135132490400347X}}</ref>
 
The GATE community has been involved in several European research projects, including [[Transitioning Applications to Ontologies|TAO]], [[SEKT]], NeOn, Media-Campaign, Musing, [[Service-Finder]], LIRICS and [[KnowledgeWeb Project|KnowledgeWeb]], as well as many others.
 
== Features ==
Languages currently handled in GATE include [[English language|English]], [[Standard Chinese|Chinese]], [[Arabic]], [[Bulgarian language|Bulgarian]], [[French language|French]], [[German language|German]], [[Hindi]], [[Italian language|Italian]], [[Cebuano language|Cebuano]], [[Romanian language|Romanian]], [[Russian language|Russian]] and [[Danish language|Danish]].
 
Plugins are included for [[machine learning]] with [[Weka (machine learning)|Weka]], RASP, MAXENT and SVM Light, as well as a [[LIBSVM]] integration and an in-house [[perceptron]] implementation; for managing [[Ontology (information science)|ontologies]] such as [[WordNet]]; for querying [[search engines]] such as [[Google]] or [[Yahoo]]; for [[part of speech tagging|part-of-speech tagging]] with [[Brill tagger|Brill]] or TreeTagger; and many more. Many external plugins are also available, for example for handling [[Twitter|tweets]].<ref>{{cite web|url=https://gate.ac.uk/wiki/twitie.html|title=GATE.ac.uk - wiki/twitie.html|publisher=|access-date=17 December 2016}}</ref>
 
GATE accepts input in various formats, such as [[Text file|TXT]], [[HTML]], [[XML]], [[DOC (computing)|Doc]] and [[PDF]] documents, and can also work with [[Serialization|Java serial]], [[PostgreSQL]], [[Lucene]] and [[Oracle database|Oracle]] datastores, the relational databases being accessed through [[RDBMS]] storage over [[JDBC]].
 
[[JAPE (linguistics)|JAPE]] transducers are used within GATE to manipulate annotations on text. Documentation is provided in the GATE User Guide.<ref>{{cite web|url=https://gate.ac.uk/userguide/chap:jape|title=GATE.ac.uk - sale/tao/splitch8.html|publisher=|access-date=17 December 2016}}</ref> A tutorial has also been written by Press Association Images.<ref>{{cite web|url=https://realizingsemanticweb.blogspot.com/2009/07/jape-grammar-tutorial.html|title=Realizing Semantic Web: JAPE grammar tutorial|first=Dhavalkumar|last=Thakker|date=17 July 2009|publisher=|access-date=17 December 2016}}</ref>
 
== GATE Developer ==
[[Image:GATE5 main window.png|thumb|400px|GATE 5 main window.]]
 
The screenshot shows the document viewer used to display a document and its annotations. In pink are {{tag|a|o}} hyperlink annotations from an [[Hypertext Markup Language|HTML]] file. The list on the right shows the annotation sets, and the table at the bottom lists the annotations. In the center is the annotation editor window.
 
== GATE Mímir ==
 
==See also==
{{Portal|Free and open-source software}}
* [[Unstructured Information Management Architecture]] (UIMA)
* [[OpenNLP]]
* [[List of natural language processing toolkits]]
* [[Pheme (project)|Pheme]], a major EU project managed by the GATE group on early detection of false information in social media
 
==References==
<references/>
 
==External links==
* {{Official website|https://gate.ac.uk/}}
 
{{DEFAULTSORT:General Architecture For Text Engineering}}