General Architecture for Text Engineering

{{Infobox software
| name = GATE
| screenshot = GATE5 main window.png
| screenshot size = 250px
| caption = GATE Developer v5 main window
| developer = [https://gate.ac.uk/people GATE research team], [http://www.dcs.shef.ac.uk/ Dept. Computer Science, University of Sheffield]
| released = {{start date and age |1995}}
| frequently_updated = yes<!-- Release version update? Don't edit this page, just click on the version number! -->
| programming language = [[Java (programming language)|Java]]
| operating system = [[Cross-platform]]
| language = English
| genre = [[Text mining]], [[Information extraction]]
| license = [[GNU Lesser General Public License|LGPL]]
| website = {{url|https://gate.ac.uk/}}
}}
'''General Architecture for Text Engineering''' ('''GATE''') is a [[Java (programming language)|Java]] suite of [[natural language processing]] (NLP) tools for many tasks, including [[information extraction]] in many languages.<ref>Languages mentioned on https://gate.ac.uk/gate/plugins/ include Arabic, Bulgarian, Cebuano, Chinese, French, German, Hindi, Italian, Romanian and Russian.</ref> It is now used worldwide by a wide community of scientists, companies, teachers and students. It was originally developed at the [[University of Sheffield]] beginning in 1995.
 
As of May 28, 2011, 881 people are on the gate-users mailing list at SourceForge.net, and 111,932 downloads from [[SourceForge]] are recorded since the project moved to SourceForge in 2005.<ref>{{cite web|url=https://sourceforge.net/projects/gate/|title=GATE|access-date=17 December 2016}}</ref> The paper "GATE: A framework and graphical development environment for robust NLP tools and applications"<ref>[https://www.aclweb.org/anthology/P02-1022/ "GATE: A framework and graphical development environment for robust NLP tools and applications"], by Cunningham H., [[Diana Maynard|Maynard D.]], Bontcheva K. and Tablan V. (In proc. of the 40th Anniversary Meeting of the Association for Computational Linguistics, 2002)</ref> has received over 2000 citations since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide,<ref>{{cite web|url=https://gate.ac.uk/userguide/|title=GATE.ac.uk - sale/tao/split.html|access-date=17 December 2016}}</ref> include "Building Search Applications: Lucene, LingPipe, and Gate", by Manu Konchady,<ref>Konchady, Manu. [https://books.google.com/books?id=mcM-OAAACAAJ&q=Building+Search+Applications:+Lucene,+LingPipe,+and+Gate Building Search Applications: Lucene, LingPipe, and Gate]. Mustru Publishing. 2008.</ref> and "Introduction to Linguistic Annotation and Text Analytics", by Graham Wilcock.<ref>{{cite book|url=https://books.google.com/books?id=TDQJb1UgVywC&q=Introduction%20to%20Linguistic%20Annotation%20and%20Text%20Analytics|title=Introduction to Linguistic Annotation and Text Analytics|first=Graham|last=Wilcock|date=1 January 2009|publisher=Morgan & Claypool Publishers|isbn=9781598297386|access-date=17 December 2016|via=Google Books}}</ref>
GATE comprises an architecture, a [[free software|free open source]] API, framework and graphical development environment.
 
The GATE community and research has been involved in several European research projects, including [[Transitioning Applications to Ontologies|TAO]], SEKT, NeOn, Media-Campaign, Musing, Service-Finder, LIRICS and [[KnowledgeWeb Project|KnowledgeWeb]].
 
== Features ==
 
GATE includes an [[information extraction]] system called '''ANNIE''' ('''A Nearly-New Information Extraction System''') which is a set of modules comprising a [[Lexical analysis|tokenizer]], a [[Gazetteer|gazetteer]], a [[Sentence boundary disambiguation|sentence splitter]], a [[Part-of-speech tagging|part of speech tagger]], a [[Named entity recognition|named entities]] transducer and a [[Coreference|coreference]] tagger. ANNIE can be used as-is to provide basic [[information extraction]] functionality, or provide a starting point for more specific tasks.
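
The way such modules build on one another can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not GATE's API or ANNIE's implementation: the <code>Annotation</code> class, the function names and the regular expressions are all assumptions made for this example, and only three of the module types listed above are imitated. Each stage reads the text and adds typed annotations identified by character offsets.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A typed span over the text, identified by character offsets."""
    type: str
    start: int
    end: int
    features: tuple = ()  # (name, value) pairs; hypothetical representation


def tokenize(text):
    """Add one Token per run of word characters or single punctuation mark."""
    return [Annotation("Token", m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def gazetteer(text, entries):
    """Add a Lookup wherever a listed phrase occurs verbatim in the text.

    Naive substring matching: it would also match inside longer words.
    """
    found = []
    for phrase, major_type in entries.items():
        for m in re.finditer(re.escape(phrase), text):
            found.append(Annotation("Lookup", m.start(), m.end(),
                                    (("majorType", major_type),)))
    return found


def split_sentences(text):
    """Add a Sentence for each span that ends in '.', '!' or '?'."""
    return [Annotation("Sentence", m.start(), m.end())
            for m in re.finditer(r"[^.!?\s][^.!?]*[.!?]", text)]


def run_pipeline(text, entries):
    """Run the stages in order and return all annotations sorted by offset."""
    annotations = tokenize(text) + gazetteer(text, entries) + split_sentences(text)
    return sorted(annotations, key=lambda a: (a.start, a.end, a.type))


text = "GATE was developed in Sheffield. It is written in Java."
for ann in run_pipeline(text, {"Sheffield": "location"}):
    if ann.type != "Token":
        print(ann.type, repr(text[ann.start:ann.end]), dict(ann.features))
```

Keeping annotations separate from the text, as offsets plus features, is what lets later stages consume the output of earlier ones without modifying the document itself.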
 
Languages currently handled in GATE include [[English language|English]], [[Standard Chinese|Chinese]], [[Arabic]], [[Bulgarian language|Bulgarian]], [[French language|French]], [[German language|German]], [[Hindi]], [[Italian language|Italian]], [[Cebuano language|Cebuano]], [[Romanian language|Romanian]], [[Russian language|Russian]] and [[Danish language|Danish]].
 
Plugins are included for [[machine learning]] with [[Weka (machine learning)|Weka]], RASP, MAXENT, SVM Light, as well as a [[LIBSVM]] integration and an in-house [[perceptron]] implementation, for managing [[Ontology (information science)|ontologies]] like [[WordNet]], for querying [[search engines]] like [[Google]] or [[Yahoo]], for [[part of speech tagging]] with [[Brill tagger|Brill]] or TreeTagger, and many more. Many external plugins are also available, for handling e.g. [[Twitter|tweets]].<ref>{{cite web|url=https://gate.ac.uk/wiki/twitie.html|title=GATE.ac.uk - wiki/twitie.html|access-date=17 December 2016}}</ref>
 
GATE accepts input in various formats, such as [[Text file|TXT]], [[HTML]], [[XML]], [[DOC (computing)|Doc]] and [[PDF]] documents, as well as [[Serialization|Java Serial]], [[PostgreSQL]], [[Lucene]] and [[Oracle database|Oracle]] databases with the help of [[RDBMS]] storage over [[JDBC]].
 
[[JAPE (linguistics)|JAPE]] transducers are used within GATE to manipulate annotations on text. Documentation is provided in the GATE User Guide.<ref>{{cite web|url=https://gate.ac.uk/userguide/chap:jape|title=GATE.ac.uk - sale/tao/splitch8.html|access-date=17 December 2016}}</ref> A tutorial has also been written by Press Association Images.<ref>{{cite web|url=https://realizingsemanticweb.blogspot.com/2009/07/jape-grammar-tutorial.html|title=Realizing Semantic Web: JAPE grammar tutorial|first=Dhavalkumar|last=Thakker|date=17 July 2009|access-date=17 December 2016}}</ref>
Rules written in the JAPE (Java Annotation Patterns Engine) language are used to annotate documents with tags. A debugger, a corpus benchmark tool and an annotation comparison tool are also present.
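
The idea of a rule that matches a pattern over existing annotations and then adds a new annotation can be sketched as follows. This is Python written for illustration only, not JAPE syntax and not how GATE executes grammars: the <code>tokens</code> helper, the <code>TITLES</code> set, the rule name and the dictionary layout are all hypothetical. The rule looks for a title token, an optional full stop and a capitalised token, and marks the whole span as a <code>Person</code>.

```python
import re

# Hypothetical list of titles, used only for this example.
TITLES = {"Mr", "Mrs", "Ms", "Dr", "Prof"}


def tokens(text):
    """Produce Token annotations (as dictionaries) with offsets and string."""
    return [{"type": "Token", "start": m.start(), "end": m.end(),
             "string": m.group()}
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def title_person_rule(toks):
    """Pattern: title, optional '.', capitalised token. Action: add Person."""
    people = []
    i = 0
    while i < len(toks):
        j = i
        if toks[j]["string"] in TITLES:
            j += 1
            # Skip the optional full stop after the title.
            if j < len(toks) and toks[j]["string"] == ".":
                j += 1
            # The pattern only matches if a capitalised token follows.
            if j < len(toks) and toks[j]["string"][:1].isupper():
                people.append({"type": "Person",
                               "start": toks[i]["start"],
                               "end": toks[j]["end"],
                               "rule": "TitlePerson"})
                i = j + 1  # continue after the matched span
                continue
        i += 1
    return people


text = "Yesterday Dr. Smith met Prof Jones in Sheffield."
for person in title_person_rule(tokens(text)):
    print(person["rule"], repr(text[person["start"]:person["end"]]))
```

The rule never edits the text; it only reads earlier annotations and contributes new ones, which is the sense in which such transducers "manipulate annotations".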
 
== GATE Developer ==

[[Image:GATE5 main window.png|thumb|400px|GATE 5 main window.]]

The GATE main GUI consists of a top menu and a row of icons, a vertical resources tree on the left, a tabbed pane of resource viewers in the center and right, and a message field at the bottom.

The resources tree and the menu are used to load, save and run resources. The resources tree displays the loaded resources; double-clicking a resource or pressing the Enter key opens it in a resource viewer. Each loaded resource can be displayed in a specific resource viewer, which takes up most of the space in the GUI.

The screenshot shows the document viewer used to display a document and its annotations. In pink are {{tag|a|o}} hyperlink annotations from an [[Hypertext Markup Language|HTML]] file. The right list is the annotation sets list, and the bottom table is the annotation list. In the center is the annotation editor window.

== GATE Mímir ==
<!-- re-written to remove any lingering copyright worries -->
GATE generates vast quantities of information, including natural language text, semantic annotations, and ontological information. Sometimes the data itself is the end product of an application, but often the information would be more useful if it could be efficiently searched. GATE Mímir provides support for indexing and searching the linguistic and semantic information generated by such applications and allows the information to be queried using arbitrary combinations of text, structural information, and [[SPARQL]].

==See also==
{{Portal|Free and open-source software}}
* [[Unstructured Information Management Architecture]] (UIMA)
* [[OpenNLP]]
* [[Pheme (project)|Pheme]], a major EU project managed by the GATE group on early detection of false information in social media
 
==References==
<references/>
 
==External links==
* {{Official website|https://gate.ac.uk/}}
*[http://gate.ac.uk/ GATE website] at [http://nlp.shef.ac.uk/ University of Sheffield Natural Language Processing Group]
 
{{DEFAULTSORT:General Architecture For Text Engineering}}
[[Category:Data mining and machine learning software]]
 
[[Category:SourceForge projects]]
[[Category:Free application software]]
[[Category:Free development toolkits and libraries]]
[[Category:Free science software]]
[[Category:Free software programmed in Java (programming language)]]
[[Category:Free integrated development environments]]
[[Category:Free cross-platform software]]
 
[[Category:Knowledge representation]]
[[Category:Natural language processing toolkits]]
[[Category:Natural language processing]]
[[Category:Data mining]]
[[Category:Ontology (computer science)]]
[[Category:Ontology editors]]