General Architecture for Text Engineering (GATE) is a Java software toolkit originally developed at the University of Sheffield, beginning in 1995, and now used worldwide by a wide community of scientists, companies, teachers and students for many natural language processing tasks, including information extraction in many languages.
GATE | |
---|---|
Developer(s) | GATE research team, Dept. Computer Science, University of Sheffield |
Initial release | 1996 |
Stable release | 8.6.1 (January 17, 2020) |
Preview release | 9.0-SNAPSHOT (August 29, 2025; nightly builds released every day) |
Written in | Java |
Operating system | Cross-platform |
Available in | English |
Type | Text mining, information extraction |
License | LGPL |
Website | http://gate.ac.uk/ |
GATE comprises an architecture, a free, open-source API, a framework and a graphical development environment.
The GATE community and research team have been involved in several European research projects, including TAO and SEKT.
Features
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System), a set of modules comprising a tokenizer, a gazetteer, a sentence splitter, a part-of-speech tagger, a named-entity transducer and a coreference tagger.
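The idea of chaining such modules can be illustrated with a minimal sketch in plain Java. It does not use the GATE API: the class `AnniePipelineSketch`, the `Span` and `Stage` types, and the regular expressions are all hypothetical names and simplifications chosen for illustration. Only three of the stages (tokenizer, gazetteer, sentence splitter) are imitated, and each stage simply appends typed character spans to a shared annotation list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: a toy pipeline, not GATE's or ANNIE's actual implementation.
class AnniePipelineSketch {

    /** Stand-off annotation: a typed character span over the document text. */
    static final class Span {
        final String type;
        final int start;
        final int end;

        Span(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    /** One processing stage: reads the text and earlier annotations, appends new ones. */
    interface Stage {
        void run(String text, List<Span> annotations);
    }

    /** Stage that annotates every regex match with the given type. */
    static Stage regexStage(String type, String regex) {
        Pattern pattern = Pattern.compile(regex);
        return (text, anns) -> {
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                anns.add(new Span(type, m.start(), m.end()));
            }
        };
    }

    /** Toy gazetteer: marks each existing Token whose text is in the word list as a Lookup. */
    static Stage gazetteer(Set<String> entries) {
        return (text, anns) -> {
            List<Span> found = new ArrayList<>();
            for (Span s : anns) {
                if (s.type.equals("Token") && entries.contains(text.substring(s.start, s.end))) {
                    found.add(new Span("Lookup", s.start, s.end));
                }
            }
            anns.addAll(found);
        };
    }

    /** Runs the stages in order over one text and returns all annotations. */
    static List<Span> annotate(String text, List<Stage> stages) {
        List<Span> anns = new ArrayList<>();
        for (Stage stage : stages) {
            stage.run(text, anns);
        }
        return anns;
    }

    /** Counts annotations of one type. */
    static long count(List<Span> anns, String type) {
        return anns.stream().filter(a -> a.type.equals(type)).count();
    }

    /** Sample run: tokenizer, then gazetteer, then sentence splitter. */
    static List<Span> demo() {
        String text = "Sheffield is in England. GATE is written in Java.";
        return annotate(text, Arrays.asList(
                regexStage("Token", "\\w+"),
                gazetteer(new HashSet<>(Arrays.asList("Sheffield", "England"))),
                regexStage("Sentence", "[^.!?\\s][^.!?]*[.!?]?")));
    }

    public static void main(String[] args) {
        for (Span s : demo()) {
            System.out.println(s.type + " " + s.start + "-" + s.end);
        }
    }
}
```

On the two-sentence sample, `demo()` yields 9 Token, 2 Lookup and 2 Sentence spans. The gazetteer runs after the tokenizer because it reads the Token spans, which shows why the order of stages matters in such a pipeline.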
Languages currently handled in GATE include English, Spanish, Chinese, Arabic, French, German, Hindi, Cebuano, Romanian and Russian.
A large set of plugins is available: for machine learning with Weka, RASP, MAXENT and SVM Light; for managing ontologies such as WordNet; for querying search engines such as Google or Yahoo; for part-of-speech tagging with Brill or TreeTagger; and many more.
GATE can handle input in various formats, such as TXT, HTML, XML, Doc and PDF documents, as well as Java serial, Lucene, and PostgreSQL and Oracle database stores, the latter with the help of RDBMS storage over JDBC.
GATE uses the JAPE (Java Annotation Patterns Engine) language to write rules that annotate documents with tags. It also provides a debugger, a corpus benchmark tool and an annotation comparison tool.
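A JAPE rule pairs a left-hand-side pattern over existing annotations with a right-hand-side action that creates new annotations. The following plain-Java sketch imitates that idea for a single rule and is not JAPE syntax or the GATE API: the class `RuleSketch`, the title list and the tokenization are assumptions made for illustration. The rule says that a title token followed by one or more capitalized tokens yields one "Person" annotation covering the capitalized tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: one hand-coded pattern rule, not a JAPE grammar.
class RuleSketch {

    // Hypothetical stand-in for a gazetteer list of titles.
    static final Set<String> TITLES = new HashSet<>(Arrays.asList("Mr", "Mrs", "Dr", "Prof"));

    /** True if the token starts with an upper-case letter. */
    static boolean isUpperInitial(String token) {
        return !token.isEmpty() && Character.isUpperCase(token.charAt(0));
    }

    /**
     * Pattern: a title token followed by one or more capitalized tokens.
     * Action: record the capitalized tokens, joined by spaces, as one Person.
     */
    static List<String> findPersons(String text) {
        // Crude tokenization: split on every run of non-letter characters.
        String[] tokens = text.split("[^\\p{L}]+");
        List<String> persons = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (TITLES.contains(tokens[i])) {
                // Greedily extend the match over all following capitalized tokens.
                int j = i + 1;
                while (j < tokens.length && isUpperInitial(tokens[j])) {
                    j++;
                }
                if (j > i + 1) {
                    persons.add(String.join(" ", Arrays.asList(tokens).subList(i + 1, j)));
                    i = j; // resume scanning after the matched span
                    continue;
                }
            }
            i++;
        }
        return persons;
    }

    public static void main(String[] args) {
        System.out.println(findPersons("Yesterday Dr Ada Lovelace met Prof Turing in London."));
    }
}
```

For the sample sentence in `main`, the rule produces two matches, "Ada Lovelace" and "Turing". A real rule language separates the pattern from the action so that rules can be added or changed without touching the matching engine; here both are fused into one method for brevity.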
Description of the graphical user interface
The main GATE GUI consists of a top menu and a row of icons, a vertical resources tree on the left, a tabbed pane of resource viewers in the center and right, and a message field at the bottom.
The resources tree and the menu are used to load, save and run resources. The resources tree displays the loaded resources; double-clicking a resource or pressing the Enter key opens it in a resource viewer.
Each loaded resource can be displayed in a dedicated resource viewer that takes up most of the space in the GUI.
The document viewer is used to display a document and its annotations. In the example view, the "a" annotations (hyperlinks from an HTML file) are highlighted in pink. The list on the right shows the annotation sets, the table at the bottom lists the annotations, and the annotation editor window appears in the center.