'''General Architecture for Text Engineering''' or '''GATE''' is a [[Java (programming language)|Java]] software toolkit developed at the [[University of Sheffield]] since 1995. It is now used worldwide by a large community of scientists, companies, teachers and students for a wide range of [[Natural language processing|natural language processing]] tasks, including [[Information extraction|information extraction]] in many languages.
GATE comprises an architecture, a [[free software|free and open-source]] API and framework, and a graphical development environment.
The GATE community and research team take part in several European research projects, including [[Transitioning Applications to Ontologies|TAO]] and [[SEKT]].
The main component is ANNIE (a Nearly-New Information Extraction System), a set of modules comprising a [[Lexical analysis|tokenizer]], a [[Gazetteer|gazetteer]], a [[Sentence boundary disambiguation|sentence splitter]], a [[Part-of-speech tagging|part-of-speech tagger]], a [[Named entity recognition|named entity]] transducer and a [[Coreference|coreference]] tagger. Languages currently supported include English, Spanish, Chinese, Arabic, French, German, Hindi, Cebuano and Romanian. Many plugins are available: for [[machine learning]] with [[Weka (machine learning)|Weka]], RASP, MAXENT and SVM Light; for managing [[Ontologies|ontologies]] such as [[WordNet]]; for querying [[search engines]] such as [[Google]] or [[Yahoo]]; and for part-of-speech tagging with the [[Brill tagger|Brill]] tagger or TreeTagger.
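The idea of a pipeline in which each module adds annotations that later modules build on can be sketched in a few lines of Java. The classes below (`ToyPipeline`, `Module`, `Annotation`) are hypothetical and are not GATE's API. Only a tokenizer, a gazetteer and a sentence splitter are modelled, each as a crude rule, and annotations are assumed to be stand-off spans of character offsets over the unchanged text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy pipeline, not the GATE API: modules run in order and each
// appends stand-off annotations (type + character offsets) over the same text.
class ToyPipeline {

    record Annotation(String type, int start, int end, String feature) {}

    interface Module {
        void process(String text, List<Annotation> annotations);
    }

    // Tokenizer: one Token per run of letters/digits or per punctuation mark.
    static final Module TOKENIZER = (text, anns) -> {
        Matcher m = Pattern.compile("[\\p{L}\\d]+|\\p{Punct}").matcher(text);
        while (m.find()) {
            anns.add(new Annotation("Token", m.start(), m.end(), ""));
        }
    };

    // Gazetteer: adds a Lookup over each Token whose text is in the given list.
    static Module gazetteer(Map<String, String> list) {
        return (text, anns) -> {
            List<Annotation> found = new ArrayList<>();
            for (Annotation a : anns) {
                if (!a.type().equals("Token")) continue;
                String category = list.get(text.substring(a.start(), a.end()));
                if (category != null) {
                    found.add(new Annotation("Lookup", a.start(), a.end(), category));
                }
            }
            anns.addAll(found); // added after the loop to avoid modifying the list mid-iteration
        };
    }

    // Sentence splitter: a Sentence ends at a ".", "!" or "?" Token.
    // Simplification: text after the last terminator is not covered by any Sentence.
    static final Module SPLITTER = (text, anns) -> {
        List<Annotation> sentences = new ArrayList<>();
        int sentenceStart = -1;
        for (Annotation a : anns) {
            if (!a.type().equals("Token")) continue;
            if (sentenceStart < 0) sentenceStart = a.start();
            String tok = text.substring(a.start(), a.end());
            if (tok.equals(".") || tok.equals("!") || tok.equals("?")) {
                sentences.add(new Annotation("Sentence", sentenceStart, a.end(), ""));
                sentenceStart = -1;
            }
        }
        anns.addAll(sentences);
    };

    // Runs the modules in order over a shared annotation list.
    static List<Annotation> run(String text, List<Module> modules) {
        List<Annotation> anns = new ArrayList<>();
        for (Module m : modules) {
            m.process(text, anns);
        }
        return anns;
    }

    public static void main(String[] args) {
        String text = "Sheffield is in England. GATE runs there.";
        List<Module> pipeline = List.of(
                TOKENIZER,
                gazetteer(Map.of("Sheffield", "city", "England", "country")), // hypothetical list
                SPLITTER);
        for (Annotation a : run(text, pipeline)) {
            System.out.println(a);
        }
    }
}
```

Running `main` prints nine Token, two Lookup and two Sentence annotations. Because every module only adds annotations and never rewrites the text, later modules (here the splitter) can rely on the offsets produced by earlier ones.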
GATE can handle at least [[Text file|TXT]], [[HTML]], [[XML]], [[DOC (computing)|Doc]] and [[PDF]] documents, and can store them in datastores based on [[Serialization|Java serialization]], [[PostgreSQL]], [[Lucene]] or [[Oracle database|Oracle]], with the relational (RDBMS) databases accessed over [[JDBC]].
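The Java serialization option builds on Java's standard object serialization. As a rough illustration of that underlying mechanism only, the sketch below writes a document object and its annotations to bytes with `ObjectOutputStream` and reads it back. The `SerialStoreSketch` and `StoredDocument` classes and the string form of the annotation are hypothetical; GATE's own datastore classes and on-disk layout are not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of storing a document with Java serialization (not GATE's datastore API).
class SerialStoreSketch {

    // Hypothetical document class: text plus annotations kept as simple strings.
    static class StoredDocument implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text;
        final List<String> annotations = new ArrayList<>();

        StoredDocument(String text) {
            this.text = text;
        }
    }

    // Serializes the whole object graph (document and its annotation list) to bytes.
    static byte[] save(StoredDocument doc) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(doc);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a new, equal-valued document object from the bytes.
    static StoredDocument load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (StoredDocument) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StoredDocument doc = new StoredDocument("GATE was developed in Sheffield.");
        doc.annotations.add("Location 22-31"); // hypothetical annotation encoding
        StoredDocument copy = load(save(doc));
        System.out.println(copy.text + " " + copy.annotations);
    }
}
```

A file-based store would write the same bytes to disk instead of to memory; the relational and Lucene options listed above use different storage mechanisms.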
It also uses the JAPE (Java Annotation Patterns Engine) language for writing rules that annotate documents. A debugger, a corpus benchmark tool and an annotation comparison tool are also included.
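A JAPE rule pairs a left-hand side, a pattern over existing annotations, with a right-hand side that acts on the matched span, typically by adding a new annotation. The Java sketch below hard-codes one such rule to show the idea. It is not JAPE syntax or GATE's API; the rule itself (a title such as "Dr" followed by a capitalized word yields a `Person` annotation), the `JapeLikeRule` class and the hand-built input annotations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one pattern-action rule in the spirit of JAPE (not JAPE syntax).
class JapeLikeRule {

    record Annotation(String type, int start, int end, String feature) {}

    // LHS: a Lookup with feature "title" followed (after whitespace only) by a
    //      Token that starts with an upper-case letter.
    // RHS: add a Person annotation covering both.
    static List<Annotation> applyTitlePersonRule(String text, List<Annotation> input) {
        List<Annotation> output = new ArrayList<>();
        for (Annotation title : input) {
            if (!title.type().equals("Lookup") || !title.feature().equals("title")) continue;

            // Find the nearest Token that begins at or after the end of the title.
            Annotation next = null;
            for (Annotation a : input) {
                if (a.type().equals("Token") && a.start() >= title.end()
                        && (next == null || a.start() < next.start())) {
                    next = a;
                }
            }

            if (next != null
                    && text.substring(title.end(), next.start()).isBlank()
                    && Character.isUpperCase(text.charAt(next.start()))) {
                output.add(new Annotation("Person", title.start(), next.end(), "rule=TitlePerson"));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String text = "Dr Smith met Mr jones.";
        // Hand-built input, standing in for the output of a tokenizer and gazetteer.
        List<Annotation> input = List.of(
                new Annotation("Token", 0, 2, ""),
                new Annotation("Token", 3, 8, ""),
                new Annotation("Token", 9, 12, ""),
                new Annotation("Token", 13, 15, ""),
                new Annotation("Token", 16, 21, ""),
                new Annotation("Token", 21, 22, ""),
                new Annotation("Lookup", 0, 2, "title"),
                new Annotation("Lookup", 13, 15, "title"));
        System.out.println(applyTitlePersonRule(text, input));
    }
}
```

With this input only "Dr Smith" becomes a `Person`; "Mr jones" fails the capitalization test. Real JAPE grammars state such patterns declaratively rather than as hand-written loops.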