The GATE community and its research are involved in several European research projects, including [[Transitioning Applications to Ontologies|TAO]] and [[SEKT]].
GATE includes an information extraction system called ANNIE (A Nearly-New Information Extraction System), which is a set of modules comprising a [[Lexical analysis|tokenizer]], a [[Gazetteer|gazetteer]], a [[Sentence boundary disambiguation|sentence splitter]], a [[Part-of-speech tagging|part-of-speech tagger]], a [[Named entity recognition|named entities]] transducer and a [[Coreference|coreference]] tagger. Languages currently handled in GATE include English, Spanish, Chinese, Arabic, French, German, Hindi, Cebuano, Romanian and Russian. A large set of plugins is available: for [[machine learning]] with [[Weka (machine learning)|Weka]], RASP, MAXENT and SVM Light; for managing [[Ontologies]] like [[WordNet]]; for querying [[search engines]] like [[Google]] or [[Yahoo]]; for part-of-speech tagging with [[Brill tagger|Brill]] or TreeTagger; and many more.
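The idea of a modular pipeline, in which each stage adds typed annotations over character offsets, can be sketched in a few lines of plain Java. This is only a toy illustration under stated assumptions: the class <code>ToyPipeline</code>, its <code>Annotation</code> type, the sample sentence and the two-word gazetteer are all hypothetical, and none of it is the GATE or ANNIE API. It covers just two stages, a tokenizer and a gazetteer lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: this is NOT the GATE or ANNIE API.
class ToyPipeline {

    // A minimal annotation: a typed span of character offsets over the text.
    static final class Annotation {
        final String type;
        final int start;
        final int end;

        Annotation(String type, int start, int end) {
            this.type = type;
            this.start = start;
            this.end = end;
        }
    }

    // Stage 1 (tokenizer): one "Token" annotation per run of letters or digits.
    static List<Annotation> tokenize(String text) {
        List<Annotation> out = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            out.add(new Annotation("Token", m.start(), m.end()));
        }
        return out;
    }

    // Stage 2 (gazetteer): one "Lookup" annotation per token found in the word list.
    static List<Annotation> lookup(String text, List<Annotation> tokens, Set<String> gazetteer) {
        List<Annotation> out = new ArrayList<>();
        for (Annotation t : tokens) {
            if (gazetteer.contains(text.substring(t.start, t.end))) {
                out.add(new Annotation("Lookup", t.start, t.end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Alice flew from Paris to London."; // hypothetical sample input
        Set<String> gazetteer = new HashSet<>(Arrays.asList("Paris", "London"));

        List<Annotation> tokens = tokenize(text);
        for (Annotation a : lookup(text, tokens, gazetteer)) {
            System.out.println(a.type + " " + a.start + "-" + a.end + ": "
                    + text.substring(a.start, a.end));
        }
    }
}
```

Each stage reads the annotations produced by earlier stages and leaves the text itself unchanged, so further stages (a sentence splitter or a tagger, for instance) could be chained in the same way.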
GATE can handle input in various formats, such as [[Text file|TXT]], [[HTML]], [[XML]], [[DOC (computing)|Doc]] and [[PDF]] documents, as well as [[Serialization|Java Serial]], [[PostgreSQL]], [[Lucene]] and [[Oracle database|Oracle]] databases with the help of RDBMS storage over [[JDBC]].
GATE also uses the JAPE (Java Annotation Patterns Engine) language to build rules that annotate documents with tags. A debugger, a corpus benchmark tool and an annotation comparison tool are also provided.
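The general shape of a pattern-based annotation rule, a left-hand side that matches a sequence and a right-hand side that attaches a tag, can be illustrated in plain Java. The sketch below is not JAPE syntax and does not use the GATE API; the class <code>ToyRule</code>, the title list and the "Person" tag are hypothetical names chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: plain Java, not JAPE syntax and not the GATE API.
class ToyRule {

    // Hypothetical list of title words used by the pattern.
    static final Set<String> TITLES = new HashSet<>(Arrays.asList("Mr", "Mrs", "Dr"));

    // Left-hand side: a title token followed by a capitalised token.
    static boolean matches(List<String> tokens, int i) {
        return i + 1 < tokens.size()
                && TITLES.contains(tokens.get(i))
                && !tokens.get(i + 1).isEmpty()
                && Character.isUpperCase(tokens.get(i + 1).charAt(0));
    }

    // Right-hand side: record a "Person" tag over each matched two-token span.
    static List<String> apply(List<String> tokens) {
        List<String> tags = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (matches(tokens, i)) {
                tags.add("Person[" + tokens.get(i) + " " + tokens.get(i + 1) + "]");
                i++; // skip the second token consumed by this match
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Dr", "Smith", "met", "Mr", "Jones", "today");
        System.out.println(apply(tokens)); // prints [Person[Dr Smith], Person[Mr Jones]]
    }
}
```

Keeping the pattern test separate from the tagging action mirrors the two-part structure of a rule, so either part can be changed without touching the other.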