| importance = February 2009
| jargon = February 2009
| orphan = February 2009
| technical = February 2009
| tone = February 2009
}}
'''XML retrieval''', or XML information retrieval, is the content-based retrieval of documents structured with [[XML]] (Extensible Markup Language). As such it is used for
== Queries ==
Most XML retrieval approaches
== Ranking ==
Ranking in XML retrieval can take into account both content relevance and structural similarity, that is, the resemblance between the structure given in the query and the structure of the document. In addition, the retrieval units returned for an XML query need not be entire documents: they can be arbitrarily deeply nested XML elements, which act as dynamic documents. The aim is to find the smallest retrieval unit that is relevant. Relevance can be defined in terms of specificity, the extent to which a retrieval unit focuses on the requested topic.<ref name="INEX2006">{{cite web|url=http://www.cs.otago.ac.nz/homepages/andrew/2006-10.pdf|title=Overview of INEX 2006|last=Malik|first=Saadia|coauthors=Trotman, Andrew; Lalmas, Mounia; Fuhr, Norbert|date=2007|work=Proceedings of the Fifth Workshop of the INitiative for the Evaluation of XML Retrieval|accessdate=2009-02-10}}</ref>
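The Python sketch below illustrates element-level ranking that mixes content relevance with structural similarity and prefers the smallest relevant unit. It is a minimal toy under stated assumptions, not the method of the INEX systems cited above: the scoring scheme (query-term coverage times specificity for content, a longest-common-subsequence match between the query's tag path and the element's tag path for structure, combined linearly with a hypothetical weight <code>alpha</code>), the relevance cutoff <code>min_coverage</code>, the sample document and the function names <code>rank</code>, <code>walk</code>, <code>lcs_len</code> and <code>tokenize</code> are all illustrative. Only the standard-library <code>xml.etree.ElementTree</code> module is used.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical scoring scheme for illustration only; not taken from INEX.


def tokenize(text):
    """Naive lower-case word tokenizer (real systems would stem, etc.)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lcs_len(a, b):
    """Length of the longest common subsequence of two tag paths."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            table[i + 1][j + 1] = (table[i][j] + 1 if x == y
                                   else max(table[i][j + 1], table[i + 1][j]))
    return table[len(a)][len(b)]


def walk(elem, path=()):
    """Yield every element together with its root-to-element tag path."""
    path = path + (elem.tag,)
    yield elem, path
    for child in elem:
        yield from walk(child, path)


def rank(root, terms, query_path, alpha=0.5, min_coverage=0.5):
    """Rank elements (not whole documents) as candidate retrieval units.

    terms: set of lower-case query terms; query_path: tuple of tag names.
    """
    results = []
    for elem, path in walk(root):
        words = tokenize(" ".join(elem.itertext()))
        if not words:
            continue
        # Content relevance: share of query terms present in the unit ...
        coverage = len(terms & set(words)) / len(terms)
        if coverage < min_coverage:
            continue  # treat the unit as not relevant
        # ... times specificity: fraction of the unit's words that are query terms.
        specificity = sum(w in terms for w in words) / len(words)
        # Structural similarity: how much of the query path the element path matches.
        structure = lcs_len(query_path, path) / len(query_path)
        score = alpha * coverage * specificity + (1 - alpha) * structure
        results.append((score, len(words), "/".join(path), elem))
    # Highest score first; on equal scores prefer the smaller (fewer-word) unit.
    results.sort(key=lambda r: (-r[0], r[1]))
    return results


DOC = """<article>
  <title>XML retrieval</title>
  <section>
    <title>Ranking</title>
    <p>Ranking combines content relevance with structural similarity.</p>
    <p>Indexing is discussed elsewhere.</p>
  </section>
  <section>
    <title>Queries</title>
    <p>Queries may carry structural hints.</p>
  </section>
</article>"""

root = ET.fromstring(DOC)
for score, size, path, elem in rank(root, {"ranking", "structural", "similarity"},
                                     ("article", "section", "p")):
    print(f"{score:.3f}  {path}")
# 0.714  article/section/p
# 0.500  article/section
# 0.292  article
```

On the sample document the paragraph that contains all three query terms outranks its enclosing section and the whole article, because it matches the query path exactly and contains proportionally fewer unrelated words. Elements covering fewer than half of the query terms are filtered out as not relevant.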
== Exploiting XML structure ==
Taking advantage of the [[Self-documenting|self-describing]] structure of XML documents can improve the search for XML documents significantly. This includes the use of CAS
== Existing XML search engines ==
An overview of two potential approaches is available.<ref>{{cite web|url=http://www.sigmod.org/record/issues/0612/p16-article-yahia.pdf|title=XML Search: Languages, INEX and Scoring|last=Amer-Yahia|first=Sihem|coauthors=Lalmas, Mounia|date=2006|publisher=SIGMOD Rec. Vol. 35, No. 4|accessdate=2009-02-10}}</ref><ref>{{cite web|url=http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.5986&rep=rep1&type=pdf|title=XML Retrieval: A Survey|last=Pal|first=Sukomal|date=June 30, 2006|publisher=Technical Report, CVPR|accessdate=2009-02-10}}</ref> The INitiative for the Evaluation of XML Retrieval (''
===Traditional XML query languages===
[[Query language]]s such as the [[W3C]] standard [[XQuery]]<ref>{{cite web|url=http://www.w3.org/TR/2007/REC-xquery-20070123/|title=XQuery 1.0: An XML Query Language|last=Boag|first=Scott|coauthors=Chamberlin, Don; Fernández, Mary F.; Florescu, Daniela; Robie, Jonathan; Siméon, Jérôme|date=23 January 2007|work=W3C Recommendation|publisher=World Wide Web Consortium|accessdate=2009-02-10}}</ref> support complex queries but only look for exact matches. They therefore need to be extended to support vague search with relevance computation. Most XML-centered approaches presuppose fairly exact knowledge of the
===Databases===
===Information retrieval===
Classic information retrieval models such as the [[vector space model]] provide relevance ranking but do not take document structure into account; only flat queries are supported. They also use a static notion of a document, so retrieval units are usually entire documents.<ref name="Schlieder2002"/> These models can be extended to consider structural information and dynamic document retrieval. For example, some extensions of the vector space model use document [[subtree]]s (index terms plus structure) as the dimensions of the vector space.<ref>{{cite web|url=http://www.cobase.cs.ucla.edu/tech-docs/sliu/SIGIR04.pdf|title=Configurable Indexing and Ranking for XML Information Retrieval|last=Liu|first=Shaorong|coauthors=Zou, Qinghua; Chu, Wesley W.|date=2004|work=
==See also==