{{Short description|Content-based retrieval of XML documents}}
'''XML retrieval''', or '''XML information retrieval''', is the content-based retrieval of documents structured with [[XML]] (eXtensible Markup Language). It is used to compute the [[Relevance (information retrieval)|relevance]] of XML documents.<ref>{{Cite web |last=Lalmas |first=Mounia |date=2009 |title=XML Retrieval |url=https://fi.wikipedia.org/wiki/XML-tiedonhaku#L%C3%A4hteet |publisher=Morgan & Claypool}}</ref>
==Queries==
==Ranking==
Ranking in XML retrieval can incorporate both content relevance and structural similarity, which is the resemblance between the structure given in the query and the structure of the document. In addition, the retrieval units returned by an XML query need not be entire documents; they can be arbitrarily deeply nested XML elements, i.e. dynamic documents. The aim is to find the smallest retrieval unit that is highly relevant. Relevance can be defined in terms of specificity, which is the extent to which a retrieval unit focuses on the topic of the request.<ref name="INEX2006">{{Cite web|url=http://www.cs.otago.ac.nz/homepages/andrew/2006-10.pdf |title=Overview of INEX 2006 |last=Malik |first=Saadia}}</ref>
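The following Python sketch illustrates how such a ranking might combine the two signals, under simplifying assumptions. The helper names (`content_relevance`, `structural_similarity`, `rank`), the weight `alpha` and the `threshold` are hypothetical and not taken from INEX or any particular system. Content relevance is modelled as the share of query terms found in an element's text, structural similarity as the share of query tags that appear in order along the element's root-to-element path, and among highly relevant elements those with fewer tokens and deeper nesting are listed first, as a crude stand-in for specificity.

```python
import re
import xml.etree.ElementTree as ET

def tokens(elem):
    """Lower-cased word tokens of all text inside an element (including descendants)."""
    return re.findall(r"\w+", " ".join(elem.itertext()).lower())

def walk(root):
    """Yield (element, list of tags from the root down to the element)."""
    stack = [(root, [root.tag])]
    while stack:
        elem, path = stack.pop()
        yield elem, path
        for child in elem:
            stack.append((child, path + [child.tag]))

def content_relevance(elem, terms):
    # Toy relevance model (assumption): share of query terms present in the element.
    words = set(tokens(elem))
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

def structural_similarity(path, query_path):
    # Share of query tags matched, in order, along the element's root-to-element path.
    matched = 0
    for tag in path:
        if matched < len(query_path) and tag == query_path[matched]:
            matched += 1
    return matched / len(query_path) if query_path else 0.0

def rank(root, terms, query_path, alpha=0.7, threshold=0.8):
    """Return (path, score) for highly relevant elements, smallest units first."""
    hits = []
    for elem, path in walk(root):
        score = (alpha * content_relevance(elem, terms)
                 + (1 - alpha) * structural_similarity(path, query_path))
        if score >= threshold:
            # Sort key: fewer tokens, then higher score, then deeper nesting.
            hits.append((len(tokens(elem)), -score, -len(path), "/" + "/".join(path), score))
    hits.sort()
    return [(p, s) for _, _, _, p, s in hits]

doc = ET.fromstring(
    "<article><title>XML retrieval</title>"
    "<section><p>Ranking combines content and structure.</p></section>"
    "<section><p>Indexing uses inverted lists.</p></section></article>"
)
for path, score in rank(doc, ["ranking", "structure"], ["article", "section"]):
    print(path, round(score, 2))
```

With this sample document, the whole `<article>` also passes the threshold, but the `<section>` about ranking and its `<p>` child are listed ahead of it because they carry the relevant text in fewer tokens; the `<p>` comes first as the more deeply nested unit with the same text.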
==Existing XML search engines==
An overview of two potential approaches is available.<ref>{{Cite
===Traditional XML query languages===
[[Query language]]s such as the [[W3C]] standard [[XQuery]]<ref>{{Cite web|url=http://www.w3.org/TR/2007/REC-xquery-20070123/|title=XQuery 1.0: An XML Query Language|last=Boag|first=Scott|
===Databases===
===Information retrieval===
Classic information retrieval models such as the [[vector space model]] provide relevance ranking but ignore document structure, so only flat queries are supported. They also apply a static document concept, so retrieval units are usually entire documents.<ref name="Schlieder2002"/> These models can be extended to take structural information into account and to retrieve dynamic documents. Examples of approaches that extend the vector space model are available: they use document [[subtree]]s (index terms plus structure) as dimensions of the vector space.<ref>{{Cite web|url=http://www.cobase.cs.ucla.edu/tech-docs/sliu/SIGIR04.pdf|title=Configurable Indexing and Ranking for XML Information Retrieval|last=Liu|first=Shaorong}}</ref>
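A minimal Python sketch of such an extension is shown below. As a simplification of full subtrees, each dimension here is a (tag path, term) pair, pairing a term with the path of the element that directly contains it; the function names `structural_terms` and `cosine` are hypothetical and do not reproduce the method of any cited paper. Documents and queries are both written as XML and compared with cosine similarity over raw term counts.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter

def structural_terms(xml_text):
    """Map an XML string to a sparse vector whose dimensions are (tag path, term) pairs."""
    root = ET.fromstring(xml_text)
    vector = Counter()
    stack = [(root, root.tag)]
    while stack:
        elem, path = stack.pop()
        # Text owned directly by this element: its leading text plus the tails of its children.
        own_text = " ".join([elem.text or ""] + [child.tail or "" for child in elem])
        for term in re.findall(r"\w+", own_text.lower()):
            vector[(path, term)] += 1
        for child in elem:
            stack.append((child, path + "/" + child.tag))
    return vector

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

query = structural_terms("<article><title>xml</title></article>")
doc1 = structural_terms("<article><title>XML retrieval</title><body>ranking</body></article>")
doc2 = structural_terms("<article><title>Databases</title><body>XML storage</body></article>")
print(cosine(query, doc1), cosine(query, doc2))
```

A flat vector space model would match the term "xml" in both documents. With structural dimensions, only the document whose title contains the term receives a non-zero score, since `("article/body", "xml")` and `("article/title", "xml")` are different dimensions.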
== Data-centric XML datasets ==
For data-centric XML datasets, XDMA<ref>{{Cite journal|last1=Selvaganesan|first1=S.|last2=Haw|first2=Su-Cheng|last3=Soon|first3=Lay-Ki|title=XDMA: A Dual Indexing and Mutual Summation Based Keyword Search Algorithm for XML Databases|journal=International Journal of Software Engineering and Knowledge Engineering|language=en-US|volume=24|issue=4|pages=591–615|doi=10.1142/s0218194014500223|year=2014}}</ref> is a keyword search algorithm for XML databases based on dual indexing and mutual summation.
==See also==
{{DEFAULTSORT:Xml-Retrieval}}
[[Category:XML]]
[[Category:Information retrieval genres]]