XML retrieval: Difference between revisions

Revision as of 03:57, 11 February 2022 by ManjushriSword (minor edit; summary: →Data-centric XML datasets)
Latest revision as of 04:38, 26 May 2025 by GünniX (minor edit; summary: Reflist)
(10 intermediate revisions by 5 users not shown)
Line 1:

{{Short description|Content-based retrieval of XML documents}}
'''XML retrieval''', or '''XML information retrieval''', is the content-based retrieval of documents structured with [[XML]] (eXtensible Markup Language). As such it is used for computing [[Relevance (information retrieval)|relevance]] of XML documents.<ref>~~{{Cite web|url=ftp://ftp.tm.informatik.uni-frankfurt.de/pub/papers/ir/An%20Architecture%20for%20XML%20Information%20Retrieval%20in%20a%20Peer-to-Peer%20Environment_2007.pdf|title=An Architecture for XML Information Retrieval in a Peer-to-Peer Environment|last=Winter|first=Judith|author2=Drobnik, Oswald|date=November 9, 2007|publisher=ACM|access-date=2009-02-10}}~~{{Cite web |last=Lalmas |first=Mounia |date=2009 |title=XML Retrieval |url=https://fi.wikipedia.org/wiki/XML-tiedonhaku#L%C3%A4hteet |publisher=Morgan & Claypool}}</ref>

==Queries==

Line 8 ⟶ 9:

==Ranking==
Ranking in XML-Retrieval can incorporate both content relevance and structural similarity, which is the resemblance between the structure given in the query and the structure of the document. Also, the retrieval units resulting from an XML query may not always be entire documents, but can be any deeply nested XML elements, i.e. dynamic documents. The aim is to find the smallest retrieval unit that is highly relevant.
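As a rough illustration of the ranking idea described above, the following Python sketch scores every element of a small document by a weighted mix of a content score and a structural score, so that a nested element can outrank the whole document. The two scoring functions, the weight `alpha`, and the sample document are assumptions made for this example; they are not taken from any particular XML retrieval system.

```python
import xml.etree.ElementTree as ET


def content_score(element, terms):
    """Toy content relevance: share of the element's words that are query terms."""
    words = " ".join(element.itertext()).lower().split()
    if not words:
        return 0.0
    return sum(1 for word in words if word in terms) / len(words)


def structure_score(path, query_path):
    """Toy structural similarity: share of query tags found, in order, in the path."""
    if not query_path:
        return 1.0
    remaining = iter(path)
    # "tag in remaining" consumes the iterator, so order is respected.
    matched = sum(1 for tag in query_path if tag in remaining)
    return matched / len(query_path)


def rank_elements(root, terms, query_path, alpha=0.5):
    """Return (score, path, element) tuples for all elements, best first."""
    results = []

    def walk(element, path):
        path = path + [element.tag]
        score = (alpha * content_score(element, terms)
                 + (1 - alpha) * structure_score(path, query_path))
        results.append((score, "/".join(path), element))
        for child in element:
            walk(child, path)

    walk(root, [])
    return sorted(results, key=lambda item: item[0], reverse=True)


# Hypothetical sample document and query.
DOC = """<article>
  <title>Cooking basics</title>
  <section>
    <title>Knives</title>
    <p>Sharpen knives often</p>
  </section>
  <section>
    <title>XML retrieval</title>
    <p>ranking XML elements</p>
  </section>
</article>"""

ranked = rank_elements(ET.fromstring(DOC), {"xml", "ranking"}, ["section", "p"])
best_score, best_path, best_element = ranked[0]
print(best_path, round(best_score, 3), best_element.text)
```

Because the content score is a term density, a short paragraph that is mostly about the query scores higher than the enclosing section or article, which also contain unrelated text. This loosely mirrors the preference for the smallest highly relevant unit.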
Relevance can be defined according to the notion of specificity, which is the extent to which a retrieval unit focuses on the topic of request.<ref name="INEX2006">{{Cite web|url=http://www.cs.otago.ac.nz/homepages/andrew/2006-10.pdf |title=Overview of INEX 2006 |last=Malik |first=Saadia |author2=Trotman, Andrew |author3=Lalmas, Mounia |author4=Fuhr, Norbert |year=2007 |work=Proceedings of the Fifth Workshop of the INitiative for the Evaluation of XML Retrieval |access-date=2009-02-10 ~~|url-status=dead~~ |archive-url=https://web.archive.org/web/20081016101202/http://www.cs.otago.ac.nz/homepages/andrew/2006-10.pdf |archive-date=October 16, 2008 }}</ref>

==Existing XML search engines==
An overview of two potential approaches is available.<ref>{{Cite journal|url=http://www.sigmod.org/record/issues/0612/p16-article-yahia.pdf|title=XML Search: Languages, INEX and Scoring|last=Amer-Yahia|first=Sihem|author2=Lalmas, Mounia |year=2006|journal=SIGMOD Rec. |volume=35 |issue=4|access-date=2009-02-10|doi=10.1145/1228268.1228271|s2cid=17300151}} {{Dead link|date=October 2010|bot=H3llBot}}</ref><ref>{{Cite ~~document~~CiteSeerX |citeseerx = 10.1.1.109.5986|title=XML Retrieval: A Survey|last=Pal|first=Sukomal|date=June 30, 2006~~|publisher=Technical Report, CVPR~~ }}</ref> The INitiative for the Evaluation of XML-Retrieval (''INEX'') was founded in 2002 and provides a platform for evaluating such [[algorithm]]s.<ref name="INEX2006" /> Three different areas influence XML-Retrieval:<ref name="INEX2002">{{Cite web|url=http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_etal:02a.pdf |title=INEX: Initiative for the Evaluation of XML Retrieval |last=Fuhr |first=Norbert |author2=Gövert, N. |author3=Kazai, Gabriella |author4=Lalmas, Mounia |year=2003 |work=Proceedings of the First INEX Workshop, Dagstuhl, Germany, 2002 |publisher=ERCIM Workshop Proceedings, France |access-date=2009-02-10 ~~|url-status=dead~~ |archive-url=https://web.archive.org/web/20081121135758/http://www.is.informatik.uni-duisburg.de/bib/pdf/ir/Fuhr_etal:02a.pdf |archive-date=November 21, 2008 }}</ref>

===Traditional XML query languages===
[[Query language]]s such as the [[W3C]] standard [[XQuery]]<ref>{{Cite web|url=http://www.w3.org/TR/2007/REC-xquery-20070123/|title=XQuery 1.0: An XML Query Language|last=Boag|first=Scott|author2=Chamberlin, Don |author3=Fernández, Mary F. |author4=Florescu, Daniela |author5=Robie, Jonathan |author6= Siméon, Jérôme |date=23 January 2007|work=W3C Recommendation|publisher=World Wide Web Consortium|access-date=2009-02-10}}</ref> supply complex queries, but only look for exact matches. Therefore, they need to be extended to allow for vague search with relevance computing. Most XML-centered approaches imply a quite exact knowledge of the documents' [[Database schema|schemas]].<ref name="Schlieder2002">{{Cite journal|url=http://www.cis.uni-muenchen.de/people/Meuss/Pub/JASIS02.ps.gz |title=Querying and Ranking XML Documents |last=Schlieder |first=Torsten |author2=Meuss, Holger |year=2002 |journal=Journal of the American Society for Information Science and Technology |volume=53 |issue=6 |pages=489–503 |access-date=2009-02-10 ~~|url-status=dead~~ |archive-url=https://web.archive.org/web/20070610002349/http://www.cis.uni-muenchen.de/people/Meuss/Pub/JASIS02.ps.gz |archive-date=June 10, 2007 |doi=10.1002/asi.10060 |url-access=subscription }}</ref>

===Databases===