Compound-term processing: Difference between revisions

== Techniques ==
 
In August 2003, [[Concept Searching Limited]] introduced the idea of using statistical compound-term processing.<ref>{{cite journal|url=http://www.conceptsearching.com/Web/UserFiles/File/Concept%20Searching%20Lateral%20Thinking.pdf|title=Lateral Thinking in Information Retrieval|journal=Information Management and Technology|volume=36 PART 4|access-date=2008-06-20|archive-url=https://web.archive.org/web/20171115145846/https://www.conceptsearching.com/Web/UserFiles/File/Concept%20Searching%20Lateral%20Thinking.pdf|archive-date=2017-11-15|url-status=dead}} The British Library Direct catalogue entry can be found here:[http://direct.bl.uk/bld/PlaceOrder.do?UIN=138451913&ETOC=RN] {{Webarchive|url=https://web.archive.org/web/20120210133832/http://direct.bl.uk/bld/PlaceOrder.do?UIN=138451913&ETOC=RN |date=2012-02-10 }}</ref>
 
CLAMOUR is a European collaborative project that aims to find better ways of classifying industrial information and statistics as they are collected and disseminated. CLAMOUR appears to use a linguistic approach, rather than one based on [[statistical model|statistical modelling]].<ref>[http://webarchive.nationalarchives.gov.uk/20040117000117/statistics.gov.uk/methods_quality/clamour/default.asp] National Statistics CLAMOUR project</ref>
== History ==
 
Techniques for probabilistic weighting of single word terms date back to at least 1976 in the landmark publication by [[Stephen Robertson (computer scientist)|Stephen E. Robertson]] and [[Karen Spärck Jones]].<ref>{{Cite journal | doi = 10.1002/asi.4630270302| title = Relevance weighting of search terms| journal = Journal of the American Society for Information Science| volume = 27| issue = 3| pages = 129| year = 1976| last1 = Robertson | first1 = S. E. | authorlink1 = Stephen Robertson (computer scientist)| last2 = Spärck Jones | first2 = K. | authorlink2 = Karen Spärck Jones}}</ref> Robertson stated that the assumption of word independence is not justified and exists as a matter of mathematical convenience. This objection to term independence was not new: it dates back to at least 1964, when H. H. Williams stated that "[t]he assumption of independence of words in a document is usually made as a matter of mathematical convenience".<ref>{{cite journal |last=WILLIAMS |first=J.H. |title=Results of classifying documents with multiple discriminant functions |url=http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0612272 |journal=Statistical Association Methods for Mechanized Documentation, National Bureau of Standards |location=Washington |pages=217–224 |year=1965 |access-date=2015-05-21 |archive-url=https://web.archive.org/web/20110717145048/http://oai.dtic.mil/oai/oai?verb=getRecord |archive-date=2011-07-17 |url-status=dead }}</ref>
 
In 2004, Anna Lynn Patterson filed patents on "phrase-based searching in an information retrieval system"<ref>{{patent|US|20060031195}}</ref> to which [[Google]] subsequently acquired the rights.<ref>[http://www.seobythesea.com/2012/02/google-acquires-cuil-patent-applications/ Google Acquires Cuil Patent Applications]</ref>