Revision as of 02:28, 27 July 2020 by InternetArchiveBot (Rescuing 2 sources and tagging 0 as dead.) #IABot (v2.0.1)
Latest revision as of 00:05, 1 January 2021 by Citation bot (Alter: journal. Add: pages. Removed parameters. Formatted dashes. | Suggested by Headbomb | All pages linked from cached copy of Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | via #UCB_webform_linked 80/450)
Line 5:
== Techniques ==
In August 2003, [[Concept Searching Limited]] introduced the idea of using statistical compound-term processing.<ref>{{cite journal|url=http://www.conceptsearching.com/Web/UserFiles/File/Concept%20Searching%20Lateral%20Thinking.pdf|title=Lateral Thinking in Information Retrieval|journal=~~INFORMATION MANAGEMENT AND TECHNOLOGY~~Information Management and Technology|volume=36 PART 4|access-date=2008-06-20|archive-url=https://web.archive.org/web/20171115145846/https://www.conceptsearching.com/Web/UserFiles/File/Concept%20Searching%20Lateral%20Thinking.pdf|archive-date=2017-11-15|url-status=dead}} The British Library Direct catalogue entry can be found here:[http://direct.bl.uk/bld/PlaceOrder.do?UIN=138451913&ETOC=RN] {{Webarchive|url=https://web.archive.org/web/20120210133832/http://direct.bl.uk/bld/PlaceOrder.do?UIN=138451913&ETOC=RN |date=2012-02-10 }}</ref>

CLAMOUR is a European collaborative project that aims to improve classification in the collection and dissemination of industrial information and statistics. CLAMOUR appears to use a linguistic approach, rather than one based on [[statistical model|statistical modelling]].<ref>[http://webarchive.nationalarchives.gov.uk/20040117000117/statistics.gov.uk/methods_quality/clamour/default.asp] National Statistics CLAMOUR project</ref>

Line 11:
== History ==
Techniques for probabilistic weighting of single-word terms date back to at least 1976, to the landmark publication by [[Stephen Robertson (computer scientist)|Stephen E. Robertson]] and [[Karen Spärck Jones]].<ref>{{Cite journal | doi = 10.1002/asi.4630270302| title = Relevance weighting of search terms| journal = Journal of the American Society for Information Science| volume = 27| issue = 3| pages = 129| year = 1976| last1 = Robertson | first1 = S. E. | authorlink1 = Stephen Robertson (computer scientist)| last2 = Spärck Jones | first2 = K. | authorlink2 = Karen Spärck Jones}}</ref> Robertson stated that the assumption of word independence is not justified and exists as a matter of mathematical convenience. The objection to term independence was not new: it dates back to at least 1964, when J. H. Williams stated that "[t]he assumption of independence of words in a document is usually made as a matter of mathematical convenience".<ref>{{cite journal |last=WILLIAMS |first=J.H. |title=Results of classifying documents with multiple discriminant functions |url=http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=AD0612272 |journal=Statistical Association Methods for Mechanized Documentation, National Bureau of Standards |location=Washington |pages=~~217-224~~217–224 |year=1965 |access-date=2015-05-21 |archive-url=https://web.archive.org/web/20110717145048/http://oai.dtic.mil/oai/oai?verb=getRecord |archive-date=2011-07-17 |url-status=dead }}</ref>

In 2004, Anna Lynn Patterson filed patents on "phrase-based searching in an information retrieval system",<ref>{{patent|US|20060031195}}</ref> to which [[Google]] subsequently acquired the rights.<ref>[http://www.seobythesea.com/2012/02/google-acquires-cuil-patent-applications/ Google Acquires Cuil Patent Applications]</ref>

Compound-term processing: Difference between revisions