Desktop search: Difference between revisions

{{Multiple issues|
{{technical|date=October 2014}}
{{how-to||date=October 2016}}
}}
[[File:AdunaAutoFocus5.png|thumb|OSL Desktop Search engines software Aduna AutoFocus 5]]
'''Desktop search''' tools search within a user's own [[computer files]] as opposed to searching the Internet. These tools are designed to find information on the user's PC, including web browser history, e-mail archives, text documents, sound files, images, and video. A variety of desktop search programs are available; see [[List of search engines#Desktop search engines|this list]] for examples. Most desktop search programs are standalone applications. Desktop search products are software alternatives to the search software included in the [[operating system]], helping users sift through desktop files, emails, attachments, and more.<ref>[http://www.brianmadden.com/blogs/brianmadden/archive/2015/03/11/what-do-you-do-for-desktop-search-in-vdi-and-rdsh.aspx "What do you do for desktop search in VDI and RDSH?"]. Blogpost by Brian Madden on brianmadden.com. Retrieved on March 25, 2015.</ref><ref>{{cite web|url=https://venturebeat.com/2008/06/02/lookeen-offers-a-new-way-way-for-outlook-users-to-search/|title=Lookeen offers a new way for Outlook users to search|author=Anthony Ha|date=2 June 2008|work=VentureBeat|access-date=8 March 2016}}</ref><ref>{{cite web|url=http://www.computerworld.com/article/2475293/desktop-apps/x1-rises-again-with-desktop-search-8--virtual-edition.html/|title=X1 rises again with Desktop Search 8, Virtual Edition|author=Robert L. Mitchell|date=8 May 2013|work=Computerworld|access-date=24 June 2015}}</ref>
 
Desktop search emerged as a concern for large firms for two main reasons: untapped productivity and security. According to analyst firm Gartner, up to 80% of some companies' data is locked up inside [[unstructured data]]: the information stored on a user's PC, the directories (folders) and files they have created on a [[Computer network|network]], documents stored in repositories such as corporate [[intranet]]s, and a multitude of other locations.<ref>{{Citation | url = http://www.computerweekly.com/Articles/2006/04/25/215622/security-special-report-who-sees-your-data.htm | title = Security special report: Who sees your data? | newspaper = Computer Weekly | date = 2006-04-25}}.</ref> Moreover, many companies have structured or unstructured information stored in older [[file formats]] to which they do not have ready access.
 
The sector attracted considerable attention from late 2004 to early 2005 because of the struggle between Microsoft and Google.<ref>{{cite news|url=http://news.bbc.co.uk/1/hi/technology/3952285.stm|title=BBC NEWS - Technology - Search wars hit desktop computers|work=bbc.co.uk|date=26 October 2004|access-date=24 June 2015}}</ref><ref>{{cite web|url=http://www.kmworld.com/Articles/Editorial/Features/The-evolution-of-desktop-search--Good-news-for-the-knowledge-worker-9608.aspx|title=KMWorld - The Evolution of Desktop Search|date=February 2005 |access-date=7 January 2019}}.</ref><ref>{{cite web|url=https://www.dtsearch.co.uk/the-blog/blog/2014/october/23/desktop-wars.aspx|title=dtSearch UK Blog - Desktop Wars|access-date=8 January 2019}}</ref> According to market analysts, both companies were attempting to leverage their monopolies (of [[web browser]]s and [[search engine]]s, respectively) to strengthen their dominance. Following [[Google]]'s complaint that users of Windows Vista could not choose a competitor's desktop search program over the built-in one, the [[US Justice Department]] and [[Microsoft]] reached an agreement that [[Windows Vista Service Pack 1]] would let users choose between the built-in and other desktop search programs, and select which one is the default.<ref>{{cite web|url=http://goebelgroup.com/searchtoolblog/2007/06/20/microsoft-agrees-to-change-vista-desktop-search-tool/|title=SearchMax|work=goebelgroup.com|access-date=24 June 2015|archive-url=https://web.archive.org/web/20131227130749/http://goebelgroup.com/searchtoolblog/2007/06/20/microsoft-agrees-to-change-vista-desktop-search-tool/|archive-date=27 December 2013|url-status=dead}}</ref> In September 2011, Google discontinued [[Google Desktop]].
 
== Technologies ==
Most desktop search engines build and maintain an [[Index (search engine)|index database]] to improve performance when searching large amounts of [[data]]. Indexing usually takes place when the computer is idle, and most search applications can be set to suspend indexing while a portable computer is running on battery power. There are notable exceptions, however. [[Everything (software)|Voidtools' Everything Search Engine]],<ref>{{cite web|title=Everything Search Engine|url=http://www.voidtools.com/|publisher=voidtools|access-date=27 December 2013}}</ref> which searches only file names and not file contents, can build its index from scratch in a few seconds. Another exception is Vegnos Desktop Search Engine,<ref>{{cite web|title=Vegnos|url=http://www.vegnos.com|publisher=Vegnos|access-date=27 December 2013}}</ref> which searches file names and file contents without building any index. Both approaches have drawbacks. An index may be out of date when a query is performed, in which case the results will be inaccurate: a hit may be shown for a file that no longer exists, and a file that should match may be missing from the results. Some products address this by building a real-time indexing function into the software. Searching without an index, on the other hand, can take a significant amount of time per query and can be resource-intensive.
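
The stale-index problem described above can be illustrated with a minimal Python sketch. It is not how any particular product works: the function names <code>build_index</code> and <code>compare_with_disk</code> are hypothetical, and the "index" here only records file modification times rather than searchable content. The sketch assumes that comparing stored modification times with the current state of the disk is enough to spot entries that would produce false hits or missed hits.

```python
import os


def build_index(root):
    """Record the modification time of every file under root (a toy 'index')."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[path] = os.stat(path).st_mtime_ns
    return index


def compare_with_disk(index, root):
    """Report how an existing index differs from the files currently on disk."""
    on_disk = build_index(root)
    report = {"missing": [], "changed": [], "unindexed": []}
    for path, indexed_mtime in index.items():
        if path not in on_disk:
            # The index would still return a hit for a file that is gone.
            report["missing"].append(path)
        elif on_disk[path] != indexed_mtime:
            # The indexed content may no longer match the file.
            report["changed"].append(path)
    # Files created after indexing cannot be found through the index at all.
    report["unindexed"] = [path for path in on_disk if path not in index]
    for paths in report.values():
        paths.sort()
    return report
```

A real-time indexer would avoid the full rescan in <code>compare_with_disk</code> by reacting to file-system change notifications instead, which is outside the scope of this sketch.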
 
Desktop search tools typically collect three types of information about files:
* file and folder names
* [[metadata]], such as titles, authors, comments in file types such as [[MP3]], [[Portable Document Format|PDF]] and [[JPEG]]
* file content, for the types of documents supported by the tool
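
As a rough illustration of the three kinds of information listed above, the following Python sketch collects a file's name, some metadata, and its content, and feeds the name and content into a small inverted index. It is a simplified sketch under stated assumptions: the names <code>describe_file</code>, <code>build_inverted_index</code> and <code>TEXT_EXTENSIONS</code> are hypothetical, file-system attributes (size and modification time) stand in for format-specific metadata such as MP3 or PDF tags, and only plain-text files are treated as having supported content.

```python
import os
import re

# Assumption: only these plain-text types have content the toy indexer can read.
TEXT_EXTENSIONS = {".txt", ".md", ".csv"}


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def describe_file(path):
    """Collect the file name, basic metadata and, if supported, the content."""
    stat = os.stat(path)
    record = {
        "name": os.path.basename(path),
        # Stand-in for embedded metadata such as titles, authors or comments.
        "metadata": {"size": stat.st_size, "modified": stat.st_mtime},
        "content": None,
    }
    if os.path.splitext(path)[1].lower() in TEXT_EXTENSIONS:
        with open(path, encoding="utf-8", errors="replace") as handle:
            record["content"] = handle.read()
    return record


def build_inverted_index(paths):
    """Map each term found in a file name or file content to the matching paths."""
    index = {}
    for path in paths:
        record = describe_file(path)
        terms = tokenize(record["name"])
        if record["content"] is not None:
            terms += tokenize(record["content"])
        for term in terms:
            index.setdefault(term, set()).add(path)
    return index
```

Looking up a term is then a dictionary access, for example <code>build_inverted_index(paths).get("budget", set())</code>, which is why an index makes queries fast at the cost of the up-front indexing work.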
 
Long-term goals for desktop search include the ability to search the [[content-based image retrieval|contents of image files]], sound files and video by context.<ref>{{cite web|url=http://www.niallkennedy.com/blog/archives/2006/10/video-search.html|title=The current state of video search|author=Niall Kennedy|date=17 October 2006|work=Niall Kennedy|access-date=24 June 2015}}</ref><ref>{{cite web|url=http://www.niallkennedy.com/blog/archives/2006/10/audio-search.html|title=The current state of audio search|author=Niall Kennedy|date=15 October 2006|work=Niall Kennedy|access-date=24 June 2015}}</ref>
 
==Platforms & their histories==
 
=== Windows ===
[[File:Lookeen_Desktop_Search_-_Screenshot_of_the_Software.jpg|thumb|[[Lookeen]] desktop search on Windows]]
[[Indexing Service]], a "base service that extracts content from files and constructs an indexed catalog to facilitate efficient and rapid searching",<ref>{{cite web|url=https://msdn.microsoft.com/en-us/library/ee805985%28v=vs.85%29.aspx|title=Indexing Service|publisher=Microsoft|work=microsoft.com|access-date=24 June 2015}}</ref> was originally released in August 1996. It was built to speed up what had been manual searches for files on personal desktops and corporate computer networks. Indexing Service used Microsoft web servers to index files on the desired hard drives. Indexing was done by file format, and a search matched the terms that the user provided against the data within those file formats. The largest issue that Indexing Service faced was that every file had to be indexed as soon as it was added. Coupled with the fact that the service cached the entire index in RAM, this made hardware a major limitation.<ref>{{cite web|url=https://msdn.microsoft.com/en-us/library/dd582937%28v=office.11%29.aspx|title=Indexing with Microsoft Index Server|publisher=Microsoft|work=microsoft.com|access-date=24 June 2015}}</ref> Indexing large numbers of files required very powerful hardware and very long wait times.
 
In 2003, [[Windows Desktop Search]] (WDS) replaced Microsoft Indexing Service. Instead of matching terms only against file names and file format details, WDS added content indexing for all Microsoft file types and for text-based formats such as e-mail and text files. In other words, WDS looked inside the files and indexed their content, so a search matched not only file names and format types but also the terms and values stored within the files. WDS also introduced "instant searching": the search started as soon as the user typed a character, and the results were updated as more characters were typed.<ref>{{cite web|url=http://www.microsoft.com/windows/products/winfamily/desktopsearch/technicalresources/techfaq.mspx|archive-url=https://web.archive.org/web/20110924212903/http://www.microsoft.com/windows/products/winfamily/desktopsearch/technicalresources/techfaq.mspx|title=Windows Search: Technical FAQ|archive-date=24 September 2011|publisher=Microsoft|work=microsoft.com|access-date=24 June 2015}}</ref> Indexing appears to have been demanding on processing power, as Windows Desktop Search ran only when it was queried directly or while the PC was idle. Even so, indexing an entire hard drive still took hours. The index was around 10% of the size of the files that it covered; for example, 100&nbsp;GB of indexed files produced an index of about 10&nbsp;GB.
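
The search-as-you-type behaviour described above can be sketched in Python as a prefix lookup over a sorted list of indexed terms. This is an illustrative sketch only and is not based on how Windows Desktop Search is implemented; the function names <code>prefix_matches</code> and <code>search_as_you_type</code> and the sample terms are hypothetical.

```python
import bisect


def prefix_matches(sorted_terms, prefix):
    """Return every term that starts with prefix, using binary search to find the first."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for position in range(start, len(sorted_terms)):
        term = sorted_terms[position]
        if not term.startswith(prefix):
            break  # Terms are sorted, so no later term can match either.
        matches.append(term)
    return matches


def search_as_you_type(sorted_terms, typed_text):
    """Simulate typing one character at a time, re-running the query after each keystroke."""
    results = []
    for length in range(1, len(typed_text) + 1):
        prefix = typed_text[:length]
        results.append((prefix, prefix_matches(sorted_terms, prefix)))
    return results


terms = sorted(["invoice", "index", "indexing", "idle", "instant"])
for prefix, matches in search_as_you_type(terms, "ind"):
    print(prefix, matches)
```

Each additional character narrows the result list, so in this sketch the final keystroke leaves only <code>index</code> and <code>indexing</code>.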
 
With the release of [[Windows Vista]] came [[Windows Search]] 3.1. Unlike its predecessors, WDS and Windows Search 3.0, version 3.1 could search both indexed and non-indexed locations seamlessly. The [[RAM]] and [[CPU]] requirements were also greatly reduced, which cut indexing times substantially. Windows Search 4.0 runs on all PCs with [[Windows 7]] and later.
 
=== Mac OS ===
In 1994 the [[AppleSearch]] search engine was introduced, allowing users to search all documents on their Macintosh computer, including file format types, metadata on those files, and content within the files. AppleSearch was a [[Client–server model|client/server application]], and as such required a server separate from the main device in order to function. The biggest issue with AppleSearch was its large resource requirements: "AppleSearch requires at least a 68040 processor and 5MB of RAM."<ref>{{cite web|url=http://infomotions.com/musings/tricks/manuscript/1600-0001.html|title=AppleSearch|work=infomotions.com|access-date=24 June 2015}}</ref> At the time, a Macintosh computer with these specifications was priced at approximately $1400, equivalent to $2050 in 2015.<ref>{{cite web|url=http://stats.areppim.com/calc/calc_usdlrxdeflator.php|title=Converter of current to real US dollars - using the GDP deflator|author=eduardo casais|work=areppim.com|access-date=24 June 2015}}</ref> On top of this, the software itself cost an additional $1400 for a single license.
 
In 1998, [[Sherlock (software)|Sherlock]] was released alongside Mac OS 8.5. Sherlock (named after the famous fictional detective [[Sherlock Holmes]]) was integrated into Mac OS's file browser, [[Finder (software)|Finder]]. Sherlock extended the desktop search function to the World Wide Web, allowing users to search both locally and externally. Adding functions such as internet access to Sherlock was relatively simple, as this was done through plugins written as plain text files. Sherlock was included in every release of Mac OS from [[Mac OS 8|Mac OS 8.5]] onward, before being deprecated and replaced by [[Spotlight (software)|Spotlight]] and [[Dashboard (Mac OS)|Dashboard]] in [[Mac OS X Tiger|Mac OS X 10.4 Tiger]]. It was officially removed in [[Mac OS X Leopard|Mac OS X 10.5 Leopard]].
 
[[Spotlight (software)|Spotlight]] was released in 2005 as part of [[Mac OS X Tiger|Mac OS X 10.4 Tiger]]. It is a selection-based search tool, which means the user invokes a query using only the mouse. Spotlight allows the user to search the Internet for more information about any keyword or phrase contained within a document or webpage, and uses a built-in calculator and the Oxford American Dictionary to offer quick access to small calculations and word definitions.<ref>{{cite web|url=https://www.apple.com/pr/library/2005/04/12Apple-to-Ship-Mac-OS-X-Tiger-on-April-29.html|title=Apple - Press Info - Apple to Ship Mac OS X "Tiger" on April 29|work=apple.com|access-date=24 June 2015}}</ref> Spotlight initially has a long startup time, but this decreases once the hard disk has been indexed. As the user adds files, the index is updated continuously in the background using minimal CPU and RAM resources.
 
=== Linux ===
There is a wide range of desktop search options for Linux users, depending on the skill level of the user and on whether they prefer desktop tools that integrate tightly into their desktop environment, command-shell functionality (often with advanced scripting options), or browser-based user interfaces to locally running software. In addition, many users assemble their own search setup from a variety of indexing packages (e.g. one which extracts and indexes PDF/DOC/DOCX/[[OpenDocument|ODT]] documents well, and another search engine which works with vCard, LDAP, and other directory/contact databases), alongside the conventional <code>find</code> and <code>locate</code> commands.
 
====Ubuntu====
[[File:App Lens on Ubuntu 16.04LTS.png|thumb|[[Unity Dash]] search tool in Ubuntu 16.04]]
[[Ubuntu distribution|Ubuntu Linux]] did not have desktop search until release [[Ubuntu version history#Ubuntu 7.04 (Feisty Fawn)|Feisty Fawn 7.04]]. Built on the [[Tracker (search software)|Tracker]]<ref>{{cite web|url=https://arstechnica.com/information-technology/2007/07/afirst-look-at-tracker-0-6-0/|title=A first look at Tracker 0.6.0|work=Ars Technica|date=26 July 2007|access-date=24 June 2015}}</ref> desktop search tool, the feature was very similar to Mac OS's AppleSearch and Sherlock. In addition to the basic features of file format sorting and metadata matching, it supported searching through emails and instant messages. In 2014 [[Recoll]]<ref>{{cite web|url=http://www.lesbonscomptes.com/recoll/usermanual/index.html#RCL.INDEXING|title=Recoll user manual|work=lesbonscomptes.com|access-date=24 June 2015}}</ref> was added to Linux distributions, working with other search programs such as Tracker and [[Beagle (software)|Beagle]] to provide efficient full-text search. This greatly increased the types of queries and file types that Linux desktop searches could handle. A major advantage of Recoll is that it allows greater customization of what is indexed: Recoll indexes the entire hard disk by default, but can be made to index only selected directories, omitting directories that will never need to be searched.<ref>{{cite web|url=http://archive09.linux.com/feature/114283|title=Linux.com|access-date=24 June 2015}}</ref>
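
The idea of indexing only selected directories while omitting others can be sketched in Python as follows. This is not Recoll's configuration format or code; the function name <code>iter_indexable_files</code> and its parameters are hypothetical, and the sketch assumes that directories to omit are identified by name alone.

```python
import os


def iter_indexable_files(top_dirs, skipped_names):
    """Yield files under the chosen top-level directories, pruning skipped directories."""
    for top in top_dirs:
        for dirpath, dirnames, filenames in os.walk(top):
            # Editing dirnames in place stops os.walk from descending into them.
            dirnames[:] = [name for name in dirnames if name not in skipped_names]
            for name in filenames:
                yield os.path.join(dirpath, name)
```

For example, <code>iter_indexable_files(["/home/user/Documents"], {"cache", ".git"})</code> (hypothetical paths) would walk only the documents folder and never visit any directory named <code>cache</code> or <code>.git</code> below it, so those files would not reach the indexer.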
 
====[[openSUSE]]====
<!--TODO! Prior desktop search before KDE 3.5-->
Starting with [[KDE4]], [[NEPOMUK (software)|NEPOMUK]] was introduced. It provided the ability to index a wide range of desktop content and email, and used semantic web technologies (e.g. [[Resource Description Framework|RDF]]) to annotate the database. The introduction faced a few glitches, many of which seemed to stem from the [[triplestore]]. Performance improved (at least for queries) after the backend was switched to a stripped-down version of the [[Virtuoso Universal Server|Virtuoso]] Open Source Edition, but indexing remained a common user complaint.
Based on user feedback, NEPOMUK indexing and search was replaced with the Baloo framework,<ref>{{Cite web|url=https://community.kde.org/Baloo|title = Baloo - KDE Community Wiki}}</ref> which is based on [[Xapian]].<ref>{{cite web |url=http://www.opensuse.org/ |title=Home |website=opensuse.org}}</ref>
 
== See also ==
*[[List of search engines#Desktop search engines|List of desktop search engines]]
* [[Desktop organizer]]
* [[Search engine]]
 
== References ==
{{reflist|2}}
 
 
{{DEFAULTSORT:Desktop Search}}
[[Category:Information technology]]
[[Category:Desktop search engines| ]]
[[Category:Information retrieval genres]]