String-searching algorithm: Difference between revisions

Content deleted Content added
m grammar
Tags: Reverted Visual edit
Line 6:
In practice, the method of feasible string-search algorithm may be affected by the string encoding. In particular, if a [[variable-width encoding]] is in use, then it may be slower to find the ''N''th character, perhaps requiring time proportional to ''N''. This may significantly slow some search algorithms. One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.{{citation needed|date=August 2017}}
 
== Overview ==
 
[[The New York Times|The]] most basic case of string searching involves one (often very long) string, sometimes called the ''haystack'', and one (often very short) string, sometimes called the ''needle''. The goal is to find one or more occurrences of the needle within the haystack. For example, one might search for ''to'' within:
 
Some books are to be tasted, others to be swallowed, and some few to be chewed and digested.
Line 13 ⟶ 14:
One might request the first occurrence of "to", which is the fourth word; or all occurrences, of which there are 3; or the last, which is the fifth word from the end.
 
# Very commonly, however, various constraints are added. For example, one might want to match the "needle" only where it consists of one (or more) completecom{{Disbacher}}plete words—perhaps defined as ''not'' having other letters immediately adjacent on either side. In that case a search for "hew" or "low" should fail for the example sentence above, even though those literal strings do occur.
 
Another common example involves "normalization". For many purposes, a search for a phrase such as "to be" should succeed even in places where there is something else intervening between the "to" and the "be":