Substring: Difference between revisions

Content deleted Content added
top: suggest to use double quotes, so no nowiki tags at all are needed
this has a cite in the body immediately below
 
(44 intermediate revisions by 27 users not shown)
Line 1:
{{Short description|Contiguous part of a sequence of symbols}}
{{About|the definition of a substring|the computer function which performs this operation|String functions (programming)}}
{{Distinguish|text=[[subsequence]], a generalization of substring}}
[[File:Substring.png|thumb|"''string''" is a substring of "''substring''"]]
A '''substring''' is a contiguous sequence of characters within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. For example, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
 
AIn [[Formal language|formal language theory]] and [[computer science]], a '''substring''' is a contiguous sequence of characters[[Character (computing)|character]]s within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. ForIn examplecontrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
'''Prefix''' and '''suffix''' are special cases of substring. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>. A suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.
 
'''PrefixPrefixes''' and '''suffixsuffixes''' are special cases of substringsubstrings. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>.; Alikewise, a suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.
The list of all substrings of the string "''apple''" would be "''apple''", "''appl''", "''pple''", "''app''", "''ppl''", "''ple''", "''ap''", "''pp''", "''pl''", "''le''", "''a''", "''p''", "''l''", "''e''", "".
 
The substrings of the string "''apple''" would be:
"''a''", "''ap''", "''app''", "''appl''", "''apple''",
"''p''", "''pp''", "''ppl''", "''pple''",
"''pl''", "''ple''",
"''l''", "''le''"
"''e''", ""
(note the [[empty string]] at the end).
 
== Substring ==
 
A string <math>u</math> is a substring (or factor)<ref name=Lot97/> of a string <math>t</math> if there exists two strings <math>p</math> and <math>s</math> such that <math>t = pus</math>. In particular, the empty string is a substring of every string.
A substring (or factor) of a string <math>T = t_1 \dots t_n</math> is a string <math>\hat T = t_{1+i} \dots t_{m+i}</math>, where <math>0 \leq i</math> and <math>m + i \leq n</math>. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If <math>\hat T</math> is a substring of <math>T</math>, it is also a [[subsequence]], which is a more general concept. Given a pattern <math>P</math>, you can find its occurrences in a string <math>T</math> with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
 
Example: The string <math>u=</math><code>ana</code> is equal to substrings (and subsequences) of <math>t=</math><code>banana</code> at two different offsets:
 
banana
Line 19 ⟶ 28:
ana
 
The first occurrence is obtained with <math>p=</math><code>b</code> and <math>s=</math><code>na</code>, while the second occurrence is obtained with <math>p=</math><code>ban</code> and <math>s</math> being the empty string.
In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe).
 
A substring (or factor) of a string <math>T = t_1 \dots t_n</math> is a string[[#Prefix|prefix]] <math>\hatof Ta =[[#Suffix|suffix]] t_{1+i}of \dotsthe t_{m+i}</math>string, whereand <math>0equivalently \leqa i</math>suffix andof <math>ma +prefix; ifor \leqexample, n<code>nan</mathcode>. A substring of a string is a prefix of a<code>nana</code>, suffixwhich ofis thein string, and equivalentlyturn a suffix of a prefix<code>banana</code>. If <math>\hat Tu</math> is a substring of <math>Tt</math>, it is also a [[subsequence]], which is a more general concept. GivenThe aoccurrences patternof <math>P</math>,a yougiven can find its occurrencespattern in a given string <math>T</math>can be found with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
Not including the empty substring, the number of substrings of a string of length <math>n</math> where symbols only occur once, is the number of ways to choose two distinct places between symbols to start/end the substring. Including the very beginning and very end of the string, there are <math>n+1</math> such places. So there are <math>\tbinom{n+1}{2} = \tfrac{n(n+1)}{2}</math> non-empty substrings.
In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe). {{citation needed|date=November 2020}}
 
== Prefix ==
{{see also|String operations#Prefixes}}
A prefix of a string <math>Tp</math> =is t_1a \dotsprefix<ref t_n<name=Lot97/math> isof a string <math>\widehatt</math> Tif =there t_1exists \dotsa t_{m}string <math>s</math>, wheresuch that <math>mt \leq= nps</math>. A ''proper prefix'' of a string is not equal to the string itself (<math>0 \leq m < n</math>);<ref name=Kel95/> some sources<ref name=Gus97/> in addition restrict a proper prefix to be non-empty (<math>0 < m < n</math>). A prefix can be seen as a special case of a substring.
 
Example: The string <code>ban</code> is equal to a prefix (and substring and subsequence) of the string <code>banana</code>:
Line 33 ⟶ 43:
ban
 
The square subset symbol is sometimes used to indicate a prefix, so that <math>\widehat Tp \sqsubseteq Tt</math> denotes that <math>\widehat Tp</math> is a prefix of <math>Tt</math>. This defines a [[binary relation]] on strings, called the [[prefix relation]], which is a particular kind of [[prefix order]].
 
In [[formal language theory]], the term ''prefix of a string'' is also commonly understood to be the set of all prefixes of a string, with respect to that language.
 
== Suffix ==
 
A string <math>s</math> is a suffix<ref name=Lot97/> of a string is<math>t</math> anyif substringthere ofexists thea string which<math>p</math> includessuch itsthat last<math>t letter,= including itselfps</math>. A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty.{{ref|Gus97}}. A suffix can be seen as a special case of a substring.
 
Example: The string <code>nana</code> is equal to a suffix (and substring and subsequence) of the string <code>banana</code>:
Line 51 ⟶ 59:
== Border ==
 
A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "babooneatingakebabbaboon eating a kebab").{{citation needed|date=January 2022}}
 
== Superstring ==
 
A '''superstring''' of a finite set <math>P</math> of strings is a single string that contains every string in <math>P</math> as a substring. For example, <math>\text{bcclabccefab}</math> is a superstring of <math>P = \{\text{abcc}, \text{efab}, \text{bccla}\}</math>, and <math>\text{efabccla}</math> is a shorter one. Generally, one is interested in finding superstrings whose length is as small as possible;{{Clarify|reason=why are we interested in them?|date=June 2010}} a concatenation ofConcatenating all stringsmembers of <math>P</math>, in anyarbitrary order, always givesobtains a trivial superstring of <math>P</math>. Finding superstrings whose length is as small as possible is a more interesting problem.
 
A string that contains every possible permutation of a specified character set is called a [[superpermutation]].
 
== See also ==
Line 67 ⟶ 77:
| last = Gusfield
| first = Dan
| origyearorig-year = 1997
| year = 1999
| title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
| publisher = Cambridge University Press
| ___location = USAUS
| isbn = 0-521-58519-8
}}</ref>
Line 82 ⟶ 92:
| ___location = London
| isbn = 0-13-497777-7
}}</ref>
<ref name=Lot97>{{cite book
| last = Lothaire
| first = M.
| year = 1997
| title = Combinatorics on words
| publisher = Cambridge University Press
| ___location = Cambridge
| isbn = 0-521-59924-5
}}</ref>
}}
 
==External links==
*{{Commonscatinline}}
 
[[Category:String (computer science)]]