Substring: Difference between revisions

Content deleted Content added
No edit summary
Tags: Mobile edit Mobile web edit
this has a cite in the body immediately below
 
(47 intermediate revisions by 28 users not shown)
Line 1:
{{Short description|Contiguous part of a sequence of symbols}}
{{About|the definition of a substring|the computer function which performs this operation|String functions (programming)}}
[[File:Substring.png|thumb|'<nowiki/>''string''<nowiki/>' is
{{Distinguish|text=[[subsequence]], a generalization of substring}}
[[File:Substring.png|thumb|'<nowiki/>"''string''<nowiki/>'" is a substring of "''substring''"]]
 
AIn [[Formal language|formal language theory]] and [[computer science]], a '''substring''' is a contiguous sequence of characters[[Character (computing)|character]]s within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. ForIn examplecontrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
Duwbeyebdehsisbw d eueibstring''<nowiki/>']]
A '''substring''' is a contiguous sequence of characters within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. For example, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
 
'''PrefixPrefixes''' and '''suffixsuffixes''' are special cases of substringsubstrings. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>.; Alikewise, a suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.
 
The list of all substrings of the string "''apple''" would be :
"''applea''", "''applap''", "''ppleapp''", "''appappl''", "''pplapple''",
"''plep''", "''appp''", "''ppppl''", "''plpple''",
"''lepl''", "''aple''",
"''pl''", "''lle''",
"''e''", "".
(note the [[empty string]] at the end).
 
== Substring ==
 
A string <math>u</math> is a substring (or factor)<ref name=Lot97/> of a string <math>t</math> if there exists two strings <math>p</math> and <math>s</math> such that <math>t = pus</math>. In particular, the empty string is a substring of every string.
A substring (or factor) of a string <math>T = t_1 \dots t_n</math> is a string <math>\hat T = t_{1+i} \dots t_{m+i}</math>, where <math>0 \leq i</math> and <math>m + i \leq n</math>. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If <math>\hat T</math> is a substring of <math>T</math>, it is also a [[subsequence]], which is a more general concept. Given a pattern <math>P</math>, you can find its occurrences in a string <math>T</math> with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
 
Example: The string <math>u=</math><code>ana</code> is equal to substrings (and subsequences) of <math>t=</math><code>banana</code> at two different offsets:
 
banana
Line 21 ⟶ 28:
ana
 
The first occurrence is obtained with <math>p=</math><code>b</code> and <math>s=</math><code>na</code>, while the second occurrence is obtained with <math>p=</math><code>ban</code> and <math>s</math> being the empty string.
In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe).
 
A substring (or factor) of a string <math>T = t_1 \dots t_n</math> is a string[[#Prefix|prefix]] <math>\hatof Ta =[[#Suffix|suffix]] t_{1+i}of \dotsthe t_{m+i}</math>string, whereand <math>0equivalently \leqa i</math>suffix andof <math>ma +prefix; ifor \leqexample, n<code>nan</mathcode>. A substring of a string is a prefix of a<code>nana</code>, suffixwhich ofis thein string, and equivalentlyturn a suffix of a prefix<code>banana</code>. If <math>\hat Tu</math> is a substring of <math>Tt</math>, it is also a [[subsequence]], which is a more general concept. GivenThe aoccurrences patternof <math>P</math>,a yougiven can find its occurrencespattern in a given string <math>T</math>can be found with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
Not including the empty substring, the number of substrings of a string of length <math>n</math> where symbols only occur once, is the number of ways to choose two distinct places between symbols to start/end the substring. Including the very beginning and very end of the string, there are <math>n+1</math> such places. So there are <math>\tbinom{n+1}{2} = \tfrac{n(n+1)}{2}</math> non-empty substrings.
In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe). {{citation needed|date=November 2020}}
 
== Prefix ==
{{see also|String operations#Prefixes}}
A prefix of a string <math>Tp</math> =is t_1a \dotsprefix<ref t_n<name=Lot97/math> isof a string <math>\widehatt</math> Tif =there t_1exists \dotsa t_{m}string <math>s</math>, wheresuch that <math>mt \leq= nps</math>. A ''proper prefix'' of a string is not equal to the string itself (<math>0 \leq m < n</math>);<ref name=Kel95/> some sources<ref name=Gus97/> in addition restrict a proper prefix to be non-empty (<math>0 < m < n</math>). A prefix can be seen as a special case of a substring.
 
Example: The string <code>ban</code> is equal to a prefix (and substring and subsequence) of the string <code>banana</code>:
Line 35 ⟶ 43:
ban
 
The square subset symbol is sometimes used to indicate a prefix, so that <math>\widehat Tp \sqsubseteq Tt</math> denotes that <math>\widehat Tp</math> is a prefix of <math>Tt</math>. This defines a [[binary relation]] on strings, called the [[prefix relation]], which is a particular kind of [[prefix order]].
 
In [[formal language theory]], the term ''prefix of a string'' is also commonly understood to be the set of all prefixes of a string, with respect to that language.
 
== Suffix ==
 
A string <math>s</math> is a suffix<ref name=Lot97/> of a string is<math>t</math> anyif substringthere ofexists thea string which<math>p</math> includessuch itsthat last<math>t letter,= including itselfps</math>. A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty.{{ref|Gus97}}. A suffix can be seen as a special case of a substring.
 
Example: The string <code>nana</code> is equal to a suffix (and substring and subsequence) of the string <code>banana</code>:
Line 53 ⟶ 59:
== Border ==
 
A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "babooneatingakebabbaboon eating a kebab").{{citation needed|date=January 2022}}
 
== Superstring ==
 
A '''superstring''' of a finite set <math>P</math> of strings is a single string that contains every string in <math>P</math> as a substring. For example, <math>\text{bcclabccefab}</math> is a superstring of <math>P = \{\text{abcc}, \text{efab}, \text{bccla}\}</math>, and <math>\text{efabccla}</math> is a shorter one. Generally, one is interested in finding superstrings whose length is as small as possible;{{Clarify|reason=why are we interested in them?|date=June 2010}} a concatenation ofConcatenating all stringsmembers of <math>P</math>, in anyarbitrary order, always givesobtains a trivial superstring of <math>P</math>. Finding superstrings whose length is as small as possible is a more interesting problem.
 
A string that contains every possible permutation of a specified character set is called a [[superpermutation]].
 
== See also ==
Line 69 ⟶ 77:
| last = Gusfield
| first = Dan
| origyearorig-year = 1997
| year = 1999
| title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
| publisher = Cambridge University Press
| ___location = USAUS
| isbn = 0-521-58519-8
}}</ref>
Line 84 ⟶ 92:
| ___location = London
| isbn = 0-13-497777-7
}}</ref>
<ref name=Lot97>{{cite book
| last = Lothaire
| first = M.
| year = 1997
| title = Combinatorics on words
| publisher = Cambridge University Press
| ___location = Cambridge
| isbn = 0-521-59924-5
}}</ref>
}}
 
==External links==
*{{Commonscatinline}}
 
[[Category:String (computer science)]]