Substring: Difference between revisions

Content deleted Content added
Substring: fix typo; suggest to explain new definition along the example
this has a cite in the body immediately below
 
(42 intermediate revisions by 26 users not shown)
Line 1:
{{Short description|Contiguous part of a sequence of symbols}}
{{About|the definition of a substring|the computer function which performs this operation|String functions (programming)}}
{{Distinguish|text=[[subsequence]], a generalization of substring}}
[[File:Substring.png|thumb|"''string''" is a substring of "''substring''"]]
A '''substring''' is a contiguous sequence of characters within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. For example, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
 
AIn [[Formal language|formal language theory]] and [[computer science]], a '''substring''' is a contiguous sequence of characters[[Character (computing)|character]]s within a [[String (computer science)|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. ForIn examplecontrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.
'''Prefix''' and '''suffix''' are special cases of substring. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>. A suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.
 
'''PrefixPrefixes''' and '''suffixsuffixes''' are special cases of substringsubstrings. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>.; Alikewise, a suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.
The list of all substrings of the string "''apple''" would be "''apple''", "''appl''", "''pple''", "''app''", "''ppl''", "''ple''", "''ap''", "''pp''", "''pl''", "''le''", "''a''", "''p''", "''l''", "''e''", "".
 
The substrings of the string "''apple''" would be:
"''a''", "''ap''", "''app''", "''appl''", "''apple''",
"''p''", "''pp''", "''ppl''", "''pple''",
"''pl''", "''ple''",
"''l''", "''le''"
"''e''", ""
(note the [[empty string]] at the end).
 
== Substring ==
 
A string <math>u</math> is a substring (or factor)<ref name=Lot97/> of a string <math>t</math> if there exists two strings <math>p</math> and <math>s</math> such that <math>t = pus</math>. In particular, the empty string is a substring of every string. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If <math>s</math> is a substring of <math>t</math>, it is also a [[subsequence]], which is a more general concept. Given a pattern, you can find its occurrences in a string with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
 
Example: The string <math>u=</math><code>ana</code> is equal to substrings (and subsequences) of <math>t=</math><code>banana</code> at two different offsets:
Line 19 ⟶ 28:
ana
 
The first occurrence is obtained with <math>p=</math><code>b</code> and <math>s=</math><code>nena</code>, while the second occurenceoccurrence is obtained with <math>p=</math><code>ban</code> and <math>s</math> being the empty string.
 
A substring of a string is a [[#Prefix|prefix]] of a [[#Suffix|suffix]] of the string, and equivalently a suffix of a prefix; for example, <code>nan</code> is a prefix of <code>nana</code>, which is in turn a suffix of <code>banana</code>. If <math>u</math> is a substring of <math>t</math>, it is also a [[subsequence]], which is a more general concept. The occurrences of a given pattern in a given string can be found with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].
In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe). {{citation needed|date=November 2020}}
 
== Prefix ==
Line 37 ⟶ 47:
== Suffix ==
 
A string <math>s</math> is a suffix<ref name=Lot97/> of a string <math>t</math> if there exists a string <math>p</math> such that <math>t = ps</math>. A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty.{{ref|Gus97}}. A suffix can be seen as a special case of a substring.
 
Example: The string <code>nana</code> is equal to a suffix (and substring and subsequence) of the string <code>banana</code>:
Line 49 ⟶ 59:
== Border ==
 
A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "babooneatingakebabbaboon eating a kebab").{{citation needed|date=January 2022}}
 
== Superstring ==
 
A '''superstring''' of a finite set <math>P</math> of strings is a single string that contains every string in <math>P</math> as a substring. For example, <math>\text{bcclabccefab}</math> is a superstring of <math>P = \{\text{abcc}, \text{efab}, \text{bccla}\}</math>, and <math>\text{efabccla}</math> is a shorter one. Generally, one is interested in finding superstrings whose length is as small as possible;{{Clarify|reason=why are we interested in them?|date=June 2010}} a concatenation ofConcatenating all stringsmembers of <math>P</math>, in anyarbitrary order, always givesobtains a trivial superstring of <math>P</math>. Finding superstrings whose length is as small as possible is a more interesting problem.
 
A string that contains every possible permutation of a specified character set is called a [[superpermutation]].
 
== See also ==
Line 65 ⟶ 77:
| last = Gusfield
| first = Dan
| origyearorig-year = 1997
| year = 1999
| title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology
| publisher = Cambridge University Press
| ___location = USAUS
| isbn = 0-521-58519-8
}}</ref>
Line 91 ⟶ 103:
}}</ref>
}}
 
==External links==
*{{Commonscatinline}}
 
[[Category:String (computer science)]]