Substring: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 18:58, 10 September 2019 edit Jochen Burghardt (talk \| contribs) Extended confirmed users 24,302 edits →top: suggest to use double quotes, so no nowiki tags at all are needed ← Previous edit		Latest revision as of 07:25, 30 May 2025 edit undo Graue (talk \| contribs) Extended confirmed users 951 edits this has a cite in the body immediately below Tag: Visual edit
(44 intermediate revisions by 27 users not shown)
Line 1: {{Short description\|Contiguous part of a sequence of symbols}} {{About\|the definition of a substring\|the computer function which performs this operation\|String functions (programming)}} {{Distinguish\|text=[[subsequence]], a generalization of substring}} [[File:Substring.png\|thumb\|"''string''" is a substring of "''substring''"]] A '''substring''' is a contiguous sequence of characters within a [[String (computer science)\|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. For example, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring.▼ ▲AIn [[Formal language\|formal language theory]] and [[computer science]], a '''substring''' is a contiguous sequence of ~~characters~~[[Character (computing)\|character]]s within a [[String (computer science)\|string]]. For instance, "''the best of''" is a substring of "''It was the best of times''". ~~This is not to be confused with [[subsequence]], which is a [[generalization]] of substring. For~~In ~~example~~contrast, "''Itwastimes''" is a subsequence of "''It was the best of times''", but not a substring. '''Prefix''' and '''suffix''' are special cases of substring. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>. A suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>.▼ ▲'''~~Prefix~~Prefixes''' and '''~~suffix~~suffixes''' are special cases of ~~substring~~substrings. A prefix of a string <math>S</math> is a substring of <math>S</math> that occurs at the ''beginning'' of <math>S</math>.; Alikewise, a suffix of a string <math>S</math> is a substring that occurs at the ''end'' of <math>S</math>. The list of all substrings of the string "''apple''" would be "''apple''", "''appl''", "''pple''", "''app''", "''ppl''", "''ple''", "''ap''", "''pp''", "''pl''", "''le''", "''a''", "''p''", "''l''", "''e''", "". The substrings of the string "''apple''" would be: "''a''", "''ap''", "''app''", "''appl''", "''apple''", "''p''", "''pp''", "''ppl''", "''pple''", "''pl''", "''ple''", "''l''", "''le''" "''e''", "" (note the [[empty string]] at the end). == Substring == A string <math>u</math> is a substring (or factor)<ref name=Lot97/> of a string <math>t</math> if there exists two strings <math>p</math> and <math>s</math> such that <math>t = pus</math>. In particular, the empty string is a substring of every string. A substring (or factor) of a string <math>T = t_1 \dots t_n</math> is a string <math>\hat T = t_{1+i} \dots t_{m+i}</math>, where <math>0 \leq i</math> and <math>m + i \leq n</math>. A substring of a string is a prefix of a suffix of the string, and equivalently a suffix of a prefix. If <math>\hat T</math> is a substring of <math>T</math>, it is also a [[subsequence]], which is a more general concept. Given a pattern <math>P</math>, you can find its occurrences in a string <math>T</math> with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]].▼ Example: The string <math>u=</math><code>ana</code> is equal to substrings (and subsequences) of <math>t=</math><code>banana</code> at two different offsets: banana Line 19 ⟶ 28: ana The first occurrence is obtained with <math>p=</math><code>b</code> and <math>s=</math><code>na</code>, while the second occurrence is obtained with <math>p=</math><code>ban</code> and <math>s</math> being the empty string. In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe).▼ ▲A substring ~~(or factor)~~ of a string ~~<math>T = t_1 \dots t_n</math>~~ is a ~~string~~[[#Prefix\|prefix]] ~~<math>\hat~~of Ta =[[#Suffix\|suffix]] ~~t_{1+i}~~of ~~\dots~~the ~~t_{m+i}</math>~~string, ~~where~~and ~~<math>0~~equivalently ~~\leq~~a ~~i</math>~~suffix ~~and~~of ~~<math>m~~a +prefix; ifor ~~\leq~~example, n<code>nan</~~math~~code>~~. A substring of a string~~ is a prefix of a<code>nana</code>, ~~suffix~~which ofis ~~the~~in ~~string, and equivalently~~turn a suffix of ~~a prefix~~<code>banana</code>. If <math>~~\hat T~~u</math> is a substring of <math>Tt</math>, it is also a [[subsequence]], which is a more general concept. ~~Given~~The aoccurrences ~~pattern~~of ~~<math>P</math>,~~a ~~you~~given ~~can find its occurrences~~pattern in a given string ~~<math>T</math>~~can be found with a [[string searching algorithm]]. Finding the longest string which is equal to a substring of two or more strings is known as the [[longest common substring problem]]. Not including the empty substring, the number of substrings of a string of length <math>n</math> where symbols only occur once, is the number of ways to choose two distinct places between symbols to start/end the substring. Including the very beginning and very end of the string, there are <math>n+1</math> such places. So there are <math>\tbinom{n+1}{2} = \tfrac{n(n+1)}{2}</math> non-empty substrings. ▲In the mathematical literature, substrings are also called '''subwords''' (in America) or '''factors''' (in Europe). {{citation needed\|date=November 2020}} == Prefix == {{see also\|String operations#Prefixes}} A ~~prefix of a~~ string <math>Tp</math> =is ~~t_1~~a ~~\dots~~prefix<ref ~~t_n<~~name=Lot97/~~math~~> isof a string <math>~~\widehat~~t</math> Tif =there ~~t_1~~exists ~~\dots~~a ~~t_{m}~~string <math>s</math>, ~~where~~such that <math>mt ~~\leq~~= nps</math>. A ''proper prefix'' of a string is not equal to the string itself ~~(<math>0 \leq m < n</math>)~~;<ref name=Kel95/> some sources<ref name=Gus97/> in addition restrict a proper prefix to be non-empty ~~(<math>0 < m < n</math>)~~. A prefix can be seen as a special case of a substring. Example: The string <code>ban</code> is equal to a prefix (and substring and subsequence) of the string <code>banana</code>: Line 33 ⟶ 43: ban The square subset symbol is sometimes used to indicate a prefix, so that <math>~~\widehat T~~p \sqsubseteq Tt</math> denotes that <math>~~\widehat T~~p</math> is a prefix of <math>Tt</math>. This defines a [[binary relation]] on strings, called the [[prefix relation]], which is a particular kind of [[prefix order]]. ~~In [[formal language theory]], the term ''prefix of a string'' is also commonly understood to be the set of all prefixes of a string, with respect to that language.~~ == Suffix == A string <math>s</math> is a suffix<ref name=Lot97/> of a string is<math>t</math> ~~any~~if ~~substring~~there ofexists ~~the~~a string ~~which~~<math>p</math> ~~includes~~such ~~its~~that ~~last~~<math>t ~~letter,~~= ~~including itself~~ps</math>. A ''proper suffix'' of a string is not equal to the string itself. A more restricted interpretation is that it is also not empty.{{ref\|Gus97}}. A suffix can be seen as a special case of a substring. Example: The string <code>nana</code> is equal to a suffix (and substring and subsequence) of the string <code>banana</code>: Line 51 ⟶ 59: == Border == A border is suffix and prefix of the same string, e.g. "bab" is a border of "babab" (and also of "~~babooneatingakebab~~baboon eating a kebab").{{citation needed\|date=January 2022}} == Superstring == A '''superstring''' of a finite set <math>P</math> of strings is a single string that contains every string in <math>P</math> as a substring. For example, <math>\text{bcclabccefab}</math> is a superstring of <math>P = \{\text{abcc}, \text{efab}, \text{bccla}\}</math>, and <math>\text{efabccla}</math> is a shorter one. ~~Generally, one is interested in finding superstrings whose length is as small as possible;{{Clarify\|reason=why are we interested in them?\|date=June 2010}} a concatenation of~~Concatenating all ~~strings~~members of <math>P</math>, in ~~any~~arbitrary order, always ~~gives~~obtains a trivial superstring of <math>P</math>. Finding superstrings whose length is as small as possible is a more interesting problem. A string that contains every possible permutation of a specified character set is called a [[superpermutation]]. == See also == Line 67 ⟶ 77: \| last = Gusfield \| first = Dan \| ~~origyear~~orig-year = 1997 \| year = 1999 \| title = Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology \| publisher = Cambridge University Press \| ___location = ~~USA~~US \| isbn = 0-521-58519-8 }}</ref> Line 82 ⟶ 92: \| ___location = London \| isbn = 0-13-497777-7 }}</ref> <ref name=Lot97>{{cite book \| last = Lothaire \| first = M. \| year = 1997 \| title = Combinatorics on words \| publisher = Cambridge University Press \| ___location = Cambridge \| isbn = 0-521-59924-5 }}</ref> }} ~~==External links==~~ *{{Commonscatinline}} [[Category:String (computer science)]]