In [[computer science]], in the area of [[formal language theory]], frequent use is made of a variety of [[string functions]]; however, the notation used is different from that used in computer programming.
==Strings and languages==
The concatenation of two strings <math>s</math> and <math>t</math> is denoted by <math>s \cdot t</math>, or more briefly by <math>s t</math>.
Concatenating with the empty string makes no difference: <math>s \cdot \varepsilon = s = \varepsilon \cdot s</math>.
Concatenation of strings is [[associative]]: <math>s \cdot (t \cdot u) = (s \cdot t) \cdot u</math>.
For example, <math>(\langle b \rangle \cdot \langle l \rangle) \cdot (\varepsilon \cdot \langle ah \rangle) = \langle bl \rangle \cdot \langle ah \rangle = \langle blah \rangle</math>.
Besides the usual set operations like union, intersection etc., concatenation can be applied to languages:
if both <math>S</math> and <math>T</math> are languages, their concatenation <math>S \cdot T</math> is defined as the set of concatenations of any string from <math>S</math> and any string from <math>T</math>, formally <math>S \cdot T = \{ s \cdot t \mid s \in S \land t \in T \}</math>.
Again, the concatenation dot <math>\cdot</math> is often omitted for brevity.
The language <math>\{\varepsilon\}</math> consisting of just the empty string is to be distinguished from the empty language <math>\{\}</math>.
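As an illustration only, the following Python sketch represents strings as <code>str</code> values (with <code>""</code> standing for ε) and finite languages as Python sets of strings; these representations and the name <code>concat_languages</code> are assumptions of the sketch. It checks the identity and associativity laws for strings and computes the concatenation of two languages, including the contrast between {ε} and the empty language.

```python
from itertools import product

def concat_languages(S: set, T: set) -> set:
    """Concatenation S·T = { s·t | s in S and t in T } of two finite languages."""
    return {s + t for s, t in product(S, T)}

# String concatenation: "" plays the role of ε, and + is associative.
assert "blah" + "" == "blah" == "" + "blah"
assert ("b" + "l") + ("" + "ah") == "b" + ("l" + ("" + "ah")) == "blah"

S = {"a", "ab"}
T = {"", "c"}
print(sorted(concat_languages(S, T)))   # ['a', 'ab', 'abc', 'ac']

# {ε} is the identity for language concatenation; the empty language {} annihilates.
print(concat_languages(S, {""}) == S)   # True
print(concat_languages(S, set()))       # set()
```

Because ''S''·{ } contains no strings at all, concatenating with the empty language always yields the empty language, whereas concatenating with {ε} leaves ''S'' unchanged.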
==String substitution==
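Let ''L'' be a [[language (computer science)|language]], and let Σ be its alphabet. A '''string substitution''' or simply a '''substitution''' is a mapping ''f'' that maps characters in Σ to languages (possibly over a different alphabet). This mapping is extended to strings by
:''f''(ε) = { ε }
for the empty string ε, and
:''f''(''sa'')=''f''(''s'')''f''(''a'')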
for string ''s'' ∈ ''L'' and character ''a'' ∈ Σ. String substitutions may be extended to entire languages as <ref>Hopcroft, Ullman (1979), Sect.3.2, p.60</ref>
:<math>f(L)=\bigcup_{s\in L} f(s)</math>
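[[Regular language]]s are closed under string substitution. That is, if each character in the alphabet of a regular language is substituted by a regular language, the result is still a regular language.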
Similarly, [[context-free language]]s are closed under string substitution.<ref>Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.2, p.131</ref><ref group="note">Although every regular language is also context-free, the previous theorem is not implied by the current one, since the former yields a sharper result for regular languages.</ref>
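A simple example is the conversion ''f''<sub>uc</sub>(.) to upper case, which may be defined e.g. as follows:
{| class="wikitable"
|-
! colspan="3" | Substitution ''f''<sub>uc</sub>
|-
! ''x'' !! ''f''<sub>uc</sub>(''x'') !! Remark
|-
| ‹''a''› || { ‹''A''› } || map lower-case char to corresponding upper-case char
|-
| ‹''A''› || { ‹''A''› } || map upper-case char to itself
|-
| ‹''ß''› || { ‹''SS''› } || no upper-case char available, map to two-char string
|-
| ‹0› || { ε } || map digit to empty string
|-
| ‹!› || { } || forbid punctuation, map to empty language
|}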
For the extension of ''f''<sub>uc</sub> to languages, we have e.g.
* ''f''<sub>uc</sub>({ ‹Straße›, ‹u2›, ‹Go!› }) = { ‹STRASSE› } ∪ { ‹U› } ∪ { } = { ‹STRASSE›, ‹U› }.
Another example is the conversion of an [[EBCDIC]]-encoded string to [[ASCII]].
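A minimal Python sketch of the substitution ''f''<sub>uc</sub> and of its extension to strings and languages, assuming finite languages are Python sets of strings and ε is <code>""</code>. Classifying characters with <code>str.isalpha</code> and <code>str.isdigit</code>, and treating every other character as punctuation mapped to the empty language, are assumptions of the sketch; the helper names are hypothetical.

```python
from functools import reduce

def f_uc(ch: str) -> set:
    """Upper-case substitution: each character is mapped to a language."""
    if ch == "ß":
        return {"SS"}            # no single upper-case letter: map to a two-char string
    if ch.isalpha():
        return {ch.upper()}      # lower case to upper case; upper case to itself
    if ch.isdigit():
        return {""}              # digit -> { ε }
    return set()                 # punctuation (assumed: anything else) -> empty language { }

def concat(S: set, T: set) -> set:
    return {s + t for s in S for t in T}

def substitute_string(f, s: str) -> set:
    """f(ε) = { ε } and f(sa) = f(s) f(a), applied character by character."""
    return reduce(lambda acc, ch: concat(acc, f(ch)), s, {""})

def substitute_language(f, L: set) -> set:
    """f(L) = union of f(s) over all s in L."""
    return set().union(*(substitute_string(f, s) for s in L))

print(sorted(substitute_language(f_uc, {"Straße", "u2", "Go!"})))  # ['STRASSE', 'U']
```

A single character mapped to the empty language makes the whole string contribute nothing, which is why ‹Go!› does not appear in the result.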
==String homomorphism==
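A '''string homomorphism''' (often referred to simply as a [[homomorphism]] in formal language theory) is a string substitution such that each character is replaced by a single string. That is, <math>f(a)=s</math>, where <math>s</math> is a string, for each character <math>a</math>.
String homomorphisms are [[monoid morphism]]s on the [[free monoid]], preserving the empty string and the [[binary operation]] of [[string concatenation]]. Given a language ''L'', the set <math>f(L)</math> is called the '''homomorphic image''' of ''L'', while the '''inverse homomorphic image''' of a language ''L'' is the set <math>f^{-1}(L)=\{s \mid f(s)\in L\}</math>.
For any language ''L'', <math>f(f^{-1}(L)) \subseteq L</math> and <math>L \subseteq f^{-1}(f(L))</math>; in general, both inclusions may be strict.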
The class of regular languages is closed under homomorphisms and inverse homomorphisms.<ref>Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.5, p.61</ref>
Similarly, the context-free languages are closed under homomorphisms<ref group="note">This follows from the [[#String substitution|above-mentioned]] closure under arbitrary substitutions.</ref> and inverse homomorphisms.<ref>Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.3, p.132</ref>
A string homomorphism is said to be ε-free (or e-free) if <math>f(a) \ne \varepsilon</math> for all ''a'' in the alphabet Σ.
An example string homomorphism ''g''<sub>uc</sub> can also be obtained by defining it similarly to the [[#String_substitution|above]] substitution: ''g''<sub>uc</sub>(‹a›) = ‹A›, ..., ''g''<sub>uc</sub>(‹0›) = ε, but letting ''g''<sub>uc</sub> be undefined on punctuation characters.
<!---omitted, since sloppy notation, anyway, see above---
Besides this restriction of its input domain, ''g''<sub>uc</sub> differs from ''f''<sub>uc</sub> by returning strings,<ref group=note name="singleton sets"/> while the latter returned singleton sets of strings.
--->
For the latter language, ''g''<sub>uc</sub>(''g''<sub>uc</sub><sup>−1</sup>({ ‹A›, ‹bb› })) = { ‹A› } ≠ { ‹A›, ‹bb› }: the inverse image contains strings such as ‹a›, ‹A› and ‹a0›, all of which are mapped to ‹A›, but no string is mapped to ‹bb›.
The homomorphism ''g''<sub>uc</sub> is not ε-free, since it maps e.g. ‹0› to ε.
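The following Python sketch is illustrative only. It models ''g''<sub>uc</sub> with <code>None</code> standing for "undefined", and because inverse homomorphic images are usually infinite, it enumerates only the strings up to a chosen length over a small hypothetical alphabet (<code>"aAb0"</code>). The function names and this truncation are assumptions of the sketch, not part of the definitions above.

```python
from itertools import product
from typing import Optional

def g_uc(s: str) -> Optional[str]:
    """Homomorphism g_uc; returns None where g_uc is undefined (punctuation)."""
    out = []
    for ch in s:
        if ch == "ß":
            out.append("SS")
        elif ch.isalpha():
            out.append(ch.upper())
        elif ch.isdigit():
            out.append("")       # digits are erased, so g_uc is not ε-free
        else:
            return None          # undefined on punctuation characters
    return "".join(out)

def image(g, L: set) -> set:
    """Homomorphic image g(L), skipping strings on which g is undefined."""
    return {w for w in map(g, L) if w is not None}

def inverse_image(g, L: set, alphabet: str, max_len: int) -> set:
    """The part of g^-1(L) made of strings over `alphabet` of length <= max_len."""
    return {
        w
        for n in range(max_len + 1)
        for w in map("".join, product(alphabet, repeat=n))
        if g(w) in L
    }

def is_epsilon_free(g, alphabet: str) -> bool:
    return all(g(a) != "" for a in alphabet)

L = {"A", "bb"}
pre = inverse_image(g_uc, L, "aAb0", 2)
print(sorted(pre))                    # ['0A', '0a', 'A', 'A0', 'a', 'a0']
print(image(g_uc, pre))               # {'A'}
print(image(g_uc, pre) <= L)          # True:  g(g^-1(L)) is a subset of L
print(image(g_uc, pre) == L)          # False: ‹bb› has no preimage
print(is_epsilon_free(g_uc, "aAb0"))  # False
```

Raising <code>max_len</code> adds further preimages of ‹A› (such as ‹00a›) but never a preimage of ‹bb›, since every non-empty output of ''g''<sub>uc</sub> consists of upper-case letters.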
==String projection==
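If ''s'' is a string, and <math>\Sigma</math> is an alphabet, the '''string projection''' of ''s'' is the string that results from removing all characters that are not in <math>\Sigma</math>. It is written as <math>\pi_\Sigma(s)</math> and is defined by
:<math>\pi_\Sigma(s) = \begin{cases}
\varepsilon & \mbox{if } s=\varepsilon \\
\pi_\Sigma(t) & \mbox{if } s=ta \mbox{ and } a\notin\Sigma \\
\pi_\Sigma(t)a & \mbox{if } s=ta \mbox{ and } a\in\Sigma
\end{cases}</math>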
Here <math>\varepsilon</math> denotes the [[empty string]]. The projection of a string is essentially the same as a [[projection in relational algebra]].
String projection may be promoted to the '''projection of a language'''. Given a [[formal language]] ''L'', its projection is given by
:<math>\pi_\Sigma (L)=\{\pi_\Sigma(s)\ \vert\ s\in L \}</math>{{citation needed|date=August 2017}}
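A short Python sketch, assuming strings are <code>str</code> values and Σ is a Python set of characters; the function names are hypothetical. Scanning left to right and keeping exactly the characters in Σ gives the same result as the recursive case split on the last character. Distinct strings may have the same projection, so a projected language can have fewer elements than the original.

```python
def project(s: str, sigma: set) -> str:
    """π_Σ(s): delete every character of s that is not in Σ, keeping the order of the rest."""
    return "".join(ch for ch in s if ch in sigma)

def project_language(L: set, sigma: set) -> set:
    """π_Σ(L) = { π_Σ(s) | s in L }."""
    return {project(s, sigma) for s in L}

print(project("abcab", {"a", "b"}))                                 # abab
print(sorted(project_language({"abc", "cab", "ccc"}, {"a", "b"})))  # ['', 'ab']
```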
== Right and left quotient ==
The '''right quotient''' of a string ''s'' by a letter ''a'' is the string obtained by removing a trailing ''a'' from ''s''. It is denoted as <math>s/a</math>. If ''s'' does not end with ''a'', the result is the empty string. Thus:
<!---This definition deviates from Hopcroft.Ullman.1979, as remarked below. I guess the former doesn't have widespread use, if it has a source at all.--->
: <math>(sa)/ b = \begin{cases}
s & \mbox{if } a=b \\
\varepsilon & \mbox{if } a \ne b
\end{cases}</math>
The quotient of the empty string is defined as
: <math>\varepsilon / a = \varepsilon</math>
Similarly, given a subset <math>S\subset M</math> of a monoid <math>M</math>, one may define the quotient subset as
: <math>S/a=\{s\in M\ \vert\ sa\in S\}</math>
'''Left quotients''' may be defined similarly, with operations taking place on the left of a string.{{citation needed|date=August 2017}}
Hopcroft and Ullman (1979) define the quotient ''L''<sub>1</sub>/''L''<sub>2</sub> of the languages ''L''<sub>1</sub> and ''L''<sub>2</sub> over the same alphabet as {{nowrap|1=''L''<sub>1</sub>/''L''<sub>2</sub> = {{mset| ''s'' {{!}} ∃''t''∈''L''<sub>2</sub>. ''st''∈''L''<sub>1</sub> }}}}.<ref>Hopcroft, Ullman (1979), Sect.3.2, p.62</ref>
This is not a generalization of the above definition, since, for a string ''s'' and distinct characters ''a'', ''b'', Hopcroft's and Ullman's definition implies that {{nowrap|{{mset|''sa''}} / {{mset|''b''}}}} is {{mset|}}, rather than {{mset| ε }}.
The left quotient (when defined similarly to Hopcroft and Ullman 1979) of a singleton language ''L''<sub>1</sub> and an arbitrary language ''L''<sub>2</sub> is known as the [[Brzozowski derivative]]; if ''L''<sub>2</sub> is represented by a [[regular expression]], so can the left quotient be.<ref>{{cite journal| author=Janusz A. Brzozowski| authorlink=Janusz Brzozowski (computer scientist)|title=Derivatives of Regular Expressions| journal=J ACM| year=1964| volume=11| issue=4| pages=481–494| doi=10.1145/321239.321249| s2cid=14126942| doi-access=free}}</ref>
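The following Python sketch contrasts the letter quotient <math>s/a</math> with the Hopcroft and Ullman quotient of languages, assuming single-character letters and finite languages represented as Python sets. The left quotient is written here as the mirror image of the Hopcroft and Ullman definition, which is the reading under which it coincides with the Brzozowski derivative for a singleton ''L''<sub>1</sub>; all function names are hypothetical.

```python
def right_quotient(s: str, a: str) -> str:
    """s/a as defined above: drop a trailing a; otherwise (also for s = ε) return ε."""
    return s[:-1] if s.endswith(a) else ""

def quotient_hu(L1: set, L2: set) -> set:
    """Hopcroft and Ullman quotient L1/L2 = { s | st in L1 for some t in L2 } (finite L1)."""
    # w[: len(w) - len(t)] also handles t = ε correctly, unlike w[:-len(t)].
    return {w[: len(w) - len(t)] for w in L1 for t in L2 if w.endswith(t)}

def left_quotient(L1: set, L2: set) -> set:
    """Mirror image: { t | st in L2 for some s in L1 } (finite L2)."""
    return {w[len(s):] for w in L2 for s in L1 if w.startswith(s)}

print(repr(right_quotient("abc", "c")))  # 'ab'
print(repr(right_quotient("abc", "b")))  # '' (the empty string)
print(quotient_hu({"abc"}, {"c"}))       # {'ab'}
print(quotient_hu({"abc"}, {"b"}))       # set(), not {''}
print(sorted(left_quotient({"a"}, {"ab", "ac", "b"})))  # ['b', 'c']
```

The fourth line shows the difference discussed above: the letter quotient yields ε where the language quotient yields the empty language.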
==Syntactic relation==
:<math>\{S/m\ \vert\ m\in M\}</math>
is finite. In
==Right cancellation==
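The '''right cancellation''' of a letter ''a'' from a string ''s'' is the removal of the first occurrence of the letter ''a'' in ''s'' when scanning from the right hand side. It is denoted as <math>s\div a</math> and is recursively defined as
:<math>(sa)\div b = \begin{cases}
s & \mbox{if } a=b \\
(s\div b)a & \mbox{if } a \ne b
\end{cases}</math>
together with <math>\varepsilon \div a = \varepsilon</math>.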
Clearly, right cancellation and projection [[Commutative property|commute]]:
:<math>\pi_\Sigma(s)\div a = \pi_\Sigma(s \div a )</math>{{citation needed|date=August 2017}}
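A brute-force check in Python can make the commutation claim concrete. The sketch below follows the recursive definition of right cancellation, assuming single-character letters, the base case <math>\varepsilon \div a = \varepsilon</math>, and a small hypothetical alphabet; exhaustively testing short strings illustrates the identity but does not prove it.

```python
from itertools import product

def right_cancel(s: str, a: str) -> str:
    """s ÷ a, following the recursive definition."""
    if s == "":
        return ""                            # ε ÷ a = ε
    if s[-1] == a:
        return s[:-1]                        # (s'a) ÷ a = s'
    return right_cancel(s[:-1], a) + s[-1]   # (s'b) ÷ a = (s' ÷ a) b

def project(s: str, sigma: set) -> str:
    """π_Σ(s): keep only the characters of s that lie in Σ."""
    return "".join(ch for ch in s if ch in sigma)

print(right_cancel("abcab", "a"))   # abcb
print(right_cancel("abc", "x"))     # abc (no occurrence, nothing removed)

# Check π_Σ(s) ÷ a = π_Σ(s ÷ a) on all strings of length < 6 over a small alphabet.
ALPHABET, SIGMA = "abc", {"a", "b"}
assert all(
    right_cancel(project(s, SIGMA), a) == project(right_cancel(s, a), SIGMA)
    for n in range(6)
    for s in map("".join, product(ALPHABET, repeat=n))
    for a in ALPHABET
)
```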
==Prefixes==
The set of prefixes of a string ''s'' with respect to a language ''L'' is
:<math>\operatorname{Pref}_L(s) = \{t\ \vert\ s=tu \mbox { for } t,u\in \operatorname{Alph}(L)^*\}</math>
The '''prefix closure of a language''' is
:<math>\operatorname{Pref} (L) = \bigcup_{s\in L} \operatorname{Pref}_L(s) = \left\{ t\ \vert\ s=tu; s\in L; t,u\in \operatorname{Alph}(L)^* \right\}</math>
For example, if <math>L=\left\{abc\right\}</math>, then <math>\operatorname{Pref}(L)=\left\{\varepsilon, a, ab, abc\right\}</math>.
A language is called '''prefix closed''' if <math>\operatorname{Pref} (L) = L</math>.
The prefix closure operator is [[idempotent]]:
:<math>\operatorname{Pref} (\operatorname{Pref} (L)) =\operatorname{Pref} (L)</math>
The '''prefix relation''' is a [[binary relation]] <math>\sqsubseteq</math> such that <math>s\sqsubseteq t </math> if and only if <math>s \in \operatorname{Pref}_L(t)</math>. This relation is a particular example of a [[prefix order]].{{citation needed|date=August 2017}}
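For finite languages, represented here (as an assumption) by Python sets of strings, the prefix closure and the prefix relation can be computed directly; the sketch reproduces the example <math>\operatorname{Pref}(\{abc\})</math> and checks idempotence on it. Restricting <math>t,u</math> to <math>\operatorname{Alph}(L)^*</math> needs no extra code, because every prefix of a string in ''L'' already uses only letters of ''L''. The function names are hypothetical.

```python
def prefixes(s: str) -> set:
    """All prefixes t of s (s = tu), from ε up to s itself."""
    return {s[:i] for i in range(len(s) + 1)}

def prefix_closure(L: set) -> set:
    """Pref(L): the union of the prefix sets of all strings in L."""
    return set().union(*(prefixes(s) for s in L))

def is_prefix_closed(L: set) -> bool:
    return prefix_closure(L) == set(L)

def is_prefix(s: str, t: str) -> bool:
    """The prefix relation s ⊑ t."""
    return t.startswith(s)

L = {"abc"}
P = prefix_closure(L)
print(sorted(P))                                       # ['', 'a', 'ab', 'abc']
print(is_prefix_closed(L), is_prefix_closed(P))        # False True
print(prefix_closure(P) == P)                          # True (idempotence)
print(is_prefix("ab", "abc"), is_prefix("ac", "abc"))  # True False
```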
==See also ==
== References ==
* {{cite book | first1=John E. | last1=Hopcroft | first2=Jeffrey D. | last2=Ullman | title=Introduction to Automata Theory, Languages and Computation | publisher=Addison-Wesley Publishing | ___location=Reading, Massachusetts | year=1979 | isbn=978-0-201-02988-
{{reflist}}
{{Strings}}
[[Category:Formal languages]]