String operations: Difference between revisions

Content deleted Content added
replace 'letter' (too narrow) --> 'character'; move ebcdic/ascii example to 'string homomorphism' where it fits better
Line 34:
 
==String substitution==
Let ''L'' be a [[language (computer science)|language]], and let Σ be its alphabet. A '''string substitution''' or simply a '''substitution''' is a mapping ''f'' that maps letterscharacters in Σ to languages (possibly in a different alphabet). Thus, for example, given a lettercharacter ''a'' ∈ Σ, one has ''f''(''a'')=''L''<sub>''a''</sub> where ''L''<sub>''a''</sub> ⊆ Δ[[Kleene star|<sup>*</sup>]] is some language whose alphabet is Δ. This mapping may be extended to strings as
 
:''f''(ε)=ε
Line 42:
:''f''(''sa'')=''f''(''s'')''f''(''a'')
 
for string ''s'' ∈ ''L'' and lettercharacter ''a'' ∈ Σ. String substitutions may be extended to entire languages as <ref>Hopcroft, Ullman (1979), Sect.3.2, p.60</ref>
 
:<math>f(L)=\bigcup_{s\in L} f(s)</math>
 
[[Regular language]]s are closed under string substitution. That is, if each lettercharacter in the alphabet of a regular language is substituted by another regular language, the result is still a regular language.<ref>Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.4, p.60</ref>
Similarly, [[context-free language]]s are closed under string substitution.<ref>Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.2, p.131</ref><ref group="note">Although every regular language is also context-free, the previous theorem is not implied by the current one, since the former yields a shaper result for regular languages.</ref>
 
Line 53:
{| class="wikitable"
|-
! lettercharacter !! mapped to language !! remark
|-
! ''x'' !! ''f''<sub>uc</sub>(''x'') !!
Line 76:
For the extension of ''f''<sub>uc</sub> to languages, we have e.g.
* ''f''<sub>uc</sub>({ ‹Straße›, ‹u2›, ‹Go!› }) = { ‹STRASSE› } ∪ { ‹U› } ∪ { } = { ‹STRASSE›, ‹U› }.
 
Another example is the conversion of an [[EBCDIC]]-encoded string to [[ASCII]].
 
==String homomorphism==
A '''string homomorphism''' (often referred to simply as a [[Homomorphism#Homomorphisms_and_e-free_homomorphisms_in_formal_language_theory|homomorphism]] in [[formal language theory]]) is a string substitution such that each lettercharacter is replaced by a single string. That is, ''f''(''a'')=''s'', where ''s'' is a string, for each lettercharacter ''a''.<ref group="note" name="singleton sets">Strictly formally, a homomorphism yields a language consisting of just one string, i.e. ''f''(''a'') = {''s''}.</ref><ref>Hopcroft, Ullman (1979), Sect.3.2, p.60-61</ref>
 
String homomorphisms are [[monoid morphism]]s on the [[free monoid]], preserving the empty string and the [[binary operation]] of [[string concatenation]]. Given a language ''L'', the set ''f''(''L'') is called the '''homomorphic image''' of ''L''. The '''inverse homomorphic image''' of a string ''s'' is defined as
Line 114 ⟶ 112:
For the latter language, ''g''<sub>uc</sub>(''g''<sub>uc</sub><sup>−1</sup>({ ‹A›, ‹bb› })) = ''g''<sub>uc</sub>({ ‹a› }) = { ‹A› } ≠ { ‹A›, ‹bb› }.
The homomorphism ''g''<sub>uc</sub> is not ε-free, since it maps e.g. ‹0› to ε.
 
AnotherA very simple string homomorphism example that maps each charcter to just a character is the conversion of an [[EBCDIC]]-encoded string to [[ASCII]].
 
==String projection==
If ''s'' is a string, and <math>\Sigma</math> is an alphabet, the '''string projection''' of ''s'' is the string that results by removing all letterscharacters that are not in <math>\Sigma</math>. It is written as <math>\pi_\Sigma(s)\,</math>. It is formally defined by removal of letterscharacters from the right hand side:
 
:<math>\pi_\Sigma(s) = \begin{cases}
Line 131:
 
==Right quotient==
The '''right quotient''' of a lettercharacter ''a'' from a string ''s'' is the truncation of the lettercharacter ''a'' in the string ''s'', from the right hand side. It is denoted as <math>s/a</math>. If the string does not have ''a'' on the right hand side, the result is the empty string. Thus:
 
:<math>(sa)/ b = \begin{cases}
Line 160:
 
==Right cancellation==
The '''right cancellation''' of a lettercharacter ''a'' from a string ''s'' is the removal of the first occurrence of the lettercharacter ''a'' in the string ''s'', starting from the right hand side. It is denoted as <math>s\div a</math> and is recursively defined as
 
:<math>(sa)\div b = \begin{cases}