Stringa (linguaggi formali): differenze tra le versioni

Naviga nella cronologia in modo interattivo

← Differenza precedente

Contenuto cancellato Contenuto aggiunto

VisualeWikitesto

Versione delle 03:45, 2 mar 2005 modifica Snowdog (discussione \| contributi) Utenti autoverificati 95 601 modifiche Nessun oggetto della modifica ← Differenza precedente		Versione attuale delle 20:23, 6 ago 2024 modifica annulla 2003:c3:972b:a146:50bf:6593:d863:e42f (discussione) →Trattamento formale
(62 versioni intermedie di 40 utenti non mostrate)
Riga 1: {{F\|matematica\|dicembre 2010}} ~~:''Per altri significati della parola stringa veda la voce: [[stringa (disambigua)]]''~~ Nella teoria dei [[linguaggi formali]], una '''stringa''' è una sequenza composta da un certo numero di oggetti che ci si aspetta venga sottoposta ad elaborazioni come analisi, composizioni e trasformazioni in altre stringhe o strutture discrete come [[grafo\|grafi]] o configurazioni numeriche, senza modificare gli oggetti componenti. ~~----~~ In varie discipline della [[matematica]] e dell'[[informatica]], per '''stringa''' si intende una sequenza composta da un certo numero di oggetti che ci si aspetta venga sottoposta ad elaborazioni come analisi, composizioni e trasformazioni in altre stringhe o strutture discrete come grafi o configurazioni numeriche, senza modificare gli oggetti componenti. Gli oggetti costitutivi possono essere semplici (come ~~bits~~bit, caratteri o simboli) o compositi, ma da trattare come se fossero semplici (come parole, espressioni, frammenti di testo o contrassegni di oggetti compositi ma che non si vogliono analizzare o decomporre). Dal punto di vista della costituzione si distinguono innanzi tutto le stringhe di oggetti semplici: stringhe di [[Bit (informatica)\|bit]] ('''stringhe binarie'''), stringhe composte da [[carattere (informatica)\|caratteri]] ('''stringhe letterali'''), stringhe di simboli ('''stringhe simboliche'''). Tra le stringhe di oggetti compositi si collocano le stringhe composte da '''unità lessicali''' (dette anche '''token'''), i '''frammenti di testo''' (come i titoli dei paragrafi o le citazioni bibliografiche) e le stringhe che rappresentano molecole dotate di struttura complessiva filamentosa, come quelle che rappresentano una [[proteina]] o una struttura di [[DNA]]. Tra gli strumenti che possono sottoporre le stringhe ad elaborazioni si distinguono quelli formali, come gli [[automa (informatica)\|automi]], le [[macchina di Turing\|macchine di Turing]], le [[grammatiche formali]] o altri [[sistema di riscrittura\|sistemi di riscrittura]] e quelli più concreti costituiti da programmi o ~~routines~~routine di sistemi software. In linea di principio gli strumenti del secondo genere si possono considerare implementazioni dei primi. == Trattamento formale == Dato un [[insieme finito]] [[insieme vuoto\|non-vuoto]] <math>\Sigma</math>, chiamato [[alfabeto (teoria dei linguaggi formali)\|alfabeto]], composto da ''caratteri'' o ''simboli'' è possibile definire una ''stringa'' (o ''parola'') sull'alfabeto <math>\Sigma</math> come [[disposizione]] (o [[ennupla]]) finita dei caratteri di <math>\Sigma</math><ref name="uniroma1">{{Cita web \|url = http://www.dis.uniroma1.it/~fiii/materiale_schaerf/9-Linguaggi.pdf \|titolo = Linguaggi formali e grammatiche di Chomsky \|autore = \|editore = \|data = \|formato = doc \|pagine = 4-7 \|accesso = 19 maggio 2017 \|urlarchivio = https://web.archive.org/web/20100602063032/http://www.dis.uniroma1.it/~fiii/materiale_schaerf/9-Linguaggi.pdf \|dataarchivio = 2 giugno 2010 \|urlmorto = no }}</ref><ref name="PoliBa">{{Cita web \|url = http://www-ictserv.poliba.it/piscitelli/doc/lucidi%20LFC/2_LFC_i%20linguaggi%20formali_12.pdf \|titolo = Linguaggi Formali e Compilatori \|autore = Giacomo Piscitelli \|sito = \|data = \|formato = pdf \|pagina = 4 \|accesso = 17 maggio 2017 \|urlarchivio = https://web.archive.org/web/20150616083424/http://www-ictserv.poliba.it/piscitelli/doc/lucidi%20LFC/2_LFC_i%20linguaggi%20formali_12.pdf \|dataarchivio = 16 giugno 2015 \|urlmorto = sì }}</ref>. È possibile definire la ''lunghezza'' di una stringa, <math>\left \| w \right \|</math> come il numero di caratteri presenti nella stringa. La stringa di lunghezza 0 prende il nome di ''[[stringa vuota]]'' o ''stringa nulla'' e viene indicata con <math>\varepsilon</math> o <math>\lambda</math><ref name="uniroma1" />. One starts with a [[empty set\|non-empty]] [[finite]] [[set]] Σ called an ''alphabet''. Elements of this alphabet are called ''characters''. A '''string''' (or '''word''') over Σ is any finite [[sequence]] of characters from Σ. Per esempio, preso l'alfabeto binario <math>\Sigma = \left \{ 0, 1 \right \}</math>, alcune delle parole che possono essere generate a partire da <math>\Sigma</math> sono le stringhe <math>\left \{ \varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, \cdots \right \} </math>. A particularly important string is the sequence of no characters, called the '''empty string'''. The empty string is often denoted ε or λ. Note that one does not allow infinite sequences of characters. Possiamo definire l'operazione ''elevamento a potenza'' di un alfabeto, <math>\Sigma^{k}</math>, come l'insieme di stringhe di lunghezza k sull'alfabeto, ponendo <math>\Sigma^{0} = \varepsilon</math>. ~~For example, if Σ = {0, 1}, strings over Σ are of the form~~ ~~:ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, …~~ L'insieme di tutte le parole sull'alfabeto <math>\Sigma</math> è indicato con <math>\Sigma^{}</math>. Data l'esistenza di una [[corrispondenza biunivoca]] tra la tupla <math>\left ( a_1, a_2, \cdots, a_n \right )</math> e la stringa <math>a_1 a_2 \cdots a_n</math>, è possibile definire la [[star chiusura]] come l'[[insieme infinito]] <math>\Sigma^{} = \bigcup_{n \in \N} \Sigma^{n}</math>. The set of all strings over Σ is denoted Σ. One can define a [[binary operation]] on Σ called '''string concatenation'''. If ''s'' and ''t'' are two strings, their concatenation, denoted ''st'', is defined as the sequence of characters in ''s'' followed by the sequence of characters in ''t''. È possibile definire l'insieme delle parole di lunghezza sull'alfabeto <math>\Sigma</math>, la [[cross chiusura]], come <math>\Sigma^{+} = \Sigma^{} \setminus \Sigma^{0}</math>. ~~For example, if ''s'' = <tt>bear</tt> and ''t'' = <tt>hug</tt> then ''st'' = <tt>bearhug</tt> and ''ts'' = <tt>hugbear</tt>.~~ È inoltre possibile definire un'[[operazione binaria]] denominata ''concatenazione di stringhe''<ref name="uniroma1" />. Siano ''s'' e ''t'' appartenenti rispettivamente agli alfabeti <math>\Sigma^{\prime}</math> e <math>\Sigma^{\prime\prime}</math>. La concatenazione delle due stringhe, indicata come ''st'', è definita come la stringa di lunghezza <math>\left \| s \right \| + \left \| t \right \|</math> appartenente a <math>\Sigma^{\prime} \cup \Sigma^{\prime\prime}</math> , che presenta ordinatamente i caratteri di ''s'' seguiti dai simboli di ''t''. String concatenation is an [[associative]], but non-[[commutative]] operation. The empty string serves as the [[identity element]]. In [[abstract algebra\|algebraic]] terms, the set Σ forms a [[monoid]] under string concatenation. In fact, Σ* is the [[free monoid]] generated by Σ. Ad esempio se <math>s = \mbox{topo}</math> e <math>t = \mbox{ragno}</math> allora <math>st = \mbox{toporagno}</math> e <math>ts = \mbox{ragnotopo}</math>. The ''length'' of a string is the number of characters in the string. The length can be any [[natural number]]. The length of the empty string is 0. Algebraically speaking, the length function defines a [[monoid homomorphism]] from Σ* to '''N''' (Non-negative integers with addition). La concatenazione di stringhe è un'operazione [[associatività\|associativa]], ma non [[commutatività\|commutativa]]<ref name="uniroma1" />. Rispetto alla concatenazione, la stringa vuota è l'[[elemento neutro]]. In termini [[algebra\|algebrici]], l'insieme <math>\Sigma^{}</math> costituisce un [[monoide]] rispetto alla concatenazione di stringhe<ref name="uniroma1" />. <math>\Sigma^{}</math> è il [[monoide libero]] generato da <math>\Sigma</math>, mentre <math>\Sigma^{+}</math> è il relativo [[semigruppo]] libero. A string ''s'' is said to be a '''substring''' of ''t'' if there exist two strings ''u'' and ''v'' such that ''t'' = ''usv''. One, or both, of ''u'' and ''v'' can be empty. The [[binary relation\|relation]] "is a substring of" defines a [[partial order]] on Σ, the [[least element]] of which is the empty string. Sempre da un punto di vista algebrico, dato che la lunghezza è un [[numero naturale]], è possibile dire che la funzione lunghezza definisce un [[omomorfismo]] da <math>\Sigma^{}</math> in <math>\N</math>. More often, especially in computing applications, one is interested in a different kind of ordering on the set of strings. If the alphabet Σ is [[well-order]]ed (cf. [[alphabetical order]]) one can define a well-ordering on Σ* called [[lexicographical order]]. The empty string is also the least element with respect to this ordering. Una stringa ''s'' si dice ''sottostringa'' o ''fattore'' di ''t'' se esistono due stringhe ''u'' e ''v'' tali che <math>t = usv</math>. Se <math>u = \varepsilon</math> o <math>v = \varepsilon</math> si parla di ''prefisso'' o ''suffisso'' della stringa ''t''. La [[relazione binaria\|relazione]] "è una sottostringa di" definisce una [[relazione d'ordine]] su <math>\Sigma^{}</math>, il cui minimo risulta la stringa vuota. A set of strings over Σ (i.e. a [[subset]] of Σ) is called a ''[[formal language]]'' over Σ. Note that while the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings. In fact, Σ* itself is always an infinite language. Important examples of formal languages include [[regular expression]]s and [[formal grammar]]s. Se l'alfabeto <math>\Sigma^{}</math> è [[relazione d'ordine#Ordinamenti ben fondati\|ben ordinato]], il cosiddetto [[ordine lessicografico]] costituisce un altro [[insieme totalmente ordinato\|ordinamento totale]] su <math>\Sigma^{}</math> di frequente utilizzo per le applicazioni informatiche. ~~== Tipi di dati '''string''' ==~~ Un insieme di stringhe su <math>\Sigma</math> (cioè un [[sottoinsieme]] di <math>\Sigma^{}</math>) è anche detto ''[[linguaggio formale]]'' su <math>\Sigma</math>. Nonostante l'alfabeto <math>\Sigma</math> sia finito e non vuoto e le stringhe siano di lunghezza finita, i linguaggi formali possono avere [[cardinalità]] [[infinito (matematica)\|infinita]] o [[insieme vuoto\|nulla]]. Esempi di linguaggi formali sono forniti dalle [[espressione regolare\|espressioni regolari]] e dalle [[grammatica formale\|grammatiche formali]]. A '''string datatype''' is a [[datatype]] modeled on the idea of a formal string. Strings are such an important and useful datatype that they are implemented in nearly every [[programming language]]. In some languages they are available as [[primitive type]]s and in others as [[composite type]]s. == Note == Although formal strings can have an arbitrary (but finite) length, strings in real languages have limited length. In general, there are two types of string datatypes: ''fixed length strings'' which have a fixed maximum length, and ''variable length strings'' whose length is not arbitrarly fixed. Most strings in modern programming languages are variable length strings. Despite the name, even variable length strings are limited in length; although, generally, the limit depends only on the amount of [[computer memory\|memory]] available. <references /> == Bibliografia == String datatypes also differ by the alphabet over which they are defined. Some strings are designed to work over a single fixed alphabet (such as the [[Latin alphabet]] or the characters representable by [[ASCII]]) and others over an arbitrary alphabet. For example, in the [[C Plus Plus\|C++]] standard library strings are designed as [[template (programming)\|templates]] definable over any alphabet of characters. {{cita libro \|nome = John E. \|cognome = Hopcroft \|wkautore = John Hopcroft \|coautori = Rajeev Motwani; [[Jeffrey Ullman\|Jeffrey D. Ullman]] \|titolo = Introduction to Automata Theory, Languages, and Computation \|editore = Addison Wesley \|anno = 2006 \|mese = luglio \|giorno = 15 \|lingua = inglese \|capitolo = Strings \|ISBN = 978-0-321-46225-1}} * {{cita libro \|nome = Martin \|cognome = Davis \|wkautore = Martin Davis \|coautori = Ron Sigal; Elaine J. Weyuker \|titolo = Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science \|editore = Morgan Kaufmann \|anno = 1994 \|mese = febbraio \|giorno = 17 \|lingua = inglese \|capitolo = Alphabets and Strings \|ISBN = 978-0-12-206382-4}} == ~~Rappresentazioni~~Voci correlate == * [[Stringa (informatica)]] * [[Alfabeto (teoria dei linguaggi formali)]] * [[Linguaggio formale]] {{portale\|informatica\|matematica}} Representations of strings depend heavily on the choice of character set (such as an alphabet) and the method of [[character encoding]]. Older string implementations were designed to work with the character set and encoding defined by [[ASCII]], or more recent extensions like the [[ISO 8859]] series. Modern implementations often use the extensive character set defined by [[Unicode]] along with a variety of complex encodings such as [[UTF-8]] and [[UTF-16]]. [[Categoria:Teoria dei linguaggi formali]] Most string implementations are very similar to variable-length [[array]]s with the entries storing the [[character code]]s of corresponding characters. The principal diffence is that, with certain encodings, a single logical character may take up more than one entry in the array. This happens for example with [[UTF-8]], where single characters can take anywhere from one to four [[byte]]s. In these cases the logical length of the string differs from the logical length of the array. The length of a string can be stored implicitly by using a special terminating character; often this is the [[null character]] having value zero, a convention used and perpetuated by the popular [[C programming language]]. The length of a string can also be stored explicitly, for example by prefixing the string with [[integer]] value (convention used in [[Pascal programming language\|Pascal]]). Note that with terminated strings the terminating character is not an allowable character in any string. ~~Here is an example of a '''null-terminated string''' stored in a 10 byte [[buffer]], along with its ASCII representation:~~ ~~<table cellspacing="0" celpadding="2" border="1" align="center">~~ ~~<tr><td>F</td> <td>R</td> <td>A</td> <td>N</td> <td>K</td> <td><small>NUL</small></td> <td>k</td> <td>e</td> <td>f</td> <td>w</td> </tr>~~ ~~<tr><td>46</td> <td>52</td> <td>41</td> <td>4E</td> <td>4B</td> <td>00</td> <td>6B</td> <td>65</td> <td>66</td> <td>77</td> </tr>~~ ~~</table>~~ The length of a string in the above example 5 characters, but note that it occupies 6 bytes. Characters after the terminator do not form part of the representation; they may be either part of another string or just garbage. ~~Here is the equivalent Pascal string:~~ ~~<table cellspacing="0" celpadding="2" border="1" align="center">~~ ~~<tr><td><small>length</small></td><td>F</td> <td>R</td> <td>A</td> <td>N</td> <td>K</td> <td>k</td> <td>f</td> <td>f</td> <td>w</td> </tr>~~ ~~<tr><td>05</td><td>46</td> <td>52</td> <td>41</td> <td>4E</td> <td>4B</td> <td>6B</td> <td>66</td> <td>66</td> <td>77</td> </tr>~~ ~~</table>~~ ~~While these representations are common, others are possible. Using [[rope (computer science)\|rope]]s makes certain string operations, such as insertions, deletions, and concatenations more efficient.~~ ~~== Algoritmi per le stringhe ==~~ Sono stati studiati molti [[algoritmi]] per la manipolazione delle stringhe i quali si distinguono per finalità e per diverse scelte di compromesso di fronte ad esigenze contrastanti come l'ampiezza della portata e la efficienza. ~~Questi algoritmi vengono collocati a categorie come le seguenti:~~ * [[algoritmi di ricerca di stringhe]] aventi il compito di trovare una data sottostringa o una data configurazione di caratteri, * [[algoritmi di ordinamento]] (''sorting''), * algoritmi per [[espressioni regolari]], * scansori e [[parser]] di stringhhe. Gli algoritmi più avanzati per l'elaborazione di stringhe spesso impiegano elaborati meccanismi formali e complesse strutture di dati: tra questi ricordiamo gli [[alberi di suffissi]] e le [[macchine a stati finiti]]. ~~== Linguaggi e programmi di utilità orientati all'elaborazione di stringhe ==~~ Le stringhe costituiscono un tipo di dati tanto ampiamente utilizzato da indurre lo sviluppo di numerosi linguaggi finalizzati alla facilitazione delle elaborazioni delle stringhe più richieste dalle applicazioni. Tra questi vi sono: * [[awk]] * [[Icon programming language\|Icon]] * [[Perl]] * [[Tcl]] * [[MUMPS]] * [[sed]] * [[SNOBOL]] Molti programmi di utilità dell'ambito [[Unix\|UNIX]] effettuano manipolazioni di stringhe relativamente semplici e possono essere utilizzati per programmare algoritmi di elaborazione di stringhe piuttosto efficaci. Questi strumentipossono trattare come stringhe anche i files e gli streams finiti. I recenti [[linguaggio di scripting\|linguaggi di scripting]], come [[Perl]], [[Python programming language\|Python]], [[Ruby programming language\|Ruby]] e [[Tcl]] si servono di [[espressione regolare\|espressioni regolari]] per facilitare le manovre sui testi. ~~[ [ Categoria:Terminologia del computer ] ] ?~~ ~~[[Categoria:Linguaggi formali]]~~ ~~[[en:String (computer science)]]~~ ~~[[de:Zeichenkette]]~~ ~~[[fr:Chaîne de caractères]]~~ ~~[[ru:Строка (тип данных)]]~~ ~~[[zh:字符串]]~~