Deterministic finite automaton: Difference between revisions

Content deleted Content added
Advantages and disadvantages: ext link not allowed in parameter, only one archive link is needed
m convert special characters found by Wikipedia:Typo Team/moss (via WP:JWB)
Line 24:
* a set of [[Finite-state machine#Accept .28or final.29 states|accept states]] <math>F \subseteq Q</math>
 
Let {{math|1=''w'' = ''a''<sub>1</sub>''a''<sub>2</sub>...''a<sub>n</sub>''}} be a string over the alphabet {{math|Σ}}. The automaton {{mvar|M}} accepts the string {{mvar|w}} if a sequence of states, {{math|''r''<sub>0</sub>, ''r''<sub>1</sub>, ..., ''r<sub>n</sub>''}}, exists in {{mvar|Q}} with the following conditions:
# {{math|1=''r''<sub>0</sub> = ''q''<sub>0</sub>}}
# {{math|1=''r''<sub>''i''+1</sub> = ''δ''(''r<sub>i</sub>'', ''a''<sub>''i''+1</sub>)}}, for {{math|1=''i'' = 0, ..., ''n'' − 1}}
# <math>r_n \in F</math>.
 
Line 70:
 
===Randomness===
When the start state and accept states are ignored, a DFA of {{mvar|n}} states and an alphabet of size {{mvar|k}} can be seen as a [[Directed graph|digraph]] of {{mvar|n}} vertices in which all vertices have {{mvar|k}} out-arcs labeled {{math|1, ..., ''k''}} (a {{mvar|k}}-out digraph). It is known that when {{math|''k'' ≥ 2}} is a fixed integer, with high probability, the largest [[strongly connected component]] (SCC) in such a {{mvar|k}}-out digraph chosen uniformly at random is of linear size and it can be reached by all vertices.<ref name=Grusho>{{cite journal|last1=Grusho|first1=A. A.|title=Limit distributions of certain characteristics of random automaton graphs|journal=Mathematical Notes of the Academy of Sciences of the USSR|date=1973|volume=4|pages=633–637|doi=10.1007/BF01095785|s2cid=121723743|ref=Grusho1973}}</ref> It has also been proven that if {{mvar|k}} is allowed to increase as {{mvar|n}} increases, then the whole digraph has a phase transition for strong connectivity similar to [[Erdős–Rényi model]] for connectivity.<ref name=Cai>{{cite journal |last1=Cai |first1=Xing Shi |last2=Devroye |first2=Luc |title=The graph structure of a deterministic automaton chosen at random |journal=Random Structures & Algorithms |date=October 2017 |volume=51 |issue=3 |pages=428–458 |doi=10.1002/rsa.20707|arxiv=1504.06238 |s2cid=13013344 }}</ref>
 
In a random DFA, the maximum number of vertices reachable from one vertex is very close to the number of vertices in the largest [[strongly connected component|SCC]] with high probability.<ref name=Grusho /><ref>{{cite conference |last1=Carayol |first1=Arnaud |last2=Nicaud |first2=Cyril |date=February 2012 |title=Distribution of the number of accessible states in a random deterministic automaton |volume=14 |pages=194–205 |conference=STACS'12 (29th Symposium on Theoretical Aspects of Computer Science) |___location=Paris, France |url=https://hal.archives-ouvertes.fr/hal-00678213}}</ref> This is also true for the largest [[Glossary of graph theory#Subgraphs|induced sub-digraph]] of minimum in-degree one, which can be seen as a directed version of [[Degeneracy (graph theory)#k-Cores|{{math|1}}-core]].<ref name=Cai />
Line 119:
DFAs are equivalent in computing power to [[nondeterministic finite automata]] (NFAs). This is because, firstly any DFA is also an NFA, so an NFA can do what a DFA can do. Also, given an NFA, using the [[powerset construction]] one can build a DFA that recognizes the same language as the NFA, although the DFA could have exponentially larger number of states than the NFA.<ref name=Sak105>Sakarovitch (2009) p.105</ref><ref name=Law63>Lawson (2004) p.63</ref> However, even though NFAs are computationally equivalent to DFAs, the above-mentioned problems are not necessarily solved efficiently also for NFAs. The non-universality problem for NFAs is [[PSPACE complete]] since there are small NFAs with shortest rejecting word in exponential size. A DFA is universal if and only if all states are final states, but this does not hold for NFAs. The Equality, Inclusion and Minimization Problems are also PSPACE complete since they require forming the complement of an NFA which results in an exponential blow up of size.<ref>{{Cite web |last1=Esparza Estaun |first1=Francisco Javier |last2=Sickert |first2=Salomon |last3=Blondin |first3=Michael |date=16 November 2016 |orig-date= |title=Operations and tests on sets: Implementation on DFAs |url=https://www7.in.tum.de/um/courses/auto/ws1718/slides1718/04-Implementations_sets.pdf |url-status=dead |archive-url=https://web.archive.org/web/20180808171506/https://www7.in.tum.de/um/courses/auto/ws1718/slides1718/04-Implementations_sets.pdf |archive-date=8 August 2018 |website=Automata and Formal Languages 2017/18 }}</ref>
 
On the other hand, finite-state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is bracket or [[Dyck language]], i.e., the language that consists of properly paired brackets such as word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton needs to have a state to represent any possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another simpler example is the language consisting of strings of the form ''a<sup>n</sup>b<sup>n</sup>'' for some finite but arbitrary number of ''a''&apos;{{'}}s, followed by an equal number of ''b''&apos;{{'}}s.<ref name=Law46>Lawson (2004) p.46</ref>
 
==DFA identification from labeled words==
Line 133:
Gold's algorithm assumes that <math>S^+</math> and <math>S^-</math> contain a ''characteristic set'' of the regular language; otherwise, the constructed DFA will be inconsistent either with <math>S^+</math> or <math>S^-</math>.
Other notable DFA identification algorithms include the RPNI algorithm,<ref>{{Cite book | doi=10.1142/9789812797902_0004| chapter=Inferring Regular Languages in Polynomial Updated Time| title=Pattern Recognition and Image Analysis| volume=1| pages=49–61| series=Series in Machine Perception and Artificial Intelligence| year=1992| last1=Oncina| first1=J.| last2=García| first2=P.| isbn=978-981-02-0881-3}}</ref> the Blue-Fringe evidence-driven state-merging algorithm,<ref>{{Cite book |doi = 10.1007/BFb0054059|chapter = Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm|title = Grammatical Inference|volume = 1433|pages = 1–12|series = Lecture Notes in Computer Science|year = 1998|last1 = Lang|first1 = Kevin J.|last2 = Pearlmutter|first2 = Barak A.|last3 = Price|first3 = Rodney A.|isbn = 978-3-540-64776-8|url = http://eprints.maynoothuniversity.ie/10250/1/BP-Results-1998.pdf}}</ref>
and Windowed-EDSM.<ref>{{Cite book | url=https://dl.acm.org/doi/abs/10.5555/645519.655966 | title=Beyond EDSM &#124;{{pipe}} Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications| date=23 September 2002| pages=37–48| isbn=9783540442394}}</ref>
Another research direction is the application of [[evolutionary algorithm]]s: the smart state labeling evolutionary algorithm<ref>{{Cite journal |doi = 10.1109/TPAMI.2005.143|pmid = 16013754|title = Learning deterministic finite automata with a smart state labeling evolutionary algorithm|journal = IEEE Transactions on Pattern Analysis and Machine Intelligence|volume = 27|issue = 7|pages = 1063–1074|year = 2005|last1 = Lucas|first1 = S.M.|last2 = Reynolds|first2 = T.J.|s2cid = 14062047}}</ref> allowed to solve a modified DFA identification problem in which the training data (sets <math>S^+</math> and <math>S^-</math>) is ''noisy'' in the sense that some words are attributed to wrong classes.