Deterministic finite automaton: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 20:09, 22 September 2023 edit Sollyucko (talk \| contribs) Extended confirmed users 1,954 edits →Notes: improving some citations Tag: Visual edit ← Previous edit		Latest revision as of 06:30, 14 April 2025 edit undo JJMC89 bot III (talk \| contribs) Bots, Administrators 4,318,583 edits m Moving Category:Finite automata to Category:Finite-state machines per Wikipedia:Categories for discussion/Speedy
(29 intermediate revisions by 16 users not shown)
Line 2: {{Use dmy dates\|date=April 2020}} {{Redirect-synonym\|DFSA\|[[drug-facilitated sexual assault]]}} {{CS1 config\|mode=cs1}} [[File:DFA example multiplies of 3.svg\|thumb\|upright=1.1\|An example of a deterministic finite automaton that accepts only binary numbers that are multiples of 3. The state S<sub>0</sub> is both the start state and an accept state. For example, the string "1001" leads to the state sequence S<sub>0</sub>, S<sub>1</sub>, S<sub>2</sub>, S<sub>1</sub>, S<sub>0</sub>, and is hence accepted.]] In the [[theory of computation]], a branch of [[theoretical computer science]], a '''deterministic finite automaton''' ('''DFA''')—also known as '''[[Finite-state machine#Acceptors (recognizers)\|deterministic finite acceptor]]''' ('''DFA'''), '''deterministic finite-state machine''' ('''DFSM'''), or '''deterministic finite-state automaton''' ('''DFSA''')—is a [[finite-state machine]] that accepts or rejects a given [[String (computer science)\|string]] of symbols, by running through a state sequence uniquely determined by the string.~~<ref name="~~{{sfn\|Hopcroft ~~2001">[[Deterministic finite automaton#HMU~~\|~~Hopcroft 2001]]</ref>~~Motwani\|Ullman\|2006}} ''Deterministic'' refers to the uniqueness of the computation run. In search of the simplest models to capture finite-state machines, [[Warren McCulloch]] and [[Walter Pitts]] were among the first researchers to introduce a concept similar to finite automata in 1943.~~<ref>[[Deterministic finite automaton#MP43~~{{sfn\|McCulloch ~~and~~\| Pitts (\|1943~~)]]</ref><ref>[[Deterministic finite automaton#RS59~~}}{{sfn\|Rabin ~~and~~\| Scott (\|1959~~)]]</ref>~~}} The figure illustrates a deterministic finite automaton using a [[state diagram]]. In this example automaton, there are three states: S<sub>0</sub>, S<sub>1</sub>, and S<sub>2</sub> (denoted graphically by circles). The automaton takes a finite [[sequence]] of 0s and 1s as input. For each state, there is a transition arrow leading out to a next state for both 0 and 1. Upon reading a symbol, a DFA jumps ''deterministically'' from one state to another by following the transition arrow. For example, if the automaton is currently in state S<sub>0</sub> and the current input symbol is 1, then it deterministically jumps to state S<sub>1</sub>. A DFA has a ''start state'' (denoted graphically by an arrow coming in from nowhere) where computations begin, and a [[set (mathematics)\|set]] of ''accept states'' (denoted graphically by a double circle) which help define when a computation is successful. A DFA is defined as an abstract mathematical concept, but is often implemented in hardware and software for solving various specific problems such as [[lexical analysis]] and [[pattern matching]]. For example, a DFA can model software that decides whether or not online user input such as email addresses are syntactically valid.<ref>{{~~Citation~~citation \| last1 = Bai \| ~~last~~first1 = ~~Gouda~~Gina R. \| last2 = Clee \| first2 = Brian ~~\| first = Prabhakar~~ \| last3 = Shrestha \| first3 = Nischal ~~\| title = Application of Finite automata~~ \| last4 = Chapman \| first4 = Carl }}</ref>▼ \| last5 = Wright \| first5 = Cimone \| last6 = Stolee \| first6 = Kathryn T. \| editor1-last = Guéhéneuc \| editor1-first = Yann-Gaël \| editor2-last = Khomh \| editor2-first = Foutse \| editor3-last = Sarro \| editor3-first = Federica \| contribution = Exploring tools and strategies used during regular expression composition tasks \| contribution-url = https://par.nsf.gov/servlets/purl/10100320 \| doi = 10.1109/ICPC.2019.00039 \| pages = 197–208 \| publisher = IEEE / ACM \| title = Proceedings of the 27th International Conference on Program Comprehension, ICPC 2019, Montreal, QC, Canada, May 25-31, 2019 \| year = 2019\| isbn = 978-1-7281-1519-1 ▲ }}</ref> DFAs have been generalized to ''[[nondeterministic finite automata]] (NFA)'' which may have several arrows of the same label starting from a state. Using the [[powerset construction]] method, every NFA can be translated to a DFA that recognizes the same language. DFAs, and NFAs as well, recognize exactly the set of [[regular language]]s.~~<ref name="~~{{sfn\|Hopcroft ~~2001"/>~~\|Motwani\|Ullman\|2006}} ==Formal definition== A deterministic finite automaton {{mvar\|M}} is a 5-[[n-tuple\|tuple]], {{math\|(''Q'', Σ, ''δ'', ''q''<sub>0</sub>, ''F'')}}, consisting of * a finite [[Set (mathematics)\|set]] of [[State (computer science)\|states]] {{mvar\|Q}} * a finite set of input symbols called the [[Alphabet (computer science)\|alphabet]] {{math\|Σ}} * a transition [[function (mathematics)\|function]] {{math\|''δ'' : ''Q'' × Σ → ''Q''}} * an initial (or ~~[[Finite-state machine#Start state\|~~start) state]] <math>q_0 \in Q</math> * a set of ~~[[Finite-state~~accepting ~~machine#Accept .28or~~(or final~~.29 states\|accept~~) states]] <math>F \subseteq Q</math> Let {{math\|1=''w'' = ''a''<sub>1</sub>''a''<sub>2</sub>…...''a<sub>n</sub>''}} be a string over the alphabet {{math\|Σ}}. The automaton {{mvar\|M}} accepts the string {{mvar\|w}} if a sequence of states, {{math\|''r''<sub>0</sub>, ''r''<sub>1</sub>, …..., ''r<sub>n</sub>''}}, exists in {{mvar\|Q}} with the following conditions: # {{math\|1=''r''<sub>0</sub> = ''q''<sub>0</sub>}} # {{math\|1=''r''<sub>''i''+1</sub> = ''δ''(''r<sub>i</sub>'', ''a''<sub>''i''+1</sub>)}}, for {{math\|1=''i'' = 0, …..., ''n'' − 1}} # <math>r_n \in F</math>. Line 65 ⟶ 79: ===Local automata=== A '''local automaton''' is a DFA, not necessarily complete, for which all edges with the same label lead to a single vertex. Local automata accept the class of [[Local language (formal language)\|local languages]], those for which membership of a word in the language is determined by a "sliding window" of length two on the word.~~<ref name=Law129>~~{{sfn\|Lawson (\|2004) \|p~~.129</ref><ref name~~=~~Sak228>~~129}}{{sfn\|Sakarovitch (\|2009) \|p.=228~~</ref>~~}} A '''Myhill graph''' over an alphabet ''A'' is a [[directed graph]] with [[Vertex (graph theory)\|vertex set]] ''A'' and subsets of vertices labelled "start" and "finish". The language accepted by a Myhill graph is the set of directed paths from a start vertex to a finish vertex: the graph thus acts as an automaton.~~<ref name~~{{sfn\|Lawson\|2004\|p=~~Law129/>~~129}} The class of languages accepted by Myhill graphs is the class of local languages.~~<ref name=Law128>~~{{sfn\|Lawson (\|2004)\| p.=128~~</ref>~~}} ===Randomness=== When the start state and accept states are ignored, a DFA of {{mvar\|n}} states and an alphabet of size {{mvar\|k}} can be seen as a [[Directed graph\|digraph]] of {{mvar\|n}} vertices in which all vertices have {{mvar\|k}} out-arcs labeled {{math\|1, …..., ''k''}} (a {{mvar\|k}}-out digraph). It is known that when {{math\|''k'' ≥ 2}} is a fixed integer, with high probability, the largest [[strongly connected component]] (SCC) in such a {{mvar\|k}}-out digraph chosen uniformly at random is of linear size and it can be reached by all vertices.<ref name=Grusho>{{cite journal\|last1=Grusho\|first1=A. A.\|title=Limit distributions of certain characteristics of random automaton graphs\|journal=Mathematical Notes of the Academy of Sciences of the USSR\|date=1973\|volume=4\|pages=633–637\|doi=10.1007/BF01095785\|s2cid=121723743\|ref=Grusho1973}}</ref> It has also been proven that if {{mvar\|k}} is allowed to increase as {{mvar\|n}} increases, then the whole digraph has a phase transition for strong connectivity similar to [[Erdős–Rényi model]] for connectivity.<ref name=Cai>{{cite journal \|last1=Cai \|first1=Xing Shi \|last2=Devroye \|first2=Luc \|title=The graph structure of a deterministic automaton chosen at random \|journal=Random Structures & Algorithms \|date=October 2017 \|volume=51 \|issue=3 \|pages=428–458 \|doi=10.1002/rsa.20707\|arxiv=1504.06238 \|s2cid=13013344 }}</ref> In a random DFA, the maximum number of vertices reachable from one vertex is very close to the number of vertices in the largest [[strongly connected component\|SCC]] with high probability.<ref name=Grusho /><ref>{{cite conference \|last1=Carayol \|first1=Arnaud \|last2=Nicaud \|first2=Cyril \|date=February 2012 \|title=Distribution of the number of accessible states in a random deterministic automaton \|volume=14 \|pages=194–205 \|conference=STACS'12 (29th Symposium on Theoretical Aspects of Computer Science) \|___location=Paris, France \|url=https://hal.archives-ouvertes.fr/hal-00678213}}</ref> This is also true for the largest [[Glossary of graph theory#Subgraphs\|induced sub-digraph]] of minimum in-degree one, which can be seen as a directed version of [[Degeneracy (graph theory)#k-Cores\|{{math\|1}}-core]].<ref name=Cai /> Line 80 ⟶ 94: {{columns-list\|colwidth=30em\| Union Intersection{{sfn\|Hopcroft\|Ullman\|1979\|pp=59–60}} (see picture) Intersection<ref>{{cite book \| isbn=0-201-02988-X \| author=John E. Hopcroft and Jeffrey D. Ullman \| title=Introduction to Automata Theory, Languages, and Computation \| url=https://archive.org/details/introductiontoau00hopc \| url-access=registration \| ___location=Reading/MA \| publisher=Addison-Wesley \| year=1979 }}</ref>{{rp\|59–60}} (see picture) Concatenation [[Complementation of automata#With deterministic finite automata\|Complement]] Complement [[Kleene closure]] Reversal<ref name=rose/> Quotient~~{{citation~~<ref ~~needed\|date~~name=~~January~~rose>{{cite ~~2015}}~~journal \| last = Rose \| first = Gene F. Substitution{{citation needed\|date=January 2015}} \| doi = 10.1016/S0022-0000(68)80029-7 Homomorphism{{citation needed\|date=January 2015}} \| issue = 2 \| journal = [[Journal of Computer and System Sciences]] \| pages = 148–168 \| title = Closures which Preserve Finiteness in Families of Languages \| volume = 2 \| year = 1968}}</ref> Substitution<ref name=spanier>{{cite journal \| last = Spanier \| first = E. \| doi = 10.1080/00029890.1969.12000214 \| journal = American Mathematical Monthly \| jstor = 2316423 \| mr = 241205 \| pages = 335–342 \| title = Grammars and languages \| volume = 76 \| year = 1969\| issue = 4 }}</ref> Homomorphism<ref name=rose/><ref name=spanier/> }} Line 117 ⟶ 149: the DFA with a minimum number of states for a particular regular language (Minimization Problem) DFAs are equivalent in computing power to [[nondeterministic finite automata]] (NFAs). This is because, firstly any DFA is also an NFA, so an NFA can do what a DFA can do. Also, given an NFA, using the [[powerset construction]] one can build a DFA that recognizes the same language as the NFA, although the DFA could have exponentially larger number of states than the NFA.~~<ref name=Sak105>~~{{sfn\|Sakarovitch (\|2009) \|p.=105~~</ref><ref name=Law63>~~}}{{sfn\|Lawson (\|2004) \|p.=63~~</ref>~~}} However, even though NFAs are computationally equivalent to DFAs, the above-mentioned problems are not necessarily solved efficiently also for NFAs. The non-universality problem for NFAs is [[PSPACE complete]] since there are small NFAs with shortest rejecting word in exponential size. A DFA is universal if and only if all states are final states, but this does not hold for NFAs. The Equality, Inclusion and Minimization Problems are also PSPACE complete since they require forming the complement of an NFA which results in an exponential blow up of size.<ref>{{Cite web \|~~last~~last1=Esparza Estaun \|~~first~~first1=Francisco Javier \|last2=Sickert \|first2=Salomon \|last3=Blondin \|first3=Michael \|date=16 November 2016 \|orig-date= \|title=Operations and tests on sets: Implementation on DFAs \|url=https://www7.in.tum.de/um/courses/auto/ws1718/slides1718/04-Implementations_sets.pdf \|url-status=dead \|archive-url=~~http~~https://web.archive.org/web/20180808171506/https://www7.in.tum.de/um/courses/auto/ws1718/slides1718/04-Implementations_sets.pdf \|archive-date=8 August 2018 \|website=Automata and Formal Languages 2017/18 ~~\|postscript=. Another version of the file, created on 13 November 2019, is available at https://archive.model.in.tum.de/um/courses/auto/ws1920/slides1718/04-Implementations_sets.pdf.~~}}</ref> On the other hand, finite-state automata are of strictly limited power in the languages they can recognize; many simple languages, including any problem that requires more than constant space to solve, cannot be recognized by a DFA. The classic example of a simply described language that no DFA can recognize is bracket or [[Dyck language]], i.e., the language that consists of properly paired brackets such as word "(()())". Intuitively, no DFA can recognize the Dyck language because DFAs are not capable of counting: a DFA-like automaton needs to have a state to represent any possible number of "currently open" parentheses, meaning it would need an unbounded number of states. Another simpler example is the language consisting of strings of the form ''a<sup>n</sup>b<sup>n</sup>'' for some finite but arbitrary number of ''a''~~'~~{{'}}s, followed by an equal number of ''b''~~'~~{{'}}s.~~<ref name=Law46>~~{{sfn\|Lawson (\|2004) \|p.=46~~</ref>~~}} ==DFA identification from labeled words== Line 125 ⟶ 157: Given a set of ''positive'' words <math>S^+ \subset \Sigma^</math> and a set of ''negative'' words <math>S^- \subset \Sigma^</math> one can construct a DFA that accepts all words from <math>S^+</math> and rejects all words from <math>S^-</math>: this problem is called ''DFA identification'' (synthesis, learning). While ''some'' DFA can be constructed in linear time, the problem of identifying a DFA with the minimal number of states is NP-complete.<ref name="Complexity of Automaton Identificat">{{cite journal\|last1=Gold\|first1=E. M.\|author1-link=E. Mark Gold\|title=Complexity of Automaton Identification from Given Data\|journal=Information and Control\|volume=37\|issue=3\|pages=302–320\|year=1978\|doi=10.1016/S0019-9958(78)90562-4\|ref=Gold78\|doi-access=}}</ref> The first algorithm for minimal DFA identification has been proposed by Trakhtenbrot and Barzdin<ref>{{Cite book \| url=https://books.google.com/books?id=h5XOBQAAQBAJ&pg=PP1 \|title = Finite Automata: Behavior and Synthesis\|isbn = 9781483297293\|last1 = De Vries\|first1 = A.\|date = 28 June 2014\| publisher=Elsevier }}</ref> and is called the ''TB-algorithm''. However, the TB-algorithm assumes that all words from <math>\Sigma</math> up to a given length are contained in either <math>S^+ \cup S^-</math>. Line 131 ⟶ 163: However, Traxbar does not guarantee the minimality of the constructed DFA. In his work<ref name="Complexity of Automaton Identificat"/> E.M. Gold also proposed a heuristic algorithm for minimal DFA identification. Gold's algorithm assumes that <math>S^+</math> and <math>S^-</math> contain a ''[[Characteristic Samples\|characteristic set]]'' of the regular language; otherwise, the constructed DFA will be inconsistent either with <math>S^+</math> or <math>S^-</math>. Other notable DFA identification algorithms include the RPNI algorithm,<ref>{{Cite book \| doi=10.1142/9789812797902_0004\| chapter=Inferring Regular Languages in Polynomial Updated Time\| title=Pattern Recognition and Image Analysis\| volume=1\| pages=49–61\| series=Series in Machine Perception and Artificial Intelligence\| year=1992\| last1=Oncina\| first1=J.\| last2=García\| first2=P.\| isbn=978-981-02-0881-3}}</ref> the Blue-Fringe evidence-driven state-merging algorithm,<ref>{{Cite book \|doi = 10.1007/BFb0054059\|chapter = Results of the Abbadingo one DFA learning competition and a new evidence-driven state merging algorithm\|title = Grammatical Inference\|volume = 1433\|pages = 1–12\|series = Lecture Notes in Computer Science\|year = 1998\|last1 = Lang\|first1 = Kevin J.\|last2 = Pearlmutter\|first2 = Barak A.\|last3 = Price\|first3 = Rodney A.\|isbn = 978-3-540-64776-8\|url = http://eprints.maynoothuniversity.ie/10250/1/BP-Results-1998.pdf}}</ref> and Windowed-EDSM.<ref>{{Cite book \| url=https://dl.acm.org/doi/abs/10.5555/645519.655966 \| title=Beyond EDSM ~~\|~~{{pipe}} Proceedings of the 6th International Colloquium on Grammatical Inference: Algorithms and Applications\| date=23 September 2002\| pages=37–48\| isbn=9783540442394\| last1=Adriaans\| first1=Pieter\| last2=Fernau\| first2=Henning\| last3=Zaanen\| first3=Menno van\| publisher=Springer}}</ref> Another research direction is the application of [[evolutionary algorithm]]s: the smart state labeling evolutionary algorithm<ref>{{Cite journal \|doi = 10.1109/TPAMI.2005.143\|pmid = 16013754\|title = Learning deterministic finite automata with a smart state labeling evolutionary algorithm\|journal = IEEE Transactions on Pattern Analysis and Machine Intelligence\|volume = 27\|issue = 7\|pages = 1063–1074\|year = 2005\|last1 = Lucas\|first1 = S.M.\|last2 = Reynolds\|first2 = T.J.\|s2cid = 14062047}}</ref> allowed to solve a modified DFA identification problem in which the training data (sets <math>S^+</math> and <math>S^-</math>) is ''noisy'' in the sense that some words are attributed to wrong classes. Line 148 ⟶ 180: '''Read-only right-moving Turing machines''' are a particular type of [[Turing machine]] that only moves right; these are almost exactly equivalent to DFAs.<ref name=RORMTM>{{cite book \| last = Davis\| first = Martin \|author2=Ron Sigal \|author3=Elaine J. Weyuker \| title = Second Edition: Computability, Complexity, and Languages and Logic: Fundamentals of Theoretical Computer Science \| edition = 2nd \| publisher = Academic Press, Harcourt, Brace & Company\| ___location = San Diego \| year = 1994\| ~~ISBN~~isbn =0-12-206382-1}}</ref> The definition based on a singly infinite tape is a 7-[[tuple]] Line 231 ⟶ 263: ==References== * {{Hopcroft and Ullman 1979\|author-link=no\|title-link=no}}{{sfn whitelist\|CITEREFHopcroftUllman1979}} * {{cite book \|last1=Hopcroft \|first1=John E. \|author-link1=John Hopcroft \|last2=Motwani \|first2=Rajeev \|author-link2=Rajeev Motwani \|last3=Ullman \|first3=Jeffrey D. \|author-link3=Jeffrey Ullman \|title=[[Introduction to Automata Theory, Languages, and Computation]] \|edition=2 \|publication-place=Boston \|url= \|access-date= \|archive-url= \|archive-date= \|url-status=dead \|publisher=[[Addison Wesley]] \|year=2001 \|isbn=0-201-44124-1 \|ref=HMU \|postscript=. [http://web.archive.org/web/20150924071109/http://www.pearsonhighered.com/educator/product/Introduction-to-Automata-Theory-Languages-and-Computation/9780201441246.page Archived] from the original on 24 September 2015.}} * {{Hopcroft, Motwani, and Ullman 2006}}{{sfn whitelist\|CITEREFHopcroftMotwaniUllman2006}} * {{cite book \| last=Lawson \| first=Mark V. \| title=Finite automata \| publisher=Chapman and Hall/CRC \| year=2004 \| isbn=1-58488-255-7 \| zbl=1086.68074 }} * {{cite journal Line 244 ⟶ 277: \|issue= 4 \|pages= 115–133 ~~\|ref=MP43~~ \|doi= 10.1007/BF02478259 \|pmid= 2185863 Line 259 ⟶ 291: \|year= 1959 \|pages= 114–125 ~~\|ref=RS59~~ \|url=https://www.researchgate.net/publication/230876408\|doi= 10.1147/rd.32.0114 }} * {{cite book \| last=Sakarovitch \| first=Jacques \| title=Elements of automata theory \| others=Translated from the French by Reuben Thomas \| ___location=Cambridge \| publisher=[[Cambridge University Press]] \| year=2009 \| isbn=978-0-521-84425-3 \| zbl=1188.68177 }} ===Further reading=== * {{cite book \| first=Michael \| last=Sipser \| title=Introduction to the Theory of Computation \| publisher=PWS \| ___location=Boston \| year=1997 \| isbn=0-534-94728-X \| url-access=registration \| url=https://archive.org/details/introductiontoth00sips }}. Section 1.1: Finite Automata, pp. 31–47. Subsection "Decidable Problems Concerning Regular Languages" of section 4.1: Decidable Languages, pp. 152–155.4.4 DFA can accept only regular language * {{Sipser 1997}} — '''1.1''': "Finite Automata" pp. 31–47. '''4.1''': "Decidable Languages - Decidable Problems Concerning Regular Languages" pp. 152–155. '''4.4''': DFA can accept only regular language {{Formal languages and grammars\|state=collapsed}} {{DEFAULTSORT:Deterministic Finite-State Machine}} [[Category:Finite-state ~~automata~~machines]]