{{Short description|Act of checking a given sequence of tokens for the presence of the constituents of some pattern}}
{{About|pattern matching in [[functional programming]]||string matching|and|pattern recognition}}
{{For|the use of variable matching criteria in defining abstract patterns to match|regular expression}}
{{More citations needed|date=February 2011}}
In [[computer science]], '''pattern matching''' is the act of checking a given sequence of [[Lexical analysis#Token|tokens]] for the presence of the constituents of some [[pattern]]. In contrast to [[pattern recognition]], the match usually must be exact: "either it will or will not be a match." The patterns generally have the form of either [[String (computer science)|sequences]] or [[tree structure]]s. Uses of pattern matching include outputting the locations (if any) of a pattern within a token sequence, outputting some component of the matched pattern, and substituting the matching pattern with some other token sequence (i.e., [[Regular expression|search and replace]]).
 
Sequence patterns (e.g., a text string) are often described using [[regular expression]]s and matched using techniques such as [[backtracking]].
 
Tree patterns are used in some [[programming language]]s as a general tool to process data based on its structure. For example, [[C Sharp (programming language)|C#]],<ref>{{cite web|url=https://docs.microsoft.com/en-us/dotnet/csharp/pattern-matching|title=Pattern Matching - C# Guide|date=13 March 2024}}</ref> [[F Sharp (programming language)|F#]],<ref>{{cite web|url=https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching|title=Pattern Matching - F# Guide|date=5 November 2021}}</ref> [[Haskell]],<ref>[http://www.haskell.org/tutorial/patterns.html A Gentle Introduction to Haskell: Patterns]</ref> [[Java (programming language)|Java]],<ref>https://docs.oracle.com/en/java/javase/21/language/pattern-matching.html</ref> [[ML (programming language)|ML]], [[Python (programming language)|Python]],<ref>{{Cite web|title=What's New In Python 3.10 — Python 3.10.0b3 documentation|url=https://docs.python.org/3.10/whatsnew/3.10.html#pep-634-structural-pattern-matching|access-date=2021-07-06|website=docs.python.org}}</ref> [[Racket (programming language)|Racket]],<ref>{{Cite web|title=Pattern Matching|url=https://docs.racket-lang.org/reference/match.html|access-date=2025-06-25|website=docs.racket-lang.org}}</ref> [[Ruby (programming language)|Ruby]],<ref>{{Cite web|title=pattern_matching - Documentation for Ruby 3.0.0|url=https://docs.ruby-lang.org/en/3.0.0/doc/syntax/pattern_matching_rdoc.html|access-date=2021-07-06|website=docs.ruby-lang.org}}</ref> [[Rust (programming language)|Rust]],<ref>{{cite web |url=https://doc.rust-lang.org/book/ch18-03-pattern-syntax.html |title=Pattern Syntax - The Rust Programming Language}}</ref> [[Scala (programming language)|Scala]],<ref>{{Cite web|title=Pattern Matching|url=https://docs.scala-lang.org/tour/pattern-matching.html|access-date=2021-01-17|website=Scala Documentation}}</ref> [[Swift (programming language)|Swift]]<ref>{{cite web|url=https://docs.swift.org/swift-book/ReferenceManual/Patterns.html|title=Patterns — The Swift Programming Language (Swift 5.1)}}</ref> and the symbolic mathematics language [[Wolfram Mathematica|Mathematica]] have special [[Syntax (programming languages)|syntax]] for expressing tree patterns and a [[language construct]] for [[Conditional (computer programming)|conditional execution]] and value retrieval based on them.
 
Often it is possible to give alternative patterns that are tried one by one, which yields a powerful [[Conditional (programming)|conditional programming construct]]. Pattern matching sometimes includes support for [[Guard (computer science)|guards]].{{citation needed|date=January 2019}}
 
[[Parsing]] algorithms often rely on pattern matching to transform strings into [[Abstract syntax tree|syntax tree]]s.<ref>Warth, Alessandro, and Ian Piumarta. "[http://tinlizzie.org/~awarth/papers/dls07.pdf OMeta: an object-oriented language for pattern matching]." Proceedings of the 2007 symposium on Dynamic languages. ACM, 2007.</ref><ref>Knuth, Donald E., James H. Morris, Jr, and Vaughan R. Pratt. "Fast pattern matching in strings." SIAM journal on computing 6.2 (1977): 323-350.</ref>
 
==History==
{{See also|Regular expression#History}}
{{Expand section|date=May 2008}}
Early programming languages with pattern matching constructs include [[COMIT]] (1957), [[SNOBOL]] (1962), [[Refal]] (1968) with tree-based pattern matching, [[Prolog]] (1972), St Andrews Static Language ([[SASL (programming language)|SASL]]) (1976), [[NPL (programming language)|NPL]] (1977), and [[Kent Recursive Calculator]] (KRC) (1981).
 
The pattern matching feature of function arguments in the language [[ML (programming language)|ML]] (1973) and its dialect [[Standard ML]] (1983) has been carried over to some other [[functional programming]] languages that were influenced by them, such as [[Haskell]] (1990), [[Scala (programming language)|Scala]] (2004), and [[F Sharp (programming language)|F#]] (2005). The pattern matching construct with the <code>match</code> keyword that was introduced in the [[ML (programming language)|ML]] dialect [[Caml]] (1985) was followed by languages such as [[OCaml]] (1996), [[F Sharp (programming language)|F#]] (2005), [[F* (programming language)|F*]] (2011), and [[Rust (programming language)|Rust]] (2015).

The first computer programs to use pattern matching were text editors.{{citation needed|reason=obviously, the first programs doing pattern matching, even if ad hoc would have been compilers which came long before interactive text editors|date=November 2011}} At [[Bell Labs]], [[Ken Thompson (computer programmer)|Ken Thompson]] extended the seeking and replacing features of the [[QED (text editor)|QED editor]] to accept [[regular expression]]s. Tree-based pattern matching features also appeared in Fred McBride's extension of [[LISP]] in 1970.<ref>{{cite web |url=http://www.cs.nott.ac.uk/~ctm/view.ps.gz |title=Archived copy |access-date=2007-04-14 |url-status=dead |archive-url=https://web.archive.org/web/20070203111451/http://www.cs.nott.ac.uk/~ctm/view.ps.gz |archive-date=2007-02-03 }}</ref>
 
Many [[text editor]]s support pattern matching of various kinds: the [[QED (text editor)|QED editor]] supports [[regular expression]] search, and some versions of [[TECO (text editor)|TECO]] support the OR operator in searches.
 
[[Computer algebra system]]s generally support pattern matching on algebraic expressions.<ref>Joel Moses, "Symbolic Integration", MIT Project MAC MAC-TR-47, December 1967</ref>
 
==Terminology==
 
Pattern matching involves specialized terminology.
 
; Matching
: The act of comparing a ''discriminant'' to a ''pattern'' (or collection of patterns), possibly selecting a ''continuation'', extracting ''bindings'', performing a ''substitution'', or any combination of these. Also known as '''destructuring'''.
 
; Pattern
: Syntax describing expected structure in the ''discriminant'', plus specification of portions of the discriminant to extract (''bindings'') or ignore (''wildcards''). Pattern languages can be rich; see below for terminology denoting specific kinds of pattern.
 
; Discriminant
: The value to be examined and matched against a pattern. In most cases, this will be a data structure of some kind, with type [[Duality (mathematics)|dual to]] the pattern being applied. Also known as the '''subject value''' or '''scrutinee'''.
 
; Continuation
: When multiple alternative patterns are applied to a discriminant and one of them matches, some languages execute an associated code fragment in an environment extended with the matching pattern's ''bindings''. This code fragment is the ''continuation'' associated with the pattern.
 
; Substitution
: Replacement of a portion of a discriminant data structure with some computed value. The computation may depend on the replaced portion of the discriminant as well as on other bindings extracted from the discriminant.
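
As a minimal sketch of this vocabulary in Haskell, the <code>case</code> expression below labels each role in a comment. The function name <code>describe</code> and its result strings are illustrative assumptions, not part of any library.

<syntaxhighlight lang="haskell">
-- Hypothetical example: 'describe' is an illustrative name.
describe :: (Int, Int) -> String
describe point =
  case point of                                -- 'point' is the discriminant
    (0, 0) -> "origin"                         -- pattern (0, 0) and its continuation
    (x, 0) -> "on the x-axis at " ++ show x    -- 'x' is a binding used by the continuation
    (_, y) -> "elsewhere, y = " ++ show y      -- '_' is a wildcard, 'y' a binding
</syntaxhighlight>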
 
===Terminology of patterns===
 
While some concepts are relatively common to many pattern languages, other pattern languages include unique or unusual extensions.
 
; [[Name binding|Binding]]
: A way of associating a ''name'' with a portion of the discriminant, so that the name is [[Name binding|bound to]] that portion when the continuation executes. For example, in Rust, {{code|2=rust|1=match v { (a, b) => ... } }} expects <code>v</code> to be a pair, and <code>a</code> and <code>b</code> are bindings bringing variables of the same name into scope in the continuation ("<code>...</code>").
 
; Wildcard
: Often written as a single underscore, <code>_</code>, the wildcard pattern accepts all values without examining them further, ignoring their structure. Also known as '''discard''', the '''wild pattern''', the '''catch-all pattern''', or as a '''hole'''.
 
; [[Guard (computer science)|Guard]]
: A ''guard'' is an expression that must succeed (or yield boolean true) as a final step before considering a pattern to have successfully matched. In some languages (e.g. [[Erlang (programming language)|Erlang]]), guards are written using a restricted subset of the full language; in others (e.g. [[Haskell]]), guards may use the full language.
 
; Predicate
: Some pattern languages allow user-defined ''predicate'' functions to be embedded in a pattern. The predicate is applied to the portion of the discriminant corresponding to the position of the predicate in the pattern; if the predicate responds with boolean false, the pattern is considered to have failed. For example, in Racket, the pattern {{code|(list (? even?) ...)|rkt}} first expects a list, and then applies the predicate <code>even?</code> to each element; the overall pattern thus succeeds only when the discriminant is a list of even numbers.
 
; View pattern
: Languages like Haskell<ref>{{cite web|url=https://ghc.gitlab.haskell.org/ghc/doc/users_guide/exts/view_patterns.html|title=View Patterns - Glasgow Haskell Compiler User's Guide}}</ref> and Racket<ref>{{cite web|url=https://docs.racket-lang.org/reference/match.html#(idx._(gentag._255._(lib._scribblings/reference/reference..scrbl)))|title=Pattern Matching: app}}</ref> include ''view patterns'', where a user-defined function transforms the portion of the discriminant corresponding to the position of the view pattern before continuing the match. View patterns generalize predicate patterns, allowing further matching on the result of the function rather than simply expecting a boolean value.
 
; Constraint
: Some pattern languages allow direct comparison of portions of the discriminant with previously-computed (or constant) data structures. For example, the pattern {{code|1=(== expr)|2=rkt}} in Racket compares the value against the result of evaluating <code>expr</code>. In Erlang, mention of any variable already in scope in a pattern causes it to act as a constraint in this way (instead of as a binding).
 
; Literal pattern; atomic pattern
: Patterns that match simple atomic data such as <code>123</code> or <code>"hello"</code> are called ''literal patterns''.
 
; Compound pattern
: Patterns that destructure compound values such as lists, hash tables, tuples, structures or records, with sub-patterns for each of the values making up the compound data structure, are called ''compound patterns''.
 
; Alternative (<code>or</code>-pattern)
: Many languages allow multiple alternatives at the top-level of a pattern matching construct, each associated with a ''continuation''; some languages allow alternatives ''within'' a pattern. In most cases, such alternatives have additional constraints placed on them: for example, every alternative may be required to produce the same set of ''bindings'' (at the same types).
 
; Macros
: Some languages allow macros in pattern context to allow abstraction over patterns. For example, in Racket, ''match expanders'' perform this role.<ref>{{cite web|url=https://docs.racket-lang.org/reference/match.html#%28form._%28%28lib._racket%2Fmatch..rkt%29._define-match-expander%29%29|title=Pattern Matching: define-match-expander}}</ref>
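
Several of these kinds of pattern can be sketched together in Haskell. The function names <code>classify</code> and <code>isYes</code> are illustrative assumptions, and the view pattern relies on the <code>ViewPatterns</code> extension of the Glasgow Haskell Compiler rather than on standard Haskell.

<syntaxhighlight lang="haskell">
{-# LANGUAGE ViewPatterns #-}
import Data.Char (toLower)

-- Literal, compound and wildcard patterns, plus bindings and guards.
classify :: [Int] -> String
classify []  = "empty"                        -- compound pattern: the empty list
classify [0] = "just zero"                    -- literal pattern nested in a list pattern
classify (x : _)                              -- binding x, wildcard for the tail
  | even x    = "starts with an even number"  -- guard on the binding
  | otherwise = "starts with an odd number"

-- View pattern: apply a function first, then match on its result.
isYes :: String -> Bool
isYes (map toLower -> "yes") = True
isYes _                      = False
</syntaxhighlight>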
 
==Types==
===Primitive patterns===
The simplest pattern in pattern matching is an explicit value or a variable. For an example, consider a simple function definition in Haskell syntax (function parameters are not in parentheses but are separated by spaces, = is not assignment but definition):
 
</syntaxhighlight>
 
Here, the first <code>n</code> is a single variable pattern, which will match absolutely any argument and bind it to name n to be used in the rest of the definition. In Haskell (unlike at least [[Hope (programming language)|Hope]]), patterns are tried in order so the first definition still applies in the very specific case of the input being 0, while for any other argument the function returns <code>n * f (n-1)</code> with n being the argument.
 
The wildcard pattern (often written as <code>_</code>) is also simple: like a variable name, it matches any value, but does not bind the value to any name. Algorithms for [[matching wildcards]] in simple string-matching situations have been developed in a number of [[recursion|recursive]] and non-recursive varieties.<ref>{{cite web| last=Cantatore| first=Alessandro| title=Wildcard matching algorithms| year=2003| url=http://xoomer.virgilio.it/acantato/dev/wildcard/wildmatch.html}}</ref>
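
The three kinds of primitive pattern can be sketched as follows. The sketch assumes that the function described above is the factorial function with base case 1; the helper name <code>isZero</code> is illustrative.

<syntaxhighlight lang="haskell">
-- Literal and variable patterns, tried in order from top to bottom.
f :: Integer -> Integer
f 0 = 1                -- literal pattern: matches only the argument 0 (assumed base case)
f n = n * f (n - 1)    -- variable pattern: matches any argument and binds it to n

-- Wildcard pattern: matches any value without binding it to a name.
isZero :: Integer -> Bool
isZero 0 = True
isZero _ = False
</syntaxhighlight>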
 
===Tree patterns===
More complex patterns can be built from the primitive ones of the previous section, usually in the same way as values are built by combining other values. The difference then is that with variable and wildcard parts, a pattern does not build into a single value, but matches a group of values that are the combination of the concrete elements and the elements that are allowed to vary within the structure of the pattern.
 
A tree pattern describes a part of a tree by starting with a node and specifying some branches and nodes and leaving some unspecified with a variable or wildcard pattern. It may help to think of the [[abstract syntax tree]] of a programming language and [[algebraic data type]]s.
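
A minimal sketch of such a pattern, assuming a hypothetical binary tree type <code>Tree</code> and function <code>leftChildValue</code> introduced only for illustration:

<syntaxhighlight lang="haskell">
-- Hypothetical binary tree type, used only for illustration.
data Tree = Leaf | Node Tree Int Tree

-- The pattern requires the root and its left child to be Nodes, binds the
-- left child's value to x, and leaves every other part unspecified with _.
leftChildValue :: Tree -> Maybe Int
leftChildValue (Node (Node _ x _) _ _) = Just x
leftChildValue _                       = Nothing
</syntaxhighlight>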
 
====Haskell====
In Haskell, the following line defines an algebraic data type <code>Color</code> that has a single data constructor <code>ColorConstructor</code> that wraps an integer and a string.
 
The creation of these functions can be automated by Haskell's data [[Record (computer science)|record]] syntax.
 
====OCaml====
This [[OCaml]] example, which defines a [[Red–black tree|red–black tree]] and a function to re-balance it after element insertion, shows how to match on a more complex structure generated by a recursive data type. The compiler verifies at compile time that the list of cases is exhaustive and that none are redundant.
 
<syntaxhighlight lang="ocaml">
Line 79 ⟶ 138:
</syntaxhighlight>
 
==Usage==
===Filtering data with patterns===
Pattern matching can be used to filter data of a certain structure. For instance, in Haskell a [[list comprehension]] could be used for this kind of filtering:
 
[A 1, A 2]
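
A minimal sketch of such a filter, assuming a hypothetical type with two constructors <code>A</code> and <code>B</code>. The type name <code>Item</code> and the function name <code>onlyAs</code> are assumptions made for illustration.

<syntaxhighlight lang="haskell">
-- Hypothetical data type with two constructors.
data Item = A Int | B Int deriving (Show, Eq)

-- Elements that do not match the pattern (A v) are silently skipped.
onlyAs :: [Item] -> [Item]
onlyAs items = [A v | A v <- items]
</syntaxhighlight>

With these definitions, <code>onlyAs [A 1, B 1, A 2, B 2]</code> evaluates to <code>[A 1,A 2]</code>.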
 
===Pattern matching in Mathematica===
In [[Mathematica]], the only structure that exists is the [[Tree (data structure)|tree]], which is populated by symbols. In the [[Haskell (programming language)|Haskell]] syntax used thus far, this could be defined as
<syntaxhighlight lang="haskell">
data SymbolTree = Symbol String [SymbolTree]
</syntaxhighlight>
An example tree could then look like
<syntaxhighlight lang="haskell">
Symbol "a" [Symbol "b" [], Symbol "c" [] ]
</syntaxhighlight>
In the traditional, more suitable syntax, the symbols are written as they are and the levels of the tree are represented using <code>[]</code>, so that for instance <code>a[b,c]</code> is a tree with a as the parent, and b and c as the children.
 
A pattern in Mathematica involves putting "_" at positions in that tree. For instance, the pattern
</syntaxhighlight>
 
====Declarative programming====
In symbolic programming languages, it is easy to have patterns as arguments to functions or as elements of data structures. A consequence of this is the ability to use patterns to declaratively make statements about pieces of data and to flexibly instruct functions how to operate.
 
</syntaxhighlight>
 
Mailboxes in [[Erlang (programming language)|Erlang]] also work this way.
 
The [[Curry–Howard correspondence]] between proofs and programs relates [[ML (programming language)|ML]]-style pattern matching to [[Proof by cases|case analysis]] and [[proof by exhaustion]].
 
===Pattern matching and strings===
By far the most common form of pattern matching involves strings of characters. In many programming languages, a particular syntax of strings is used to represent regular expressions, which are patterns describing string characters.
 
However, it is possible to perform some string pattern matching within the same framework that has been discussed throughout this article.
 
====Tree patterns for strings====
In Mathematica, strings are represented as trees of root StringExpression and all the characters in order as children of the root. Thus, to match "any amount of trailing characters", a new wildcard ___ is needed in contrast to _ that would match only a single character.
 
In Haskell and [[functional programming]] languages in general, strings are represented as functional [[List (computing)|lists]] of characters. A functional list is defined as an empty list, or an element constructed on an existing list. In Haskell syntax:
 
<syntaxhighlight lang="haskell">
Line 185 ⟶ 245:
head[element, ]:=element
 
====Example string patterns====
In Mathematica, for instance,
<syntaxhighlight lang="mathematica">
 
StringExpression["a",_]
</syntaxhighlight>
 
will match a string that has two characters and begins with "a".
 
will match a string that consists of a letter first, and then a number.
 
In Haskell, [[Guard (computer science)|guards]] could be used to achieve the same matches:
 
<syntaxhighlight lang="haskell">
Line 212 ⟶ 272:
The main advantage of symbolic string manipulation is that it can be completely integrated with the rest of the programming language, rather than being a separate, special purpose subunit. The entire power of the language can be leveraged to build up the patterns themselves or analyze and transform the programs that contain them.
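
The string matches described above can be sketched in Haskell by treating a string as a list of characters. The function names are illustrative, and the second function assumes that "a letter first, and then a number" means exactly one letter followed by one digit.

<syntaxhighlight lang="haskell">
import Data.Char (isAlpha, isDigit)

-- Matches a two-character string that begins with 'a'.
startsWithA :: String -> Bool
startsWithA ['a', _] = True
startsWithA _        = False

-- Matches a two-character string made of a letter followed by a digit.
-- If the guard fails, matching falls through to the next equation.
letterThenDigit :: String -> Bool
letterThenDigit [c, d]
  | isAlpha c && isDigit d = True
letterThenDigit _ = False
</syntaxhighlight>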
 
====SNOBOL====
{{Main|SNOBOL}}
SNOBOL (''StriNg Oriented and symBOlic Language'') is a computer programming language developed between 1962 and 1967 at [[AT&T Corporation|AT&T]] [[Bell Laboratories]] by [[David J. Farber]], [[Ralph E. Griswold]] and [[Ivan P. Polonsky]].
 
SNOBOL4 stands apart from most programming languages by having patterns as a [[first-class object|first-class data type]] (''i.e.'' a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern [[concatenation]] and [[alternation (formal language theory)|alternation]]. Strings generated during execution can be treated as programs and executed.
SNOBOL was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the [[humanities]].
 
Since SNOBOL's creation, newer languages such as [[AWK]] and [[Perl]] have made string manipulation by means of [[regular expression]]s fashionable. SNOBOL4 patterns, however, subsume [[Backus–Naur form]] (BNF) grammars, which are equivalent to [[context-free grammar]]s and more powerful than [[regular expression]]s.<ref>Gimpel, J. F. 1973. A theory of discrete patterns and their implementation in SNOBOL4. Commun. ACM 16, 2 (Feb. 1973), 91–100. DOI=http://doi.acm.org/10.1145/361952.361960.</ref>
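
The idea of patterns as first-class values that are combined by concatenation and alternation can be loosely sketched in Haskell. This is an analogy rather than SNOBOL4 syntax or semantics: the type <code>Pattern</code>, the operators <code><+></code> and <code><|></code>, and the other names are assumptions made for illustration. The recursive pattern <code>balanced</code> recognizes nested parentheses, a context-free language that no regular expression describes.

<syntaxhighlight lang="haskell">
-- Hypothetical sketch: a pattern maps an input string to every possible
-- remainder left over after matching some prefix of that input.
type Pattern = String -> [String]

-- Match a literal string as a prefix of the input.
lit :: String -> Pattern
lit s input = case splitAt (length s) input of
  (prefix, rest) | prefix == s -> [rest]
  _                            -> []

-- Concatenation: match p, then match q on each remainder.
(<+>) :: Pattern -> Pattern -> Pattern
(p <+> q) input = concatMap q (p input)

-- Alternation: try both p and q on the same input.
(<|>) :: Pattern -> Pattern -> Pattern
(p <|> q) input = p input ++ q input

-- A pattern matches a whole string if some alternative consumes all of it.
matches :: Pattern -> String -> Bool
matches p = any null . p

-- Recursive pattern for balanced parentheses.
balanced :: Pattern
balanced = lit "" <|> (lit "(" <+> balanced <+> lit ")" <+> balanced)
</syntaxhighlight>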
 
==See also==
{{Portal|Computer programming}}
* [[Artificial Intelligence Markup Language]] (AIML) for an AI language based on matching patterns in speech
* [[AWK]] language
* [[Coccinelle (software)|Coccinelle]] pattern matches C source code
* [[Matching wildcards]]
 
==External links==
{{Wikibooks|Haskell|Pattern matching}}
{{Commons category}}
*[https://archive.today/19990225161739/http://www.haskell.org/development/views.html Views: An Extension to Haskell Pattern Matching]
* Nikolaas N. Oosterhof, Philip K. F. Hölzenspies, and Jan Kuper. [https://web.archive.org/web/20060304053330/http://wwwhome.cs.utwente.nl/~tina/apm/applPatts.pdf Application patterns]. A presentation at Trends in Functional Programming, 2005
*[https://www.cs.cornell.edu/Projects/jmatch JMatch]: the [[Java (programming language)|Java]] language extended with pattern matching
*[https://archive.today/20130630081135/http://www.showtrend.com/ ShowTrend]: Online pattern matching for stock prices
*[https://web.archive.org/web/20060211020429/http://cm.bell-labs.com/cm/cs/who/dmr/qed.html An incomplete history of the QED Text Editor] by [[Dennis Ritchie]] - provides the history of regular expressions in computer programs
*[http://research.microsoft.com/~simonpj/papers/slpj-book-1987/index.htm The Implementation of Functional Programming Languages, pages 53–103] Simon Peyton Jones, published by Prentice Hall, 1987.
*[https://github.com/rsdn/nemerle/wiki/Grok-Variants-and-matching#matching Nemerle, pattern matching].
*[http://www.datamystic.com/easypatterns_reference.html EasyPattern language] pattern matching language for non-programmers
 
{{Strings |state=collapsed}}
{{Authority control}}
 
{{DEFAULTSORT:Pattern Matching}}
[[Category:Articles with example Haskell code]]
[[Category:Functional programming]]
[[Category:Programming language comparisons]]
<!-- Hidden categories below -->
[[Category:Articles with example code]]
[[Category:Articles with example OCaml code]]