Regular expression examples

{{Mergeto|Regular expression|date=December 2008}}
{{Cleanup-rewrite|date=May 2009}}
{{examplefarm}}
 
A [[regular expression]] (also "RegEx" or "regex") is a string that describes or matches a set of strings according to certain [[syntax]] rules. The specific syntax rules vary with the [[implementation]], [[programming language]], or [[Library (computing)|library]] in use, and the functionality of a given regex implementation can also vary between [[Software versioning|version]]s.

Despite this variability, and because regular expressions can be difficult to both explain and understand without examples, this article provides a basic description of some of the properties of regular expressions by way of illustration.
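
As a first illustration of a pattern that describes a set of strings, the following minimal sketch filters a list with a single regular expression. The sample words and the variable names <code>@candidates</code> and <code>@matched</code> are assumptions made for this example only; the metacharacters it uses are described in the table further down.

```perl
use strict;
use warnings;

# Hypothetical sample data, chosen only for illustration.
my @candidates = ("gray", "grey", "griy", "greay");

# The pattern gr[ae]y describes exactly two strings: "gray" and "grey".
# \A and \z tie the match to the whole string rather than a substring.
my @matched = grep { m/\Agr[ae]y\z/ } @candidates;

print "Matched: @matched\n";   # prints "Matched: gray grey"
```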
 
== Conventions ==
The following conventions are used in the examples.<ref name="clarify000">The character 'm' is not always required to specify a Perl match operation. For example, m/[^abc]/ could also be rendered as /[^abc]/. The 'm' is only necessary if the user wishes to specify a match operation without using a forward slash as the regex [[delimiter]]. Sometimes it is useful to specify an alternate regex delimiter in order to avoid "[[Delimiter#Delimiter collision|delimiter collision]]". See '[http://perldoc.perl.org/perlre.html perldoc perlre]' for more details.</ref>
 
metacharacter(s) ;; the metacharacters column specifies the regex syntax being demonstrated
=~ m// ;; indicates a regex '''match''' operation in Perl
=~ s/// ;; indicates a regex '''substitution''' operation in Perl
 
All of the regular expressions shown here use Perl-like syntax. Standard POSIX regular expressions differ; for example, POSIX does not define the shorthand classes \d, \w and \s.
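
The match and substitution conventions above, together with the alternate delimiter mentioned in the footnote, can be sketched as follows. The sample path and all variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";   # hypothetical sample string

# Match with the default delimiter: the 'm' is optional, but every
# forward slash inside the pattern has to be escaped.
my $with_slashes = ($path =~ /\/local\//) ? 1 : 0;

# Match with an alternate delimiter: the 'm' is required, and the
# forward slashes no longer collide with the delimiter.
my $with_braces = ($path =~ m{/local/}) ? 1 : 0;

# Substitution on a copy: replace the first "local" with "share".
(my $changed = $path) =~ s/local/share/;

print "slashes: $with_slashes, braces: $with_braces\n";   # slashes: 1, braces: 1
print "$changed\n";                                       # /usr/share/bin
```

Copying the string before applying <code>s///</code> leaves the original value untouched, which keeps the two match tests and the substitution independent of each other.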
 
== Examples ==
 
Unless otherwise indicated, the following examples conform to the [[Perl]] programming language, release 5.8.8 (January 31, 2006). The syntax and conventions used in these examples coincide with those of other programming environments as well (e.g., see ''Java in a Nutshell'', p. 213; ''Python Scripting for Computational Science'', p. 320; ''Programming PHP'', p. 106).
 
<table class="wikitable">
<tr>
<th>Metacharacter(s)</th>
<th>Description</th>
<th>Example
<br>Note that the condition of every if statement below evaluates to true</th>
</tr>
 
<tr>
<td>'''.'''</td>
<td>Normally matches any character except a newline. Within square brackets the dot is literal.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/...../) {
print "$string1 has length >= 5\n";
}
</pre></td>
</tr>
<tr>
<td>( )</td>
<td>Groups a series of pattern elements into a single element. After a successful match, $1, $2, ... refer to the text matched by the first, second, ... parenthesized group.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/(H..).(o..)/) {
print "We matched '$1' and '$2'\n";
}
 
</pre>'''Output:'''<pre>
We matched 'Hel' and 'o W'
</pre></td>
</tr>
 
<tr>
<td>+</td>
<td>Matches the preceding pattern element one or more times.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/l+/) {
print "There are one or more consecutive letter \"l\"'s in $string1\n";
}
</pre>'''Output:'''<pre>
There are one or more consecutive letter "l"'s in Hello World
 
</pre></td>
</tr>
 
<tr>
<td>?</td>
<td>Matches the preceding pattern element zero or one times.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/H.?e/) {
print "There is an 'H' and an 'e' separated by ";
print "0-1 characters (Ex: He Hoe)\n";
}
</pre></td>
</tr>
<tr>
<td>?</td>
<td>Modifies the preceding *, +, ? or {M,N} quantifier
so that it matches as few times as possible (non-greedy matching).</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/(l.+?o)/) {
print "The non-greedy match of 'l', one or more characters, ";
print "and 'o' is 'llo' rather than 'llo Wo'.\n";
}
</pre></td>
</tr>
<tr>
<td>*</td>
<td>Matches the preceding pattern element zero or more times.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/el*o/) {
print "There is an 'e' followed by zero to many ";
print "'l' followed by 'o' (eo, elo, ello, elllo)\n";
}
</pre></td>
</tr>
<tr>
<td>{M,N}</td>
<td>Denotes the minimum M and the maximum N match count.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/l{1,2}/) {
print "There exists a substring with at least 1 ";
print "and at most 2 l's in $string1\n";
}
</pre>
</td>
</tr>
<tr>
<td>[...]</td>
<td>Denotes a set of possible character matches.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/[aeiou]+/) {
print "$string1 contains one or more vowels.\n";
}
</pre></td>
</tr>
<tr>
<td>|</td>
<td>Separates alternate possibilities.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/(Hello|Hi|Pogo)/) {
print "At least one of Hello, Hi, or Pogo is ";
print "contained in $string1.\n";
}
</pre></td>
</tr>
<tr>
<td>\b</td>
<td>Matches a word boundary.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/llo\b/) {
print "There is a word that ends with 'llo'\n";
} else {
print "There are no words that end with 'llo'\n";
}
</pre></td>
</tr>
<tr>
<td>\w</td>
<td>Matches an alphanumeric character, including "_"; same as [A-Za-z0-9_]</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/\w/) {
print "There is at least one alphanumeric ";
print "character in $string1 (A-Z, a-z, 0-9, _)\n";
}
</pre></td>
</tr>
<tr>
<td>\W</td>
<td>Matches a <b>non</b>-alphanumeric character, excluding "_"; same as [^A-Za-z0-9_]</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/\W/) {
print "The space between Hello and ";
print "World is not alphanumeric\n";
}
</pre></td>
</tr>
 
<tr>
<td>\s</td>
<td>Matches a whitespace character (space, tab, newline, carriage return, form feed).</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/\s.*\s/) {
print "There are TWO whitespace characters, which may";
print " be separated by other characters, in $string1";
}
</pre></td>
</tr>
<tr>
<td>\S</td>
<td>Matches any character that is '''not''' whitespace.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/\S.*\S/) {
print "There are TWO non-whitespace characters, which";
print " may be separated by other characters, in $string1";
}
</pre></td>
</tr>
<tr>
<td>\d</td>
<td>Matches a digit; same as [0-9].</td>
<td align="left">
<pre>
$string1 = "99 bottles of beer on the wall.";
if ($string1 =~ m/(\d+)/) {
print "$1 is the first number in '$string1'\n";
}
 
</pre>'''Output:'''<pre>
99 is the first number in '99 bottles of beer on the wall.'
</pre></td>
</tr>
<tr>
<td>\D</td>
<td>Matches a non-digit; same as [^0-9].</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/\D/) {
print "There is at least one character in $string1";
print " that is not a digit.\n";
}
</pre></td>
</tr>
<tr>
<td>^</td>
<td>Matches the beginning of the string, or the beginning of each line when the /m modifier is used.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/^He/) {
print "$string1 starts with the characters 'He'\n";
}
</pre></td>
</tr>
<tr>
<td>$</td>
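<td>Matches the end of the string (or just before a newline at the end of the string), or the end of each line when the /m modifier is used.</td>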
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/rld$/) {
print "$string1 is a line or string ";
print "that ends with 'rld'\n";
}
</pre></td>
</tr>
 
<tr>
<td>\A</td>
<td>Matches the beginning of a string (but not an internal line).</td>
<td align="left">
<pre>
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/\AH/) {
print "$string1 is a string ";
print "that starts with 'H'\n";
}
</pre></td>
</tr>
 
<tr>
<td>\z</td>
<td>Matches the end of a string (but not an internal line).<br/>See ''Perl Best Practices'', p. 240.</td>
<td align="left">
<pre>
$string1 = "Hello\nWorld\n";
if ($string1 =~ m/d\n\z/) {
print "$string1 is a string ";
print "that ends with 'd\\n'\n";
}
</pre></td>
</tr>
 
<tr>
<td>[^...]</td>
<td>Matches any single character except those listed inside the brackets.</td>
<td align="left">
<pre>
$string1 = "Hello World\n";
if ($string1 =~ m/[^abc]/) {
print "$string1 contains a character other than ";
print "a, b, and c\n";
}
</pre></td>
</tr>
 
</table>
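
Two distinctions in the table are easier to see side by side: line anchors (^ and $) versus string anchors (\A and \z), and greedy versus non-greedy quantifiers. The following is a minimal sketch under the same Perl assumptions as the table; the variable names are chosen for illustration only.

```perl
use strict;
use warnings;

my $text = "Hello\nWorld\n";

# Without /m, ^ matches only at the start of the string.
my $caret_plain = ($text =~ /^World/)   ? 1 : 0;   # 0
# With /m, ^ also matches just after an embedded newline.
my $caret_multi = ($text =~ /^World/m)  ? 1 : 0;   # 1
# \A ignores /m and matches only at the very start of the string.
my $start_only  = ($text =~ /\AWorld/m) ? 1 : 0;   # 0

# $ may match just before a trailing newline; \z matches only at the very end.
my $dollar  = ($text =~ /World$/)  ? 1 : 0;        # 1
my $abs_end = ($text =~ /World\z/) ? 1 : 0;        # 0

# Greedy .+ extends to the last 'o' on the line; non-greedy .+? stops at the first.
my $line = "Hello World";
my ($greedy) = $line =~ /(l.+o)/;                  # "llo Wo"
my ($lazy)   = $line =~ /(l.+?o)/;                 # "llo"

print "anchors: $caret_plain $caret_multi $start_only $dollar $abs_end\n";
print "greedy='$greedy' lazy='$lazy'\n";
```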
 
== Notes ==
{{Reflist}}
 
== See also ==
* [[Comparison of programming languages]]
 
== External links ==
* [http://regex.powertoy.org/ Simple and straightforward RegEx online trainer/demo]

[[Category:Perl]]
[[Category:Pattern matching]]
[[Category:Articles with example code]]
[[Category:Programming language topics]]