Perl language structure

{{original research|date=July 2017}}
The '''structure of the [[Perl]] programming language''' encompasses both the syntactical rules of the language and the general ways in which programs are organized. Perl's design philosophy is expressed in the commonly cited motto "[[there's more than one way to do it]]". As a [[programming paradigm|multi-paradigm]], dynamically [[type system|typed]] language, Perl allows a great degree of flexibility in program design. Perl also encourages modularization; this has been attributed to the component-based design structure of its Unix roots{{when|date=November 2018}},<ref>{{cite book | last1 = Orwant | first1 = Jon | title = Games, diversions, and Perl culture: best of the Perl journal | year = 2003 | isbn = 978-0-596-00312-8}}</ref> and is responsible for the size of the [[CPAN]] archive, a community-maintained repository of more than 100,000 modules.<ref name="home">{{cite web |title=CPAN front page|url=http://www.cpan.org/|accessdate=2011-12-09}}</ref>
 
== Basic syntax ==
In Perl, the minimal [["Hello, World!" program|Hello World]] program may be written as follows:
<syntaxhighlight lang="perl">
print "Hello, World!\n"
</syntaxhighlight>
This [[Input/output|prints]] the [[String (computer science)|string]] ''Hello, World!'' and a [[newline]], symbolically expressed by an <code>n</code> character whose interpretation is altered by the preceding [[escape character]] (a backslash). Since version 5.10, the new 'say' builtin<ref>{{cite web|url=http://perldoc.perl.org/feature.html#The-'say'-feature|title=Features|work=Perldoc|publisher=Perl.org|accessdate=24 July 2017}}</ref> produces the same effect even more simply:
<syntaxhighlight lang="perl">
say "Hello, World!"
</syntaxhighlight>
 
An entire Perl program may also be specified as a command-line parameter to Perl, so the same program can also be executed from the command line (example shown for Unix):
<syntaxhighlight lang="perl">
$ perl -e 'print "Hello, World!\n"'
</syntaxhighlight>
 
The canonical form of the program is slightly more verbose:
 
<syntaxhighlight lang="perl">
#!/usr/bin/env perl
print "Hello, World!\n";
</syntaxhighlight>
 
The hash mark character introduces a [[comment (computer programming)|comment]] in Perl, which runs up to the end of the line of code and is normally ignored by the compiler. The comment used here is of a special kind: it is called the [[Shebang (Unix)|shebang]] line. This tells Unix-like operating systems to find the Perl interpreter, making it possible to invoke the program without explicitly mentioning <code>perl</code>. (On [[Microsoft Windows]] systems, Perl programs are typically invoked by associating the <code>.pl</code> [[Filename extension|extension]] with the Perl interpreter. To deal with such circumstances, <code>perl</code> detects the shebang line and parses it for switches.<ref>{{cite web | url = http://perldoc.perl.org/perlrun.html | title = perlrun | accessdate = 2011-01-08 | publisher = perldoc.perl.org - Official documentation for the Perl programming language}}</ref>)
Version 5.10 of Perl introduces a <code>say</code> function that implicitly appends a newline character to its output, making the minimal "Hello World" program even shorter:
 
<syntaxhighlight lang="perl">
use 5.010; # must be present to enable the new 5.10 features; note that it is 5.010, not 5.10
say 'Hello, World!';
</syntaxhighlight>
 
==Data types==
 
{| class="wikitable"
|-
! Type
! Sigil
! Example
! Description
|-
|[[Scalar (computing)|Scalar]]
|{{tt|$}}
|{{code|$foo}}
|A single value; it may be a number, a [[String (computer science)|string]], a filehandle, or a [[Reference (computer science)|reference]].
|-
|[[Array data type|Array]]
|{{tt|@}}
|{{code|@foo}}
|An ordered collection of scalars.
|-
|[[Associative array|Hash]]
|{{tt|%}}
|{{code|%foo}}
|A map from strings to scalars; the strings are called ''keys'', and the scalars are called ''values''. Also known as an ''associative array''.
|-
|[[File handle]]
|{{CNone|none}}
|{{code|$foo}} or {{code|FOO}}
|An opaque representation of an open file or other target for reading, writing, or both.
|-
|[[Subroutine]]
|{{tt|&}}
|{{code|&foo}}
|A piece of code that may be passed arguments, be executed, and return data.
|-
|[[Perl language structure#Typeglob values|Typeglob]]
|{{tt|*}}
|{{code|*foo}}
|The [[symbol table]] entry for all types with the name 'foo'.
|}
 
Finally, multiline strings can be defined using [[here document]]s:
 
<syntaxhighlight lang="perl">
$multilined_string = <<EOF;
This is my multilined string
note that I am terminating it with the word "EOF".
EOF
</syntaxhighlight>
 
Numbers (numeric constants) do not require quotation. Perl will convert numbers into strings and vice versa depending on the context in which they are used. When strings are converted into numbers, trailing non-numeric parts of the strings are discarded. If no leading part of a string is numeric, the string will be converted to the number 0. In the following example, the strings <code>$n</code> and <code>$m</code> are treated as numbers. This code prints the number '5'. The values of the variables remain the same. Note that in Perl, <code>+</code> is always the numeric addition operator. The string concatenation operator is the period.
 
<syntaxhighlight lang="perl">
$n = '3 apples';
$m = '2 oranges';
print $n + $m;
</syntaxhighlight>
Functions are provided for the [[rounding]] of fractional values to integer values: <code>int</code> chops off the fractional part, rounding towards zero; <code>POSIX::ceil</code> and <code>POSIX::floor</code> always round up and always round down, respectively. The number-to-string conversions performed by <code>printf "%f"</code> and <code>sprintf "%f"</code> round half to even, that is, they use [[Rounding#Round half to even|bankers' rounding]].
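
As a rough illustration of the rounding functions described above, the following sketch applies them to arbitrary sample values. The behaviour of <code>sprintf</code> on exact ties may depend on the platform's C library, so only a non-tie value is shown.

<syntaxhighlight lang="perl">
use strict;
use warnings;
use POSIX qw(floor ceil);

my $value = -3.7;                      # arbitrary sample value

my $truncated = int($value);           # -3: fractional part dropped, towards zero
my $floored   = floor($value);         # -4: always rounds down
my $ceiled    = ceil($value);          # -3: always rounds up

print "$truncated $floored $ceiled\n"; # prints "-3 -4 -3"

# Rounding to the nearest integer goes through string formatting
my $nearest = sprintf("%.0f", 2.6);    # "3"
print "$nearest\n";
</syntaxhighlight>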
 
Perl also has a boolean context that it uses in evaluating conditional statements. The following values all evaluate as false in Perl:
 
<syntaxhighlight lang="perl">
$false = 0; # the number zero
$false = 0.0; # the number zero as a float
$false = '0'; # the string zero
$false = ""; # the empty string
$false = (); # the empty list
$false = undef; # the return value from undef
$false = 2-3+1; # computes to 0 that is converted to "0" so it is false
</syntaxhighlight>
 
All other (non-zero evaluating) values evaluate to true. This includes the odd self-describing literal string of "0 but true", which in fact is 0 as a number, but true when used as a boolean. All non-numeric strings also have this property, but this particular string is truncated by Perl without a numeric warning. A less explicit but more conceptually portable version of this string is '{{mono|0E0}}' or '{{mono|0e0}}', which does not rely on characters being evaluated as 0, because '0E0' is literally zero times ten to the power zero. The empty hash <code>{}</code> is also true; in this context <code>{}</code> is not an empty block, because <code>perl -e 'print ref {}'</code> returns <code>HASH</code>.
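
The special strings mentioned above can be checked with a short sketch. The helper name <code>is_true</code> is purely illustrative and not part of Perl.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Illustrative helper: reports how Perl evaluates its argument in boolean context
sub is_true { return $_[0] ? 1 : 0 }

my @results = (
    is_true("0 but true"),   # 1: true as a boolean...
    "0 but true" + 5,        # 5: ...yet zero as a number, with no warning
    is_true("0E0"),          # 1
    "0E0" + 5,               # 5
    is_true({}),             # 1: a reference to an empty hash is true
);

print "@results\n";          # prints "1 5 1 5 1"
</syntaxhighlight>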
 
Evaluated boolean expressions are also scalar values. The documentation does not promise which ''particular'' value of true or false is returned. Many boolean operators return 1 for true and the empty-string for false. The {{code|defined()}} function determines whether a variable has any value set. In the above examples, {{code|defined($false)}} is true for every value except {{code|undef}} (and the empty list, which also leaves {{code|$false}} undefined).
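
A small sketch of what the comparison operators return in practice; as noted above, the particular values are not promised by the documentation, so this shows only the commonly observed behaviour. The variable names are illustrative.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my $true_result  = (5 > 3);    # a comparison that succeeds
my $false_result = (5 < 3);    # a comparison that fails
my $unset;                     # never assigned, so undefined

print "true is '$true_result'\n";     # true is '1'
print "false is '$false_result'\n";   # false is ''

# The false value is still defined, and behaves as 0 in numeric context
my $false_is_defined = defined($false_result) ? 1 : 0;   # 1
my $false_as_number  = $false_result + 0;                # 0
my $unset_is_defined = defined($unset) ? 1 : 0;          # 0
</syntaxhighlight>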
 
If either 1 or 0 is specifically needed, an explicit conversion can be done using the [[conditional operator]]:
 
<syntaxhighlight lang="perl">
my $real_result = $boolean_result ? 1 : 0;
</syntaxhighlight>
 
===Array values===
An [[Array data type|array value]] (or list) is specified by listing its elements, separated by commas, enclosed by parentheses (at least where required by operator precedence).
 
<syntaxhighlight lang="perl">
@scores = (32, 45, 16, 5);
</syntaxhighlight>
 
The qw() quote-like operator allows the definition of a list of strings without typing quotes and commas. Almost any delimiter can be used instead of parentheses. The following lines are equivalent:
 
<syntaxhighlight lang="perl">
@names = ('Billy', 'Joe', 'Jim-Bob');
@names = qw(Billy Joe Jim-Bob);
</syntaxhighlight>
 
The split function returns a list of strings, which are split from a string expression using a delimiter string or regular expression.
 
<syntaxhighlight lang="perl">
@scores = split(',', '32,45,16,5');
</syntaxhighlight>
 
Individual elements of a list are accessed by providing a numerical index in square brackets. The scalar [[sigil (computer programming)|sigil]] must be used. Sublists (array slices) can also be specified, using a range or list of numeric indices in brackets. The array sigil is used in this case. For example, <code>$month[3]</code> is <code>"April"</code> (the first element in an array has an index value of 0), and <code>@month[4..6]</code> is <code>("May", "June", "July")</code>.
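
The element and slice access described above might look as follows; the contents of <code>@month</code> are assumed here, since the text only names a few of its elements.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Assumed contents for the @month array used in the text
my @month = qw(January February March April May June July August);

my $fourth = $month[3];       # scalar sigil for a single element
my @summer = @month[4..6];    # array sigil for a slice over a range
my @picked = @month[0, 7];    # a slice may also use a list of indices

print "$fourth\n";            # April
print "@summer\n";            # May June July
print "@picked\n";            # January August
</syntaxhighlight>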
 
===Hash values===
</ref>). The following lines are equivalent:
 
<syntaxhighlight lang="perl">
%favorite = ('joe', "red", 'sam', "blue");
%favorite = (joe => 'red', sam => 'blue');
</syntaxhighlight>
 
Individual values in a hash are accessed by providing the corresponding key, in curly braces. The <code>$</code> sigil identifies the accessed element as a scalar. For example, {{code|$favorite{joe} }} equals {{code|'red'}}. A hash can also be initialized by setting its values individually:
 
<syntaxhighlight lang="perl">
$favorite{joe} = 'red';
$favorite{sam} = 'blue';
$favorite{oscar} = 'green';
</syntaxhighlight>
 
Multiple elements may be accessed using the <code>@</code> sigil instead (identifying the result as a list). For example,
{{code|@favorite{'joe', 'sam'} }} equals {{code|('red', 'blue')}}.
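
A brief sketch of single-element access and hash slices, reusing the <code>%favorite</code> hash from above. The replacement colours in the last step are made up for illustration.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my %favorite = (joe => 'red', sam => 'blue', oscar => 'green');

my $one  = $favorite{joe};             # $ sigil: a single scalar value
my @some = @favorite{'joe', 'sam'};    # @ sigil: a list of values (a hash slice)

print "$one\n";                        # red
print "@some\n";                       # red blue

# A slice can also be assigned to, setting several keys at once
@favorite{'joe', 'sam'} = ('cyan', 'gold');
print "$favorite{joe} $favorite{sam} $favorite{oscar}\n";   # cyan gold green
</syntaxhighlight>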
 
===Filehandles===
A typeglob value is a symbol table entry. The main use of typeglobs is creating symbol table aliases. For example:
 
<syntaxhighlight lang="perl">
*PI = \3.141592653; # creating constant scalar $PI
*this = *that; # creating aliases for all data types 'this' to all data types 'that'
</syntaxhighlight>
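
The effect of these aliases can be observed with a short sketch. The <code>our</code> declarations are an addition needed only so that the example runs under <code>use strict</code>.

<syntaxhighlight lang="perl">
use strict;
use warnings;

our ($PI, @this, @that);    # package variables, declared for use under strict

*PI = \3.141592653;         # $PI now refers to a read-only constant
eval { $PI = 3 };           # an attempt to modify it dies at run time
my $error = $@;
print "error: $error" if $error;

@that = (1, 2, 3);
*this = *that;              # @this and @that now name the same array
push @this, 4;
print "@that\n";            # 1 2 3 4
</syntaxhighlight>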
 
===Array functions===
 
The number of elements in an array can be determined either by evaluating the array in scalar context or with the help of the <code>$#</code> sigil. The latter gives the index of the last element in the array, not the number of elements. The expressions scalar({{code|@array}}) and (<code>$#array&nbsp;+&nbsp;1</code>) are equivalent.
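
For instance, with the <code>@scores</code> array used earlier (a minimal sketch; the variable names are illustrative):

<syntaxhighlight lang="perl">
use strict;
use warnings;

my @scores = (32, 45, 16, 5);

my $count      = scalar(@scores);   # 4: number of elements
my $last_index = $#scores;          # 3: index of the last element
my $also_count = @scores;           # 4: scalar assignment imposes scalar context

my @empty = ();
my $empty_last = $#empty;           # -1 for an empty array, so $#empty + 1 is 0

print "$count $last_index $also_count $empty_last\n";   # 4 3 4 -1
</syntaxhighlight>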
 
===Hash functions===
There are a few functions that operate on entire hashes. The ''keys'' function takes a hash and returns the list of its keys. Similarly, the ''values'' function returns a hash's values. Note that the keys and values are returned in a consistent but arbitrary order.
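
Because <code>keys</code> and <code>values</code> return their results in the same order as long as the hash is not modified in between, the two lists line up element by element. A small sketch with made-up address book entries:

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Made-up sample data
my %addressbook = (
    alice => '1 Main St',
    bob   => '2 Oak Ave',
    carol => '3 Elm Rd',
);

my @names     = keys %addressbook;     # arbitrary order
my @addresses = values %addressbook;   # same order as the keys above

for my $i (0 .. $#names) {
    print "$names[$i] lives at $addresses[$i]\n";
}
</syntaxhighlight>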
 
<syntaxhighlight lang="perl">
# Every call to each returns the next key/value pair.
# All values will be eventually returned, but their order
# cannot be predicted.
while (($name, $address) = each %addressbook) {
    print "$name lives at $address\n";
}

# Similar to the above, but sorted alphabetically
foreach my $next_name (sort keys %addressbook) {
    print "$next_name lives at $addressbook{$next_name}\n";
}
</syntaxhighlight>
 
==Control structures==
Up until the 5.10.0 release, there was no [[switch statement]] in Perl 5. From 5.10.0 onward, a multi-way branch statement called <code>given</code>/<code>when</code> is available, which takes the following form:
 
use v5.10; <u># must be present to import the new 5.10 functions</u>
given ( ''expr'' ) { when ( ''cond'' ) { … } default { … } }
 
Syntactically, this structure behaves similarly to [[switch statement]]s found in other languages, but with a few important differences. The largest is that unlike switch/case structures, given/when statements break execution after the first successful branch, rather than waiting for explicitly defined break commands. Conversely, explicit <code>continue</code>s are instead necessary to emulate switch behavior.
 
For those not using Perl 5.10, the Perl documentation describes a half-dozen ways to achieve the same effect by using other control structures. There is also a Switch module, which provides functionality modeled on that of sister language [[Raku (programming language)|Raku]]. It is implemented using a [[source filter]], so its use is unofficially discouraged.<ref>[http://www.perlmonks.org/?node_id=496084 using switch<!-- Bot generated title -->]</ref>
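
One such alternative, sketched here with a hypothetical <code>classify</code> function and made-up categories, uses a <code>for</code> loop to alias <code>$_</code> to the value being tested, followed by ordinary <code>if</code>/<code>elsif</code> tests. It is only one of several possible idioms.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical example: the function name and categories are illustrative only
sub classify {
    my ($input) = @_;
    for ($input) {                           # aliases $_ to $input
        if    (/^\d+$/)   { return 'integer' }
        elsif (/^\s*$/)   { return 'blank' }
        elsif ($_ eq 'q') { return 'quit' }
        else              { return 'other' } # plays the role of "default"
    }
}

print classify('42'),  "\n";    # integer
print classify('q'),   "\n";    # quit
print classify('abc'), "\n";    # other
</syntaxhighlight>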
 
Perl includes a <code>goto label</code> statement, but it is rarely used. Situations where a <code>goto</code> is called for in other languages don't occur as often in Perl, because of its breadth of flow control options.
[[Subroutine]]s are defined with the <code>sub</code> keyword and are invoked simply by naming them. If the subroutine in question has not yet been declared, invocation requires either parentheses after the function name or an ampersand ('''&''') before it. But using '''&''' without parentheses will also implicitly pass the arguments of the current subroutine to the one called, and using '''&''' with parentheses will bypass prototypes.
 
<syntaxhighlight lang="perl">
# Calling a subroutine

# ...

foo; # Here parentheses are not required
</syntaxhighlight>
 
A list of arguments may be provided after the subroutine name. Arguments may be scalars, lists, or hashes.
 
<syntaxhighlight lang="perl">
foo $x, @y, %z;
</syntaxhighlight>
The parameters to a subroutine do not need to be declared as to either number or type; in fact, they may vary from call to call. Any validation of parameters must be performed explicitly inside the subroutine.
 
Elements of <code>@_</code> may be accessed by subscripting it in the usual way.
 
<syntaxhighlight lang="perl">
$_[0], $_[1]
</syntaxhighlight>
 
However, the resulting code can be difficult to read, and the parameters have [[Evaluation strategy#Call by reference|pass-by-reference]] semantics, which may be undesirable.
One common idiom is to assign <code>@_</code> to a list of named variables.
 
<syntaxhighlight lang="perl">
my ($x, $y, $z) = @_;
</syntaxhighlight>
 
This provides mnemonic parameter names and implements [[Evaluation strategy#Call by value|pass-by-value]] semantics. The <code>my</code> keyword indicates that the following variables are lexically scoped to the containing block.
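
The difference between the two styles can be seen in a short sketch. Both subroutine names are hypothetical.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical subroutines, for illustration only
sub double_in_place {
    $_[0] *= 2;             # $_[0] is an alias for the caller's variable
    return;
}

sub double_copy {
    my ($n) = @_;           # $n is a private copy of the argument
    $n *= 2;
    return $n;
}

my $first = 5;
double_in_place($first);    # modifies $first itself
print "$first\n";           # 10

my $second  = 5;
my $doubled = double_copy($second);
print "$second $doubled\n"; # 5 10
</syntaxhighlight>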
Another idiom is to shift parameters off of <code>@_</code>. This is especially common when the subroutine takes only one argument or for handling the <code>$self</code> argument in object-oriented modules.
 
<syntaxhighlight lang="perl">
my $x = shift;
</syntaxhighlight>
 
Subroutines may assign <code>@_</code> to a hash to simulate named arguments; this is recommended in ''[[Perl Best Practices]]'' for subroutines that are likely to ever have more than three parameters.<ref>
Damian Conway, ''[http://www.oreilly.com/catalog/perlbp/chapter/ch09.pdf Perl Best Practices] {{webarchive|url=https://web.archive.org/web/20110918134430/http://oreilly.com/catalog/perlbp/chapter/ch09.pdf |date=2011-09-18 }}'', p.182</ref>
 
<syntaxhighlight lang="perl">
sub function1 {
    my %args = @_;
    print "'x' argument was '$args{x}'\n";
}
function1( x => 23 );
</syntaxhighlight>
 
Subroutines may return values.
 
<syntaxhighlight lang="perl">
return 42, $x, @y, %z;
</syntaxhighlight>
 
If the subroutine does not exit via a <code>return</code> statement, it returns the last expression evaluated within the subroutine body. Arrays and hashes in the return value are expanded to lists of scalars, just as they are for arguments.
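
Both behaviours are sketched below with two hypothetical subroutines: one relies on the implicit return of its last expression, and the other returns an array and a hash that are flattened into a single list.

<syntaxhighlight lang="perl">
use strict;
use warnings;

# Hypothetical subroutines, for illustration only
sub last_expression {
    my ($n) = @_;
    $n * 2;                      # no return statement: this value is returned
}

sub flattened {
    my @list = (1, 2);
    my %map  = (key => 'value');
    return 42, @list, %map;      # becomes one flat list of five scalars
}

my $result = last_expression(21);
my @all    = flattened();

print "$result\n";               # 42
print "@all\n";                  # 42 1 2 key value
</syntaxhighlight>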
The returned expression is evaluated in the calling context of the subroutine; this can surprise the unwary.
 
<syntaxhighlight lang="perl">
sub list { (4, 5, 6) }
sub array { @x = (4, 5, 6); @x }

@x = list; # returns (4, 5, 6)
@x = array; # returns (4, 5, 6)
</syntaxhighlight>
 
A subroutine can discover its calling context with the <code>wantarray</code> function.
 
<syntaxhighlight lang="perl">
sub either {
    return wantarray ? (1, 2) : 'Oranges';
}

$x = either; # returns "Oranges"
@x = either; # returns (1, 2)
</syntaxhighlight>

===Anonymous functions===
{{Excerpt|Examples of anonymous functions|Perl 5|subsections=yes}}
 
==Regular expressions==
The <code>m//</code> (match) operator introduces a regular-expression match. (If it is delimited by slashes, as in all of the examples here, the leading <code>m</code> may be omitted for brevity. If the <code>m</code> is present, other delimiters can be used in place of slashes.) In the simplest case, an expression such as
 
<syntaxhighlight lang="perl">
$x =~ /abc/;
</syntaxhighlight>
 
evaluates to true [[if and only if]] the string <code>$x</code> matches the regular expression <code>abc</code>.
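
Alternative delimiters are mainly useful when the pattern itself contains slashes. A minimal sketch, using an arbitrary sample path:

<syntaxhighlight lang="perl">
use strict;
use warnings;

my $path = '/usr/local/bin/perl';    # arbitrary sample string

# With slash delimiters, slashes inside the pattern must be escaped
my $with_slashes = ($path =~ /^\/usr\//) ? 1 : 0;

# With m and braces as delimiters, no escaping is needed
my $with_braces  = ($path =~ m{^/usr/}) ? 1 : 0;

print "$with_slashes $with_braces\n";   # 1 1
</syntaxhighlight>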
The <code>s///</code> (substitute) operator, on the other hand, specifies a search-and-replace operation:
 
<syntaxhighlight lang="perl">
$x =~ s/abc/aBc/; # upcase the b
</syntaxhighlight>
 
Another use of regular expressions is to specify delimiters for the <code>split</code> function:
 
<syntaxhighlight lang="perl">
@words = split /,/, $line;
</syntaxhighlight>
 
The <code>split</code> function creates a list of the parts of the string that are separated by what matches the regular expression. In this example, a line is divided into a list of its own comma-separated parts, and this list is then assigned to the <code>@words</code> array.
Perl regular expressions can take ''modifiers''. These are single-letter suffixes that modify the meaning of the expression:
 
<syntaxhighlight lang="perl">
$x =~ /abc/i; # case-insensitive pattern match
$x =~ s/abc/aBc/g; # global search and replace
</syntaxhighlight>
 
Because the compact syntax of regular expressions can make them dense and cryptic, the <code>/x</code> modifier was added in Perl to help programmers write more-legible regular expressions. It allows programmers to place whitespace and comments ''inside'' regular expressions:
 
<syntaxhighlight lang="perl">
$x =~ /
    a # match 'a'
    # ...
    c # then followed by the 'c' character
/x;
</syntaxhighlight>
 
====Capturing====
Portions of a regular expression may be enclosed in parentheses; corresponding portions of a matching string are ''captured''. Captured strings are assigned to the sequential built-in variables <code>$1, $2, $3, …</code>, and a list of captured strings is returned as the value of the match.
 
<syntaxhighlight lang="perl">
$x =~ /a(.)c/; # capture the character between 'a' and 'c'
</syntaxhighlight>
 
Captured strings <code>$1, $2, $3, …</code> can be used later in the code.
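
Both ways of retrieving captures are sketched below, using an arbitrary sample string and illustrative variable names.

<syntaxhighlight lang="perl">
use strict;
use warnings;

my $time = "12:34";    # arbitrary sample string

# After a successful match, the captures are available in $1 and $2
if ($time =~ /(\d+):(\d+)/) {
    print "hours=$1 minutes=$2\n";    # hours=12 minutes=34
}

# In list context, the match returns the captured strings directly
my ($hours, $minutes) = $time =~ /(\d+):(\d+)/;
print "$hours $minutes\n";            # 12 34
</syntaxhighlight>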
Perl regular expressions also allow built-in or user-defined functions to apply to the captured match, by using the <code>/e</code> modifier:
 
<syntaxhighlight lang="perl">
$x = "Oranges";
$x =~ s/(ge)/uc($1)/e; # OranGEs
$x .= $1; # append $x with the contents of the match in the previous statement: OranGEsge
</syntaxhighlight>
 
==Objects==
There are many ways to write [[Object-oriented programming|object-oriented]] code in Perl. The most basic is using "blessed" [[Reference (computer science)|references]]. This works by identifying a reference of any type as belonging to a given package, and the package provides the methods for the blessed reference. For example, a two-dimensional point could be defined this way:
 
<syntaxhighlight lang="perl">
sub Point::new {
    # Here, Point->new(4, 5) will result in $class being 'Point'.
    # It's a variable to support subclassing (see the perloop manpage).
    my ($class, $x, $y) = @_;
    bless [$x, $y], $class; # Implicit return
}

sub Point::distance {
    my ($self, $from) = @_;
    my ($dx, $dy) = ($$self[0] - $$from[0], $$self[1] - $$from[1]);
    sqrt($dx * $dx + $dy * $dy);
}
</syntaxhighlight>
 
This class can be used by invoking <code>new()</code> to construct instances, and invoking <code>distance</code> on those instances.
 
<syntaxhighlight lang="perl">
my $p1 = Point->new(3, 4);
my $p2 = Point->new(0, 0);
print $p1->distance($p2); # Prints 5
</syntaxhighlight>
 
Many modern Perl applications use the [[Moose (Perl)|Moose]] object system.{{Citation needed|date=June 2010}} Moose is built on top of Class::MOP, a meta-object protocol, providing complete introspection for all Moose-using classes. Thus you can ask classes about their attributes, parents, children, methods, etc. using a simple API.
An example of a class written using the MooseX::Declare<ref>[http://search.cpan.org/perldoc?MooseX::Declare MooseX::Declare documentation]</ref> extension to Moose:
 
<syntaxhighlight lang="perl">
use MooseX::Declare;

# ...
    }
}
</syntaxhighlight>
 
This is a class named <code>Point3D</code> that extends another class named <code>Point</code> explained in [[Moose (Perl)#Examples|Moose examples]]. It adds to its base class a new attribute <code>z</code>, redefines the method <code>set_to</code> and extends the method <code>clear</code>.
[[Category:Articles with example Perl code]]
[[Category:Perl]]
 
{{improve categories|date=July 2017}}