The two tricky bits are the semicolon at the end of the line and the \n, which adds a newline (line feed). If you have
a relatively new version of perl, you can use say instead of print to have the newline added automatically:
Version ≥ 5.10.0
use feature 'say';
say "Hello World";
The say feature is also enabled automatically with a use v5.10 (or higher) declaration:
use v5.10;
It's pretty common to just use perl on the command line using the -e option:
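A typical invocation looks like this (a minimal sketch):
perl -e 'print "Hello World\n"'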
You can also, of course, save the script in a file. Just remove the -e command line option and use the filename of
the script: perl script.pl. For programs longer than a line, it's wise to turn on a couple of options:
use strict;
use warnings;
There's no real disadvantage other than making the code slightly longer. In exchange, the strict pragma prevents
you from using code that is potentially unsafe and warnings notifies you of many common errors.
Notice the line-ending semicolon is optional for the last line, but is a good idea in case you later add to the end of
your code.
For more options how to run Perl, see perlrun or type perldoc perlrun at a command prompt. For a more detailed
introduction to Perl, see perlintro or type perldoc perlintro at a command prompt. For a quirky interactive
tutorial, Try Perl.
# This is a comment
=begin comment
This is a multi-line comment: everything from =begin comment
down to =cut is treated as POD and ignored by the interpreter.
=end comment
=cut
my $integer = 3; # number
my $string = "Hello World"; # string
my $reference = \$string; # reference to $string
Perl converts between numbers and strings on the fly, based on what a particular operator expects.
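For example (a small sketch; the variable names are mine, but the last line is the one the next paragraph refers to):
my $text  = 10 . ' apples';    # "10 apples" - the number 10 is used as a string
my $count = '20 apples' + 0;   # 20 - the string is used as a number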
When converting a string into a number, Perl takes as many digits from the front of a string as it can – hence why 20
apples is converted into 20 in the last line.
Based on whether you want to treat the contents of a scalar as a string or a number, you need to use different
operators. Do not mix them.
Attempting to use string operations on numbers will not raise warnings; attempting to use number operations on
non-numeric strings will. Do be aware that some non-digit strings such as 'inf', 'nan', '0 but true' count as
numbers.
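For example (a short sketch with made-up values):
my $x = 10;
my $y = '10';
print "numerically equal\n" if $x == $y;    # numeric comparison
print "string equal\n"      if $x eq $y;    # string comparison
print $x . $y, "\n";                        # 1010 (string concatenation)
print $x + $y, "\n";                        # 20   (numeric addition)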
my $other_array_reference = ["Hello"];
use v5.24;
my @contents = $array_reference->@*; # New postfix notation
When accessing an arrayref's contents by index you can use the -> syntactical sugar.
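For example (assuming the $array_reference from above):
my $first      = $array_reference->[0];       # element at index 0
my $also_first = ${ $array_reference }[0];    # equivalent, but noisier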
my $value = "Hello";
my $reference = \$value;
print $value; # => Hello
print $reference; # => SCALAR(0x2683310)
use v5.24;
say $reference->$*; # New postfix notation
This "de-referenced value" can then be changed like it was the original variable.
${$reference} =~ s/Hello/World/;
print ${$reference}; # => World
print $value; # => World
You want to pass a string to a function, and have it modify that string for you without it being a return value.
You wish to explicitly avoid Perl implicitly copying the contents of a large string at some point in your function
passing ( especially relevant on older Perls without copy-on-write strings )
You wish to disambiguate string-like values with specific meaning, from strings that convey content, for
example:
You wish to implement a lightweight inside out object model, where objects handed to calling code don't
carry user visible metadata:
our %objects;
my $next_id = 0;
sub new {
    my $object_id = $next_id++;
    $objects{ $object_id } = { ... };    # Assign data for object
    my $ref = \$object_id;
    return bless( $ref, "MyClass" );
}
# Use negative indices to count from the end (with -1 being last)
my $last_char_of_hello = $chars_of_hello[-1];
# You can use $# to get the last index of an array, and confuse Stack Overflow
my $last_index_of_array = $#chars_of_hello; # 4
# You can also access multiple elements of an array at the same time
# This is called "array slice"
# Since this returns multiple values, the sigil to use here on the RHS is @
my @some_chars_of_hello = @chars_of_hello[1..3]; # ('e', 'l', 'l')
my @out_of_order_chars = @chars_of_hello[1,4,2]; # ('e', 'o', 'l')
# In Python you can say array[1:-1] to get all elements but first and last
# Not so in Perl: (1..-1) is an empty list. Use $# instead
my @empty_list = @chars_of_hello[1..-1]; # ()
my @inner_chars_of_hello = @chars_of_hello[1..$#chars_of_hello-1]; # ('e','l','l')
# Setting elements beyond the end of an array does not result in an error
# The array is extended with undef's as necessary. This is "autovivification."
my @array; # ()
$array[3] = 'x'; # (undef, undef, undef, 'x')
When used as booleans, arrays are true if they are not empty.
Typeglobs are more commonly handled when dealing with files. open, for example, produces a reference to a
typeglob when asked to create a non-global filehandle:
# You can dereference this globref, but it's not very useful.
say ref $log; # GLOB
say (*{$log}->{IO} // 'undef'); # undef
# use constant instead defines a parameterless function, therefore it's not global,
# can be used without sigils, can be imported, but does not interpolate easily.
use constant (FALSE => 0);
say FALSE; # 0
say &FALSE; # 0
say "${\FALSE}"; # 0 (ugh)
say *FALSE{CODE}; # CODE(0xMA1DBABE)
# Of course, neither is truly constant when you can manipulate the symbol table...
*TRUE = \('');
use constant (EVIL => 1);
*FALSE = *EVIL;
\@array; # \ returns the reference of what's on the right (so, a reference to @array)
$#array; # this is the index of the last element of @array
You can use braces after the sigil if you should be so inclined. Occasionally, this improves readability.
say ${value} = 5;
While you use different sigils to define variables of different types, the same variable can be accessed in different
ways based on what sigils you use.
This is especially true of references. In order to use a referenced value you can combine sigils together.
Here's a perhaps less confusing way to think about it. As we saw earlier, you can use braces to wrap what's on the
right of a sigil. So you can think of @{} as something that takes an array reference and gives you the referenced
array.
my $values = undef;
say pop @{ $values }; # ERROR: can't use undef as an array reference
say pop @{ $values // [5] }; # undef // [5] gives [5], so this prints 5
# This is not an example of good Perl. It is merely a demonstration of this language feature
my $hashref = undef;
for my $key ( %{ $hashref // {} } ) {
"This doesn't crash";
}
...but if the "argument" to a sigil is simple, you can leave the braces away.
say $$scalar_reference;
say pop @$array_reference;
for (keys %$hash_reference) { ... }
Things can get excessively extravagant. This works, but please Perl responsibly.
For most normal uses, you can just call subroutines by their name, without a sigil. (A name without a sigil is called a
bareword.) The & sigil is only useful in a limited number of cases.
Combined with goto, as a slightly unusual function call that has the current call frame replaced by the called
subroutine, as sketched below. Think of the Linux exec() system call, but for functions.
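A small sketch of this (the subroutine names here are hypothetical):
sub real_handler {
    print "handling: @_\n";
}
sub dispatch {
    # Replace the current call frame with a call to real_handler;
    # @_ is passed along unchanged, and the caller no longer sees dispatch in the stack.
    goto &real_handler;
}
dispatch('some', 'arguments');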
First, let's examine what happens when you pass a normal hash to a subroutine and modify it within there:
use strict;
use warnings;
use Data::Dumper;
sub modify
{
my %hash = @_;
$hash{new_value} = 2;
return;
}
my %example_hash = (
old_value => 1,
);
modify(%example_hash);
print Dumper \%example_hash;    # still contains only old_value
Notice that after we exit the subroutine, the hash remains unaltered; all changes to it were local to the modify
subroutine, because we passed a copy of the hash, not the hash itself.
In comparison, when you pass a hashref, you are passing the address to the original hash, so any changes made
within the subroutine will be made to the original hash:
use strict;
use warnings;
use Data::Dumper;
sub modify
{
    my $hashref = shift;
    $hashref->{new_value} = 2;
    return;
}

# Create a hashref
my $example_ref = {
    old_value => 1,
};

modify($example_ref);
print Dumper $example_ref;    # now contains both old_value and new_value
If you give the hash a known key, it will hand you back the corresponding value.
# You can save some typing and gain in clarity by using the "fat comma"
# syntactical sugar. It behaves like a comma and quotes what's on the left.
my %translations_of_hello = (spanish => 'Hola', german => 'Hallo', swedish => 'Hej');
In the following example, note the brackets and sigil: you access an element of %hash using $hash{key} because the
value you want is a scalar. Some consider it good practice to quote the key while others find this style visually noisy.
Quoting is only required for keys that could be mistaken for expressions like $hash{'some-key'}
my $greeting = $translations_of_hello{'spanish'};
While Perl by default treats a bareword key as a string, a leading + can be used to tell Perl that the key should not
be auto-quoted but evaluated, with the result of the evaluation used as the key:
print $employee{shift};     # uses the literal string 'shift' as the key
# but this one will execute shift, extracting the first element from @_,
# and use the result as the key
print $employee{+shift};
Like with arrays, you can access multiple hash elements at the same time. This is called a hash slice. The resulting
value is a list, so use the @ sigil:
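For example, using the %translations_of_hello hash from above:
my @greetings = @translations_of_hello{'spanish', 'german'};    # ('Hola', 'Hallo')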
Iterate over the keys of a hash with keys. keys returns the items in an apparently random order; combine it with sort if you need a predictable one.
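For example:
for my $lang (sort keys %translations_of_hello) {
    print "$translations_of_hello{$lang}\n";
}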
If you do not actually need the keys like in the previous example, values returns the hash's values directly:
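For example:
for my $translation (values %translations_of_hello) {
    print "$translation\n";
}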
You can also use a while loop with each to iterate over the hash. This way, you will get both the key and the value at
the same time, without a separate value lookup. Its use is, however, discouraged, as each can break in mystifying
ways.
# DISCOURAGED
while (my ($lang, $translation) = each %translations_of_hello) {
say $translation;
}
map and list flattening can be used to create hashes out of arrays. This is a popular way to create a 'set' of values,
e.g. to quickly check whether a value is in @elems. This operation usually takes O(n) time (i.e. proportional to the
number of elements) but can be done in constant time (O(1)) by turning the list into a hash:
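A minimal sketch of the idiom (the contents of @elems are made up):
my @elems   = qw(x y x z t);
my %is_elem = map { $_ => 1 } @elems;
print "x is in the list\n" if $is_elem{x};    # constant-time lookup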
This requires some explanation. The contents of @elems get read into a list, which is processed by map. map accepts a
code block that gets called for each value of its input list; the value of the element is available for use in $_. Our
code block returns two list elements for each input element: $_, the input element, and 1, just some value. Once you
account for list flattening, the outcome is that map { $_ => 1 } @elems turns qw(x y x z t) into (x => 1, y =>
1, x => 1, z => 1, t => 1).
As those elements get assigned into the hash, odd elements become hash keys and even elements become hash
values. When a key is specified multiple times in a list to be assigned to a hash, the last value wins. This effectively
discards duplicates.
The following application of hashes also exploits the fact that hashes and lists can often be used interchangeably to
implement named function args:
sub hash_args {
    my %args      = @_;
    my %defaults  = ( foo => 1, bar => 0 );
    my %overrides = ( __unsafe => 0 );
    my %settings  = ( %defaults, %args, %overrides );
}
When used as booleans, hashes are true if they are not empty.
my $name = 'Paul';
print "Hello, $name!\n"; # Hello, Paul!
use constant {
PI => '3.1415926'
};
print "I like PI\n"; # I like PI
print "I like " . PI . "\n"; # I like 3.1415926
\t horizontal tab
\n newline
\r return
\f form feed
\b backspace
\a alarm (bell)
\e escape
Interpolation of \n depends on the system where the program is running: it will produce the newline character(s)
according to the current system's conventions.
Perl does not interpolate \v, which means vertical tab in C and other languages.
or Unicode names:
Characters with codes from 0x00 to 0xFF in the native encoding may be written in a shorter form:
\x0a hexadecimal
\012 octal
\c@ chr(0)
\ca chr(1)
\cb chr(2)
...
\cz chr(26)
\c[ chr(27)
\c\ chr(28) # Cannot be used at the end of a string
# since backslash will interpolate the terminating quote
\c] chr(29)
\c^ chr(30)
\c_ chr(31)
\c? chr(127)
Interpretation of all escape sequences except for \N{...} may depend on the platform since they use platform-
and encoding-dependent codes.
my $name = 'Paul';
my $age = 64;
print "My name is $name.\nI am $age.\n"; # My name is Paul.
# I am 64.
But:
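Single quotes, by contrast, do not interpolate (a reconstruction of the missing counter-example):
print 'My name is $name.\n';    # My name is $name.\n - no interpolation, no newline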
You can use q{} (with any delimiter) instead of single quotes and qq{} instead of double quotes. For example,
q{I'm 64} allows you to use an apostrophe within a non-interpolated string (otherwise it would terminate the string).
The following two statements do the same thing, but in the first one you do not need to escape double quotes within the string:
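# (a reconstructed example; $age is from the snippet above)
print qq{"I am $age years old."};
print "\"I am $age years old.\"";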
If your variable name clashes with surrounding text, you can use the syntax ${var} to disambiguate:
my $decade = 80;
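For example:
print "The 19${decade}s were fun.\n";    # The 1980s were fun.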
use Time::Piece;
my $date = localtime->strftime('%m/%d/%Y');
print $date;
Output
07/26/2016
use DateTime;
my $dt = DateTime->now;    # the current date and time

my $year   = $dt->year;
my $month  = $dt->month;
my $day    = $dt->day;
my $hour   = $dt->hour;
my $minute = $dt->minute;
my $second = $dt->second;
Datetime subtraction:
my $dt2 = DateTime->new(
year => 2016,
month => 8,
day => 24,
);
my $duration = $dt2->subtract_datetime($dt1);    # $dt1: another DateTime object created earlier
print $duration->days;
my $start = time();
# ... the code you want to time ...
my $end = time();
printf "That took %d seconds\n", $end - $start;
If-Else Statements
if (EXPR) BLOCK
if (EXPR) BLOCK else BLOCK
if (EXPR) BLOCK elsif (EXPR) BLOCK ...
if (EXPR) BLOCK elsif (EXPR) BLOCK ... else BLOCK
For simple if-statements, the if can precede or succeed the code to be executed.
$number = 7;
if ($number > 4) { print "$number is greater than four!"; }
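The same check written with a postfix (statement modifier) if:
print "$number is greater than four!" if $number > 4;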
@numbers = 1..42;
for (my $i=0; $i <= $#numbers; $i++) {
print "$numbers[$i]\n";
}
The while loop evaluates the conditional before executing the associated block. So, sometimes the block is never
executed. For example, the following code would never be executed if the filehandle $fh was the filehandle for an
empty file, or if it was already exhausted before the conditional.
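A sketch of such a loop (assuming $fh is an already opened filehandle):
while (my $line = <$fh>) {
    print $line;
}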
The do/while and do/until loops, on the other hand, evaluate the conditional after each time the block is executed.
So, a do/while or a do/until loop is always executed at least once.
my $greeting_count = 0;
do {
    say "Hello";
    $greeting_count++;
} until ( $greeting_count > 1 );
You can access the arguments by using the special variable @_, which contains all arguments as an array.
sub function_name {
my ($arg1, $arg2, @more_args) = @_;
# ...
}
Since the function shift defaults to shifting @_ when used inside a subroutine, it's a common pattern to extract the
arguments sequentially into local variables at the beginning of a subroutine:
sub function_name {
my $arg1 = shift;
my $arg2 = shift;
my @more_args = @_;
# ...
}
sub {
my $arg1 = shift;
# ...
}->($arg);
Version ≥ 5.20.0
Alternatively, the experimental feature "signatures" can be used to unpack parameters, which are passed by value
(not by reference).
You can use any expression to give a default value to a parameter – including other parameters.
Note that you can't reference parameters which are defined after the current parameter – hence the following code
doesn't work quite as expected.
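A hedged sketch of the feature and of the caveat just mentioned (the subroutine and parameter names are mine):
use feature 'signatures';
no warnings 'experimental::signatures';
sub greet ($name, $greeting = 'Hello') {    # $greeting defaults to 'Hello'
    print "$greeting, $name!\n";
}
greet('Paul');             # Hello, Paul!
greet('Paul', 'Howdy');    # Howdy, Paul!
# A default may refer to earlier parameters, but not to later ones:
# sub pair ($x = $y, $y = 5) { ... }    # $y is not yet available when $x's default runs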
Some builtins such as print or say are keywords, not functions, so e.g. &say is undefined. This also means that
you can define subroutines with those names, but you will have to specify the package name to actually call them.
This should not be confused with prototypes, a facility Perl has to let you define functions that behave like built-ins.
Function prototypes must be visible at compile time and their effects can be ignored by specifying the & sigil.
Prototypes are generally considered to be an advanced feature that is best used with great care.
# This prototype makes it a compilation error to call this function with anything
# that isn't an array. Additionally, arrays are automatically turned into arrayrefs
sub receives_arrayrefs(\@\@) {
my $x = shift;
my $y = shift;
}
my @a = (1..3);
my @b = (1..4);
receives_arrayrefs(@a, @b); # okay, $x = \@a, $y = \@b, @_ = ();
receives_arrayrefs(\@a, \@b); # compilation error, "Type … must be array …"
BEGIN { receives_arrayrefs(\@a, \@b); }
sub edit {
    $_[0] =~ s/world/sub/;
}
To avoid clobbering your caller's variables, it is therefore important to copy @_ into locally scoped variables (my ...)
as early as possible in the subroutine.
use Data::Printer;
p $data_structure;
Data::Printer writes to STDERR, like warn. That makes it easier to find the output. By default, it sorts hash keys and
looks at objects.
use Data::Printer;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
p $ua;
It will look at all the methods of the object, and also list the internals.
LWP::UserAgent {
Parents LWP::MemberMixin
public methods (45) : add_handler, agent, clone, conn_cache, cookie_jar, credentials,
default_header, default_headers, delete, env_proxy, from, get, get_basic_credentials,
get_my_handler, handlers, head, is_online, is_protocol_supported, local_address, max_redirect,
max_size, mirror, new, no_proxy, parse_head, post, prepare_request, progress, protocols_allowed,
protocols_forbidden, proxy, put, redirect_ok, remove_handler, request, requests_redirectable,
run_handlers, send_request, set_my_handler, show_progress, simple_request, ssl_opts, timeout,
use_alarm, use_eval
private methods (4) : _agent, _need_proxy, _new_response, _process_colonic_headers
internals: {
def_headers HTTP::Headers,
handlers {
response_header HTTP::Config
},
local_address undef,
max_redirect 7,
max_size undef,
no_proxy [],
protocols_allowed undef,
protocols_forbidden undef,
proxy {},
requests_redirectable [
[0] "GET",
[1] "HEAD"
],
show_progress undef,
ssl_opts {
verify_hostname 1
},
timeout 180,
use_eval 1
You can configure it further, so it serializes certain objects in a certain way, or to include objects up to an arbitrary
depth. The full configuration is available in the documentation.
Unfortunately Data::Printer does not ship with Perl, so you need to install it from CPAN or through your package
management system.
Using Data::Dumper is an easy way to look at data structures or variable content at run time. It ships with Perl and
you can load it easily. The Dumper function returns the data structure serialized in a way that looks like Perl code.
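A minimal usage sketch (the hash contents mirror the output shown below):
use Data::Dumper;
my %hash = ( foo => 'bar' );
print Dumper \%hash;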
$VAR1 = {
          'foo' => 'bar'
        };
That makes it very useful to quickly look at some values in your code. It's one of the most handy tools you have in
your arsenal. Read the full documentation on metacpan.
use strict;
use warnings;
use Data::Show;
my @array = ( 2, 4, 8, 16 );    # example data (not in the original snippet)
my %hash  = ( foo => 1, bar => { baz => 10, qux => 20 } );
my $href  = \%hash;
show @array;
show %hash;
show $href;
Using Data::Dumper gives an easy access to fetch list values. The Dumper returns the list values serialized in a way
that looks like Perl code.
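A reconstruction of the kind of call that produces the output below:
use Data::Dumper;
my @data = ( 123, 456, 789, 'poi', 'uyt', 'rew', 'qas' );
print Dumper @data;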
Output:
$VAR1 = 123;
$VAR2 = 456;
$VAR3 = 789;
$VAR4 = 'poi';
$VAR5 = 'uyt';
$VAR6 = 'rew';
$VAR7 = 'qas';
As suggested by user @dgw, when dumping arrays or hashes it is better to dump an array reference or a hash
reference; the output will then better reflect the structure of the input.
my $ref_data = [23,45,67,'mnb','vcx'];
print Dumper $ref_data;
Output:
$VAR1 = [
          23,
          45,
          67,
          'mnb',
          'vcx'
        ];
my @data_array = (23,45,67,'mnb','vcx');
print Dumper \@data_array;
Output:
$VAR1 = [
          23,
          45,
          67,
          'mnb',
          'vcx'
        ];
my @foo = ( 4, 5, 6 );
join '-', ( 4, 5, 6 );
join '-', @foo;
Some operators only work with arrays since they mutate the list an array contains:
shift @array;
unshift @array, ( 1, 2, 3 );
pop @array;
push @array, ( 7, 8, 9 );
The => is really only a special comma that automatically quotes the operand to its left. So, you could use normal
commas, but the relationship is not as clear:
You can also use quoted strings for the left hand operand of the fat comma =>, which is especially useful for keys
containing spaces.
Internally Perl makes aliases to those arguments and puts them into the array @_, which is available within the
subroutine:
sub test_subroutine {
    print $_[0];    # item1
    print $_[1];    # item2
}
test_subroutine( 'item1', 'item2' );
Aliasing gives you the ability to change the original value of argument passed to subroutine:
sub test_subroutine {
$_[0] += 2;
}
my $x = 7;
test_subroutine( $x );
print $x; # 9
To prevent inadvertent changes of original values passed into your subroutine, you should copy them:
sub test_subroutine {
my( $copy_arg1, $copy_arg2 ) = @_;
$copy_arg1 += 2;
}
my $x = 7;
test_subroutine $x; # in this case $copy_arg2 will have `undef` value
print $x; # 7
To test how many arguments were passed into the subroutine, check the size of @_
sub test_subroutine {
print scalar @_, ' argument(s) passed into subroutine';
}
If you pass array arguments into a subroutine they all will be flattened:
my @x = ( 1, 2, 3 );
my @y = qw/ a b c /; # ( 'a', 'b', 'c' )
test_some_subroutine @x, 'hi', @y; # 7 argument(s) passed into subroutine
# @_ = ( 1, 2, 3, 'hi', 'a', 'b', 'c' ) # Done internally for this call
If your test_some_subroutine contains the statement $_[4] = 'd', for the above call it will cause $y[0] to have
value d afterwards:
print "@y"; # d b c
sub foo {
    my @list1 = ( 1, 2, 3 );
    my @list2 = ( 4, 5 );
    return ( @list1, @list2 );
}
But it is not the recommended way to do that unless you know what you are doing.
While this is OK when the result is in LIST context, in SCALAR context things are unclear. Let's take a look at the
next line:
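Something like this (a reconstruction):
my $result = foo();    # $result is 2 - probably not what you expected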
1. Because foo() evaluated in SCALAR context, this list ( @list1, @list2 ) also evaluated in SCALAR context
2. In SCALAR context, LIST returns its last element. Here it is @list2
3. Again in SCALAR context, array @list2 returns the number of its elements. Here it is 2.
In most cases the right strategy will return references to data structures.
So in our case we should do the following instead:
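A sketch of that version:
sub foo {
    my @list1 = ( 1, 2, 3 );
    my @list2 = ( 4, 5 );
    return ( \@list1, \@list2 );
}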
Then the caller does something like this to receive the two returned arrayrefs:
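For instance:
my ( $list1_ref, $list2_ref ) = foo();
my @all = ( @$list1_ref, @$list2_ref );    # ( 1, 2, 3, 4, 5 )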
It is guaranteed that each key-value pair stays together: keys always end up at even indices and their values at the
following odd ones. It is not guaranteed, however, that the key-value pairs are always flattened in the same order:
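For example:
my %h    = ( a => 1, b => 2 );
my @flat = %h;    # ('a', 1, 'b', 2) or ('b', 2, 'a', 1) - pairs stay together, their order may not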
xyz(\@foo, 123);
...
sub xyz {
    my ($arr, $etc) = @_;
    print $arr->[0];    # using the first item in $arr; it is like $foo[0]
}
The three examples above do exactly the same thing. If you don't supply any comparator function or block, sort
assumes you want the list on its right sorted lexically. This is usually the form you want if you just need your data in
some predictable order and don't care about linguistic correctness.
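For reference, the three equivalent forms referred to look roughly like this (a sketch; the data is made up):
my @list         = qw(pear Apple banana);
my @sorted_plain = sort @list;
my @sorted_block = sort { $a cmp $b } @list;
sub lexically { $a cmp $b }
my @sorted_named = sort lexically @list;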
sort passes pairs of items in @list to the comparator function, which tells sort which item is larger. The cmp
operator does this for strings while <=> does the same thing for numbers. The comparator is called quite often, on
average n * log(n) times with n being the number of elements to be sorted, so it's important it be fast. This is the
reason sort uses predefined package global variables ($a and $b) to pass the elements to be compared to the block
or function, instead of proper function parameters.
If you use locale, cmp takes locale specific collation order into account, e.g. it will sort Å like A under a Danish locale
but after Z under an English or German one. However, it doesn't take the more complex Unicode sorting rules into
account nor does it offer any control over the order—for example phone books are often sorted differently from
dictionaries. For those cases, the Unicode::Collate and particularly Unicode::Collate::Locale modules are
recommended.
The trouble with the first example is that the comparator is called very often and keeps recalculating values using a
slow function over and over. A typical example would be sorting file names by their file size:
use File::stat;
@sorted = sort { stat($a)->size <=> stat($b)->size } glob "*";
This works, but at best it incurs the overhead of two system calls per comparison, at worst it has to go to the disk,
twice, for every single comparison, and that disk may be in an overloaded file server on the other side of the planet.
The Schwartzian Transform basically shoves @list through three functions, bottom-to-top. The first map turns each
entry into a two-element list of the original item and the result of the slow function as a sort key, so at the end of
this we have called slow() exactly once for each element. The following sort can then simply access the sort key by
looking in the list. As we don't care about the sort keys but only need the original elements in sorted order, the final
map throws away the two-element lists from the already-sorted list it receives from sort and returns a list of only
their first members.
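A sketch of the transform itself (slow() and @list stand in for the expensive key function and the input):
sub slow { length $_[0] }                      # stand-in for the real expensive function
my @list = qw(pear fig banana);
my @sorted = map  { $_->[0] }                  # 3. keep only the original element
             sort { $a->[1] <=> $b->[1] }      # 2. sort by the precomputed key
             map  { [ $_, slow($_) ] }         # 1. call slow() exactly once per element
             @list;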
A related trick is case-insensitive sorting, comparing uc($a) with uc($b) (or lc with lc). This works on all versions of
Perl 5 and is completely sufficient for English; it doesn't matter whether you use uc or lc. However, it presents a
problem for languages like Greek or Turkish where there is no 1:1 correspondence between upper- and lowercase
letters, so you get different results depending on whether you use uc or lc.
Therefore, Perl 5.16 and higher have a case folding function called fc that avoids this problem, so modern multi-
lingual sorting should use this:
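For example:
use feature 'fc';
my @words  = qw(Banana apple Cherry);
my @sorted = sort { fc($a) cmp fc($b) } @words;    # apple, Banana, Cherry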
Comparing $a and $b with the <=> operator ensures they are compared numerically and not textually as per
default.
Sorting items in descending order can simply be achieved by swapping $a and $b in the comparator block.
However, some people prefer the clarity of a separate reverse even though it is slightly slower.
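For example:
my @numbers     = ( 10, 2, 33 );
my @ascending   = sort { $a <=> $b } @numbers;          # ( 2, 10, 33 )
my @descending  = sort { $b <=> $a } @numbers;          # ( 33, 10, 2 )
my @descending2 = reverse sort { $a <=> $b } @numbers;  # same result, arguably clearer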
This is the basic idiom for "default" File IO and makes $filehandle a readable input stream of bytes, filtered by a
default system-specific decoder, which can be locally set with the open pragma
Perl itself does not handle errors in file opening, so you have to handle those yourself by checking the return value
of open. $! is populated with the error message that caused open to fail.
On Windows, the default decoder is a "CRLF" filter, which maps any "\r\n" sequences in the input to "\n"
This specifies that Perl should not perform a CRLF translation on Windows.
This specifies that Perl should both avoid CRLF translation, and then decode the resulting bytes into strings of
characters ( internally implemented as arrays of integers which can exceed 255 ), instead of strings of bytes
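A sketch of the three variants just described (the file names are placeholders, and UTF-8 is only an example encoding):
# Default: text mode with the system's usual CRLF handling
open( my $fh, '<', 'input.txt' ) or die "Can't open input.txt: $!";
# Raw bytes, no CRLF translation
open( my $raw_fh, '<:raw', 'input.txt' ) or die "Can't open input.txt: $!";
# Raw bytes decoded into character strings
open( my $char_fh, '<:raw:encoding(UTF-8)', 'input.txt' ) or die "Can't open input.txt: $!";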
# You can then either read the file one line at a time...
while ( my $line = <$fh> ) {
    chomp $line;
    print $line . "\n";
}
If you know that your input file is UTF-8, you can specify the encoding:
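For example ($filename is a placeholder):
open( my $fh, '<:encoding(UTF-8)', $filename ) or die "Can't open $filename: $!";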
After finished reading from the file, the filehandle should be closed:
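For example:
close $fh or warn "Problem closing the file: $!";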
Another, faster way to read a file is to use the File::Slurper module. This is useful if you work with many files.
use File::Slurper;
my $file = read_text("path/to/file"); # utf8 without CRLF transforms by default
print $file; #Contains the file body
#!/usr/bin/perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
# Open "output.txt" for writing (">") and from now on, refer to it as the variable $fh.
open(my $fh, ">", "output.txt")
# In case the action failed, print error message and quit.
or die "Can't open > output.txt: $!";
Now we have an open file ready for writing which we access through $fh (this variable is called a filehandle). Next
we can direct output to that file using the print operator:
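For example:
print $fh "Hello, output file!\n";    # note: no comma after the filehandle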
The open operator has a scalar variable ($fh in this case) as its first parameter. Since it is defined in the open
operator it is treated as a filehandle. Second parameter ">" (greater than) defines that the file is opened for writing.
The last parameter is the path of the file to write the data to.
To write the data into the file, the print operator is used along with the filehandle. Notice that in the print operator
there is no comma between the filehandle and the statement itself, just whitespace.
Section 12.4: "use autodie" and you won't need to check file open/close failures
autodie allows you to work with files without having to explicitly check for open/close failures.
Since Perl 5.10.1, the autodie pragma has been available in core Perl. When used, Perl will automatically check for
errors when opening and closing files (and for most other filesystem-related builtins) and die with a descriptive
message on failure.
Here is an example in which all of the lines of one file are read and then written to the end of a log file.
use 5.010; # 5.010 and later enable "say", which prints arguments, then a newline
use strict; # require declaring variables (avoid silent errors due to typos)
use warnings; # enable helpful syntax-related warnings
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use autodie; # Automatically handle errors in opening and closing files
while (my $line = readline $fh_in) # also works: while (my $line = <$fh_in>)
{
# remove newline
chomp $line;
By the way, you should technically always check print statements. Many people don't, but perl (the Perl
interpreter) doesn't do this automatically and neither does autodie.
# identify current position in file, in case the first line isn't a comment
my $current_pos = tell;
# Step back a line so that it can be processed later as the first data line
seek $fh, $current_pos, 0;
To write a gzipped file, use the module IO::Compress::Gzip and create a filehandle by creating a new instance of
IO::Compress::Gzip for the output file:
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use IO::Compress::Gzip;
my $fh_out = IO::Compress::Gzip->new("hello.txt.gz");
$fh_out->print("Hello World!\n");    # write (compressed) data to the file
close $fh_out;
To read from a gzipped file, use the module IO::Uncompress::Gunzip and then create a filehandle by creating a
new instance of IO::Uncompress::Gunzip for the input file:
#!/bin/env perl
use strict;
use warnings;
use open qw( :encoding(UTF-8) :std ); # Make UTF-8 default encoding
use IO::Uncompress::Gunzip;
my $fh_in = IO::Uncompress::Gunzip->new("hello.txt.gz");
my $line = $fh_in->getline();    # read the uncompressed data back
print $line;
This pragma changes the default mode of reading and writing text (files, standard input, standard output, and
standard error) to UTF-8, which is typically what you want when writing new applications.
ASCII is a subset of UTF-8, so this is not expected to cause any problems with legacy ASCII files, and it will help
protect you from the accidental file corruption that can happen when treating UTF-8 files as ASCII.
However, it is important to know what encoding the files you are dealing with actually use, and to handle them
accordingly. (Reasons that we should not ignore Unicode.) For a more in-depth treatment of Unicode, please see the
Perl Unicode topic.
use Path::Tiny;
my $contents = path($filename)->slurp;
You can pass a binmode option if you need control over file encodings, line endings etc. - see man perlio:
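For example (a sketch; the :raw layer is just one possibility):
my $raw_contents  = path($filename)->slurp( { binmode => ':raw' } );
my $utf8_contents = path($filename)->slurp_utf8;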
Path::Tiny also has a lot of other functions for dealing with files so it may be a good choice.
After opening the file (read man perlio if you want to read specific file encodings instead of raw bytes), the trick is
in the do block: <$fh>, the file handle in a diamond operator, returns a single record from the file. The "input record
separator" variable $/ specifies what a "record" is—by default it is set to a newline character so "a record" means "a
single line". As $/ is a global variable, local does two things: it creates a temporary local copy of $/ that will vanish
at the end of the block, and gives it the (non-)value undef (the "value" which Perl gives to uninitialized variables).
When the input record separator has that (non-)value, the diamond operator will return the entire file. (It considers
the entire file to be a single line.)
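The idiom being described looks like this (a sketch; $filename is a placeholder):
my $contents = do {
    open( my $fh, '<', $filename ) or die "Can't open $filename: $!";
    local $/;    # temporarily undefine the input record separator
    <$fh>;       # with $/ undefined, this returns the whole file
};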
Using do, you can even get around manually opening a file. For repeated reading of files, a small helper sub like the
one sketched below can be used. Here, another global variable (@ARGV) is localized to simulate the same process used
when starting a perl script with parameters. $/ is still undef, since the array in front of it "eats" all incoming
arguments. Next, the diamond operator <> again delivers one record defined by $/ (the whole file) and returns it from
the do block, which in turn returns it from the sub.
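A sketch of such a helper:
sub read_file {
    my ($filename) = @_;
    return do { local ( @ARGV, $/ ) = $filename; <> };
}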
The sub has no explicit error handling, which is bad practice! If an error occurs while reading the file, you will
receive undef as return value, as opposed to an empty string from an empty file.
Another disadvantage of the last code is the fact that you cannot use PerlIO for different file encodings—you always
get raw bytes.
read_text() takes two optional parameters to specify the file encoding and whether line endings should be
translated between the unixish LF or DOSish CRLF standards:
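For example:
my $contents = read_text( $filename, 'UTF-8', 1 );    # decode as UTF-8 and translate CRLF line endings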
When evaluated in list context, the diamond operator returns a list consisting of all the lines in the file (in this case,
assigning the result to an array supplies list context). The line terminator is retained, and can be removed by
chomping:
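For example:
my @lines = <$fh>;
chomp @lines;    # removes the trailing newline from every element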
Going further with minimalism, specifying -n switch causes Perl to automatically read each line (in our case — the
whole file) into variable $_.
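A sketch of such a one-liner (with -0777 so that the one "line" is the whole file):
perl -0777 -ne 'print length($_), "\n"' input.txt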
print 'This literal contains a \'postraphe '; # emits the ' but not its preceding \
print q/This is a literal \' <-- 2 characters /; # prints both \ and '
print q^This is a literal \' <-- 2 characters ^; # also
my $greeting = "Hello!\n";
print $greeting;
# => Hello! (followed by a linefeed)
The qq is useful here, to avoid having to escape the quotation marks. Without it, we would have to write...
Perl doesn't limit you to using a slash / with qq; you can use any (visible) character.
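A sketch covering both points:
print qq/She said "Hello" to me.\n/;    # qq with the usual slash delimiter
print qq{She said "Hello" to me.\n};    # any visible character works as a delimiter
print "She said \"Hello\" to me.\n";    # without qq, the inner quotes must be escaped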
By default the values are space-separated – because the special variable $" defaults to a single space. This can, of
course, be changed.
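For example:
my @fruit = ( 'apples', 'pears', 'plums' );
print "We have @fruit.\n";        # We have apples pears plums.
{
    local $" = ', ';              # change the list separator locally
    print "We have @fruit.\n";    # We have apples, pears, plums.
}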
If you prefer, you have the option to use English and change $LIST_SEPARATOR instead:
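For example (reusing @fruit from above):
use English;
local $LIST_SEPARATOR = ', ';
print "We have @fruit.\n";    # We have apples, pears, plums.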
For anything more complex than this, you should use a loop instead.
The so-called "cart operator" causes perl to dereference @{ ... } the array reference [ ... ] that contains the
expression that you want to interpolate, 2 + 2. When you use this trick, Perl builds an anonymous array, then
dereferences it and discards it.
The ${\( ... )} version is somewhat less wasteful, but it still requires allocating memory and it is even harder to
read.
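For example:
print "2 + 2 = @{[ 2 + 2 ]}\n";     # 2 + 2 = 4
print "2 + 2 = ${\ (2 + 2)}\n";     # 2 + 2 = 4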
New in Perl 5.26.0 is an "Indented Heredoc" syntax (<<~) which trims the left-padding off for you:
Version ≥ 5.26.0
my $variable = <<~"MuchNicer";
this block of text is interpreted.
quotes\nare interpreted, and $interpolations
get interpolated...
but still, left-aligned "I Want it to End" matters.
MuchNicer
But usually no one worries about how many newlines were removed, so chomp is typically seen in void context, most
often after having read lines from a file:
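For example (assuming $fh is an open filehandle):
while ( my $line = <$fh> ) {
    chomp $line;    # discard the trailing newline, if any
    # ... work with $line ...
}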
use 5.010;
use Text::ParseWords;
Output:
a quoted, comma
word1
word2
NOTES
By default, Text::CSV does not strip whitespace around separator character, the way Text::ParseWords does.
However, adding allow_whitespace=>1 to constructor attributes achieves that effect.
Output:
a quoted, comma
word1
word2
The library supports configurable separator character, quote character, and escape character
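A hedged sketch of such a configuration (the particular characters chosen are only an illustration):
use Text::CSV;
my $csv = Text::CSV->new({
    sep_char         => ';',     # separator character
    quote_char       => '"',     # quote character
    escape_char      => '\\',    # escape character
    allow_whitespace => 1,       # strip whitespace around separators
});
if ( $csv->parse(q{"a quoted; semicolon"; word1; word2}) ) {
    my @fields = $csv->fields;
}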
Documentation: http://search.cpan.org/perldoc/Text::CSV
Moose
package Foo;
use Moose;

has bar => ( is => 'rw' );    # assumed attributes; the original declarations are not shown
has baz => ( is => 'rw' );
sub qux {
my $self = shift;
my $barIsBaz = $self->bar eq 'baz'; # property getter
$self->baz($barIsBaz); # property setter
}
package Foo;
use Class::Accessor 'antlers';

has bar => ( is => 'rw' );    # Moose-like declarations provided by the 'antlers' flavour
has baz => ( is => 'rw' );
package Foo;
use base qw(Class::Accessor);

__PACKAGE__->mk_accessors(qw(bar baz));    # classic interface
Class::Tiny
package Foo;
use Class::Tiny qw(bar baz); # just props
package Point;

sub new {
    my ($class, $x, $y) = @_;
    my $self = { x => $x, y => $y };    # store object data in a hash
    bless $self, $class;                # bind the hash to the class
    return $self;
}
Whenever the arrow operator -> is used with methods, its left operand is prepended to the given argument list. So,
@_ in new will contain values ('Point', 1, 2.5).
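For example:
my $point = Point->new( 1, 2.5 );    # inside new, @_ is ( 'Point', 1, 2.5 )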
There is nothing special about the name new; you can name your factory methods however you prefer. There is also
nothing special about hashes: you could do the same in the following way:
package Point;
use strict;
sub new {
    my ($class, @coord) = @_;
    my $self = \@coord;
    bless $self, $class;
    return $self;
}
In general, any reference may be an object, even a scalar reference. But most often, hashes are the most
convenient way to represent object data.
package Point;
use strict;
sub new {
...
}
sub polar_coordinates {
...
}
1;
It is important to note that the variables declared in a package are class variables, not object (instance) variables.
Changing a package-level variable affects all objects of the class. For how to store object-specific data, see
"Creating Objects".
Point->new(...);
my @polar = $point->polar_coordinates;
What is to the left of the arrow is prepended to the given argument list of the method. For example, after the call
Point->new(1, 2); the new method receives ('Point', 1, 2) in @_.
Packages representing classes should take this convention into account and expect that all their methods will have
one extra argument.
package Point;
use strict;
...
1;
package Point2D;
use strict;
use parent qw(Point);
...
1;
package Point3D;
use strict;
use parent qw(Point);
...
1;
package Point2D;
use strict;
use parent qw(Point PlanarObject);
...
1;
Inheritance is all about resolving which method is to be called in a particular situation. Since pure Perl does not
prescribe any rules about the data structure used to store object data, inheritance has nothing to do with that.
package GeometryObject;
use strict;
1;
1;
package PlanarObject;
use strict;
use parent qw(GeometryObject);
1;
package Point2D;
use strict;
use parent qw(Point PlanarObject);
1;
1. The starting point is defined by the left operand of the arrow operator.
If it is a bare word:
Point2D->new(...);
my $class = 'Point2D';
$class->new(...);
...then the starting point is the package with the corresponding name (Point2D in both examples).
my $point = {...};
bless $point, 'Point2D'; # typically, it is encapsulated into class methods
my @coord = $point->polar_coordinates;
then the starting point is the class of the reference (again, Point2D). The arrow operator cannot be
used to call methods for unblessed references.
Point2D->new(...);
Point2D
Point (first parent of Point2D)
GeometryObject (parent of Point)
PlanarObject (second parent of Point2D)
my $point = Point2D->new(...);
$point->transpose(...);
the method that will be called is GeometryObject::transpose, even though a transpose method also exists in
PlanarObject (it appears later in the search order).
In the previous example, you can explicitly call PlanarObject::transpose like so:
my $point = Point2D->new(...);
$point->PlanarObject::transpose(...);
5. In a similar manner, SUPER:: performs method search in parent classes of the current class.
For example,
package Point2D;
use strict;
use parent qw(Point PlanarObject);
sub new {
my ($class, $x, $y) = @_;
my $self = $class->SUPER::new;
...
}
1;
The left operand of the arrow operator -> becomes the first argument of the method to be called. It may be either a
string:
my $class = 'Point';
my $point = $class->new(...);    # same as Point->new(...)
or an object reference:
$point->polar_coordinates;
Class methods are just the ones that expect their first argument to be a string, and object methods are the ones
that expect their first argument to be an object reference.
Class methods typically do not do anything with their first argument, which is just a name of the class. Generally, it
is only used by Perl itself for method resolution. Therefore, a typical class method can be called for an object as
well:
my $width = Point->canvas_width;
my $point = Point->new(...);
my $width = $point->canvas_width;
Object methods receive an object reference as the first argument, so they can address the object data (unlike class
methods):
package Point;
use strict;
sub polar_coordinates {
    my ($point) = @_;
    my $x = $point->{x};
    my $y = $point->{y};
    return ( sqrt( $x * $x + $y * $y ), atan2( $y, $x ) );
}
1;
The same method can handle both cases and detect whether it was called as a class method or as an object method:
sub universal_method {
    my $self = shift;
    if ( ref $self ) {
        # object logic
        ...
    }
    else {
        # class logic
        ...
    }
}
A role may also require consuming classes to implement some methods instead of implementing the methods itself
(just like interfaces in Java or C#).
Perl does not have built-in support for roles but there are CPAN classes which provide such support.
Moose::Role
package Chatty;
use Moose::Role;

requires 'introduce';    # consuming classes must implement this method
                         # (assumed; the original role body is not shown)
package Parrot;
use Moose;
with 'Chatty';
sub introduce {
print "I'm Buddy.\n";
}
Role::Tiny
Use if your OO system does not provide support for roles (e.g. Class::Accessor or Class::Tiny). Does not support
attributes.
package Chatty;
use Role::Tiny;

requires 'introduce';    # consuming classes must implement this method
                         # (assumed; the original role body is not shown)
package Parrot;
use Class::Tiny;
use Role::Tiny::With;
with 'Chatty';
sub introduce {
print "I'm Buddy.\n";
}
my $ret;
eval {
    $ret = some_function_that_might_die();
    1;
} or do {
    my $eval_error = $@ || "Zombie error!";
    handle_error($eval_error);
};
# use $ret
We "abuse" the fact that die has a false return value, and the return value of the overall code block is the value of
the last expression in the code block:
if $ret is assigned to successfully, then the 1; expression is the last thing that happens in the eval code
block. The eval code block thus has a true value, so the or do block does not run.
if some_function_that_might_die() does die, then the last thing that happens in the eval code block is the
die. The eval code block thus has a false value and the or do block does run.
The first thing you must do in the or do block is read $@. This global variable will hold whatever argument was
passed to die. The || "Zombie Error" guard is popular, but unnecessary in the general case.
This is important to understand because not all code fails by calling die, but the same structure can be used
regardless. Consider, for example, a database function that simply returns a false value on failure. In that case you
can still use the same idiom, but you have to skip the final 1;, and this function has to be the last thing in the
eval. Something like this:
eval {
    my $value = My::Database::retrieve($my_thing);    # dies on fail
    $value->set_status("Completed");
    $value->set_completed_timestamp(time());
    $value->update();    # returns false value on fail
} or do {    # handles both the die and the 0 return value
    my $eval_error = $@ || "Zombie error!";
    handle_error($eval_error);
};
/^hello/ is the actual regular expression. The ^ is a special character that tells the regular expression to start with
the beginning of the string and not match in the middle somewhere. Then the regex tries to find the following
letters in order h, e, l, l, and o.
$_ = "hello world";
You can also use different delimiters if you precede the regular expression with the m operator:
m~^hello~;
m{^hello};
m|^hello|;
echo "1,2,[3,4,5],5,6,[7,8],[1,2,34],5" | \
perl -ne \
'while( /\[[^,\]]+\,.*\]/ ){
    if( /\[([^\]\|]+)\]/ ){
        $text = $1;
        $text_to_replace = $text;
        $text =~ s/\,/\|/g;
        s/$text_to_replace/$text/;
    }
}
print;'
1,2,[3|4|5],5,6,[7|8],[1|2|34],5
my $str = "hello.it's.me";
my @test = (
"hello.it's.me",
"hello/it's!me",
);
my @match = (
[ general_match=> sub { ismatched /$str/ } ],
[ qe_match => sub { ismatched /\Q$str\E/ } ],
);
for (@test) {
print "String = '$_':\n";
Output
String = 'hello.it's.me':
- general_match: MATCHED!
- qe_match: MATCHED!
String = 'hello/it's!me':
- general_match: MATCHED!
- qe_match: DID NOT MATCH!