Learning Perl
Randal L. Schwartz, [email protected]
Version 4.1.2 on 27 Nov 2010

This document is copyright 2010 by Randal L. Schwartz, Stonehenge Consulting Services, Inc.

1
Overview

• Anything you miss is in the book
• Including more examples
• Some “rounding off the corners” in here
• Book contains footnotes

2
History
• Created by Larry Wall in 1987
• Version 4 released in 1991, along with Camel book
• And sysadmins rejoiced!
• Version 5 released in 1995 (complete internal rewrite)
• Helped propel emerging “www” into interactivity
• Version 6 proposed in 2001 (in progress)
• Far from dead
• Continual development on new 5.x releases
• More CPAN uploads than ever
• Perl jobs listings strong

3
Philosophy
• Easy to solve easy problems (“scripting”)
• Possible to solve hard problems (“programming”)
• Many ways to say the same thing
• But with different emphasis
• Shortcuts for common operations
• Why say “can not” when you can say “can’t”?
• No reserved words
• Prefix characters distinguish data from built-ins

4
Community
• perl.com and perl.org
• Mailing lists at lists.perl.org
• Perlmonks
• StackOverflow
• IRC channels (including entire IRC network)
• Local user groups (“Perl Mongers”) listed at www.pm.org
• Conferences (OSCON and smaller regionals)
• CPAN (module sharing unlike any other project)

5
Program Syntax
• Plain text file
• Whitespace mostly insignificant
• Most statements delimited by semicolons (not newline!)
• Comment is pound-sign to end of line
• Unix execution typical of scripting languages:
#!/usr/bin/perl
print "Hello world!\n";

6
Scalars

7
Numbers
• Numbers are always floating point
• Numbers represented sanely:
2 3.5 -5.01 4.25e15
• Numbers can be non-decimal:
0777 0xFEEDBEEF 0b11001001
• Numbers take traditional operators and parens:
2 + 3, 2 * 5, 3 * (4 + 5)

8
Strings

• Sequence of characters (any bytes, any length)
• String literals can be single quoted:
'foo' 'longer string' 'don\'t leave home without it!'
• Double-quoted strings get backslash escapes:
"hello\n" "coke\tsprite"
• Double-quoted strings also interpolate variables (later)

9
String operators

• Concatenation:
'Fred' . ' ' . 'Flintstone'
• Replication:
'hello' x 3
'hello' x 3.5

10
Scalar conversion
• Numbers can be used as strings:
(13 + 4) . ' monkeys'
• Strings can be used as numbers:
'12' / 3
'12fred' / 3
'fred' / 3
• Trailing garbage ignored, completely garbage = 0
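The conversion rules above can be checked with a small sketch. The helper name numify is made up for this illustration; it is not a Perl built-in.

```perl
use strict;
use warnings;

# numify() is a hypothetical helper: it returns the numeric value Perl
# would use for a string in arithmetic.
sub numify {
    my ($text) = @_;
    no warnings 'numeric';    # hide the "isn't numeric" warning for this demo only
    return $text + 0;
}

my $clean   = numify('12') / 3;        # 4
my $garbage = numify('12fred') / 3;    # 4: trailing "fred" is ignored
my $none    = numify('fred') / 3;      # 0: no leading number at all
my $string  = (13 + 4) . ' monkeys';   # number used as a string

print "$clean $garbage $none\n";
print "$string\n";
```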

11
Warnings
• Add to your program:
use warnings;
• Or on shebang line:
#!/usr/bin/perl -w
• Questionable actions noted
• Doesn’t change execution behavior though
• Example:
"12fred" + 15 # warning about not strictly numeric
• Even more info:
use diagnostics;

12
Scalar variables
• Dollar plus one or more letters, digits, underscore:
$a $b $fred $FRED
• Case matters
• Lowercase for locals, capital for globals
• Perl doesn’t care—just a convention
• Typically get value through assignment:
$a = 3;
$b = $a + 2;
$b = $b * 4; # same as "$b *= 4"

13
Simple output

• print takes a comma-separated list of values:
$x = 3 + 4;
print "Hello, the answer is ", $x, "\n";
• Double-quoted strings are variable interpolated:
print "Hello, the answer is $x\n";

14
Comparison operators
• Numeric comparison:
== != < > <= >=
• String comparison:
eq ne lt gt le ge
• Needed to distinguish:
'12' lt '3'
'12' < '3'
• Use math symbols with numbers, words for words
• Common mistake:
$x == 'foo'
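A short sketch of why the two operator families exist, and of what the common mistake actually does. Variable names are illustrative only.

```perl
use strict;
use warnings;

my ($left, $right) = ('12', '3');

my $stringwise = ($left lt $right) ? 1 : 0;   # true: "1" sorts before "3"
my $numeric    = ($left <  $right) ? 1 : 0;   # false: 12 is not less than 3

# The common mistake: == compares numbers, and 'foo' counts as 0.
my $x = 0;
my $mistake;
{
    no warnings 'numeric';                    # "use warnings" would flag this line
    $mistake = ($x == 'foo') ? 1 : 0;         # true, because 0 == 0
}
my $intended = ($x eq 'foo') ? 1 : 0;         # false: the strings differ

print "lt=$stringwise <=$numeric ===$mistake eq=$intended\n";
```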

15
Simple if
• Choice based on a comparison:
if ($x > 3) {
print "x is greater than 3\n";
} else {
print "x is less than or equal to 3\n";
}
• “else” part is optional
• Braces are required (called “blocks”)
• Prevents the “dangling else” issue from C

16
Boolean logic
• Comparisons can be stored:
$is_bigger = $x > 3;
if ($is_bigger) { ... }
• What’s in $is_bigger? “true” or “false”
• False: 0, empty string, '0', or undef (described later)
• True: everything else
• Can’t count on specific return value for “true”
• It'll be some true value though
• Use “!” for not:
if (! $is_bigger) { ... }
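The false values listed above are the only ones; a few look-alikes are true. A minimal sketch, with truth() as a made-up helper name:

```perl
use strict;
use warnings;

# truth() is a hypothetical helper that reports how a value behaves
# in boolean context.
sub truth {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

# The first four are the false values; the rest are true,
# including the strings '0.0' and '00'.
my @samples = (0, '', '0', undef, 1, -1, '0.0', '00', ' ');
foreach my $value (@samples) {
    my $shown = defined $value ? "'$value'" : 'undef';
    print "$shown is ", truth($value), "\n";
}
```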

17
Simple user input
• Use the line-input function:
$x = <STDIN>;
• Includes the newline, typically removed:
chomp($x);
• Combine those two:
chomp($x = <STDIN>);
• Parens are needed on that

18
Simple loops
• Use “while”:
$count = 0;
while ($count < 10) {
$count += 2;
print "count is now $count\n"; # gives 2 4 6 8 10
}
• Controlled by boolean expression
• Braces required
• May execute 0 times if boolean is initially false

19
The undef value
• Initial value for all variables
• Treated like 0 for math, empty string for strings
• Great for looping accumulators:
$n = 1;
while ($n < 10) {
$sum += $n; # undef initially acts like 0
$n += 2;
}
print "The total was $sum.\n";

20
Detecting undef
• Detect undef with defined()
• Often useful with “error” returns:
$line = <STDIN>; # might be at eof
if (defined $line) {
print "got a line, it was $line";
} else {
print "at eof!\n";
}

21
Lists and Arrays

22
List element access
• Use element access to initialize a list:
$fred[0] = "yabba";
$fred[1] = "dabba";
$fred[2] = "doo";
• This is not $fred! Different namespace!
• Use these like individual scalar variables:
print "$fred[0]\n";
$fred[1] .= "whatsis";

23
Computing the element
• Subscript expression can be any integer-yielding value:
$n = 2;
print $fred[$n]; # prints $fred[2]
print $fred[$n / 3]; # truncates 2/3 to 0, thus $fred[0]
• Maximum element index available:
$max = $#fred; # $max = 2
$last = $fred[$#fred]; # always last element

24
Going out of bounds
• Out-of-bound subscripts return undef:
defined $fred[17]; # false
• Assigning beyond end of array just stretches it:
$fred[17] = "hello";
• Negative subscripts count back from end:
$fred[-1] # last element
$fred[-2] # second to last element
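A sketch of these three behaviors together. It assigns to index 5 rather than 17 to keep the numbers small.

```perl
use strict;
use warnings;

my @fred = qw(yabba dabba doo);

# Reading past the end returns undef and does not grow the array.
my $before     = defined $fred[17] ? 'defined' : 'undef';
my $max_before = $#fred;                                  # 2

# Assigning past the end stretches the array; slots 3 and 4 hold undef.
$fred[5] = "hello";
my $max_after = $#fred;                                   # 5
my $gap       = defined $fred[4] ? 'defined' : 'undef';

print "before: $before, last index $max_before\n";
print "after: last index $max_after, slot 4 is $gap, last is $fred[-1]\n";
```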

25
List literals
• List of scalars, separated by commas, enclosed in parens:
(1, 2, 3) # same as (1, 2, 3, )
("fred", 4.5)
( ) # empty list
(1..100) # same as (1, 2, 3, ... up to ... , 98, 99, 100)
(0..$#fred) # all indices
• Quoted words:
qw(fred barney betty wilma)
qw[fred barney betty wilma]
• Works with “paired” delimiters
• Or duplicate non-paired delimiters:
qw|foo bar|

26
List assignment

• Corresponding values are copied:
($fred, $barney, $dino) = ("flintstone", "rubble", undef);
• Too short on right? Extras get undef:
($fred, $barney, $dino) = qw(flintstone rubble);
• Too short on left? Extras are ignored

27
“All of the” shortcut
• Imagine assigning to consecutive array elements:
($rocks[0], $rocks[1], $rocks[2], $rocks[3]) =
qw(talc mica feldspar quartz);
• Simpler: “all of the”
@rocks = qw(talc mica feldspar quartz);
• Previous value is always completely erased
• Can also be used on right side of assignment:
($a, $b, $c, $d) = @rocks;
@his_rocks = @rocks;
@her_rocks = ("diamond", @rocks, "emerald");

28
Array operations
• Remove end of array:
@numbers = (1..10);
$final = pop @numbers; # $final gets 10
• Add to end of array:
push @numbers, 10..15;
• Add to beginning of array:
unshift @numbers, -10..0;
• Remove from beginning of array:
$minus_ten = shift @numbers;
• pop and shift are destructive, removing single element
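Running the four operations in sequence shows how the array changes at each step. This sketch uses the same values as the bullets above.

```perl
use strict;
use warnings;

my @numbers = (1..10);
my $final = pop @numbers;          # 10; @numbers is now 1..9
push @numbers, 10..15;             # @numbers is now 1..15
unshift @numbers, -10..0;          # @numbers is now -10..15
my $minus_ten = shift @numbers;    # -10; @numbers is now -9..15

my $count = @numbers;              # scalar assignment gives the element count
print "final=$final first=$minus_ten count=$count\n";
```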

29
Array interpolation
• Single elements act like scalars:
@rocks = qw(flintstone slate rubble);
print "barney $rocks[2]\n"; # barney rubble\n
• “all of the” inserts spaces:
print "got @rocks\n"; # got flintstone slate rubble\n
• Beware email addresses in double quotes:
print "My email address is [email protected]\n";
• Precede @ with \ to avoid interpolation

30
Using foreach
• Simplest way to walk a list:
foreach $rock (qw(bedrock slate lava)) {
print "One rock is $rock\n";
}
• $rock is set to each element in the list in turn
• Any outer $rock is unaffected
• Assigning to $rock affects the element:
@rocks = qw(bedrock slate lava);
foreach $rock (@rocks) { $rock = "hard $rock" }
• Leaving variable off uses $_
foreach (@rocks) { $_ = "hard $_" }
• Aside: $_ is often the default for many operations

31
List operations
• Reverse:
@rocks = qw(bedrock slate rubble granite);
@reversed = reverse @rocks;
@rocks = reverse @rocks;
• Sort (stringwise):
@rocks = sort @rocks;
@rocks = reverse sort @rocks;
• Default sort is not numeric:
@result = sort 97..102; # 100, 101, 102, 97, 98, 99
• Numeric and other user-defined sorts covered later

32
Scalar and list context
• Important to recognize when Perl needs which:
42 + something # looking for a scalar
sort something # looking for a list
• ... because some things return different values
@people = qw(fred barney betty);
@sorted = sort @people; # @people returns elements
$number = 42 + @people; # @people returns 3 (count)
• Even assignment itself has context:
@copy = @people; # list assignment (elements)
$count = @people; # scalar assignment (count)
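The same array gives different results depending on context, as this sketch shows. The last line is an addition to the examples above: parentheses on the left side make it a list assignment.

```perl
use strict;
use warnings;

my @people = qw(fred barney betty);

my @sorted  = sort @people;    # list context: the elements
my $number  = 42 + @people;    # scalar context: 42 + 3
my $count   = @people;         # scalar assignment: 3
my ($first) = @people;         # list assignment: first element, rest ignored

print "@sorted | $number | $count | $first\n";
```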

33
List context <STDIN>
• Scalar context: one line at a time, undef at EOF
• List context: all remaining lines:
@lines = <STDIN>;
• Kill those newlines:
chomp(@lines = <STDIN>);
• Once read, we’re at EOF
• No more use of <STDIN> in that invocation
• Makes an entire list in memory
• Not good for 4GB web logs

34
Subroutines

35
Define and invoke
• Define with “sub”:
sub marine {
$n += 1;
print "Hello, sailor number $n!\n";
}
• Invoke in an expression with & in front of name:
&marine; # Hello, sailor number 1!
&marine; # Hello, sailor number 2!

36
Return values
• Last expression evaluated is return value:
sub add_a_to_b {
print "hey, I was invoked!\n";
$a + $b;
}
$a = 3; $b = 4; $c = &add_a_to_b;
• Not necessarily textually last:
sub bigger_of_a_or_b {
if ($a > $b) { $a } else { $b }
}

37
Arguments
• Values passed in parens get assigned to @_
$n = &max(10, 15);
sub max {
if ($_[0] > $_[1]) { $_[0] } else { $_[1] }
}
• Note that @_ has nothing to do with $_
• @_ is automatically local to the subroutine
• Perl doesn’t care if you pass too many or too few

38
Private sub vars
• Use “my” to create your own local vars:
sub max {
my ($x, $y);
($x, $y) = @_; # copy args to $x, $y
if ($x > $y) { $x } else { $y }
}
• Variables declared in my() are local to block
• Steps can be combined:
my($x, $y) = @_;
• Typical first line of a subroutine
• Or maybe:
my $x = shift; my $y = shift;

39
Variable arg lists
• Just respond to all of @_:
sub max {
my $best = shift;
foreach $next (@_) {
if ($next > $best) { $best = $next }
}
$best;
}
• Now it works for 15 args, 2 args, 1 arg, even 0 args!
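A quick check of that claim, using the same max with a lexical loop variable so that it also runs under use strict. With no arguments the result is undef, because shift on an empty @_ returns undef.

```perl
use strict;
use warnings;

sub max {
    my $best = shift;                   # undef when called with no arguments
    foreach my $next (@_) {
        if ($next > $best) { $best = $next }
    }
    return $best;
}

my $many = max(3, 9, 2, 7);             # 9
my $one  = max(7);                      # 7
my $none = max();                       # undef
print "many=$many one=$one none=", (defined $none ? $none : 'undef'), "\n";
```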

40
“my” not just for subs
• Lexical vars (introduced with my()) work in any block
foreach (1..10) {
my $square = $_ * $_;
print "$_ squared is $square\n";
}
• Or any file:
my $value = 10;
...
print $value; # 10
• But why use that last one?

41
“use strict”
• Ensure declared variable names:
use strict; # at top of file
• Now all user-defined vars must be “introduced”
my $x = 3;
$x; # ok
$y; # not ok... no "my" in scope
{ my $z;
$z; # ok
}
$z; # not ok (out of scope)
• Helps catch typos
• Use in every program over 10 lines

42
Early subroutine exit
• Break out of a subroutine with “return”
• Sets the return value as a side effect
my @names = qw(fred barney betty dino wilma);
my $result = &which_element_is("dino", @names);
sub which_element_is {
my($what, @array) = @_;
foreach (0..$#array) { # indices of @array's elements
if ($what eq $array[$_]) { return $_ }
}
-1; # not found
}
• Most legacy code has no explicit returns

43
Persistent but local
• Back to first example:
sub marine {
state $n = 0; # initial value
$n += 1;
print "Hello, sailor number $n!\n";
}
• Introduced in 5.10
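A runnable sketch of the state version. It returns the greeting instead of printing it, purely to make the result easy to inspect; state needs either use feature 'state' or use 5.010.

```perl
use strict;
use warnings;
use feature 'state';    # "use 5.010;" would also enable it

sub marine {
    state $n = 0;       # initialized once, then kept between calls
    $n += 1;
    return "Hello, sailor number $n!";
}

my @greetings = (marine(), marine());
print "$_\n" foreach @greetings;
```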

44
Simple I/O

45
Reading to EOF
• Reading a line:
chomp(my $line = <STDIN>);
• Reading all lines, one at a time:
while (defined(my $line = <STDIN>)) {
# ... do something with $line ...
}
• Shortcut for defined($_ = <STDIN>)
while (<STDIN>) { ... }
• Very common
• Don’t confuse with:
foreach (<STDIN>) { ... }

46
Filters
• Act like a Unix filter
• Read from files named on command line
• Write to standard output
• Any “-” arg, or no args at all: read standard input
• Perl uses <> (diamond) for this:
while (<>) {
# one line is in $_
}
• Invoke like a filter:
./myprogram fred barney betty

47
@ARGV
• Really a two step process
• Args copied into @ARGV at startup
• <> looks at current @ARGV
• Common to alter @ARGV before diamond:
if ($ARGV[0] eq "-v") {
$verbose = 1; shift;
}
while (<>) { ... } # now process lines

48
Formatting Output
• Use printf:
printf "hello %s, your password expires in %d days!\n",
$user, $expires;
• First arg defines constant text, and denotes parameters
• Remaining args provide parameter values
• All the standard conversions:
• s(tring), f(loat), g(eneral), d(ecimal), e(xponential)
• Conversions are generally formed as:
% [-] [width] [.precision] type
• Negative width is left-justify; missing width is “minimal”
• Double up the % to get a literal one: %%

49
Filehandles
• Name for input or output connection
• No prefix character
• STDIN, STDOUT, STDERR, DATA, ARGV, ARGVOUT
• Open to connect:
open MYHANDLE, "<", "inputfile";
• Direction can be:
"<" for read, ">" for write, ">>" for append
• Returns success:
my $opened = open ...;
if (! $opened) { ... }
• Close to disconnect:
close MYHANDLE;

50
die and warn
• What if opening is essential?
• Use die to abort early:
if (! $opened) { die "cannot open inputfile: $!"; }
• Sends diagnostic to Standard Error
• Adds filename and line number of error
• Use $! to indicate text of system-related error
• Not enough for fatal? Just warn:
if ($n < 100) { warn "needed 100 units!\n" }

51
Using filehandles
• Use line-input operator with your handle name:
my $one_line = <MYHANDLE>;
while (<MYHANDLE>) { ... }
• Add handle to print or printf:
print THATHANDLE "hello, world!\n";
printf STDERR "%s: %s\n", $0, $reason;
• Default filehandle is STDOUT unless you select one:
my $old = select MYHANDLE;
print "this goes to MYHANDLE\n";
select $old; # restore it

52
say
• Introduced in 5.10 to make things easier:
# in 5.8 and earlier:
print "hello, world!\n";
# in 5.10 and later:
use 5.010;
say "hello, world!";
• Newline automatically added
• Saves at least 4 characters of typing!

53
Hashes

54
Overview
• Mapping from key to value
• Keys are unique strings
• Values are any scalar (including undef)
• Useful for:
• Aggregating data against many items
• Mapping things from one domain to another
• Guaranteeing uniqueness
• Scales well, even for large datasets

55
Hash element access
• Like array element, but with {} instead of []
$family_name{"fred"} = "flintstone";
$family_name{"barney"} = "rubble";
• Access the elements:
foreach my $first_name (qw(fred barney)) {
my $family_name = $family_name{$first_name};
print "That's $first_name $family_name for ya!\n";
}

56
“All of the” hash
• Use % for hash like @ for array:
my %family_name = qw(fred flintstone barney rubble);
• Key/value pairs initialize elements in the hash
• Unwind in a list context:
my @values = %family_name;
my %new_hash = %family_name; # copy
my %given_name = reverse %family_name;
• Use “big arrow” for clarity:
my %family_name = (
fred => "flintstone", barney => "rubble",
"bamm-bamm" => "rubble", dino => undef,
);
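A sketch of the unwind-and-reverse trick. It uses only two entries on purpose: if two keys share a value (as barney and bamm-bamm do above), only one of them survives the reversal, and which one is not predictable.

```perl
use strict;
use warnings;

my %family_name = (fred => 'flintstone', barney => 'rubble');

# In list context the hash unwinds to key/value pairs; reverse flips
# them into value/key pairs, which build the inverse hash.
my %given_name = reverse %family_name;

foreach my $family (sort keys %given_name) {
    print "$family => $given_name{$family}\n";
}
```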

57
Hash operations
• keys/values access... the keys and values:
my %data = (a => 1, b => 2, c => 3);
my @keys = keys %data; # some permutation of (a, b, c)
my @values = values %data; # corresponding (1, 2, 3)
• Don’t change hash between keys and values!
• Order is unpredictable (like unwinding in list context)
• Thus, often combined with sort:
foreach my $key (sort keys %data) {
my $value = $data{$key};
# ... do something with $key and $value ...
}

58
each(%hash)
• Efficient walking of a hash
• Each call to each() returns the “next” key/value pair
• If all pairs are used up, returns empty list
• Typically used in loop:
while (my($key, $value) = each %somehash) { ... }
• Note list assignment in scalar (boolean) context here
• Internal order again (like flatten or keys or values)

59
Typical hash usage
• Bedrock library tracks numbers of books checked out
• Key in hash = “has a library card”
• Value in hash = “number of books”
• Could be undef (never used card)
• Has some books checked out?
if ($books{"fred"}) { ... }
• Has used their card?
if (defined $books{"barney"}) { ... }

60
exists and delete
• How do we say “has a card”?
• A lookup gives undef both for no card and for an unused card
• Use exists:
if (exists $books{"dino"}) { ... }
• True if any key matches selected key
• Revoke library card with delete:
delete $books{"slate"};
• Removes key if it exists
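The three tests (truth, defined, exists) separate four situations in the library example. A sketch, with card_status as a made-up helper name and invented sample data:

```perl
use strict;
use warnings;

# Sample data for illustration: wilma has a card but never used it.
my %books = (fred => 3, barney => 0, wilma => undef);

# card_status() is a hypothetical helper for this example.
sub card_status {
    my ($who) = @_;
    return 'no card'         unless exists  $books{$who};
    return 'card never used' unless defined $books{$who};
    return 'no books out'    unless $books{$who};
    return "$books{$who} books out";
}

print "$_: ", card_status($_), "\n" foreach qw(fred barney wilma dino);

delete $books{fred};                            # revoke the card
print "fred: ", card_status('fred'), "\n";      # now reports no card
```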

61
Counting things
• Count the words:
my %count;
while (<>) {
chomp;
$count{$_} += 1;
}
foreach $word (sort keys %count) {
print "$word was seen $count{$word} times\n";
}
• $count{$_} += 1 creates the initial entry automatically
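The same counting idiom, fed from a fixed word list instead of the diamond operator so the result is easy to check. The sub name count_words is an assumption for this sketch.

```perl
use strict;
use warnings;

# count_words() is a hypothetical wrapper around the idiom above.
sub count_words {
    my %count;
    $count{$_} += 1 foreach @_;    # first += on a missing key starts from undef (0)
    return %count;                 # unwinds to key/value pairs
}

my %count = count_words(qw(fred barney fred dino fred barney));
foreach my $word (sort keys %count) {
    print "$word was seen $count{$word} times\n";
}
```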

62
The %ENV

• Your %ENV reflects your process environment:
foreach my $key (sort keys %ENV) {
print "$key=$ENV{$key}\n";
}
• Setting values affects child processes:
$ENV{PATH} .= ":/some/additional/place";

63
Regular Expressions

64
Overview
• Patterns that match strings
• Many examples in Unix
grep 'flint.*stone' somefile
• Not a filename match (“glob”)
• Most common use in Perl:
$_ = "yabba dabba do";
if (/abba/) { print "It matched!\n" }

65
Metacharacters
• Most characters match themselves
• /i/ matches “i”, and /2/ matches “2”
• Period matches any single character except newline
• /bet.y/ matches “betty” “betsy” “bet.y”
• Regular expressions understand double-quote things
• /coke\tsprite/
• Backslash also removes specialness
• /3\.14159/

66
Simple quantifiers
• Star means “0 or more” of preceding item
• /fred\t*barney/
• /fred.*barney/
• Plus means “1 or more” of preceding item
• /fred\t+barney/
• Question mark means “0 or 1” of preceding item
• /bamm-?bamm/

67
Parens in patterns
• Quantifiers apply to smallest item before them
• /fred+/ matches “freddddd” not “fredfred”
• Use parens to group items
• /(fred)+/ matches “fredfred” not “fredddd”
• Parens also establish backreferences
• \1 matches another copy of the first paren pair:
/(.)\1/ # same character doubled up
• Multiple backreferences permitted:
/(.)(.)\2\1/ # matches “abba” “acca” “bddb” “aaaa”
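Both backreference patterns can be tried against a few strings. The sub names here are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns above.
sub has_double { my ($text) = @_; return $text =~ /(.)\1/       ? 1 : 0 }
sub has_abba   { my ($text) = @_; return $text =~ /(.)(.)\2\1/  ? 1 : 0 }

foreach my $text ("bookkeeper", "fred", "yabba dabba", "abab", "aaaa") {
    printf "%-12s double=%d abba=%d\n",
        $text, has_double($text), has_abba($text);
}
```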

68
Relative backreferences

• Introduced in 5.10
• \g{N} counts from beginning if N is positive
• But counts relatively backwards if N is negative
• Generalizing the “abba” pattern:
• /.... (.)(.)\g{-1}\g{-2} ..../
• Other parens ahead of this won’t break it

69
Alternatives
• Vertical bar gives choice:
/fred|barney|betty/
• Low precedence
• Often used with parens:
/fred (and|or) barney/
• Careful! These aren’t the same:
/fred( +|\t+)barney/ # locked in spaces or tabs
/fred( |\t)+barney/ # any combination of spaces and tabs
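The difference only shows up when spaces and tabs are mixed. A sketch with made-up sub names:

```perl
use strict;
use warnings;

# Hypothetical helpers for the two patterns above.
sub locked_in { my ($text) = @_; return $text =~ /fred( +|\t+)barney/ ? 1 : 0 }
sub any_mix   { my ($text) = @_; return $text =~ /fred( |\t)+barney/  ? 1 : 0 }

my $spaces = "fred   barney";     # three spaces
my $mixed  = "fred \tbarney";     # one space, then one tab

print "spaces: locked=", locked_in($spaces), " mix=", any_mix($spaces), "\n";
print "mixed:  locked=", locked_in($mixed),  " mix=", any_mix($mixed),  "\n";
```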

70
Character classes
• List of characters delimited by []
• Matches only one character in string
• Often used with quantifiers
• List every character (order doesn’t matter):
[abcwxyz]
• Use “-” for ranges:
[abcw-z]
[a-zA-Z0-9]
• Negate with initial caret:
[^0-9] # everything except digits
[^\n] # everything except newline, same as “.”

71
Class shortcuts
• [0-9] is same as \d (“digits”)
/HAL-\d+/
• [^0-9] is same as \D
• [a-zA-Z0-9_] is same as \w (“word” characters)
• Often used as /\w+/
• And \W is the opposite of that
• [\f\t\n\r ] is the same as \s (“space”)
• \S is non-space
• Frequently used for parsing:
/(\S+)\s+(\S+)/ # two data items separated by whitespace
• Perl 5.10 adds:
\h for [\t ], \v for [\f\n\r], \R for “linebreak”

72
Using Regular Expressions

73
Alternate delimiters
• /foo/ is actually m/foo/
• But “m” can take other delimiters like qw():
m(hello) m[fred|barney] m!this!
• Balanced delimiters nest properly:
m(fred (and|or) barney)
• Otherwise, you can always backslash the terminator:
m!foo\!bar!
• Use alternate delimiters to avoid escaping forward slash:
/http:\/\// # works, but ugly
m{http://} # much nicer

74
Modifiers
• Case insensitive matching:
print "Would you like to play a game? ";
chomp($_ = <STDIN>);
if (/yes/i) { # case-insensitive match
print "In that case, I recommend that you go bowling.\n";
}
• Match newlines with period:
/Barney.*Fred/s
• Combine the modifiers in any order
/yes.*no/is or /yes.*no/si

75
More readable regex
• Add “x” modifier to ignore most whitespace:
/-?\d+\.?\d*/ # what is this
/ -? \d+ \.? \d* /x # a little better
• Spaces and tabs no longer match themselves
• Unless escaped or inside a character class
• More common to use \s+ instead
• Pound-sign to end of line is also a comment:
/ -? # optional prefix
\d+ # some digits
\.? # optional period
\d* # optional digits
/x # closing of regex with “x”

76
Anchors
• In absence of anchors, regex float from left to right
• Caret anchors to beginning:
/^fred/ # only match fred at beginning of string
• Dollar anchors to end:
/fred$/ # only match fred at end of string
• Use both to ensure entire string is matched:
/^fred$/
• Common mistake: validation without anchor:
if (/\d+/) { # only digits allowed
• Wrong! Allows “foo34bar”
• Better: if (/\D/) { # has a non digit, fail
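The anchored form from the earlier bullet is another way to validate. A sketch comparing it with the unanchored mistake; the sub names are assumptions. Note that $ also allows a single trailing newline, so unchomped input still passes.

```perl
use strict;
use warnings;

# Hypothetical helpers: unanchored (the mistake) and anchored validation.
sub has_digits  { my ($text) = @_; return $text =~ /\d+/   ? 1 : 0 }
sub only_digits { my ($text) = @_; return $text =~ /^\d+$/ ? 1 : 0 }

foreach my $text ("1234", "foo34bar", "") {
    print "'$text': has=", has_digits($text),
          " only=", only_digits($text), "\n";
}
```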

77
Word boundaries
• Match at edge (beginning or ending) of word with “\b”:
/\bfred\b/ # matches fred but not frederick or manfred
• Words are defined as things that match /\w+/
• Thus, in “That's a word boundary!”, the words are:
That s a word boundary
• Note the “s” is by itself as a separate word
• This means /\bcan\b/ will match in “can’t stop it!”
• Also available: “not word boundary” /\B/
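A sketch that tries the boundary patterns against the cases mentioned above. The apostrophe is not a word character, which is why "can" counts as a whole word in "can't".

```perl
use strict;
use warnings;

# Hypothetical helper: does $word appear as a whole word in $text?
# $word is assumed to contain only word characters.
sub has_word {
    my ($word, $text) = @_;
    return $text =~ /\b$word\b/ ? 1 : 0;
}

print has_word('fred', 'fred and barney'), "\n";   # whole word
print has_word('fred', 'frederick'), "\n";         # prefix only
print has_word('fred', 'manfred'), "\n";           # suffix only
print has_word('can',  "can't stop it!"), "\n";    # apostrophe ends the word
```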

78
Binding operator

• So far, matches are against $_
• Use =~ to bind to another value instead:
my $some_other = <STDIN>;
if ($some_other =~ /rubble/) { ... }
• Note this isn’t an assignment
• Merely says “don’t look at $_, look over here”

79
Interpolating patterns
• Sometimes, pieces of the regex come from operations:
my $what = "Larry";
while (<>) {
if (/^($what)/) {
print "We saw $what in beginning of $_\n";
}
}
• The value “Larry” becomes part of the regex
• Replace first line to get pattern from arguments:
my $what = shift;
• Pattern is parenthesized to permit “fred|barney”
• Ill-formed patterns are fatal exceptions

80
Match variables
• Backreferences are available after a successful match
• \1 in the regex maps to $1 as a read-only variable:
$_ = "Hello there, neighbor";
if (/\s(\w+),/) {
print "the word was $1\n"; # the word was there
}
if (/(\S+) (\S+), (\S+)/) {
print "words were $1 $2 $3\n";
}
• Failed matches do not reset the memories
• Always check success first!

81
Noncapturing parens

• Parens used for precedence also trigger memory:
if (/(\S+) (and|or) (\S+)/) { ... } # value in $1 and $3
• To avoid triggering memory, change (...) to (?:...):
if (/(\S+) (?:and|or) (\S+)/) { ... } # value in $1 and $2
• Especially handy when accessed from a distance
• But see also “named captures”...

82
Named captures
• New in 5.10
• Instead of mapping to $1, $2, $3, map to $+{name}:
if (/(?<name1>\S+) (?:and|or) (?<name2>\S+)/) {
print "$+{name1} and $+{name2}\n";
}
• Now it’s relatively safe from maintenance trouble
• Names can even be “out of order”
• And \g{name1} is a backreference to (?<name1>...)

83
Automatic Match Vars
• A regex matching a string divides the string into 3 parts:
• The part that actually matched (available in $&)
• The part before the part that matched (available in $`)
• The part after the part that matched (available in $')
• Example test harness:
my $regex = "Fred|Barney";
my $text = "I saw Barney with Fred today!";
if ($text =~ $regex) {
print "($`)($&)($')\n"; # delimits pieces with parens
}

84
Generalized quantifiers
• Using *, +, and ? for quantifiers frequently covers it
• Sometimes, you want “5 through 15” though:
/a{5,15}/
• Two numbers in braces indicate lower and upper bound
• A single number means “exactly that many”:
• /c{7}/
• /,{5}chameleon/
• Single number followed by comma: “that many or more”:
/(fred){3,}/
• Thus, * is {0,} and + is {1,} and ? is {0,1}
• But if you type those instead, people look funny at you

85
Regex precedence
• In precedence from highest to lowest:
(...) (?:...) (?<label>...) # parens for grouping/backrefs
a* a+ a? a{n} a{n,m} # quantifiers
abc ^a a$ # anchors and sequence
a|b|c # alternation
a [abc] \d \1 # atoms
• Like in math, in absence of parens, highest goes first
• Surprising example:
/^fred|barney$/ # should be /^(fred|barney)$/
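Why the example is surprising: without parens the pattern means "starts with fred, or ends with barney". A sketch with illustrative sub names:

```perl
use strict;
use warnings;

# Hypothetical helpers for the two patterns above.
sub loose  { my ($text) = @_; return $text =~ /^fred|barney$/   ? 1 : 0 }
sub strict { my ($text) = @_; return $text =~ /^(fred|barney)$/ ? 1 : 0 }

foreach my $text ("fred", "barney", "frederick", "not barney") {
    printf "%-12s loose=%d strict=%d\n", $text, loose($text), strict($text);
}
```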

86
Processing with regular expressions

87
Substitutions
• s/old/new/ looks for regex /old/ in $_, replacing with new
$_ = "He's out bowling with Barney tonight.";
s/Barney/Fred/; # Replace Barney with Fred
print "$_\n";
• Nothing happens if match fails
• Regex match triggers normal backreference variables:
s/with (\w+)/against $1's team/;
print "$_\n";
• s/// returns true if replacement succeeds:
$_ = "fred flintstone";
if (s/fred/wilma/) { ... }

88
Global changes
• Normally, first matching wins, and we’re done
• Add “g” option for “global”:
$_ = "home, sweet home!";
s/home/cave/g; # “cave, sweet cave!”
• Transform to canonical whitespace:
s/\s+/ /g;
• Additionally, the “i”, “x”, and “s” options can be used

89
Alternate delimiters
• Just like qw() and m(), we can pick other delimiters
• If a non-paired delimiter, appears three times:
s#^https://#http://#;
• If a paired delimiter, use two pairs:
s{fred}{barney};
s{fred}[barney];
s<fred>#barney#;

90
Case shifting
• Easy to capitalize or lowercase a string
• Uppercase remaining replacement with \U
$_ = "I saw Barney with Fred.";
s/(fred|barney)/\U$1/gi;
• Lowercase with \L
• Case shifting continues until \E or end of replacement:
s/(\w+) with (\w+)/\U$2\E with $1/i;
• Lowercase versions \l and \u affect only next character:
s/(fred|barney)/\u$1/ig;
• Combine them for “initial cap”:
s/(fred|barney)/\u\L$1/ig;
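The results of the substitutions above, as a sketch. It uses the copy-then-modify idiom (my $copy = $text) =~ s/.../.../, which is not shown in the slides, so that each result lands in its own variable.

```perl
use strict;
use warnings;

my $text = "I saw Barney with Fred.";

# Copy $text into a new variable, then run the substitution on the copy.
(my $shout   = $text)  =~ s/(fred|barney)/\U$1/gi;
(my $swap    = $text)  =~ s/(\w+) with (\w+)/\U$2\E with $1/i;
(my $initial = $shout) =~ s/(fred|barney)/\u\L$1/ig;

print "$shout\n";      # both names uppercased
print "$swap\n";       # names swapped, first one uppercased
print "$initial\n";    # back to initial caps
```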

91
The split function
• Break a string by delimiter:
my @fields = split /separator/, $string;
my @things = split /:/, "abc:def:g:h";
• Adjacent delimiter matches create empty values:
my @things = split /:/, "abc:def::g:h";
• .... unless the delimiter matches it all at once:
my @things = split /:+/, "abc:def::g:h";
• Leading empty fields are kept; trailing ones discarded:
my @things = split /:/, ":::a:b:c:::";
• Default args are split on whitespace in $_:
while (<>) { my @words = split; ... }
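The field counts make the rules above concrete. In this sketch the last line passes ' ' explicitly, which is what a bare split does with $_.

```perl
use strict;
use warnings;

my @plain     = split /:/,  "abc:def:g:h";      # 4 fields
my @empty_mid = split /:/,  "abc:def::g:h";     # 5 fields, third is empty
my @collapsed = split /:+/, "abc:def::g:h";     # 4 fields
my @edges     = split /:/,  ":::a:b:c:::";      # 6 fields: 3 leading empties kept
my @words     = split ' ',  "  fred  barney ";  # leading whitespace is skipped

foreach my $fields (\@plain, \@empty_mid, \@collapsed, \@edges, \@words) {
    print scalar(@$fields), ": ", join("|", @$fields), "\n";
}
```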

92
The join function

• Reversing split, but without a regex, just a string:
my $whole = join $delimiter, @pieces;
my $x = join ":", 4, 6, 8, 10, 12;
• Might not need the glue at all:
my @words = qw(Fred);
my $output = join ", ", @words;

93
List-context match
• m// in scalar context returns true/false
• m// in list context returns ordered backreferences:
$_ = "Hello there, neighbor!";
my($first, $second, $third) = /(\S+) (\S+), (\S+)/;
print "$second is my $third\n";
• m//g in list context returns every match:
my @words = /(\w+)/g;
• This is like a split, but you say what to keep, not discard
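A sketch of both list-context forms, bound to a lexical variable with =~ rather than $_:

```perl
use strict;
use warnings;

my $text = "Hello there, neighbor!";

# Without /g: one value per capture group.
my ($first, $second, $third) = $text =~ /(\S+) (\S+), (\S+)/;

# With /g: every match of the capture, left to right.
my @words = $text =~ /(\w+)/g;

print "$second is my $third\n";
print scalar(@words), " words: @words\n";
```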

94
Non-greedy quantifiers
• Quantifiers normally “go long” then “back off”:
"fred and barney weigh more than just barney!"
=~ /fred.*barney/ # matches nearly entire string
• Append “?” to quantifier to make it lazy:
"fred and barney weigh more than just barney!"
=~ /fred.*?barney/ # matches first barney, not last
• Can be more efficient
• Can’t be the default for compatibility reasons
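Capturing the whole match in list context shows exactly how far each version reaches. A minimal sketch:

```perl
use strict;
use warnings;

my $text = "fred and barney weigh more than just barney!";

my ($greedy) = $text =~ /(fred.*barney)/;     # runs to the last barney
my ($lazy)   = $text =~ /(fred.*?barney)/;    # stops at the first barney

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```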

95
Multi-line text

• Typically, matches are against one line
• Could slurp an entire file into a variable:
my $text = join "", <MYHANDLE>; # not efficient
• The ^ and $ anchors bind to entire string by default
• Use “m” option to add “and any embedded line”:
if ($text =~ /^wilma\b/im) { ... }
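The effect of the "m" option on a small multi-line string, as a sketch with invented sample text:

```perl
use strict;
use warnings;

my $text = "Fred\nWilma was here\nBarney\n";

my $whole_string = $text =~ /^wilma\b/i  ? 1 : 0;   # ^ only at start of string
my $any_line     = $text =~ /^wilma\b/im ? 1 : 0;   # ^ also after each newline

print "without m: $whole_string, with m: $any_line\n";
```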

96
Updating many files
• Editing text files “in place” requires a rename dance
• Configure diamond to do this automatically:
@ARGV = qw(my list of files here);
$^I = ".bak"; # appended after opening
while (<>) { # read lines from old version
s/foo/bar/g; # make changes
print; # print to new copy of file
}
• Even do this from the command line:
perl -pi.bak -e 's/foo/bar/g' my list of files here

97
More Control Structures

98
unless
• Reverses “if”
if (! ($n > 10)) { ... }
unless ($n > 10) { ... }
• Useful to avoid that “empty if” strategy:
if ($n > 10) {
# do nothing
} else {
... do this ...
}
• Please don’t use “unless .. else”
• ...forbidden in Perl6 anyway

99
until
• Reverses “while”
while (! ($j > $i) ) {
$j *= 2;
}
until ($j > $i) {
$j *= 2;
}
• Just another way of saying it, sometimes clearer

100
Expression modifiers
• Simplify structures that have single-expression bodies
• Turn them “inside-out”:
if ($n < 0) { print "$n is a negative number.\n" }
print "$n is a negative number.\n" if $n < 0;
• Note the decrease in punctuation
• Doesn’t change execution order: just syntax
• Also works for other kinds:
print "invalid input" unless &valid($input);
$i *= 2 until $i > $j;
print " ", ($n += 2) while $n < 10;
&greet($_) foreach @person;

101
The naked block
• Defines a syntax boundary, useful for lexical (“my”) vars:
{
print "Enter a number: ";
chomp(my $n = <STDIN>);
my $root = sqrt $n; # square root
print "The square root of $n is $root.\n";
}
• Now $root and $n won’t pollute the neighboring code

102
The elsif clause
• Multi-way if statements:
if (first expression) {
... first expression is true ...
} elsif (second expression) {
... second expression is true ...
} elsif (third expression) {
... third expression is true ...
} else {
... none of them were true ...
}
• Only one block will be executed
• Note the spelling: not “elseif” or “else if”. Larry’s rules.

103
Autoincrement
• ++ adds one to a variable:
$a++;
• Can appear before or after the variable
• After means “value is used before increment”
my $old = $n++;
• Before means “value is used after increment”
print "we got here ", ++$n, " times\n";
• Similarly, “--” reduces by one.

104
The for loop
• Like C/Java/JavaScript:
for (initializer; test; increment) {
... body ...
}
• Equivalent:
initializer; while (test) {
... body ...;
increment;
}

105
Using for
• Typically used for computed iterations:
for ($i = 1; $i <= 10; $i++) {
print "I can count to $i!\n";
}
• Note that $i is not local to the loop here
• To make it local, declare it:
for (my $i = 1; ...
• Increment doesn’t have to be 1, or even numeric:
for ($_ = "bedrock"; s/(.)//; ) {
print "one character is $1\n";
}

106
foreach and for
• The foreach loop walks a variable through a list
• The for loop computes a sequence on the fly
• And yet, the words themselves can be interchanged
for $n (1..10) { ... } # foreach loop
foreach ($n = 1; $n <= 10; $n++) { ... } # for loop
• Perl figures out which you mean by syntax
• “for” saves 4 characters when used for “foreach”
• Most Perl loops are typically foreach

107
The last function
• Breaks out of a loop early
• Similar to “break” in C
• Jumps out of one level of loop:
while (<>) {
for (split) {
last if /#/;
print "$_ ";
}
print "\n";
}
• Loop is while, until, for, foreach, or naked block

108
The next function
• Skips the remaining processing on this iteration
• Similar to “continue” in C
• Example:
while (<>) {
next if /^#/;
for (split) {
print "$_\n";
}
}

109
The redo function
• Jumps back to the top of the current iteration, without advancing:
foreach (@words) {
print "please type $_: ";
if ("$_\n" ne <STDIN>) {
print "sorry, that isn’t it!\n";
$errors++;
redo;
}
}
• Like last and next, works with innermost block

110
Labeled blocks
• How do you get to an outer block? Name it!
LINE: while (<>) {
WORD: for (split) {
last LINE if /__END__/;
$word_count++;
}
$line_count++;
}
• Use label with last/next/redo to say “this loop”
• Perl Best Practices recommends naming all loops
• Larry recommends using nouns for names (as above)
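A small self-contained sketch of labels with both next and last. The input lines are made up for the example, and reading from an array instead of <> is an assumption made so it runs without input:

```perl
use strict;
use warnings;

my @lines = ("alpha beta", "# skip me", "gamma __END__ delta", "never seen");
my ($word_count, $line_count) = (0, 0);

LINE: for my $line (@lines) {
    next LINE if $line =~ /^#/;            # comment line: move on to the next line
    WORD: for my $word (split ' ', $line) {
        last LINE if $word eq '__END__';   # leave both loops at once
        $word_count++;
    }
    $line_count++;                         # only reached for fully processed lines
}
print "$word_count words on $line_count complete line(s)\n";
```

Without the LINE label, last would only leave the inner WORD loop and the outer loop would carry on with the next line.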

111
The ternary ?: operator
• Like if/then/else, but within an expression
my $absolute = $n > 0 ? $n : -$n;
• Guaranteed to “short circuit”: skips unneeded expression:
my $average = $n ? ($sum/$n) : "n/a";
• Don’t use in place of full if/then/else:
$some_expression ? &do_this() : &do_that();
• If you’re not using the return value, likely bad idea

112
Logical operators
• Logical “and” is &&: both sides true for true result
• Logical “or” is ||: either side true for true result
• Short-circuit... stops when we know the answer
• “true || something” never needs to look at “something”
• “false && something” ditto
• Returns the last expression evaluated
my $last_name = $last_name{$first} || "no last name";
• Can also be spelled out (“and” “or”) for low precedence
• Perl 5.10 introduces // “defined or”:
my $last_name = $last_name{$first} // "no last name";
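The difference between || and // only shows up for values that are defined but false. A minimal sketch, using a hypothetical %count hash:

```perl
use 5.010;       # the // operator needs perl 5.10 or later
use strict;
use warnings;

my %count = (fred => 0, barney => 3);    # hypothetical data

my $with_or  = $count{fred}  || "none";  # 0 is false, so we get "none"
my $with_dor = $count{fred}  // "none";  # 0 is defined, so we keep 0
my $missing  = $count{wilma} // "none";  # no such key: undef, so "none"
print "$with_or $with_dor $missing\n";
```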

113
Short-circuit controls
• Short circuiting can be exploited (obfuscated):
if ($next > $best) { $best = $next }
$next > $best and $best = $next;
unless ($m > 10) { print "m is too small" }
$m > 10 or print "m is too small";
• Don’t do this, except for one case:
open MYHANDLE, "<", "somefile"
or die "cannot open somefile: $!";
• That’s not only sanctioned... it’s recommended!

114
Perl Modules

115
Installing modules
• Core vs CPAN
• Check if installed with “perldoc Module::Name”
• Install individual MakeMaker-based modules
• Extract distribution and cd into it
• Type “perl Makefile.PL”
• Type “make install”
• For Module::Build-based modules, slightly different:
• Extract distro
• Type “perl Build.PL”
• Type “./Build install”
• But what about dependencies?

116
CPAN shell
• Most modules get installed directly from the CPAN
• Modules in CPAN have dependencies noted
• CPAN shell handles download, test, depends, and installs
• Two methods of invocation:
• perl -MCPAN -eshell (works nearly everywhere)
• cpan (most modern systems)
• Once you reach a prompt, it’s just “install Foo::Bar”
• You might get asked about dependencies
• Also check out “CPANPLUS” and “CPAN minus”

117
Using Modules
• Simple task: get basename of file:
(my $basename = $name) =~ s#.*/##; # broken
• Didn’t consider \n in filename (yes, legal)
• Didn’t consider portability
• Easy fix:
use File::Basename;
my $basename = basename($name);
• Module defines basename, dirname, fileparse
• To get just some:
use File::Basename qw(basename dirname);
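A short sketch of all three functions on one path. The path is hypothetical and the results in the comments assume Unix-style path handling:

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $name = "/home/merlyn/notes/todo.txt";    # hypothetical path
my $base = basename($name);                  # "todo.txt"
my $dir  = dirname($name);                   # "/home/merlyn/notes"

# fileparse also splits off a suffix when given a pattern for it
my ($stem, $path, $suffix) = fileparse($name, qr/\.[^.]*/);
print "$dir | $base | $stem | $suffix\n";
```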

118
OO Modules
• Don’t fear the OO! Just enough to get you by
• The File::Spec module doesn’t export subs:
use File::Spec;
my $new_name =
File::Spec->catfile($dirname, $basename);
• The -> syntax there is calling a “class method”
• Until you learn Perl OO, just copy the syntax
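As a sketch of the same class-method syntax, here are two File::Spec calls that undo each other. The path components are made up:

```perl
use strict;
use warnings;
use File::Spec;

# Join components with whatever separator the local system uses
my $path = File::Spec->catfile("etc", "perl", "site.conf");

# Split the path back into its components
my @parts = File::Spec->splitdir($path);
print "$path has ", scalar(@parts), " parts\n";
```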

119
OO Instances
• The DBI module (found in the CPAN) uses instances
• Like File::Spec, no subroutines are imported
• Calling “constructors” returns “objects”
use DBI;
my $dbh = DBI->connect(...);
• These objects can then take methods themselves:
my $sth = $dbh->prepare("...");
$sth->execute();
my @row = $sth->fetchrow_array;
$sth->finish;
• Until you learn proper OO, the examples should work

120
File tests

121
File test operators
• Many file tests, most boolean, some valued
• Test if a file exists with -e
die "$filename already exists!" if -e $filename;
• Works with a filehandle or a filename, default $_
• Boolean tests: -r (read), -w (write), -x (execute)
• -e (exists), -z (zero size), -f (plain file), -d (directory)
• -t (is a terminal), -T (“text” file), -B (“binary” file)
• Numeric values: -s (size in bytes)
• -M (mod time in days), -A (access time), -C (inode change)
• Skip retesting with underscore:
if (-r $somefile and -w _) { ... } # both read and write
if (-w -r $file) { ... } # stacked tests, perl 5.10 and later
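A runnable sketch of a few tests against a scratch file. Using File::Temp for the scratch file is an assumption made to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $filename) = tempfile(UNLINK => 1);   # scratch file, removed at exit
print {$fh} "yabba dabba doo\n";
close $fh or die "close $filename: $!";

my $is_file = -f $filename ? 1 : 0;   # plain file?
my $size    = -s _;                   # _ reuses the info fetched by -f
my $is_dir  = -d _ ? 1 : 0;           # same info again, no new system call
print "file=$is_file dir=$is_dir size=$size\n";
```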

122
stat and lstat
• Even more info!
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
$size, $atime, $mtime, $ctime, $blksize, $blocks)
= stat($filename);
• Modeled on the traditional Unix stat call
• Mapped as well as possible on non-Unix
• stat() chases a symlink, lstat() gives info on symlink itself
• Timestamps are Unix epoch time—convert with localtime
my @stat = stat("/tmp");
my $when = localtime $stat[9];
print "/tmp last modified at $when\n";
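Remembering that index 9 is the mtime gets old quickly. One alternative, sketched here, is the core File::stat module, which replaces stat with a version that returns an object with named methods. Using "." as the target is an assumption made so the example runs anywhere:

```perl
use strict;
use warnings;
use File::stat;    # core module; overrides stat() to return an object

my $target = ".";  # assumed target: the current directory
my $st = stat($target) or die "cannot stat $target: $!";

my $when = localtime $st->mtime;    # scalar context gives a readable string
printf "%s: mode %04o, modified %s\n",
    $target, $st->mode & 07777, $when;
```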

123
Bit manipulation
• Bitwise “and”:
10 & 12 # result is 8, the common bits of 1010 and 1100
• Similarly:
10 | 12 # bitwise “or”, result is 14
10 ^ 12 # bitwise “xor”, result is 6
6 << 2 # left shift, result is 24
25 >> 2 # right shift, result is 6
~ 10 # bit complement, result depends on int size
• Use this with stat to extract mode:
my $permissions = $mode & 0777;
$permissions |= 0111; # “or”-in the executable bits
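A worked sketch of the same masking on a fixed mode value, so the arithmetic can be checked by hand. The starting mode is an assumed example: a plain file with rw-r--r-- permissions:

```perl
use strict;
use warnings;

my $mode        = 0100644;                # assumed: plain file, rw-r--r--
my $permissions = $mode & 07777;          # strip the file-type bits
my $no_exec     = $permissions & ~0111;   # clear all three execute bits
my $with_exec   = $permissions | 0111;    # set all three execute bits
my $owner_write = ($permissions & 0200) ? "yes" : "no";

printf "%04o %04o %04o %s\n",
    $permissions, $no_exec, $with_exec, $owner_write;
```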

124
Directory operations

125
Using chdir
• Changes the per-process “current directory”
• Used for all “relative” filenames
• Inherited from parent, inherited by children (later)
• Might fail, so test result:
chdir "/etc" or die "Cannot chdir /etc: $!";
• Always test result
• Does not understand tilde-expansion (like shells)

126
Globbing
• Getting files that match a filename pattern:
$ echo *.pm
barney.pm dino.pm fred.pm wilma.pm
• Performed by shell automatically; mostly you don’t worry
$ perl -e 'print "@ARGV\n"' *.pm
barney.pm dino.pm fred.pm wilma.pm
• But what if you have “*.pm” inside your program?
my @names = glob "*.pm"; # expand as the shell would
• Ignores dotfiles... include them explicitly:
my @all = glob(".* *");

127
Ancient globbing

• You may see <*> in place of glob('*')
• Same internal function, different syntax
• Collides with <HANDLE> reading
• Perl has weird rules to sort out which was which
• If you really want to avoid <>, you can use readline:
my @lines = readline HANDLE;

128
Directory handles
• Globbing is easy to type
• Sorts its results; might not be needed
• Lower level access with readdir():
opendir DH, "/some/dir" or die "opendir: $!";
foreach $file (readdir DH) { ... }
closedir DH;
• opendir is like open, readdir is like readline, etc
• Names are unsorted, and include all names
• Even . and ..
• Names don’t include any directory part
• If you need recursive directory processing, check out:
“perldoc File::Find”
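A sketch of the usual readdir clean-up steps: drop . and .., sort, and put the directory part back before testing each name. The lexical handle ($dh) and the use of "." are choices made for this example, not something the slide prescribes:

```perl
use strict;
use warnings;
use File::Spec;

my $dir = ".";    # assumed directory to list
opendir my $dh, $dir or die "cannot opendir $dir: $!";
my @names = sort grep { $_ ne "." and $_ ne ".." } readdir $dh;
closedir $dh;

# readdir returns bare names, so rebuild a usable path before testing them
my @files = grep { -f $_ } map { File::Spec->catfile($dir, $_) } @names;
print "$_\n" for @files;
```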

129
File operations

130
Removing files
• The equivalent to command-line “rm” is unlink():
unlink "slate", "bedrock", "lava";
• Since unlink takes a list, and glob is happy to return one:
unlink glob "*.pm"; # like "rm *.pm" at the shell
• Return value from unlink is number of files deleted
• Can’t diagnose trouble unless done one at a time:
foreach $goner (qw(slate bedrock lava)) {
unlink $goner or warn "cannot unlink $goner: $!";
}

131
Renaming files
• Like Unix “mv”:
rename "old", "new" or warn "Cannot rename: $!";
• Batch rename all *.old to *.new:
foreach my $old (glob "*.old") {
(my $new = $old) =~ s/\.old$/.new/;
if (-e $new) {
warn "cannot rename $old to $new: existing file\n";
} else {
rename $old, $new or warn "$old => $new: $!\n";
}
}

132
Hard/soft links
• Hard links, like “ln”:
link $old, $new or die "Cannot link: $!";
• Symlinks (“soft” links), like “ln -s”:
symlink $old, $new or die "Cannot symlink: $!";
• Read where the link points:
my $link = readlink $new;
• readlink() returns undef if it’s not a symlink
• Interesting trick: symlinks that are invalid:
-l $name and not -e $name

133
Making directories
• Make directories with mkdir:
mkdir "fred" or warn "Cannot mkdir fred: $!";
• Default permissions applied, unless you provide them:
mkdir "private", 0700;
• The “umask” is still applied to this value though
• Note the permission is given in octal here
• Remove directories with rmdir:
rmdir "fred" or warn "Cannot remove fred: $!";
• Directories must be empty:
unlink glob "fred/.* fred/*"; ...
• Even that might fail if there are subdirs
• Consider rmtree() in File::Path
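A sketch using File::Path's newer names for the same job, make_path and remove_tree (rmtree is the older spelling). It works inside a File::Temp scratch directory, an assumption made so nothing real is touched:

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

my $top  = tempdir(CLEANUP => 1);                        # scratch area
my $deep = File::Spec->catdir($top, "fred", "barney");   # hypothetical names
make_path($deep);                                        # like "mkdir -p"
my $made = -d $deep ? 1 : 0;

my $fred = File::Spec->catdir($top, "fred");
remove_tree($fred);                                      # like "rm -r"
my $gone = -e $fred ? 0 : 1;
print "made=$made gone=$gone\n";
```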

134
Modifying permissions
• Change permissions with chmod():
chmod 0755, "fred", "barney";
• Again, note the octal value for permissions
• Combine with stat() to set relative permissions:
for my $file (glob "*") {
my @stat = stat($file) or next;
my $new_perms = ($stat[2] & 0777) & ~ 0111;
chmod $new_perms, $file or warn "chmod $file: $!";
}
• Or see File::chmod in the CPAN:
use File::chmod; chmod "-UGx", $_ for glob("*");

135
Modifying ownership
• Use chown to change owner and group:
defined(my $user = getpwnam "merlyn") or die;
defined(my $group = getgrnam "users") or die;
chown $user, $group, glob "/home/merlyn/*";
• Ability to change owner and group is restricted
• “-1” means “no change” for owner and/or group
• On most modern systems

136
Modifying timestamps

• Make everything look accessed now, modified yesterday:
my $now = time; # current unix epoch time
my $ago = $now - 86400; # seconds in a day
utime $now, $ago, glob "*"; # set atime, mtime, for all files
• The ctime value is always set to “now”

137
Strings and sorting

138
Finding a substring
• index() returns 0-based value of first occurrence:
my $stuff = "Howdy world!";
my $where = index($stuff, "wor"); # $where gets 6
• Start later with third parameter:
my $first_w = index($stuff, "w"); # 2
my $second_w = index($stuff, "w", $first_w + 1); # 6
my $third_w = index($stuff, "w", $second_w + 1); # -1
• Go from right to left with rindex():
my $last_slash = rindex("/etc/passwd", "/");
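The third parameter makes it easy to collect every occurrence in a loop. A minimal sketch:

```perl
use strict;
use warnings;

my $stuff = "Howdy world!";
my @positions;
my $pos = index($stuff, "o");
while ($pos != -1) {                       # -1 means "not found"
    push @positions, $pos;
    $pos = index($stuff, "o", $pos + 1);   # resume just past the last hit
}
print "found at: @positions\n";            # prints "found at: 1 7"
```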

139
Manipulate with substr
• Select from a string with substr():
my $part = substr($string, $start, $length);
print "J", substr("Hi, Hello!", 5, 4), "!\n";
• Initial position can be negative to count from end
• substr can also be used on left side of assignment:
my $string = "Hello, world!";
substr($string, 0, 5) = "Goodbye";
• Handy for “regional” substitutions:
substr($string, -20) =~ s/fred/barney/g;
• Or use fourth arg:
my $previous = substr($string, 0, 5, "Goodbye");
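A sketch that puts the negative position and the four-argument form side by side:

```perl
use strict;
use warnings;

my $string = "Hello, world!";
my $tail   = substr($string, -6);    # last six characters: "world!"

# Four-arg form replaces in place and returns what was there before
my $previous = substr($string, 0, 5, "Goodbye");
print "$previous -> $string (tail was $tail)\n";
```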

140
Formatting with sprintf

• Like printf, but into string, not handle:
my $date_tag =
sprintf "%04d/%02d/%02d %2d:%02d:%02d",
$year, $month, $day, $hour, $minute, $second;
• Great for rounding off numbers:
my $rounded = sprintf "%.2f", 2.49997;
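The slide does not say where $year, $month and friends come from. One common source is localtime or gmtime, in which case the year needs 1900 added and the month is zero-based. The sketch below uses gmtime(0), the start of the Unix epoch, purely so the output is predictable:

```perl
use strict;
use warnings;

my ($second, $minute, $hour, $day, $month, $year) = gmtime(0);
my $date_tag = sprintf "%04d/%02d/%02d %2d:%02d:%02d",
    $year + 1900, $month + 1, $day, $hour, $minute, $second;

my $rounded = sprintf "%.2f", 2.49997;    # rounds to two decimal places
print "$date_tag\n$rounded\n";
```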

141
Advanced sorting
• Replace the “comparison” in the built-in sort:
• Define a subroutine that compares $a to $b
• Return -1 if $a is “less than” $b, +1 if other way around
• Return 0 if they are “equal”, or incomparable
• Example:
sub numeric {
return -1 if $a < $b;
return 1 if $a > $b;
return 0;
}
• Shortcut for this:
sub numeric { return $a <=> $b }

142
Using your compare
• Place your comparison sub name after “sort” before list:
my @sorted = sort numeric 97..102;
• Routine gets called roughly “n log n” times to provide order
• No need for external routine!
my @sorted = sort { $a <=> $b } @numbers;
• Descending? Just swap $a and $b, or add reverse:
my @down = sort { $b <=> $a } @numbers;
my @down = reverse sort { $a <=> $b } @numbers;
• For strings, use “cmp”
my @sorted = sort { "\L$a" cmp "\L$b" } @items;

143
Sorting hash by value
• Trophy time: sort names by value descending:
my %score =
("barney" => 195, "fred" => 205, "dino" => 30);
my @winners = sort by_score keys %score;
• What’s in by_score? Indirect sort of keys:
sub by_score { $score{$b} <=> $score{$a} }
• What if the scores tie?
$score{"bamm-bamm"} = 195;
• Add a second level sort:
sub by_score
{ $score{$b} <=> $score{$a} or $a cmp $b }
• Now sorts by descending score, and ascending name
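Putting the pieces together as one runnable sketch, with the comparison written inline instead of as a named sub:

```perl
use strict;
use warnings;

my %score = (
    "barney"    => 195,
    "fred"      => 205,
    "dino"      => 30,
    "bamm-bamm" => 195,
);

# Descending by score; ties fall through to ascending name
my @winners = sort { $score{$b} <=> $score{$a} or $a cmp $b } keys %score;
print "$_: $score{$_}\n" for @winners;
```

The or only reaches the cmp when <=> returns 0, which is exactly the tie case.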

144
Process management

145
The system function
• Fire off child process:
system "date";
• Child inherits stdin, stdout, stderr
• Perl waits for the child to finish
• Return value is related to exit status of child
• But 0 is good!
system "date" and die "can't launch date!";
• Any shell metachars in string cause /bin/sh to interpret
• Multiple args for pre-parsed commands (never /bin/sh)
system "tar", "cvf", $tarfile, @dirs;
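A sketch of how the return value relates to the child's exit status. To stay portable it runs a second copy of perl (found via $^X) as the child, which is an assumption of this example and not something the slide uses:

```perl
use strict;
use warnings;

# List form: arguments go straight to the child, no shell involved
my $status = system $^X, "-e", "exit 3";
die "cannot launch child: $!" if $status == -1;

my $exit_code = $? >> 8;     # high byte holds the child's exit value
my $signal    = $? & 127;    # low bits hold a signal number, if any
print "exit code $exit_code, signal $signal\n";
```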

146
The exec function
• Like system, but doesn’t fork
• Command overlays current Perl process
• Good for things that don’t need to return:
chdir "/tmp" or die;
$ENV{PATH} .= ":/usr/rockers/bin";
exec "bedrock", "-o", "args1", @ARGV;
• Once exec succeeds, Perl isn’t there!
• Only reason to still be in Perl: exec failed (e.g. command not found):
exec "date";
die "date not found: $!";

147
Backquotes
• Grab output of command:
chomp(my $now = `date`);
• Includes all stdout, typically ending in newline
• Backquotes are double-quote interpolated:
$arg = "sleep";
my $doc = `perldoc -t -f $arg`;
• Might want to merge stderr in there:
my $output_and_errors = `somecommand 2>&1`;
• Stdin also inherited—might want to force it away:
my $result = `date </dev/null`;

148
Backquotes as list
• If output has multiple lines, use backquotes in list context
my @who_lines = `who`;
• Each element will be like:
merlyn tty/42 Dec 7 19:41
• Use in a loop:
foreach (`who`) {
my ($user, $tty, $date) = /(\S+)\s+(\S+)\s+(.*)/;
$ttys{$user} .= "$tty at $date\n";
}

149
Processes as filehandles
• open a pipe directly within Perl:
open DATE, "date|" or die;
open MAIL, "|mail merlyn" or die;
• Processes run in parallel, coordinated by kernel
• Read and write like any filehandle:
my $now = <DATE>;
print MAIL "The time is now $now";
• Close the writehandle to indicate EOF
close MAIL;
my $status = $?;
• Closing a process handle sets $? (like system return)
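A sketch of the reading side with a lexical handle and the list form of a piped open, which passes the arguments to the child without a shell. The child is another perl (via $^X) so the example does not depend on a date command. The list form is assumed to be available on your platform; it is long-standing on Unix:

```perl
use strict;
use warnings;

open my $reader, "-|", $^X, "-e", 'print "pebbles\n"'
    or die "cannot fork: $!";
my @lines = <$reader>;        # read everything the child printed
close $reader;                # waits for the child and sets $?
my $status = $? >> 8;
print "child said: $lines[0]";
print "child exit status: $status\n";
```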

150
Why filehandles?
• For reading something simple, doesn’t help much
• For writing, about the only way to do it
• One example where reading works well:
open F, "find / -atime +90 -size +1000 -print|"
or die "fork: $!";
while (<F>) {
chomp;
printf "%s size %dK last accessed %.1f days ago\n",
$_, (1023 + -s $_)/1024, -A $_;
}

151
How low can we go?
• Full support for:
• fork
• exec
• waitpid
• exit
• arbitrary pipes and file descriptors
• System V IPC
• waiting on multiple handles for I/O
• Sophisticated frameworks have been built (like POE)

152
Simple signal handling
• Send a signal with kill():
kill 2, 4201; # send SIGINT to 4201
• Wait for a SIGINT, and clean up:
sub int_handler {
unlink glob "/tmp/*$$*"; # remove my files
exit 0;
}
$SIG{INT} = 'int_handler'; # register
• Handler doesn’t have to exit:
my $flag = 0;
sub int_handler { $flag++ }
• Now check $flag from time to time
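A sketch of the flag approach that can be run as is: the script sends SIGINT to itself and then polls the flag. The anonymous sub as handler and the short polling loop are choices made for this example, and it assumes a Unix-like system where a process can signal itself:

```perl
use strict;
use warnings;

my $flag = 0;
$SIG{INT} = sub { $flag++ };    # handler only records that the signal arrived

kill 'INT', $$;                 # send SIGINT to ourselves ($$ is our pid)

for (1 .. 20) {                 # allow a moment for delivery, but never hang
    last if $flag;
    select undef, undef, undef, 0.05;    # sleep for 50 milliseconds
}
print $flag ? "caught SIGINT\n" : "no signal seen\n";
```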

153
But wait... there’s more

154
In Llama book

• Smart matching
• Given/when
• Trapping errors with eval
• Simple uses of grep and map
• Slices

155
In Alpaca book
• The debugger
• Packages
• References
• Storing complex data structures
• Objects
• Writing Modules
• Embedded documentation
• Testing
• Publishing to CPAN

156
Beyond that
• Even more regex things
• More functions, operators, built-in variables, switches
• Network operations
• Security
• Embedding Perl in other applications
• Dynamic loading
• Operator overloading
• Data structure tie-ing
• Unicode handling
• Direct DBM access for simple persistence

157
And Perl 6

• Early-early-early adopters can play already
• Still a ways before early adopters can use for production
• Everything you learn about Perl5 will still be useful in Perl6
• Except some of the syntax is different
• And some things are a lot easier
• And a few things are harder

158
For more information

[email protected]
• http://www.stonehenge.com/merlyn/

159
