Thursday, July 3, 2008

Learning Perl



1. Introduction


1.1 History of Perl
_______________________________________________________________

Perl is short for "Practical Extraction and Report Language," although it has also been called a
"Pathologically Eclectic Rubbish Lister." There's no point in arguing which one is more correct, because
both are endorsed by Larry Wall, Perl's creator and chief architect, implementor, and maintainer. He
created Perl when he was trying to produce some reports from a Usenet-news-like hierarchy of files for a
bug-reporting system, and awk ran out of steam. Larry, being the lazy programmer that he is, decided to
over-kill the problem with a general-purpose tool that he could use in at least one other place. The result
was the first version of Perl.
After playing with this version of Perl a bit, adding stuff here and there, Larry released it to the
community of Usenet readers, commonly known as "the Net." The users on this ragtag fugitive fleet of
systems around the world (tens of thousands of them) gave him feedback, asking for ways to do this,
that, or the other, many of which Larry had never envisioned his little Perl handling.
But as a result, Perl grew, and grew, and grew, at about the same rate as the UNIX operating system. (For
you newcomers, the entire UNIX kernel used to fit in 32K! And now we're lucky if we can get it in under
a few meg.) It grew in features. It grew in portability. What was once a little language now had over a
thousand pages of documentation split across dozens of different manpages, a 600-page Nutshell
reference book, a handful of Usenet newsgroups with 200,000 subscribers, and now this gentle
introduction.
Larry is no longer the sole maintainer of Perl, but retains his executive title of chief architect. And Perl is
still growing.

1.2 Purpose of Perl
_____________________________________________________________________________________

Perl is designed to assist the programmer with common tasks that are probably too heavy or too
portability-sensitive for the shell, and yet too weird or short-lived or complicated to code in C or some
other UNIX glue language.
Once you become familiar with Perl, you may find yourself spending less time trying to get shell quoting
(or C declarations) right, and more time reading Usenet news and downhill snowboarding, because Perl
is a great tool for leverage. Perl's powerful constructs allow you to create (with minimal fuss) some very
cool one-up solutions or general tools. Also, you can drag those tools along to your next job, because
Perl is highly portable and readily available, so you'll have even more time there to read Usenet news and
annoy your friends at karaoke bars.
Like any language, Perl can be "write-only"; it's possible to write programs that are impossible to read.
But with proper care, you can avoid this common accusation. Yes, sometimes Perl looks like line noise to
the uninitiated, but to the seasoned Perl programmer, it looks like checksummed line noise with a
mission in life. If you follow the guidelines of this book, your programs should be easy to read and easy
to maintain, but they probably won't win any obfuscated Perl contests.


1.3 Basic Concepts

_____________________________________________________________________________________

A shell script is nothing more than a sequence of shell commands stuffed into a text file. The file is then
"made executable" by turning on the execute bit (via chmod +x filename) and then the name of the file is
typed at a shell prompt. Bingo, one shell program. For example, a script to run the date command
followed by the who command can be created and executed like this:
% echo date >somescript
% echo who >>somescript
% cat somescript
date
who
% chmod +x somescript
% somescript
[output of date followed by who]
%
Similarly, a Perl program is a bunch of Perl statements and definitions thrown into a file. You then turn
on the execute bit[2] and type the name of the file at a shell prompt. However, the file has to indicate that
this is a Perl program and not a shell program, so you need an additional step.
[2] On UNIX systems, that is. For directions on how to render your scripts executable on
non-UNIX systems, see the Perl FAQ or your port's release notes.
Most of the time, this step involves placing the line
#!/usr/bin/perl
as the first line of the file. But if your Perl is stuck in some nonstandard place, or your system doesn't
understand the #! line, you'll have a little more work to do. Check with your Perl installer about this.
The examples in this book assume that you use this common mechanism.
Perl is mostly a free-format language like C - whitespace between tokens (elements of the program, like
print or +) is optional, unless two tokens put together can be mistaken for another token, in which case
whitespace of some kind is mandatory. (Whitespace consists of spaces, tabs, newlines, returns, or
formfeeds.) There are a few constructs that require a certain kind of whitespace in a certain place, but
they'll be pointed out when we get to them. You can assume that the kind and amount of whitespace
between tokens is otherwise arbitrary.


2.1 What Is Scalar Data?
_____________________________________________________________________________________


A scalar is the simplest kind of data that Perl manipulates. A scalar is either a number (like 4 or 3.25e20)or a string of characters (like hello or the Gettysburg Address). Although you may think of numbers and strings as very different things, Perl uses them nearly interchangeably, so we'll describe them together.
A scalar value can be acted upon with operators (like plus or concatenate), generally yielding a scalar.

2.2 Numbers
_____________________________________________________________________________________

Although a scalar is either a number or a string,[1] it's useful to look at numbers and strings separately
for the moment. Numbers first, strings in a minute... .
[1] Or a reference, but that's an advanced topic.
2.2.1 All Numbers Use the Same Format Internally
As you'll see in the next few paragraphs, you can specify both integers (whole numbers, like 17 or 342)and floating-point numbers (real numbers with decimal points, like 3.14, or 1.35 times 1025). But internally, Perl computes only with double-precision floating-point values.[2] This means that there are no integer values internal to Perl; an integer constant in the program is treated as the euivalent floating-point value.[3] You probably won't notice the conversion (or care much), but you should stop looking for integer operations (as opposed to floating-point operations), because there aren't any.
[2] A "double-precision floating-point value" is whatever the C compiler that compiled Perl
used for a double declaration.
[3] Unless you use "integer mode," but that's not on by default.

2.2.2 Float Literals
_____________________________________________________________________________________

A literal is the way a value is represented in the text of the Perl program. You could also call this a
constant in your program, but we'll use the term literal. Literals are the way data is represented in the
source code of your program as input to the Perl compiler. (Data that is read from or written to files is
treated similarly, but not identically.)
Perl accepts the complete set of floating-point literals available to C programmers. Numbers with and
without decimal points are allowed (including an optional plus or minus prefix), as well as tacking on a
power-of-10 indicator (exponential notation) with E notation. For example:
1.25 # about 1 and a quarter
7.25e45 # 7.25 times 10 to the 45th power (a big number)
-6.5e24 # negative 6.5 times 10 to the 24th
# (a "big" negative number)
-12e-24 # negative 12 times 10 to the -24th
# (a very small negative number)
-1.2E-23 # another way to say that
2.2.3 Integer Literals
_____________________________________________________________________________________

Integer literals are also straightforward, as in: 12 15 -2004 3485
Don't start the number with a 0, because Perl supports octal and hexadecimal (hex) literals. Octal
numbers start with a leading 0, and hex numbers start with a leading 0x or 0X.[4] The hex digits A
through F (in either case) represent the conventional digit values of 10 through 15. For example:
0377 # 377 octal, same as 255 decimal
-0xff # negative FF hex, same as -255 decimal
[4] The "leading zero" indicator works only for literals, not for automatic string-to-number
conversion. You can convert a data string that looks like an octal or hex value into a number with oct or hex.
Strings
result. A scalar value can be stored into a scalar variable. Scalars can be read from files and devices and written out as well.