Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Introduction to Perl 5.10 for Unix System Administrators

(Perl 5.10 without excessive complexity)

by Dr Nikolai Bezroukov

Ch 4.1 File Operations

Version 0.9

Filehandles and standard files

To access files, Perl uses so-called filehandles. There are two types of filehandles in Perl: standard and user-defined. As in C, there are just three standard filehandles in Perl: STDIN (standard input), STDOUT (standard output) and STDERR (standard error).

You can also read from and write to any other file(s). To access a file from your Perl script, you must perform the following steps:

  1. Unless you use a standard filehandle (or one of a couple of other cases, see the <> operator below), your script must first open the file. This operation binds a filehandle to a particular file. It is usually done with the open statement, which tells the system what file your Perl script wants to access and how it will access it (read, write or append). The open operation associates your filehandle with a pointer to an internal data structure for this file, which is used in all subsequent operations.
  2. The script can perform only the operation specified when the file was opened: either read from the file or write to the file. You can also open a file for both reading and writing by putting +< in front of the filename; in this case the file is updated in place.
  3. After completing all operations with the file, the script may close it. This tells the system that your script no longer needs access to the file and disconnects the file from its filehandle. If you don't do this, the system will close the file automatically when your script finishes execution. After you have closed a file you can open it again with different opening attributes. For example, if you first opened a file as an input file, after closing it you can open it as an output file.
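
The three steps above can be sketched as follows. This is only a minimal illustration under stated assumptions: the scratch file name demo_steps.txt is hypothetical and is created in the current directory just for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "demo_steps.txt";    # hypothetical scratch file, an assumption of this sketch

# Step 1: bind the handle SYSOUT to the file, in write mode
open(SYSOUT, ">$fname") || die("Can't open $fname for writing. Reason: $!\n");

# Step 2: only the operation chosen at open time (writing) is performed
print SYSOUT "first line\n";
print SYSOUT "second line\n";

# Step 3: disconnect the handle from the file
close(SYSOUT);

# After close, the same file can be opened again with a different mode
open(SYSIN, "<$fname") || die("Can't open $fname for reading. Reason: $!\n");
my $first = <SYSIN>;             # reads one record, newline included
close(SYSIN);

print "Read back: $first";
unlink($fname);                  # remove the scratch file
```

The second open reuses the file name but not the handle, which keeps it obvious in the code which direction each handle is used for.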

Writing to a file is buffered by default. To ask Perl to flush immediately after each write or print command, set the special variable $| to 1. This capability is very helpful when you are sending records to a web browser in a CGI script or writing to a socket.

Opening a File

Opening a file associates a file name in the filesystem with a filehandle. To open a file, call the built-in function open():

open(SYSPASS, "/etc/passwd");
     |         |_____________ the path to the file to be opened
     |_______________________ filehandle

The first argument is the filehandle. It is used in all other operations with this file. I recommend naming all your filehandles with the prefix SYS. That makes the code a little bit more readable.

After the file has been opened, your Perl script accesses the file by referring to this handle. You can think of a filehandle as a pointer to the control block that the operating system allocated for the file.

The second argument is the name of the file you want to open. You can supply either the full pathname, as in /etc/passwd, or a relative pathname. In Windows you can write the pathname using Unix conventions (with "/" as the delimiter), but you still need to specify the logical disk. If only the filename is supplied, the file is assumed to be in the current working directory.

By default, Perl assumes that the file is to be opened for reading. To open a file for writing, put a > (greater than) character in front of the filename (as in the Unix shell):

open(SYSOUT, ">myoutput.txt") || die("Can't open file myoutput.txt for writing\n");

When you open a file for writing, an existing file with that name is truncated and its old contents are lost.

The analogy with Unix shell notation holds for appending too: to append to an existing file, you need to use >> in front of the filename:

open(SYSOUT, ">>myoutput.txt") || die("Can't open file myoutput.txt for appending\n");

The notation in the open statement was borrowed from the I/O notation of Unix shells.
If you can do something additional using this notation in Unix shells,
you will most probably be able to do it in the Perl open statement as well.

For example, you can use the "<" sign to open a file for reading. Sometimes you need to open standard input (usually the keyboard) and standard output (usually the screen) explicitly:

open(STDIN, '-');	# Open standard input
open(STDOUT, '>-');	# Open standard output

The table below summarizes the three major opening modes in Perl:

read mode
    open(SYSIN, $fname);
    open(SYSIN, "<$fname");
    Enables the script to read the existing contents of the file but does not enable it to write into the file.

write mode
    open(SYSOUT, ">$fname");
    Destroys the current contents of the file and overwrites them with the output supplied by the script.

append mode
    open(SYSOUT, ">>$fname");
    Appends output supplied by the script to the existing contents of the file.
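
The difference between the three modes can be checked with a small experiment. The sketch below is an illustration only; the scratch file name demo_modes.txt and the helper count_lines are hypothetical names introduced for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "demo_modes.txt";    # hypothetical scratch file

# Hypothetical helper: return the number of lines in a file, opened in read mode
sub count_lines {
    my ($name) = @_;
    open(SYSIN, "<$name") || die("Can't open $name for reading. Reason: $!\n");
    my @lines = <SYSIN>;
    close(SYSIN);
    return scalar(@lines);
}

# Write mode: creates the file or destroys its old contents
open(SYSOUT, ">$fname") || die("Can't open $fname for writing. Reason: $!\n");
print SYSOUT "one\n", "two\n";
close(SYSOUT);
my $after_write = count_lines($fname);

# Append mode: keeps the old contents and adds to the end
open(SYSOUT, ">>$fname") || die("Can't open $fname for appending. Reason: $!\n");
print SYSOUT "three\n";
close(SYSOUT);
my $after_append = count_lines($fname);

# Write mode again: the three old lines are gone
open(SYSOUT, ">$fname") || die("Can't open $fname for writing. Reason: $!\n");
print SYSOUT "fresh\n";
close(SYSOUT);
my $after_rewrite = count_lines($fname);

print "$after_write $after_append $after_rewrite\n";    # prints "2 3 1"
unlink($fname);
```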

Checking Whether the Open Succeeded

You can use the return value of the open() function to test whether the file is actually available, and exit the program or take some other appropriate action if it is not. open() returns true (a non-zero value) if the open succeeds and false otherwise. For exiting the script with a message Perl provides the die() built-in function. For example:

$fname = "/etc/passwd";
unless (open(SYSIN, "<$fname")) {
    die("unable to open $fname for reading. Reason: $!\n");
}

Note that this is an example where the second form of the if statement (unless) is really useful, because we need to take action only if the operation fails. Please note that the first line of the unless statement has two closing brackets, one for the call to open and one for the condition:

unless (open(SYSIN, "<$fname")) {
            |________________|
       |______________________|

If, God forbid, you miss one, the Perl diagnostic is really misleading.

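But more often this logic is written using a simpler and more transparent Perl idiom that uses the OR operator and came from the shell:

open(SYSIN, "<$fname") || die("unable to open $fname for reading. Reason: $!\n");

You will often see scripts that use this idiom. It is based on the property of the short-circuit || (logical OR) operator to evaluate its second operand only if the first one evaluates to false. In other words, the unless construct shown earlier and the || operator are logically equivalent here. Note that with || the parentheses around the arguments of open are essential: without them || would bind to the filename string and die would never be called. If you omit the parentheses, use the lower-precedence operator or instead.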

If open returns false, you can find out what went wrong by analyzing the built-in variable $! or by using the file-test operators, which are discussed below. Here is how this variable is defined in perlvar:

$! If used numerically, yields the current value of the C errno  variable, or in other words, if a system or library call fails, it sets this variable. This means that the value of $!  is meaningful only immediately after a failure:
    if (open(FH, $filename)) {
	# Here $! is meaningless.
	...
    } else {
	# ONLY here is $! meaningful.
	...
	# Already here $! might be meaningless.
    }
    # Since here we might have either success or failure,
    # here $! is meaningless.

In the above meaningless stands for anything: zero, non-zero, undef. A successful system or library call does not set the variable to zero.

If used as a string, yields the corresponding system error string. You can assign a number to $!  to set errno if, for instance, you want "$!"  to return the string for error n, or you want to set the exit value for the die() operator. (Mnemonic: What just went bang?)

Also see "Error Indicators".

When open fails, you should always provide an additional diagnostic that explains why the file cannot be opened.

Always check any open for failure. Never assume that open will succeed in all cases. Use the built-in variable $! to make the diagnostic message more informative.
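
The two faces of $! (number and string) can be seen in a short sketch. It is an illustration only and assumes that the hypothetical path used below does not exist on your system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist
my $fname = "no_such_dir_for_demo/no_such_file.txt";

my ($errno, $reason) = (0, '');
if (open(SYSIN, "<$fname")) {
    close(SYSIN);
} else {
    # Save $! immediately: a later system call may overwrite it
    $errno  = $! + 0;    # numeric context: the C errno value
    $reason = "$!";      # string context: the system error message
    print "unable to open $fname for reading. Reason: $reason (errno $errno)\n";
}
```

Copying $! into ordinary variables right after the failure is a cheap way to keep the diagnostic correct even if more code is later inserted between the open and the message.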

Closing a File

When you are finished reading from or writing to a file, you can tell the system that you are finished by calling close():

close(SYSOUT);

Note that close() is not required unless you want to reopen the same file later in the program (for example for writing). Perl automatically closes the file when the script terminates or when you open another file using the same filehandle.

Reading from a File

Instead of a function like read or some kind of pipe notation, Perl uses an unconventional notation. To read one line from a file, you assign the filehandle in angle brackets to a variable. I cannot explain the advantages of this decision, but it does move Perl noticeably closer to the syntactic perversions used in shell languages. For example:

$line = <SYSIN>;

If you write just the filehandle in angle brackets as the condition of a while loop, the default variable $_ is filled with the current record.

The question arises how to know when the input ends. The answer is that Perl returns the value undef for any record that you try to read after the end of the file. If the file is empty, this happens on the very first read.

Slurping: reading all text into an array

Current PCs and servers have several gigabytes of memory. That means that when processing small to medium files it might be more convenient to read the whole file into an array of strings at once. That can be done by assigning a filehandle in angle brackets to an array. Here is Perl code that does exactly this with the passwd file.

#!/usr/local/bin/perl
# Script to print the password file on the console like cat
$fname = '/etc/passwd';
open(SYSIN, $fname) || die("unable to open $fname. Reason: $!\n");  # Open the file
@text = <SYSIN>;		# Read it into an array @text
close(SYSIN);			# Close the file

# now we can, for example, print the lines
print @text;			# Print the array

Please note that if you accept the file name from the user, you need to strip all "dangerous" characters from the input. Otherwise your script can be used for execution of arbitrary commands:

$fname =~ tr($'"<>/;!|)( );	# replace each dangerous character with a space
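
Besides stripping characters, Perl also offers a three-argument form of open in which the mode is passed separately from the file name, so that characters such as > or | in the name are never interpreted as instructions. The sketch below is an illustration under stated assumptions: the helper slurp_file and the hostile-looking file name are hypothetical, and a lexical filehandle (my $in) is used instead of a SYS-style handle so that the helper can be called repeatedly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read a whole file whose name may come from an untrusted user.
# The mode '<' is a separate argument, so $name is always treated as a plain file name.
sub slurp_file {
    my ($name) = @_;
    open(my $in, '<', $name) or return;   # empty list on failure; the caller can inspect $!
    my @text = <$in>;
    close($in);
    return @text;
}

# With the two-argument form this name would start a command;
# here it is just the name of a file that (we assume) does not exist.
my @text = slurp_file("no_such_file_for_demo |");
print "open simply failed: $!\n" unless @text;
```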

The file is identified by the SYSIN handle, and using it on the right side of an assignment to an array means that all lines will go into the array. The statement

@text = <SYSIN>;

reads the file denoted by the filehandle SYSIN into the array @text. Note that the <SYSIN> expression reads the entire file via an implicit loop. This happens because the reading takes place in the context of an array variable. If we replace @text by the scalar $text, then only the next line would be read. Please note that each line is stored complete with its newline character at the end.

That means that the comparison in

$answer=<SYSIN>;
if ("OK" eq $answer) { ....}

will never be true. The right way to program such a test in Perl is to use the chomp function that we already discussed:

$answer=<SYSIN>;
chomp($answer); 
if ("OK" eq $answer) { ....}

Note that the statement $answer=<SYSIN> reads only one record of the file denoted by the filehandle SYSIN, because we use a scalar on the left side of the assignment statement.

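Here is a very simple imitation of the Unix tail utility, based on reading the whole file into memory:

#!/usr/bin/perl
my @text  = <>;                               # read all input lines
my $count = @text < 12 ? scalar(@text) : 12;  # do not reach before the first line
print @text[-$count .. -1];                   # print the last $count lines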

Processing file one record at a time

The typical way to process a file one record at a time is to use a while loop:

#!/usr/bin/perl
$fname = ($ARGV[0]) ? $ARGV[0] : "example.txt";   # first argument or a default name
open FILE, "$fname" or die "cannot open the file $fname. Reason: $!\n";
my $lineno = 1;
while (<FILE>) {
   print "$lineno, $_";
   $lineno++;
}
close FILE;

The Perl idiom while (<FILE>) actually means while (defined($_ = <FILE>)): the current record is assigned to the variable $_, and the loop continues as long as that value is defined. After the file ends the read returns undef, so the loop ends. The same is true for while (<>).
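
The long form of the loop, with an explicit variable instead of $_, can be sketched like this. It is an illustration only; the scratch file demo_lines.txt is a hypothetical name. The last record of the file is deliberately the string "0" without a newline: it is a false value in Perl, but it is defined, so the loop still processes it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "demo_lines.txt";    # hypothetical scratch file
open(SYSOUT, ">$fname") || die("Can't open $fname for writing. Reason: $!\n");
print SYSOUT "alpha\n", "beta\n", "0";    # last record is "0" with no newline
close(SYSOUT);

open(SYSIN, "<$fname") || die("Can't open $fname for reading. Reason: $!\n");
my $lineno = 0;
# Test for undef, not for truth, so that the record "0" is not mistaken for end of file
while (defined(my $line = <SYSIN>)) {
    chomp($line);
    $lineno++;
    print "$lineno: $line\n";
}
close(SYSIN);
unlink($fname);
```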

Writing to a File

Please note that you first need to open the file for writing and check whether the open operation succeeded. As with reading, this test is usually written using the simpler and more transparent Perl idiom which came from the shell:

open(SYSOUT, ">$fname") || die("unable to open $fname for writing. Reason: $!\n");

In Unix the shell you are running is responsible for turning a command line such as

myscript *.c

into arguments. In Windows your script is responsible for the interpretation of such arguments.

It is also essential to let the user know if the open operation failed. For example, the user might not have permission to access a certain file, or there may be no space left on the drive. There is usually no good reason to skip checking the return code of the open statement.

To write to a file, specify the file handle when you call the function print():

 print SYSOUT "Test\n";

The filehandle must be the first parameter of the print function, and it is not followed by a comma. Whether you are writing a new file or appending to an existing one depends on how you opened the file. In both cases you use the same print statement.

C programmers expect the first element of @ARGV to contain the name of the script.
This is not the case in Perl: $ARGV[0] is the first argument, and the name of the script is available in the variable $0.

Buffering

Writing to a file is buffered by default. To ask Perl to flush immediately after each write or print command, set the special variable $| to 1. The variable applies to the currently selected output filehandle, which is STDOUT unless you change it with select. Setting this value is very helpful when you are printing to a web browser in a CGI script or writing to a socket.
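
Turning buffering off for a file other than STDOUT can be sketched as follows. This is an illustration only; the scratch file demo_flush.txt is a hypothetical name, and the check of the file size is just a way to make the effect visible.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "demo_flush.txt";    # hypothetical scratch file
open(SYSOUT, ">$fname") || die("Can't open $fname for writing. Reason: $!\n");

# $| belongs to the currently selected handle, so select SYSOUT first
my $previous = select(SYSOUT);
$| = 1;
select($previous);               # restore the previous default (normally STDOUT)

print SYSOUT "progress record\n";

# The record is already in the file although SYSOUT is still open
my $size_while_open = -s $fname;
close(SYSOUT);

print "bytes visible before close: $size_while_open\n";
unlink($fname);
```

Without the three lines around $| the size measured before close would normally be zero, because the record would still sit in the buffer.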

We can write the whole file at once if its content is in an array:

print SYSOUT  @text;

The copying procedure is simple enough: read a line from the source file, and then write it to
the destination:

while (<IN>) {
   print OUT $_ ;
}
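
The copying loop can be wrapped into a complete script along the following lines. This is a sketch under stated assumptions: the helper copy_file and the two scratch file names are hypothetical and exist only for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: copy $from to $to line by line; returns the number of lines copied
sub copy_file {
    my ($from, $to) = @_;
    open(IN,  "<$from") || die("Can't open $from for reading. Reason: $!\n");
    open(OUT, ">$to")   || die("Can't open $to for writing. Reason: $!\n");
    my $count = 0;
    local $_;                    # do not clobber the caller's $_
    while (<IN>) {
        print OUT $_;
        $count++;
    }
    close(IN);
    close(OUT) || die("Can't finish writing $to. Reason: $!\n");
    return $count;
}

my ($src, $dst) = ("demo_src.txt", "demo_dst.txt");   # hypothetical scratch files
open(SYSOUT, ">$src") || die("Can't open $src for writing. Reason: $!\n");
print SYSOUT "first\n", "second\n";
close(SYSOUT);

my $copied    = copy_file($src, $dst);
my $same_size = ((-s $src) == (-s $dst)) ? "yes" : "no";
print "copied $copied lines, sizes equal: $same_size\n";
unlink($src, $dst);
```

Checking the result of close on the output handle is worthwhile here because buffered data is written out at that moment, so an error such as a full disk may show up only then.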

Getting filenames as parameters to the script

Perl puts command-line arguments into a special array variable called @ARGV. When a Perl script starts up, this variable contains a list consisting of the command-line arguments. For example, the command

$ script6_12 myfile1 myfile2

sets @ARGV to the list

("myfile1", "myfile2")

As with all other array variables, you can access individual elements of @ARGV. For example, the statement

$var = $ARGV[0];

assigns the first element of @ARGV to the scalar variable $var. You can even assign empty strings to some elements of @ARGV if you like. This is not always a perversion, as you can check for this value and provide default values this way. If the user did not supply any arguments, then scalar(@ARGV) will be zero. For example:

if (scalar(@ARGV)==0 || $ARGV[0] eq '' ) {
   $ARGV[0] = "/home/nnb/"; # set default for the first argument
} 

As with any array, to determine the number of command-line arguments you can use the scalar built-in function. We can also assign the array to a scalar variable:

$args_number = @ARGV;

The following script uses @ARGV to search a list of files for a word:

# search.pl -- this program will search all files for a word
# and print the total number of lines that contain the word
# format
#     search word file1 file2 ...
   print ("Word to search for: $ARGV[0]\n");
   $wc=0;                             # total for all files
   for ($fc=1; $fc<@ARGV; $fc++) {    # $ARGV[0] is the word, files start at index 1
      unless (open (SYSIN, $ARGV[$fc])) {
        die ("Can't open input file $ARGV[$fc]. Reason: $!\n");
      }
      while ($line = <SYSIN>) {
         if (index($line,$ARGV[0])>-1) {$wc++} # check if the line contains the word
      }
      close (SYSIN); # close the file before opening the next one
   }
   print ("total number of lines that contain $ARGV[0]: $wc\n");

The <> Operator and reading from the sequence of files

In many programming languages (Pascal, Modula-2) the sequence <> (usually called diamond) is used as "not equal". Unfortunately here, as in some other places, Perl redefines that meaning in a new and controversial way: we can think that Perl users fell victim to Larry Wall's fascination with digraphs ;-).

The diamond operator <> in Perl is an input operator that reads a sequence of files given as command-line arguments. That means that it contains a hidden reference to the array @ARGV:

  1. When the Perl interpreter encounters the <> operator for the first time, the action depends on whether command-line arguments are present (that is, whether @ARGV is non-empty). If they are, it opens the file whose name is stored in $ARGV[0]. If not, it reads from STDIN.
  2. After opening the file it executes shift(@ARGV); When the <> operator exhausts an input file, the Perl interpreter closes the file, goes back to step 1 and repeats the cycle with the next name left in @ARGV.

That simplifies writing scripts that behave like UNIX commands which accept any number of files as arguments:

cat file1 file2 file3 ...

The cat command writes to STDOUT all of the files specified on the command line, starting with file1.

The diamond operator can be used to imitate the behavior of standard Unix utilities that work with files.

We can simulate this behavior in Perl using the <> operator:

# perlcut.pl
while (<>) { print; }

The script operates on all of the files specified on the command line in order, starting with file1. When file1 has been processed, the script then proceeds on to file2, and so on until all of the files have been exhausted.

When it reaches the end of the last file on the command line, the <> operator returns the undef value. However, if you call the <> operator after this it will try to open STDIN. (Recall that <> reads from the STDIN if there were no arguments on the command line.) This means that you have to be more careful when you use <> than when you are reading using <SYSFILE> (where SYSFILE is a file handle). If SYSFILE has been exhausted, repeated attempts to read using <SYSFILE> continue to return the undef value because there isn't anything left to read.
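
The behavior of <> can be tried without typing file names on the command line by filling @ARGV inside the script. The sketch below is an illustration only; the two scratch file names are hypothetical. It relies on the built-in scalar variable $ARGV, which holds the name of the file that <> is currently reading.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Create two small scratch files (hypothetical names)
my @names = ("demo_a.txt", "demo_b.txt");
foreach my $name (@names) {
    open(SYSOUT, ">$name") || die("Can't open $name for writing. Reason: $!\n");
    print SYSOUT "line 1 of $name\n", "line 2 of $name\n";
    close(SYSOUT);
}

my %lines_in;                    # file name => number of lines read from it
my $left_in_argv;
{
    local @ARGV = @names;        # pretend the files were given on the command line
    while (<>) {
        $lines_in{$ARGV}++;      # $ARGV is the name of the file being read
    }
    $left_in_argv = scalar(@ARGV);   # <> shifted each name off as it opened the file
}
unlink(@names);

foreach my $name (sort keys %lines_in) {
    print "$name: $lines_in{$name}\n";
}
print "names left in \@ARGV after the loop: $left_in_argv\n";
```

The loop stops after the last file because <> returns undef there; only a further call to <> after that point would fall back to STDIN.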

Working with pipes

In the open statement you specify how the file is opened: for reading, writing, appending, etc. More importantly, you can specify a pipe as your input:

open(SYSIN, "gzip -d -c $fname |");	# Open a stream of records output by gzip

Opening Pipes

On machines running the UNIX operating system, two commands can be linked using a pipe. In this case, the standard output from the first command is linked, or piped, to the standard input to the second command.

Perl enables you to establish a pipe that links a Perl output file to the standard input file of another command. To do this, associate the file with the command by calling open, as follows:

open (SYSPOUT, "| gzip > results.gz"); #  we write to a pipe
open (SYSPIN, "gzip -dc infile.gz |");  # we read from a pipe

The | character tells the Perl interpreter to establish a pipe. For example, you can use a pipe to send mail from within a Perl script:

if (open(SYSMES, ' | mail [email protected] ')) {
	print SYSMES "Hi, Nick!  An example from your book sent this!\n";
	close(SYSMES);
}

Here we need an explicit close. It will close the pipe referenced by the SYSMES handle, which tells the system that the message is complete and can be sent. The call to close actually controls the moment when the message is to be sent. (If you do not call close, SYSMES will be closed when the script terminates and only then the message will be sent).
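
For a pipe opened for reading, close is also the place where you learn whether the command succeeded. The sketch below is an illustration only and assumes a Unix-like system where the shell command echo is available; the command itself is just a stand-in for something like gzip.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: a Unix-like shell with the echo command
open(SYSPIN, "echo one; echo two |") || die("Can't start the command. Reason: $!\n");
my @output = <SYSPIN>;           # read everything the command prints

# For a pipe, close waits for the command to finish and
# returns false if the command exited with a non-zero status
close(SYSPIN) || die("The command failed, wait status: $?\n");

print "got ", scalar(@output), " lines\n";
```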

Filter Scripts in Perl

Most often one needs to write a script that performs some action on each line of a file and writes some output (also to a file). Scripts of this type are called filters. For example:

#print all successful access lines from the HTTP server log
while (<STDIN>) {  # STDIN is the standard input file like in C 
   if (index($_,' 200')>-1) {print;} 
}

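In the example above, the script reads standard input one line at a time and prints only those lines that contain the substring ' 200'.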
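
The same filtering logic can be put into a function so that it can be tried on sample data before it is pointed at a real log. The sketch below is an illustration only: the helper successful_lines and the two log lines are hypothetical, and the input is read from a string through an in-memory filehandle instead of STDIN. As in the original example, the test is a plain substring search, so a line that contains ' 200' in some other field would also be accepted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines read from handle $in that contain ' 200'
sub successful_lines {
    my ($in) = @_;
    my @kept;
    while (defined(my $line = <$in>)) {
        push(@kept, $line) if index($line, ' 200') > -1;
    }
    return @kept;
}

# Made-up sample data in the style of an HTTP server access log
my $log = join('',
    qq{10.0.0.1 - - [01/Jan/2020:00:00:01 +0000] "GET / HTTP/1.1" 200 512\n},
    qq{10.0.0.2 - - [01/Jan/2020:00:00:02 +0000] "GET /x HTTP/1.1" 404 128\n},
);

open(my $in, '<', \$log) or die("Can't read from the sample string. Reason: $!\n");
my @kept = successful_lines($in);
close($in);

print @kept;                     # only the line with status 200 is printed
```

To use the function as a real filter, pass it a reference to the standard input handle: successful_lines(\*STDIN).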

Summary

Perl accesses files by means of filehandles (file variables). Filehandles are associated with files by the open statement.

Files can be opened in any of three modes: read mode, write mode, and append mode. A file opened in read mode cannot be written to; a file opened in either of the other modes cannot be read. Opening a file in write mode destroys the existing contents of the file. To read from an opened file, reference it using <SYSFILE>, where SYSFILE is a placeholder for the file handle associated with the file. To write to a file, specify file handle in print.

Perl defines three built-in file variables: STDIN, STDOUT and STDERR.

You can redirect STDIN and STDOUT by specifying < and >, respectively, on the command line. Messages sent to STDERR appear on the screen even if STDOUT is redirected to a file.

The close function closes the file associated with a particular file handle. close never needs to be called unless you want to control exactly when a file is to be made inaccessible.

You can use -w and -s tests to ensure that you do not overwrite a non-empty file.

The <> operator enables you to read data from files specified on the command line. This operator uses the built-in array variable @ARGV, whose elements consist of the items specified on the command line.

Perl enables you to open pipes. A pipe links the output of your Perl script to the input of another command, or the output of another command to the input of your script.

Exercises

Q: How do you open several files for reading?
Q: Why does adding a closing newline character to the text string affect how die behaves?
Q: Which is better: to use <>, or to use @ARGV and shift when appropriate?
Q: Can I use cascading pipes as input or output?
Q: Can I connect internal functions in a Perl script via a pipe?
Q: Can I count how many command-line arguments were passed to the program?
Q: Can I write to a file and then read from it later?

Exercises

  1. Write a script that takes names of files from standard input and prints all attributes of these files like ls in Unix (or dir in DOS/Windows).
  2. Write a script that takes a list of files from the command line and examines their attributes and date of modification. If a file was created this week, print $name is a new file! where $name is a placeholder for the name of the file.
  3. Write and debug a script that copies a file named file1 to file2, replacing a selected word with a new one (the old and new words are passed as parameters).
  4. Write a script that counts the total number of bytes, words and lines in the files specified on the command line. After that send a message to user ID postmaster indicating the total number of bytes, words and lines in each file.
  5. [Unix] Write a script that takes a list of files and indicates, for each file, whether the user has read, write, or execute permission.
  6. What is wrong with the following script?
    #!/usr/local/bin/perl 
    open (OUTFILE, "outfile");
    print OUTFILE ("This is my message\n");

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: March 12, 2019