Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Introduction to Perl for Unix System Administrators

(Perl without excessive complexity)

by Dr Nikolai Bezroukov



3.1. Perl string operations

There are two ways of performing string operations in Perl: procedural and non-procedural. Here we will discuss the procedural capabilities of Perl. Non-procedural (regular expression based) capabilities will be discussed in Chapter 5.

Procedural string handling in Perl is often simpler, more reliable and more easily debugged than other methods (debugging a complex regular expression is difficult, especially for users without a couple of years of practice in this sport ;-). I strongly recommend that beginners use procedural string operations as widely as possible, unless non-procedural capabilities are definitely a better fit for a particular task (or you can borrow a regular expression from a book or another script and pray that it is correct :-).

Note 1: Several string manipulation functions (tr, split, etc.) operate on the default variable $_ when no string is given. This is the so-called default scalar variable, another interesting albeit slightly strange Perl creature: useful and dangerous at the same time.

Note 2: There are also strange omissions: the classic head, tail and truncate functions are not implemented for strings (a crippled version of tail is available as chop and a crippled version of truncate as chomp, but generally it's better to use regular expressions for this task in Perl). If you need a better set of functions, you can use a Perl replication of the REXX string functions: see StringRexx - Perl implementation of Rexx string functions, available from search.cpan.org.

Please note that any string in Perl can be converted into an array and back (for example, using the split and join functions), so if a given operation is simpler on arrays, it might make sense to perform such a conversion and then convert the resulting array back to a string. This trick is also useful for working with words.
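A minimal sketch of this trick, assuming we want to reverse the order of words in a line (the variable names are only illustrative): split turns the string into an array, the array operation does the real work, and join turns the result back into a string.

```perl
use strict;
use warnings;

my $line = "the quick brown fox";

# string -> array of words (split on runs of whitespace)
my @words = split /\s+/, $line;

# an operation that is trivial on arrays: reverse the order of the elements
my @reversed = reverse @words;

# array -> string again
my $result = join ' ', @reversed;
print "$result\n";    # prints "fox brown quick the"

# the same trick works on single characters: split with an empty pattern
my @chars = split //, "abc";
my $count = scalar @chars;    # 3 characters
```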

All in all, Perl's capabilities for working with strings are among the best of any scripting language, and that is very important, as many problems can be viewed as (or converted to) manipulation of strings.

Explicit conversion to string

As we have already learned, the " (double quote) is not a string delimiter in Perl: it is an operator that concatenates everything within its scope, including variables, performing variable substitution. Variables can be either simple variables or elements of arrays and hashes. Actually, pretty complex expressions are recognized correctly by the syntax analyzer, but you had better test this to be sure.

The result is interpreted as a string, so double quotes can be used to convert an operand to a string, a surrogate of type casting.

The simplest example of using double quotes to force string conversion would be:

if ("$b" eq "$a") {print "a equals b\n";} else {print "a does not equal b\n";}

If the case of letters does not matter during comparison, the best way to perform explicit conversion is to use the functions uc() (convert to upper case) or lc() (convert to lower case). Both functions convert their operand to a string. For example, the previous example can be generalized to:

if (lc($b) eq lc($a)) {print "a equals b\n";} else {print "a does not equal b\n";}

That is a very useful idiom in Perl that helps to ensure correct comparison of strings that can be in both lower and upper case (for example, user input): string comparisons are case sensitive, and often you do not know what case will be used (for example, in answers: why would anybody want to distinguish "Yes", "yes", and "yES"?).

Another solution is to convert strings once, before comparison. Mainframe tradition dictates converting everything to upper case (the first mainframes did not have lowercase characters), but you can be more modern and convert everything to lowercase :-).

As a side note, I would like to mention that from the point of view of a language designer this possibility actually makes the second set of comparison operators used in Perl redundant: it would be simpler to require explicit conversion (or the use of quotes) when string comparison is intended, and to use one set of operators. But it's too late... As we already mentioned, the use of two separate sets of comparison operators is one of the "Perl warts": an error-prone feature similar in effect to allowing an assignment in the if statement in C (as in if (i=1) { ... } ).
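A small illustration of why the two sets of operators are error prone (the values are arbitrary examples): the same pair of strings can be equal numerically but different as strings, and two unrelated words compare as equal under ==.

```perl
use strict;
use warnings;

my $x = "10";
my $y = "10.0";

# numeric comparison: both operands are converted to numbers, and 10 == 10
my $numeric_equal = ($x == $y) ? 1 : 0;

# string comparison: the operands are compared character by character
my $string_equal = ($x eq $y) ? 1 : 0;

print "numeric: $numeric_equal, string: $string_equal\n";   # numeric: 1, string: 0

# the classic slip: == on non-numeric strings. Both sides convert to 0,
# so the test succeeds (normally "use warnings" would complain here)
my $oops = do { no warnings 'numeric'; ("abc" == "xyz") ? 1 : 0 };
```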

Concatenation operator (dot)

As we already discussed, the dot symbol denotes the concatenation operator in Perl. The operator takes two scalars and combines them into one scalar. Both the left and the right operands are converted to strings. For example:

$line = $line . "\n"; # add a newline to the string
$line .="\n"; # same thing

Like double quotes, the concatenation operator can be used for casting a numeric value into a string. The following (rather unrealistic, I really need a better one) example demonstrates that when a numeric value that contains non-significant digits (0.00 in this particular case) is converted to a string, all non-significant digits are lost:

$a=0.00 . '';
if ( 0.00 . '' ) { print "$a is True\n";} 
else {print "$a is False\n";} # Will print "0 is False"

The trick here is that the real number 0.00 in the statement " $a=0.00 . ''; " will first be converted to the string "0" and only then concatenated with the empty string by the concatenation operator.

All non-significant digits are lost if we convert a string to numeric representation and then back to a string.

Compare with:

$a="0.00";
if ($a) { print "$a is true\n"}
# will print "0.00 is true"  

In the latter example, the string "0.00" is never converted to a number and, as such, is considered true in the if statement.
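The round trip can be shown directly. This sketch (with arbitrary example values) forces numeric conversion with + 0 and then interpolates the number back into a string:

```perl
use strict;
use warnings;

my $s    = "0.00";   # a string: stays "0.00" and is true in boolean context
my $n    = $s + 0;   # forced numeric conversion: the number 0
my $back = "$n";     # back to a string: "0", the trailing zeros are gone

print "original '$s' is ", ($s ? "true" : "false"), "\n";         # true
print "round trip '$back' is ", ($back ? "true" : "false"), "\n"; # false

# the same happens with any non-significant trailing zero
my $price      = "12.50";
my $price_back = "" . ($price + 0);   # "12.5"
```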

It is important to know that the concatenation operator enforces scalar context, which means that when an array is used as an operand, the number of elements in the array is substituted. For example (note that the array @ARGV contains all arguments passed to the script on the command line):

print "Arguments:".@ARGV."\n"; # a very unpleasant error.

The intent seems to be to print all command line arguments, but Perl's interpretation is quite different. Please run this example and find out what will be printed. One possible correct solution is:

print map { "$_\n" } @ARGV; # map function provides an implicit foreach loop.
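Two more ways to print the arguments, sketched with a hypothetical set of arguments assigned to @ARGV for the demonstration: join builds a single string from the list, and an array interpolated in double quotes is joined with spaces (the default value of $").

```perl
use strict;
use warnings;

# hypothetical command line arguments, assigned here only for the demo
@ARGV = ('alpha', 'beta', 'gamma');

# join builds one string from the list, so concatenation is now safe
my $joined = "Arguments: " . join(' ', @ARGV) . "\n";

# an array interpolated in double quotes is joined with $" (a space by default)
my $interpolated = "Arguments: @ARGV\n";

# the original mistake: the dot forces scalar context, giving the element count
my $count_only = "Arguments:" . @ARGV . "\n";    # "Arguments:3\n"

print $joined, $interpolated, $count_only;
```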

The x Operator

This operator is called the string repetition operator and is used to repeat a string. All you have to do is put a string on the left side of the x and a number on the right side. Like this:

"----|" x 5 # This is the same as "----|----|----|----|----|"

You can use any string as the source, including strings that contain a newline, for example:

print ("Hello\n" x 5);
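A couple of other typical uses of x, as a short sketch (the title text is just an example): underlining a line of text with as many dashes as it has characters, and, in list context, repeating a parenthesized list.

```perl
use strict;
use warnings;

my $title = "Perl string operations";

# underline a title with as many dashes as it has characters
my $underline = '-' x length($title);
print "$title\n$underline\n";

# in list context, a parenthesized list on the left is repeated as a list
my @zeros = (0) x 5;     # (0, 0, 0, 0, 0)

# a repetition count of zero produces an empty string
my $empty = "abc" x 0;   # ""
```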

Functions and operators to manipulate strings

Contrary to popular belief and Perl hype, regular expressions are not a "universal opener", and in many cases procedural solutions are more transparent, more easily debugged and more easily modified in the future. Perl has the full assortment of PL/1 string operations, including substr, length, index and tr (translate). These functions are very powerful and could probably accommodate 80% of the operations that you would ever want to perform on strings. If someone hates regular expressions for religious reasons, he or she can do almost everything without them (a kind of vegetarian diet, I think ;-).

There are also several other string manipulation functions that handle important special cases (split, chop, chomp, uc, lc, unpack, etc.). They are not well designed, and we will discuss them after the major functions.

Perl also has powerful array-related string functions like grep, sort, etc. We will discuss them in the part of this chapter devoted to arrays (3.2).

Please remember that one interesting idiosyncrasy of Perl is the concept of the so-called default argument, denoted as $_. If you do not supply an argument to certain functions, they will usually operate on this default argument.
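A short sketch of the default argument at work (the word list is arbitrary): inside a foreach loop without an explicit loop variable each element is aliased to $_, so length and uc called without arguments operate on it.

```perl
use strict;
use warnings;

my @lengths;
my @upper;
for (qw(apple kiwi banana)) {   # each element is aliased to $_
    push @lengths, length;      # length() with no argument uses $_
    push @upper, uc;            # so does uc()
}
print "@lengths\n";   # 5 4 6
print "@upper\n";     # APPLE KIWI BANANA
```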

Getting Substring: substr

Substr is a classic string manipulation function that, as far as I know, was first introduced in PL/1 in the early 1960s. It is the most important function for manipulating strings in Perl.

A complete understanding of the substr function is really important in order to become an effective Perl programmer.

As in PL/1, Perl provides the substr (substring) function to extract parts of a scalar (e.g., a string). In the most general case of substr invocation you need to specify up to four arguments:

  1. String to be used
  2. Starting position of the substring that you want to extract (can be negative)
  3. Length of the substring
  4. (Perl innovation or overcomplication) Replacement string (see also the splice function for arrays)

For example if you wanted to get the first character of a string:

$name = "Nick";
$initial = substr($name, 0, 1); # arguments: name of the string, starting position, length

The second argument can be negative: in this case the starting position is counted from the end of the string, not from the start:

$last=substr($name, length($name)-1, 1); # last character

$last= substr($name, -1,1); # same as above

As in PL/1 and REXX, omitting the last argument (the length) means that all characters till the end of the string (the tail) will be taken.

        $last=substr($name, -1) ; # same as above

If you want, you can also use the substr function to replace any fragment of the string: as in PL/1, substr can be used on the left side of an assignment statement (such functions are called pseudofunctions or lvalue functions):

substr($name,0,1)=uc(substr($name,0,1)); # capitalize the first character, like ucfirst

This pseudofunction (lvalue) capability of substr is very useful. For example, we can also chop off the last character of the scalar $name:

substr($name, -1) = ''; # will truncate the string $name by one letter
This is actually more flexible than the chop function, as the number of characters to chop can be a variable:
substr($name, -$k) = ''; # will truncate the string $name by $k letters (assumes $k > 0: with $k == 0 the whole string is erased)

Here we used a negative offset to count backwards from the end of the string. You can achieve a similar result with the more verbose and error-prone form:

$name=substr($name, 0,-2); # will truncate the string $name by two letters

Note that a negative third argument is interpreted in a similar way to a negative second argument: as an offset from the end of the string, so here the extracted part ends just before position length($name)-2.
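To summarize the offset and length rules, here is a sketch applying them to one (arbitrary) string, with the expected results in comments:

```perl
use strict;
use warnings;

my $s = "Bezroukov";               # 9 characters, offsets 0..8

my $first3 = substr($s, 0, 3);     # "Bez"    from the start, 3 characters
my $last3  = substr($s, -3);       # "kov"    negative offset counts from the end
my $middle = substr($s, 2, -2);    # "zrouk"  negative length leaves 2 characters off the end
my $tail   = substr($s, 3);        # "roukov" no length: everything to the end
```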

If you note that substr($name,0,0) is the very beginning of the string, it is clear that you can add a prefix to the string using substr:

$name='bezroukov';

substr($name, 0,0)='Nick '; # will add the first name to the last

Another interesting idiom is the conversion of the first letters to upper case (a kind of generalized uc function, as we can convert not only the first letter but any number of letters in any part of the string). For example:

substr($name,0,3) =~ tr/a-z/A-Z/; # convert the first three letters to uppercase

The important difference between substr in PL/1 and REXX and substr in Perl is that Perl's substr can substitute a new string for the one it extracts. This is a semi-useful generalization, as it borders on overcomplexity, and there is a nice simple idiom using regular expressions for substitution (with search): s/search_string/replacement_string/; (see Chapter 5).

Here is the man page entry for substr (from perlfunc) that describes this possibility:

substr EXPR,OFFSET,LEN,REPLACEMENT
 
substr EXPR,OFFSET,LEN
 
substr EXPR,OFFSET
Extracts a substring out of EXPR and returns it. First character is at offset 0, or whatever you've set $[ to (but don't do that).

If OFFSET is negative (or more precisely, less than $[), starts that far from the end of the string.

If LEN is omitted, returns everything to the end of the string. If LEN is negative, leaves that many characters off the end of the string.

If you specify a substring that is partly outside the string, the part within the string is returned. If the substring is totally outside the string a warning is produced.

You can use the substr() function as an lvalue, in which case EXPR must itself be an lvalue. If you assign something shorter than LEN, the string will shrink, and if you assign something longer than LEN, the string will grow to accommodate it. To keep the string the same length you may need to pad or chop your value using sprintf().

An alternative to using substr() as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace parts of the EXPR and return what was there before in one operation, just as you can with splice().

That means that we can use it as an insert function, for example:

$a="world";
$b=substr($a,0,0,"Hello ");
# note that the replacement can be longer or shorter than LEN
print $a;
# will print "Hello world"
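The four-argument form also returns the part of the string that was replaced, which the lvalue form does not. A minimal sketch with an arbitrary string:

```perl
use strict;
use warnings;

my $s = "Hello world";

# replace 5 characters at offset 0; substr returns what was there before
my $old = substr($s, 0, 5, "Goodbye");
print "$old -> $s\n";    # Hello -> Goodbye world

# the equivalent lvalue form (no old value is returned)
my $t = "Hello world";
substr($t, 0, 5) = "Goodbye";
```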

Note that substr does not affect the default variable $_: the extracted substring is available only as its return value, so calling it in void context, as below, is useless:

$a='abba';
$_='';
substr($a,0,1);
print "$_\n"; # will not print the first letter of the string

In some cases you can use sprintf instead of substr. It is convenient, for example, for putting variables into predefined places in a dynamically generated command. For example, there are some difficulties in working with UNIX permissions, as they are octal and can be mangled if Perl converts them into decimal, so using sprintf in this case is simpler:

$perm=0755; 
$string = sprintf ("/bin/chmod %o $target/*", $perm);
`$string`;

We will discuss sprintf in more detail in Chapter 7.

String searching: index and rindex functions

There are two functions for searching for a substring in a string: index and rindex. Let's quote the Perl man page for a more precise definition:
index STR,SUBSTR,POSITION
 
index STR,SUBSTR
The index function searches for one string within another, but without the wildcard-like behavior of a full regular-expression pattern match. It returns the position of the first occurrence of SUBSTR in STR at or after POSITION. If POSITION is omitted, starts searching from the beginning of the string. The return value is based at 0 (or whatever you've set the $[ variable to--but don't do that). If the substring is not found, returns one less than the base, ordinarily -1.
The index function searches its first operand (the string) for the second operand (the substring) and returns the offset of the first occurrence found. The rindex function returns the offset of the last occurrence. Both return -1 if the substring is not found, which looks logical, as -1 is the index of the last character of the string and that's where unsuccessful matching stops.

index always returns the offset counted from the beginning of the string, even if the third argument is present. If the substring is not found, the result is -1 (zero corresponds to the first letter of the string).

As one can see, the index function is not greedy: it finds the first occurrence of the substring. Therefore it is often simpler to use it for finding relevant parts of the string than regular expressions (although you can specify non-greedy matching in regular expressions too).
 
Let's assume the variable $string contains the value abracadabra. Here are some examples:
$string="abracadabra";
print index ($string, "ab")."\n"; # will print 0 (the 1st letter has index 0)
print index ($string, "abc")."\n";  # will print -1
print index ($string, "ab", 2)."\n"; # will print 7 (starting pos is 2)
print index ($string, "ra", 3)."\n";   # will print 9
print rindex ($string, "ab")."\n"; # will print 7 (last "ab" in the string)

If you have a string that contains double quotes and want to interpolate variables in it, it's more convenient to use the qq operator instead of double quotes, as in

$a=qq(<font face="$font" color="$color">);

This is the best way to avoid errors connected with forgetting to escape all double quotes in such strings. Compare the example above with

$a="<font face=\"$font\" color=\"$color\">";

Also please remember that a double backslash in double-quoted literals represents just one backslash:

$path="C:\\SFU\\bin";
if ( index($path,"\\SFU\\") >-1 ) {
   print "The directory belongs to Microsoft Services for Unix\n";
}

Often one needs to extract the file name at the end of the path. You might do this by searching for the last slash (or backslash) using the rindex function and then using substr to return the substring. For example:

$fullname = qq(C:/WINDOWS/TEMP/SOME.DAT);
$d = index($fullname, ':');         # position of the drive separator
$drive = substr($fullname, 0, $d);  # "C"
$p = rindex($fullname, '/') + 1;    # index of the first letter after the last /
$fname = substr($fullname, $p);     # note that we use 2 arguments
print("File $fname is on the drive $drive\n");

Note that in the example above we used a special form of substr invocation: if the third parameter (the length) is not supplied, substr simply returns the substring that starts at the position specified by the second parameter and runs until the end of the string. By omitting the third argument we can avoid errors when we miscalculate the length of the substring.

A useful feature of Perl's index is that you can specify the starting position of the search. Unlike substr, however, a negative starting position is not counted from the end of the string: a position before the beginning of the string is treated as the beginning.

Another important difference from PL/1 and REXX is that if the substring is not found, index returns -1, not 0. This is a pretty logical design decision, as -1 corresponds to the index of the last element of the string.
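A common use of the third argument is finding all occurrences of a substring by restarting the search just after the previous match. The sketch below (using the same "abracadabra" string as earlier) also shows the behavior of a negative starting position:

```perl
use strict;
use warnings;

my $string = "abracadabra";

# collect the offsets of all occurrences of "a" by restarting index
# one character after the previous match
my @positions;
my $pos = index($string, "a");
while ($pos != -1) {
    push @positions, $pos;
    $pos = index($string, "a", $pos + 1);
}
print "@positions\n";    # 0 3 5 7 10

# a negative starting position is treated as 0, not counted from the end
my $from_neg = index($string, "ab", -3);    # 0
```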

Length of the string: function length

Length is a built-in function which gives the length of a scalar.

chomp($password = <>); # remove the trailing newline, otherwise it is counted too
if ( length($password) < 8 ) {
   print "bad password chosen, please use a longer one\n";
}

In this example, the function length counts the number of characters in the scalar variable $password.

Regrettably, length cannot be used as a pseudofunction (on the left side of an assignment statement) like substr, although it would be useful to be able to truncate a string using this shorthand. Instead one needs to use the substr function, as in the following example:

$a=substr($a,0,$n); # truncation of the string $a to $n letters

If no argument is specified, the length function returns the length of $_. Note that length requires a scalar argument and, contrary to the proclaimed Perl philosophy, it will not work as you might expect with an array or hash. To find out how many elements an array or hash has, use scalar(@array) and scalar(keys %hash) respectively.
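A short sketch of the alternatives mentioned above (variable names are illustrative): truncation with substr, which quietly returns the whole string when it is shorter than the requested length, and element counts for an array and a hash.

```perl
use strict;
use warnings;

my $s = "abcdefgh";
my $n = 3;
$s = substr($s, 0, $n);              # truncate to $n characters: "abc"

# a substring partly outside the string: only the part inside is returned
my $short = substr("ab", 0, 10);     # "ab"

my @array = (10, 20, 30, 40);
my %hash  = (a => 1, b => 2);
my $element_count = scalar(@array);       # 4
my $key_count     = scalar(keys %hash);   # 2
```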

Translate function (tr)

The tr function (actually it is an operator ;-) allows character-by-character translation with several enhancements. It takes two arguments: the source character set and the target character set. The syntax is rather strange and belongs to the "Perl warts", as it does not fit well into the general framework of string manipulation functions. That can be explained by the fact that the tr operator is derived from the UNIX tr utility. The UNIX sed utility uses y for this operation, and y is supported in Perl as a synonym for tr.

By default the string to be modified is not supplied as a parameter, but is taken from the $_ variable (another variable can be bound to tr with the =~ operator), for example:

tr/a/z/; # change all "a" into "z"

In this form tr modifies the content of the variable $_ in place. Unlike substr, the tr function returns not the translated string, as we might expect, but the number of characters it translated.

The following statement replaces each of the digits 2 through 8 with 9. This can sometimes be a useful parsing or data scrambling technique. Of course this encoding is not really helpful, but it will suit for illustration purposes.

$_='Test string 123456789123456789123456789';
$k=tr/2345678/9/; # $k will contain the number of substitutions made (21)

If you specify more than one character in the match character list, you can translate multiple characters at a time. For example:

tr/0123456789/9999999999/; # replace all digits with 9

translates all digits into the 9 character. If the replacement list of characters is shorter than the target list of characters, the last character in the replacement list is repeated as often as needed. That means that we can rewrite the statement above as:

tr/0123456789/9/; # same as above

If more than one replacement character is given for a matched character (this is a stupid idea because the arguments are sets, but it can happen if sets are generated automatically and the corresponding check is not in place), only the first is used. The rest of the replacement list is ignored. For instance:

tr/999/123/;

results in all "9" characters in the string being converted to the character 1. So it's equal to

tr/999/1/;

The translation operator doesn't perform variable interpolation, for example:        

$from_set="0123456789";
$to_set  ="ABCDEFGHIJ";
tr/$from_set/$to_set/;
# does not work
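If you really need to build the character sets at run time, the usual workaround (described in perlop) is to construct the whole tr statement as a string and compile it with eval. A minimal sketch, reusing the sets from the example above; the \Q...\E (quotemeta) wrapping is an added precaution so that special characters in the sets are taken literally:

```perl
use strict;
use warnings;

my $from_set = "0123456789";
my $to_set   = "ABCDEFGHIJ";

my $text = "room 404";
# tr does not interpolate variables, so we build the statement as a string
# and compile it at run time with eval
my $count = eval "\$text =~ tr/\Q$from_set\E/\Q$to_set\E/";
die $@ if $@;
print "$text ($count changed)\n";    # room EAE (3 changed)
```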


The translation operator has several useful options: you can delete matched characters, replace repeated characters with a single character, and translate only characters that don't match the character list (see the table below).

Historically, the translate function is considered to be one of the pattern matching operators. That is untrue, but as you will see, its syntax is derived from the (also pretty strange) match and substitute operators that we will study in Chapter 5. At the same time, the translation function operates on strings of character sets, not on regular expressions. The delimiter can vary, but slashes are most commonly used (slashes are also used in Perl 5 for regular expressions). Most of the special regular expression codes are not applicable.

However, as in regular expressions, the dash is used to mean "between". This statement converts $_ to upper case:

tr/a-z/A-Z/; # again this is not the best way to do it. Use uc() instead

Please note that Perl 4 did not have the lc and uc functions. Therefore the tr function was often used to convert case. If you see this idiom in a script, that probably means that the script was initially written for Perl 4. The example above that converts all digits to 9 can be rewritten as

tr/0-9/9/; # the shortest way to replace all digits with 9

If the target set is empty and you use the d modifier, you can also delete a set of characters using translate:

tr/.,;://d;

If the new set is empty and there is no d option, then the new set is assumed to be equal to the old one and the function will not modify the source string, so it can be used for counting characters. For example, the statement here counts the number of dots in the variable $_ (a dot is a special character in regular expressions, but not in tr) and stores the count in the variable $total:

$_="131.1.1.1";
$total = tr/.//; # $total is 3

The next example counts a range of characters:

$k=tr/0-9//; # counts number of digits in the string $_

You can specify a set not only directly, but also as the complement of a set, using the c option:

$k=tr/0-9//c; # will count all non-digit characters

If you use tr to parse a string into lexical elements, you may not need repeated characters after transliteration. In this case you can use the option s. This permits easy building of primitive lexical parsers:

$k=tr/0-9a-zA-Z_/9999999999A/s; # each run of digits becomes one 9, each run of letters and underscores one A

Normally, if the match list is longer than the replacement list, the last character in the replacement list is used as the replacement for the extra characters. However, when the d option is used, the matched characters that have no counterpart in the replacement list are simply deleted.

If the replacement list is empty (and the d option is not used), no translation is done. The operator will still return the number of characters that matched, though. This is useful when you need to know how often a given letter appears in a string. With the s option, this form can also compress repeated characters.

Here is the list of options:

Options for the Translation Operator

Option  Description
c       This option complements the source character set. In other words, the translation is done for every character that does not match the source character set.
d       This option deletes any character in the source character set that does not have a corresponding character in the target character set.
s       This option reduces repeated sequences of the same translated character in the output to a single instance of that character.
r       (Perl 5.14 and later) This option returns the modified string instead of the count, leaving the original string unchanged.

For example, ROT13 is a simple substitution cipher that is sometimes used for distributing offensive jokes and other potentially objectionable materials on Usenet. It is a Caesar cipher with the key equal to 13 (A->N, B->O, etc.). Using the tr function for decoding ROT13 is an interesting example because the target set is constructed by concatenating disjoint character subranges: n-z followed by a-m (and N-Z followed by A-M for upper case). Note that square brackets are not needed in tr; they would be treated as literal characters:

tr/a-zA-Z/n-za-mN-ZA-M/;
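Since ROT13 is its own inverse, the same tr both encodes and decodes. A minimal sketch wrapping it in a subroutine (rot13 is a name chosen for this example) that works on a copy of its argument:

```perl
use strict;
use warnings;

# each source range maps position by position onto the rotated target list:
# A..M -> N..Z, N..Z -> A..M, and the same for lowercase
sub rot13 {
    my ($text) = @_;                    # work on a copy of the caller's string
    $text =~ tr/A-Za-z/N-ZA-Mn-za-m/;
    return $text;
}

my $encoded = rot13("Hello, World!");   # "Uryyb, Jbeyq!"
my $decoded = rot13($encoded);          # applying ROT13 twice restores the original
```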

UNIX programmers may be familiar with using the tr utility to convert lowercase characters to uppercase characters, or vice versa. Do not do that in Perl: Perl 5 has the lc() and uc() functions for this purpose.

For complex transliterations the tr/// syntax is bad. One of the problems is that the notation doesn't actually show which characters correspond, so you have to count characters. For example:

    tr/abcdefghijklmnopqrstuvwxyz/VCCCVCCCVCCCCCVCCCCCVCCCCC/

But in Perl there is a way to make this example more readable using different delimiters:

    tr[abcdefghijklmnopqrstuvwxyz]
      [VCCCVCCCVCCCCCVCCCCCVCCCCC]

If the first string contains duplicates, then the first corresponding character is used, not the last:

    tr/aeioua-z/VVVVVC/
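Applied to a word, this translation yields its consonant/vowel pattern. A small sketch (the word is arbitrary) that copies the word first, so the original stays intact:

```perl
use strict;
use warnings;

my $word = "perl";
# vowels appear first in the search list, so their first mapping (V) wins;
# every other lowercase letter falls through to the last replacement, C
(my $pattern = $word) =~ tr/aeioua-z/VVVVVC/;
print "$pattern\n";    # CVCC
```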

Truncating last characters in strings: functions chop and chomp

Those two are special and very limited functions that are really a shame for the Perl language designers. They do just one thing and were not generalized to cover other situations. Those two functions are:

Chop function

The built-in function chop chops the last character off the end of a scalar (or off each element of an array) and returns the last character removed. Why just one, and why I cannot chop, say, ten characters, is a mystery to me (for arrays, the splice function can remove any number of elements from the end; see part 3.2 of this chapter). I suspect that this is more of an "optimization trick" than anything else. Of course chop can be imitated with the substr function, but the case is rather frequent and deserves a special "shortcut" function that does not allocate a new string but performs the operation "in place".

The most popular use is for comparing a string with a line of a file that was read with the newline as part of the input, but chomp is a safer and better function here. For example:

echo OK | perl -e '$a =<>; if ($a eq "OK"){print "equal\n"}else{print "non equal\n";}'
non equal

So to compensate for this we first need to chop the last character (the newline): chop($a); but it's better to use chomp (see below).

Due to the fact that scalars are stored in both string and numeric representation, chop can be used for dividing a non-negative integer by 10 (discarding the remainder), for example:

$n = 128;
chop($n); # $n is now 12 (128 divided by ten!); chop itself returns "8"

The function returns the character that was dropped, and there is no way to make it return the chopped string, which is somewhat unfortunate and can lead to errors like:

$n = chop($n); # probably truncated by one character string was expected here

This is an error, because it returns just the last character of the string (the character that was chopped) not the truncated string. To return the truncated string one should use substr($string, 0, -1).

At the same time, when the order in which you process the characters of a string is not important, chop can be used for processing a string character by character (backwards), the same way as substr($string,$i,1) is usually used in the forward direction.
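A sketch of such processing, reversing a string with chop (the test string is arbitrary). Note that the loop tests the length of the string rather than the chopped character, because a "0" character is false in Perl and would end the loop early:

```perl
use strict;
use warnings;

my $s = "Perl0";
my $reversed = "";
while (length $s) {
    my $c = chop $s;     # removes and returns the last character
    $reversed .= $c;
}
print "$reversed\n";     # 0lreP
```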

Conditionally truncating characters: chomp

Chomp is another "Perl wart". It usually removes the trailing newline, if one exists. Like chop, chomp works both for scalars and arrays. This is essentially a very limited version of the trim function known from REXX (where it is called STRIP), which removes leading and/or trailing characters of a string (be it newline or blanks or whatever). Chomp can remove just a trailing string (as defined by $/; note that $/ is a literal string, not a regular expression):

This safer version of chop removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module). It returns the total number of characters removed from all its arguments. It's often used to remove the newline from the end of an input record when you're worried that the final record may be missing its newline. When in paragraph mode ($/ = ""), it removes all trailing newlines from the string. When in slurp mode ($/ = undef) or fixed-length record mode ($/ is a reference to an integer or the like, see perlvar) chomp() won't remove anything. If VARIABLE is omitted, it chomps $_. Example:

    while (<>) {
	chomp;	# avoid \n on last field
	@array = split(/:/);
	# ...
    }  

You can actually chomp anything that's an lvalue, including an assignment:

    chomp($cwd = `pwd`);
    chomp($answer = <STDIN>);  

If you chomp a list, each element is chomped, and the total number of characters removed is returned.

The function chomp is a conditional chop that is usually used for getting rid of newlines at the ends of Perl input. Let's say the 'special character' is "\n" (a newline, which is the default). Then the following statement removes the trailing newline from $example:

$example = "This has a line with a newline at the end\n";
chomp($example);

In other words, chomp gets rid of the newline only, not of an arbitrary last character like chop. If the string does not end with a newline, it will remain unchanged:

$example = "This doesn't have a newline";
chomp($example);

That makes chomp safer than chop.

Actually it does not need to be a newline: newline is simply the default value of the special variable $/ (the input record separator), which contains the characters that you want to be chomped. This can be set to any value you want, as in:

$/ = "/"; $path = "/This/is/a/path/"; chomp($path); $/="\n";
print ($path);
# will print '/This/is/a/path'

Please note that you need to restore the value of $/, unless you want to break a lot of scripts. And yes, it's ugly; it should be just chomp("/This/is/a/path/","/"), but Perl is a pretty irregular language.
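A safer way to change $/ temporarily is local inside a block: Perl restores the previous value automatically when the block is left, so you cannot forget to do it. A minimal sketch with the same path:

```perl
use strict;
use warnings;

my $path = "/This/is/a/path/";
{
    local $/ = "/";    # the old value of $/ is restored when the block exits
    chomp($path);
}
print "$path\n";                   # /This/is/a/path
my $separator_ok = ($/ eq "\n");   # the default input record separator is back
```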

Manipulating Case: uc() and lc(), ucfirst(), lcfirst()

uc() returns an uppercase version of the string that you give it. For example, if you say something like:

$name = uc("Hello");

print $name; # this will print 'HELLO'

ucfirst() returns a capitalized version:

$name = ucfirst("hello");

print $name; # prints "Hello";

If we note the absence of head, tail and truncate functions in Perl, the presence of ucfirst looks arbitrary and probably redundant, as it can be implemented using substr. To increase the usefulness of the function, it would be wise to generalize it to provide the possibility to capitalize not only the first letter but any substring by providing second and third arguments.
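A sketch of such a generalization, built on lvalue substr. Note that ucsub is a hypothetical helper written for this example, not a Perl built-in:

```perl
use strict;
use warnings;

# ucsub (hypothetical helper): uppercase $len characters of $str starting at $offset
sub ucsub {
    my ($str, $offset, $len) = @_;
    substr($str, $offset, $len) = uc(substr($str, $offset, $len));
    return $str;
}

my $first = ucsub("hello world", 0, 1);   # "Hello world" (same as ucfirst)
my $word2 = ucsub("hello world", 6, 5);   # "hello WORLD"
```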

Symmetrically, lc() and lcfirst() return lowercased versions of strings: lc returns all lowercase, while lcfirst() makes only the first character lowercase. That is sometimes useful for names, but again this is a very limited application and the function probably needs some generalization.

One frequent use of ucfirst and lc is to get a capitalized word:

    $word=ucfirst(lc($word));

This combination of ucfirst with lc is useful for other formatting tasks. For example, let's assume that we need to format a string as a title (with each word starting with a capital letter). Here is a very simple variant:

@words=split(/\s+/,$title);

foreach $w (@words) { $w=ucfirst(lc($w)) } # we are using a side effect of the foreach loop: $w is an alias for each element

$title=join(' ',@words);

Usually articles like "a" and "the" are not capitalized in titles so we can modify the code to accomplish this in the following way:

@words=split(/\s+/,lc($title));
foreach $w (@words) {
    next if ($w eq 'a' || $w eq 'the');   # leave articles lowercase
    $w=ucfirst($w);                       # $w is an alias, so @words is changed in place
}
$title=join(' ',@words);
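A self-contained sketch that packages the article-skipping variant as a subroutine. The name title_case and the word list in %small are assumptions made for this example. Unlike the loop above, it always capitalizes the first word, since titles normally do so even when the first word is an article.

```perl
use strict;
use warnings;

# Words left lowercase inside a title (illustrative list; adjust as needed)
my %small = map { $_ => 1 } qw(a an the of and or);

sub title_case {
    my ($title) = @_;
    my @words = split(' ', lc($title));   # ' ' also discards leading blanks
    foreach my $i (0 .. $#words) {
        # the first word is capitalized even if it is an article
        next if $i > 0 && $small{ $words[$i] };
        $words[$i] = ucfirst($words[$i]);
    }
    return join(' ', @words);
}

print title_case("the art OF computer programming"), "\n";
# prints 'The Art of Computer Programming'
```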

The same effect can be achieved in a slightly more compact way using map instead of a foreach loop. We leave this modification as an exercise for the reader.

Related functions

split(PATTERN, STRING, LIMIT) -- Breaks up a string based on some delimiter (which can be a regular expression). In list context, it returns a list of the fields that were found. In scalar context, it returns the number of fields found. By tradition this function is covered with array operations, although it is essentially a string parsing function; see Split function. It is discussed in Chapter 5 (5.5 Perl Split function).

chr(NUMBER) -- Returns the character represented by NUMBER in the character set (for values 0-127 this is the ASCII table). For instance, chr(65) returns the letter A.

join(STRING, ARRAY) -- Returns a string that consists of all of the elements of ARRAY joined together by STRING. For instance, join(">>", ("AA", "BB", "cc")) returns "AA>>BB>>cc".
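A short illustration of split, join and chr working together on a line in /etc/passwd format (the sample line is made up for this example). The scalar-context split assumes Perl 5.12 or later; older versions also overwrote @_ as a side effect.

```perl
use strict;
use warnings;

my $line   = "root:x:0:0:root:/root:/bin/bash";
my @fields = split(/:/, $line);     # list context: the seven fields
my $count  = split(/:/, $line);     # scalar context: the number of fields
print "user=$fields[0] shell=$fields[-1] fields=$count\n";

my $rebuilt = join(':', @fields);   # join undoes the split here
print $rebuilt eq $line ? "round trip ok\n" : "strings differ\n";

print join('', map { chr($_) } 65 .. 70), "\n";   # prints 'ABCDEF'
```

Note that split drops trailing empty fields, so the round trip is not guaranteed for a line ending in ':'.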

sprintf(FORMAT, LIST) -- Returns a string formatted according to the usual C printf format specifications.
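A few common format specifications, as a quick sketch (the values are arbitrary):

```perl
use strict;
use warnings;

my $row  = sprintf("%-10s|%5.2f", "disk", 3.14159);  # left-justify in 10 columns; 2 decimals in 5 columns
my $id   = sprintf("%03d", 7);                       # zero-pad to 3 digits
my $hexs = sprintf("%x", 255);                       # hexadecimal, the inverse of hex()
print "$row\n$id\n$hexs\n";
# prints:
# disk      | 3.14
# 007
# ff
```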

hex(EXPR) -- Returns the decimal value of EXPR interpreted as a hex string. If EXPR is omitted, uses $_. The hex function can handle strings with or without a leading 0x or 0X.

      $x = hex ("0xa2");                # $x is 162
      $x = hex ("a2");                  # $x is 162
      $x = hex (0xa2);                  # $x is 354 (!): 0xa2 is already the number 162, read as the string "162"


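oct(EXPR) -- Returns the decimal value of EXPR interpreted as an octal string. If EXPR is omitted, uses $_. A string starting with 0x is interpreted as hex, and one starting with 0b as binary.

      $x = oct ("042");                 # $x is 34
      $x = oct ("42");                  # $x is 34
      $x = oct ("0x42");                # $x is 66
      $x = oct (042);                   # $x is 28 (!): 042 is already the number 34, read as the string "34"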
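A typical system administration use of oct is converting a permission string, for example one read from a config file, into the number chmod expects; sprintf with %o converts it back. Passing the string "755" to chmod directly would be taken as decimal 755, which is a different mode. A sketch; the file name is hypothetical and the chmod call is left commented out:

```perl
use strict;
use warnings;

my $mode_str = "755";            # e.g. read from a config file
my $mode     = oct($mode_str);   # 493, the numeric mode
printf("%s -> %d -> %o\n", $mode_str, $mode, $mode);   # prints '755 -> 493 -> 755'
# chmod($mode, "/tmp/example.sh");   # hypothetical file; chmod needs the number
```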

Implementation of some additional useful functions

The first frequently needed operation that is not among Perl's built-in functions is strip -- removal of whitespace from both ends of a string. You can think of it as a generalization of chomp. You can implement it with regular expressions, for example:

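sub strip {
   # @_ is aliased to the arguments, so each string is changed in place
   foreach (@_) { s/^\s+//; s/\s+$//; }   # two anchored regexes also handle embedded newlines
}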

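This implementation accepts one or several strings and applies the same operation to each. Because @_ is aliased to the arguments, the strings are modified in place, so the arguments must be variables, not literals.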
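A usage sketch. The non-destructive variant stripped is a hypothetical addition: it copies its arguments first, which breaks the aliasing, so it also works on literals (calling strip on a literal dies with "Modification of a read-only value attempted").

```perl
use strict;
use warnings;

# strip() modifies its arguments in place, because @_ aliases them
sub strip {
    foreach (@_) { s/^\s+//; s/\s+$//; }
}

# Hypothetical non-destructive variant: returns stripped copies instead
sub stripped {
    my @copies = @_;   # copying breaks the aliasing
    strip(@copies);
    return wantarray ? @copies : $copies[0];
}

my ($host, $user) = ("  server01 \n", "\tadmin  ");
strip($host, $user);
print "[$host] [$user]\n";                   # prints '[server01] [admin]'
print "[", stripped("  literal  "), "]\n";   # prints '[literal]'
```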

The second function that is often useful is scan. It removes the first word from the string passed as an argument and returns that word. If there are no words in the string, the function returns an empty string.

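sub scan
{
   # $_[0] is an alias, so the first word (and the blanks after it) is removed from the caller's variable
   if ($_[0] =~ s/^\s*(\S+)\s*//) {   # also matches a last word with no trailing blank
      return $1;
   } else {
      return '';   # no words in the string
   } # if
}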

You can generalize it to multiple arguments the way the previous function was implemented. That is left as an exercise for the reader.
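Because scan consumes its argument, it can drive a simple loop that takes a line apart word by word. A sketch, assuming the corrected regex that also matches a final word with no trailing blank; the sample line is made up:

```perl
use strict;
use warnings;

# scan(): remove the first word from $_[0] and return it ('' if none)
sub scan {
    if ($_[0] =~ s/^\s*(\S+)\s*//) {
        return $1;
    }
    return '';   # no words left
}

my $line = "  eth0   UP   192.0.2.10 ";
my @words;
while ((my $w = scan($line)) ne '') {
    push @words, $w;   # $line gets shorter on every call
}
print join('|', @words), "\n";   # prints 'eth0|UP|192.0.2.10'
```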

Another useful function, whose implementation makes a good exercise for the reader, is the REXX function subword:

subword(string, n[, length])

subword("Where is this string",3,2)

returns the string "this string"

FoundStr = subword("Where is this string",3)

assigns the value "this string" to the variable FoundStr
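One possible sketch in Perl, using the string-to-array conversion trick (split, take an array slice, join). It is a simplification: words are rejoined with single spaces, so runs of blanks between the selected words are not preserved as they are in REXX.

```perl
use strict;
use warnings;

# subword($string, $n [, $length]): words $n .. $n+$length-1 (1-based)
sub subword {
    my ($string, $n, $length) = @_;
    my @words = split(' ', $string);   # split on whitespace, ignore leading blanks
    return '' if $n < 1 || $n > @words;
    my $last = defined($length) ? $n + $length - 2 : $#words;
    $last = $#words if $last > $#words;
    return join(' ', @words[$n - 1 .. $last]);
}

print subword("Where is this string", 3, 2), "\n";   # prints 'this string'
print subword("Where is this string", 3), "\n";      # prints 'this string'
```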

Summary

Perl has an impressive array of string manipulation functions that can supplement its regular-expression-based string manipulation capabilities. Novices should probably avoid overusing regular expressions until they become more confident in understanding the associated semantics. When a task maps clearly onto classic string functions like substr and index, using them also leads to clearer programs that are easier to modify and maintain.

Several important points:

Questions

1. What will the following fragment print?

$name='softpanorama';
if ( index($name, 's') != -1 ) {
    printf("String '%s' has 's' in it\n", $name);
}

2. What will the following fragment print if $string='softpanorama'?

@c = split(//, $string);
print "$c[0]$c[4]$c[2]$s[-3]$s[3]\n";

3. What will the following fragment print?

$str1='remember';
$str2='Perl';
$str3='warts';
$left = $str1 . " " . $str2 . " " . $str3;
$right = "$str1 $str2 $str3";
if ($left == $right) { print "strings are equal\n"; }
else { print "strings are unequal\n" }

Additional Reading



Created: November 7 1998; Last modified: September 05, 2009