Softpanorama


Perl substr function


Introduction

The substr function is a classic string manipulation function that, as far as I know, was first introduced in PL/1 in the early 1960s. It is the most important function for manipulating strings in Perl, and you need to understand it completely to be an effective Perl programmer.

Like PL/1, Perl provides the substr (substring) function to extract parts of a scalar (e.g. a string). Unfortunately, Perl does not extend this function beyond the semantics created by the PL/1 designers, so it inherits one principal problem of the PL/1 design: the inability to specify a range of characters (starting and ending positions of the substring) instead of a starting position and a length. In the most common case of substr invocation you specify three arguments (an optional fourth argument, the replacement string, is discussed later):

  1. String to be used
  2. Starting position of the substring that you want to extract (can be negative)
  3. Length of the substring

For example if you wanted to get the first character of a string:

 $name = "Nick"; 
 $initial = substr($name,0,1);
		    |	 | |_______ length
		    |	 |_________ starting position  
		    |______________ name of the string 

Here is the manual entry for substr that describes these possibilities (the bold is mine -NNB). The entry is very weak.

substr EXPR,OFFSET,LEN,REPLACEMENT
 
substr EXPR,OFFSET,LEN
 
substr EXPR,OFFSET
Extracts a substring out of EXPR and returns it. First character is at offset 0, or whatever you've set $[  to (but don't do that).

If OFFSET is negative (or more precisely, less than $[), starts that far from the end of the string.

If LEN is omitted, returns everything to the end of the string. If LEN is negative, leaves that many characters off the end of the string.

If you specify a substring that is partly outside the string, the part within the string is returned. If the substring is totally outside the string a warning is produced.

You can use the substr() function as an lvalue, in which case EXPR must itself be an lvalue. If you assign something shorter than LEN, the string will shrink, and if you assign something longer than LEN, the string will grow to accommodate it. To keep the string the same length you may need to pad or chop your value using sprintf().

An alternative to using substr()  as an lvalue is to specify the replacement string as the 4th argument. This allows you to replace parts of the EXPR and return what was there before in one operation, just as you can with splice().
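The rules in the entry above can be checked with a short sketch. The sample string and variable names are chosen here purely for illustration:

```perl
use strict;
use warnings;

my $s = "Hello, world";                 # 12 characters, offsets 0..11

my $from_start = substr($s, 0, 5);      # OFFSET and LEN
my $to_end     = substr($s, 7);         # LEN omitted: everything to the end
my $from_end   = substr($s, -5);        # negative OFFSET: counted from the end
my $trimmed    = substr($s, 1, -1);     # negative LEN: leave one character off the end
my $clipped    = substr($s, 7, 100);    # partly outside: the part inside is returned

print "$from_start|$to_end|$from_end|$trimmed|$clipped\n";
# prints: Hello|world|world|ello, worl|world
```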

This idea of a replacement string is pretty much redundant, as you can use substr as a pseudo-function on the left side of an assignment for exactly the same purpose. It would be better to interpret the fourth argument as the end of the substring and, if it is present, to treat the third argument as an upper limit on the length (no limit if 0 or less).

Interpretation of the second argument

The second argument specifies the starting position of the substring to be extracted.

  1. If the second argument is positive or zero, it is interpreted as the starting position, counted from 0 at the beginning of the source string.
  2. If the second argument is equal to the length of the string, the function returns an empty string; if it is greater, the function returns undef and, with warnings enabled, produces a warning.
  3. If the second argument is negative, it is counted from the end of the string. For example, -1 is the index of the last character of the string.

For example:

$last=substr($name, length($name)-1,1); # the last character 

$last= substr($name, -1,1); # same as above
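Because offsets are counted from 0, the last character sits at length($name)-1, and offsets at or beyond the length behave differently from each other. A minimal sketch of the boundary cases, with variable names that are illustrative only:

```perl
use strict;
use warnings;

my $name = "Nick";                                  # length 4, valid offsets 0..3

my $last   = substr($name, length($name) - 1, 1);   # the last character
my $at_end = substr($name, length($name));          # offset == length: empty string

my $beyond;
{
    no warnings 'substr';                           # silence "substr outside of string" for this demo
    $beyond = substr($name, length($name) + 1);     # offset > length: undef
}

print "last='$last' at_end='$at_end' beyond=",
      (defined $beyond ? "'$beyond'" : 'undef'), "\n";
# prints: last='k' at_end='' beyond=undef
```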

Interpretation of the third argument

The third argument specifies the length if it is positive. If it is negative, it specifies an offset from the end of the string: that many characters are left off the end.

if (substr($t,-1,1) eq '/') {
    $m=substr($t,1,-1); # drop the first character and the trailing slash (interesting usage of substr)
} else {
    $m=substr($t,1);    # drop only the first character
}

Unfortunately, there is no way to specify an offset from the beginning of the string in the third argument, and this is a rich source of "off-by-one" errors.

Let's assume that we have a string 

<div class="format_text entry-content">Here’s another excerpt from my book-in-progress.</div>

and want to extract the content without div tag. You can do it the following way:

#!/bin/perl
@all=(
     '<div class="format_text entry-content">Here’s another excerpt from my book-in-progress.</div>',
     '<div></div>',
     '<div>1</div>',
     '<div>22</div>'
);

   for $line (@all) {
      print "$line\n";
      $text='';
      $start=index($line,'>');
      if ($start>-1) {
         $start+=length('>');
         $end=index($line,'</div>',$start);
         if ($end>-1) {
            $text=substr($line,$start,$end-$start);
         } # if
      } # if
      print "Extracted: '$text'\n"
   } # for

As you can see, this is not a simple one-liner, and it is better factored into a subroutine, for example:

#!/bin/perl
@all=(
'<div class="format_text entry-content">Here’s another excerpt from my book-in-progress.</div>',
'<div></div>',
'<div>1</div>',
'<div>22</div>');

for $line (@all) {
   print "$line\n";
   $text=between($line,'>','</div>');
   print "Extracted: '$text'\n"
} # for
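# between(string, opening marker, closing marker) returns the text between the two markers.
# An empty opening marker means "from the start"; an empty closing marker means "to the end".
sub between
{
   return '' if (length($_[0])==0);
my $start=(length($_[1])==0) ? 0 : index($_[0],$_[1]);
   return('') if ($start<0); # opening marker not found
   $start+=length($_[1]);
my $end=(length($_[2])==0) ? length($_[0]) : index($_[0],$_[2],$start);
   return('') if ($end<=$start); # closing marker not found or nothing in between
   return(substr($_[0],$start,$end-$start));
}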

As in PL/1 and REXX, omitting the third argument means that all characters to the end of the string (the tail) will be taken.

$last=substr($name, -1) ; # the last character (everything from position -1 to the end)
$tail=substr($name,10); # all characters of the string starting from position 10

A negative third argument is interpreted in a similar way to a negative second argument, as an offset from the length of the string: for example, substr($name,0,-2) stops at position length($name)-2.

If you note that substr($name,0,0) is the very beginning of the string it is clear that you can add a prefix to the string using substr:

$name='Bezroukov';
substr($name,0,0)='Nick '; # will add the first name to the last

Another interesting idiom is the conversion of the first several letters to upper case. This is a kind of generalized uc function, as we can convert not only the first letter but any number of letters in any part of the string. For example:

substr($name,0,3) =~ tr/a-z/A-Z/; # convert the first three letters to uppercase
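The same effect can be obtained by assigning the result of uc back to the same substring, which also works in the middle of a string. A small sketch; the sample values are made up for illustration:

```perl
use strict;
use warnings;

my $with_tr = 'bezroukov';
substr($with_tr, 0, 3) =~ tr/a-z/A-Z/;               # tr applied to the first three letters only

my $with_uc = 'bezroukov';
substr($with_uc, 0, 3) = uc substr($with_uc, 0, 3);  # same result via uc and assignment

my $middle = 'bezroukov';
substr($middle, 3, 2) = uc substr($middle, 3, 2);    # two letters starting at position 3

print "$with_tr $with_uc $middle\n";
# prints: BEZroukov BEZroukov bezROukov
```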

4th argument -- questionable Perl extension of the functionality of substr function

The important difference between substr in PL/1 and REXX and substr in Perl is that the Perl version can substitute a new string if one is specified as the fourth argument. This is not a useful generalization: it duplicates existing functionality, smells of over-complexity, and there is already a nice simple idiom using regular expressions for substitution (with search):

s/search_string/replacement_string/;
(see Chapter 5).

The worst thing about this idea is that it duplicates the ability to use substr at the left side of assignment statement and as such might well be considered as an example of useless or harmful "innovation". For example:

$string='abba';
substr($string,1,2)='vv'; # produced "avva".

is equivalent to

$string='abba';
substr($string,1,2,'vv'); # same thing

One marginally useful case where the fourth argument makes some sense is inserting a substring at a certain position of the string, for example:

$a="world";
$b=substr($a,0,0,"Hello "); # note that the length can be different
print $a; # will print "Hello world"
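One practical difference between the two forms is the return value: the four-argument call returns the text that was replaced, while with the assignment form the old text has to be saved first. A brief sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $string = 'abba';
my $old = substr($string, 1, 2, 'vv');   # replaces "bb" and returns what was there

my $copy  = 'abba';
my $saved = substr($copy, 1, 2);         # with the lvalue form, save the old text first
substr($copy, 1, 2) = 'vv';

print "$string $old $copy $saved\n";
# prints: avva bb avva bb
```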

Usage as a pseudo-function on the left-hand side of the assignment

If you want, you can also use the substr function to replace any fragment of the string. As in PL/1 and REXX, substr can be used on the left side of the assignment statement (such functions are called pseudo-functions or lvalue functions):

substr($name,0,1)=uc(substr($name,0,1)); # capitalize the first symbol like in ucfirst.

We can also chop off the last character of the scalar:

substr($name,-1,1) = '';  # will truncate the string $name by one character

This is actually more flexible than the chop function, as the number of characters we need to chop can be a variable:

substr($name,-$k) = ''; # will truncate the string $name by $k letters

Here we used negative subscript to count backwards from the end of the string. You can achieve the same result using negative value of length parameter in substr function, which will be interpreted as length of the string minus this offset:

$name=substr($name,0,-2);  # will truncate the string $name by two letters
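One edge case is worth keeping in mind when the number of characters to remove is a variable: -0 is just 0, so substr($name,-$k) with $k equal to 0 covers the whole string, and a $k larger than the string length is outside the string. The helper below, drop_tail, is a hypothetical name used only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper: return $str without its last $k characters.
sub drop_tail {
    my ($str, $k) = @_;
    return $str if $k <= 0;                # -0 is 0, so substr($str, -0) would cover the whole string
    return ''   if $k >= length($str);     # nothing left; also avoids an out-of-range offset
    substr($str, -$k) = '';                # remove the last $k characters
    return $str;
}

print drop_tail("filename.txt", 4), "\n";   # prints: filename
print drop_tail("filename.txt", 0), "\n";   # prints: filename.txt
```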

Using substr with ranges of characters

Often you have the first position and the last position of the substring that you intend to extract.

Perl does not provide a direct way to extract a substring based on those two parameters. In this case you need to calculate the length yourself, but beware that a negative length is interpreted by Perl in a special way.
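The arithmetic itself is short. The sketch below, with made-up sample values, shows the length for an inclusive range, the length for a range given by the position just past its end, and what happens when the two positions are accidentally swapped:

```perl
use strict;
use warnings;

my $s = "abracadabra";
my ($first, $last) = (3, 6);                       # inclusive positions of "acad"

my $inclusive = substr($s, $first, $last - $first + 1);

my $end = $last + 1;                               # first position after the range
my $half_open = substr($s, $first, $end - $first);

# An inverted range yields a negative LEN, which substr does not treat as
# an error: it means "leave that many characters off the end".
my $inverted = substr($s, 6, 3 - 6 + 1);           # LEN is -2

print "$inclusive $half_open $inverted\n";
# prints: acad acad dab
```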

As the substr2 function below shows, this is not a trivial function to program, as there are multiple special cases, the most important of which are connected with the way Perl interprets negative positions and "out of bounds" indexes:

sub substr2
{
my $string=$_[0];
my $len=length($string);
# negative positions count from the end of the string, as in substr
my $from=($_[1]>=0) ? $_[1] : $len+$_[1];
my $to=($_[2]>=0) ? $_[2] : $len+$_[2];
   $from=0 if ($from<0); # clamp to the beginning of the string
   return '' if ($len==0 || $from>=$len);
   if ($to >= $len ){
      $to=$len-1; # clamp to the last character
   }
   if( $to>=$from ){
      return(substr($string,$from,$to-$from+1));
   }
   return('');
}

Quirks

The index function in Perl returns -1 if the substring is not found in the string. This is a legitimate value for the starting position in substr, where it means the last character of the string. That means that you always need to check whether index was successful if you use it with substr.

You just can't write a simple sequence of index and substr calls without such a check. For example:

# str_after function returns tail of the string after a certain marker, if found. It accepts two arguments:
# 1 -- string to search
# 2 -- marker to find
# essentially it works like split function. 
sub str_after {
my $string=$_[0]; # string to search
my $marker=$_[1]; # marker that should be present in the string for the function to succeed
my $pos=index($string,$marker);
   if ($pos>-1) {
      return(substr($string,$pos+length($marker)));
   }
   return('');
}
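To see what goes wrong without the check, here is a small sketch in which the marker is absent; the sample string and marker are invented for the illustration:

```perl
use strict;
use warnings;

my $string = "key=value";

# ':' is not in the string, so index returns -1 and substr quietly
# returns the last character instead of signalling a failure.
my $tail_unchecked = substr($string, index($string, ':'));

my $pos = index($string, ':');
my $tail_checked = $pos >= 0 ? substr($string, $pos) : '';

print "unchecked: '$tail_unchecked' checked: '$tail_checked'\n";
# prints: unchecked: 'e' checked: ''
```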

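For some reason unknown to me, and unlike many other Perl built-in functions, substr neither uses nor sets the default variable $_. The string must always be passed explicitly, and the result is lost unless you assign it:

$a='abba';
$_='';
substr($a,0,1); # the result is discarded
print "The first letter is: $_\n"; # Will not print the first letter of the string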

substr usage tips

As in PL/1, you can use the substr function on the left side of the assignment sign (as a pseudo-function). Also, in some cases the sprintf function can be used instead.

Calculating length based on two found points in the string

The basic rule: if you have the index of the first character of the substring and the index of the first character after its end, the length is the difference between the two:

$text='abracadabra';

$start=index($text,'ac');       # 3
$end=index($text,'ab',$start);  # 7: search for the end marker only after $start
$extracted=substr($text,$start,$end-$start); # 'acad'

In some cases it is better to use sprintf instead

In some cases you can use sprintf instead of substr (see sprintf in Perl). It is convenient, for example, for putting variables into predefined places in a dynamically generated command. For example, there are some difficulties in working with UNIX permissions, as they are octal and can be mangled if Perl prints them as decimal, so using sprintf in this case is simpler:

$perm=0755;
$string = sprintf ("/bin/chmod %o $target/*", $perm);
`$string`;
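sprintf also combines well with substr when a fixed-width field has to keep its width: a format such as %-10.10s pads a short value and cuts a long one. The record layout below (a 10-character name field followed by other text) is an assumption made up for this sketch:

```perl
use strict;
use warnings;

my $record = "NAME      AGE";                          # 10-character name field, then "AGE"

substr($record, 0, 10) = sprintf("%-10.10s", "Bob");   # short value is padded to 10 characters
my $short = $record;

substr($record, 0, 10) = sprintf("%-10.10s", "Bartholomew-James");  # long value is cut at 10
my $long = $record;

print "[$short]\n[$long]\n";
# prints: [Bob       AGE]
#         [BartholomeAGE]
```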

We will discuss sprintf in more detail in sprintf in Perl.



Old News ;-)

[May 07, 2017] A useful capability of Perl substr function

The Perl substr function can be used as a pseudo-function on the left side of an assignment. That allows you to insert a substring at an arbitrary point of the string.

For example, the code fragment:

$test_string='<cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>';
print "Before: $test_string\n"; 
substr($test_string,length('<cite>xxx'),0)='</cite>';
print "After: $test_string\n"; 
will print
Before: <cite>xxx<blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>
After:  <cite>xxx</cite><blockquote>test to show to insert substring into string using substr as pseudo-function</blockquote>

Please note that if you have found the position of the text before which you need to insert the string, you can use the found position as is: with zero length, substr inserts immediately before the character at that position, so there is no need to subtract one.

$pos=index($test_string,'<blockquote>');
if( $pos > -1 ){
    substr($test_string,$pos,0)='</cite>';
}

[Jul 30, 2015] Manipulating a Substring with substr (Learning Perl, 3rd Edition)

The substr operator works with only a part of a larger string. It looks like this:
$part = substr($string, $initial_position, $length);

It takes three arguments: a string value, a zero-based initial position (like the return value of index), and a length for the substring. The return value is the substring:

my $mineral = substr("Fred J. Flintstone", 8, 5);  # gets "Flint"
my $rock = substr "Fred J. Flintstone", 13, 1000;  # gets "stone"

As you may have noticed in the previous example, if the requested length (1000 characters, in this case) would go past the end of the string, there's no complaint from Perl, but you simply get a shorter string than you might have. But if you want to be sure to go to the end of the string, however long or short it may be, just omit that third parameter (the length), like this:

my $pebble = substr "Fred J. Flintstone", 13;  # gets "stone"

The initial position of the substring in the larger string can be negative, counting from the end of the string (that is, position -1 is the last character). In this example, position -3 is three characters from the end of the string, which is the location of the letter i:

my $out = substr("some very long string", -3, 2);  # $out gets "in"

[334] This is analogous to what we saw with array indices in Chapter 3, "Lists and Arrays". Just as arrays may be indexed either from 0 (the first element) upwards or from -1 (the last element) downwards, substring locations may be indexed from position 0 (at the first character) upwards or from position -1 (at the last character) downwards.

As you might expect, index and substr work well together. In this example, we can extract a substring that starts at the location of the letter l:

my $long = "some very very long string";
my $right = substr($long, index($long, "l") );

Now here's something really cool: The selected portion of the string can be changed if the string is a variable:

my $string = "Hello, world!";
substr($string, 0, 5) = "Goodbye";  # $string is now "Goodbye, world!"

[335] Well, technically, it can be any lvalue. What that term means precisely is beyond the scope of this book, but you can think of it as anything that can be put on the left side of the equals sign (=) in a scalar assignment. That's usually a variable, but it can (as you see here) even be an invocation of the substr operator.

As you see, the assigned (sub)string doesn't have to be the same length as the substring it's replacing. The string's length is adjusted to fit. Or if that wasn't cool enough to impress you, you could use the binding operator (=~) to restrict an operation to work with just part of a string. This example replaces fred with barney wherever possible within just the last twenty characters of a string:

substr($string, -20) =~ s/fred/barney/g;

To be completely honest, we've never actually needed that functionality in any of our own code, and chances are that you'll never need it either. But it's nice to know that Perl can do more than you'll ever need, isn't it?

Much of the work that substr and index do could be done with regular expressions. Use those where they're appropriate. But substr and index can often be faster, since they don't have the overhead of the regular expression engine: they're never case-insensitive, they have no metacharacters to worry about, and they don't set any of the memory variables.
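As an illustration of that trade-off, here is one check (does a string start with a given prefix?) written three ways. The path and prefix are invented sample values; note that the regular-expression version needs \Q...\E so that characters in the prefix are not treated as metacharacters:

```perl
use strict;
use warnings;

my $path   = "/usr/local/bin/perl";
my $prefix = "/usr/local";

my $by_substr = substr($path, 0, length($prefix)) eq $prefix;   # compare the leading characters
my $by_index  = index($path, $prefix) == 0;                     # prefix found at position 0
my $by_regex  = $path =~ /^\Q$prefix\E/;                        # anchored, quoted pattern

print "substr: ", ($by_substr ? "yes" : "no"),
      " index: ", ($by_index  ? "yes" : "no"),
      " regex: ", ($by_regex  ? "yes" : "no"), "\n";
# prints: substr: yes index: yes regex: yes
```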

Besides assigning to the substr function (which looks a little weird at first glance, perhaps), you can also use substr in a slightly more traditional manner with the four-argument version, in which the fourth argument is the replacement substring:

By traditional we mean in the "function invocation" sense, but not the "Perl" sense, since this feature was introduced to Perl relatively recently.

my $previous_value = substr($string, 0, 5, "Goodbye");

The previous value comes back as the return value, although as always, you can use this function in a void context to simply discard it.




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: December, 26, 2017