Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Perl tr function


Introduction

The tr function (actually an operator ;-) performs character-by-character translation, with several enhancements. Its semantics are close to those of the classic Unix tr command. For reasons explained later, the description of tr in the Perl documentation sits in a rather strange place: perlop (http://perldoc.perl.org/perlop.html; see Reference).

The tr function takes two arguments: a source character set and a target character set. Generally they should be of equal length, but if the target set is shorter, it is extended to the necessary length by repeating its last character.
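To make the padding rule concrete, here is a minimal sketch. The sample string and the variable names are illustrative assumptions, not part of the original text.

```perl
use strict;
use warnings;

# The target set "xy" is shorter than the source set "abc", so Perl pads it
# with its last character: the effective mapping is a->x, b->y, c->y.
my $text = 'aabbcc';
(my $padded = $text) =~ tr/abc/xy/;   # copy first, then translate the copy

print "$padded\n";    # prints "xxyyyy"
```

The copy-then-translate idiom `(my $copy = $orig) =~ tr/.../.../` is used here so that the original string survives on any Perl 5 version.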

Syntax is rather strange and belongs to "Perl warts" as it does not fit well into the general framework of string manipulation functions. Using the =~ operator to supply the parameter (the string to be translated) is completely bizarre, as tr has nothing to do with regular expressions; it is a classic text processing function like substr and index. By default tr destroys the initial value of the string, replacing it with the result of the translation (IMPORTANT: in Perl 5.14 and later this behaviour can be avoided by using option r -- see below).

$text=~tr/a/z/; # change all "a" into "z" in $text

Some idiosyncrasies (but not the one mentioned above) can be explained by the fact that the tr operator is derived from the UNIX tr utility, and it looks like the designer of the language tried to preserve compatibility. But the road to hell is paved with good intentions. The UNIX sed utility uses y for this operation; in Perl, y is supported as a synonym for tr.

If the string to be modified is not supplied, then the operation is performed on the $_ variable, for example:

 tr/a/z/; # change all "a" into "z" in $_

The tr function has several gotchas. One, encouraged by the use of =~, is that character ranges in tr must be written without square brackets, unlike character classes in regular expressions:

This is another one for the Just Plain Wrong camp. For some reason, tr (transliterate) is an operator that is bound to a variable via =~, just as the operators m// and s/// are. When Perl was invented, transliteration constituted a more important part of the language and its applications than it does now, and so it made sense to minimize its syntax; if you’re transliterating $_, you don’t even need to use =~ at all, for example:

tr/a-zA-Z/n-za-mN-ZA-M/;         # rot-13

Unfortunately, seeing tr in the company of =~ so often leads some people to believe that it is an operator that takes regular expressions, so they write code like this:

tr/[a-z]/[A-Z]/;                 # Don't use this

thinking that they are employing character classes that will uppercase text. By coincidence, they get the result they wanted, without realizing that this code means, “Turn left square brackets into left square brackets, turn lowercase letters to uppercase letters, and turn right square brackets into right square brackets.” They have made an incorrect assumption that will lead them into trouble when they try something different with tr.
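A case where the bracketed form gives the wrong answer is counting. The following is a minimal sketch; the sample string and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

my $input = 'x[1] = y[2]';

# Intended: count lowercase letters. Inside tr the brackets are NOT a
# character class; "[" and "]" are ordinary members of the set, so they
# are counted as well.
my $wrong = ($input =~ tr/[a-z]//);
my $right = ($input =~ tr/a-z//);

print "with brackets: $wrong, without: $right\n";
# prints "with brackets: 6, without: 2"
```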

The following expression replaces each digit with 9, so any number in the result consists of 9s only. As no string is supplied, the operation is performed on $_.

 tr/0-9/9/; # change all digits into 9

Note that character ranges do not use square brackets in tr. This kind of translation can sometimes be useful as a parsing technique or as a data scrambling technique.
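As a sketch of the parsing use, collapsing all digits to 9 leaves only the "shape" of a field, which can then be compared against an expected shape. The sample values below are made up for illustration.

```perl
use strict;
use warnings;

my @samples = ('2020-03-05', '10.0.0.1', 'build 42');

# Copy each string, then collapse every digit to "9" so only the shape remains.
my @shapes;
for my $sample (@samples) {
    (my $shape = $sample) =~ tr/0-9/9/;
    push @shapes, $shape;
}

print join(' | ', @shapes), "\n";
# prints "9999-99-99 | 99.9.9.9 | build 99"
```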

By default tr modifies the content of $_ or of the variable bound to it (on the left side of =~). Use option r if you want it to return a new string and leave the original untouched. For example:
$newstr=($oldstr=~tr/abc/cde/r);

Please note that the function returns the number of substitutions made, not the translated string as we might expect (which makes it useful for counting  characters in string, see below)

$_='Test string 123456789123456789123456789';
$k=tr/2345678/9/; # $k will contain the number of substitutions made

Unlike index and substr the tr function returns not the translated string,  but the number of substitutions made.

Important gotcha: the tr function doesn't perform variable interpolation in sets

This is another important gotcha: both sets are treated as single quoted strings, not as double quoted strings. So sets should be constants, known at compile time. Backslashes (escape sequences) can be used, though:

Characters may be literals or any of the escape sequences accepted in double-quoted strings. But there is no variable interpolation, so "$" and "@" are treated as literals. A hyphen at the beginning or end, or preceded by a backslash is considered a literal. Escape sequence details are in the table near the beginning of this section.

The tr function doesn't perform variable interpolation: the $ symbol is treated as a regular symbol in it, not as the  start of the variable name, for example:

$from_set="0123456789";
$to_set  ="ABCDEFGHIJ";
tr/$from_set/$to_set/; # does not work
You need to use the eval function to get the desired behaviour:
$statement="tr/$from_set/$to_set/;";
eval($statement);
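When the sets are known only at run time, the eval approach can be wrapped in a small helper. The sketch below is one possible way to do it, not an established API: make_translator is a hypothetical name, and the use of \Q...\E to escape the sets is an assumption about what is wanted (it protects against "/" and "\" in the sets, but also turns "-" into a literal, so ranges are not supported).

```perl
use strict;
use warnings;

# Hypothetical helper: compiles a tr/// for sets that are known only at run time.
# \Q...\E backslash-escapes every non-word character, so "/", "\" and "-" in
# the sets cannot break the generated code. As a side effect, a range such
# as "a-z" is taken literally (three characters).
sub make_translator {
    my ($from, $to) = @_;
    my $code = eval "sub { (my \$s = shift) =~ tr/\Q$from\E/\Q$to\E/; \$s }";
    die "cannot compile tr: $@" unless $code;
    return $code;
}

my $digits_to_letters = make_translator('0123456789', 'ABCDEFGHIJ');
my $encoded = $digits_to_letters->('2024');

print "$encoded\n";    # prints "CACE"
```

Compiling the translator once and reusing the returned closure avoids calling eval for every string.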

Context dependent literals in Perl

Strings with arbitrary delimiters after tr, m, s, etc are a special, additional type of literals. Each with its own rules. And those rules are different from rules that exist for single quoted strings, or double quoted strings, or regex (three most popular types of literals in Perl). For example, the treatment of backslash in "tr literal" is different from single quoted strings:
"A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated."

This means that in Perl there are a dozen or so different types of literals, each with its own idiosyncratic rules. This creates confusion even for long-time Perl users, as they tend to forget the details of constructs they rarely use and extrapolate them from more frequently used constructs. In other words, the nature of those "context-dependent literals" (at the level of the lexical scanner they are all literals) is completely defined not by the delimiters they use (which are arbitrary), but by the operator that precedes them. If there is none, m is assumed.

This "design decision" (in retrospect this is a design decision, although in reality it was an "absence of design decision" situation ;-) adds unnecessary complexity to the language and several new (and completely unnecessary) types of bugs. This "design decision" is also poorly documented, and for typical "possible blunders" (for tr that would be the use of "[", "$", "@" without a preceding backslash) there are no warnings. The trick of putting the tr description into http://perldoc.perl.org/perlop.html, mentioned in the Introduction, can now be viewed as an attempt to hide this additional complexity.

In reality in Perl q, qq, qr, m, s, tr are functions each of which accepts (and interpret) a specific, unique type of "context-dependent-literal" as the argument. That's the reality of this, pretty unique, situation with the Perl language, as I see it. Quote-Like-Operators section of Perl docs shows two interesting examples with tr:

tr[aeiouy][yuoiea] or tr(+\-*/)/ABCD/

The second variant looks like a perversion to me. I never thought that this was possible. I thought that the "arbitrary delimiter" is "caught" after the operator and that after that the delimiters should be uniform within the operator ;-).
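For the record, the mixed-delimiter form does run. A minimal sketch, with an arithmetic expression invented for illustration:

```perl
use strict;
use warnings;

my $expr = '1+2-3*4/5';

# The search list is delimited by parentheses, the replacement list by slashes.
# Inside the parentheses "/" is an ordinary character and "\-" is a literal hyphen.
(my $coded = $expr) =~ tr(+\-*/)/ABCD/;

print "$coded\n";    # prints "1A2B3C4D5"
```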

And the first is not without problems either: if you "extrapolate" your regex skills to tr, then instead of

tr[aeiouy][yuoiea]

you can write the obviously incorrect

tr/[aeiouy]/[yuoiea]/

which will work fine as long as the strings in set1 and set2 are of equal length.

Options

Here is the list of all possible options: 

Option  Description
c       Complements the source character set: the translation is applied to every character that is not in the source set.
d       Deletes any character of the source set that has no corresponding character in the target set (that is, characters that are found but not replaced).
r       Returns the modified string and leaves the original string untouched (Perl 5.14 and later). Example: $HOST = $host =~ tr/a-z/A-Z/r;
s       Squeezes each run of characters that were translated to the same character into a single instance of that character. If the replacement list is empty, runs of identical characters from the source set are squashed.
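The options can be combined. The sketch below applies each of them to one made-up sample string; it relies on option r and therefore assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

my $s = 'hello, wooorld 2024';

my $non_letters = ($s =~ tr/a-z//c);   # c: count characters NOT in a-z
my $no_digits   = $s =~ tr/0-9//dr;    # d + r: delete digits, return a copy
my $squeezed    = $s =~ tr/a-z//sr;    # s + r: squeeze runs of the same letter
my $words       = $s =~ tr/a-z/_/csr;  # c + s + r: each run of non-letters -> one "_"

print "$non_letters\n";    # prints "7"
print "[$no_digits]\n";    # prints "[hello, wooorld ]"
print "[$squeezed]\n";     # prints "[helo, world 2024]"
print "[$words]\n";        # prints "[hello_wooorld_]"
```

Because every statement either only counts or uses r, $s itself is never modified.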

Treatment of repeated characters in the  first set and padding of the second character set in case of unequal length

If you specify the same target character for a group of characters in the source set, you can perform primitive lexical analysis. For example:

tr/0123456789/9999999999/; # replace all digits with 9
tr/9//s; # compress each group of 9s into a single 9

The first statement translates all digits into the character 9. If the replacement list is shorter than the search list, the last character of the replacement list is repeated as often as needed. That means that we can rewrite the first statement as:

tr/0123456789/9/; # same as above

If more than one replacement character is given for the same matched character (this is a stupid idea because the arguments are sets, but it can happen if sets are generated automatically and the corresponding check is not in place), only the first is used. The other replacements for that character are ignored. For instance:

tr/999/123/; 

results in every character "9" in the string being converted to the character 1. So it is equivalent to

tr/9/1/;
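A quick check of this equivalence, as a minimal sketch with an invented sample string:

```perl
use strict;
use warnings;

my $text = 'x99y9';

(my $dup   = $text) =~ tr/999/123/;   # duplicate "9" in the source set: only 9->1 counts
(my $plain = $text) =~ tr/9/1/;

print "$dup $plain\n";    # prints "x11y1 x11y1"
```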

The translation operator doesn't perform variable interpolation.

Character classes and square brackets gotcha

As in regular expressions, the dash is used to mean "between". But unlike in regular expressions, ranges should NOT be enclosed in square brackets. This statement converts $_ to upper case.

 tr/a-z/A-Z/; # again this is not the best way to do it. Use uc() instead

Please note that Perl 4 did not have lc and uc functions. Therefore the tr function was often used to convert case. If you see this idiom in the script that probably means that the script was initially written for Perl 4. The example above that converts all digits to 9 can be rewritten as

tr/0-9/9/; # the shortest way to replace all digits to 9

For example, ROT13 is a simple substitution cipher that is sometimes used for distributing offensive jokes and other potentially objectionable material on Usenet.

This is a Caesar cipher with the key equal to 13 (A->N, B->O etc.).

Using the tr function for decoding ROT13 is an interesting example because the target set is constructed by concatenating disjoint character subranges: n-z followed by a-m (and N-Z followed by A-M for upper case). As everywhere in tr, the ranges are written without square brackets:

tr/a-zA-Z/n-za-mN-ZA-M/;
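A runnable sketch of ROT13 with the ranges written without square brackets follows. The function name rot13 and the sample text are illustrative assumptions. Because the key is 13, applying the function twice returns the original text.

```perl
use strict;
use warnings;

# ROT13: each letter is shifted by 13 positions; everything else is unchanged.
sub rot13 {
    my ($s) = @_;
    $s =~ tr/a-zA-Z/n-za-mN-ZA-M/;
    return $s;
}

my $secret = rot13('Hello, World!');
my $plain  = rot13($secret);

print "$secret\n";    # prints "Uryyb, Jbeyq!"
print "$plain\n";     # prints "Hello, World!"
```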
UNIX programmers may be familiar with using the tr utility to convert lowercase characters to uppercase characters, or vice versa. Do not do that -- Perl 5  has the lc() and uc() functions for this purpose  

 

Deleting characters from the source set

If the target set is empty and you use modifier d, the operation deletes every character of the source set, since none of them is replaced:

tr/.,;://d;

If you want to delete all characters of the source set, you need to specify an empty second set together with option d; otherwise the function does not work as expected.

So this option is an exception to the rule that the target character set is extended to the length of the source character set: with d it is not extended, whether the target set is empty or merely shorter.

# cat test
   $test='test ';
   print "Before test 1: |$test|\n";
   $test=~tr/ / /d;
   print "After: test 1: |$test|\n";
   $test='test ';
   print "Before test 2: |$test|\n";
   $test=~tr/ //d;
   print "After test 2:  |$test|\n";
# perl test
Before test 1: |test |
After: test 1: |test |
Before test 2: |test |
After test 2:  |test|
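The same rule covers a target set that is shorter but not empty: with d, characters that have a partner are translated and the rest are deleted. A minimal sketch with invented sample strings:

```perl
use strict;
use warnings;

my $line = 'a.b,c;d:e';
(my $clean = $line) =~ tr/.,;://d;     # empty target set: all four characters are deleted

my $abc = 'abcabc';
(my $partial = $abc) =~ tr/abc/x/d;    # a -> x; b and c have no partner and are deleted

print "$clean $partial\n";    # prints "abcde xx"
```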

Counting characters using tr

If the target set is empty and there is no d option, the target set is assumed to be equal to the source set and the function does not modify the source string. It can therefore be used for counting the characters of the specified set in the string.

For example, the statements below count the number of dots in $_ (the dot is a special character in regular expressions, but not in tr) and store the result in the variable $total.

$_="131.1.1.1";
$total = tr/.//;
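Counting can serve as a cheap sanity check before more expensive parsing. The sketch below is only an illustration under that assumption; the variable names are invented, and it checks the shape of the string, not whether each number is in range.

```perl
use strict;
use warnings;

my $ip = '131.1.1.1';

my $dots  = ($ip =~ tr/.//);       # number of dots; $ip is not modified
my $other = ($ip =~ tr/0-9.//c);   # characters that are neither a digit nor a dot

my $plausible = ($dots == 3 && $other == 0) ? 'yes' : 'no';

print "dots=$dots other=$other plausible=$plausible\n";
# prints "dots=3 other=0 plausible=yes"
```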

Another more complex example counts a set of characters

$k=($ip=~tr/0-9//); # counts number of digits in the string $ip
You can specify the set not only directly, but also as a complement:
$k=tr/0-9//c; # counts all non-digit characters
This feature can also be used to imitate some set operations on character sets. For example, we can test whether every character of the string str1 also occurs in str2 (which means that the complement of the character set of str2 has no characters in common with str1). Because tr does not interpolate variables, the statement has to be built with eval:
$k=eval "\$str1=~tr/$str2//c"; # assumes $str2 contains no /, \ or -
if( $k==0 ) {
  print "all characters of str1 occur in str2\n";
}
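Since tr does not interpolate variables, a test of this kind with a set taken from a variable needs a string eval. The following is a minimal sketch under that assumption; only_chars_from is a hypothetical helper name, and \Q...\E is used so that characters such as "/" or "-" in the allowed set are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of $str occurs in $allowed.
# The tr statement is generated at run time because tr does not interpolate.
sub only_chars_from {
    my ($str, $allowed) = @_;
    my $count = eval "\$str =~ tr/\Q$allowed\E//c";   # characters outside the set
    die "cannot compile tr: $@" if $@;
    return $count == 0;
}

print only_chars_from('abba',  'ab')  ? "yes\n" : "no\n";   # prints "yes"
print only_chars_from('abcde', 'eda') ? "yes\n" : "no\n";   # prints "no"
```

Note that this checks a subset relation; testing that two strings use exactly the same characters would need the check in both directions.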

Squashing duplicate characters (option s)

Three examples:

tr/a-zA-Z//s; # bookkeeper -> bokeper (a squeeze in its pure form uses an empty target set, which is assumed to be equal to the source set)
tr/a-zA-Z/ /cs; # change each run of non-alphas to a single space
@stripped = map tr/a-zA-Z/ /csr, @original; # /r with map
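The first two examples can be checked with a short script. The sample strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $word = 'bookkeeper';
(my $squeezed = $word) =~ tr/a-zA-Z//s;    # pure squeeze: oo -> o, kk -> k, ee -> e

my $messy = 'Hello,   world!! 42';
(my $spaced = $messy) =~ tr/a-zA-Z/ /cs;   # every run of non-letters -> one space

print "[$squeezed]\n";    # prints "[bokeper]"
print "[$spaced]\n";      # prints "[Hello world ]"
```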

If you use tr to parse the string into lexical elements then you need to squash  repeated character after transliteration. In this case one can use option s. This permits easy building of primitive lexical parsers:

$k=tr/0-9a-zA-Z_/9999999999A/s; # each identifier is replaced by A, each number by 9 (the target set is extended to the length of the source set with the letter A)
Normally, if the match list is longer than the replacement list, the last character in the replacement list is used as the replacement for the extra characters. However, when the d option is used, the matched characters are simply deleted.

If the replacement list is empty, then no translation is done. The operator will still return the number of characters that matched, though. This is useful when you need to know how often a given letter appears in a string. This feature also can compress repeated characters using the s option.
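As a sketch of such a primitive lexical parser, the statement below reduces a line to its "shape": runs of letters and underscores become A, runs of digits become 9, and everything else is left alone. The sample statement is invented for illustration.

```perl
use strict;
use warnings;

my $stmt = 'x1 = foo + 42';

# Digits map to 9; letters and "_" map to A (the target set is padded with A).
# Option s squeezes each run of identical translated characters.
(my $shape = $stmt) =~ tr/0-9a-zA-Z_/9999999999A/s;

print "$shape\n";    # prints "A9 = A + 9"
```

An identifier that contains digits, such as x1, comes out as A9 rather than A, which is one reason to treat this as a rough classifier only.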

Tips

The translation operator has several useful options: you can delete matched characters, replace repeated characters with a single character, and translate only characters that don't match the character set using the complement option (see the table above).

Historically the translate function was considered to be a string operation, not a pattern matching operation, so the use of =~ is very confusing. The translation function operates on sets of characters, not on regular expressions.

The delimiter can vary, but slashes are most commonly used (slashes are also used in Perl 5 for regular expressions). Most of the special regular expression codes are not applicable.

For complex transliterations the tr/// syntax is bad. One of the problems is that the notation doesn't actually show which characters correspond, so you have to count characters. For example:

tr/abcdefghijklmnopqrstuvwxyz/VCCCVCCCVCCCCCVCCCCCVCCCCC/

But in Perl there is a way to make this example more readable using different delimiters:

    tr[abcdefghijklmnopqrstuvwxyz]
      [VCCCVCCCVCCCCCVCCCCCVCCCCC]

If the first string contains duplicates, then the first corresponding character is used, not the last:

       tr/aeioua-z/VVVVVC/
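A small check of this rule, as a minimal sketch with an invented sample word: the vowels are listed first, so their mapping to V wins over the later a-z range, and the remaining letters fall through to the padded C.

```perl
use strict;
use warnings;

my $word = 'hello';

# a, e, i, o, u are matched by their first occurrence in the source set (-> V);
# the second "a" and the other vowels inside a-z are ignored, consonants -> C.
(my $pattern = $word) =~ tr/aeioua-z/VVVVVC/;

print "$pattern\n";    # prints "CVCCV"
```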


In general, the more you know about the search string and the text in which you search, the faster you can search. If most of the symbols in the text do not occur in the search string, and you are only interested in whether (or how many times) the search string occurs in the text, all those "missing" symbols can be translated to a single "non-occurring" symbol, and the text can then be compressed with tr by squeezing all runs of consecutive "non-occurring" symbols. For example:

$text='We search for word abba in this string';
$str='abba';
$text=~tr/abba/?/cs;
print "text='$text'\n";
As you can see from the output of this fragment, the text is compressed to
text='?a?abba?'


Old News ;-)

[Mar 05, 2020] Converting between uppercase and lowercase on the Linux command line

Mar 05, 2020 | www.networkworld.com

https://www.networkworld.com/article/3529409/converting-between-uppercase-and-lowercase-on-the-linux-command-line.html

There are many ways to change text on the Linux command line from lowercase to uppercase and vice versa. In fact, you have an impressive set of commands to choose from. This post examines some of the best commands for the job and how you can get them to do just what you want.

Using tr

The tr (translate) command is one of the easiest to use on the command line or within a script. If you have a string that you want to be sure is in uppercase, you just pass it through a tr command like this:

$ echo Hello There | tr '[:lower:]' '[:upper:]'
HELLO THERE

Below is an example of using this kind of command in a script when you want to be sure that all of the text that is added to a file is in uppercase for consistency:

#!/bin/bash

echo -n "Enter department name: "
read dept
echo "$dept" | tr '[:lower:]' '[:upper:]' >> depts

Switching the order to [:upper:] [:lower:] would have the opposite effect, putting all the department names in lowercase:

... ... ...

[Oct 30, 2018] 10 tr Command Examples in Linux

Oct 30, 2018 | www.tecmint.com

8. Here is an example of breaking a single line of words (sentence) into multiple lines, where each word appears in a separate line.

$ echo "My UID is $UID"

My UID is 1000

$ echo "My UID is $UID" | tr " "  "\n"

My 
UID 
is 
1000

9. Related to the previous example, you can also translate multiple lines of words into a single sentence as shown.

$ cat uid.txt

My 
UID 
is 
1000

$ tr "\n" " " < uid.txt

My UID is 1000

10. It is also possible to translate just a single character, for instance a space into a " : " character, as follows.

$ echo "Tecmint.com =>Linux-HowTos,Guides,Tutorials" | tr " " ":"

Tecmint.com:=>Linux-HowTos,Guides,Tutorials

There are several escape sequences you can use with tr; for more information, see the tr man page.

... ... ...

[Nov 16, 2017] Re^4 Strange behaviour of tr function in case the set1 is supplied by a variable

Nov 16, 2017 | perlmonks.com

likbez

// is an abbreviation for m// (be careful of context). But // can be replaced by (almost?) any delimiter, by using m or s or tr.

You make a very good point. Now I started to understand why they put description of tr, which is actually a function into this strange place

http://perldoc.perl.org/perlop.html#Quote-Like-Operators
Strings with arbitrary delimiters after tr, m, s, etc are a special, additional type of literals. Each with its own rules. And those rules are different from rules that exist for single quoted strings, or double quoted strings or regex (three most popular types of literals in Perl).

For example, the treatment of backslash in "tr literal" is different from single quoted strings:

"A single-quoted, literal string. A backslash represents a backslash unless followed by the delimiter or another backslash, in which case the delimiter or backslash is interpolated."

This means that in Perl there is a dozen or so of different types of literals, each with its own idiosyncratic rules. Which creates confusion even for long-time Perl users, as they tend to forget details of constructs they use rarely and extrapolate them from more often used constructs.

For example, in my case, I was burned by the fact that "m literals" allow interpolation of variables, but "tr literals" do not. And even created a test case to study this behavior :-)

In other words, the nature of those "context-dependent-literals" (on the level of lexical scanner they are all literals) is completely defined not by delimiters they are using (which are arbitrary), but by the operator used before it. If there is none, m is assumed.

This "design decision" (in retrospect this is a design decision, although in reality it was "absence of design decision" situation ;-) adds unnecessary complexity to the language and several new (and completely unnecessary) types of bugs.

This "design decision" is also poorly documented and for typical "possible blunders" (for tr that would be usage of "[","$","@" without preceding backslash) there are no warnings.

This trick of putting tr description into http://perldoc.perl.org/perlop.html that I mentioned before now can be viewed as an attempt to hide this additional complexity. It might be beneficial to revise the docs along the lines I proposed.

In reality in Perl q, qq, qr, m, s, tr are functions each of which accepts (and interpret) a specific, unique type of "context-dependent-literal" as the argument. That's the reality of this, pretty unique, situation with the language, as I see it.

Quote-Like-Operators shows 2 interesting examples with tr: tr[aeiouy][yuoiea] or tr(+\-*/)/ABCD/.
The second variant look like a perversion for me. I never thought that this is possible. I thought that the "arbitrary delimiter" is "catched" after the operator and after that they should be uniform within the operator ;-).

And the first is not without problems either: if you "extrapolate" your skills with regex into tr you can write instead of tr[aeiouy][yuoiea] obviously incorrect tr/[aeiouy]/[yuoiea]/ that will work fine as long as strings are of equal length.

[Nov 16, 2017] perl perlpacktut not making sense for me - Stack Overflow

Nov 13, 2017 | stackoverflow.com

brian d foy ,Nov 13 at 2:34

The pack function puts one or more things together in a single string. It represents things as octets (bytes) in a way that it can unpack reliably in some other program. That program might be far away (like, the distance to Mars far away). It doesn't matter if it starts as something human readable or not. That's not the point.

Consider some task where you have a numeric ID that's up to about 65,000 and a string that might be up to six characters.

print pack 'S A6', 137, $ARGV[0];

It's easier to see what this is doing if you run it through a hex dumper as you run it:

$ perl pack.pl Snoopy | hexdump -C
00000000  89 00 53 6e 6f 6f 70 79                           |..Snoopy|

The first column counts the position in the output so ignore that. Then the first two octets represent the S (short, 'word', whatever, but two octets) format. I gave it the number 137 and it stored that as 0x8900. Then it stored 'Snoopy' in the next six octets.

Now try it with a shorter name:

$ perl test.pl Linus | hexdump -C
00000000  89 00 4c 69 6e 75 73 20                           |..Linus |

Now there's a space character at the end (0x20). The packed data still has six octets. Try it with a longer name:

$ perl test.pl 'Peppermint Patty' | hexdump -C
00000000  89 00 50 65 70 70 65 72                           |..Pepper|

Now it truncates the string to fit the six available spaces.

Consider the case where you immediately send this through a socket or some other way of communicating with something else. The thing on the other side knows it's going to get eight octets. It also knows that the first two will be the short and the next six will be the name. Suppose the other side stored that in $tidy_little_package . It gets the separate values by unpacking them:

my( $id, $name ) = unpack 'S A6', $tidy_little_package;

That's the idea. You can represent many values of different types in a binary format that's completely reversible. You send that packed string wherever it needs to be used.

I have many more examples of pack in Learning Perl and Programming Perl .

[Nov 15, 2017] Strange behaviour of tr function in case the set1 is supplied by a variable

Notable quotes:
"... Characters may be literals or any of the escape sequences accepted in double-quoted strings. But there is no interpolation, so "$" and "@" are treated as literals. ..."
Nov 15, 2017 | perlmonks.com
Nov 16, 2017 at 02:50 UTC (#1203542)

likbez has asked for the wisdom of the Perl Monks concerning the following question:

Looks like in tr function a scalar variable is accepted as the first argument, but is not compiled properly into set of characters

use strict;
use warnings;

my $str1 = 'abcde';
my $str2 = 'eda';
my $diff1 = 0;

eval "\$diff1=\$str1=~tr/$str2//";

print "diff1: $diff1\n";

$ perl foo.pl
diff1: 3


This produces in perl 5, version 26:

Test 1: strait set diff1=0, diff2=3
Test 2: complement set diff1=5, diff2=2


Obviously only the second result in both tests is correct. Looks like only explicitly given first set is correctly compiled. Is this a feature or a bug ?

Athanasius (Chancellor) on Nov 16, 2017 at 03:08 UTC

Re: Strange behaviour of tr function in case the set1 is supplied by a variable

Hello likbez ,

The transliteration operator tr/SEARCHLIST/REPLACEMENTLIST/ does not interpolate its SEARCHLIST, so in your first example the search list is simply the literal characters $, s, t, r, 2. See Quote and Quote-like Operators.

Hope that helps,

Athanasius  < contra mundum Iustus alius egestas vitae, eros Piratica,

roboticus (Chancellor) on Nov 16, 2017 at 03:08 UTC

Re: Strange behaviour of tr function in case the set1 is supplied by a variable

likbez :

Feature, per the tr docs

Characters may be literals or any of the escape sequences accepted in double-quoted strings. But there is no interpolation, so "$" and "@" are treated as literals.

A hyphen at the beginning or end, or preceded by a backslash is considered a literal. Escape sequence details are in the table near the beginning of this section.

So if you want to use a string to specify the values in a tr statement, you'll probably have to do it via a string eval:

$ cat foo.pl
use strict;
use warnings;
my $str1 = 'abcde';
my $str2 = 'eda';
my $diff1 = 0;
eval "\$diff1=\$str1=~tr/$str2//";
print "diff1: $diff1\n";
$ perl foo.pl
diff1: 3

... roboticus

When your only tool is a hammer, all problems look like your thumb.

Anonymous Monk on Nov 16, 2017 at 03:09 UTC

Re: Strange behaviour of tr function in case the set1 is supplied by a variable

Looks like in tr function a scalar variable is accepted as the first argument, but is not compiled properly into set of characters

:)

you're guessing how tr /// works, you're guessing it works like s/// or m///, but you can't guess , it doesn't work like that, it doesn't interpolate variables, read perldoc -f tr for the details

likbez !!! on Nov 16, 2017 at 04:41 UTC

Re^2: Strange behaviour of tr function in case the set1 is supplied by a variable
you're guessing how tr/// works, you're guessing it works like s/// or m///, but you can't guess , it doesn't work like that, it doesn't interpolate variables, read perldoc -f tr for the details
Houston, we have a problem ;-)

First of all that limits tr area of applicability.

The second, it's not that I am guessing, I just (wrongly) extrapolated regex behavior on tr, as people more often use regex than tr. Funny, but searching my old code and the comments in it, it is clear that I remembered (probably discovered the hard way, not by reading the documentation ;-) this nuance several years ago. Not now. Completely forgotten. Erased from memory. And that tells you something about Perl complexity (actually tr is not that frequently used by most programmers, especially for counting characters).

And that's a real situation, that we face with Perl in other areas too (and not only with Perl): Perl exceeds typical human memory capacity to hold the information about the language. That's why we need "crutches" like strict.

You simply can't remember all the nuances of more than a dozen string-related built-in functions, can you? You probably can (and should) for index/rindex and substr, but that's about it.

So there are two problems here:

1. Are / / strings uniformly interpreted in the language, or there is a "gotcha" because they are differently interpreted by tr (essentially as a single quoted strings) and regex (as double quoted strings) ?

2. If so, what is the quality of warnings about this gotcha? There is no warning issued, if you use strict and warnings. BTW, it looks like $ can be escaped:

main::(-e:1): 0
DB<5> $_='\$bba\$'
DB<6> tr/\$/?/
DB<7> print $_
\?bba\?


Right now there are zero warnings issued with use strict and use warnings enabled. Looks like this idea of using =~ for tr was not so good, after all. Regular syntax like tr(set1, set2) would be much better. But it's too late to change and now we need warnings to be implemented.

likbez !!! on Nov 16, 2017 at 03:10 UTC

Re: Strange behaviour of tr function in case the set1 is supplied by a variable

With eval statement works correctly. So it looks like $ is treated by tr as a regular symbol and no warnings are issued.

$statement='$diff1=$str1'."=~tr/$str2//;";
eval($statement);
print "With eval: diff1=$diff1\n";

that will produce:

With eval: diff1=3

ww (Archbishop) on Nov 16, 2017 at 03:16 UTC

Re: Strange behaviour of tr function in case the set1 is supplied by a variable

Same results in AS 5.24 under Win7x64.

Suspected problem might have arisen from lack of strict, warnings. Wrong, same results BUT using both remains a generally good idea.

Also wondered if compiling (with qr/.../ ) might change the outcome. Wrong again, albeit with variant (erroneous) output.

Correct me if I'm wrong, guessing that "strait" is a typo or personal shortening of "straight."

Update: Now that I've seen earlier replies... ouch, pounding forehead into brick wall!

[Oct 22, 2017] Unix text editing - sed, tr, cut, od

Oct 22, 2017 | seismo.berkeley.edu

A tr script to remove all non-printing characters from a file is below. Non-printing characters may be invisible, but they cause problems with printing or with sending the file via electronic mail. You run it from the Unix command prompt, everything on one line:

> tr -d '\001'-'\011''\013''\014''\016'-'\037''\200'-'\377' < filein > fileout
This tr script deletes all characters with octal values from 001 to 011, characters 013 and 014, characters from 016 to 037, and characters from 200 to 377. The other characters, which are printable, are copied over from filein to fileout. Please remember that you cannot fold a line containing the tr command: everything must be on one line, however long it is. In practice, this script solves some mysterious Unix printing problems.

Type in a text file named "f127.TR" with the line starting with tr above. Print the file on screen with the cat f127.TR command, replace "filein" and "fileout" with your file names (they must not be the same file), then copy and paste the line and run (execute) it. Please remember that this does not solve the Unix end-of-file problem, that is, the character '\000', also known as a 'null', in the file. Nor does it handle the binary file problem, that is, a file starting with two zeroes '\060' and '\060'.
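The same deletion can be expressed with Perl's tr operator and option d. The sketch below is a possible Perl counterpart, not part of the quoted article; it works on a string of bytes, and the sample data is invented.

```perl
use strict;
use warnings;

# Sample data: control character \001, a tab (\011), a newline (\012)
# and byte \377 mixed with ordinary letters.
my $raw = "\001ab\tc\n\377d";

# Same octal ranges as in the shell command; newline (\012) and
# carriage return (\015) are outside the ranges and are kept.
(my $clean = $raw) =~ tr/\001-\011\013\014\016-\037\200-\377//d;

print length($raw), ' -> ', length($clean), "\n";    # prints "8 -> 5"
```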

Sometimes invisible characters cause havoc. This tr command line converts tab characters into hashes (#) and form feed characters into stars (*).

> tr '\011\014' '#*'  < filein > fileout
The numeric value of a tab is 9, hex 09, octal 011, and in C notation it is \t or \011. A form feed is 12, hex 0C, octal 014, and in C notation it is \f or \014. Note that tr replaces each character from the first (leftmost) group with the corresponding character in the second group. Characters in octal format, such as \014, count as one character each.
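A rough Perl counterpart of the tab and form feed example, assuming a made-up input string; the positional mapping works the same way as in the shell utility.

```perl
use strict;
use warnings;

# Hypothetical input containing a tab (\011) and a form feed (\014).
my $text = "a\tb\fc";

# Positional mapping: \011 -> '#', \014 -> '*'.
# Assigning to a copy first keeps $text itself unchanged.
(my $visible = $text) =~ tr/\011\014/#*/;

print "$visible\n";   # a#b*c
```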

Perl Tutorial - How to use the tr function for counting characters

YouTube

Perl tutorial Substitution and translation

The tr function allows character-by-character translation. The following expression replaces each a with e, each b with d, and each c with f in the variable $sentence. The expression returns the number of substitutions made.
$sentence =~ tr/abc/edf/

Most of the special RE codes do not apply in the tr function. For example, the statement here counts the number of asterisks in the $sentence variable and stores that in the $count variable.

$count = ($sentence =~ tr/*/*/);
However, the dash is still used to mean "between". This statement converts $_ to upper case.
tr/a-z/A-Z/;
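The three statements from this tutorial can be tried together in a small script. The sample sentence is invented for the demonstration.

```perl
use strict;
use warnings;

my $sentence = "2 * 3 * 4 = twenty-four";   # made-up sample text

# Identical search and replacement lists: nothing changes, but the
# return value is the number of matching characters.
my $count = ($sentence =~ tr/*/*/);

# Without =~ the operator works on $_; a-z is a range, as described above.
local $_ = $sentence;
tr/a-z/A-Z/;

print "stars: $count\n";   # stars: 2
print "$_\n";              # 2 * 3 * 4 = TWENTY-FOUR
```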

Day 13 -- Process, String, and Mathematical Functions

This is an absurd idea, as Perl has a built-in function for determining string length (length). Still, if every character is replaced, the return value is the length of the string, provided the string contains no characters outside the search list.

Retrieving String Length Using tr

The tr function provides another way of determining the length of a character string, in conjunction with the built-in system variable $_.

The syntax for the tr function is

tr/sourcelist/replacelist/

sourcelist is the list of characters to replace, and replacelist is the list of characters to replace with. (For details, see the following listing and the explanation provided with it.)

Listing 13.10. A program that uses tr to retrieve the length of a string.
#!/usr/local/bin/perl

$string = "here is a string";
$_ = $string;
$length = tr/a-zA-Z /a-zA-Z /;
print ("the string is $length characters long\n");
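The weakness of this approach is easy to show. In the sketch below, tr_length is a hypothetical helper that repeats the listing's character class; it agrees with length only while the string holds nothing but letters and spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: count characters the way the listing does.
# An empty replacement list means "count only, do not modify".
sub tr_length {
    my ($s) = @_;
    return ($s =~ tr/a-zA-Z //);
}

for my $s ("here is a string", "here are 2 strings!") {
    printf "%-22s tr=%d length=%d\n", "'$s'", tr_length($s), length $s;
}
```

For the second string the digit and the exclamation mark are not in the class, so the tr count comes out two short of length.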

oldsite.to.infn.it

Unlike in C, the assignment operator produces a valid lvalue. Modifying an assignment is equivalent to doing the assignment and then modifying the variable that was assigned to. This is useful for modifying a copy of something, like this:

    ($tmp = $global) =~ tr [A-Z] [a-z];
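A small check of this idiom, with a made-up value; declaring the copy with my inside the parentheses works the same way.

```perl
use strict;
use warnings;

my $global = "Mixed CASE Text";   # made-up value

# The assignment yields $tmp as an lvalue, so tr modifies the copy only.
(my $tmp = $global) =~ tr [A-Z] [a-z];

print "$global\n";   # Mixed CASE Text
print "$tmp\n";      # mixed case text
```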
... ... ...
tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds
Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated. (The string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those, i.e., an lvalue.) A character range may be specified with a hyphen, so tr/A-J/0-9/ does the same replacement as tr/ACEGIBDFHJ/0246813579/. For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST has its own pair of quotes, which may or may not be bracketing quotes, e.g., tr[A-Z][a-z] or tr(+\-*/)/ABCD/.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.

If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were transliterated to the same character are squashed down to a single instance of the character.

If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.

Examples:

$ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case
$cnt = tr/*/*/;             # count the stars in $_
$cnt = $sky =~ tr/*/*/;     # count the stars in $sky
$cnt = tr/0-9//;            # count the digits in $_
tr/a-zA-Z//s;               # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
tr/a-zA-Z/ /cs;             # change non-alphas to single space
tr [\200-\377] [\000-\177];             # delete 8th bit
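To see what the c, d and s options do to real data, the examples above can be applied to copies of one string. The sample string and variable names are invented for this sketch.

```perl
use strict;
use warnings;

my $s = "bookkeeper, 2nd ed.";   # made-up sample

(my $squashed = $s) =~ tr/a-zA-Z//s;     # squash runs of the same letter
(my $spaced   = $s) =~ tr/a-zA-Z/ /cs;   # each run of non-letters -> one space
(my $letters  = $s) =~ tr/a-zA-Z//cd;    # delete everything except letters

print "$squashed\n";   # bokeper, 2nd ed.
print "[$spaced]\n";   # [bookkeeper nd ed ]
print "$letters\n";    # bookkeepernded
```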

If multiple transliterations are given for a character, only the first one is used:

tr/AAA/XYZ/

will transliterate any A to X.

Note that because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;
    eval "tr/$oldlist/$newlist/, 1" or die $@;
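One possible way to package the eval trick is to compile the translation once and reuse it. The helper name make_translator and the restriction to letters, digits and hyphens are assumptions of this sketch, added because anything interpolated into a string eval is executed as code.

```perl
use strict;
use warnings;

# Hypothetical helper: build a reusable translator from run-time lists.
sub make_translator {
    my ($old, $new) = @_;

    # Assumption of this sketch: accept only letters, digits and '-',
    # so nothing in the lists can terminate the tr/// or inject code.
    for ($old, $new) {
        die "unsafe list: $_\n" unless /\A[A-Za-z0-9-]*\z/;
    }

    # The tr table is built when the eval compiles the generated sub.
    my $code = eval "sub { (my \$s = shift) =~ tr/$old/$new/; \$s }";
    die $@ if $@;
    return $code;
}

my $rot13 = make_translator('A-Za-z', 'N-ZA-Mn-za-m');
print $rot13->("Hello"), "\n";   # Uryyb
```

The eval runs once per translator, not once per string, which matters when the same lists are applied to many lines.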

Recommended Links

tr - perldoc.perl.org

Reference

tr/SEARCHLIST/REPLACEMENTLIST/cdsr 
y/SEARCHLIST/REPLACEMENTLIST/cdsr 

Transliterates all occurrences of the characters found in the search list with the corresponding character in the replacement list. It returns the number of characters replaced or deleted. If no string is specified via the =~ or !~ operator, the $_ string is transliterated.

If the /r (non-destructive) option is present, a new copy of the string is made and its characters transliterated, and this copy is returned no matter whether it was modified or not: the original string is always left unchanged. The new copy is always a plain string, even if the input string is an object or a tied variable. Unless the /r option is used, the string specified with =~ must be a scalar variable, an array element, a hash element, or an assignment to one of those; in other words, an lvalue.
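A short illustration of the r option, assuming Perl 5.14 or later (the release in which /r appeared) and made-up sample values.

```perl
use 5.014;          # tr///r and s///r need Perl 5.14 or later
use strict;
use warnings;

my $host     = "example.org:8080";   # made-up values
my @original = ("a1b", "c--d");

my $upper    = $host =~ tr/a-z/A-Z/r;                # $host is not modified
my $chained  = $host =~ tr/a-z/A-Z/r =~ s/:/ -p/r;   # feed the copy to s///r
my @stripped = map tr/a-zA-Z/ /csr, @original;       # /r inside map

print "$upper\n";             # EXAMPLE.ORG:8080
print "$chained\n";           # EXAMPLE.ORG -p8080
print join("|", @stripped), "\n";   # a b|c d
```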

A character range may be specified with a hyphen, so tr/A-J/0-9/ does the same replacement as tr/ACEGIBDFHJ/0246813579/. For sed devotees, y is provided as a synonym for tr. If the SEARCHLIST is delimited by bracketing quotes, the REPLACEMENTLIST must have its own pair of quotes, which may or may not be bracketing quotes; for example, tr[aeiouy][yuoiea] or tr(+\-*/)/ABCD/.

Characters may be literals or any of the escape sequences accepted in double-quoted strings. But there is no variable interpolation, so "$" and "@" are treated as literals. A hyphen at the beginning or end, or preceded by a backslash is considered a literal. Escape sequence details are in the table near the beginning of this section.
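The lack of interpolation and the rules for a literal hyphen can be checked by counting characters. The strings below are invented for this sketch.

```perl
use strict;
use warnings;

my $price = 'cost: $5 - $3 @ store';   # made-up text, single-quoted
my $range = 'a-z b';

my $dollars = ($price =~ tr/$//);      # '$' is a literal inside tr
my $marks   = ($price =~ tr/$@-//);    # '$', '@' and a trailing literal '-'
my $literal = ($range =~ tr/a\-z//);   # backslashed hyphen: 'a', '-', 'z' only

print "$dollars $marks $literal\n";    # 2 4 3
```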

Note that tr does not do regular expression character classes such as \d or \pL . The tr operator is not equivalent to the tr(1) utility. tr[a-z][A-Z] will uppercase the 26 letters "a" through "z", but for case changing not confined to ASCII, use lc, uc, lcfirst, ucfirst (all documented in perlfunc), or the substitution operator s/PATTERN/REPLACEMENT/ (with \U , \u , \L , and \l string-interpolation escapes in the REPLACEMENT portion). Most ranges are unportable between character sets, but certain ones signal Perl to do special handling to make them portable. There are two classes of portable ranges. The first are any subsets of the ranges A-Z , a-z , and 0-9 , when expressed as literal characters.

tr/h-k/H-K/

capitalizes the letters "h" , "i" , "j" , and "k" and nothing else, no matter what the platform's character set is. In contrast, all of

tr/\x68-\x6B/\x48-\x4B/
tr/h-\x6B/H-\x4B/
tr/\x68-k/\x48-K/   

do the same capitalizations as the previous example when run on ASCII platforms, but something completely different on EBCDIC ones.

The second class of portable ranges is invoked when one or both of the range's end points are expressed as \N{...}:

$string =~ tr/\N{U+20}-\N{U+7E}//d;

removes from $string all the platform's characters which are equivalent to any of Unicode U+0020, U+0021, ... U+007D, U+007E. This is a portable range, and has the same effect on every platform it is run on. It turns out that in this example, these are the ASCII printable characters. So after this is run, $string has only controls and characters which have no ASCII equivalents.
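A minimal check of this example on an ASCII platform, assuming a Perl recent enough to accept \N{U+...} range end points in tr; the sample string is made up.

```perl
use strict;
use warnings;

# Hypothetical sample: printable text plus a tab, a bell and a newline.
my $string = "Hi\tthere\x{07}!\n";

# Delete U+0020 through U+007E (on ASCII platforms, the printable ASCII set).
(my $controls = $string) =~ tr/\N{U+20}-\N{U+7E}//d;

printf "%d characters left\n", length $controls;   # 3 characters left
```

Only the tab, the bell and the newline remain in the copy.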

But, even for portable ranges, it is not generally obvious what is included without having to look things up. A sound principle is to use only ranges that begin from and end at either ASCII alphabetics of equal case (b-e , B-E ), or digits (1-4 ). Anything else is unclear (and unportable unless \N{...} is used). If in doubt, spell out the character sets in full.

Options:

    c   Complement the SEARCHLIST.
    d   Delete found but unreplaced characters.
    s   Squash duplicate replaced characters.
    r   Return the modified string and leave the original string untouched.

If the /c modifier is specified, the SEARCHLIST character set is complemented. If the /d modifier is specified, any characters specified by SEARCHLIST not found in REPLACEMENTLIST are deleted. (Note that this is slightly more flexible than the behavior of some tr programs, which delete anything they find in the SEARCHLIST, period.) If the /s modifier is specified, sequences of characters that were transliterated to the same character are squashed down to a single instance of the character.

If the /d modifier is used, the REPLACEMENTLIST is always interpreted exactly as specified. Otherwise, if the REPLACEMENTLIST is shorter than the SEARCHLIST, the final character is replicated till it is long enough. If the REPLACEMENTLIST is empty, the SEARCHLIST is replicated. This latter is useful for counting characters in a class or for squashing character sequences in a class.
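The three cases described in this paragraph (short list, /d, empty list) can be compared side by side. The sample string is invented for the demonstration.

```perl
use strict;
use warnings;

my $s = "abcde";   # made-up sample

(my $padded = $s) =~ tr/a-e/xy/;    # replacement list is padded to "xyyyy"
(my $exact  = $s) =~ tr/a-e/xy/d;   # /d: c, d and e have no partner, so they go
my $count   = ($s =~ tr/a-e//);     # empty list: count only, $s is unchanged

print "$padded $exact $count $s\n";   # xyyyy xy 5 abcde
```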

Examples:

$ARGV[1] =~ tr/A-Z/a-z/;    # canonicalize to lower case ASCII
$cnt = tr/*/*/;             # count the stars in $_
$cnt = $sky =~ tr/*/*/;     # count the stars in $sky
$cnt = tr/0-9//;            # count the digits in $_
tr/a-zA-Z//s;               # bookkeeper -> bokeper
($HOST = $host) =~ tr/a-z/A-Z/;
$HOST = $host =~ tr/a-z/A-Z/r;   # same thing
$HOST = $host =~ tr/a-z/A-Z/r    # chained with s///r
            =~ s/:/ -p/r;
tr/a-zA-Z/ /cs;             # change non-alphas to single space
@stripped = map tr/a-zA-Z/ /csr, @original;
                            # /r with map
tr [\200-\377]
   [\000-\177];             # wickedly delete 8th bit

If multiple transliterations are given for a character, only the first one is used:

tr/AAA/XYZ/

will transliterate any A to X.
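A quick confirmation of the "first one wins" rule, using a made-up word:

```perl
use strict;
use warnings;

my $word = "BANANA";   # made-up sample

# 'A' appears three times in the search list; only A -> X takes effect.
(my $result = $word) =~ tr/AAA/XYZ/;

print "$result\n";   # BXNXNX
```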

Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():

    eval "tr/$oldlist/$newlist/";
    die $@ if $@;
    eval "tr/$oldlist/$newlist/, 1" or die $@;




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: November, 26, 2019