Softpanorama
May the source be with you, but remember the KISS principle ;-)


Unix tr command


The Unix tr command copies standard input to standard output, substituting or deleting selected characters. It can also squeeze runs of a repeated character into a single character (option -s). The utility performs the classic alphabet1-to-alphabet2 translation, sometimes called 1:1 transliteration, and as such is suitable for implementing the Caesar cipher. Unix inherited tr from Multics as a derivative of the PL/1 translate built-in function, which in turn was a generalization of the TR (Translate) instruction of the System/360 architecture (see IBM System-360 Green Card).

The format of the tr command is somewhat strange -- this is one of the few Unix commands that accepts input only from standard input.

tr [ options ] [ set1  [ set2 ] ]

Input characters in set1 are mapped to the corresponding characters in set2. If set2 is shorter than set1, GNU and BSD tr extend it to the length of set1 by repeating its last character as necessary; excess characters in set2 are ignored. (Other implementations behave differently here; see the discussion of System V and POSIX behavior further down.) A set can be specified directly or as a complement (option -c). Sets can be specified by enumerating characters, as in

tr '{}' '()' < infile > outfile

or by using ranges, as in

 tr 'A-Z' 'a-z' < infile > outfile
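To see the 1:1 mapping and the padding rule in action, here is a small sketch. It assumes GNU coreutils or BSD tr, both of which pad a short set2 with its last character. The function names braces_to_parens and pad_demo are made up for illustration.

```shell
#!/bin/sh
# Enumerated sets: each character of set1 is replaced by the
# character in the same position of set2 ({ -> (, } -> )).
braces_to_parens() {
    tr '{}' '()'
}

# Ranges expand in place, so 'a-d' means 'abcd'. Here set2 ('xy') is
# shorter than set1, so GNU/BSD tr pads it to 'xyyy' by repeating its
# last character: a->x, b->y, c->y, d->y. Characters outside set1
# (here 'e') pass through unchanged.
pad_demo() {
    tr 'a-d' 'xy'
}

printf '%s\n' 'f{x}' | braces_to_parens    # prints f(x)
printf '%s\n' 'abcde' | pad_demo           # prints xyyye
```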

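Special POSIX character classes can be used:

  [:alnum:]  letters and digits
  [:alpha:]  letters
  [:blank:]  horizontal white space (space and tab)
  [:cntrl:]  control characters
  [:digit:]  digits
  [:graph:]  printable characters, not including space
  [:lower:]  lowercase letters
  [:print:]  printable characters, including space
  [:punct:]  punctuation characters
  [:space:]  horizontal or vertical white space
  [:upper:]  uppercase letters
  [:xdigit:] hexadecimal digits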

Typical usage of classes involves changing the case from upper to lower or vice versa, as in the following example:

cat names | tr '[:upper:]' '[:lower:]' > lc_names

Classes can be combined to form a more complex set, for example '[:lower:][:upper:]'
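A short sketch of combined classes follows. It assumes ASCII input and a tr that supports POSIX classes (GNU, BSD, AIX and Solaris all do). The function names are hypothetical.

```shell
#!/bin/sh
# Two classes written side by side form one set: every digit and
# every punctuation character is deleted.
strip_digits_punct() {
    tr -d '[:digit:][:punct:]'
}

# With -c the combined set is complemented: everything that is
# neither a letter nor white space is deleted.
keep_letters_space() {
    tr -cd '[:alpha:][:space:]'
}

printf '%s\n' 'Hello, World 2024!' | strip_digits_punct  # "Hello World " (trailing space kept)
printf '%s\n' 'a1 b2!' | keep_letters_space              # "a b"
```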

The tr utility accepts three additional options, -c, -d and -s (GNU tr adds a fourth, -t), which substantially increase its power.

Most Unix administrators are unaware of these options, which are quite useful and greatly extend the usability of this otherwise very simple command. Here is a fuller description of them:

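  1. -c, --complement Complement set1 with respect to the universe of characters whose codes are 01 through 0377 octal (the AIX documentation quoted in the Reference section below instead defines the complement relative to the character set of the current locale). For example: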

    To replace every nonprinting character, other than valid control characters, with a ? (question mark), enter:

    tr -c '[:print:][:cntrl:]' '?' < textfile > newfile

    Here is a more complex and rather elegant example, in which the goal is to create a list of the words in a file (option -s means "squeeze repeated characters"; see below):

    tr -cs '[:lower:][:upper:]' '[\n*]' < text > words

    This translates each sequence of characters other than lowercase letters and uppercase letters into a single newline character. The * (asterisk) causes the tr command to repeat the new line character enough times to make the second string as long as the first string.

    Extract digits from a string:

    echo "Abc123d56E" | tr -cd '[:digit:]'
    Output:
    12356

  2. -d, --delete Delete the characters in set1 without translating them. An important use of this option is security: it can sanitize script arguments so that a malicious user cannot smuggle shell commands into them. Characters such as backticks, all kinds of brackets ( ()[]{} ), colon and semicolon, as well as =#$&!@, should be removed from argument values before you start processing them, if they cannot legitimately occur there. If a script is used by a considerable population, there is always at least one black sheep who will try to mangle the input arguments to see what happens ;-)

  3. -s, --squeeze-repeats Replace each sequence of a repeated character with a single occurrence. If neither translating nor deleting is specified, -s uses set1; otherwise it uses set2 and is applied after translation or deletion.
  4. -t, --truncate-set1 Truncate set1 to the length of set2. By default GNU tr extends a shorter set2 to the length of set1 by repeating its last character; -t reverses this, so the extra characters of set1 are simply left untranslated. It is available only in the GNU implementation of tr.
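The -d advice above can be turned into a small filter. This is only a sketch: sanitize_arg is a hypothetical helper name, and deleting characters is a first line of defense, not a substitute for quoting variables ("$var") wherever they are used.

```shell
#!/bin/sh
# sanitize_arg (hypothetical): print $1 with backticks, brackets,
# colon, semicolon and the characters =#$&!@ removed.
# The brackets come last so that '[' is never followed by ':' or '=',
# which tr would read as the start of a [:class:] or [=equiv=].
sanitize_arg() {
    printf '%s' "$1" | tr -d '`(){}:;=#$&!@[]'
}

user_input='report.txt; cat /etc/passwd'
clean=$(sanitize_arg "$user_input")
printf '%s\n' "$clean"    # report.txt cat /etc/passwd
```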

TR Set Notation

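Sets are specified as strings of characters. Most represent themselves. Interpreted sequences (as listed in the GNU tr documentation) are:

  \NNN          the character with octal value NNN (1 to 3 octal digits)
  \\            backslash
  \a            audible BEL
  \b            backspace
  \f            form feed
  \n            newline
  \r            return
  \t            horizontal tab
  \v            vertical tab
  CHAR1-CHAR2   all characters from CHAR1 to CHAR2 in ascending order
  [CHAR*]       in set2, copies of CHAR until the length of set1
  [CHAR*REPEAT] REPEAT copies of CHAR (REPEAT is octal if it starts with 0)
  [:CLASS:]     all characters in the named class (alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, xdigit)
  [=CHAR=]      all characters which are equivalent to CHAR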

Notes:

  1. Translation occurs if -d is not given and both set1 and set2 appear.
  2. -t may be used only when translating.
  3. In GNU and BSD tr, set2 is extended to the length of set1 by repeating its last character as necessary. Excess characters in set2 are ignored.
  4. Only [:lower:] and [:upper:] are guaranteed to expand in ascending order. They can be used in pairs to specify case conversion.
  5. -s (Squeeze all strings of repeated output characters to single characters) uses set1 if neither translating nor deleting specified, otherwise squeeze uses set2 and occurs after translation or deletion.
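The rules in the notes above can be checked with a few one-liners. This sketch assumes GNU coreutils tr (the -t option does not exist elsewhere); the function names are illustrative only.

```shell
#!/bin/sh
# -s with one set: squeezing uses set1, so runs of 'o' shrink to one.
squeeze_only() { tr -s 'o'; }

# Translate and squeeze: a and b are both translated to '-' (set2 is
# padded), then runs of characters from set2 are squeezed.
translate_squeeze() { tr -s 'ab' '-'; }

# Default (GNU/BSD): set2 is padded, so both a and b become x.
pad_default() { tr 'ab' 'x'; }

# GNU-only -t: set1 is cut down to 'a', so b is left untranslated.
truncate_demo() { tr -t 'ab' 'x'; }

printf '%s\n' 'fooo'   | squeeze_only       # fo
printf '%s\n' 'xaabbx' | translate_squeeze  # x-x
printf '%s\n' 'ab'     | pad_default        # xx
printf '%s\n' 'ab'     | truncate_demo      # xb
```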

Examples

(by and large adapted from the AIX man page; see tr Command)

  1. To translate braces into parentheses, enter:
    tr '{}' '()' < textfile > newfile

    This translates each { (left brace) to ( (left parenthesis) and each } (right brace) to ) (right parenthesis). All other characters remain unchanged.

  2. To translate lowercase characters to uppercase, enter:
    tr 'a-z' 'A-Z' < textfile > newfile
  3. To create a list of words in a file (note that in the replacement set the last character is repeated to match the length of the first set):
    tr -cs '[:lower:][:upper:]' '\n' < textfile > newfile

    This translates each sequence of characters other than lowercase letters and uppercase letters into a single newline character.

  4. To delete all NULL characters from a file, enter:
    tr -d '\0' < textfile > newfile
  5. To replace every sequence of one or more new lines with a single new line, enter:
    tr -s '\n' < textfile > newfile

    OR

    tr -s '\012' < textfile > newfile
  6. To replace every nonprinting character, other than valid control characters, with a question mark (the replacement set '?' is extended by repeating its last character to match the length of the first set):
    tr -c '[:print:][:cntrl:]' '?' < textfile > newfile

    This scans a file created in a different locale to find characters that are not printable characters in the current locale.

  7. To replace every sequence of characters in the <space> character class with a single # (pound sign) character, enter:
    tr -s '[:space:]' '#'
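Examples 4, 5 and 7 above can be tried on generated input instead of a real file. This sketch uses printf to create the test data and assumes GNU or BSD tr; '\000' is the portable octal spelling of the NUL character that the example writes as '\0'. The function names are made up.

```shell
#!/bin/sh
# Example 4: delete NUL bytes.
delete_nuls() { tr -d '\000'; }

# Example 5: collapse runs of newlines, i.e. drop empty lines.
squeeze_newlines() { tr -s '\n'; }

# Example 7: turn each run of white space (including the newline)
# into a single '#'.
space_to_hash() { tr -s '[:space:]' '#'; }

printf 'a\000b\n' | delete_nuls          # ab
printf 'a\n\n\nb\n' | squeeze_newlines   # a and b on consecutive lines
printf 'a  \t b\n' | space_to_hash; echo # a#b# (the newline became '#')
```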


Old News ;-)

[Jul 29, 2011] tr Command

Examples

  1. To translate braces into parentheses, type:
    tr '{}' '()' < textfile > newfile
  2. To translate braces into brackets type:
    tr '{}' '\[]' < textfile > newfile

    This translates each { (left brace) to [ (left bracket) and each } (right brace) to ] (right bracket). The left bracket must be entered with a \ (backslash) escape character.

  3. To translate lowercase characters to uppercase, type:
    tr 'a-z' 'A-Z' < textfile > newfile
  4. To create a list of words in a file, type:
    tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile

    This translates each sequence of characters other than lowercase letters and uppercase letters into a single newline character. The * (asterisk) causes the tr command to repeat the new line character enough times to make the second string as long as the first string.

  5. To delete all NULL characters from a file, type:
    tr -d '\0' < textfile > newfile
  6. To replace every sequence of one or more new lines with a single new line, type:
    tr -s '\n' < textfile > newfile

    OR

    tr -s '\012' < textfile > newfile
  7. To replace every nonprinting character, other than valid control characters, with a ? (question mark), type:
    tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile

    This scans a file created in a different locale to find characters that are not printable characters in the current locale.

  8. To replace every sequence of characters in the <space> character class with a single # character, type:
    tr -s '[:space:]' '[#*]'

[Mar 16, 2009] Concatenate only digits from string - bash tr

Mar 3, 2009 | UNIX BASH scripting

Just to introduce a good use of the Linux tr command: if you need to concatenate the digits from a string, here is a way:

$ echo "Abc123d56E" | tr -cd '[:digit:]'
Output:
12356


From tr man pages:

tr [OPTION]... SET1 [SET2]

-c, -C, --complement: first complement SET1
-d, --delete : delete characters in SET1, do not translate

Similarly:

$ echo "Abc123d56E" | tr -d '[:digit:]'
Output:
AbcdE
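The same two operations, wrapped as functions in a sketch (the names are ours). Note that '[:digit:]' already denotes the class; writing '[[:digit:]]' adds literal '[' and ']' characters to the set, which matters as soon as the input contains brackets.

```shell
#!/bin/sh
# Keep only digits. -cd also deletes the trailing newline from echo.
digits_only()  { tr -cd '[:digit:]'; }

# Delete only digits.
strip_digits() { tr -d '[:digit:]'; }

echo "Abc123d56E" | digits_only; echo   # 12356
echo "Abc123d56E" | strip_digits        # AbcdE
```

With the extra brackets, echo 'a[1]' | tr -cd '[[:digit:]]' keeps the brackets and prints [1].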

[Sep 12, 2008] Understanding Linux - UNIX tr command

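A clever example of how to use tr to convert text into one word per line. It is too simplistic, though: the set should be at least [:alnum:].

To create a list of the words in /path/to/file, one per line, enter:

$ tr -cs "[:alpha:]" "\n" < /path/to/file

Here -c complements the set [:alpha:], so tr acts on every character that is not a letter, and -s squeezes each run of such characters into a single newline.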
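A sketch of the [:alnum:] variant suggested above, assuming GNU or BSD tr; '[\n*]' spells out that set2 is as many newlines as needed. The function name words_per_line is hypothetical. If the input starts with a separator, the first output line is empty.

```shell
#!/bin/sh
# One word per line; letters and digits count as word characters.
# -c complements the set, -s squeezes each run of separators into one newline.
words_per_line() { tr -cs '[:alnum:]' '[\n*]'; }

printf '%s\n' 'Hello, world 42!' | words_per_line
# Hello
# world
# 42
```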

[Feb 22, 2008] Text processing with UNIX by Chris Herborth

Aug 01, 2006 | developerWorks

Translating text

Now that you know at least five different ways of generating some text, let's look at doing some simple translations on it.

The tr command lets you translate characters in one set to the corresponding characters in a second set. Let's take a look at a few examples (Listing 4) to see how it works.


Listing 4. Using tr to translate characters
echo "a test" | tr t p
echo "a test" | tr aest 1234
echo "a test" | tr -d t
echo "a test" | tr '[:lower:]' '[:upper:]'
Looking at the output of these commands (see Listing 5) gives you a clue about how tr works (here's a hint: it's a direct replacement of characters in the first set with the corresponding characters from the second set).


Listing 5. What has tr done?

chrish@dhcp3 [199]$ echo "a test" | tr t p
a pesp

chrish@dhcp3 [200]$ echo "a test" | tr aest 1234
1 4234

chrish@dhcp3 [201]$ echo "a test" | tr -d t
a es

chrish@dhcp3 [202]$ echo "a test" | tr '[:lower:]' '[:upper:]'
A TEST
The first and second examples are simple enough, replacing one character with another. The third example, with the -d option (delete), removes the specified characters completely from the output. This is often used to remove carriage returns from DOS text files to turn them into UNIX text files (see Listing 6). Finally, the last example uses character classes (the names inside [: :]) to convert all lower-case letters into upper-case letters. Portable Operating System Interface (POSIX) standard character classes include alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit.

Listing 6. Converting DOS text files into UNIX text files
tr -d '\r' < input_dos_file.txt > output_unix_file.txt
Although the tr command respects C locale environment variables (try man locale for more information about these), don't expect it to do anything sensible with UTF-8 documents, such as being able to replace lower-case accented characters with appropriate upper-case characters. The tr command works best with ASCII and the other standard C locales.

[Feb 22, 2008] The GAWK Manual - Sample Program

The following example is a complete awk program, which prints the number of occurrences of each word in its input. It illustrates the associative nature of awk arrays by using strings as subscripts. It also demonstrates the `for x in array' construction. Finally, it shows how awk can be used in conjunction with other utility programs to do a useful task of some complexity with a minimum of effort. Some explanations follow the program listing.

awk '
# Print list of word frequencies
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}

END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}'

The first thing to notice about this program is that it has two rules. The first rule, because it has an empty pattern, is executed on every line of the input. It uses awk's field-accessing mechanism (see section Examining Fields) to pick out the individual words from the line, and the built-in variable NF (see section Built-in Variables) to know how many fields are available.

For each input word, an element of the array freq is incremented to reflect that the word has been seen an additional time.

The second rule, because it has the pattern END, is not executed until the input has been exhausted. It prints out the contents of the freq table that has been built up inside the first action.

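Note that this program has several problems that would prevent it from being useful by itself on real text files:

  1. awk treats upper- and lowercase characters as distinct, so "Foo" and "foo" are counted as different words.
  2. Words are detected using the awk convention that fields are separated only by whitespace, so punctuation characters count as part of words.
  3. The output does not come out in any useful order.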

The way to solve these problems is to use other system utilities to process the input and output of the awk script. Suppose the script shown above is saved in the file `frequency.awk'. Then the shell command:

tr A-Z a-z < file1 | tr -cd 'a-z \012' \
  | awk -f frequency.awk \
  | sort -k2,2nr

produces a table of the words appearing in `file1' in order of decreasing frequency.

The first tr command in this pipeline translates all the upper case characters in `file1' to lower case. The second tr command deletes all the characters in the input except lower case characters, spaces and newlines (without the space in its set, the words on each line would run together). The argument to the second tr is quoted to protect the backslash in it from being interpreted by the shell. The awk program reads this suitably massaged data and produces a word frequency table, which is not ordered.

The awk script's output is now sorted by the sort command and printed on the terminal. The key specification -k2,2nr tells sort to use the second field of each input line as the key, to treat the keys as numeric quantities (otherwise `15' would come before `5'), and to sort in descending (reverse) order. (Older versions of this example used the obsolete form sort +1 -nr, which modern POSIX-conforming sort implementations may reject.)

See the general operating system documentation for more information on how to use the tr and sort commands.
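The whole pipeline can also be packed into one function. This is a sketch assuming a POSIX awk and sort; word_freq is a hypothetical name, and instead of deleting non-letters it turns each run of them into a newline, so words can never fuse together.

```shell
#!/bin/sh
# word_freq (hypothetical): read text on stdin and print
# "word<TAB>count" lines, most frequent first.
word_freq() {
    tr 'A-Z' 'a-z' |                 # fold case
    tr -cs 'a-z' '[\n*]' |           # one word per line
    awk 'NF { freq[$1]++ }           # skip an empty line from a leading separator
         END { for (w in freq) printf "%s\t%d\n", w, freq[w] }' |
    sort -k2,2nr -k1,1               # count descending, then word
}

printf 'The cat and the hat.\nThe end.\n' | word_freq
```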

[Feb 21, 2008] Understanding Linux - UNIX tr command

Shell scripting example

In the following example the script asks for confirmation before deleting the file. If the user answers in lower case, tr changes nothing; if the user answers in upper case, the letters are converted to lower case. This ensures that even if the user responds with YES, YeS, YEs, etc., the script removes the file:

#!/bin/bash
echo -n "Enter file name : "
read -r myfile
echo -n "Are you sure ( yes or no ) ? "
read -r confirmation
# Fold the answer to lower case so that YES, YeS, Yes, etc. all match
confirmation="$(echo "${confirmation}" | tr 'A-Z' 'a-z')"
if [ "$confirmation" == "yes" ]; then
    if [ -f "$myfile" ]; then
        /bin/rm "$myfile"
    else
        echo "Error - file $myfile not found"
    fi
else
    : # do nothing
fi
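The case-folding test at the heart of that script can be isolated into a function, which makes it easy to check without deleting anything. A sketch in POSIX sh; is_yes is a hypothetical helper name.

```shell
#!/bin/sh
# is_yes (hypothetical): succeed if $1 is "yes" in any mix of
# upper and lower case, e.g. YES, YeS or yes.
is_yes() {
    answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    [ "$answer" = "yes" ]
}

if is_yes "YeS"; then echo "would remove the file"; fi
```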

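Remove all non-printable characters from myfile.txt (newline is not in [:print:], so it is listed explicitly to keep the line breaks):

$ tr -cd '[:print:]\n' < myfile.txt

Replace each run of two or more successive blank spaces with a single space in a copy of the text in a file called input.txt, and save the output to a new file called output.txt: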

tr -s ' ' ' ' < input.txt > output.txt

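Note that the -d option deletes every instance of each individual character listed in set1; it does not delete strings (sequences of characters). For example, the following command is sometimes offered as a way to remove the word nameserver from a copy of /etc/resolv.conf, writing the output to a file called ns.ipaddress.txt:

tr -d 'nameserver' < /etc/resolv.conf > ns.ipaddress.txt

It appears to work on nameserver lines, because the IP addresses that remain contain only digits, dots and blanks. In fact it deletes every n, a, m, e, s, r and v anywhere in the file, so other lines (for example search or domain lines) are mangled.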
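The difference between deleting characters and deleting a word is easy to demonstrate on a two-line sample. In this sketch the sample data and both function names are made up; sed is used as one common tool that works on strings.

```shell
#!/bin/sh
# tr -d removes characters: every n, a, m, e, s, r and v goes,
# wherever it appears.
tr_way()  { tr -d 'nameserver'; }

# sed removes the word itself (plus the blanks after it), and only
# at the start of a line.
sed_way() { sed 's/^nameserver[[:blank:]]*//'; }

sample='nameserver 10.0.0.1
search example.com'

printf '%s\n' "$sample" | tr_way    # " 10.0.0.1" and "ch xpl.co"
printf '%s\n' "$sample" | sed_way   # "10.0.0.1" and "search example.com"
```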

[Feb 21, 2008] Commands Reference, Volume 5 - tr Command

From AIX man pages

Examples

  1. To translate braces into parentheses, enter:
    tr '{}' '()' < textfile > newfile

    This translates each { (left brace) to ( (left parenthesis) and each } (right brace) to ) (right parenthesis). All other characters remain unchanged.

  2. To translate braces into brackets, enter:
    tr '{}' '\[]' < textfile > newfile

    This translates each { (left brace) to [ (left bracket) and each } (right brace) to ] (right bracket). The left bracket must be entered with a \ (backslash) escape character.

  3. To translate lowercase characters to uppercase, enter:
    tr 'a-z' 'A-Z' < textfile > newfile
  4. To create a list of words in a file, enter:
    tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile
    

    This translates each sequence of characters other than lowercase letters and uppercase letters into a single newline character. The * (asterisk) causes the tr command to repeat the new line character enough times to make the second string as long as the first string.

  5. To delete all NULL characters from a file, enter:
    tr -d '\0' < textfile > newfile
  6. To replace every sequence of one or more new lines with a single new line, enter:
    tr -s '\n' < textfile > newfile

    OR

    tr -s '\012' < textfile > newfile
  7. To replace every nonprinting character, other than valid control characters, with a ? (question mark), enter:
    tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile

    This scans a file created in a different locale to find characters that are not printable characters in the current locale.

  8. To replace every sequence of characters in the <space> character class with a single # (pound sign) character, enter:
    tr -s '[:space:]' '[#*]'

[Feb 21, 2008] The tr command

Here we cat our file (columns.txt) and pipe the output of the cat command to the input of tr, translating all lowercase names to uppercase:
cat columns.txt | tr '[a-z]' '[A-Z]'

Remember that we have not modified the file columns.txt, so how do we save the output? Simple: redirect the output of tr with '>' to a file called UpCaseColumns.txt:

cat columns.txt | tr '[a-z]' '[A-Z]' > UpCaseColumns.txt

Since tr, unlike sed, does not take a filename argument, we could instead redirect the file to its standard input:

tr '[a-z]' '[A-Z]' < columns.txt > UpCaseColumns.txt

Here the shell connects columns.txt directly to the standard input of tr, so no cat process is needed. Either way we achieve what we set out to do: using tr as part of a pipeline, or feeding it a file through input redirection ('<').

[PDF] TR 95-10 A Taxonomy of Unix System and Network Vulnerabilities [Bishop95]

[Jul 14, 2007] ONLamp.com -- Sanitizing Mail on Panther Server

[Jul 14, 2007] Learn how to remove extended ASCII characters from Unix files

In the shell program we use to remove all non-printable ASCII characters from a text file, we tell the tr command to delete every character in the translation process except for the specific characters we specify. In essence, we filter out the undesirable characters. The tr command we use in our program is shown below:
tr -cd '\11\12\40-\176' < "$INPUT_FILE" > "$OUTPUT_FILE"

In this command, the variable INPUT_FILE must contain the name of the Solaris file you'll be reading from, and OUTPUT_FILE must contain the name of the output file you'll be writing to. When the -c and -d options of the tr command are used in combination like this, the only characters tr writes to the standard output stream are the characters we've specified on the command line.

Although it may not look very attractive, we're using octal characters in our tr command to make our programming job easier and more efficient. Our command tells tr to retain only the octal characters 11, 12, and 40 through 176 when writing to standard output. Octal character 11 corresponds to the [TAB] character, and octal 12 corresponds to the [LINEFEED] character. The octal characters 40 through 176 correspond to the standard visible keyboard characters, beginning with the [Space] character (octal 40) through the ~ character (octal 176). These are the only characters retained by tr -- the rest are filtered out, leaving us with a clean ASCII file.
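A quick way to convince yourself what the octal set keeps is to feed it a Latin-1 byte, a tab and a carriage return. This sketch assumes GNU tr (which works on bytes regardless of locale) or any tr running in the C locale; ascii_clean is a made-up name.

```shell
#!/bin/sh
# Keep only TAB (\11), LINEFEED (\12) and the visible ASCII range from
# space (\40) to ~ (\176); -c complements that set, -d deletes the rest.
ascii_clean() { tr -cd '\11\12\40-\176'; }

printf 'caf\351 ok\n' | ascii_clean   # "caf ok": the byte 0351 (Latin-1 e-acute) is dropped
printf 'x\ry\n' | ascii_clean         # "xy": carriage return (\15) is not in the set
```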

[Feb 10, 2007] Commonly Used Unix-like Commands

Example1: Change uppercase to lowercase in a file:

D:\temp>more score.txt
john 81 91
mark 82 93
tina 88 92
D:\temp>tr '[a-z]' '[A-Z]' < score.txt > score1.txt
D:\temp>more score1.txt
JOHN 81 91
MARK 82 93
TINA 88 92

[Nov 21, 2005] LinuxPlanet - Interviews - Sobell on the Bourne Again Shell and the Linux Command Line - The Utility Known as tr

LP: Would you talk a little more about the tr utility?

Ah, tr. Well, first thing that comes to mind is that it is the answer to the trivia question, "Name a Linux utility that accepts input only from standard input and never from a file named as an argument on the command line." It is an odd beast that is useful only sometimes--but when it is useful it is very useful. Here is an excerpt that talks about tr:

"The tr utility reads standard input and, for each input character, maps it to an alternate character, deletes the character, or leaves the character alone. This utility reads from standard input and writes to standard output.

"The tr utility is typically used with two arguments, string1 and string2. The position of each character in the two strings is important: Each time tr finds a character from string1 in its input, it replaces that character with the corresponding character from string2.

"With one argument, string1, and the --delete option, tr deletes the characters specified in string1. The option --squeeze-repeats replaces multiple sequential occurrences of characters in string1 with single occurrences (for example, abbc becomes abc).

"You can use a hyphen to represent a range of characters in string1 or string2. The two command lines in the following example produce the same result:

$ echo abcdef | tr  'abcdef' 'xyzabc'
xyzabc
$ echo abcdef | tr  'a-f' 'x-za-c'
xyzabc

"The next example demonstrates a popular method for disguising text, often called ROT13 (rotate 13) because it replaces the first letter of the alphabet with the fourteenth, the second with the fifteenth, and so forth.

$ echo The punchline of the joke is ... |
> tr 'A-M N-Z a-m n-z' 'N-Z A-M n-z a-m'
Gur chapuyvar bs gur wbxr vf ...

"To make the text intelligible again, reverse the order of the arguments to tr:

$ echo Gur chapuyvar bs gur wbxr vf ... |
> tr 'N-Z A-M n-z a-m' 'A-M N-Z a-m n-z'
The punchline of the joke is ...

"The --delete option causes tr to delete selected characters:

$ echo If you can read this, you can spot the missing vowels! |
> tr --delete 'aeiou'
If y cn rd ths, y cn spt th mssng vwls!

"In the following example, tr replaces characters and reduces pairs of identical characters to single characters:

$ echo tennessee | tr --squeeze-repeats 'tnse' 'srne'
serene

"The next example replaces each sequence of nonalphabetic characters (the complement of all the alphabetic characters as specified by the character class alpha) in the file draft1 with a single NEWLINE character. The output is a list of words, one per line.

$ tr --complement --squeeze-repeats '[:alpha:]' '\n' < draft1

"The final example uses character classes to upshift the string hi there:

$ echo hi there | tr '[:lower:]' '[:upper:]'
HI THERE
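Because ROT13 is its own inverse, the two ROT13 command lines in the excerpt can be folded into one function (the name rot13 is ours). The sets below omit the spaces used in the excerpt; there, spaces simply map to themselves, so both forms give the same result.

```shell
#!/bin/sh
# ROT13: each letter is shifted 13 places, so applying the
# function twice restores the original text.
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

echo 'The punchline of the joke is ...' | rot13   # Gur chapuyvar bs gur wbxr vf ...
echo 'Gur chapuyvar bs gur wbxr vf ...' | rot13   # The punchline of the joke is ...
```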

A Little Devil Called tr

Linux Journal

Luckily, we can also use ranges of characters to specify the characters more efficiently:

tr a-z A-Z

Ever had those horrible upper case DOS file names? Here's a Bourne script to take care of them:

for f in *; do
    lower=$(echo "$f" | tr A-Z a-z)
    [ "$f" = "$lower" ] || mv "$f" "$lower"
done
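A slightly more defensive version of that loop, as a sketch for a POSIX shell: lowercase_names is a hypothetical helper that works on a directory given as its argument and refuses to overwrite an existing file. On case-insensitive file systems the existence check sees the old and new names as the same file, so such renames are skipped.

```shell
#!/bin/sh
# lowercase_names (hypothetical): rename every file in directory $1
# to lower case, skipping names that are already lower case and
# refusing to overwrite an existing file.
lowercase_names() {
    for f in "$1"/*; do
        [ -e "$f" ] || continue            # empty directory: glob did not match
        base=${f##*/}
        lower=$(printf '%s' "$base" | tr 'A-Z' 'a-z')
        [ "$base" = "$lower" ] && continue
        if [ -e "$1/$lower" ]; then
            echo "skipping $base: $lower already exists" >&2
        else
            mv -- "$f" "$1/$lower"
        fi
    done
}

# Usage: lowercase_names /path/to/dir
```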

Many UNIX editors allow some text to be processed by the shell. For example, to replace all upper case characters of the next paragraph with lower case while in vi, type:

!}tr A-Z a-z

As another example, the command:

!jtr a-z A-Z

capitalizes the current and next line (the character after the ! is a movement character). If you read the International Obfuscated C Code Contest (ftp://ftp.uu.net./pub/ioccc/), you frequently see that part of the hints are coded by a method called rot13. rot13 is a Caesar cypher, i.e., a cypher in which all letters are shifted some number of places. For example, a becomes b, b becomes c, ..., y becomes z, and z becomes a. In rot13 each letter is shifted 13 places. It is a weak cypher, and to decipher it, you can use rot13 again. You can also use tr to read the text in this way:

tr a-zA-Z n-za-mN-ZA-M 
Another interesting way to use tr is to change files from Macintosh format to UNIX format. For line ends, the classic Macintosh uses \r while UNIX uses \n. GNU tr allows you to use the C special characters, so type:

tr '\r' '\n'

If you don't have GNU's version of tr, you can always use the corresponding octal numbers as shown here (in both cases the quotes keep the shell from removing the backslashes):

tr '\015' '\012'

You might wonder what would happen if the second string is shorter than the first string. POSIX says this is not allowed. System V says that only that portion of the first string is used that has a matching character in the second string. BSD and GNU pad the second string with its final character in order to match the length of the first string. The reason this last method is handy becomes clearer when we take complements into account. Assume you wish to make a list of all words and keywords in your listing. When you use -c, tr complements the first string. In C, all identifiers and keywords consist of a-zA-Z0-9_, so those are the characters we want to keep. Thus, we can do the following:

tr -c 'a-zA-Z0-9_' '\n'

If we pipe the tr output through sort -u, we get our desired list. If we follow POSIX, the second string would have to describe 193 newline characters (written as [\n*193] or [\n*]). If we use System V, only the zero byte is translated to a newline, since the complement of a-zA-Z0-9_ starts with the zero byte.
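Put together, the identifier list looks like this. The sketch assumes GNU or BSD tr; adding -s keeps the output free of empty lines, and identifiers is a made-up function name. Numeric literals such as 0 also end up in the list.

```shell
#!/bin/sh
# List distinct identifiers and keywords in C source read from stdin.
# -s squeezes each run of non-identifier characters into one newline;
# '[\n*]' pads set2 with as many newlines as set1 needs.
identifiers() { tr -cs 'a-zA-Z0-9_' '[\n*]' | sort -u; }

printf '%s\n' 'int main(void) { return 0; }' | identifiers
```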

The second important use of tr is to remove characters. For this option, you use the flag -d with one string as an argument. To fix up those nasty MS-DOS text files with a ^M at the end of the line and a trailing ^Z, specify tr in this way:

tr -d '\015\032'
Many people have written a program in C to do this same operation. Well, a C program isn't necessary--you only need to know the right program, tr, with the right flags. The -d flag isn't used often, but is nice to have when needed. You can combine it with the -c flag to delete everything except characters from the string you supplied as an argument.

Repeated characters can be squeezed into a single one using the -s option with one string as an argument. It can also be used to squeeze white space. To remove empty lines, type:

tr -s '\n'

The -s option can also be used with two strings as arguments. In that case, tr first translates the text as if -s were not given and then squeezes the characters in the second string. For instance, we can squeeze all standard white space to a single space by specifying:

tr -s ' \t\n' '[ *]'

Combined with -s, the -d flag can also be used with two strings (tr -ds): the characters in the first string will be removed and the characters in the second string will be squeezed. tr may not be a great program; however, it gets the job done. It is particularly useful in scripts using pipes and command substitutions (i.e., inside the back quotes). If you use tr often, you'll learn to appreciate its capabilities. Small is beautiful.

Dogs of the Linux Shell

Linux Journal

tr is a simple pattern translator. Its practical application overlaps a bit with other, more complex tools, such as sed and awk [with larger binary footprints]. tr is quite useful for simple textual replacements, deletions and additions. Its behavior is dictated by "from" and "to" character sets provided as the first and second arguments. The general usage syntax of tr is as follows:

# (12)  tr usage
tr [options] "set1" ["set2"] < input > output

Note that tr does not accept file arguments; it reads from standard input and writes to standard output. When two character sets are provided, tr operates on the characters contained in "set1" and performs some amount of substitution based on "set2". Listing 1 demonstrates some of the more common tasks performed with tr.

# (13) Transform lower case alphas to their
#      equivalent upper case.
$ echo "Hello World." | tr "[a-z]" "[A-Z]"
HELLO WORLD.

# (14) Same lower to upper transformation -
#      uses character class names :lower:
#      and :upper:.  (tr recognizes 12
#      character class names).
$ tr "[:lower:]" "[:upper:]" < README > UPPER_README

# (15) Make $PATH a bit more readable/searchable -
#  substitute ':' with a line feed
$ echo "$PATH" | tr ":" "\n"
/usr/bin
/bin
/usr/local/bin
.....
$ echo $PATH | tr ":" "\n" | grep -i "local"
/usr/local/bin
/usr/home/curly/Local_bin

# (16) Remove all white space from a file.
$ tr -d "[:space:]" < README > NO_WHITE_SPACE

# (17) Substitute all single or sequence of ;
#      with a single :
$ echo ";;;;This;;is;a;;;;simple;;;example." \
| tr -s ";" ":"
:This:is:a:simple:example.

Linux and UNIX tr command help

echo "12345678 9247" | tr 123456789 computerh - this example pipes the string '12345678 9247' through tr, replacing each digit with the letter in the same position of computerh. In this example it would return computer hope.

tr -cd '\11\12\40-\176' < myfile1 > myfile2 - this example takes the file myfile1, strips all non-printable characters, and writes the result to myfile2.

Recommended Links


Softpanorama Recommended

tr (Unix) - Wikipedia, the free encyclopedia

tr Command -- nice examples

Commonly Used Unix-like Commands

The tr command

Hacking on Characters with tr -- Want to quickly strip special characters from a file or change a Mac text file into a Unix text file? Learn how in this excerpt from Unix Power Tools, 2nd Edition.

Learn how to remove extended ASCII characters from Unix files

Beginners Guide to Unix Shell Programming

Unix text editing - sed, tr, cut, od

[Chapter 35] 35.11 Hacking on Characters with tr

GNU Coreutils -- contains an interesting example of a simple spellchecker constructed using tr.

Reference

Solaris 9 man pages section 1 User Commands

Example 1 Creating a list of all the words in a filename

The following example creates a list of all the words in filename1, one per line, in filename2, where a word is taken to be a maximal string of alphabetics. The second string is quoted to protect `\' from the shell. Octal 012 is the ASCII code for NEWLINE.

example% tr -cs A-Za-z '\012' <filename1>filename2

tr Command -- AIX man page

Flags

-A Performs all operations on a byte-by-byte basis using the ASCII collation order for ranges and character classes, instead of the collation order for the current locale.
-c Specifies that the value of String1 be replaced by the complement of the string specified by String1. The complement of String1 is all of the characters in the character set of the current locale, except the characters specified by String1. If the -A and -c flags are both specified, characters are complemented with respect to the set of all 8-bit character codes. If the -c and -s flags are both specified, the -s flag applies to characters in the complement of String1.
-d Deletes each character from standard input that is contained in the string specified by String1.
-s Removes all but the first in a sequence of repeated characters. Character sequences specified by String1 are removed from standard input before translation, and character sequences specified by String2 are removed from standard output.
String1 Specifies a string of characters.
String2 Specifies a string of characters.

The tr command


We can also use translate in another way: to distinguish between spaces and tabs. Spaces and tabs can be a pain when using scripts to compile system reports. What we need is a way of translating these characters. Now, there are many ways to skin a cat in Linux and shell scripting. I'm going to show you one way, although I'm sure you could now write a sed expression to do the same thing.

Assume that I have a file with a number of columns in it, but I am not sure how many spaces or tabs separate the different columns. I need some way of changing each such run into a single space. Why? A shell script that extracts fields by position will produce significantly different output depending on whether the columns are separated by one space, several spaces, one tab or several tabs. How do we convert many spaces or tabs into a single space? Well, translate is our right-hand man (or woman) for this particular task. In order not to waste our time modifying our columns.txt, let's work on the free command, which shows you free memory on your system. Type:
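Here is a sketch of the normalization just described. It is applied to a hypothetical row whose columns are separated by a mix of tabs and spaces. The first tr turns each tab into a space, and the second squeezes each run of spaces into one.

```shell
# Hypothetical row: columns separated by tab-space-tab and by three spaces.
row=$(printf 'eth0\t \t1500   up')

# Tabs become spaces, then runs of spaces collapse to a single space.
clean=$(printf '%s\n' "$row" | tr '\t' ' ' | tr -s ' ')
printf '%s\n' "$clean"    # prints: eth0 1500 up
```

The same result can be had in one step with tr -s '\t ' '  ' (a tab and a space both mapped to a space, then squeezed), because the squeeze covers the characters of String2.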

free 

If you look at the output, you will see that there are lots of spaces between the fields. How do we reduce multiple spaces between fields to a single space? We can use tr to squeeze characters (you can squeeze any character, but in this case we want to squeeze spaces):

free |tr -s ' '

The -s switch tells tr to squeeze each run of the given character into a single occurrence. (Read the info page on tr to find out about its other switches.)

We could squeeze zeroes with:

free | tr -s '0'

Which would obviously make zero sense!

Going back to our previous command of squeezing spaces, you'll see immediately that our memory usage table (which is what the free command produces) becomes much more usable because we've removed superfluous spaces.

Perhaps we want only some fields from the output. We could redirect the output of this into a file with:

free | tr -s ' ' > file.txt

Traditional systems would have you use a text editor to cut and paste the fields you are interested in into a new file. Do we want to do that? Absolutely not! We're lazy; we want to find a better way of doing this.

What I'm interested in is the line that contains 'Mem'. As part of your project, you should be building a set of scripts to monitor your system, and memory sounds like a good thing to record. Instead of just redirecting the output of tr to a file, let's first pass it through sed, which keeps only the lines beginning with the word "Mem":

free | tr -s ' ' | sed '/^Mem/!d'
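The sed expression /^Mem/!d means "delete every line that does not start with Mem". The output of free differs from system to system, so this sketch feeds the pipeline from a hypothetical sample_free function with made-up numbers. It then checks the sed command against two equivalent filters.

```shell
# Hypothetical stand-in for free; on a real system use free itself.
sample_free() {
    printf '%s\n' \
        '              total        used        free      shared  buff/cache   available' \
        'Mem:        8052512     2345678     3456789      123456     2250045     5400000' \
        'Swap:       2097148           0     2097148'
}

a=$(sample_free | tr -s ' ' | sed '/^Mem/!d')    # delete lines NOT matching ^Mem
b=$(sample_free | tr -s ' ' | sed -n '/^Mem/p')  # print only lines matching ^Mem
c=$(sample_free | tr -s ' ' | grep '^Mem')       # same filter with grep
printf '%s\n' "$a"
```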

This returns only the line that we're interested in. We could run this over and over again to confirm that the values change.

Let's take this one step further. We're only interested in the second, third and fourth fields of the line (representing total memory, used memory and free memory respectively). How do we retrieve only these fields?
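Here is one answer, sketched with the same hypothetical sample_free stand-in and made-up numbers. Once tr -s ' ' has left exactly one space between fields, cut -d ' ' can select fields 2 to 4 by position. Field 1 is the Mem: label itself.

```shell
# Hypothetical stand-in for free; on a real system use free itself.
sample_free() {
    printf '%s\n' \
        '              total        used        free      shared  buff/cache   available' \
        'Mem:        8052512     2345678     3456789      123456     2250045     5400000' \
        'Swap:       2097148           0     2097148'
}

# Squeeze spaces, keep the Mem line, then take total, used and free memory.
mem=$(sample_free | tr -s ' ' | sed '/^Mem/!d' | cut -d ' ' -f 2-4)
printf '%s\n' "$mem"    # prints: 8052512 2345678 3456789
```

On a live system, replace sample_free with free. The awk command awk '/^Mem/ {print $2, $3, $4}' selects the same fields without the tr step, because awk splits fields on runs of blanks by default.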





The Last but not Least


Copyright 1996-2014 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright in original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: February 19, 2014