Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Unix tr command
The Unix tr command copies standard input to standard
output, substituting or deleting selected characters. Input characters in
set1 are mapped to the corresponding characters
in set2.
If the lengths are unequal, set2 is extended to the length of set1 by repeating its last
character as necessary. Excess characters in set2 are ignored.
The utility performs the classic alphabet1-to-alphabet2 type of translation,
sometimes called 1:1 transliteration, and as such is suitable for implementing a
Caesar cipher.
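As a small illustration of the 1:1 mapping, a Caesar cipher with a shift of three can be sketched as follows. The function names caesar3_encrypt and caesar3_decrypt are made up for this example, and only plain ASCII letters are assumed.

```shell
# Shift every ASCII letter three places forward (A->D, ..., X->A).
# All other bytes pass through unchanged.
caesar3_encrypt() { tr 'A-Za-z' 'D-ZA-Cd-za-c'; }

# Undo the shift by swapping the two sets.
caesar3_decrypt() { tr 'D-ZA-Cd-za-c' 'A-Za-z'; }

printf 'Hello, World\n' | caesar3_encrypt
printf 'Hello, World\n' | caesar3_encrypt | caesar3_decrypt
```

Both sets expand to 52 characters, so no padding of set2 is involved.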
Unix inherited tr from Multics as a derivative of the PL/1
translate built-in, which in turn was a generalization of the TR (translate) instruction of the
System/360 architecture (see
IBM System-360 Green Card).
The format of the tr
command is somewhat unusual: it is one of the few Unix commands that read
input only from standard input and take no file name arguments.
tr [ options ] [ set1 [ set2 ] ]
Sets can be specified by enumerating characters, as in
tr '{}' '()' < infile > outfile,
or by using ranges, as in tr A-Z a-z <
infile > outfile. Instead of individual characters, special POSIX
character classes can be used. Among them:
alnum: alphanumeric characters
alpha: alphabetic characters
cntrl: control (non-printing) characters
digit: numeric characters
graph: graphic characters
lower: lower-case alphabetic characters
print: printable characters
punct: punctuation characters
space: whitespace characters
upper: upper-case characters
xdigit: hexadecimal characters
A typical use of classes is changing
case from upper to lower or vice versa, as in the
following example:
cat names | tr '[:upper:]' '[:lower:]' > lc_names
Classes can be combined to form a more complex set, for example
'[:lower:][:upper:]'
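A combined set works with any option, not only with translation. The sketch below deletes every digit and punctuation character in one pass; the function name strip_digits_punct is an assumption made for this example.

```shell
# Delete all digits and punctuation; letters and whitespace are kept.
strip_digits_punct() { tr -d '[:digit:][:punct:]'; }

printf 'a1, b2; c3!\n' | strip_digits_punct
```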
The tr utility accepts several options:
-c - work on the complement of the listed characters, i.e., operations apply
to characters not in the given set
-d - delete characters in the first set from the output
-s - squeeze repeated characters in the output into just one character.
Many Unix administrators are unaware of these options,
which are quite useful and greatly extend the usability of this generally very
simple command. Here is a fuller description of those options:
- -c, --complement Complement
set1 with respect to the universe of characters whose ASCII codes are
01 through 0377 octal. For example:
- To replace every nonprinting character, other than valid control
characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
- Here is a more complex and rather elegant example in which the goal is
to create a list of words in a file:
tr -cs '[:lower:][:upper:]' '[\n*]' < text > words
This translates each sequence of characters other than lowercase
letters and uppercase letters into a single newline character. The *
(asterisk) causes the tr command to repeat the new line character
enough times to make the second string as long as the first string.
- -d, --delete Delete the
characters listed
in set1; do not translate. An important use
of this tr option is input sanitizing: it can
strip dangerous characters from arguments so that a malicious user cannot smuggle commands into a
script. Characters such as backticks, all kinds of brackets ( ()[]{} ), the colon
and semicolon, as well as =#$&!@ should be removed from argument values,
if they cannot legitimately occur there, before you start processing those values.
If a script is used by a considerable population, there is always one black sheep who
will try to mangle input arguments to see what happens ;-)
For example:
- tr --delete '=;:`"<>,./?!@#$%^&(){}[]'
- tr
can be used to remove the carriage return at the end of each line, leaving only the newline
UNIX expects. tr allows you to specify characters as octal
values by preceding the value with a backslash, so the command:
tr -d '\015' < pc.file > unix.file
OR
tr -d '\r' < pc.file > unix.file
will remove the carriage return from the carriage return/newline
pair used by Microsoft OSes as a line terminator. Please note that this can also
be done by the dos2unix utility.
- To delete all NUL characters from a file:
tr -d '\0' < textfile > newfile
- -s, --squeeze-repeats Replace
sequences of the same character with one. -s uses set1 if neither translating
nor deleting specified, otherwise squeeze uses set2 and occurs after
translation or deletion. For example:
- To replace every sequence of characters in the <space> character
class with a single : (colon) character, enter:
tr -s '[:space:]' '[\:*]'
- To replace every sequence of one or more new lines with a single new
line:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
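- -t, --truncate-set1 Truncate
set1 to the length of set2. By default, when set2 is shorter than set1,
set2 is extended to the length of
set1 by repeating its last character. This option reverses that behavior: the characters of set1
that have no counterpart in set2 are left untranslated. It is available only
in the GNU implementation of tr.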
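The sanitizing advice given under -d lists characters to remove. A complementary approach, shown here only as a sketch, combines -c and -d to keep a whitelist and delete everything else. The function name sanitize_arg and the choice of allowed characters are assumptions for this example, not a recommendation for every script.

```shell
# Keep only letters, digits, dot, underscore and hyphen;
# delete every other byte from the argument.
sanitize_arg() {
    printf '%s' "$1" | tr -cd 'A-Za-z0-9._-'
}

sanitize_arg 'report.txt; rm -rf /'
echo
```

The hyphen is placed last in the set so that tr treats it as a literal character rather than as part of a range.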
Sets are specified as strings of characters. Most represent themselves.
Interpreted sequences are:
- \nnn -- character with octal value nnn
- \xnn -- character with hexadecimal value nn (not accepted by every implementation)
- \\ -- backslash
- \a -- alert
- \b -- backspace
- \f -- form feed
- \n -- new line
- \r -- return
- \t -- horizontal tab
- \v -- vertical tab
- \E -- escape (not accepted by every implementation)
- c1-c2 -- all characters from c1 to c2 in ascending order.
The character specified by c1 must collate before the character
specified by c2.
- [c1-c2] -- same as c1-c2 if both sets use this form
- [c*] -- set2 extended to the length of set1 with the symbol c.
In other words fills out the set2 with the character specified by c.
This option can be used only at the end of the set2. Any characters
specified after the * (asterisk) are ignored.
- [c*N] -- N copies of symbol c. N is considered a decimal integer
unless the first digit is a 0; then it is considered an octal integer.
- [:alnum:] -- all letters and digits
- [:alpha:] -- all letters
- [:blank:] -- all horizontal whitespace
- [:cntrl:] -- all control characters
- [:digit:] -- all digits
- [:graph:] -- all printable characters, not including space
- [:lower:] -- all lower case letters
- [:print:] -- all printable characters, including space
- [:punct:] -- all punctuation characters
- [:space:] -- all horizontal or vertical whitespace
- [:upper:] -- all upper case letters
- [:xdigit:] -- all hexadecimal digits
- [=c=] -- all of the characters in the same equivalence class as
the character specified by c.
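A few of the constructs from this list in action, as a rough sketch. The function names are invented for the example, and GNU tr is assumed for the repeat constructs.

```shell
# [x*2] stands for two copies of x, so set2 expands to "xxy".
repeat_demo() { tr 'abc' '[x*2]y'; }

# [#*] fills set2 with '#' up to the length of set1: every digit becomes '#'.
mask_digits() { tr '0-9' '[#*]'; }

# \011 is the octal code of the tab character; turn tabs into spaces.
tabs_to_spaces() { tr '\011' ' '; }

printf 'abc\n' | repeat_demo
printf 'tel 555 0199\n' | mask_digits
printf 'a\tb\n' | tabs_to_spaces
```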
Notes:
- Translation occurs if -d is not given and both set1 and
set2 appear
- -t may be used only when translating.
- set2 is extended to the length of set1 by repeating its last
character as necessary. Excess characters in set2 are ignored.
- Only [:lower:] and [:upper:]
are guaranteed to expand in ascending order. They can be used in pairs
to specify case conversion.
- -s (Squeeze all strings of repeated output characters to single
characters) uses set1 if neither translating
nor deleting specified, otherwise squeeze uses set2 and occurs after
translation or deletion.
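The padding and squeezing rules in these notes can be checked with a handful of one-liners. This is a sketch with invented function names; the padding behaviour is that of GNU and BSD tr, and -t exists only in GNU tr.

```shell
# Padding: set2 'xy' is treated as 'xyyyy' (GNU and BSD behaviour).
pad_demo() { tr 'abcde' 'xy'; }

# GNU only: -t cuts set1 down to 'ab', so c, d and e pass through unchanged.
truncate_demo() { tr -t 'abcde' 'xy'; }

# -s with one set: squeeze repeats of the characters listed in set1.
squeeze_one() { tr -s 'ab'; }

# -s with two sets: translate first, then squeeze the characters of set2.
squeeze_two() { tr -s 'ab' 'xy'; }

# -d together with -s: delete set1, then squeeze set2.
delete_then_squeeze() { tr -ds 'a' 'c'; }

printf 'abcde\n'  | pad_demo
printf 'aabbcc\n' | squeeze_one
printf 'aabbcc\n' | squeeze_two
printf 'aabbcc\n' | delete_then_squeeze
```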
01 Aug 2006 | developerWorks
Translating text
Now that you know at least five different ways of
generating some text, let's look at doing some simple
translations on it.
The tr command lets you translate
characters in one set to the corresponding characters in
a second set. Let's take a look at a few examples (Listing
4) to see how it works.
Listing 4. Using tr to translate characters
echo "a test" | tr t p
echo "a test" | tr aest 1234
echo "a test" | tr -d t
echo "a test" | tr '[:lower:]' '[:upper:]'
Looking at the output of these commands (see
Listing 5)
gives you a clue about how tr works (here's
a hint: it's a direct replacement of characters in the
first set with the corresponding characters from the
second set).
Listing 5. What has tr done?
chrish@dhcp3 [199]$ echo "a test" | tr t p
a pesp
chrish@dhcp3 [200]$ echo "a test" | tr aest 1234
1 4234
chrish@dhcp3 [201]$ echo "a test" | tr -d t
a es
chrish@dhcp3 [202]$ echo "a test" | tr '[:lower:]' '[:upper:]'
A TEST
The first and second examples are simple enough, replacing one
character for another. The third example, with the
-d option (delete), removes the specified
characters completely from the output. This is often
used to remove carriage returns from DOS text files to
turn them into UNIX text files (see
Listing 6).
Finally, the last example uses character classes (those
names inside of [: :]) to convert all lower-case letters
into upper-case letters. Portable Operating System
Interface-standard (POSIX-standard) character classes
include:
alnum: alphanumeric characters
alpha: alphabetic characters
cntrl: control (non-printing) characters
digit: numeric characters
graph: graphic characters
lower: lower-case alphabetic characters
print: printable characters
punct: punctuation characters
space: whitespace characters
upper: upper-case characters
xdigit: hexadecimal characters
Listing 6. Converting DOS text files into UNIX text files
tr -d '\r' < input_dos_file.txt > output_unix_file.txt
Although the tr command respects C locale
environment variables (try man locale for more
information about these), don't expect it to do anything
sensible with UTF-8 documents, such as being able to
replace lower-case accented characters with appropriate
upper-case characters. The tr command works
best with ASCII and the other standard C locales.
The following example is a complete awk program, which
prints the number of occurrences of each word in its input. It illustrates
the associative nature of awk arrays by using strings as
subscripts. It also demonstrates the `for x in array'
construction. Finally, it shows how awk can be used in
conjunction with other utility programs to do a useful task of some
complexity with a minimum of effort. Some explanations follow the program
listing.
awk '
# Print list of word frequencies
{
for (i = 1; i <= NF; i++)
freq[$i]++
}
END {
for (word in freq)
printf "%s\t%d\n", word, freq[word]
}'
The first thing to notice about this program is that it has two rules.
The first rule, because it has an empty pattern, is executed on every line
of the input. It uses awk's field-accessing mechanism (see
section
Examining Fields) to pick out the individual words from the line, and
the built-in variable NF (see section
Built-in Variables) to know how many fields are available.
For each input word, an element of the array freq is
incremented to reflect that the word has been seen an additional time.
The second rule, because it has the pattern END, is not
executed until the input has been exhausted. It prints out the contents of
the freq table that has been built up inside the first action.
Note that this program has several problems that would prevent it from
being useful by itself on real text files:
- Words are detected using the
awk convention that fields
are separated by whitespace and that other characters in the input
(except newlines) don't have any special meaning to awk.
This means that punctuation characters count as part of words.
- The
awk language considers upper and lower case
characters to be distinct. Therefore, `foo' and `Foo'
are not treated by this program as the same word. This is undesirable
since in normal text, words are capitalized if they begin sentences, and
a frequency analyzer should not be sensitive to that.
- The output does not come out in any useful order. You're more likely
to be interested in which words occur most frequently, or having an
alphabetized table of how frequently each word occurs.
The way to solve these problems is to use other system utilities to
process the input and output of the awk script. Suppose the
script shown above is saved in the file `frequency.awk'. Then the
shell command:
tr A-Z a-z < file1 | tr -cd 'a-z \012' \
| awk -f frequency.awk \
| sort -k 2nr
produces a table of the words appearing in `file1' in order of
decreasing frequency.
The first tr command in this pipeline translates all the
upper case characters in `file1' to lower case. The second tr
command deletes all the characters in the input except lower case characters,
spaces, and newlines. The space has to be kept: without it, all the words on a line
would run together into one. The second argument to the second tr is quoted to
protect the backslash and the space in it from being interpreted by the shell. The
awk program reads this suitably massaged data and produces a word
frequency table, which is not ordered.
The awk script's output is now sorted by the sort
command and printed on the terminal. The options given to sort
in this example specify to sort by the second field of each input line
(-k 2), that the sort keys should be treated as numeric
quantities (otherwise `15' would come before `5'),
and that the sorting should be done in descending (reverse) order.
Older systems wrote the key as +1 ("skip one field"); that obsolete form
is no longer accepted by many current sort implementations.
See the general operating system documentation for more information on
how to use the tr and sort commands.
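For comparison, here is a sketch of the same job that treats every run of non-letters as a word separator instead of deleting punctuation, and counts with sort and uniq rather than with the awk script. The function name word_freq is invented for this example, and plain ASCII text is assumed.

```shell
word_freq() {
    tr 'A-Z' 'a-z' |            # fold upper case to lower case
    tr -cs 'a-z' '\n' |         # each run of non-letters becomes one newline
    sed '/^$/d' |               # drop the empty line a leading separator leaves
    sort | uniq -c |            # count identical words
    sort -k1,1nr -k2,2 |        # most frequent first, ties in alphabetical order
    awk '{ print $2 "\t" $1 }'  # print word, tab, count
}

printf 'The cat. The dog!\n' | word_freq
```

Because punctuation is turned into a separator, a word such as "don't" is split into "don" and "t"; whether that is acceptable depends on the text being analyzed.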
Shell scripting example
In the following example you
will be asked for confirmation before the file is deleted. If the user
responds in lower case, the tr command changes nothing, but if
the user responds in upper case, the characters are changed
to lower case. This ensures that even if the user responds with
YES, YeS, YEs, etc., the script removes the file:
#!/bin/bash
echo -n "Enter file name : "
read -r myfile
echo -n "Are you sure ( yes or no ) ? "
read -r confirmation
# Fold the answer to lower case so that YES, YeS, YEs, etc. all match
confirmation="$(echo "${confirmation}" | tr 'A-Z' 'a-z')"
if [ "$confirmation" = "yes" ]; then
[ -f "$myfile" ] && /bin/rm "$myfile" || echo "Error - file $myfile not found"
else
: # do nothing
fi
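The case-folding step can be isolated in a small helper so that it is easy to reuse and to test without prompting. The function name is_yes is an assumption made for this sketch.

```shell
# Succeed (exit status 0) when the argument is "yes" in any mix of cases.
is_yes() {
    answer=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    [ "$answer" = "yes" ]
}

if is_yes "YeS"; then
    echo "confirmed"
fi
```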
Remove all non-printable characters from myfile.txt (note that newlines and tabs are
control characters, not printable ones, so they are deleted too):
$ tr -cd "[:print:]" < myfile.txt
Replace every run of two or more successive spaces with a single space in a copy of
the text in a file called input.txt and save the output to a new
file called output.txt:
tr -s ' ' ' ' < input.txt > output.txt
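The -d option deletes every occurrence of each character
listed in set1. Note that set1 is a set of individual characters, not a string:
tr cannot delete a word. For example,
the following does not remove the word nameserver
from a copy of the text in /etc/resolv.conf. It removes every n, a, m, e, s, r and v,
wherever those letters occur, and writes what is left to a file called ns.ipaddress.txt:
tr -d 'nameserver' < /etc/resolv.conf > ns.ipaddress.txt
To delete the word itself, use a tool that works on strings, such as sed 's/nameserver//g'.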
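The difference between deleting a set of characters and deleting a word is easy to see on a sample line. This is a sketch with made-up function names and made-up input; tr -d treats its argument as a set of characters, while sed removes the literal string.

```shell
# Deletes every n, a, m, e, s, r and v, wherever they occur.
delete_chars() { tr -d 'nameserver'; }

# Deletes only the literal word "nameserver".
delete_word() { sed 's/nameserver//g'; }

line='nameserver 10.0.0.1 search example.com'
printf '%s\n' "$line" | delete_chars
printf '%s\n' "$line" | delete_word
```

With tr, "search" is reduced to "ch" and "example.com" to "xpl.co", which is rarely what was intended.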
From AIX man pages
Examples
- To translate braces into parentheses, enter:
tr '{}' '()' < textfile > newfile
This translates each { (left brace) to ( (left
parenthesis) and each } (right brace) to ) (right
parenthesis). All other characters remain unchanged.
- To translate braces into brackets, enter:
tr '{}' '\[]' < textfile > newfile
This translates each { (left brace) to [ (left
bracket) and each } (right brace) to ] (right
bracket). The left bracket must be entered with a \ (backslash) escape
character.
- To translate lowercase characters to uppercase, enter:
tr 'a-z' 'A-Z' < textfile > newfile
- To create a list of words in a file, enter:
tr -cs '[:lower:][:upper:]' '[\n*]' < textfile > newfile
This translates each sequence of characters other than lowercase
letters and uppercase letters into a single newline character. The *
(asterisk) causes the tr command to repeat the new line character
enough times to make the second string as long as the first string.
- To delete all NULL characters from a file, enter:
tr -d '\0' < textfile > newfile
- To replace every sequence of one or more new lines with a single new
line, enter:
tr -s '\n' < textfile > newfile
OR
tr -s '\012' < textfile > newfile
- To replace every nonprinting character, other than valid control
characters, with a ? (question mark), enter:
tr -c '[:print:][:cntrl:]' '[?*]' < textfile > newfile
This scans a file created in a different locale to find characters
that are not printable characters in the current locale.
- To replace every sequence of characters in the <space> character
class with a single # (pound sign) character, enter:
tr -s '[:space:]' '[#*]'
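The nonprinting-character example can be tried on a single line. The sketch below forces the C locale so that the result does not depend on the caller's environment; the function name mark_nonprintable and the sample byte (octal 351) are assumptions for illustration.

```shell
# Replace every byte that is neither printable nor a control character
# in the C locale with a question mark.
mark_nonprintable() { LC_ALL=C tr -c '[:print:][:cntrl:]' '[?*]'; }

printf 'caf\351 au lait\n' | mark_nonprintable
```

The newline survives because it belongs to the cntrl class.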
In the shell program we use to remove all non-printable ASCII characters from
a text file, we tell the tr command to delete every character in the
translation process except for the specific characters we specify. In
essence, we filter out the undesirable characters. The tr command
we use in our program is shown below:
tr -cd '\11\12\40-\176' < "$INPUT_FILE" > "$OUTPUT_FILE"
In this command, the variable INPUT_FILE must contain the name of
the Solaris file you'll be reading from, and OUTPUT_FILE must contain
the name of the output file you'll be writing to. When the -c
and -d options of the tr command are used in combination
like this, the only characters tr writes to the standard output stream
are the characters we've specified on the command line.
Although it may not look very attractive, we're using octal characters in
our tr command to make our programming job easier and more efficient.
Our command tells tr to retain only the octal characters 11,
12, and 40 through 176 when writing to standard output.
Octal character 11 corresponds to the [TAB] character, and
octal 12 corresponds to the [LINEFEED] character. The
octal characters 40 through 176 correspond to the standard
visible keyboard characters, beginning with the [Space] character (octal
40) through the ~ character (octal 176). These are the only characters
retained by tr -- the rest are filtered out, leaving us with a clean
ASCII file.
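The filter is easy to wrap in a function and try on a short string. This is only a sketch; the name strip_nonprintable is invented here.

```shell
# Keep tab (11), linefeed (12) and the visible ASCII range 40-176 (octal);
# delete every other byte.
strip_nonprintable() { tr -cd '\11\12\40-\176'; }

# The control byte \001 and the carriage return are removed;
# the tab and the newline are kept.
printf 'a\001b\tc\r\n' | strip_nonprintable
```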
Example1: Change lowercase to uppercase in a file:
D:\temp>more score.txt
john 81 91
mark 82 93
tina 88 92
D:\temp>tr '[a-z]' '[A-Z]' < score.txt > score1.txt
D:\temp>more score1.txt
JOHN 81 91
MARK 82 93
TINA 88 92
LP: Would you talk a little more about the tr utility?
Ah, tr. Well, first thing that comes to mind is that it is the answer to
the trivia question, "Name a Linux utility that accepts input only from standard
input and never from a file named as an argument on the command line." It is
an odd beast that is useful only sometimes--but when it is useful it is very
useful. Here is an excerpt that talks about tr:
"The tr utility reads standard input and, for each input character, maps
it to an alternate character, deletes the character, or leaves the character
alone. This utility reads from standard input and writes to standard output.
"The tr utility is typically used with two arguments, string1 and string2.
The position of each character in the two strings is important: Each time
tr finds a character from string1 in its input, it replaces that character
with the corresponding character from string2.
"With one argument, string1, and the --delete option, tr deletes the
characters specified in string1. The option --squeeze-repeats replaces multiple
sequential occurrences of characters in string1 with single occurrences
(for example, abbc becomes abc).
"You can use a hyphen to represent a range of characters in string1 or
string2. The two command lines in the following example produce the same
result:
$ echo abcdef | tr 'abcdef' 'xyzabc'
xyzabc
$ echo abcdef | tr 'a-f' 'x-za-c'
xyzabc
"The next example demonstrates a popular method for disguising text,
often called ROT13 (rotate 13) because it replaces the first letter of the
alphabet with the thirteenth, the second with the fourteenth, and so forth.
$ echo The punchline of the joke is ... |
> tr 'A-M N-Z a-m n-z' 'N-Z A-M n-z a-m'
Gur chapuyvar bs gur wbxr vf ...
"To make the text intelligible again, reverse the order of the arguments
to tr:
$ echo Gur chapuyvar bs gur wbxr vf ... |
> tr 'N-Z A-M n-z a-m' 'A-M N-Z a-m n-z'
The punchline of the joke is ...
"The --delete option causes tr to
delete selected characters:
$ echo If you can read this, you can spot the missing vowels! |
> tr --delete 'aeiou'
If y cn rd ths, y cn spt th mssng vwls!
"In the following example, tr replaces characters and reduces pairs of
identical characters to single characters:
$ echo tennessee | tr --squeeze-repeats 'tnse' 'srne'
serene
"The next example replaces each sequence of nonalphabetic characters
(the complement of all the alphabetic characters as specified by the character
class alpha) in the file draft1 with a single NEWLINE character. The output
is a list of words, one per line.
$ tr --complement --squeeze-repeats '[:alpha:]' '\n' < draft1
"The final example uses character classes to upshift the string hi there:
$ echo hi there | tr '[:lower:]' '[:upper:]'
HI THERE
Luckily, we can also use ranges of characters
to specify the characters more efficiently:
tr a-z A-Z
Ever had those horrible upper case
DOS file names? Here's a Bourne script to take care of them:
for f in *; do mv "$f" "`echo "$f" | tr A-Z a-z`"; done
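A slightly more defensive variant of the renaming loop is sketched below. It skips names that are already lower case and refuses to overwrite an existing file. The function name lowercase_names and its directory argument are assumptions made for this example.

```shell
# Rename every entry in the given directory to its lower-case form.
lowercase_names() {
    dir=$1
    for path in "$dir"/*; do
        [ -e "$path" ] || continue              # empty directory: glob did not match
        name=$(basename "$path")
        lower=$(printf '%s' "$name" | tr 'A-Z' 'a-z')
        [ "$name" = "$lower" ] && continue      # nothing to do
        if [ -e "$dir/$lower" ]; then
            echo "skipping $name: $lower already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$lower"
    done
}

# Usage: lowercase_names /path/to/directory
```

Quoting every expansion keeps file names that contain spaces intact, which the one-line version does not guarantee unless its variables are quoted.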
Many
UNIX editors allow some text to be processed by the shell. For example,
to replace all upper case characters of the next paragraph with lower case while
in vi, type:
!}tr A-Z a-z
As another example, the command:
!jtr a-z A-Z
capitalizes the current and next line (the character
after the ! is a movement character).
If you read the International Obfuscated
C Code Contest (ftp://ftp.uu.net./pub/ioccc/), you frequently see that part
of the hints are coded by a method called rot13.
rot13 is a Caesar cypher, i.e., a cypher in
which all letters are shifted some number of places. For example, a becomes
b, b becomes c, ..., y becomes z, and z becomes a. In rot13 each letter is shifted
13 places. It is a weak cypher, and to decipher it, you can use rot13 again.
You can also use tr to read the text in this way:
tr a-zA-Z n-za-mN-ZA-M
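Because shifting by 13 twice brings every letter back to where it started, the same command both encodes and decodes. A minimal sketch, with the function name rot13 chosen for this example:

```shell
# Rotate ASCII letters by 13 places; applying it twice restores the input.
rot13() { tr 'A-Za-z' 'N-ZA-Mn-za-m'; }

printf 'The punchline of the joke is ...\n' | rot13
printf 'The punchline of the joke is ...\n' | rot13 | rot13
```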
Another interesting way
to use tr is to change files from
Macintosh format to UNIX format. For returns, the Macintosh uses \r
while UNIX uses \n. GNU tr allows you to use the C special characters,
so type (the quotes keep the shell from stripping the backslashes):
tr '\r' '\n'
If you don't have GNU's version of tr, you can always
use the corresponding octal numbers as shown here:
tr '\015' '\012'
You might wonder what would happen if the second
string is shorter than the first string. POSIX leaves the result unspecified. System
V says that only that portion of the first string is used that has a matching
character in the second string. BSD and GNU pad the second string with its final
character in order to match the length of the first string. The reason this
last method is handy becomes clearer when we take complements into account.
Assume you wish to make a list of all identifiers and keywords in your C listing. When
you use -c, tr complements the first string. In C, all identifiers and
keywords consist of a-zA-Z0-9_, so those are the characters we want to
keep. Thus, we can do the following:
tr -c a-zA-Z0-9_ '\n'
If we pipe the tr output through
sort -u, we get our desired list. If we follow POSIX, the second string
would have to describe 193 newline characters (written as [\n*193] or
[\n*]). If we use System V, only the zero byte is translated to a newline,
since the complement of a-zA-Z0-9_ starts with the zero byte.
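Put together as a runnable sketch, with -s added so that runs of separators do not produce a flood of empty lines. The function name c_identifiers is invented for this example, and the C locale is forced for sort so that the order is predictable.

```shell
# List the distinct identifiers, keywords and numbers in C source on stdin.
c_identifiers() {
    tr -cs 'a-zA-Z0-9_' '[\n*]' |   # every run of other characters becomes one newline
    sed '/^$/d' |                   # drop the empty line left by a leading separator
    LC_ALL=C sort -u                # unique entries in byte order
}

printf '#include <stdio.h>\nint x = foo(x, y_1);\n' | c_identifiers
```

The [\n*] form spells out the padding explicitly, so the command does not rely on the BSD and GNU habit of repeating the last character of the second string.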
The second important use of tr is to remove characters.
For this option, you use the flag -d with one string as an argument.
To fix up those nasty MS-DOS text files with a ^M at the end of the line
and a trailing ^Z, specify tr in this way:
tr -d '\015\032'
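The quotes matter here: without them the shell removes the backslashes before tr ever sees them. A small sketch of the cleanup as a function, with the name dos_to_unix chosen for this example:

```shell
# Delete carriage returns (octal 015) and the DOS end-of-file mark ^Z (octal 032).
dos_to_unix() { tr -d '\015\032'; }

printf 'one\r\ntwo\r\n\032' | dos_to_unix
```

Note that this removes every carriage return in the input, not only those that directly precede a newline.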
Many people have written a program in C to do this
same operation. Well, a C program isn't necessary--you only need to know the
right program, tr, with the right flags. The -d flag isn't used often,
but is nice to have when needed. You can combine it with the -c flag
to delete everything except characters from the string you supplied as an argument.
Repeated characters can be squeezed into a single
one using the -s option with one string as an argument. It can also be
used to squeeze white space. To remove empty lines, type:
tr -s '\n'
The -s option can be used with two strings
as arguments. In that case, tr first translates the text as if -s were
not given and then squeezes the characters in the second string. For
instance, we can turn every newline into a space and squeeze each resulting run of spaces to a single space by specifying:
tr -s '\n' '[ *]'
The -d flag can also be used with two strings, provided -s is given as well:
the characters in the first string will be removed and the characters in the
second string will be squeezed.
tr may
not be a great program; however, it gets the job done. It is particularly useful
in scripts using pipes and command substitutions (i.e., inside the back quotes).
If you use tr often, you'll learn to appreciate its capabilities. Small is beautiful.
tr is a simple pattern translator. Its practical application overlaps a bit with
other, more complex tools, such as sed and awk [with larger binary footprints].
tr is quite useful for simple textual replacements, deletions and additions.
Its behavior is dictated by "from" and "to" character sets provided as the first
and second argument. The general usage syntax of tr is as follows:
# (12) tr usage
tr [options] "set1" ["set2"] < input > output
Note that tr does not accept file arguments; it reads from standard input
and writes to standard output. When two character sets are provided, tr operates
on the characters contained in "set1" and performs some amount of substitution
based on "set2". Listing 1 demonstrates some of the more common tasks performed
with tr.
# (13) Transform lower case alphas to their
# equivalent upper case.
$ echo "Hello World." | tr "[a-z]" "[A-Z]"
HELLO WORLD.
# (14) Same lower to upper transformation -
# uses character class names :lower:
# and :upper:. (tr recognizes 12
# character class names).
$ tr "[:lower:]" "[:upper:]" < README > UPPER_README
# (15) Make $PATH a bit more readable/searchable -
# substitute ':' with a line feed
$ echo $PATH | tr ":" "\n"
/usr/bin
/bin
/usr/local/bin
.....
$ echo $PATH | tr ":" "\n" | grep -i "local"
/usr/local/bin
/usr/home/curly/Local_bin
# (16) Remove all white space from a file.
$ tr -d "[:space:]" < README > NO_WHITE_SPACE
# (17) Substitute all single or sequence of ;
# with a single :
$ echo ";;;;This;;is;a;;;;simple;;;example." \
| tr -s ";" ":"
:This:is:a:simple:example.
echo "12345678 9247" | tr 123456789 computerh
This example takes the echoed string '12345678 9247' and pipes
it through tr, replacing each digit with the corresponding letter. In this
example it returns "computer hope".
tr -cd '\11\12\40-\176' < myfile1 > myfile2
This example takes the file myfile1, strips all non-printable
characters, and writes the result to myfile2.
Any combination of the options -c, -d,
or -s may be used:
- -c Complement the set of characters in string1 with respect
to the universe of characters whose ASCII codes are 01 through 0377 octal.
- -d Delete all input characters in string1.
- -s Squeeze all strings of repeated output characters that are in string2
to single characters.
Example 1 Creating a list of all the words in a file
The following example creates a list of all the words in filename1,
one per line, in filename2, where a word is taken to be a maximal
string of alphabetics. The second string is quoted to protect `\' from
the shell. 012 is the ASCII code for NEWLINE.
example% tr -cs A-Za-z '\012' <filename1 >filename2
Flags
- -A -- Performs all operations on a byte-by-byte basis using the ASCII
collation order for ranges and character classes, instead of the collation
order for the current locale.
- -c -- Specifies that the value of String1 be replaced by the
complement of the string specified by String1. The complement
of String1 is all of the characters in the character set of the
current locale, except the characters specified by String1.
If the -A and -c flags are both specified, characters
are complemented with respect to the set of all 8-bit character codes.
If the -c and -s flags are both specified, the -s
flag applies to characters in the complement of String1.
- -d -- Deletes each character from standard input that is contained in
the string specified by String1.
- -s -- Removes all but the first in a sequence of repeated characters.
Character sequences specified by String1 are removed from standard
input before translation, and character sequences specified by String2
are removed from standard output.
- String1 -- Specifies a string of characters.
- String2 -- Specifies a string of characters.
Cat-ting our file (columns.txt) and then piping the output
of the cat command to the input of the translate command causing all lowercase
names to be translated to uppercase names.
cat columns.txt | tr '[a-z]' '[A-Z]'
Remember we have not modified the file columns.txt so
how do we save the output? Simple, by redirecting the output of the translate
command with '>' to a file called UpCaseColumns.txt with:
cat columns.txt | tr '[a-z]' '[A-Z]' > UpCaseColumns.txt
Since the tr command does not
take a filename the way sed does, we could have changed the above example to:
tr '[a-z]' '[A-Z]' < columns.txt > UpCaseColumns.txt
As you can see, tr still reads from stdin, but the shell now connects stdin
to columns.txt with '<' instead of to a pipe from cat. Either way we
achieve what we set out to do: tr can sit in the middle of a pipeline or
read a file redirected onto its stdin.
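The equivalence of the two forms can be checked directly. This is a small sketch, assuming mktemp and cmp are available; the temporary directory and the file contents are made up for the demonstration. It uses 'a-z' without the surrounding brackets, since POSIX tr treats brackets around a range as ordinary characters.

```shell
# Create a small stand-in for columns.txt in a temporary directory.
tmpdir=$(mktemp -d)
printf 'alice 10\nbob 20\n' > "$tmpdir/columns.txt"

# Via a pipe: cat writes the file into tr's stdin.
cat "$tmpdir/columns.txt" | tr 'a-z' 'A-Z' > "$tmpdir/piped.txt"

# Via redirection: the shell opens the file as tr's stdin; no cat needed.
tr 'a-z' 'A-Z' < "$tmpdir/columns.txt" > "$tmpdir/redirected.txt"

# cmp is silent and returns success when the two files are identical.
cmp "$tmpdir/piped.txt" "$tmpdir/redirected.txt" && echo "same output"
```

The redirected form saves one process, which is why it is usually preferred when the input is a single file.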
We can also use translate in another way: to smooth over the difference
between spaces and tabs. Spaces and tabs can be a pain when using scripts
to compile system reports. What we need is a way of translating these characters.
Now, there are many ways to skin a cat in Linux and shell scripting. I'm
going to show you one way, although I'm sure you could now write a sed expression
to do the same thing.
Assume that I have a file with a number of columns in
it, but I am not sure how many spaces or tabs separate the different
columns. I need some way of changing these runs into a single space.
Why? Because one or more spaces or one or more tabs between
the columns will produce significantly different output when we extract
information from the file with a shell script. How do we convert many
spaces or tabs into a single space? Well, translate is our right-hand man
(or woman) for this particular task. In order not to waste our time modifying
our columns.txt, let's work on the free command, which shows you free memory
on your system. Type:
free
If you look at the output you will see that there are lots
of spaces between the fields. How do we reduce multiple spaces
between fields to a single space? We can use tr to squeeze characters
(you can squeeze any characters, but in this case we want to squeeze spaces):
free | tr -s ' '
The -s switch tells the translate command to squeeze.
(Read the info page on tr to find out all the other switches of tr).
We could squeeze zeroes with:
free | tr -s '0'
Which would obviously make zero sense!
Going back to our previous command of squeezing spaces,
you'll see immediately that our memory usage table (which is what the free
command produces) becomes much more usable because we've removed superfluous
spaces.
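Squeezing spaces alone leaves tabs untouched, so input that mixes the two needs one more step. The following is a minimal sketch, not taken from the tutorial: it first turns every tab into a space and then squeezes runs of spaces. The function name normalize_blanks and the sample line are hypothetical.

```shell
# Hypothetical helper: turn tabs into spaces, then squeeze runs of spaces.
normalize_blanks() { tr '\t' ' ' | tr -s ' '; }

# Made-up line mixing tabs and multiple spaces between fields.
printf 'Mem:\t  7982    1203 \t 4560\n' | normalize_blanks
```

For this sample the result should be "Mem: 7982 1203 4560", with exactly one space between fields. Splitting the work into two tr calls keeps each set the same length, which avoids relying on how a particular tr implementation pads a shorter second set.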
Perhaps we want only some fields from the output. We could
redirect the squeezed output into a file with:
free | tr -s ' ' > file.txt
Traditional systems would have you use a text editor to
cut and paste the fields you are interested in into a new file. Do we want
to do that? Absolutely not! We're lazy; we want to find a better way of
doing this.
What I'm interested in is the line that contains 'Mem'.
As part of your project, you should be building a set of scripts to monitor
your system. Memory sounds like a good thing that you may want to record. Instead
of just redirecting the output of the tr command to a file, let's
first pass it through sed, where we extract only the lines beginning with
the word "Mem":
free | tr -s ' ' | sed '/^Mem/!d'
This returns only the line that we're interested in. We
could run this over and over again to watch the values change.
Let's take this one step further. We're only interested
in the second, third and fourth fields of the line (representing total memory,
used memory and free memory respectively). How do we retrieve only these
fields?
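One way to pull out those fields is cut, which splits each line on a delimiter; this is a sketch of one possible answer, and the tutorial itself may go on to use a different tool. The sample below imitates the layout of free with made-up numbers so that it runs anywhere, and the function names sample_free and mem_fields are hypothetical.

```shell
# Made-up sample mimicking the layout of `free`; real numbers and columns vary.
sample_free() {
    printf '              total        used        free\n'
    printf 'Mem:        8000000     3000000     5000000\n'
    printf 'Swap:       2000000           0     2000000\n'
}

# Hypothetical helper: squeeze spaces, keep the Mem line, take fields 2-4.
mem_fields() { tr -s ' ' | sed '/^Mem/!d' | cut -d ' ' -f 2-4; }

sample_free | mem_fields
```

With this sample the pipeline should print "8000000 3000000 5000000". On a real system the same helper could be fed from free instead of sample_free. The squeeze step matters here: cut treats every single space as a field separator, so without tr -s the runs of spaces would produce empty fields.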
Copyright © 1996-2007 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Original materials copyright belong to respective owners. Quotes are made
for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on
this web page are those of the author and are not endorsed by, nor do they necessarily
reflect, the opinions of the author's present and former employers, SDNP or any other
organization the author may be associated with. We do not warrant the correctness
of the information provided or its fitness for any purpose.
Last modified:
March 15, 2008