Softpanorama
May the source be with you, but remember the KISS principle ;-)
The uniq command can eliminate or count duplicate lines in a presorted file. It reads lines and compares each line with the previous one. Depending on the options specified on the command line, it may display only unique lines, only one occurrence of repeated lines, or both. The common idioms are
sort | uniq
and
sort | uniq -c
For example
grep 'hysteresis' * | awk -F: '{print $1}' | sort | uniq | wc -l
or in a more complex pipe:
cut -d '"' -f 2 "$1" | cut -d '/' -f 3 | tr '[:upper:]' '[:lower:]' \
| sort | uniq -c | sort -rn > "${1}_sites"
The default output is to display lines that only appear once and one copy of lines that appear more than once.
Syntax
uniq [ -cdu ] [ +n ] [ -n ] [ input [ output ] ]
Options
| -c | Precedes each line with a count of the number of times it occurred in the input. |
| -d | Displays only repeated lines. One copy of each set of adjacent duplicate lines is written; lines that occur only once are suppressed. |
| -u | Displays only the lines that are not repeated (unique lines). |
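| -n | Ignores the first n fields of an input line when comparing for duplicate lines. A field is a string of nonblank characters; fields are separated by blanks (spaces or tabs). This is the historical spelling; POSIX and current implementations use -f n. |
| +n | Ignores the first n characters of an input line when comparing for duplicate lines. This is the historical spelling; POSIX and current implementations use -s n. |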
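As a minimal sketch of field and character skipping, the example below uses -f and -s, the POSIX spellings of -n and +n. The log lines are invented for illustration; only the message part after the timestamp is compared.

```shell
# Hypothetical log lines: a timestamp field followed by a message.
log='10:01 disk full
10:02 disk full
10:03 link down
10:04 disk full'

# -f 1 skips the first blank-separated field (the timestamp) when comparing,
# so adjacent lines with the same message collapse into the first one seen.
printf '%s\n' "$log" | uniq -f 1

# -s 6 skips the first 6 characters instead ("10:01 " is 6 characters wide).
printf '%s\n' "$log" | uniq -s 6
```

Both commands should keep the 10:01, 10:03 and 10:04 lines. The last "disk full" line survives because it is not adjacent to the first two.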
Arguments
The following arguments may be passed to the uniq command.
| input | The name of the file containing the input data. |
| output | The name of the file to hold the output data. If no output file is specified, the output is displayed on the standard output. |
| The input and output files must not have the same name. If they do, the contents of the file are destroyed. |
| If no input file is specified, uniq reads from the standard input and writes to the standard output. |
| You cannot specify an output file without specifying an input file. |
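Because the input and output must differ, one common way to deduplicate a file "in place" is to go through a temporary file. This is only a sketch: the file names come from mktemp and the sample data is made up.

```shell
# Build a small sample file (hypothetical data) in a temporary location.
f=$(mktemp "${TMPDIR:-/tmp}/uniq_demo.XXXXXX")
printf '%s\n' pear apple pear apple > "$f"

# Avoid: uniq "$f" "$f"  or  sort "$f" | uniq > "$f"
# (the output can clobber the input before it has been read).
# Instead, write to a second temporary file and replace the original on success.
tmp=$(mktemp "${TMPDIR:-/tmp}/uniq_demo.XXXXXX")
sort "$f" | uniq > "$tmp" && mv "$tmp" "$f"

cat "$f"
```

After the mv, the file should contain one apple line and one pear line.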
RELATED COMMANDS
sort
You can use uniq to remove duplicate lines from a file. The file must be sorted first; then the duplicate lines can be removed, reducing the size of the file.
It is also useful to filter out multiple blank lines from unsorted or sorted output of other commands. For example, the dircmp command displays its output using pr; thus the output usually scrolls off your screen before you can read it. But if you pipe the output of the dircmp command through the uniq command, the blank lines are reduced and the output is compact.
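The blank-line squeezing described above can be tried without dircmp. In this sketch the sample function is a made-up stand-in for any command whose output contains runs of blank lines.

```shell
# Made-up sample standing in for paginated command output with runs of blank lines.
sample() {
    printf 'left only\n\n\n\nright only\n\n\nidentical\n'
}

# Adjacent identical lines, including blank ones, collapse to a single copy.
sample | uniq
```

Each run of blank lines shrinks to one blank line. Note that any other adjacent identical lines would be collapsed as well, which may or may not be what you want.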
uniq(1) takes a stream of lines and collapses adjacent duplicate lines into a single copy. So if you had a file called foo that looked like:
davel
davel
davel
jeffy
jones
jeffy
mark
mark
mark
chuck
bonnie
chuck
You could run uniq on it like this:
% uniq foo
davel
jeffy
jones
jeffy
mark
chuck
bonnie
chuck
Notice that there are still two jeffy lines and two chuck lines. This is because the duplicates were not adjacent. To get a true unique list you have to make sure the stream is sorted:
% sort foo | uniq
bonnie
chuck
davel
jeffy
jones
mark
That gives you a truly unique list. However, it's also a useless use of uniq, since sort(1) has an option, -u, to do this very common operation:
% sort -u foo
bonnie
chuck
davel
jeffy
jones
mark
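A quick way to convince yourself that the two forms agree is to compare their output. This sketch recreates the foo file in a temporary location (the path is an assumption, not part of the original tip).

```shell
# Recreate the example file foo in a temporary location.
foo=$(mktemp "${TMPDIR:-/tmp}/foo.XXXXXX")
printf '%s\n' davel davel davel jeffy jones jeffy mark mark mark chuck bonnie chuck > "$foo"

two_step=$(sort "$foo" | uniq)   # two processes
one_step=$(sort -u "$foo")       # one process

if [ "$two_step" = "$one_step" ]; then
    echo "sort -u and sort | uniq agree"
fi
```

The equivalence holds for plain whole-line comparison. Once sort keys (-k) or uniq options such as -c, -d or -u are involved, the two forms are no longer interchangeable.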
That does exactly the same thing as "sort | uniq", but takes only one process instead of two.
uniq has other arguments that let it do more interesting mutilations on its input:
- -d tells uniq to eliminate all lines with only a single occurrence (delete unique lines), and print just one copy of repeated lines:
% sort foo | uniq -d
chuck
davel
jeffy
mark
- -u tells uniq to eliminate all duplicated lines and show only those which appear once (only the unique lines):
% sort foo | uniq -u
bonnie
jones
- -c tells uniq to count the occurrences of each line:
% sort foo | uniq -c
1 bonnie
2 chuck
3 davel
2 jeffy
1 jones
3 mark
I often pipe the output of "uniq -c" to "sort -n" (sort in numeric order) to get the list in order of frequency:
% sort foo | uniq -c | sort -n
1 bonnie
1 jones
2 chuck
2 jeffy
3 davel
3 mark
- Finally, there are arguments to make uniq ignore leading characters and fields. See the man page for details.
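The frequency idiom is often reversed and truncated to get a "top N" list. The sketch below reuses the names from foo; the names function and the choice of N = 2 are illustrative assumptions.

```shell
# Same names as in the foo example, fed from a function instead of a file.
names() {
    printf '%s\n' davel davel davel jeffy jones jeffy mark mark mark chuck bonnie chuck
}

# sort -rn puts the highest counts first; head keeps the top two entries.
# awk reorders each "count name" pair as "name count" and drops uniq's padding.
names | sort | uniq -c | sort -rn | head -n 2 | awk '{print $2, $1}'
```

Here the two entries with a count of 3 (davel and mark) come out on top; the order between tied entries depends on how sort breaks ties.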
Tuesday Tiny Techie Tip -- 03 December 1996