Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Unix join command

News Syntax Recommended Links Sorting algorithms Recommended Papers Rcut Reference Pipes
Perl re-implemenations uniq sort tr AWK Tips Humor Etc

The external join command performs the relational database join of two files. Files are joined on a common key field (column) that should exist in both files. Also both files must be sorted on the key field in the same order.

During join operation if the contents in the key field of both files match, the two lines are merged. The merged line is written to the standard output.

Related command are awk, cut, grep, paste, sort, and uniq.  

The join is a classic relational database operation. The join command reads the first line of the first file, then reads lines from the second file until the contents in the key field of the second file match or exceed the contents in the key field of the first file. If a match occurs, the line from the first file and the line from the second file are combined and displayed on the output. If no match occurs, join begins the process again except reading is done from the first file. This process continues until one or both files have been read to the end and no more lines in the other file can match the last line of the file.

An example of a join command might help clarify how the join function works. Let's assume we have the following files.

file1:   file2:  
01123 HP 9000/825 01123 Hewlett Packard
02213 Sun 3/110 02213 Sun Microsystems
03321 Sun 4/110 03321 Sun Microsystems
08412 HP 9000/835 08412 Hewlett Packard

The following command should produce the combined output as shown.

$ join -t"\t" file1 file2

01123 HP 9000/825 Hewlett Packard
02213 Sun3/110 Sun Microsystems
03321 Sun4/110 Sun Microsystems
08412 HP 9000/835 Hewlett Packard

Following is the general format of the join command.

     join [ -an ] [ -e str ] [ -jn m ] [ -o list ] [ -tc ] file1 file2

Options

The following list describes the options and their arguments that may be used to control how join functions.

-an If a field in file n does not have a match in the other file, the line is written to file n. For example, if you use
    join -a1 file1 file2
  and file1 has a line in it with a key field that is not matched in file2, then that line is written to the file named 1.
-e str Replaces empty output fields with the str string. The -o option causes empty fields in certain conditions.
-jn m The join is performed using the mth field of file n. If you do not specify n, the mth field is used from both files. The fields are numbered beginning with 1.
-o list Specifies the fields to be displayed on output after the join has been performed. The list is comprised of n.m descriptions that define which fields from each file are to be displayed. The n refers to the file name and the m refers to the field number. Multiple n.m descriptions may be used in the list. They must be space separated. An example is,
    join -j1 3 -j2 4 -o 1.1 1.3 2.1 1.5 file1 file2
  which uses field 3 of file1 and field 4 of file2 to join the files. The output is field 1 of file1, field 3 of file1, field 1 of file2, and field 5 of file1.
-tc Character c is used as the input and output field separator. For example,
    join -t: file1 file2
  informs join to use a : (colon) field separator in place of a tab.
  The default input field separators are blanks (spaces, tabs, and new-lines). If the default separators are being used, multiple occurrences count as only one field separator. Leading separators are ignored. The default output separator is a space.

The following list describes the arguments that may be passed to the join command.

- A hyphen may be used in place of file1 to cause join to read the standard input as file1.
file1 The first file to be joined to the second file. The file must be sorted in ASCII collating sequence on the fields that are used to join the file to file2.
file2 The second file to be joined to the first file. The file must be sorted in ASCII collating sequence on the fields that are used to join the file to file1.

If fields are separated by spaces and tabs you need to use the sort -b command for sorting the file. If you use the -t option you need to use the sort command without the -b option.

Unfortunately, the comm, join, sort, and uniq commands have different format conventions for referencing data.

Numeric filenames may conflict with the -o option and the field descriptions that relate to it. One solution is to mv your files to temporary names beginning with a nonnumeric character. Then after the join, move the files back to their original names.

You use the join command to join related files together based on a key field. The output is a subset of the fields and lines from both files. It is useful in combining two related tables (files) together to create one larger table.

Examples

To join the /etc/passwd and /etc/group file together to show the user name, user ID, group name, group ID, group name, and home directory (both files should be sorted in ASCII collating sequence).

1.  First you need to sort the passwd file on the group ID and user ID fields:
 sort -t':' +3.0n -4.0n +2.0n -3.0n /etc/passwd > mypasswd
2.  Sort the group file on the group ID field.  sort -t: +3.0n -4.0n /etc/group > mygroup
3.  Now you can join two files:.
   > join -t: -j1 4 -j2 3 -o 1.1 1.3 2.1 2.3 1.6 mypasswd
     mygroup
     root:0:0:root
     .
     .
     mylogin:60:40:tech
     .
     .
 
 

Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: March 15, 2008