Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Unix split command

The external split command divides a file into smaller files, by default on the basis of a specified number of lines. Each of these smaller files is the same size, except the last one created, which holds the remainder of the original. Your original file is not changed by split. You may find the split command helpful for dividing large data files into smaller, more manageable ones. The original file can be recreated from the chunks with the cat command.
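As a minimal sketch of that round trip, the commands below split a generated file and put it back together. The file name sample.txt, the prefix chunk. and the scratch directory are arbitrary choices made for this illustration.

```shell
# Work in a scratch directory so the generated chunks are easy to find.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a 2500-line sample file.
seq 1 2500 > sample.txt

# Split into chunks of 1000 lines: chunk.aa, chunk.ab, chunk.ac
split -l 1000 sample.txt chunk.

# Line counts per chunk; the last chunk holds the remainder.
wc -l chunk.*

# Reassemble the chunks in name order and compare with the original.
cat chunk.* > rebuilt.txt
cmp sample.txt rebuilt.txt && echo "files are identical"
```

The shell expands chunk.* in sorted order, which is why a plain cat is enough to restore the original sequence.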

Because split names the chunks with suffixes in ascending order, you can process the generated chunks in their original sequence using shell looping structures.
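A loop over the chunks might look like the sketch below. The names numbers.txt, part. and report.txt are made up for the example; the per-chunk work (printing the first and last line) is only a stand-in for whatever processing you need.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 45 > numbers.txt
split -l 20 numbers.txt part.

# The glob expands in sorted order, so the chunks are visited in the
# order in which split wrote them: part.aa, part.ab, part.ac.
for chunk in part.*; do
    first=$(head -n 1 "$chunk")
    last=$(tail -n 1 "$chunk")
    echo "$chunk: lines $first..$last"
done > report.txt

cat report.txt
```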

There are some commands that cannot handle extremely large files; therefore, you may have to split the input for these commands into more manageable blocks. You may also wish to investigate the csplit command, which splits files based on context.
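For comparison, here is a rough sketch of a context-based split with csplit. The INI-style input and the prefix sec. are invented for the illustration, and the '{*}' repeat count is a GNU extension, so this assumes GNU csplit.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small INI-style file; the layout is made up for this sketch.
printf '%s\n' '# settings' '[alpha]' 'a=1' '[beta]' 'b=2' 'b=3' > conf.ini

# Start a new piece at every line that begins with "[".
# -s suppresses the byte counts, -f sets the output prefix, and
# '{*}' (GNU extension) repeats the pattern as often as possible.
csplit -s -f sec. conf.ini '/^\[/' '{*}'

ls sec.*
```

Where split cuts after a fixed number of lines or bytes, csplit cuts wherever the pattern matches, so the pieces can differ in size.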

The split command reads input from a file or the standard input and creates multiple output files. It can be used as the last stage of a pipeline. The general format is:

split [OPTION] [INPUT [PREFIX]]

Each output file can contain either a fixed number of bytes (-b) or a fixed number of lines (-l). For byte counts, SIZE may carry a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
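A short sketch of a byte-based split using the k suffix follows. The file blob.bin and the prefix piece. are arbitrary names for this example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a file of exactly 2500 bytes (head -c is widely available,
# though not required by POSIX).
head -c 2500 /dev/zero > blob.bin

# 1k means 1024 bytes: two full pieces plus a 452-byte remainder.
split -b 1k blob.bin piece.

wc -c piece.*
```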

If you provide the prefix argument, the destination files are named prefixXX, where XX is aa for the first file, ab for the second, and so on up to zz. With the default two-letter suffix that gives a total of 676 (26 x 26) files if you divide your input into small enough pieces. The prefix must be at least two characters shorter than the maximum filename length so that the suffix still fits. With the limit of 100 characters assumed here, the prefix can be at most 98 characters long; the actual limit depends on the system and file system. If you do not provide a prefix argument, split uses x as the prefix and the destination files are named xXX.
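The naming scheme and the limit on the number of names can be tried out with a sketch like the one below. To keep it small, it sets the suffix length to one letter with -a 1, so only 26 names exist instead of 676. The names in.txt, log. and t. are arbitrary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 3 > in.txt

# No prefix: the chunks are named xaa, xab, xac.
split -l 1 in.txt
ls x*

# With a prefix: log.aa, log.ab, log.ac.
split -l 1 in.txt log.
ls log.*

# With one-letter suffixes only 26 names (a..z) exist, so a 27th
# chunk cannot be created and split exits with an error.
seq 1 27 > many.txt
if split -l 1 -a 1 many.txt t. 2>/dev/null; then
    status="ok"
else
    status="suffixes exhausted"
fi
echo "$status"
```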

The general format of the split command follows.

     split [options] [file [prefix]]

Options

Mandatory arguments to long options are mandatory for short options too.
-a, --suffix-length=N
     use suffixes of length N (default 2)
-b, --bytes=SIZE
     put SIZE bytes per output file
-C, --line-bytes=SIZE
     put at most SIZE bytes of lines per output file
-d, --numeric-suffixes
     use numeric suffixes instead of alphabetic
-l, --lines=NUMBER
     put NUMBER lines per output file
--verbose
     print a diagnostic to standard error just before each output file is opened
--help
     display this help and exit
--version
     output version information and exit

SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
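The sketch below combines a few of these options. It assumes a split that supports -d and -C as listed above (GNU coreutils does); data.txt, part and bylines. are names chosen only for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50 > data.txt

# -d with -a 3: numeric suffixes of length three (part000, part001, ...).
split -d -a 3 -l 20 data.txt part
ls part*

# -C 100: fill each file with whole lines, up to at most 100 bytes.
split -C 100 data.txt bylines.
wc -c bylines.*
```

Unlike -b, which may cut in the middle of a line, -C keeps lines intact, so the pieces are usually a little smaller than the requested size.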

 

Arguments

The following list describes the arguments that may be passed to the split command.

-       Causes split to read from the standard input.
file    The name of the file that split reads and divides into files of n lines each (1000 lines by default).
        If no file is given on the command line, split reads from the standard input.
prefix  The base part of the name used for all output files. An extension is added to prefix for each file created. The extension is made up of two alphabetic characters. The first file extension is "aa", then "ab", and so on until the original input is completely divided.
        If prefix is not specified, the output is written to files with a base part of "x" and the normal extensions. Thus the default output filenames are xaa, xab, and so on.

split places its output in files with an extension of two characters. The first file gets "aa", the next "ab", and so on until the entire input has been split and stored in multiple files.
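When the input comes from a pipe and you also want to name the output, the - argument acts as a placeholder for the file name. A small sketch, with lines. as an arbitrary prefix:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# "-" stands in for the input file, so "lines." is taken as the prefix.
printf 'one\ntwo\nthree\nfour\nfive\n' | split -l 2 - lines.

# Lists lines.aa, lines.ab and lines.ac.
ls lines.*
```

Without the -, a single operand would be read as the name of the input file rather than as the prefix.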

Examples

In this example you use the split command to divide the standard input into separate output files of 20 lines each, named binaa, binab, and so on. Begin at the shell prompt.

ls /bin | split -l 20 - bin
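To check the result, you could count the lines in each chunk, as in this sketch. It runs the same pipeline in a scratch directory; the number of chunks depends on how many entries /bin has on your system, and listing.txt is a name made up for the comparison.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same idea as the example above, with the line count given via -l.
ls /bin | split -l 20 - bin

# Every chunk except possibly the last holds exactly 20 names.
for f in bin*; do
    printf '%s %s\n' "$f" "$(wc -l < "$f" | tr -d ' ')"
done

# Joined back together, the chunks reproduce the original listing.
ls /bin > listing.txt
cat bin* | cmp - listing.txt && echo "chunks match the listing"
```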


Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Last modified: November 13, 2009