Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Unix split command
The external split command splits a file into smaller files, by default based on a
specified number of lines. Each of these smaller files is equal in size, with the
exception of the last one created, which holds the remainder of the original. Your original
file is not changed by split. You may find the split command helpful for dividing large data files into
smaller, more manageable files. The original file can be recreated from the chunks using
the cat command.
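The split-and-rejoin round trip described above can be sketched as follows. This is a minimal illustration, not a recipe from the original text: the file names sample.txt and rebuilt.txt and the 2,500-line input are assumptions made for the example.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: 2,500 numbered lines.
seq 1 2500 > sample.txt

# With no options, split writes 1000 lines per chunk: xaa, xab, xac.
split sample.txt

# The shell expands x?? in sorted order, so cat restores the original order.
cat x?? > rebuilt.txt

# cmp exits with status 0 when the two files are byte-for-byte identical.
cmp sample.txt rebuilt.txt && echo "round trip OK"
```

The last chunk, xac, holds the 500 lines left over after two full 1000-line chunks.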
Since the extensions added by split produce filenames in ascending
order, you can process the generated chunks in sequence using shell looping structures.
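A loop over the generated chunks might look like the sketch below. The prefix part_, the input file input.txt, and the use of wc -l as the per-chunk processing step are all placeholders chosen for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: 45 lines, split into 20-line chunks named part_aa, part_ab, ...
seq 1 45 > input.txt
split -l 20 input.txt part_

# The glob expands in sorted order, which matches the order in which
# split created the chunks.
for chunk in part_??; do
    # wc -l stands in for whatever per-chunk processing is needed.
    lines=$(wc -l < "$chunk")
    echo "$chunk: $((lines)) lines"
done
```

Replacing the body of the loop with a real command (a loader, a compressor, a mailer) processes the chunks one at a time in their original order.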
There are some commands that cannot handle extremely large files; therefore,
you may have to split the input for these commands into more manageable blocks. You may also wish to investigate the csplit command, which splits
files based on context.
The split command reads input from a file or the standard input and creates
multiple output files. It can be used as the last stage of a pipeline.
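As a rough sketch of split at the end of a pipeline, the example below feeds it from seq, which stands in for any command that produces a long stream of lines. The prefix batch_ is a hypothetical name.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The "-" argument tells split to read the standard input;
# "batch_" is a hypothetical prefix for the output files.
seq 1 50 | split -l 20 - batch_

# Expect batch_aa and batch_ab with 20 lines each, and batch_ac with the last 10.
wc -l batch_??
```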
General format:
split [OPTION] [INPUT [PREFIX]]
Each output file can contain either a fixed number of bytes (-b) or a fixed number of lines (-l).
A byte count given as SIZE may have a multiplier suffix: b for 512, k
for 1K, m for 1 Meg.
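To see the k multiplier in action, the following sketch splits a 2,500-byte file into 1K pieces. The file name blob.bin and the prefix blob_ are assumptions for the example, and head -c is assumed to be available (it is on GNU and BSD systems).

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: the first 2,500 bytes of a numbered list.
seq 1 1000 | head -c 2500 > blob.bin

# "1k" means 1024 bytes, so this produces chunks of 1024, 1024, and 452 bytes.
split -b 1k blob.bin blob_

# Print the size in bytes of each chunk.
wc -c blob_??
```

Unlike -l, the -b option pays no attention to line boundaries, so a chunk can end in the middle of a line.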
If you provide the prefix argument, the destination files are named
prefixXX, where XX is aa for the first file, ab for the second, and so on
up to zz. That is a total of 676 (26 x 26) files you can generate with the default
two-character suffix; the -a option lengthens the suffix if you need more. The prefix
must be at least two characters shorter than the maximum filename length the system allows.
On a system where the maximum filename length is 100, for example, the prefix can be at most 98 characters long.
If you do not provide a prefix argument, the destination files are named
xXX. split uses the x as a prefix.
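The naming rules can be checked with a small experiment. In this sketch the input file six.txt and the prefixes part. and wide. are hypothetical; -a is the suffix-length option from the option list.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: six lines, split two lines at a time.
seq 1 6 > six.txt

# No prefix: chunks are named xaa, xab, xac.
split -l 2 six.txt

# Explicit prefix "part.": chunks are named part.aa, part.ab, part.ac.
split -l 2 six.txt part.

# -a 3 widens the suffix to three letters (26^3 = 17,576 possible names):
# wide.aaa, wide.aab, wide.aac.
split -a 3 -l 2 six.txt wide.

ls
```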
The general format of the split command follows.
split [options] [file [prefix]]
Mandatory arguments to long options are mandatory for short options too.
- -a, --suffix-length=N
- use suffixes of length N (default 2)
- -b, --bytes=SIZE
- put SIZE bytes per output file
- -C, --line-bytes=SIZE
- put at most SIZE bytes of lines per output file
- -d, --numeric-suffixes
- use numeric suffixes instead of alphabetic
- -l, --lines=NUMBER
- put NUMBER lines per output file
- --verbose
- print a diagnostic to standard error just before each output
file is opened
- --help
- display this help and exit
- --version
- output version information and exit
SIZE may have a multiplier suffix: b for 512, k for 1K, m for 1 Meg.
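The difference between -b and -C, and the effect of -d, is easiest to see on a tiny file. The sketch below assumes GNU split (-C and -d are not part of POSIX); the file lines.txt and the prefixes num_, raw_, and whole_ are hypothetical names.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Three lines of 8 bytes each (7 characters plus a newline): 24 bytes in total.
printf 'aaaaaaa\nbbbbbbb\nccccccc\n' > lines.txt

# -d: numeric suffixes, so the chunks are named num_00, num_01, num_02.
split -d -l 1 lines.txt num_

# -b 10 cuts at exactly 10 bytes, even in the middle of a line: 10, 10, 4 bytes.
split -b 10 lines.txt raw_

# -C 10 puts as many whole lines as fit in 10 bytes into each chunk,
# so each chunk holds one complete 8-byte line.
split -C 10 lines.txt whole_

wc -c raw_?? whole_??
```

For text that will be processed line by line afterwards, -C is usually the safer of the two size-based options because it keeps lines intact.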
The following list describes the arguments that may be passed to the split
command.
- -
- Causes split to read from the standard input.
- file
- The name of the file that split reads and divides into files of n lines each
(1000 lines by default). If no file is given on the command line, split reads from
the standard input.
- prefix
- The base part of the name used for all output files. An extension is
added to prefix for each file created. By default the extension is made up
of two alphabetic characters: "aa" for the first file, then "ab," and
so on until the original input is completely divided. If prefix is not specified,
the output is written to files with a base part of "x" and the normal extensions. Thus the default output
filenames are xaa, xab, and so on.
split places its output in files with an extension of two characters.
The characters begin with "aa," the next file is "ab," and so on until the entire
input has been split and stored in multiple files.
In this activity you use the split command to divide the standard input
into separate output files. Begin at the shell prompt.
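- ls /bin | split -20 - bin
This writes the listing of /bin to 20-line files named binaa, binab, and so on.
The -20 form is the older spelling of -l 20.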
- In this simple example, assume
myfile is 3,000 lines long:
split myfile
This will output three 1000-line files: xaa, xab,
and xac.
- Working on the same file, this next example sets the chunk size explicitly:
split -l 500 myfile
This will output six 500-line chunks: xaa through xaf.
- Finally, assume
myfile is a 4600M file (a typical size for a
DVD ISO image; you cannot write a file of this size on many older filesystems
such as FAT32, whose maximum file size is 4 GB minus 1 byte, see
Working with File Systems):
split -b 2000m myfile iso_segment
This will output three chunks of the DVD image: two of 2000M each and a final one of 600M.
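The chunk count for a byte-based split can be verified on a scaled-down stand-in. The sketch below uses 4600 bytes instead of 4600 megabytes; the output name restored is a hypothetical choice, and head -c is assumed to be available.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Scaled-down stand-in for the image: 4600 bytes of numbered lines.
seq 1 2000 | head -c 4600 > myfile

# Same command shape as the DVD example, with bytes in place of megabytes.
split -b 2000 myfile iso_segment

# Three chunks result: iso_segmentaa and iso_segmentab at 2000 bytes each,
# and iso_segmentac with the remaining 600 bytes.
wc -c iso_segment??

# Concatenating the chunks in name order restores the original.
cat iso_segment?? > restored
cmp myfile restored && echo "restored copy matches"
```

The same cat step, run on the destination system, rebuilds a real image after its chunks have been copied across a filesystem with a file size limit.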
Copyright © 1996-2009 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Last modified:
November 13, 2009