Softpanorama (slightly skeptical) Open Source Software Educational Society
May the source be with you, but remember the KISS principle ;-)
Diff
diff is one of the oldest UNIX commands
(it has been part of UNIX since the mid-1970s). It compares the contents of two
files, a source file and a target file (the modified version), and produces a
"delta": the lines that are changed or absent in either of the files. It was
written by Hunt and McIlroy and is based
on the algorithm for file comparison that they created (see J. W. Hunt and
M. D. McIlroy, An Algorithm for Differential File Comparison,
Bell Telephone Laboratories CSTR #41 (1976),
PostScript
(text edited from OCR, figures redrawn)). While it was the first, it is still
one of the best.
Both files in a classic UNIX diff are assumed to be text files. As file difference
is an elusive concept, the simplest approach is to consider a line indivisible and
compute the so-called "longest common subsequence of lines" (LCS). Then anything
not in this LCS is declared to belong to the difference set:
the minimal set of lines that needs to be changed to transform the source
into the target.
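The idea can be sketched in a few lines of Python. This is a minimal illustration using the textbook dynamic-programming LCS, not the Hunt and McIlroy algorithm that diff itself is based on; the function names `lcs_pairs` and `difference_set` are made up for this example.

```python
def lcs_pairs(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover the matched lines
    pairs, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def difference_set(a, b):
    """Return (lines only in a, lines only in b), i.e. everything outside the LCS."""
    pairs = lcs_pairs(a, b)
    kept_a = {i for i, _ in pairs}
    kept_b = {j for _, j in pairs}
    removed = [line for i, line in enumerate(a) if i not in kept_a]
    added = [line for j, line in enumerate(b) if j not in kept_b]
    return removed, added


old = ["a", "b", "c", "d"]
new = ["a", "c", "d", "e"]
removed, added = difference_set(old, new)
print(removed, added)
```

The table needs memory proportional to the product of the two file lengths, which is one reason practical tools use more economical algorithms.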
Both a character-based diff and a word-based diff can easily be obtained by
applying appropriate filters before running diff. String diff
is essentially the problem studied in all algorithms for text file comparison,
as any text file can be trivially converted to a string in some alphabet with
each line represented by one letter of this alphabet.
Difference by word is also
a trivial modification of the basic program. Essentially, you first need to convert
the text into "one word per line" format and then use a line-based diff. Several such
modifications exist (wdiff,
dwdiff, see below),
or one can easily be written in any scripting language.
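A rough sketch of the "one word per line" trick in Python follows. It uses the standard difflib module, whose matching algorithm is not the one used by diff, and the helper name `word_diff` is an assumption made for this example.

```python
import difflib


def word_diff(old_text, new_text):
    """Compare two texts word by word and list removed (-) and added (+) words."""
    # Splitting on whitespace plays the role of the "one word per line" filter
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            changes += ["- " + word for word in old_words[i1:i2]]
        if tag in ("replace", "insert"):
            changes += ["+ " + word for word in new_words[j1:j2]]
    return changes


print(word_diff("the quick brown fox", "the slow brown fox"))
```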
diff has proved useful in so
many cases that it is difficult to enumerate them. First and foremost, this
tool is used to discover differences between versions of a text file. In this role
it is useful for keeping track of the evolution of a document or a program. For
example, a programmer often needs to debug a program whose codebase
contains several thousand (or even several hundred thousand) lines. If the problem
is present only in this version and not in older versions, then you can
diff the sources and use code browsers on the difference set to try to pinpoint
the source of the problem. This approach can dramatically narrow the slice
of the program that probably contains the problematic code.
It is also used as a file compression method, since many versions of a (long)
file can be represented by storing one (long) version of it and many (short) scripts
that transform the older version into newer versions of the same file. Another
application is so-called approximate string matching, used, for instance, for the
detection of misspelled words.
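The storage idea can be illustrated with a small Python sketch: keep the full text of one version plus a short delta for the next one. The delta format and the names `make_delta` and `apply_delta` are invented for this example and do not correspond to the format of any particular version control system.

```python
import difflib


def make_delta(old, new):
    """Record only the changed regions as (start, stop, replacement_lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # old[i1:i2] is to be replaced by new[j1:j2] (either may be empty)
            delta.append((i1, i2, new[j1:j2]))
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older one and a delta."""
    result, pos = [], 0
    for start, stop, replacement in delta:
        result.extend(old[pos:start])   # copy the unchanged lines
        result.extend(replacement)      # then the replacement lines
        pos = stop                      # skip the lines that were replaced
    result.extend(old[pos:])
    return result


v1 = ["a", "b", "c"]
v2 = ["a", "x", "c", "d"]
delta = make_delta(v1, v2)
print(delta)
print(apply_delta(v1, delta) == v2)
```

Only the changed lines are stored in the delta, so for long files with small edits it is much shorter than a second full copy.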
If we use diff for program understanding (which is what comparing two versions of
a program is usually about), then along with diff tools, powerful tools for source
code browsing and, especially,
slicers are also
necessary. Some are built into IDEs and some are standalone. The granddaddy
of all slicers is Xedit, which has had built-in
slicing capabilities since the 1970s (the famous "all" command). Among
code browsers,
cscope for C/C++, created at AT&T, is probably
one of the first useful implementations that address the problem.
diff also led to the creation of a new class of
programs that use generation of the difference set as part of their operation.
The most popular among those is
patch,
written by Larry Wall.
Older version control systems were little more than diff and several shell scripts.
The external diff command compares two text files for differences. It
determines which lines must be changed to make the two files identical. The diff
command scans the two files and indicates editing changes that must be made to the
first file to make it identical to the second file. The changes can be saved for
use as an ed command script to change the first file. It can also compare
directories. System V provides a second command to compare directories; refer to
the dircmp command.
The general format of the diff command:
diff [ -lrs ] [ -S name ] [ -cefhn ] [ -biwt ] dir1 dir2
diff [ -cefhn ] [ -biwt ] file1 file2
diff [ -cefhn ] [ -biwt ] - file2
diff [ -cefhn ] [ -biwt ] file1 -
diff [ -D string ] [ -biw ] file1 file2
diff [ -D string ] [ -biw ] - file2
diff [ -D string ] [ -biw ] file1 -
diff [ -C string ] [ -biw ] file1 file2
diff [ -C string ] [ -biw ] - file2
diff [ -C string ] [ -biw ] file1 -
You may want to think of file1 as the old file and file2 as the new file.
The -e option of diff produces an
editing script usable with either ex or ed. It
consists of a sequence of ed commands necessary to re-create
file2 from file1. By editing the script produced by
diff, you can come up with some useful changes to the file
that bring it "half-way" to the new version.
Options
The following list describes the options and their arguments that may be used
to control how diff functions.
Comparison control options:
| -b |
Causes runs of blanks (spaces and tabs) to compare as equal even
if they contain an unequal number of blanks. All trailing blanks are ignored. For
example, if the first line below were in file1 and the second in file2,
they would compare as equal: |
file1: A sample   line of   text here
file2: A sample line of text here
| -i |
Causes the case of letters to be ignored. For example, the lines |
THE BIG dog ran fast.
and
The big dog ran fast.
match as equal lines.
| -t |
Expands tabs on input to spaces on output. Normal output
adds additional characters to the front of each line. This may change the
indentation of the original text, making it difficult to read. This option
preserves the original text indentation. The -c option also adds additional
characters, causing indentation problems. |
| -w |
Causes all white space (blanks and tabs) to be ignored. For example, the lines |
if ( x == y )
and
if(x==y)
compare as equal.
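One way to picture these comparison options is as a normalization step applied to each line before the two files are compared. The Python sketch below is only a model of the behaviour described above, not how any diff implementation is actually written; the function `normalize` and its parameter names are assumptions for this example.

```python
import re


def normalize(line, ignore_space_change=False, ignore_case=False,
              ignore_all_space=False):
    """Return the form of a line that would be compared under -b, -i and -w."""
    if ignore_all_space:
        # Like -w: drop every blank (space or tab)
        line = re.sub(r"[ \t]+", "", line)
    elif ignore_space_change:
        # Like -b: squeeze each run of blanks to one space, drop trailing blanks
        line = re.sub(r"[ \t]+", " ", line).rstrip(" ")
    if ignore_case:
        # Like -i: fold letters to one case
        line = line.lower()
    return line


print(normalize("A sample   line of   text here  ", ignore_space_change=True))
print(normalize("if ( x == y )", ignore_all_space=True))
print(normalize("THE BIG dog ran fast.", ignore_case=True))
```

Note that under the -b model a line with leading blanks still differs from the same line without them, while under the -w model the two are equal.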
Directory comparison options:
| -l |
Displays a long output listing. The output is piped through
pr for pagination. Other differences are saved and summarized after
all text file differences are displayed. |
| -r |
Recursively descends through subdirectories. |
| -s |
Display files that are the same. Normally, identical files are not displayed. |
| -S name |
Begin the directory comparison with file name. Normally, all
files in the directory are compared. |
Mutually exclusive options:
| -D string |
Creates a merged version of file1 and file2
on the standard output. C preprocessor controls (#ifdef) are included in the
output. If string is not defined, compiling (cc) the output
yields the same program as compiling file1. If
string is defined, the result is the same as compiling
file2. |
| -c[n] |
Displays a comparison with n lines of context. The default for
n is 3. The output begins with the names and modification dates
of each file. Each change is introduced by a line of asterisks
(*). Lines removed from file1 are preceded by a hyphen (-). Lines
added to file2 are preceded by a plus (+). Lines changed between
file1 and file2 are preceded by an exclamation
mark (!). System V does not support the [n]. See the -C[n] option for variable
context sizes. |
| -C[n] |
System V only. Same as the -c[n] option on BSD. |
| -e |
Produces an ed script consisting of the a (append),
c (change), and d (delete) commands. These commands can be used
as input to ed to change file1 to match file2. See
the following section on Version Control. |
| -f |
Produces a script similar to that produced by -e, but in the opposite order:
the changes are listed in the order they appear in the file. These commands
are not usable with ed. |
| -h |
Does not attempt to find the most efficient way to edit the changes.
It is fast, but not thorough. The changes must be short and well separated.
It does work on files of unlimited size. The -e and -f options are disabled
if -h is specified. |
| -n |
Produces a script similar to the -e option. The order is reversed. Each
insert or delete command contains a count of changed lines. This format
is used by rcsdiff. |
Two arguments may be passed to the diff
command. They can be either files or directories:
| file1 |
The first input file used in the comparison. If file1
is a directory name, the file in directory file1 that has the same name
as file2 is used for comparison. For example, if you specify |
diff adir afile
|
diff compares adir/afile with afile. |
| file2 |
The second input file used in the comparison. If file2 is a directory,
the second file is set to file2/file1. |
| - |
A hyphen may be used in place of either file1 or file2
to represent the standard input. This allows you to pipe input to diff,
redirect input from a file, or type input from your keyboard for comparison. |
| dir1 |
The first directory containing files used for comparison. |
| dir2 |
The second directory containing files used for comparison. |
Perl-based wrapper
ColorDiff is a wrapper for diff. It produces the same output as diff, but
with coloured syntax highlighting at the commandline to improve readability.
The output is similar to how a diff-generated patch might appear in Vim or
Emacs with the appropriate syntax highlighting options enabled. The colour
schemes can be read from a central configuration file or from a local user
~/.colordiffrc file.
The diff command displays
different versions of lines that are found when comparing two files.
It prints a message that uses
ed-like notation (a for append, c for change, and
d for delete) to describe how a set of lines has changed. This is
followed by the lines themselves. The <
character precedes lines from the first file and
> precedes lines from the second file.
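To make the notation concrete, here is a small Python sketch that prints differences in this style. It relies on difflib to find the changed regions, so on some inputs it may pick different (but equally valid) changes than diff would; `normal_diff` and `line_range` are names invented for this example.

```python
import difflib


def line_range(start, stop):
    """Format a 0-based half-open range as the 1-based 'N' or 'N,M' used by diff."""
    if stop - start == 1:
        return str(start + 1)
    return f"{start + 1},{stop}"


def normal_diff(old, new):
    """Return the differences between two line lists in diff's normal format."""
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            # Lines j1+1..j2 of the second file are appended after line i1 of the first
            out.append(f"{i1}a{line_range(j1, j2)}")
        elif tag == "delete":
            # Lines i1+1..i2 of the first file are deleted
            out.append(f"{line_range(i1, i2)}d{j1}")
        else:
            out.append(f"{line_range(i1, i2)}c{line_range(j1, j2)}")
        out += ["< " + line for line in old[i1:i2]]
        if tag == "replace":
            out.append("---")
        out += ["> " + line for line in new[j1:j2]]
    return out


print("\n".join(normal_diff(["a", "b", "c"], ["a", "x", "c", "d"])))
```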
The normal and -e output formats are compact: they show
just the differences between the files. But, in many cases,
context diff listings are more useful. Context diffs
show the changed lines and the lines around them. (This can be a
headache if you're trying to read the listing on a terminal and
there are many changed lines fairly close to one another: the
context will make a huge "before" section, with the "after"
section several screenfuls ahead. In that case, the more compact
diff formats can be useful.)
On many versions of diff (including the GNU
version used on Linux), the -c option shows context
around each change. By itself, -c shows three lines
above and below each change. A number can be appended to change this:
for example, the -c2
option shows two lines of context.
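As an illustration of the context format with two lines of context, the following Python sketch uses difflib.context_diff from the standard library. The file names old.cpp and new.cpp and their contents are made up for this example and are not the listing the text originally referred to; unlike diff -c, difflib leaves the timestamps out of the header unless they are passed in explicitly.

```python
import difflib

# Hypothetical "before" and "after" versions of a small C++ file
old = [
    "#include <iostream>",
    "",
    "int main() {",
    '    std::cout << "hello";',
    "    return 0;",
    "}",
]
new = list(old)
new[3] = '    std::cout << "hello, world";'

# n=2 asks for two lines of context around each change, like -c2
context = list(difflib.context_diff(old, new, fromfile="old.cpp",
                                    tofile="new.cpp", n=2, lineterm=""))
print("\n".join(context))
```

In the output the changed line appears twice, once in the "before" section and once in the "after" section, each time marked with an exclamation mark.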
The -e option of diff produces an
editing script usable with either ex or ed,
instead of the usual output. This script consists of a sequence
of a (add),
c (change), and
d (delete) commands necessary to
re-create file2 from file1 (the first and
second files specified on the diff command line).
Obviously there is no need to completely re-create the
second file from the first, because you could do that easily
with cp. However, by editing the script produced by
diff, you can come up with some desired combination of the
two versions.
Diff - Wikipedia, the free encyclopedia
diff - Linux Command - Unix Command
GNU Diff
diff - find differences between two files
diff [options] from-file to-file
In the simplest case, diff compares the contents of the
two files from-file and to-file. A file name of
- stands for text read from the standard input. As a
special case, diff - - compares a copy of standard input
to itself.
If from-file is a directory and to-file is
not, diff compares the file in from-file whose
file name is that of to-file, and vice versa. The non-directory
file must not be -.
If both from-file and to-file are directories,
diff compares corresponding files in both directories,
in alphabetical order; this comparison is not recursive unless
the -r or --recursive option is given. diff
never compares the actual contents of a directory as if it were
a file. The file that is fully specified may not be standard
input, because standard input is nameless and the notion of
``file with the same name'' does not apply.
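The non-recursive directory case can be sketched in Python with the standard filecmp module. This is only a model of the behaviour described above, not what diff does internally; the function `compare_dirs`, the report keys and the sample file names are assumptions for this example.

```python
import filecmp
import os
import tempfile


def compare_dirs(dir1, dir2):
    """Compare same-named regular files in two directories, without recursion."""
    names1, names2 = set(os.listdir(dir1)), set(os.listdir(dir2))
    common = sorted(
        name for name in names1 & names2
        if os.path.isfile(os.path.join(dir1, name))
        and os.path.isfile(os.path.join(dir2, name))
    )
    # shallow=False forces a comparison of file contents, not just size and mtime
    same, differ, _errors = filecmp.cmpfiles(dir1, dir2, common, shallow=False)
    return {
        "only_in_first": sorted(names1 - names2),
        "only_in_second": sorted(names2 - names1),
        "differ": sorted(differ),
        "identical": sorted(same),
    }


# Build two small throwaway directories to try it out
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    samples = [(d1, "a.txt", "one\n"), (d2, "a.txt", "two\n"),
               (d1, "b.txt", "same\n"), (d2, "b.txt", "same\n"),
               (d1, "c.txt", "extra\n")]
    for directory, name, text in samples:
        with open(os.path.join(directory, name), "w") as handle:
            handle.write(text)
    report = compare_dirs(d1, d2)

print(report)
```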
diff options begin with -, so normally from-file
and to-file may not begin with -. However,
-- as an argument by itself treats the remaining arguments
as file names even if they begin with -.
Options
Below is a summary of all of the options that GNU diff
accepts. Most options have two equivalent names, one of which
is a single letter preceded by -, and the other of which
is a long name preceded by --. Multiple single letter
options (unless they take an argument) can be combined into
a single command line word: -ac is equivalent to -a
-c. Long named options can be abbreviated to any unique
prefix of their name. Brackets ([ and ]) indicate
that an option takes an optional argument.
- -lines
- Show lines (an integer) lines of context. This
option does not specify an output format by itself; it has
no effect unless it is combined with -c or -u.
This option is obsolete. For proper operation, patch
typically needs at least two lines of context.
- -a
- Treat all files as text and compare them line-by-line,
even if they do not seem to be text.
- -b
- Ignore changes in amount of white space.
- -B
- Ignore changes that just insert or delete blank lines.
- --brief
- Report only whether the files differ, not the details
of the differences.
- -c
- Use the context output format.
- -C lines
- --context[=lines]
- Use the context output format, showing lines
(an integer) lines of context, or three if lines
is not given. For proper operation, patch typically
needs at least two lines of context.
- --changed-group-format=format
- Use format to output a line group containing
differing lines from both files in if-then-else format.
- -d
- Change the algorithm to perhaps find a smaller set of
changes. This makes diff slower (sometimes much slower).
- -D name
- Make merged if-then-else format output, conditional
on the preprocessor macro name.
- -e
- --ed
- Make output that is a valid ed script.
- --exclude=pattern
- When comparing directories, ignore files and subdirectories
whose basenames match pattern.
- --exclude-from=file
- When comparing directories, ignore files and subdirectories
whose basenames match any pattern contained in file.
- --expand-tabs
- Expand tabs to spaces in the output, to preserve the
alignment of tabs in the input files.
- -f
- Make output that looks vaguely like an ed script
but has changes in the order they appear in the file.
- -F regexp
- In context and unified format, for each hunk of differences,
show some of the last preceding line that matches regexp.
- --forward-ed
- Make output that looks vaguely like an ed script
but has changes in the order they appear in the file.
- -h
- This option currently has no effect; it is present for
Unix compatibility.
- -H
- Use heuristics to speed handling of large files that
have numerous scattered small changes.
- --horizon-lines=lines
- Do not discard the last lines lines of the common
prefix and the first lines lines of the common suffix.
- -i
- Ignore changes in case; consider upper- and lower-case
letters equivalent.
- -I regexp
- Ignore changes that just insert or delete lines that
match regexp.
- --ifdef=name
- Make merged if-then-else format output, conditional
on the preprocessor macro name.
- --ignore-all-space
- Ignore white space when comparing lines.
- --ignore-blank-lines
- Ignore changes that just insert or delete blank lines.
- --ignore-case
- Ignore changes in case; consider upper- and lower-case
to be the same.
- --ignore-matching-lines=regexp
- Ignore changes that just insert or delete lines that
match regexp.
- --ignore-space-change
- Ignore changes in amount of white space.
- --initial-tab
- Output a tab rather than a space before the text of
a line in normal or context format. This causes the alignment
of tabs in the line to look normal.
- -l
- Pass the output through pr to paginate it.
- -L label
- --label=label
- Use label instead of the file name in the context
format and unified format headers.
- --left-column
- Print only the left column of two common lines in side
by side format.
- --line-format=format
- Use format to output all input lines in if-then-else
format.
- --minimal
- Change the algorithm to perhaps find a smaller set of
changes. This makes diff slower (sometimes much slower).
- -n
- Output RCS-format diffs; like -f except that
each command specifies the number of lines affected.
- -N
- --new-file
- In directory comparison, if a file is found in only
one directory, treat it as present but empty in the other
directory.
- --new-group-format=format
- Use format to output a group of lines taken from
just the second file in if-then-else format.
- --new-line-format=format
- Use format to output a line taken from just the
second file in if-then-else format.
- --old-group-format=format
- Use format to output a group of lines taken from
just the first file in if-then-else format.
- --old-line-format=format
- Use format to output a line taken from just the
first file in if-then-else format.
- -p
- Show which C function each change is in.
- -P
- When comparing directories, if a file appears only in
the second directory of the two, treat it as present but
empty in the other.
- --paginate
- Pass the output through pr to paginate it.
- -q
- Report only whether the files differ, not the details
of the differences.
- -r
- When comparing directories, recursively compare any
subdirectories found.
- --rcs
- Output RCS-format diffs; like -f except that
each command specifies the number of lines affected.
- --recursive
- When comparing directories, recursively compare any
subdirectories found.
- --report-identical-files
- -s
- Report when two files are the same.
- -S file
- When comparing directories, start with the file file.
This is used for resuming an aborted comparison.
- --from-file=file
- Compare file to all operands. file can
be a directory.
- --to-file=file
- Compare all operands to file. file can
be a directory.
- --sdiff-merge-assist
- Print extra information to help sdiff. sdiff
uses this option when it runs diff. This option is
not intended for users to use directly.
- --show-c-function
- Show which C function each change is in.
- --show-function-line=regexp
- In context and unified format, for each hunk of differences,
show some of the last preceding line that matches regexp.
- --side-by-side
- Use the side by side output format.
- --speed-large-files
- Use heuristics to speed handling of large files that
have numerous scattered small changes.
- --starting-file=file
- When comparing directories, start with the file file.
This is used for resuming an aborted comparison.
- --suppress-common-lines
- Do not print common lines in side by side format.
- -t
- Expand tabs to spaces in the output, to preserve the
alignment of tabs in the input files.
- -T
- Output a tab rather than a space before the text of
a line in normal or context format. This causes the alignment
of tabs in the line to look normal.
- --text
- Treat all files as text and compare them line-by-line,
even if they do not appear to be text.
- -u
- Use the unified output format.
- --unchanged-group-format=format
- Use format to output a group of common lines
taken from both files in if-then-else format.
- --unchanged-line-format=format
- Use format to output a line common to both files
in if-then-else format.
- --unidirectional-new-file
- When comparing directories, if a file appears only in
the second directory of the two, treat it as present but
empty in the other.
- -U lines
- --unified[=lines]
- Use the unified output format, showing lines
(an integer) lines of context, or three if lines
is not given. For proper operation, patch typically
needs at least two lines of context.
- -v
- --version
- Output the version number of diff.
- -w
- Ignore white space when comparing lines.
- -W columns
- --width=columns
- Use an output width of columns in side by side
format.
- -x pattern
- When comparing directories, ignore files and subdirectories
whose basenames match pattern.
- -X file
- When comparing directories, ignore files and subdirectories
whose basenames match any pattern contained in file.
- -y
- Use the side by side output format.
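For comparison with the unified format selected by -u and -U lines, here is a short Python sketch based on difflib.unified_diff. The file names and contents are invented for this example, and n=1 plays the role of the lines argument; difflib omits the timestamps that diff prints in the header.

```python
import difflib

old = ["one", "two", "three", "four", "five"]
new = ["one", "two", "3", "four", "five"]

# n=1 requests one line of context on each side of the change
unified = list(difflib.unified_diff(old, new, fromfile="old.txt",
                                    tofile="new.txt", n=1, lineterm=""))
print("\n".join(unified))
```

Each hunk starts with an @@ line giving the affected line ranges in both files; removed lines are prefixed with -, added lines with +, and context lines with a space.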
Copyright © 1996-2009 by Dr. Nikolai Bezroukov.
www.softpanorama.org was
created as a service to the UN Sustainable Development Networking Programme (SDNP)
in the author's free time.
This document is an industrial compilation designed and created
exclusively for educational use and is placed under the copyright of the
Open Content License (OPL).
Last modified:
August 08, 2009