Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Diff


diff is one of the oldest UNIX commands (it was included in UNIX around 1976). It compares the contents of two files, the source file and the target file (a modified version), and produces a "delta": the lines that are changed or absent in either file. It was written by Hunt and McIlroy and is based on the file comparison algorithm they created (see J. W. Hunt and M. D. McIlroy, "An Algorithm for Differential File Comparison", Bell Telephone Laboratories CSTR #41, 1976; PostScript version with text edited from OCR and figures redrawn). While it was the first such tool, it is still one of the best.

Classic UNIX diff assumes that both files are text files. Because the difference between two files is an elusive concept, the simplest approach is to treat each line as indivisible and compute the so-called "longest common subsequence" (LCS) of lines. Anything not in this LCS is then declared to belong to the difference set: the minimal set of lines that needs to be changed to transform the source into the target.
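A minimal sketch of this idea in Python. It uses the textbook O(n*m) dynamic-programming LCS rather than the faster Hunt-McIlroy algorithm, so treat it as an illustration of the concept, not of diff's actual implementation; `lcs_table` and `line_diff` are hypothetical names. Lines on the LCS are kept (tag `" "`), and everything else is reported as deleted from the source (`"<"`) or inserted from the target (`">"`).

```python
from typing import List, Tuple


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """Return table t where t[i][j] is the LCS length of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def line_diff(a: List[str], b: List[str]) -> List[Tuple[str, str]]:
    """Tag each line as common (' '), deleted ('<') or inserted ('>')."""
    t = lcs_table(a, b)
    i = j = 0
    out: List[Tuple[str, str]] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append((" ", a[i]))  # line belongs to the LCS
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            out.append(("<", a[i]))  # dropping a[i] keeps the LCS maximal
            i += 1
        else:
            out.append((">", b[j]))  # b[j] is not on the LCS: insert it
            j += 1
    out.extend(("<", line) for line in a[i:])
    out.extend((">", line) for line in b[j:])
    return out


if __name__ == "__main__":
    for tag, line in line_diff(["a", "b", "c", "d"], ["a", "c", "d", "e"]):
        print(tag, line)
```

The lines tagged `" "` are exactly the LCS; the rest form the difference set described above.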

However, both character-based and word-based diffs can easily be obtained by applying appropriate filters before running diff. String comparison is essentially the problem studied by all algorithms for text file comparison, because any text file can be trivially converted into a string over some alphabet in which each distinct line is represented by one letter.
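The reduction from text files to strings can be sketched as follows. `to_symbols` is a hypothetical helper: it gives every distinct line an integer "letter" from one shared alphabet, so identical lines in either file map to the same symbol and any string comparison algorithm can then work on the two integer sequences. Real implementations typically hash lines for speed; this sketch simply uses a dictionary.

```python
from typing import Dict, List, Tuple


def to_symbols(a: List[str], b: List[str]) -> Tuple[List[int], List[int], Dict[str, int]]:
    """Encode both files over one shared alphabet: one integer per distinct line."""
    alphabet: Dict[str, int] = {}

    def encode(lines: List[str]) -> List[int]:
        # The first occurrence of a line gets the next unused number;
        # later identical lines (in either file) reuse it.
        return [alphabet.setdefault(line, len(alphabet)) for line in lines]

    return encode(a), encode(b), alphabet


if __name__ == "__main__":
    old = ["int x;", "x = 1;", "return x;"]
    new = ["int x;", "x = 2;", "return x;"]
    print(to_symbols(old, new)[:2])  # the two files as integer strings
```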

Word-level difference is also a trivial modification of the basic program: you first convert the text into "one word per line" format and then apply a line-based diff. Several such tools exist (wdiff, dwdiff), and similar ones can easily be written in any scripting language.
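A rough word-level diff in this spirit, assuming that whitespace-separated words are good enough: split both texts into one word per "line" and feed the lists to a line-based differ. For brevity the sketch uses Python's standard `difflib.ndiff` as the line-based diff; difflib uses its own matching algorithm rather than classic diff's, so results can occasionally differ. `word_diff` is a hypothetical name, and this is not how wdiff or dwdiff are implemented.

```python
import difflib
from typing import List


def word_diff(old: str, new: str) -> List[str]:
    """Return only the changed words, prefixed with '- ' (removed) or '+ ' (added)."""
    old_words = old.split()  # the "one word per line" filter
    new_words = new.split()
    return [
        d for d in difflib.ndiff(old_words, new_words)
        if d.startswith(("- ", "+ "))  # drop unchanged words and '?' hint lines
    ]


if __name__ == "__main__":
    print(word_diff("The quick brown fox", "The slow brown fox"))
```

Because `split()` is the filter, changes in spacing or line breaks alone are not reported, which is usually what one wants when comparing prose.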

diff has proved useful in so many situations that it is difficult to enumerate them. First and foremost, it is used to discover differences between versions of a text file; in this role it helps keep track of the evolution of a document or a program. For example, a programmer often needs to debug a program whose codebase contains several thousand (or even several hundred thousand) lines. If the problem is present only in the current version and not in older ones, you can diff the sources and use code browsers on the difference set to try to pinpoint the source of the problem. This approach can dramatically narrow the slice of the program that probably contains the problematic code.

diff is also used as a compression method for storage: many versions of a (long) file can be represented by storing one full version and many (short) scripts that transform an older version into the newer versions of the same file. Another application is so-called approximate string matching, used, for instance, to detect misspelled words.
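The storage idea can be sketched with Python's standard `difflib.SequenceMatcher`, used here as a stand-in for diff's own algorithm: keep one version in full plus a short delta, and rebuild the other version on demand. The helper names and the delta layout (a list of `(start, end, replacement_lines)` tuples) are made up for illustration and are not a real storage format.

```python
import difflib
from typing import List, Tuple

# Hypothetical delta layout: "replace base[start:end] with replacement_lines".
Delta = List[Tuple[int, int, List[str]]]


def make_delta(base: List[str], target: List[str]) -> Delta:
    """Record only the regions of base that differ from target."""
    sm = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    return [
        (i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(base: List[str], delta: Delta) -> List[str]:
    """Rebuild the target by copying unchanged runs of base and splicing in edits."""
    out: List[str] = []
    pos = 0
    for start, end, lines in delta:
        out.extend(base[pos:start])  # unchanged lines before this edit
        out.extend(lines)            # new content (empty for a pure deletion)
        pos = end
    out.extend(base[pos:])
    return out


if __name__ == "__main__":
    v1 = ["line %d" % i for i in range(100)]
    v2 = v1[:10] + ["new line"] + v1[12:]
    d = make_delta(v1, v2)
    print(d)  # one small edit instead of a second 99-line copy
    print(apply_delta(v1, d) == v2)
```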

If we use diff for program understanding (which is usually what comparing two versions of a program is about), then in addition to diff tools, powerful tools for source code browsing and, especially, slicers are also necessary. Some are built into IDEs and some are standalone. The granddaddy of all slicers is Xedit, which has had built-in slicing capabilities since the 1970s (the famous "all" command). Among code browsers, cscope for C/C++, created at AT&T, is probably one of the first useful implementations that address the problem.

diff also led to the creation of a new class of programs that use the difference set as part of their operation. The most popular among them is patch, written by Larry Wall. Older version control systems were little more than diff plus several shell scripts.

The diff command compares two text files and determines which lines must be changed to make them identical. It scans the two files and reports the editing changes that must be made to the first file to make it identical to the second. These changes can be saved as an ed command script and used to transform the first file. diff can also compare directories; System V additionally provides a separate command for comparing directories, dircmp.

The general format of the diff command is:
     diff [ -lrs ] [ -S name ] [ -cefhn ] [ -biwt ] dir1 dir2

     diff [ -cefhn ] [ -biwt ] file1 file2
     diff [ -cefhn ] [ -biwt ] - file2
     diff [ -cefhn ] [ -biwt ] file1 -

     diff [ -D string ] [ -biw ] file1 file2
     diff [ -D string ] [ -biw ] - file2
     diff [ -D string ] [ -biw ] file1 -

     diff [ -C string ] [ -biw ] file1 file2
     diff [ -C string ] [ -biw ] - file2
     diff [ -C string ] [ -biw ] file1 -

You may want to think of file1 as the old file and file2 as the new file.

The -e option of diff produces an editing script usable with either ex or ed. It consists of a sequence of ed commands necessary to re-create file2 from file1. By editing the script produced by diff, you can come up with some useful changes to the file that bring it "half-way" to the new version.
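A sketch of how such a script can be generated, with a tiny interpreter to check it. `ed_script` and `apply_ed` are hypothetical helpers, and `difflib.SequenceMatcher` stands in for diff's own algorithm, so the hunks may not match `diff -e` exactly. Commands are emitted from the end of the file backwards, as `diff -e` does, so that the line numbers of earlier changes stay valid while the script is applied. The interpreter understands only the `a`, `c` and `d` commands used here; it is not a real ed.

```python
import difflib
from typing import List


def _range(start: int, end: int) -> str:
    """ed-style 1-based line range for the 0-based half-open slice [start, end)."""
    return str(start + 1) if end - start == 1 else f"{start + 1},{end}"


def ed_script(old: List[str], new: List[str]) -> List[str]:
    """Produce a/c/d commands that turn old into new, last change first."""
    cmds: List[str] = []
    opcodes = difflib.SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in reversed(opcodes):
        if tag == "delete":
            cmds.append(_range(i1, i2) + "d")
        elif tag == "insert":
            cmds += [f"{i1}a", *new[j1:j2], "."]   # append after line i1
        elif tag == "replace":
            cmds += [_range(i1, i2) + "c", *new[j1:j2], "."]
    # Limitation: a new line consisting of a single "." would end the text early.
    return cmds


def apply_ed(lines: List[str], script: List[str]) -> List[str]:
    """Apply the a/c/d subset produced above to a copy of lines."""
    buf = list(lines)
    k = 0
    while k < len(script):
        cmd, op = script[k][:-1], script[k][-1]
        first, _, last = cmd.partition(",")
        lo, hi = int(first), int(last or first)
        k += 1
        text: List[str] = []
        if op in ("a", "c"):
            while script[k] != ".":
                text.append(script[k])
                k += 1
            k += 1  # skip the terminating "."
        if op == "a":
            buf[lo:lo] = text       # insert after line lo
        else:
            buf[lo - 1:hi] = text   # "c" replaces lines lo..hi, "d" deletes them
    return buf
```

Applying only some of the generated commands is one way to get the "half-way" version mentioned above.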

Options

The following list describes the options and their arguments that may be used to control how diff functions.

Comparison control options:

-b Causes blanks (spaces and tabs) to compare as equal even if the number of blanks differs. All trailing blanks are ignored. For example, if the first line below is in file1 and the second is in file2, they compare as equal.
                    file1:    A sample line of text here
                    file2:    A     sample line of text here
-i Causes the case of letters to be ignored. For example,
                     THE BIG dog ran fast.
                    and
                     The big dog ran fast.
                    match as equal lines.
-t Expands tabs in the input to spaces in the output. Normal output adds extra characters to the front of each line, which may change the indentation of the original text and make it difficult to read; this option preserves the original indentation. The -c option also adds extra characters and can cause the same indentation problems.
-w Causes all white spaces (blanks and tabs) to be ignored. For example,
                      if ( x == y )
                    and
                      if(x==y)
                    compare as equals.
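These options amount to normalizing each line before the comparison. A sketch of plausible normalization functions follows; the names are hypothetical, and the exact rules of a particular diff (for instance, which characters count as blanks) may differ.

```python
import re
from typing import Callable


def key_b(line: str) -> str:
    """Like -b: treat any run of blanks/tabs as one space and ignore trailing blanks."""
    return re.sub(r"[ \t]+", " ", line).rstrip(" \t")


def key_i(line: str) -> str:
    """Like -i: ignore the case of letters."""
    return line.lower()


def key_w(line: str) -> str:
    """Like -w: ignore all blanks and tabs."""
    return re.sub(r"[ \t]+", "", line)


def same(a: str, b: str, *keys: Callable[[str], str]) -> bool:
    """Compare two lines after applying each requested normalization in turn."""
    for key in keys:
        a, b = key(a), key(b)
    return a == b


if __name__ == "__main__":
    print(same("if ( x == y )", "if(x==y)", key_b))  # -b alone is not enough
    print(same("if ( x == y )", "if(x==y)", key_w))
```

Options combine naturally, for example `same(a, b, key_w, key_i)` for `-w -i`.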

Directory comparison options:

-l Produces long output. The output is piped through pr for pagination. Other differences are remembered and summarized after all text file differences are displayed.
-r Recursively descends through subdirectories.
-s Display files that are the same. Normally, identical files are not displayed.
-S name Begin the directory comparison with file name. Normally, all files in the directory are compared.

Mutually exclusive options:

-D string Creates a merged version of file1 and file2 on the standard output, with C preprocessor controls included. If string is not defined, compiling (cc) the output yields the same program as compiling file1; if string is defined, it yields the same program as compiling file2.
-c[n] Displays the differences with n lines of context. The default for n is 3. The output begins with the name and modification date of each file, and each change is separated from the next by a line of asterisks (*). Lines removed from file1 are preceded by a hyphen (-), lines added in file2 are preceded by a plus (+), and lines changed between file1 and file2 are preceded by an exclamation mark (!). System V does not support the [n]; see the -C[n] option for variable context sizes.
-C[n] System V only. Same as the -c[n] option on BSD.
-e Produces an ed script consisting of a (append), c (change), and d (delete) commands. These commands can be used as input to ed to change file1 to match file2.
-f Produces a script similar to that produced by -e, but in the opposite order: the changes are listed in the order they occur in the file. These commands are not usable with ed.
-h Does a fast but less thorough job: it does not attempt to find the most efficient set of changes. It works only when the changes are short and well separated, but it does handle files of unlimited size. The -e and -f options are disabled when -h is specified.
-n Produces a script similar to the -e option. The order is reversed. Each insert or delete command contains a count of changed lines. This format is used by rcsdiff.
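A sketch of the kind of merged output `-D` produces. The directive layout (`#ifndef` for text only in file1, `#ifdef` for text only in file2, `#ifdef`/`#else` for changed text) is an assumption chosen for readability; real diff implementations emit their own mix of directives and comments. `difflib.SequenceMatcher` stands in for diff's own algorithm and `ifdef_merge` is a hypothetical name.

```python
import difflib
from typing import List


def ifdef_merge(old: List[str], new: List[str], name: str) -> List[str]:
    """Merge two versions so that defining `name` selects the new text."""
    out: List[str] = []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out += old[i1:i2]
        elif tag == "delete":   # only in old: keep it when name is NOT defined
            out += [f"#ifndef {name}", *old[i1:i2], "#endif"]
        elif tag == "insert":   # only in new: keep it when name IS defined
            out += [f"#ifdef {name}", *new[j1:j2], "#endif"]
        else:                   # replace: choose between the two versions
            out += [f"#ifdef {name}", *new[j1:j2], "#else", *old[i1:i2], "#endif"]
    return out


if __name__ == "__main__":
    old = ["int a;", "int b = 1;", "int c;"]
    new = ["int a;", "int b = 2;", "int c;", "int d;"]
    print("\n".join(ifdef_merge(old, new, "NEW")))
```

Compiling the result without `-DNEW` sees only the old lines; compiling with `-DNEW` sees only the new ones.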

Two arguments may be passed to the diff command. They can be either files or directories:

file1 The first input file used in the comparison. If file1 is a directory name, the file named file2 in directory file1 is used for comparison. For example, if you specify
                    diff adir afile
diff compares adir/afile with afile.
file2 The second input file used in the comparison. If file2 is a directory, the second file is set to file2/file1.
- A hyphen may be used in place of either file1 or file2 to represent the standard input. This allows you to pipe input to diff, redirect input from a file, or type input from your keyboard for comparison.
dir1 The first directory containing files used for comparison.
dir2 The second directory containing files used for comparison.
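The file/directory rule for the two operands can be sketched as follows. `resolve_operands` is a hypothetical helper that only computes the pair of paths; nothing is compared. It uses the last path component (basename) of the plain file when looking inside the directory, and it rejects `-` paired with a directory, following the GNU man page statement that the non-directory file must not be `-`; both details are assumptions about how a given diff behaves.

```python
import os
from typing import Tuple


def resolve_operands(file1: str, file2: str) -> Tuple[str, str]:
    """Return the pair of paths that would actually be compared."""
    dir1, dir2 = os.path.isdir(file1), os.path.isdir(file2)
    if dir1 and not dir2:
        if file2 == "-":
            raise ValueError("standard input has no name to look up in a directory")
        return os.path.join(file1, os.path.basename(file2)), file2
    if dir2 and not dir1:
        if file1 == "-":
            raise ValueError("standard input has no name to look up in a directory")
        return file1, os.path.join(file2, os.path.basename(file1))
    return file1, file2  # two plain files, or two directories compared entry by entry
```

For example, with a directory `adir`, `resolve_operands("adir", "afile")` returns `("adir/afile", "afile")` on a POSIX system.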


Old News

[Aug 27, 2017] Diff A Directory Recursively, Ignoring All Binary Files

It is now possible to use -r to recursively compare directories
Aug 27, 2017 | stackoverflow.com
diff -r dir1/ dir2/ | sed '/Binary\ files\ /d' >outputfile

This recursively compares dir1 to dir2; sed deletes the lines reporting binary files (those beginning with "Binary files "), and the result is redirected to outputfile.
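A similar effect can be approximated by skipping binary files before comparing instead of filtering diff's output. The sketch walks `dir1` recursively and lists files that differ from their counterparts in `dir2`. It treats a file as binary if its first block contains a NUL byte, which is a common heuristic and an assumption here, not necessarily the exact rule your diff uses. Unlike `diff -r`, it silently ignores files present in only one tree. `changed_text_files` and `looks_binary` are hypothetical names.

```python
import filecmp
import os
from typing import List


def looks_binary(path: str, blocksize: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first block contains a NUL byte."""
    with open(path, "rb") as f:
        return b"\0" in f.read(blocksize)


def changed_text_files(dir1: str, dir2: str) -> List[str]:
    """Relative paths of text files present in both trees whose contents differ."""
    changed = []
    for root, _dirs, files in os.walk(dir1):
        for name in files:
            p1 = os.path.join(root, name)
            rel = os.path.relpath(p1, dir1)
            p2 = os.path.join(dir2, rel)
            if not os.path.isfile(p2):
                continue  # only in dir1: not reported by this sketch
            if looks_binary(p1) or looks_binary(p2):
                continue  # binary file: skipped, like the sed filter above
            if not filecmp.cmp(p1, p2, shallow=False):  # compare actual contents
                changed.append(rel)
    return sorted(changed)
```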

-- Shannon VanWagner

[Feb 04, 2017] Quickly find differences between two directories

You may be surprised, but GNU diff on Linux recognizes when both arguments are directories and behaves accordingly.
Feb 04, 2017 | www.cyberciti.biz

The diff command compares files line by line. It can also compare two directories:

# Compare two folders using diff ##
diff /etc /tmp/etc_old  
Rafal Matczak September 29, 2015, 7:36 am
And quicker:
 diff -y <(ls -l ${DIR1}) <(ls -l ${DIR2})  

[Sep 11, 2008] ColorDiff

Perl-based wrapper

ColorDiff is a wrapper for diff. It produces the same output as diff, but with coloured syntax highlighting at the commandline to improve readability. The output is similar to how a diff-generated patch might appear in Vim or Emacs with the appropriate syntax highlighting options enabled. The colour schemes can be read from a central configuration file or from a local user ~/.colordiffrc file.

LinuxDevCenter.com -- Checking Differences with diff

The diff command displays the lines that differ when comparing two files. It prints a message that uses ed-like notation (a for append, c for change, and d for delete) to describe how a set of lines has changed, followed by the lines themselves. The < character precedes lines from the first file and > precedes lines from the second file.
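To make the notation concrete, here is a sketch that prints this "normal" format from the opcodes of Python's `difflib.SequenceMatcher`, a stand-in for diff's own matching algorithm, so hunk boundaries may not always match real diff output. `normal_diff` is a hypothetical name. Line numbers are 1-based: in `NaM` the left number is the line after which text is added, and in `NdM` the right number is the line of the second file after which the deleted lines would have appeared.

```python
import difflib
from typing import List


def _rng(start: int, end: int) -> str:
    """1-based inclusive range for the 0-based half-open slice [start, end)."""
    return str(start + 1) if end - start == 1 else f"{start + 1},{end}"


def normal_diff(old: List[str], new: List[str]) -> List[str]:
    """Return normal-format diff lines (without trailing newlines)."""
    out: List[str] = []
    sm = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue
        if tag == "delete":
            out.append(f"{_rng(i1, i2)}d{j1}")
        elif tag == "insert":
            out.append(f"{i1}a{_rng(j1, j2)}")
        else:  # replace
            out.append(f"{_rng(i1, i2)}c{_rng(j1, j2)}")
        out += ["< " + line for line in old[i1:i2]]   # lines from the first file
        if tag == "replace":
            out.append("---")                         # separates old and new text
        out += ["> " + line for line in new[j1:j2]]   # lines from the second file
    return out


if __name__ == "__main__":
    print("\n".join(normal_diff(["a", "b", "c", "d", "e"], ["x", "a", "c", "d", "E", "f"])))
```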

LinuxDevCenter.com -- Context diffs

The output of diff -e shows compact formats with just the differences between the files. But, in many cases, context diff listings are more useful. Context diffs show the changed lines and the lines around them. (This can be a headache if you're trying to read the listing on a terminal and there are many changed lines fairly close to one another: the context will make a huge "before" section, with the "after" section several screenfuls ahead. In that case, the more compact diff formats can be useful.)

On many versions of diff (including the GNU version used on Linux), the -c option shows context around each change. By itself, -c shows three lines above and below each change; -c2 shows two lines of context instead.
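Python's standard library can produce a context diff as well, which is handy for experimenting with the format. `difflib.context_diff` takes an `n` argument for the number of context lines, so `n=2` roughly corresponds to `-c2`. It follows the same `***`/`---` and `!`/`+`/`-` conventions, but byte-for-byte agreement with GNU diff is not guaranteed (for instance, no file dates appear in the headers unless you pass them). The file names and contents below are made up.

```python
import difflib

old = ["#include <iostream>\n", "int main() {\n", "    int x = 1;\n",
       "    std::cout << x;\n", "    return 0;\n", "}\n"]
new = ["#include <iostream>\n", "int main() {\n", "    int x = 2;\n",
       "    std::cout << x;\n", "    return 0;\n", "}\n"]

# n=2 requests two lines of context around each change, like diff -c2.
diff = list(difflib.context_diff(old, new, fromfile="prog.cc.orig", tofile="prog.cc", n=2))
print("".join(diff), end="")
```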

LinuxDevCenter.com -- ex Scripts Built by diff

The -e option of diff produces an editing script usable with either ex or ed, instead of the usual output. This script consists of a sequence of a (add), c (change), and d (delete) commands necessary to re-create file2 from file1 (the first and second files specified on the diff command line).

Obviously there is no need to completely re-create the second file from the first, because you could do that easily with cp. However, by editing the script produced by diff, you can come up with some desired combination of the two versions.

Recommended Links


Sites

Diff - Wikipedia, the free encyclopedia

diff - Linux Command - Unix Command

GNU Diff

diff - find differences between two files
diff [options] from-file to-file
In the simplest case, diff compares the contents of the two files from-file and to-file. A file name of - stands for text read from the standard input. As a special case, diff - - compares a copy of standard input to itself.

If from-file is a directory and to-file is not, diff compares the file in from-file whose file name is that of to-file, and vice versa. The non-directory file must not be -.

If both from-file and to-file are directories, diff compares corresponding files in both directories, in alphabetical order; this comparison is not recursive unless the -r or --recursive option is given. diff never compares the actual contents of a directory as if it were a file. The file that is fully specified may not be standard input, because standard input is nameless and the notion of "file with the same name" does not apply.

diff options begin with -, so normally from-file and to-file may not begin with -. However, -- as an argument by itself treats the remaining arguments as file names even if they begin with -.

Options

Below is a summary of all of the options that GNU diff accepts. Most options have two equivalent names, one of which is a single letter preceded by -, and the other of which is a long name preceded by --. Multiple single letter options (unless they take an argument) can be combined into a single command line word: -ac is equivalent to -a -c. Long named options can be abbreviated to any unique prefix of their name. Brackets ([ and ]) indicate that an option takes an optional argument.
-lines
Show lines (an integer) lines of context. This option does not specify an output format by itself; it has no effect unless it is combined with -c or -u. This option is obsolete. For proper operation, patch typically needs at least two lines of context.
-a
Treat all files as text and compare them line-by-line, even if they do not seem to be text.
-b
Ignore changes in amount of white space.
-B
Ignore changes that just insert or delete blank lines.
--brief
Report only whether the files differ, not the details of the differences.
-c
Use the context output format.
-C lines

--context[=lines]
Use the context output format, showing lines (an integer) lines of context, or three if lines is not given. For proper operation, patch typically needs at least two lines of context.
--changed-group-format=format
Use format to output a line group containing differing lines from both files in if-then-else format.
-d
Change the algorithm to perhaps find a smaller set of changes. This makes diff slower (sometimes much slower).
-D name
Make merged if-then-else format output, conditional on the preprocessor macro name.
-e

--ed
Make output that is a valid ed script.
--exclude=pattern
When comparing directories, ignore files and subdirectories whose basenames match pattern.
--exclude-from=file
When comparing directories, ignore files and subdirectories whose basenames match any pattern contained in file.
--expand-tabs
Expand tabs to spaces in the output, to preserve the alignment of tabs in the input files.
-f
Make output that looks vaguely like an ed script but has changes in the order they appear in the file.
-F regexp
In context and unified format, for each hunk of differences, show some of the last preceding line that matches regexp.
--forward-ed
Make output that looks vaguely like an ed script but has changes in the order they appear in the file.
-h
This option currently has no effect; it is present for Unix compatibility.
-H
Use heuristics to speed handling of large files that have numerous scattered small changes.
--horizon-lines=lines
Do not discard the last lines lines of the common prefix and the first lines lines of the common suffix.
-i
Ignore changes in case; consider upper- and lower-case letters equivalent.
-I regexp
Ignore changes that just insert or delete lines that match regexp.
--ifdef=name
Make merged if-then-else format output, conditional on the preprocessor macro name.
--ignore-all-space
Ignore white space when comparing lines.
--ignore-blank-lines
Ignore changes that just insert or delete blank lines.
--ignore-case
Ignore changes in case; consider upper- and lower-case to be the same.
--ignore-matching-lines=regexp
Ignore changes that just insert or delete lines that match regexp.
--ignore-space-change
Ignore changes in amount of white space.
--initial-tab
Output a tab rather than a space before the text of a line in normal or context format. This causes the alignment of tabs in the line to look normal.
-l
Pass the output through pr to paginate it.
-L label

--label=label
Use label instead of the file name in the context format and unified format headers.
--left-column
Print only the left column of two common lines in side by side format.
--line-format=format
Use format to output all input lines in if-then-else format.
--minimal
Change the algorithm to perhaps find a smaller set of changes. This makes diff slower (sometimes much slower).
-n
Output RCS-format diffs; like -f except that each command specifies the number of lines affected.
-N

--new-file
In directory comparison, if a file is found in only one directory, treat it as present but empty in the other directory.
--new-group-format=format
Use format to output a group of lines taken from just the second file in if-then-else format.
--new-line-format=format
Use format to output a line taken from just the second file in if-then-else format.
--old-group-format=format
Use format to output a group of lines taken from just the first file in if-then-else format.
--old-line-format=format
Use format to output a line taken from just the first file in if-then-else format.
-p
Show which C function each change is in.
-P
When comparing directories, if a file appears only in the second directory of the two, treat it as present but empty in the other.
--paginate
Pass the output through pr to paginate it.
-q
Report only whether the files differ, not the details of the differences.
-r
When comparing directories, recursively compare any subdirectories found.
--rcs
Output RCS-format diffs; like -f except that each command specifies the number of lines affected.
--recursive
When comparing directories, recursively compare any subdirectories found.
--report-identical-files

-s
Report when two files are the same.
-S file
When comparing directories, start with the file file. This is used for resuming an aborted comparison.
--from-file=file
Compare file to all operands. file can be a directory.
--to-file=file
Compare all operands to file. file can be a directory.
--sdiff-merge-assist
Print extra information to help sdiff. sdiff uses this option when it runs diff. This option is not intended for users to use directly.
--show-c-function
Show which C function each change is in.
--show-function-line=regexp
In context and unified format, for each hunk of differences, show some of the last preceding line that matches regexp.
--side-by-side
Use the side by side output format.
--speed-large-files
Use heuristics to speed handling of large files that have numerous scattered small changes.
--starting-file=file
When comparing directories, start with the file file. This is used for resuming an aborted comparison.
--suppress-common-lines
Do not print common lines in side by side format.
-t
Expand tabs to spaces in the output, to preserve the alignment of tabs in the input files.
-T
Output a tab rather than a space before the text of a line in normal or context format. This causes the alignment of tabs in the line to look normal.
--text
Treat all files as text and compare them line-by-line, even if they do not appear to be text.
-u
Use the unified output format.
--unchanged-group-format=format
Use format to output a group of common lines taken from both files in if-then-else format.
--unchanged-line-format=format
Use format to output a line common to both files in if-then-else format.
--unidirectional-new-file
When comparing directories, if a file appears only in the second directory of the two, treat it as present but empty in the other.
-U lines

--unified[=lines]
Use the unified output format, showing lines (an integer) lines of context, or three if lines is not given. For proper operation, patch typically needs at least two lines of context.
-v

--version
Output the version number of diff.
-w
Ignore white space when comparing lines.
-W columns

--width=columns
Use an output width of columns in side by side format.
-x pattern
When comparing directories, ignore files and subdirectories whose basenames match pattern.
-X file
When comparing directories, ignore files and subdirectories whose basenames match any pattern contained in file.
-y
Use the side by side output format.
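Of the output formats listed above, the unified format (-u) is the one most tools expect today. As an illustration only, Python's `difflib.unified_diff` writes the same `---`/`+++`/`@@` layout, and its `n` argument plays the role of the `-U lines` count. The inputs below are made up, and GNU diff's headers additionally carry file timestamps.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n", "delta\n"]
new = ["alpha\n", "BETA\n", "gamma\n", "delta\n", "epsilon\n"]

# n=1 asks for one line of context around each change, like -U 1.
hunks = list(difflib.unified_diff(old, new, fromfile="a.txt", tofile="b.txt", n=1))
print("".join(hunks), end="")
```

Here the two changes are close enough together that they share a single `@@` hunk.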




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: October 25, 2020