Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Unix find tutorial


Part 10: Using find for backups

Combined with an archiver such as cpio, the find command lets you copy the entire contents of a directory while preserving the permissions, times, and ownership of every file and subdirectory. Because find can select files by complex criteria, it can produce a precise list of files for cpio, tar, pax, or another archiver to back up.

Fortunately, find has several options that are very useful for structuring a backup, among them -mount, -fstype, -newer, and -depth, all of which are used or described below.

The typical usage is to combine find with the cpio command, which accepts the list of files on standard input. GNU tar can do this too with the -T - option. Typically each mount point is backed up in a separate tar or cpio archive.

cd /usr

find . -mount -fstype ext3 -print | cpio -ov > /backup/usr080124.cpi

or, using tar:

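find /usr -mount -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr080124.tgz -T -

The --no-recursion option keeps tar from archiving the contents of each listed directory a second time.

It is also possible to do incremental backups using the -newer option:

find /usr -mount -newer /backup/usr080124.tgz -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr_delta080124.tgz -T -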
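
The full-plus-incremental scheme above can be tried safely in a scratch directory before pointing it at /usr. The following is only a minimal sketch, assuming GNU find and GNU tar. The names data, full.tgz, delta.tgz and stamp are made up for illustration, and a separate timestamp file is used as the -newer reference instead of the archive itself, so the result does not depend on how long the full backup takes.

```shell
#!/bin/sh
# Sketch: full backup followed by an incremental one, in a scratch directory.
# Assumes GNU find and GNU tar; all names below are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p data/sub
echo one > data/a.txt
echo two > data/sub/b.txt
# Give the existing files an old, fixed modification time.
touch -t 202001010000 data/a.txt data/sub/b.txt

# Full backup: NUL-separated names, read by tar from standard input.
# --no-recursion stops tar from re-adding the contents of listed directories.
find data -print0 | tar --null --no-recursion -czf full.tgz -T -

# Reference point for the next incremental run.
touch -t 202001020000 stamp

# A file changed after the reference point.
echo three > data/sub/c.txt
touch -t 202001030000 data/sub/c.txt

# Incremental backup: only regular files newer than the stamp.
find data -type f -newer stamp -print0 | tar --null -czf delta.tgz -T -

full_list=$(tar -tzf full.tgz | sort)
delta_list=$(tar -tzf delta.tgz)
printf 'full:\n%s\ndelta:\n%s\n' "$full_list" "$delta_list"
```

Restricting the incremental run to -type f is a design choice of this sketch: a directory's modification time changes whenever an entry is added to it, so without the restriction (or --no-recursion) a touched directory would pull its whole contents into the delta archive.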

You can also try to avoid errors in backing up named pipes, devices, etc. by using more complex traversal expressions, for example:

find / -mount -fstype ext3 \( -type f -or -type l \) > /tmp/root_list.txt

The problem here is with hard-linked files, but that is a problem of tar, not of find. The cpio command is a more sophisticated backup tool than tar. It is harder to use, but it is capable of copying special files (such as devices and links) consistently, and it accepts wildcard patterns when listing the files to be archived.
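
To see what the type filter in the find command above keeps and what it drops, here is a small sketch that runs against a scratch directory rather than /. It assumes mkfifo is available, the file names are hypothetical, and the portable -o is used in place of the GNU spelling -or.

```shell
#!/bin/sh
# Sketch: keep only regular files and symlinks in the backup list.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir tree
echo data > tree/regular.txt
ln -s regular.txt tree/link.txt     # symbolic link: kept
mkfifo tree/pipe                    # named pipe: filtered out

# Directories are dropped too, since they are neither type f nor type l.
find tree \( -type f -o -type l \) -print | sort > list.txt
file_list=$(cat list.txt)
printf '%s\n' "$file_list"
```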

At a higher level, you might benefit from excluding all files that are unchanged from the RPMs from which the system was installed. This is the approach taken by the backup module built into YaST (it uses tar, not cpio). Besides reading the list from standard input with -T -, tar can take the list from a file: the argument of the -T option names a file that contains the list of files to be archived. Here is how this option is described in the manual:

Instead of giving the names of files or archive members on the command line, you can put the names into a file, and then use the ‘--files-from=file-of-names’ (‘-T file-of-names’) option to tar. Give the name of the file which contains the list of files to include as the argument to ‘--files-from’. In the list, the file names should be separated by newlines. You will frequently use this option when you have generated the list of files to archive with the find utility.

... ... ...

In the file list given by ‘-T’ option, any file name beginning with ‘-’ character is considered a tar option and is processed accordingly. For example, the common use of this feature is to change to another directory by specifying ‘-C’ option:

$ cat list
-C/etc
passwd
hosts
-C/lib
libc.a
$ tar -c -f foo.tar --files-from list

For example, if we want to archive files smaller than 1000 bytes, we can first create a list of such files with find and then use tar to create the archive.

find . -type f -size -1000c -print > /etc/small-files
tar -cvzT /etc/small-files -f little.tgz
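
The two-step approach above (build a list with find, then hand it to tar with -T) can be checked on throwaway data first. This is only a sketch, assuming GNU tar; the names src, small-files and little.tgz are hypothetical, and the list is written to a scratch directory instead of /etc.

```shell
#!/bin/sh
# Sketch: archive only files smaller than 1000 bytes via a list file.
# Assumes GNU tar for -T; all names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir src
printf 'tiny\n' > src/small.txt          # 5 bytes
head -c 5000 /dev/zero > src/big.bin     # 5000 bytes

# -size -1000c: size in bytes strictly below 1000.
find src -type f -size -1000c -print > small-files

tar -czf little.tgz -T small-files
archived=$(tar -tzf little.tgz)
printf '%s\n' "$archived"
```

The c suffix makes find compare exact byte counts. With the k suffix GNU find rounds sizes up to whole kibibytes before comparing, so a test such as -size -1k matches only empty files.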

You can also compress the archive with gzip on the fly:

tar -zPvcf backup.tar.gz -T list_of_files_to_be_tarred_or_list_of_locations

You will want to use the ‘--label=archive-label’ (‘-V archive-label’) option to give the archive a volume label, so you can tell what this archive is even if the label falls off the tape, or anything like that.

Unless the file system you are dumping is guaranteed to fit on one volume, you might need to use the ‘--multi-volume’ (‘-M’) option.

Like find, tar has an option that prevents it from crossing filesystem (partition) boundaries when storing (sub)directories: ‘--one-file-system’.

It also has the ‘--incremental’ (‘-G’) option (see the section "Using tar to Perform Incremental Dumps" in the GNU tar manual).
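
As a rough illustration of the tar options mentioned above, the sketch below labels an archive and restricts tar to one filesystem. It assumes GNU tar (for -V, --one-file-system and --test-label); the label text and the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: label an archive and keep tar on a single filesystem.
# Assumes GNU tar; the label text and names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p usr/share
echo hello > usr/share/readme

# -V stores a volume label; --one-file-system skips other mount points.
tar --one-file-system -V 'usr full 080124' -cf usr.tar usr

# --test-label prints the label stored in the archive.
label=$(tar --test-label -f usr.tar)
members=$(tar -tf usr.tar)
echo "label: $label"
printf '%s\n' "$members"
```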

Find has several filesystem traversal options that are useful for backups

The options ‘-H’, ‘-L’ or ‘-P’ may be specified at the start of the command line (if none of these is specified, ‘-P’ is assumed). If you specify more than one of these options, the last one specified takes effect (the ‘-follow’ option is equivalent to ‘-L’).

If find would follow a symbolic link, but cannot for any reason (for example, because it has insufficient permissions or the link is broken), it falls back on using the properties of the symbolic link itself. See the Symbolic Links section of the findutils manual for a more complete description of how symbolic links are handled.

— Option: -maxdepth levels
 

Descend at most levels (a non-negative integer) levels of directories below the command line arguments. ‘-maxdepth 0’ means only apply the tests and actions to the command line arguments.

— Option: -mindepth levels
 

Do not apply any tests or actions at levels less than levels (a non-negative integer). ‘-mindepth 1’ means process all files except the command line arguments.
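
The two depth limits can be combined to select exactly one level of a tree. A minimal sketch, assuming a find that supports -maxdepth and -mindepth (GNU and BSD versions do); the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: limit how deep find descends. Directory names are hypothetical.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p top/mid/low
touch top/f1 top/mid/f2 top/mid/low/f3

# Only the starting point itself.
depth0=$(find top -maxdepth 0 -print)

# Everything exactly one level below the starting point.
level1=$(find top -mindepth 1 -maxdepth 1 -print | sort)

printf 'maxdepth 0:\n%s\nlevel 1 only:\n%s\n' "$depth0" "$level1"
```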

— Option: -depth
 

Process each directory's contents before the directory itself. Doing this is a good idea when producing lists of files to archive with cpio or tar. If a directory does not have write permission for its owner, its contents can still be restored from the archive since the directory's permissions are restored after its contents.
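
The effect of ‘-depth’ on ordering is easy to see on a one-file tree. A minimal sketch with hypothetical names:

```shell
#!/bin/sh
# Sketch: compare default (pre-order) and -depth (post-order) traversal.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p dir
touch dir/file

# Join the output lines with spaces so each traversal fits on one line.
pre=$(find dir -print | tr '\n' ' ')
post=$(find dir -depth -print | tr '\n' ' ')

echo "default: $pre"    # directory first, then its contents
echo "-depth:  $post"   # contents first, then the directory
```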

— Option: -d
 

This is a deprecated synonym for ‘-depth’, for compatibility with Mac OS X, FreeBSD and OpenBSD. The ‘-depth’ option is a POSIX feature, so it is better to use that.

— Action: -prune
 

If the file is a directory, do not descend into it. The result is true. For example, to skip the directory src/emacs and all files and directories under it, and print the names of the other files found:

          find . -wholename './src/emacs' -prune -o -print

The above command will not print ./src/emacs among its list of results. This however is not due to the effect of the ‘-prune’ action (which only prevents further descent, it doesn't make sure we ignore that item). Instead, this effect is due to the use of ‘-o’. Since the left hand side of the “or” condition has succeeded for ./src/emacs, it is not necessary to evaluate the right-hand-side (‘-print’) at all for this particular file. If you wanted to print that directory name you could use either an extra ‘-print’ action:

          find . -wholename './src/emacs' -prune -print -o -print

or use the comma operator:

          find . -wholename './src/emacs' -prune , -print

If the ‘-depth’ option is in effect, the subdirectories will have already been visited in any case. Hence ‘-prune’ has no effect in this case.

Because ‘-delete’ implies ‘-depth’, using ‘-prune’ in combination with ‘-delete’ may well result in the deletion of more files than you intended.
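
The interaction between ‘-prune’ and ‘-depth’ described above can be reproduced on a scratch tree. This is a sketch only; the paths mirror the manual's src/emacs example, the file names are hypothetical, and -path is used as the POSIX spelling of GNU's -wholename.

```shell
#!/bin/sh
# Sketch: skip a subtree with -prune, and see -depth defeat it.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p src/emacs/lisp src/vi
touch src/emacs/lisp/a.el src/vi/main.c

# Normal traversal: nothing at or below ./src/emacs is printed.
pruned=$(find . -path './src/emacs' -prune -o -print | sort)

# With -depth the contents are visited before the directory itself,
# so -prune comes too late and a.el shows up in the list.
with_depth=$(find . -depth -path './src/emacs' -prune -o -print | sort)

printf 'pruned:\n%s\nwith -depth:\n%s\n' "$pruned" "$with_depth"
```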

— Action: -quit
 

Exit immediately (with return value zero if no errors have occurred). This is different to ‘-prune’ because ‘-prune’ only applies to the contents of pruned directories, whilst ‘-quit’ simply makes find stop immediately. No child processes will be left running, but no more files specified on the command line will be processed. For example, find /tmp/foo /tmp/bar -print -quit will print only ‘/tmp/foo’. Any command lines which have been built by ‘-exec ... \+’ or ‘-execdir ... \+’ are invoked before the program is exited.
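
A common use of ‘-quit’ is to stop at the first match. A minimal sketch, assuming a find that provides -quit (GNU find does); the log file names are hypothetical.

```shell
#!/bin/sh
# Sketch: stop at the first match with -quit.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p logs
touch logs/a.log logs/b.log logs/c.log

# tr strips the padding that some wc implementations add.
all=$(find logs -name '*.log' -print | wc -l | tr -d ' ')
first=$(find logs -name '*.log' -print -quit | wc -l | tr -d ' ')

echo "matches without -quit: $all"
echo "matches with -quit:    $first"
```

Which of the three files is reported first depends on directory order, so the sketch counts matches instead of naming one.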

— Option: -noleaf
 

Do not optimize by assuming that directories contain 2 fewer subdirectories than their hard link count. This option is needed when searching filesystems that do not follow the Unix directory-link convention, such as CD-ROM or MS-DOS filesystems or AFS volume mount points. Each directory on a normal Unix filesystem has at least 2 hard links: its name and its . entry. Additionally, its subdirectories (if any) each have a .. entry linked to that directory. When find is examining a directory, after it has statted 2 fewer subdirectories than the directory's link count, it knows that the rest of the entries in the directory are non-directories (leaf files in the directory tree). If only the files' names need to be examined, there is no need to stat them; this gives a significant increase in search speed.

— Option: -ignore_readdir_race
 

If a file disappears after its name has been read from a directory but before find gets around to examining the file with stat, don't issue an error message. If you don't specify this option, an error message will be issued. This option can be useful in system scripts (cron scripts, for example) that examine areas of the filesystem that change frequently (mail queues, temporary directories, and so forth), because this scenario is common for those sorts of directories. Completely silencing error messages from find is undesirable, so this option neatly solves the problem. There is no way to search one part of the filesystem with this option on and part of it with this option off, though. When this option is turned on and find discovers that one of the start-point files specified on the command line does not exist, no error message will be issued.

— Option: -noignore_readdir_race
 

This option reverses the effect of the ‘-ignore_readdir_race’ option.

 



2.10 Filesystems

A filesystem is a section of a disk, either on the local host or mounted from a remote host over a network. Searching network filesystems can be slow, so it is common to make find avoid them.

There are two ways to avoid searching certain filesystems. One way is to tell find to only search one filesystem:

— Option: -xdev
— Option: -mount
 

Don't descend directories on other filesystems. These options are synonyms.

The other way is to check the type of filesystem each file is on, and not descend directories that are on undesirable filesystem types:

— Test: -fstype type
 

True if the file is on a filesystem of type type. The valid filesystem types vary among different versions of Unix; an incomplete list of filesystem types that are accepted on some version of Unix or another is:

          ext2 ext3 proc sysfs ufs 4.2 4.3 nfs tmp mfs S51K S52K

You can use ‘-printf’ with the ‘%F’ directive to see the types of your filesystems. The ‘%D’ directive shows the device number. See Print File Information. ‘-fstype’ is usually used with ‘-prune’ to avoid searching remote filesystems (see Directories).
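
The ‘%F’ directive and the ‘-fstype ... -prune’ idiom described above might be combined as follows. This is a sketch under the assumption that GNU find is in use (-printf and -fstype are GNU extensions) and that the scratch directory is not itself on an nfs or proc filesystem; the names are hypothetical.

```shell
#!/bin/sh
# Sketch: show the filesystem type and prune by type.
# Assumes GNU find (-printf and -fstype are GNU extensions).
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

mkdir -p local
touch local/file

# %F prints the type of the filesystem that holds each file.
fs_type=$(find . -maxdepth 0 -printf '%F\n')
echo "scratch directory is on: $fs_type"

# Skip every directory that lives on an nfs or proc filesystem;
# print everything else.
listing=$(find . \( -fstype nfs -o -fstype proc \) -prune -o -print)
printf '%s\n' "$listing"
```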






Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


Last modified: March 12, 2019;
