Softpanorama |
May the source be with you, but remember the KISS principle ;-)
Version 3.01
Unix find is a pretty tricky but very useful utility that can often fool even experienced UNIX professionals with ten or more years of sysadmin work under their belt. It can enhance the functionality of those Unix utilities that do not include tree traversal (BTW GNU grep has the -r option for this purpose and can perform tree traversal on its own: grep -r "search string" /tmp). There are several versions of find, the main two being POSIX find used in Solaris, AIX, etc. and GNU find used in Linux. GNU find can be installed on Solaris and AIX, and doing so is strongly recommended: there are some differences between the two, and GNU find has additional capabilities that are often useful.
But find can do more than the simple tree traversal available with option -r (or -R) in many Unix utilities. Traversal provided by find can exclude directory tree branches, can select files or directories using patterns or regular expressions, can be limited to specific types of filesystem, etc. This capability goes far beyond the regular tree traversal of Unix utilities, so find is a real Unix utility: a useful enhancer of the functionality of other utilities, including both those that cannot traverse the directory tree and those that have simple built-in recursive traversal.
The idea behind find is extremely simple: this is a utility for searching files using the directory information, and in this sense it is similar to ls. But it is more powerful than ls, as it can provide "a ride" for other utilities and has an idiosyncratic mini-language for specifying queries, a language which has probably outlived its usefulness but which nobody has the courage to replace with a standard scripting language.
For obscure historical reasons the find mini-language is completely different from all other UNIX commands: it has full-word options rather than single-letter options. For example, instead of a typical Unix-style single-letter option such as -f (as in tar -xvf mytar.tar), find uses the option -name to match filenames. Also, the path to search can consist of multiple starting points, for example
find /usr /bin /sbin /opt -name sar # here we search only the relevant directories
In general you need to specify the set of starting points for a search through the file system first. The first argument starting with "-" is considered to be a start of "find expression". The latter can have side effects if you specified actions in the expression.
It is very important to understand that you can specify more than one directory as a starting point for the search. To look across the /usr and /var/html directory trees for filenames that match the pattern *.htm*, you can use the following command:
find /usr /var/html -name "*.htm*" -print
Please note that you need quotes around any pattern that contains wildcards. Otherwise the shell will expand it immediately against the files in the current directory, before find ever sees it.
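The effect of the quotes is easy to see on a scratch directory. The following is only a sketch: the directory layout and file names are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical scratch tree: one .htm file at the top, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.htm" "$demo/sub/b.htm"
cd "$demo" || exit 1

# Quoted: find itself receives the pattern *.htm and applies it at every level.
quoted=$(find . -name "*.htm" -print | wc -l)

# Unquoted: the shell expands *.htm to a.htm before find starts,
# so find searches only for files literally named a.htm.
unquoted=$(find . -name *.htm -print | wc -l)

echo "quoted matches: $quoted, unquoted matches: $unquoted"
cd / && rm -rf "$demo"
```

With more than one matching file in the current directory the unquoted form usually fails outright, because find then receives several names where it expects a single pattern.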
It is simply impossible to remember all the details of this language unless you construct complex queries each day, and that's why this page was created. Along with this page it makes sense to consult the list of typical (and not so typical) examples which can be found in the Examples page on this site as well as in the links listed in the Webliography. An excellent paper, Advanced techniques for using the UNIX find command, was written by Bill Zimmerly. I highly recommend reading it and then printing it to keep as a reference. Several examples in this tutorial are borrowed from the article.
The full find language is pretty complex and consists of several dozen different predicates and options. There are two versions of this language: one implemented in POSIX find and the second implemented in GNU find, which is a superset of POSIX find. That can make a big difference in complex scripts. But for interactive use the differences are minor: only a small subset of options is typically used on a day-to-day basis by system administrators. Among them:
Use the -iname predicate (GNU find supports it) to run a case-insensitive search, rather than just -name. For example:
$ find . -follow -iname '*.htm' -print0 | xargs -0 -I '{}' mv '{}' ~/webhome
Using -print0 (together with the -0 option of xargs) is simple insurance for the correct processing of files with spaces in their names.
Note: If you use parameters to find in scripts, be careful when the -mtime parameter is equal to zero (-mtime +0). Because file ages are rounded down to whole 24-hour periods, -mtime +0 matches only files modified at least 24 hours ago, while -mtime -1 matches only files modified less than 24 hours ago. As a result, find -mtime +0 -mtime -1 is not equivalent to find -mtime -1: it does not produce any files.
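To find html files that were modified within the last seven 24-hour periods (days), use -mtime with a negative number:
find . -mtime -7 -name "*.html" -print
If you use the number 7 (without a hyphen), find will match only html files that were modified exactly seven 24-hour periods (days) ago:
find . -mtime 7 -name "*.html" -print
To find html files that were modified more than seven 24-hour periods (days) ago, use a plus sign:
find . -mtime +7 -name "*.html" -print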
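The three forms can be checked against files of known age. This sketch assumes GNU touch, whose -d option accepts relative dates; the file names are hypothetical.

```shell
#!/bin/sh
# Assumes GNU touch (-d with relative dates). File names are made up.
demo=$(mktemp -d)
touch -d '72 hours ago'  "$demo/new.html"   # 3 periods old
touch -d '180 hours ago' "$demo/mid.html"   # 7.5 periods old, rounded down to 7
touch -d '240 hours ago' "$demo/old.html"   # 10 periods old
cd "$demo" || exit 1

within=$(find . -mtime -7 -name "*.html" -print)    # fewer than 7 periods
exactly=$(find . -mtime 7 -name "*.html" -print)    # exactly 7 periods
beyond=$(find . -mtime +7 -name "*.html" -print)    # more than 7 periods

printf '%s\n' "-mtime -7: $within" "-mtime 7: $exactly" "-mtime +7: $beyond"
cd / && rm -rf "$demo"
```

Each test should select a different one of the three files, because the age of every file is rounded down to a whole number of 24-hour periods before the comparison is made.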
GNU find can also match the whole path of each file against a pattern with -wholename (a synonym of -path). For example:
find / -wholename '/lib*'
will print entries from directories /lib64 and /lib. To ignore the directories specified, use option -prune. For example, to skip the directory /proc and all files and directories under it (which is important for Linux, as otherwise errors are produced), you can use something like this:
find / -wholename '/proc' -prune -o -name file_to_be_found -print
If you administer a lot of Linux boxes it is better to create alias ff:
if [[ $(uname) == "Linux" ]] ; then
   alias ff="find / -wholename '/proc' -prune -o -name"
else
   alias ff='find / -name' # non-GNU find does not support -wholename
fi
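Pruning is easier to trust after trying it on a small tree. The sketch below uses made-up directory names (skip, keep) and the -path spelling of the test, which GNU find treats as equivalent to -wholename.

```shell
#!/bin/sh
# Hypothetical tree: the same file name exists under skip/ and keep/.
demo=$(mktemp -d)
mkdir -p "$demo/skip/deep" "$demo/keep"
touch "$demo/skip/deep/target" "$demo/keep/target"
cd "$demo" || exit 1

# Without -prune both copies are found.
unpruned=$(find . -name target -print | wc -l)

# With -prune find never descends into ./skip. The explicit -print matters:
# without it the pruned directory itself would be printed too.
pruned=$(find . -path ./skip -prune -o -name target -print)

echo "without prune: $unpruned matches; with prune: $pruned"
cd / && rm -rf "$demo"
```

Note that the pattern has to match the path exactly as find builds it from the starting point, which is why it is written ./skip when the search starts from the current directory.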
Other useful options of the find command include:
-size attribute lets you specify how big the files should be to match. You can specify your size in kilobytes and optionally also use + or - to specify a size greater than or less than the specified argument. For example:
find /home -name "*.txt" -size 100k
find /home -name "*.txt" -size +100k
find /home -name "*.txt" -size -100k
The first brings up files of exactly 100KB, the second only files greater than 100KB, and the last only files less than 100KB.
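A small experiment with files of known size shows the three variants side by side. This is a sketch only: the file names are invented, and the k suffix is assumed to be supported (it is in GNU and BSD find).

```shell
#!/bin/sh
# Create three files of 1 KB, 100 KB and 200 KB (names are hypothetical).
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/small.txt" bs=1024 count=1   2>/dev/null
dd if=/dev/zero of="$demo/exact.txt" bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$demo/big.txt"   bs=1024 count=200 2>/dev/null
cd "$demo" || exit 1

equal=$(find . -name "*.txt" -size 100k -print)     # exactly 100 units of 1 KB
larger=$(find . -name "*.txt" -size +100k -print)   # more than 100 units
smaller=$(find . -name "*.txt" -size -100k -print)  # fewer than 100 units

printf '%s\n' "100k: $equal" "+100k: $larger" "-100k: $smaller"
cd / && rm -rf "$demo"
```

Sizes are rounded up to whole units before the comparison, so a file only a few bytes long still counts as 1k.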
To get a list of directories you can use the -type specifier. Here's one example:
find . -type d -print
It is possible to locate files and directories that match or do not match multiple conditions forming complex logical expression. Expressions can contain "escaped parentheses": parentheses have a special meaning in shell, so we need to escape that meaning, and write them as \( and \) or inside of single quotes as '(' and ')'. You cannot use single quotes around the entire expression though, as that will confuse the find command. It wants each predicate as its own word. For example:
find . \! -name "*.gz" -exec gzip {} \;
find / -type f \( -perm -4000 -o -perm -2000 \) -exec ls -l {} \;
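The parentheses matter because the implicit AND binds more tightly than -o. The sketch below, with made-up file names, runs the same tests with and without grouping.

```shell
#!/bin/sh
# Hypothetical tree: two source files, a text file,
# and a directory whose name happens to end in .h.
demo=$(mktemp -d)
mkdir "$demo/inc.h"
touch "$demo/a.c" "$demo/b.h" "$demo/notes.txt"
cd "$demo" || exit 1

# Grouped: regular files whose name matches either pattern.
grouped=$(find . -type f \( -name "*.c" -o -name "*.h" \) -print | sort)

# Ungrouped: read as (-type f -name "*.c") -o (-name "*.h" -print),
# so a.c is never printed and the directory inc.h slips through.
ungrouped=$(find . -type f -name "*.c" -o -name "*.h" -print | sort)

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
cd / && rm -rf "$demo"
```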
By default options are concatenated using AND predicate. For example, if you want to obtain a list of all files accessed in the last 24 hours, execute the following command (with or without -print option):
find . -atime 0 -print
If the system administrator wants a list of the .profile files used by all users, the following command should be executed:
find / -name .profile -print
You can also specify multiple "AND" conditions (the AND logical condition is the default, so you do not need to specify it explicitly). If you wanted to find a list of files that have been modified in the last 24 hours and which have a permission of 777, you would execute the following command:
find . -perm 777 -mtime 0 -print
Which is the same as:
find . -perm 777 -a -mtime 0 -a -print
The find command checks the specified options, going from left to right, once for each file or directory encountered.
The simplest invocation of find can be used to create a list of all files and directories below the current directory:
find . -print
You can use shell-style patterns (wildcards) to select files, for example those that have a .html suffix:
find . -name "*.html" -print
An important thing to understand is that, unless the -daystart option is used [GNU find only], time in Unix find is measured in 24-hour periods (fractions are allowed in GNU find) counted back from the current moment.
Those 24-hour periods are usually called "days", but the definition of "days" used in find is different from common usage (calendar days, where each new day starts at midnight). The "day" in the "find language" is interpreted as a "24-hour period counted back from the current time". Here is how working with time ranges is described in the GNU find documentation (Finding Files):
2.3.1 Age Ranges
These tests are mainly useful with ranges (+n and -n).
Test: -atime n
Test: -ctime n
Test: -mtime n
True if the file was last accessed (or its status changed, or it was modified) n*24 hours ago. The number of 24-hour periods since the file's timestamp is always rounded down; therefore 0 means less than 24 hours ago, 1 means between 24 and 48 hours ago, and so forth. Fractional values are supported but this only really makes sense for the case where ranges (+n and -n) are used.
Test: -amin n
Test: -cmin n
Test: -mmin n
True if the file was last accessed (or its status changed, or it was modified) n minutes ago. These tests provide finer granularity of measurement than -atime et al., but rounding is done in a similar way (again, fractions are supported). For example, to list files in /u/bill that were last read from 2 to 6 minutes ago:
find /u/bill -amin +2 -amin -6
Option: -daystart
Measure times from the beginning of today rather than from 24 hours ago. So, to list the regular files in your home directory that were modified yesterday, do
find ~/ -daystart -type f -mtime 1
The -daystart option is unlike most other options in that it has an effect on the way that other tests are performed. The affected tests are -amin, -cmin, -mmin, -atime, -ctime and -mtime. The -daystart option only affects the behavior of any tests which appear after it on the command line.
2.3.2 Comparing Timestamps
Test: -newerXY reference
Succeeds if timestamp X of the file being considered is newer than timestamp Y of the file reference. The letters X and Y can be any of the following letters:
- a: Last-access time of reference
- B: Birth time of reference (when this is not known, the test cannot succeed)
- c: Last-change time of reference
- m: Last-modification time of reference
- t: The reference argument is interpreted as a literal time, rather than the name of a file. See Date input formats, for a description of how the timestamp is understood. Tests of the form -newerXt are valid but tests of the form -newertY are not.
For example the test
-newerac /tmp/foo
succeeds for all files which have been accessed more recently than /tmp/foo was changed. Here X is a and Y is c.
Not all files have a known birth time. If Y is b and the birth time of reference is not available, find exits with an explanatory error message. If X is b and we do not know the birth time the file currently being considered, the test simply fails (that is, it behaves like -false does).
Some operating systems (for example, most implementations of Unix) do not support file birth times. Some others, for example NetBSD-3.1, do. Even on operating systems which support file birth times, the information may not be available for specific files. For example, under NetBSD, file birth times are supported on UFS2 file systems, but not UFS1 file systems.
There are two ways to list files in /usr modified after February 1 of the current year. One uses -newermt:
find /usr -newermt "Feb 1"
The other way of doing this works on the versions of find before 4.3.3:
touch -t 02010000 /tmp/stamp$$
find /usr -newer /tmp/stamp$$
rm -f /tmp/stamp$$
Test: -anewer file
Test: -cnewer file
Test: -newer file
True if the file was last accessed (or its status changed, or it was modified) more recently than file was modified. These tests are affected by -follow only if -follow comes before them on the command line. See Symbolic Links, for more information on -follow. As an example, to list any files modified since /bin/sh was last modified:
find . -newer /bin/sh
Test: -used n
True if the file was last accessed n days after its status was last changed. Useful for finding files that are not being used, and could perhaps be archived or removed to save disk space.
The find utility supports AND, OR, and NOT expressions, so you can select files created or modified during certain intervals. For example, to find files that are more than one week (7 days) old but less than 30 days old, you can combine the predicates like this:
$ find . -mtime +7 -a -mtime -30 -print
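It is easy to get the two bounds of such a range the wrong way round, so it helps to test the expression against files of known age first. This sketch assumes GNU touch (for -d with relative dates) and uses invented file names.

```shell
#!/bin/sh
# Assumes GNU touch. Three files: 3, 10 and 40 days old (names are made up).
demo=$(mktemp -d)
touch -d '72 hours ago'  "$demo/three_days"
touch -d '240 hours ago' "$demo/ten_days"
touch -d '960 hours ago' "$demo/forty_days"
cd "$demo" || exit 1

# Older than 7 periods AND younger than 30 periods.
in_range=$(find . -type f -mtime +7 -a -mtime -30 -print)

# Bounds swapped: no file can be older than 30 and younger than 7 periods.
swapped=$(find . -type f -mtime +30 -a -mtime -7 -print)

echo "in range: ${in_range:-none}; swapped bounds: ${swapped:-none}"
cd / && rm -rf "$demo"
```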
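Find can perform various actions on the files or directories that it finds. Among the most commonly used actions are -print, -ls and -exec.
Find is able to execute one or more commands for each file it has found with the -exec option. Unfortunately, one cannot simply enter the command. You need to remember two syntactic tricks: the string {} stands for the name of the file currently being processed, and the command must be terminated with a semicolon that is escaped or quoted (\; or ';') so that the shell passes it on to find.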
For example:
find . -type d -exec ls -ld {} \;
Here are several "global" chmod tricks based on find -exec capabilities:
find . -type f -exec chmod 500 {} ';'
This command will search in the current directory and all sub directories and change
permissions of each file as specified.
find . -name "*rc.conf" -exec chmod o+r '{}' \;
find . -name "*rc.conf" -exec chmod o+r '{}' ';'
Either form of this command will search in the current directory and all subdirectories. All files whose names match *rc.conf will be processed by the chmod o+r command. The argument '{}' is a macro that expands to each found file. The \; (or ';') argument indicates that the -exec command has ended.
The end result of this command is that all *rc.conf files have the "other" permissions set to read access (if the operator is the owner of the file).
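The effect of such a command can be verified with find itself, using -perm to list the files that ended up readable by others. The sketch below works on a scratch directory with hypothetical file names.

```shell
#!/bin/sh
# Hypothetical files, all starting out unreadable by "other".
demo=$(mktemp -d)
mkdir "$demo/etc"
touch "$demo/rc.conf" "$demo/etc/syslog-rc.conf" "$demo/other.conf"
chmod 600 "$demo/rc.conf" "$demo/etc/syslog-rc.conf" "$demo/other.conf"
cd "$demo" || exit 1

# One chmod process is started for every matching file.
find . -name "*rc.conf" -exec chmod o+r '{}' \;

# -perm -004 is true when the "other read" bit is set.
readable=$(find . -type f -perm -004 -print | sort)

printf 'readable by others:\n%s\n' "$readable"
cd / && rm -rf "$demo"
```

Only the two files matching *rc.conf should be listed; other.conf keeps its original mode.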
Note: The -print option will print out the path of any file that is found with that name. In general -print is the default action when no other action is specified.
The find command is commonly used to remove core files that are more than a few 24-hour periods (days) old. These core files are copies of the actual memory image of a running program when the program dies unexpectedly. They can be huge, so occasionally trimming them is wise:
find . -name core -ctime +4 -exec /bin/rm -f {} \;
For grep, the /dev/null argument can be used to show the name of the file before the text that is found. Without it, only the text found is printed. An equivalent mechanism in GNU grep is to use the "-H" or "--with-filename" option:
find /tmp -exec grep "search string" '{}' /dev/null \; -print
An alternative to -exec option is piping output into xargs command which we will discuss in the next section.
One of the biggest limitations of the -exec option (or predicate with the side effect to be more correct) is that it can only run the specified command on one file at a time. The xargs command solves this problem by enabling users to run a single command on many files at one time. In general, it is much faster to run one command on many files, because this cuts down on the number of invocations of particular command/utility.
For example, one often needs to find files containing a specific pattern in multiple directories. One can use the -exec option in find for that (please note that you should use the -l flag for grep so that grep prints the matched filenames):
find . -type f -exec grep -li '/bin/ksh' {} \;
But there is a more elegant and more Unix-like way of accomplishing the same task using xargs and pipes. You can use xargs to read the output of find and build a pipeline that invokes grep. This way, grep is called only four or five times even though it might check through 200 or 300 files. By default, xargs always appends the list of filenames to the end of the specified command, so using it is as easy as can be:
find . -type f -print | xargs grep -li 'bin/ksh'
This gives the same output, but it is a lot faster. Also, when grep is given multiple filenames, it automatically prefixes each matching line with the name of the file that contains it, so the -l option can be dropped if you want to see the matching lines as well:
find . -type f -print | xargs grep -i 'bin/ksh'
When used in combination, find, grep, and xargs are a potent team to help find files lost or misplaced anywhere in the UNIX file system. I encourage you to experiment further with these important commands to find ways they can help you work with UNIX. You can use time to measure the difference in speed between the -exec option and xargs in the following way:
time find /usr/src -name "*.html" -exec grep -l foo '{}' ';' | wc -l
time find /usr/src -name "*.html" | xargs grep -l foo | wc -l
xargs works considerably faster. The difference becomes even greater when more complex commands are run and the list of files is longer.
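The speed difference comes from the number of processes started, and that number can be counted directly. In this sketch the helper sh -c 'echo run' is a stand-in for a real command and prints one line per invocation. The third form, -exec ... {} +, is a POSIX variant of -exec that batches file names much as xargs does.

```shell
#!/bin/sh
# Five hypothetical files; the helper prints "run" once per invocation.
demo=$(mktemp -d)
touch "$demo/f1" "$demo/f2" "$demo/f3" "$demo/f4" "$demo/f5"
cd "$demo" || exit 1

# -exec ... \;  starts the command once for every file.
per_file=$(find . -type f -exec sh -c 'echo run' sh {} \; | wc -l)

# xargs collects the names and starts the command once per batch.
via_xargs=$(find . -type f -print | xargs sh -c 'echo run' sh | wc -l)

# -exec ... {} +  batches the names inside find itself.
batched=$(find . -type f -exec sh -c 'echo run' sh {} + | wc -l)

echo "per file: $per_file, xargs: $via_xargs, -exec +: $batched"
cd / && rm -rf "$demo"
```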
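The same approach works for removing files:
find /mnt/zip -name "*prefs copy" -print | xargs rm
This is actually dangerous if you have a filename with spaces, because xargs splits its input on whitespace. If you use option -print0 in find together with option -0 in xargs, you can avoid this danger:
find /mnt/zip -name "*prefs copy" -print0 | xargs -0 rm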
Two other useful options for xargs are the -p option, which makes xargs interactive, and the -n args option, which makes xargs run the specified command with only args number of arguments.
Some people wonder why there is a -p option. xargs runs the specified command on the filenames from its standard input, so interactive commands such as cp -i, mv -i, and rm -i don't work right. The -p option solves that problem. In the preceding example, the -p option would have made the command safe because I could answer yes or no to each file. Thus, the command I typed was the following:
find /mnt/zip -name "*prefs copy" -print0 | xargs -0 -p rm
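The danger with spaces can be reproduced safely in a scratch directory. The sketch below uses a made-up file name and assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do).

```shell
#!/bin/sh
# One hypothetical file whose name contains spaces.
demo=$(mktemp -d)
touch "$demo/my prefs copy"
cd "$demo" || exit 1

# Unsafe: xargs splits the name on spaces, so rm is asked to remove
# "./my", "prefs" and "copy", none of which exist.
find . -name "*prefs copy" -print | xargs rm 2>/dev/null
if [ -e "my prefs copy" ]; then after_unsafe=present; else after_unsafe=gone; fi

# Safe: names are separated by NUL bytes and passed through intact.
find . -name "*prefs copy" -print0 | xargs -0 rm
if [ -e "my prefs copy" ]; then after_safe=present; else after_safe=gone; fi

echo "after unsafe run: $after_unsafe, after safe run: $after_safe"
cd / && rm -rf "$demo"
```

In the unsafe case the failure is harmless only because no files named prefs or copy exist; had they existed, they would have been deleted instead.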
Many users frequently ask why xargs should be used when shell command substitution achieves the same results. Take a look at this example:
grep -l foo `find /usr/src/linux -name "*.html"`
The drawback with commands such as this is that if the set of files returned by find is longer than the system's command-line length limit, the command will fail. The xargs approach gets around this problem because xargs runs the command as many times as is required, instead of just once.
Suid root refers to a special attribute called set user id. This allows the program to do functions not normally allowed for users to do themselves. Low level networking routines, controlling graphical display functions, changing passwords, and logging in are all examples of programs that rely on executing their functions as a user that is not restricted by standard file permissions. While many programs need this functionality, the program must be bug free in only allowing the user to do the function the program was designed for. Every SUID root program represents a potential security problem.
The first step in controlling SUID root programs is to have a baseline, the list of all SUID programs in the system. This can be achieved quite easily by using find:
find / -type f -perm /6000 -exec ls -l {} \; > suid.list
(note: this will find both set user id and set group id programs)
The above command uses GNU find and executes the ls command. Current versions of GNU find spell the "any of these bits" test as -perm /6000; older versions also accepted -perm +6000. You can use the -ls option instead, but the output will be slightly different. With the POSIX find shipped with Solaris the command is different:
find / -type f \( -perm -4000 -o -perm -2000 \) -exec ls -l {} \;
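Before scanning a whole system it can be reassuring to try the expression on a scratch directory. The sketch below sets the set-user-id bit on a made-up, empty file; it assumes the temporary directory lives on a filesystem that stores that bit.

```shell
#!/bin/sh
# Two hypothetical files; only one gets the set-user-id bit.
demo=$(mktemp -d)
touch "$demo/plain" "$demo/suid_prog"
chmod 755  "$demo/plain"
chmod 4755 "$demo/suid_prog"
cd "$demo" || exit 1

# -perm -4000 is true when the SUID bit is set, -perm -2000 when SGID is set.
found=$(find . -type f \( -perm -4000 -o -perm -2000 \) -print)

echo "flagged: $found"
cd / && rm -rf "$demo"
```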
These commands find all the SUID programs on a system; the first one also redirects the output to a file called suid.list. The next step in controlling SUID root programs is to analyze which programs should not be SUID root or can be removed without impeding system functionality. An obvious example of something that should not be SUID root is /usr/X11R6/bin/SuperProbe. This is a program merely used for testing purposes. Its SUID bit can be removed with:
chmod -s /usr/X11R6/bin/SuperProbe
Other programs that are unneeded to be SUID root include anything in the svgalib hierarchy. This library itself is buggy and nothing that depends on it should be SUID root in a secure system.
Here is an example of a minimized suid.list, though perhaps a little overzealous. For example, with this setup regular users lose the ability to use ping and traceroute, which is typical security paranoia overkill. It can be compensated by controlling access to those programs via sudo, but this is a road to nowhere. In any case, minimizing the number of SUID programs is a task worth trying. It is excessive zeal that hurts...
To find all world writable directories:
find / -perm -0002 -type d -print
To find all world writable files:
find / -perm -0002 -type f -print
or, to list everything that is world writable except symbolic links:
find / -perm -2 ! -type l -ls
Find files with messed up UID or GID
find / \( -nouser -o -nogroup \) -print
Find links that point to nothing
To find links that point to nothing, use the perl interpreter with find, like this:
$ find / -type l -print | perl -nle '-e || print'
This command starts at the topmost directory (/) and lists all links (-type l -print) that the perl interpreter determines point to nothing (-nle '-e || print'); this tip comes from the Unix Guru Universe site. You can further pipe the output to rm -f (for example through xargs) if you want to delete the links. Perl is, of course, one of the many powerful interpretive language tools also found in most UNIX toolkits.
To list all zero-length files, use this command:
$ find . -empty -exec ls {} \;
After finding empty files, you might choose to delete them by replacing the ls command with the rm command.
Clean out core dumps and temporary files
You can also use find to delete temporary files, whatever they may be. For example:
find . \( -name a.out -o -name '*.o' -o -name 'core' \) -exec rm {} \;
Using find for backups
The find command lets you copy the entire contents of a directory while preserving the permissions, times, and ownership of every file and subdirectory. Because find can specify complex criteria for files, it can create a perfect list of files for cpio, tar, pax and other archivers to back up. Fortunately find has several options that are very useful for structuring the backup, such as -mount, -fstype and -newer.
The typical usage is to combine find and the cpio command, as the latter accepts the list of files via standard input. Tar can do this too with the -T - option. Typically each mount point is backed up in a separate tar or cpio archive.
cd /usr
find . -mount -fstype ext3 -print | cpio -ov > /backup/usr080124.cpi
or, using tar:
find /usr -mount -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr080124.tgz -T -
It is also possible to do incremental backups using the -newer option:
find /usr -newer /backup/usr080124.tgz -mount -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr_delta080124.tgz -T -
You can also try to avoid errors in backing up named pipes, devices, etc. by using more complex traversal expressions, for example
find / -mount -fstype ext3 \( -type f -or -type l \) > /tmp/root_list.txt
The problem here is with hard-linked files. But that is a problem of tar, not find. The cpio command is a more sophisticated backup tool than tar. It is harder to use, but is capable of copying special files (such as devices and links) consistently, and will accept wildcard characters when listing the files to be archived.
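The full-plus-incremental scheme can be rehearsed on a scratch tree before it is pointed at a real filesystem. The sketch below assumes GNU tar (for --null, --no-recursion and -T -); the file names and the stamp file standing in for the previous archive are hypothetical.

```shell
#!/bin/sh
# Assumes GNU tar. A stamp file dated 2020 plays the role of the last backup.
demo=$(mktemp -d)
mkdir -p "$demo/src/sub"
touch -t 201901010000 "$demo/src/old file.txt"   # older than the stamp
touch "$demo/src/sub/new.txt"                    # newer than the stamp
touch -t 202001010000 "$demo/stamp"
cd "$demo/src" || exit 1

# Full backup: every regular file, names separated by NUL bytes.
find . -type f -print0 |
    tar --null --no-recursion -czf "$demo/full.tgz" -T -

# Incremental backup: only files modified after the stamp.
find . -type f -newer "$demo/stamp" -print0 |
    tar --null --no-recursion -czf "$demo/delta.tgz" -T -

full_count=$(tar -tzf "$demo/full.tgz" | wc -l)
delta_list=$(tar -tzf "$demo/delta.tgz")

echo "full archive: $full_count files; delta archive: $delta_list"
cd / && rm -rf "$demo"
```

The --no-recursion option is there because find already supplies every name; without it tar would descend into any directory on the list and store its contents a second time.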
On a higher level you might benefit from excluding all files that are unchanged from the RPMs from which the system was installed. This is the approach taken by the backup module built into YaST (it uses tar, not cpio). Tar does not read the list of files from standard input by default, but it has the -T option, which specifies a file containing the list of files to be tarred (-T - reads that list from standard input). Here is how this option is described in the manual:
Instead of giving the names of files or archive members on the command line, you can put the names into a file, and then use the --files-from=file-of-names (-T file-of-names) option to tar. Give the name of the file which contains the list of files to include as the argument to --files-from. In the list, the file names should be separated by newlines. You will frequently use this option when you have generated the list of files to archive with the find utility.
... ... ...
In the file list given by -T option, any file name beginning with - character is considered a tar option and is processed accordingly. For example, the common use of this feature is to change to another directory by specifying -C option:
$ cat list
-C/etc
passwd
hosts
-C/lib
libc.a
$ tar -c -f foo.tar --files-from list
For example, if we want to archive files that are smaller than 1000 bytes, we can first create a list of such files using find and then use tar to create an archive.
find . -type f -size -1000c -print > /etc/small-files
tar -cvzT /etc/small-files -f little.tgz
You can also compress the archive with gzip on the fly:
tar -zPvcf backup.tar.gz -T list_of_files_to_be_tarred_or_list_of_locations
You will want to use the --label=archive-label (-V archive-label) option to give the archive a volume label, so you can tell what this archive is even if the label falls off the tape, or anything like that.
Unless the file system you are dumping is guaranteed to fit on one volume, you might need to use the --multi-volume (-M) option.
Like find, tar has an option that prevents it from crossing filesystem (partition) boundaries when storing (sub)directories: --one-file-system. It also has the --incremental (-G) option (see section Using tar to Perform Incremental Dumps).
Clearly, your use of the UNIX find command is limited only by your
knowledge and creativity. The find command has a lot of options, and to get
the full power out of find, xargs, and grep, you need to experiment. Among other
things you can specify:
| Option | Meaning | Example |
|---|---|---|
| -atime n, -atime +n, -atime -n | True if file was accessed exactly n 24-hour periods (days) ago (n), more than n 24-hour periods (days) ago (+n) or less than n 24-hour periods (days) ago (-n). | find . -atime 0 -print |
| -ctime n | True if the file's status (inode information) was last changed n 24-hour periods (days) ago. | find . -ctime +30 -type f -exec rm {} ';' |
| -exec command | Execute command. | find . -mtime -2 -type f -exec mv {} ../Spam_collector \; |
| -mtime n | True if file was modified n 24-hour periods (days) ago. | find . -mtime -2 -type f -exec mv {} ../Spam_collector \; |
| -name pattern | True if filename matches pattern. | find . -name "*.html" -print |
| -print | Print names of files found. | find . -print |
| -size n | True if file size is n units (for example 100k); +n means greater, -n means less. | find /home -name "*.txt" -size +100k |
| -type c | True if file is of type c. | find . -mtime -2 -type f -exec mv {} ../Spam_collector \; |
| -user name | True if file is owned by user name. | |
Multiple options are joined by AND by default. OR may be specified with the -o flag and the use of grouped parentheses. For example, to match all files modified more than 90 24-hour periods (days) ago or accessed more than 30 24-hour periods (days) ago, use
\( -mtime +90 -o -atime +30 \)
NOT should be specified with a backslash before the exclamation point. For example, to match all files ending in .txt except those whose names start with a lowercase letter (a-z), use:
\! -name "[a-z]*" -name "*.txt"
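A quick trial on a few made-up file names shows which ones survive the negation. The sketch sets LC_ALL=C so that the range [a-z] means exactly the ASCII lowercase letters; in other locales the range may behave differently.

```shell
#!/bin/sh
# Hypothetical file names: upper case, lower case, digit, and a non-.txt file.
demo=$(mktemp -d)
touch "$demo/Alpha.txt" "$demo/beta.txt" "$demo/1.txt" "$demo/Readme.md"
cd "$demo" || exit 1

# NOT (name starts with a lowercase letter) AND (name ends in .txt)
kept=$(LC_ALL=C find . \! -name "[a-z]*" -name "*.txt" -print | LC_ALL=C sort)

printf 'matched:\n%s\n' "$kept"
cd / && rm -rf "$demo"
```

Here beta.txt is rejected by the negated test and Readme.md by the second test, leaving the other two files.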
Note: More extensive list of links can be found at Softpanorama Unix Find Page
Advanced techniques for using the UNIX find command --
an excellent article by Bill Zimmerly from which several
examples were borrowed.
Finding Files - Table of Contents by David MacKenzie Edition 1.1, for GNU
find version 4.1 November 1994
Find tutorial
Copyright 2001 Bruce Barnett and General Electric Company
Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Created: May 16, 1997; Last modified: May 01, 2008