Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)


Unix Find Command Tutorial

Dr. Nikolai Bezroukov

Version 3.01


Introduction

 


Unix find is a tricky but very useful utility that can fool even experienced UNIX professionals with ten or more years of sysadmin work under their belts. It extends those Unix utilities that do not include tree traversal (BTW, GNU grep has the -r option for this purpose and can perform a tree traversal on its own: grep -r "search string" /tmp). There are several versions of find; the two main ones are POSIX find, used in Solaris, AIX, etc., and GNU find, used in Linux. GNU find can be installed on Solaris and AIX, and doing so is strongly recommended: the two differ in some details, and GNU find has additional capabilities that are often useful.

But find can do more than the simple tree traversal available with option -r (or -R) in many Unix utilities. A traversal performed by find can exclude branches of the directory tree, can select files or directories using regular expressions, can be limited to specific types of filesystem, etc. This capability goes far beyond the regular tree traversal of Unix utilities, so find is a real Unix utility: a useful enhancer of the functionality of other utilities, both those that cannot traverse the directory tree at all and those that have a simple built-in recursive traversal.

The idea behind find is extremely simple: it is a utility for searching for files using directory information, and in this sense it is similar to ls. But it is more powerful than ls, as it can provide "a ride" for other utilities and has an idiosyncratic mini-language for specifying queries, a language which has probably outlived its usefulness but which nobody has the courage to replace with a standard scripting language.

For obscure historical reasons the find mini-language is completely different from the conventions of other UNIX commands: it uses full-word options rather than single-letter options. For example, instead of a typical Unix-style single-letter option such as -f (as in tar -xvf mytar.tar), find uses the option -name to match filenames. Also, the path to search can consist of multiple starting points, for example:

find /usr /bin /sbin /opt -name sar # here we exclude non-relevant directories

In general you need to specify the set of starting points for a search through the file system first. The first argument starting with "-" is considered to be a start of "find expression". The latter can have side effects if you specified actions in the expression.

It is very important to understand that you can specify more than one directory as a starting point for the search. To look across the /usr and /var/html directory trees for filenames that match the pattern *.htm*, you can use the following command:

find /usr /var/html -name "*.htm*" -print

Please note that you need to quote any pattern that contains wildcards (and any regular expression). Otherwise the shell may expand it in the current directory before find ever sees it.
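The effect of quoting can be seen on a small scratch tree. The following is only a sketch: the directory created by mktemp and the file names index.htm, page.html and notes.txt are made up for illustration.

```shell
# Scratch tree: one match in the top directory, one in a subdirectory.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/index.htm" "$demo/sub/page.html" "$demo/sub/notes.txt"

# Quoted: find receives the pattern *.htm* and applies it in every directory.
quoted=$(cd "$demo" && find . -name "*.htm*" -print | sort)
echo "quoted:   $quoted"

# Unquoted: the shell expands *.htm* to index.htm before find starts,
# so find only looks for files named exactly index.htm.
unquoted=$(cd "$demo" && find . -name *.htm* -print | sort)
echo "unquoted: $unquoted"

rm -rf "$demo"
```

With the unquoted pattern the file in the subdirectory is silently missed; if the pattern had matched several files in the current directory, find would have stopped with a syntax error instead.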

It is simply impossible to remember all the details of this language unless you construct complex queries every day, and that is why this page was created. Along with this page it makes sense to consult the list of typical (and not so typical) examples, which can be found on the Examples page on this site as well as in the links listed in the Webliography. An excellent paper, Advanced techniques for using the UNIX find command, was written by Bill Zimmerly. I highly recommend reading it, then printing it and keeping it as a reference. Several examples in this tutorial are borrowed from that article.

The full find language is pretty complex and consists of several dozen different predicates and options. There are two versions of this language: one implemented in POSIX find and a second implemented in GNU find, which is a superset of POSIX find. That can make a big difference in complex scripts. But for interactive use the differences are minor: only a small subset of options is typically used on a day-to-day basis by system administrators.

Useful options of the find command include:

  1. -regex regex [GNU find only] File name matches the regular expression. This is a match on the whole pathname, not just the filename. Stupidly enough, the default regular expressions understood by find are Emacs regular expressions, not Perl regular expressions. The -iregex option performs the same match ignoring case.
  2. -perm permissions Locates files with certain permission settings. Often used for finding world-writable files or SUID files. See below.
  3. -user Locates files that have the specified owner. Option -nouser locates files without an owner: files for which no user in /etc/passwd corresponds to the file's numeric user ID (UID). Such files are often created when a tar or zip archive is transferred from another server, on which the account probably exists under a different UID.
  4. -group Locates files that are owned by the specified group. Option -nogroup locates files for which no group corresponds to the file's numeric group ID (GID).
  5. -size Locates files of the specified size. You can specify the size in kilobytes (suffix k) and optionally prefix it with + or - to match sizes greater than or less than the specified argument. For example:
    find /home -name "*.txt" -size 100k 
    find /home -name "*.txt" -size +100k 
    find /home -name "*.txt" -size -100k 

    The first brings up files of 100KB, the second only files greater than 100KB, and the last only files less than 100KB. Note that GNU find rounds the file size up to the next whole unit before comparing, so -size 100k also matches a file of 99.5KB, and -size -100k matches only files of 99KB or less.

  6. -ls Lists the current file in `ls -dils' format on standard output.
  7. -type Locates a certain type of file. The most commonly used arguments are f (regular file), d (directory) and l (symbolic link).

    For example, to get a list of directories you can use the -type specifier like this:

    find . -type d -print
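The -size and -type tests from the list above can be tried without touching real data. This is a minimal sketch that assumes GNU find (the k suffix and its round-up behaviour are not part of POSIX find); the file names are invented for the demonstration.

```shell
demo=$(mktemp -d)
mkdir "$demo/dir"
: > "$demo/empty.txt"                     # 0 bytes
printf '%1024s' '' > "$demo/one_k.txt"    # exactly 1024 bytes, counts as 1k
printf '%2000s' '' > "$demo/big.txt"      # 2000 bytes, rounded up to 2k

exact=$(cd "$demo" && find . -type f -size 1k | sort)     # size is 1 unit
larger=$(cd "$demo" && find . -type f -size +1k | sort)   # more than 1 unit
smaller=$(cd "$demo" && find . -type f -size -1k | sort)  # fewer than 1 unit
dirs=$(cd "$demo" && find . -type d | sort)               # directories only

echo "exactly 1k: $exact"
echo "over 1k:    $larger"
echo "under 1k:   $smaller"
echo "dirs:       $dirs"
rm -rf "$demo"
```

Because of the rounding, -size -1k selects only the empty file here. To compare exact byte counts, use the c suffix instead, for example -size -1024c.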

Find logical expressions

It is possible to locate files and directories that match, or do not match, multiple conditions that form a complex logical expression. Expressions can contain "escaped parentheses": parentheses have a special meaning in the shell, so we need to escape that meaning and write them as \( and \), or inside single quotes as '(' and ')'. You cannot put single quotes around the entire expression, though, as that will confuse the find command: it wants each predicate as its own word.

By default, options are combined using the AND operator. For example, if you want to obtain a list of all files accessed in the last 24 hours, execute the following command (with or without the -print option):

find . -atime 0 -print

If the system administrator wants a list of the .profile files used by all users, the following command should be executed:

find / -name .profile -print

You can also specify multiple "AND" conditions (the AND logical condition is the default, so you do not need to specify it explicitly). If you want to find a list of files that have been modified in the last 24 hours and that have permissions of 777, execute the following command:

find . -perm 777  -mtime 0 -print
Which is the same as:
find . -perm 777 -a -mtime 0 -a -print

The find command checks the specified options, going from left to right, once for each file or directory encountered.
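The AND, OR and NOT combinations described in this section can be compared side by side on a scratch tree. The sketch below uses invented file names; only the find syntax itself comes from the text above.

```shell
demo=$(mktemp -d)
mkdir "$demo/docs"
touch "$demo/a.html" "$demo/b.txt" "$demo/docs/c.html" "$demo/docs/d.log"

# Implicit AND: regular files whose names end in .html.
and_result=$(cd "$demo" && find . -type f -name "*.html" | sort)

# OR needs escaped parentheses so that the grouping reaches find
# instead of being interpreted by the shell.
or_result=$(cd "$demo" && find . -type f \( -name "*.txt" -o -name "*.log" \) | sort)

# NOT: every regular file except the .html ones.
not_result=$(cd "$demo" && find . -type f ! -name "*.html" | sort)

printf 'AND:\n%s\nOR:\n%s\nNOT:\n%s\n' "$and_result" "$or_result" "$not_result"
rm -rf "$demo"
```

In this tree the OR and NOT queries happen to select the same two files, which is a handy way to check that a rewritten expression still means what the original did.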

The simplest invocation of find can be used to create a list of all files and directories below the current directory:

find . -print

You can use shell wildcard patterns to select files, for example those that have a .html suffix:

find . -name "*.html" -print

Selecting files using their age

It is important to understand that unless the -daystart option is used [GNU find only], time in Unix find is measured in 24-hour periods (fractions are allowed in GNU find) counted back from the current moment.

Those 24-hour periods are usually called "days", but the definition of "day" used in find is different from common usage (calendar days, where each new day starts at midnight). A "day" in the find language is interpreted as a "24-hour period counted back from the current time". Here is how working with time ranges is described in the GNU find documentation (Finding Files):

2.3.1 Age Ranges

These tests are mainly useful with ranges (‘+n’ and ‘-n’).

— Test: -atime n
— Test: -ctime n
— Test: -mtime n

True if the file was last accessed (or its status changed, or it was modified) n*24 hours ago. The number of 24-hour periods since the file's timestamp is always rounded down; therefore 0 means “less than 24 hours ago”, 1 means “between 24 and 48 hours ago”, and so forth. Fractional values are supported but this only really makes sense for the case where ranges (‘+n’ and ‘-n’) are used.

— Test: -amin n
— Test: -cmin n
— Test: -mmin n

True if the file was last accessed (or its status changed, or it was modified) n minutes ago. These tests provide finer granularity of measurement than ‘-atime’ et al., but rounding is done in a similar way (again, fractions are supported). For example, to list files in /u/bill that were last read from 2 to 6 minutes ago:

          find /u/bill -amin +2 -amin -6
— Option: -daystart
 

Measure times from the beginning of today rather than from 24 hours ago. So, to list the regular files in your home directory that were modified yesterday, do

          find ~/ -daystart -type f -mtime 1

The ‘-daystart’ option is unlike most other options in that it has an effect on the way that other tests are performed. The affected tests are ‘-amin’, ‘-cmin’, ‘-mmin’, ‘-atime’, ‘-ctime’ and ‘-mtime’. The ‘-daystart’ option only affects the behavior of any tests which appear after it on the command line.


2.3.2 Comparing Timestamps

— Test: -newerXY reference
 

Succeeds if timestamp ‘X’ of the file being considered is newer than timestamp ‘Y’ of the file reference. The letters ‘X’ and ‘Y’ can be any of the following letters:

‘a’
Last-access time of reference
 
‘B’
Birth time of reference (when this is not known, the test cannot succeed)
 
‘c’
Last-change time of reference
 
‘m’
Last-modification time of reference
 
‘t’
The reference argument is interpreted as a literal time, rather than the name of a file. See Date input formats, for a description of how the timestamp is understood. Tests of the form ‘-newerXt’ are valid but tests of the form ‘-newertY’ are not.

For example the test -newerac /tmp/foo succeeds for all files which have been accessed more recently than /tmp/foo was changed. Here ‘X’ is ‘a’ and ‘Y’ is ‘c’.

Not all files have a known birth time. If ‘Y’ is ‘b’ and the birth time of reference is not available, find exits with an explanatory error message. If ‘X’ is ‘b’ and we do not know the birth time the file currently being considered, the test simply fails (that is, it behaves like -false does).

Some operating systems (for example, most implementations of Unix) do not support file birth times. Some others, for example NetBSD-3.1, do. Even on operating systems which support file birth times, the information may not be available for specific files. For example, under NetBSD, file birth times are supported on UFS2 file systems, but not UFS1 file systems.

There are two ways to list files in /usr modified after February 1 of the current year. One uses ‘-newermt’:

     find /usr -newermt "Feb 1"

The other way of doing this works on the versions of find before 4.3.3:

     touch -t 02010000 /tmp/stamp$$
     find /usr -newer /tmp/stamp$$
     rm -f /tmp/stamp$$
— Test: -anewer file
— Test: -cnewer file
— Test: -newer file

True if the file was last accessed (or its status changed, or it was modified) more recently than file was modified. These tests are affected by ‘-follow’ only if ‘-follow’ comes before them on the command line. See Symbolic Links, for more information on ‘-follow’. As an example, to list any files modified since /bin/sh was last modified:

          find . -newer /bin/sh
— Test: -used n

True if the file was last accessed n days after its status was last changed. Useful for finding files that are not being used, and could perhaps be archived or removed to save disk space.
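The timestamp comparisons from the quoted documentation can be tried on files with known modification times. This is a sketch under assumptions: the file names are invented, touch -t is the POSIX way to set a timestamp, and the -newermt line requires GNU find 4.3.3 or later.

```shell
demo=$(mktemp -d)
touch -t 202001010000 "$demo/old.txt"   # modified 1 Jan 2020
touch -t 202006010000 "$demo/stamp"     # reference file, 1 Jun 2020
touch -t 202101010000 "$demo/new.txt"   # modified 1 Jan 2021

# Files modified more recently than the reference file.
newer=$(cd "$demo" && find . -type f -newer stamp | sort)

# Negation selects the files that are not newer (leaving out the stamp itself).
older=$(cd "$demo" && find . -type f ! -newer stamp ! -name stamp | sort)

# GNU find only: compare against a literal date, no reference file needed.
since_july=$(cd "$demo" && find . -type f -newermt "2020-07-01" | sort)

echo "newer than stamp: $newer"
echo "not newer:        $older"
echo "since 2020-07-01: $since_july"
rm -rf "$demo"
```

The reference-file form works with any find, which is why it is still the usual choice in portable backup scripts.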

The find utility supports AND, OR, and NOT expressions, so you can select files created or modified during certain intervals. For example, to select files that are at least one week (7 days) old but less than 30 days old, you can combine the predicates like this (+6 is used because ages are rounded down, so "more than 6" whole periods means at least 7):

	$ find . -mtime +6 -a -mtime -30 -print

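Note: If you pass computed parameters to find in scripts, be careful when the -mtime parameter can be zero ( -mtime +0 ). Because the age is rounded down to whole 24-hour periods, -mtime +0 matches files that are at least 24 hours old, while -mtime -1 matches files that are less than 24 hours old. The expression

find . -mtime +0 -mtime -1
might look equivalent to
find . -mtime -1
but the two tests exclude each other, so it does not produce any files.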
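The rounding rules are easier to trust after seeing them applied to files of known age. The sketch below assumes GNU touch (for the relative dates given to -d); the file names encode the age of each file and are purely illustrative.

```shell
demo=$(mktemp -d)
touch -d '12 hours ago' "$demo/h12"
touch -d '36 hours ago' "$demo/h36"
touch -d '10 days ago'  "$demo/d10"
touch -d '40 days ago'  "$demo/d40"

# Age is divided by 24 hours and rounded down before the comparison.
day0=$(cd "$demo" && find . -type f -mtime 0 | sort)   # less than 24 hours old
day1=$(cd "$demo" && find . -type f -mtime 1 | sort)   # 24 to 48 hours old

# At least 7 but fewer than 30 whole periods old.
week_to_month=$(cd "$demo" && find . -type f -mtime +6 -mtime -30 | sort)

# "+0" means at least 24 hours old, "-1" means under 24 hours: no overlap.
empty_set=$(cd "$demo" && find . -type f -mtime +0 -mtime -1 | sort)

echo "-mtime 0:            $day0"
echo "-mtime 1:            $day1"
echo "-mtime +6 -mtime -30: $week_to_month"
echo "-mtime +0 -mtime -1:  [$empty_set]"
rm -rf "$demo"
```

A script that builds the -mtime argument from a variable can run a check like this once to confirm which side of the boundary a zero value falls on.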

Using -exec option with find

Find can perform various actions on the files or directories that are found. Among the most commonly used actions are -print, -ls and -exec.

Find is able to execute one or more commands for each file it has found with the -exec option. Unfortunately, one cannot simply enter the command. You need to remember two syntactic tricks:

  1. The command that you want to execute needs to contain the special macro argument {}, which will be replaced by the matched filename on each invocation of the -exec predicate.
  2. You need to specify \; (or ';') at the end of the command. (If the \ is left out, the shell will interpret the ; as the end of the find command.) If the {} macro is the last item in the command, then there should be a space between the {} and the \;.

    For example:

    find . -type d -exec ls -ld {} \;

Here are several "global" chmod tricks based on find -exec capabilities:

find . -type f -exec chmod 500 {} ';'
This command searches the current directory and all subdirectories and changes the permissions of each file as specified.
find . -name "*rc.conf"  -exec chmod o+r '{}' \; 
find . -name "*rc.conf" -exec chmod o+r '{}' ';' 

These two commands are equivalent. Each searches the current directory and all subdirectories, and every file whose name matches *rc.conf is processed by the chmod o+r command. The argument '{}' is a macro that expands to each found file. The \; (or ';') argument indicates that the -exec argument list has ended.

The end result is that all *rc.conf files have read access granted to others (provided the operator is the owner of the files).
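Before running a recursive chmod on a real tree, the same command can be rehearsed on a throwaway copy. The following sketch uses invented file names and verifies the result with the -perm test (octal 004 is the "read by others" bit).

```shell
demo=$(mktemp -d)
mkdir "$demo/etc"
touch "$demo/rc.conf" "$demo/etc/apprc.conf" "$demo/etc/other.txt"
chmod 600 "$demo/rc.conf" "$demo/etc/apprc.conf" "$demo/etc/other.txt"

# One chmod process per matching file; {} is replaced by each path in turn.
find "$demo" -type f -name "*rc.conf" -exec chmod o+r {} \;

# -perm -004 is true when the other-read bit is set.
readable=$(cd "$demo" && find . -type f -perm -004 | sort)
untouched=$(cd "$demo" && find . -type f ! -perm -004 | sort)

echo "now readable by others: $readable"
echo "left alone:             $untouched"
rm -rf "$demo"
```

Using find itself to check the outcome keeps the verification independent of the output format of ls.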

Note: The -print action prints the path of each file that is found. -print is the default action when the expression contains no other action.

The find command is commonly used to remove core files that are more than a few 24-hour periods (days) old. These core files are copies of the actual memory image of a running program when the program dies unexpectedly. They can be huge, so occasionally trimming them is wise:

find . -name core -ctime +4 -exec /bin/rm -f {} \;
For grep, the /dev/null argument can be used to show the name of the file before the text that is found. Without it, only the text found is printed. An equivalent mechanism in GNU grep is the "-H" or "--with-filename" option:

find /tmp -exec grep "search string" '{}' /dev/null \; -print

An alternative to -exec option is piping output into xargs command which we will discuss in the next section.

Feeding find output to pipes with xargs

One of the biggest limitations of the -exec option terminated by \; (or predicate with a side effect, to be more correct) is that it runs the specified command on only one file at a time. The xargs command solves this problem by enabling users to run a single command on many files at one time. In general, it is much faster to run one command on many files, because this cuts down on the number of invocations of the particular command or utility.

For example, one often needs to find files containing a specific pattern in multiple directories. One can use the -exec option of find for this (please note that you should use the -l flag so that grep prints the names of the matching files):

find . -type f -exec grep -li '/bin/ksh' {} \;

But there is a more elegant and more Unix-like way of accomplishing the same task using xargs and pipes. You can use xargs to read the output of find and build a pipeline that invokes grep. This way, grep is called only four or five times even though it might check through 200 or 300 files. By default, xargs always appends the list of filenames to the end of the specified command, so using it is as easy as can be:

find . -type f -print | xargs grep -li '/bin/ksh'

This gives the same output, but a lot faster. Also, when grep is given multiple filenames it automatically prefixes each matching line with the name of its file, so the -l option can be dropped if you also want to see the matching lines:

find . -type f -print | xargs grep -i '/bin/ksh'

When used in combination, find, grep, and xargs are a potent team to help find files lost or misplaced anywhere in the UNIX file system. I encourage you to experiment further with these important commands to find ways they can help you work with UNIX. You can use time to measure the difference in speed between the -exec option and xargs in the following way:

time find /usr/src -name "*.html" -exec grep -l foo '{}' ';' | wc -l

time find /usr/src -name "*.html" | xargs grep -l foo | wc -l

xargs works considerably faster. The difference becomes even greater when more complex commands are run and the list of files is longer.
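The reason for the speed difference is the number of processes started, and that can be counted directly. In this sketch echo stands in for the real command, so each line of output corresponds to one invocation; the five file names are invented.

```shell
demo=$(mktemp -d)
touch "$demo/f1" "$demo/f2" "$demo/f3" "$demo/f4" "$demo/f5"

# -exec ... \; starts echo once per file: five lines of output.
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l)

# xargs packs all five names onto one echo command line: one line of output.
batched=$(find "$demo" -type f -print | xargs echo | wc -l)

# The POSIX form -exec ... {} + batches in the same way without a pipe.
plus=$(find "$demo" -type f -exec echo {} + | wc -l)

echo "invocations with \\; : $per_file"
echo "invocations with xargs: $batched"
echo "invocations with +   : $plus"
rm -rf "$demo"
```

For a list this short xargs needs a single invocation; for very long lists it starts as many as are required to stay under the system's command-line length limit.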

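The same pipeline is often used to remove files:

find /mnt/zip -name "*prefs copy" -print | xargs rm

This is actually dangerous if you have a filename with spaces, because xargs splits its input on whitespace. If you use the -print0 option of find together with the -0 option of xargs, you can avoid this danger:

find /mnt/zip -name "*prefs copy" -print0 | xargs -0 rm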

Two other useful options for xargs are the -p option, which makes xargs interactive, and the -n args option, which makes xargs run the specified command with at most args arguments at a time.

Some people wonder why there is a -p option. xargs feeds the filenames from its standard input to the specified command, so interactive commands such as cp -i, mv -i, and rm -i don't work right. The -p option solves that problem. In the preceding example, the -p option would have made the command safe, because xargs asks for confirmation before running each command line. Thus, the command I typed was the following:

find /mnt/zip -name "*prefs copy" -print0 | xargs -0 -p rm
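How a file name with spaces is mangled, and how the NUL-delimited form protects it, can be observed safely by counting the arguments xargs produces. This is an illustrative sketch: the scratch directory and the file name "my prefs copy" are invented, and -print0 / -0 are assumed to be available (GNU and BSD tools provide them).

```shell
demo=$(mktemp -d)
touch "$demo/my prefs copy"

# Whitespace-delimited input: xargs splits the one name into three words.
split=$(cd "$demo" && find . -name "*prefs copy" -print | xargs -n 1 echo | wc -l)

# NUL-delimited input: the name arrives as a single argument.
intact=$(cd "$demo" && find . -name "*prefs copy" -print0 | xargs -0 -n 1 echo | wc -l)

# The safe pipeline removes exactly the intended file.
(cd "$demo" && find . -name "*prefs copy" -print0 | xargs -0 rm)
left=$(find "$demo" -type f | wc -l)

echo "arguments without -0: $split"
echo "arguments with -0:    $intact"
echo "files left:           $left"
rm -rf "$demo"
```

In the unsafe case rm would have been asked to delete ./my, prefs and copy, which at best fails and at worst removes unrelated files with those names.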

Many users ask why xargs should be used when shell command substitution achieves the same result. Take a look at this example:

grep -l foo `find /usr/src/linux -name "*.html"`

The drawback with commands such as this is that if the set of files returned by find is longer than the system's command-line length limit, the command will fail. The xargs approach gets around this problem because xargs runs the command as many times as is required, instead of just once.

SUID/SGID games

SUID root refers to a special file attribute called set user ID. It allows a program to perform functions that users are not normally allowed to perform themselves. Low-level networking routines, controlling graphical display functions, changing passwords, and logging in are all examples of functions that rely on programs executing as a user that is not restricted by standard file permissions. While many programs need this functionality, the program must be bug-free and must allow the user to do only what the program was designed for. Every SUID root program represents a potential security problem.

The first step in controlling SUID root programs is to have a baseline: the list of all SUID programs in the system. This can be achieved quite easily by using find:

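find / -type f -perm /6000 -exec ls -l {} \; > suid.list

Note: this will find both set user ID and set group ID programs.

The above command uses GNU find and executes the ls command. (Older versions of GNU find wrote this test as -perm +6000; that form is deprecated and recent versions reject it.) You can use the -ls option instead, but the output will be slightly different. With the POSIX find of Solaris the command is different: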

find / -type f \( -perm -4000 -o -perm -2000 \) -exec ls -l {} \;

These commands find all the SUID and SGID programs on a system; the first one redirects the listing to a file called suid.list. The next step in controlling SUID root programs is to analyze which programs should not be SUID root or can be removed without impeding system functionality. An obvious example of something that should not be SUID root is /usr/X11R6/bin/SuperProbe. This is a program merely used for testing purposes.

chmod -s /usr/X11R6/bin/SuperProbe

Other programs that do not need to be SUID root include anything in the svgalib hierarchy. This library itself is buggy, and nothing that depends on it should be SUID root in a secure system.

A minimized suid.list can be a little too overzealous. For example, a setup in which regular users lose the ability to use ping and traceroute is typical security paranoia overkill. It can be compensated for by controlling access to those programs via sudo, but this is a road to nowhere. In any case, minimizing the number of SUID programs is a task worth trying. It is excessive zeal that hurts...
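A baseline is only useful if later scans are compared against it. The sketch below is one possible way to do that, shown on a scratch directory instead of /: the helper function scan, the file names and the use of comm for the comparison are assumptions of this example, not part of the original procedure.

```shell
demo=$(mktemp -d)    # scratch tree standing in for /
lists=$(mktemp -d)   # where the baseline and later listings are kept
touch "$demo/plain" "$demo/setuid_tool"
chmod 755  "$demo/plain"
chmod 4755 "$demo/setuid_tool"

# Portable test: the set-user-ID or the set-group-ID bit is set.
scan() {
    (cd "$demo" && find . -type f \( -perm -4000 -o -perm -2000 \) -print | sort)
}

scan > "$lists/baseline"

# Later a new SUID program appears.
touch "$demo/new_tool"
chmod 4755 "$demo/new_tool"
scan > "$lists/current"

# comm -13 prints the lines that occur only in the second (current) listing.
added=$(comm -13 "$lists/baseline" "$lists/current")
echo "new SUID/SGID files: $added"
rm -rf "$demo" "$lists"
```

On a real system the two listings would be produced by the full-system find commands shown earlier, typically from a scheduled job.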

Finding World Writable, Abandoned and other Abnormal Files

To find all world writable directories:

find / -perm -0002 -type d -print

To find all world writable files:


find / -perm -0002 -type f -print
find / -perm -2 ! -type l -ls

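Find files with messed UID or GID

find / \( -nouser -o -nogroup \) -print

The parentheses are required: without them -print would be attached only to -nogroup, and files matching -nouser would not be printed.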
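Testing -nouser normally requires root privileges, but the way -o interacts with an explicit -print can be shown with -name tests, which follow exactly the same precedence rules. The file names alpha, beta and gamma are placeholders for this sketch.

```shell
demo=$(mktemp -d)
touch "$demo/alpha" "$demo/beta" "$demo/gamma"

# Parsed as:  -name alpha  OR  ( -name beta AND -print )
# so alpha matches the expression but is never printed.
ungrouped=$(cd "$demo" && find . -name alpha -o -name beta -print | sort)

# With parentheses -print applies to the whole OR group.
grouped=$(cd "$demo" && find . \( -name alpha -o -name beta \) -print | sort)

echo "without parentheses: $ungrouped"
echo "with parentheses:    $grouped"
rm -rf "$demo"
```

The same trap applies to any action placed after an OR list, including -exec and -ls.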

Find links that point to nothing

To find links that point to nothing, use the perl interpreter with find, like this:

$ find / -type l -print | perl -nle '-e || print';
This command starts at the topmost directory (/) and lists all links (-type l -print) that the perl interpreter determines point to nothing (-nle '-e || print'). This tip comes from the Unix Guru Universe site. You can further pipe the output through xargs rm -f if you want to delete the links. Perl is, of course, one of the many powerful interpretive language tools also found in most UNIX toolkits.
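Where perl is not available, a similar check can be done with find alone. This is an alternative sketch, not the tip quoted above: it relies on the fact that test -e follows symbolic links and fails when the target is missing. The link and file names are invented.

```shell
demo=$(mktemp -d)
touch "$demo/target"
ln -s target "$demo/good_link"          # resolves to an existing file
ln -s no_such_file "$demo/dead_link"    # points to nothing

# For each symbolic link run "test -e"; print the link only when that fails.
dangling=$(cd "$demo" && find . -type l ! -exec test -e {} \; -print)

echo "dangling links: $dangling"
rm -rf "$demo"
```

This starts one test process per link, so on large trees the perl pipeline is likely to be faster.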

List zero-length files

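To list all zero-length files, use this command (-empty is not part of POSIX find, and it also matches empty directories, so -type f is added):

$ find . -type f -empty -exec ls {} \;
After finding empty files, you might choose to delete them by replacing the ls command with the rm command.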

Clean out core dumps and temporary files

You can also use find to delete temporary files, whatever they may be. For example:

find . \( -name a.out -o -name '*.o' -o -name 'core' \) -exec rm {} \;
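Since the command deletes files, it is worth running the same expression with -print first and reading the list. The sketch below does both on a scratch tree; the file names are invented and -type f is added here as an extra safeguard.

```shell
demo=$(mktemp -d)
mkdir "$demo/src"
touch "$demo/a.out" "$demo/core" "$demo/src/main.c" "$demo/src/main.o"

# Dry run: same tests, but only print what would be removed.
doomed=$(cd "$demo" && find . -type f \( -name a.out -o -name '*.o' -o -name core \) -print | sort)
echo "would remove:"
echo "$doomed"

# Real run: identical expression with -exec rm in place of -print.
find "$demo" -type f \( -name a.out -o -name '*.o' -o -name core \) -exec rm {} \;

kept=$(cd "$demo" && find . -type f | sort)
echo "kept: $kept"
rm -rf "$demo"
```

Keeping the expression identical between the dry run and the real run is the point of the exercise; editing it in between defeats the check.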

Using find for backups

The find command lets you copy the entire contents of a directory while preserving the permissions, times, and ownership of every file and subdirectory. Because find can specify complex criteria for selecting files, it can create a perfect list of files for cpio, tar, pax and other archivers to back up.

Fortunately find has several options that are very useful for structuring the backup, such as -mount, -fstype and -newer, which are used in the examples below.

The typical usage is to combine find and the cpio command, as the latter accepts the list of files via standard input. Tar can do this too with the -T - option. Typically each mount point is backed up in a separate tar or cpio archive.

cd /usr

find . -mount -fstype ext3 -print | cpio -ov > /backup/usr080124.cpi

or, using tar:

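find /usr -mount -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr080124.tgz -T -
It is also possible to do incremental backups using the -newer option:
find /usr -newer /backup/usr080124.tgz -mount -fstype ext3 -print0 | tar --null --no-recursion -cvzf /backup/usr_delta080124.tgz -T -

Here --null tells tar that the names read with -T - are NUL-terminated, and --no-recursion keeps tar from descending again into the directories that find has already listed.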
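The find-to-tar pipeline can be rehearsed on a scratch tree before it is pointed at a real filesystem. This sketch assumes GNU tar (for --null, --no-recursion and -T -); the directory layout and file names are invented, and a named pipe is added to show that the type tests keep it out of the archive.

```shell
demo=$(mktemp -d)   # tree to back up
out=$(mktemp -d)    # where the archive is written
mkdir "$demo/etc"
echo one > "$demo/etc/hosts"
echo two > "$demo/etc/passwd"
mkfifo "$demo/etc/pipe"    # special file that should not be archived

# Select only regular files and symbolic links, pass the names NUL-terminated.
(cd "$demo" && find . \( -type f -o -type l \) -print0 |
    tar --null --no-recursion -czf "$out/backup.tgz" -T -)

# List the archive to confirm what went in.
listing=$(tar -tzf "$out/backup.tgz" | sort)
echo "$listing"
rm -rf "$demo" "$out"
```

Running find from inside the directory stores relative names in the archive, which makes it possible to restore into a different location.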

You can also try to avoid errors in backing up named pipes, devices, etc. by using more complex traversal expressions, for example:

find / -mount -fstype ext3 \( -type f -or -type l \) > /tmp/root_list.txt

The problem here is with hard-linked files, but that is a problem of tar, not find. The cpio command is a more sophisticated backup tool than tar. It is harder to use, but it is capable of copying special files (such as devices and links) consistently, and it will accept wildcard characters when listing the files to be archived.

On a higher level you might benefit from excluding all files that have not changed since they were installed from the RPMs. This is the approach taken by the backup module built into YaST (it uses tar, not cpio). While tar does not read the list of files from standard input by default, it has the -T option, which specifies the location of a file containing the list of files to be tarred (and -T - reads that list from standard input). Here is how this option is described in the manual:

Instead of giving the names of files or archive members on the command line, you can put the names into a file, and then use the ‘--files-from=file-of-names’ (‘-T file-of-names’) option to tar. Give the name of the file which contains the list of files to include as the argument to ‘--files-from’. In the list, the file names should be separated by newlines. You will frequently use this option when you have generated the list of files to archive with the find utility.

... ... ...

In the file list given by ‘-T’ option, any file name beginning with ‘-’ character is considered a tar option and is processed accordingly. For example, the common use of this feature is to change to another directory by specifying ‘-C’ option:

$ cat list
-C/etc
passwd
hosts
-C/lib
libc.a
$ tar -c -f foo.tar --files-from list

For example, if we want to archive files smaller than 1000 bytes, we can first create a list of such files using find and then use tar to create the archive.

find . -type f -size -1000c -print > /tmp/small-files
tar -cvzT /tmp/small-files -f little.tgz

You can also compress the archive with gzip on the fly:

tar -zPvcf backup.tar.gz -T list_of_files_to_be_tarred_or_list_of_locations

You will want to use the ‘--label=archive-label’ (‘-V archive-label’) option to give the archive a volume label, so you can tell what this archive is even if the label falls off the tape, or anything like that.

Unless the file system you are dumping is guaranteed to fit on one volume, you might need to use the ‘--multi-volume’ (‘-M’) option.

Like find, tar has an option that prevents it from crossing filesystem (partition) boundaries when storing (sub)directories: ‘--one-file-system’.

It also has the ‘--incremental’ (‘-G’) option  (see section Using tar to Perform Incremental Dumps).

Summary

Clearly, your use of the UNIX find command is limited only by your knowledge and creativity. The find command has a lot of options, and to get the full power out of find, xargs, and grep, you need to experiment. Among other things you can specify:

Selected Examples

Option, meaning and example:

-atime n, -atime +n, -atime -n, -size
    True if the file was accessed n 24-hour periods (days) ago (n), more than n 24-hour periods (days) ago (+n) or less than n 24-hour periods (days) ago (-n). The same +n and -n notation is used by -mtime, -ctime and -size:
  • -mtime +7 Matches files modified more than 7 24-hour periods (days) ago
  • -atime -2 Matches files accessed less than 2 24-hour periods (days) ago
  • -size +100 Matches files larger than 100 blocks (50K)

-ctime n
    True if the file's status (inode) was last changed n 24-hour periods (days) ago. Example: find . -ctime +30 -type f -exec rm {} ';'

-exec command
    Execute command. Example: find . -mtime -2 -type f -exec mv {} ../Spam_collector \;

-mtime n
    True if the file was modified n 24-hour periods (days) ago. Example: find . -mtime -2 -type f -exec mv {} ../Spam_collector \;

-name pattern
    True if the filename matches pattern.

-print
    Print the names of the files found.

-type c
    True if the file is of type c. Example: find . -mtime -2 -type f -exec mv {} ../Spam_collector \;

-user name
    True if the file is owned by user name.


Multiple options are joined by AND by default. OR may be specified with the -o flag and the use of grouped parentheses. For example, to match all files modified more than 90 24-hour periods (days) ago or accessed more than 30 24-hour periods (days) ago, use

\( -mtime +90 -o -atime +30 \)

NOT is specified with an exclamation point, which should be preceded by a backslash to protect it from the shell. For example, to match all files ending in .txt except those whose names start with a lowercase letter (a-z), use:

\! -name "[a-z]*" -name "*.txt"

Webliography

Note: A more extensive list of links can be found on the Softpanorama Unix Find Page



Copyright © 1996-2007 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Created: May 16, 1997; Last modified: May 01, 2008