Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Unix find tutorial


Part 4: Selecting files using their age


Introduction

Selecting files by age is one of the most powerful methods of finding "lost" files in a large filesystem. When you can't remember either the name of the file or the directory in which you put it, but do remember when it was created or last modified, you can narrow your search considerably. Age-based selection is also important for getting rid of old files in certain directories such as /tmp.

A list of the files modified on a particular day also serves as a useful report of your activities as a sysadmin. If several sysadmins work on the same server, it can alert you to the activities of other members of the team, who typically do not communicate all the changes they make.
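A minimal sketch of such a daily report, assuming GNU find (for -daystart and -printf). The function name report_changes_today and the directories in the usage comment are hypothetical. It selects by ctime rather than mtime so that chmod and chown changes made by a colleague show up as well:

```shell
# Hypothetical helper: list regular files whose inode changed today
# (a content change updates ctime too), sorted by time of change.
report_changes_today() {
    # -daystart: count days from midnight, so "-ctime 0" means "today",
    #            not "within the last 24 hours"; it must precede -ctime.
    # -xdev:     stay on the filesystem(s) of the starting directories.
    # %C...:     ctime fields; %u: owner name; %p: path.
    # Errors such as unreadable directories are discarded.
    find "$@" -xdev -daystart -type f -ctime 0 \
        -printf '%CY-%Cm-%Cd %CH:%CM  %-8u %p\n' 2>/dev/null | sort
}

# Usage (hypothetical directory list):
#   report_changes_today /etc /usr/local/etc /root
```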

Find can select files based on the Unix mtime, ctime, and atime timestamps. The corresponding predicates -mtime, -ctime, and -atime take one numeric argument n, an integer with an optional sign, which specifies a time interval measured in 24-hour periods (days) counted back from the current moment: +n means more than n periods, -n means less than n, and n without a sign means exactly n.

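Note: -mtime -1 and -mtime 0 select the same files. Because the number of 24-hour periods is rounded down, both mean "modified less than 24 hours ago", which is "today" in the calendar sense only if you also use -daystart.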

GNU find also permits measuring the period in minutes instead of days, with the -mmin, -cmin, and -amin predicates. For example, to list the files modified less than 10 minutes ago:

find / -mmin -10

Usage in metadata search

The -mtime, -ctime, and -atime predicates are useful for metadata search, especially -mtime. For example, if you forget where you saved a downloaded file (but you know that you downloaded it within the last 24 hours), you can try to find it using:

$ find ~ -type f -mtime 0 -ls
Finding files lost in a large filesystem has become an increasingly important activity for most sysadmins and even power users (such as webmasters). Typically you know roughly when you last modified or downloaded the file, so you know the time interval in which one of its timestamps lies.

But of course, time is just one parameter of such a search. You should try to combine several attributes to narrow it down.

Usually you can make reasonable guesses about at least three of the following attributes:

Searching by file name and timestamps is called metadata search, at which the NSA is famously efficient. Unfortunately, they do not share their tools with us ;-). But as a starting point find is good enough, and on SSD drives it is even fast enough.

Note: After you extract a list of candidate files with find, it makes sense to save it to a file and then narrow it down in a separate elimination step using grep. Several instances of grep can be run in parallel via xargs. See grep command.
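One way to implement this two-step workflow is sketched below, assuming GNU find and GNU xargs (for -print0/-0, -r and -P). The helper name find_then_grep, the 3-day window and the batch size of 50 are arbitrary illustrative choices, not part of any tool:

```shell
# Hypothetical helper: find_then_grep DIR PATTERN LISTFILE
# Keep LISTFILE outside DIR, otherwise find may list the list file itself.
find_then_grep() {
    dir=$1 pattern=$2 list=$3
    # Step 1: save the candidates (regular files modified less than 3 days ago)
    # as a NUL-separated list, so names with spaces or newlines survive.
    find "$dir" -type f -mtime -3 -print0 > "$list"
    # Step 2: eliminate candidates by content. -P 4 runs up to four grep
    # processes at once, -n 50 gives each at most 50 files, -r skips the run
    # entirely if the list is empty; grep -l prints only matching file names.
    xargs -0 -r -P 4 -n 50 grep -l -e "$pattern" < "$list"
}

# Usage (hypothetical paths):
#   find_then_grep ~/Downloads invoice /tmp/candidates.lst
```

Because the grep processes run in parallel, the order of the output lines is not guaranteed; pipe it through sort if order matters.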

Using a reference file instead of a time period

With GNU find you can also use a reference file instead of specifying a number of days:

— Test: -anewer file
— Test: -cnewer file
— Test: -newer file

True if the file was last accessed (or its status changed, or it was modified) more recently than file was modified. These tests are affected by ‘-follow’ only if ‘-follow’ comes before them on the command line. See Symbolic Links, for more information on ‘-follow’. As an example, to list any files modified since /bin/sh was last modified:

          find . -newer /bin/sh
The reference file does not have to be one that already exists for some other purpose. You can create an artificial file with the touch command (setting its timestamp with touch -t) just before running find, specifically for this search.

This capability is very useful for finding files. If you know two files, one created before the file you are trying to find and one created after it, you can specify the time period more precisely than with the -mtime predicate.
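A minimal sketch of this bracketing technique; it uses only -newer and !, which POSIX find also provides. The helper name find_between and the reference file names are hypothetical:

```shell
# Hypothetical helper: find_between DIR BEFORE AFTER
# Lists regular files modified after BEFORE was modified, but not after AFTER.
find_between() {
    dir=$1 before=$2 after=$3
    # -newer X   : modified more recently than X was modified
    # ! -newer Y : not modified more recently than Y (at or before Y's mtime)
    find "$dir" -type f -newer "$before" ! -newer "$after" -print
}

# Usage with artificial reference files (touch -t format: [[CC]YY]MMDDhhmm[.SS]):
#   touch -t 202401150000 /tmp/ref.before
#   touch -t 202401200000 /tmp/ref.after
#   find_between ~ /tmp/ref.before /tmp/ref.after
```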

GNU find also supports a generalized form of the -newer predicate with two-letter suffixes:

— Test: -newerXY reference

Succeeds if timestamp ‘X’ of the file being considered is newer than timestamp ‘Y’ of the file reference. The letters ‘X’ and ‘Y’ can be any of the following letters:

‘a’
Last-access time of reference
‘B’
Birth time of reference (when this is not known, the test cannot succeed)
‘c’
Last-change time of reference
‘m’
Last-modification time of reference
‘t’
Can be used only as "Y" (the second letter). The reference argument is interpreted as a literal time, rather than the name of a file. See Date input formats, for a description of how the timestamp is understood. Tests of the form ‘-newerXt’ are valid but tests of the form ‘-newertY’ are not. The simplest format is YYYYMMDD HH:MM:SS. You can get such a string from the date command (for example, date '+%Y%m%d %H:%M:%S') or specify it explicitly:
timestamp="20130207 00:38:51"
find . -maxdepth 1 -type f -newermt "$timestamp"

For example the test -newerac /tmp/foo  succeeds for all files which have been accessed more recently than /tmp/foo was changed. Here ‘X’ is ‘a’ and ‘Y’ is ‘c’.

Not all files have a known birth time. If ‘Y’ is ‘B’ and the birth time of reference is not available, find exits with an explanatory error message. If ‘X’ is ‘B’ and we do not know the birth time of the file currently being considered, the test simply fails (that is, it behaves like -false does).

Some operating systems (for example, most implementations of Unix) do not support file birth times. Some others, for example NetBSD-3.1, do. Even on operating systems which support file birth times, the information may not be available for specific files. For example, under NetBSD, file birth times are supported on UFS2 file systems, but not UFS1 file systems.

There are two ways to list files in /usr modified after February 1 of the current year. One uses ‘-newermt’:

     find /usr -newermt "Feb 1"

The other way of doing this works on the versions of find before 4.3.3:

     touch -t 02010000 /tmp/stamp$$
     find /usr -newer /tmp/stamp$$
     rm -f /tmp/stamp$$
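Since -newermt accepts a literal time, it can bound an interval from both sides without any temporary files. A sketch, assuming GNU find 4.3.3 or later; files_modified_between is a hypothetical name:

```shell
# Hypothetical helper: files_modified_between DIR START END
# START and END are date strings that -newermt accepts, e.g. "2013-02-07".
files_modified_between() {
    dir=$1 start=$2 end=$3
    # -newermt START : mtime later than START
    # ! -newermt END : mtime not later than END
    find "$dir" -type f -newermt "$start" ! -newermt "$end" -print
}

# Usage: files modified on Feb 7, 2013 (after midnight, up to the next midnight)
#   files_modified_between /var/log "2013-02-07" "2013-02-08"
```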

Specifying the time period from the beginning of today, rather than from the current moment

With GNU find you can measure the time period from the beginning of today, rather than from the current moment, using the -daystart option. It affects the time-based predicates described here: ‘-amin’, ‘-cmin’, ‘-mmin’, ‘-atime’, ‘-ctime’ and ‘-mtime’.

— Option: -daystart

Measure times from the beginning of today rather than from 24 hours ago. So, to list the regular files in your home directory that were modified yesterday, do

find ~/ -daystart -type f -mtime 1

The ‘-daystart’ option is unlike most other options in that it has an effect on the way that other tests are performed. The affected tests are ‘-amin’, ‘-cmin’, ‘-mmin’, ‘-atime’, ‘-ctime’ and ‘-mtime’. The ‘-daystart’ option only affects the behavior of any tests which appear after it on the command line.
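A small demonstration of the difference, assuming GNU find and GNU touch (whose -d option accepts relative dates such as "yesterday 12:00"). It works in its own temporary directory:

```shell
# Scratch directory: one file modified yesterday at noon, one modified now.
demo=$(mktemp -d)
touch -d "yesterday 12:00" "$demo/yesterday.txt"
touch "$demo/today.txt"

# With -daystart the "days" are calendar days, independent of the current time.
echo "modified yesterday:"; find "$demo" -daystart -type f -mtime 1
echo "modified today:";     find "$demo" -daystart -type f -mtime 0

# Without -daystart the answer depends on the clock: yesterday.txt matches
# -mtime 0 before noon (less than 24 hours old) and -mtime 1 after noon.
find "$demo" -type f -mtime 0
```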

Using the "last usage of the file" date as the criterion for finding files

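Another interesting possibility is to use the date you last read the file. The -atime predicate selects files by last access time directly; GNU find also offers the -used predicate, which compares the last access time with the time of the last status change: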

— Test: -used n

True if the file was last accessed n days after its status was last changed. Useful for finding files that are not being used, and could perhaps be archived or removed to save disk space.

All standard find predicates that work with the age of a file use 24-hour periods (days). Only GNU find can use minutes (-amin, -mmin, -cmin).

Deleting files based on their age, and using the tmpwatch utility instead of find

Deleting files based on their age is a common task for log files. It can be done efficiently with find, but find is a general-purpose utility which in this case you use for a specialized problem. An important criterion here is safety: if you run find on the wrong directory you can create chaos in the system. And such blunders do happen, especially when you are tired and want to finish some task quickly. See Sysadmin Horror Stories

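Another danger is symbolic links. By default find does not follow them, but it does if you add -L (or the obsolete -follow option), and a symbolic link given as the starting directory is followed if you write it with a trailing slash. If such a link points outside the directory you are trying to clean, you are hosed. So always test your deletion predicate first by running it with -ls, and only then run it with -exec rm {} \;. See Using -exec option with find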
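A minimal sketch of the "look first, then delete" routine, assuming GNU find. The helper old_files (a hypothetical name) takes the action as its trailing arguments, so the dry run and the real deletion are guaranteed to use the same predicate:

```shell
# Hypothetical helper: old_files DIR DAYS ACTION...
old_files() {
    dir=$1 days=$2
    shift 2
    # -P      : never follow symbolic links (the default, stated explicitly)
    # -xdev   : do not descend into other mounted filesystems
    # -type f : excludes the symbolic links themselves, so they are never removed
    # "$@"    : the action, e.g. -ls for a dry run or -exec rm ... for deletion
    find -P "$dir" -xdev -type f -mtime +"$days" "$@"
}

# Usage (hypothetical directory):
#   old_files /var/tmp/myapp 30 -ls                     # 1. look
#   old_files /var/tmp/myapp 30 -exec rm -f -- {} +     # 2. then delete
```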

There is a specialized Linux utility for deleting files based on their age, called (not too accurately ;-) tmpwatch, which does this more safely than find. It does not follow symbolic links in the directories it cleans.

Its options are very similar to find's predicates, so using it does not amount to learning yet another utility, an activity that many Linux sysadmins naturally try to avoid.

Simple Examples

More on age ranges

Unix keeps track of three timestamps. Of them, atime is the simplest and non-controversial: it stands for access time, which is when the file was last read.

It is important to understand the precise meaning of the ctime and mtime timestamps. The most common misconception is to view ctime as the file "creation time". It is actually the "change time", the time of the last change to the file's inode. Here are more formal explanations:

For a given file, ctime and mtime can differ depending on whether you modified only the inode or the contents of the file (modifying the contents updates ctime as well). Commands like chown, chmod, and ln change only ctime (ln changes the link count stored in the inode). The touch command sets mtime (and, by default, atime) to the time you specify, while ctime is set to the current time, because changing timestamps is itself an inode change. For example, if you need to set the date of example.txt to Jun 21, 2008 9AM, you can use (the -t parameter of touch has the format [[CC]YY]MMDDhhmm[.SS]):

touch -t 200806210900 example.txt
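The difference between mtime and ctime is easy to observe with stat. A sketch on a scratch file, assuming GNU coreutils (stat -c with the %y, %z, %Y and %Z format sequences):

```shell
# Scratch file. stat formats: %y/%z = mtime/ctime as text,
# %Y/%Z = mtime/ctime as seconds since the epoch.
f=$(mktemp)
touch -t 200806210900 "$f"          # mtime (and atime) := Jun 21, 2008 09:00
stat -c 'mtime: %y' "$f"
stat -c 'ctime: %z' "$f"            # "now": setting timestamps changes the inode
mtime_before=$(stat -c %Y "$f")
chmod 600 "$f"                      # an inode-only change: ctime moves ...
[ "$(stat -c %Y "$f")" = "$mtime_before" ] && echo "chmod left mtime alone"
```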

The second important thing to understand is that unless the -daystart option is used [GNU find only], time in Unix find is measured in 24-hour periods (fractions are allowed in GNU find) from the current moment.


Those 24-hour periods are usually called "days", but the definition of "day" used in find differs from common usage (a calendar day is typically understood as a 24-hour period starting at midnight). A "day" in "find language" is a 24-hour period counted back from the current time. Here is how working with time ranges is described in the GNU find documentation (Finding Files):

2.3.1 Age Ranges

These tests are mainly useful with ranges (‘+n’ and ‘-n’).

— Test: -atime n
— Test: -ctime n
— Test: -mtime n

True if the file was last accessed (or its status changed, or it was modified) n*24 hours ago. The number of 24-hour periods since the file's timestamp is always rounded down; therefore 0 means “less than 24 hours ago”, 1 means “between 24 and 48 hours ago”, and so forth. Fractional values are supported but this only really makes sense for the case where ranges (‘+n’ and ‘-n’) are used.
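A small demonstration of the rounding rule, assuming GNU find and GNU touch (for -d "30 hours ago"). A file 30 hours old counts as one full 24-hour period:

```shell
# Scratch directory with one file whose mtime is 30 hours in the past.
demo=$(mktemp -d)
touch -d "30 hours ago" "$demo/f"     # 30 hours = 1 full 24-hour period (rounded down)

for n in 0 1 +0 +1 -1 -2; do
    if [ -n "$(find "$demo" -type f -mtime "$n")" ]; then
        printf '%s: match\n' "-mtime $n"
    else
        printf '%s: no match\n' "-mtime $n"
    fi
done
# Expected: 1, +0 and -2 match; 0, +1 and -1 do not.
```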

— Test: -amin n
— Test: -cmin n
— Test: -mmin n

True if the file was last accessed (or its status changed, or it was modified) n minutes ago. These tests provide finer granularity of measurement than ‘-atime’ et al., but rounding is done in a similar way (again, fractions are supported). For example, to list files in /u/bill that were last read from 2 to 6 minutes ago:

find /u/bill -amin +2 -amin -6

More sophisticated examples

Expressions can be used to select files created or modified during certain intervals, for example files that are at least one week (7 days) old but less than 30 days old. You can combine the predicates like this (+6 rather than +7, because the number of 24-hour periods is rounded down, so +6 means 7 or more full days):

	find . -mtime +6 -a -mtime -30 -print
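The off-by-one in +6 is easy to get wrong, so in scripts it may help to hide the translation in a function. A sketch; age_between is a hypothetical helper that takes a half-open range of ages in whole days:

```shell
# Hypothetical helper: age_between DIR MIN MAX
# Lists regular files whose age in full 24-hour periods is >= MIN and < MAX.
# Translation to find's rounded-down semantics:
#   full days >= MIN  <=>  -mtime +(MIN-1)
#   full days <  MAX  <=>  -mtime -MAX
age_between() {
    dir=$1 min=$2 max=$3
    if [ "$min" -gt 0 ]; then
        find "$dir" -type f -mtime +$((min - 1)) -mtime -"$max" -print
    else
        # "+-1" is not valid syntax; "at least 0 days" needs no lower bound
        find "$dir" -type f -mtime -"$max" -print
    fi
}

# Usage: files at least one week but less than 30 days old
#   age_between . 7 30
```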

Note: If you use parameters with the find command in scripts, be careful when the -mtime argument is +0. Because the number of 24-hour periods is rounded down, -mtime +0 means "at least 24 hours old", not "any age". So the expression
find -mtime +0 -mtime -1
is not equivalent to
find -mtime -1
and produces no files at all, since no file can be both at least 24 hours old and less than 24 hours old.
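A more elaborate example, which prints the total size in bytes of all regular files older than about seven years (GNU find, xargs, and du):
find . -xdev -type f -mtime +$((365*7)) -print0 | xargs -0 -r du -bsc | awk '/\ttotal$/{s+=$1} END{print s+0}'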

Gotchas

Removing files by age can have nasty side effects if you transfer a file from one computer to another while preserving its timestamps, and the target directory is cleaned by a cron job that deletes files older than a certain age. For example, scp -p (like cp -p or rsync -a) preserves the modification time of the original file, so a file copied today can look years old to the cleanup job.

To avoid this problem you can use a script that takes an inventory of the files in the directory every day and diffs it against the previous days' inventories to determine when each file actually appeared. You can also use ctime instead of mtime: for a copied file, ctime is the date and time the copy was created on the target system, because copying tools cannot set ctime. In such cases using ctime in the cron job is safer.
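A minimal sketch of the inventory approach. The helper names snapshot and appeared and all paths are hypothetical, and file names are assumed to contain no newlines (one path per line):

```shell
# snapshot DIR LISTFILE: record the regular files under DIR, sorted
# byte-wise (LC_ALL=C) as comm(1) requires.
snapshot() {
    find "$1" -type f | LC_ALL=C sort > "$2"
}

# appeared OLDLIST NEWLIST: print paths present only in the newer list,
# i.e. files that showed up between the two snapshots, whatever their mtime.
appeared() {
    LC_ALL=C comm -13 "$1" "$2"
}

# Example use from a daily cron script (hypothetical paths):
#   snapshot /data/incoming /var/lib/inventory/$(date +%Y-%m-%d).lst
#   appeared /var/lib/inventory/2019-03-11.lst /var/lib/inventory/2019-03-12.lst
```

The ctime alternative needs no helper: it is the same cleanup predicate with -ctime in place of -mtime, for example find /data/incoming -type f -ctime +7 -ls as a dry run (again, a hypothetical path).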

Another common problem is a directory that contains symbolic links, especially symbolic links to directories.

That is why testing your predicate before "destructive" operations, such as deleting files or changing attributes or ownership, is very important. Generally, in such cases it makes sense to write the result to a file, analyze this file first, and only then use the xargs utility to process the files listed in it with the deletion or other operation. This way you at least have a list of the affected files. It is also safer to move files to some "trash" directory than to delete them.
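A minimal sketch of this list, review, then move routine, assuming GNU find, GNU xargs (-0, -r) and GNU mv (-t, --backup). The helper names list_candidates and move_to_trash and the paths are hypothetical:

```shell
# Step 1: write the candidates to a file. Nothing is touched yet.
list_candidates() {   # list_candidates DIR DAYS LISTFILE
    find "$1" -xdev -type f -mtime +"$2" -print0 > "$3"
}

# Step 2 (manual): review the list, e.g.  tr '\0' '\n' < LISTFILE | less

# Step 3: move exactly the reviewed files to a trash directory instead of
# deleting them. --backup=numbered keeps files that share a base name
# (from different subdirectories) from overwriting each other in the trash.
move_to_trash() {     # move_to_trash LISTFILE TRASHDIR
    mkdir -p "$2" && xargs -0 -r mv --backup=numbered -t "$2" -- < "$1"
}

# Usage (hypothetical paths):
#   list_candidates /var/log/myapp 30 /root/cleanup.lst
#   move_to_trash /root/cleanup.lst /var/trash/myapp
```

Keeping the trash directory on the same filesystem makes each mv a cheap rename rather than a copy.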






Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.



Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. If you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: March 12, 2019;
