|Contents||Bulletin||Scripting in shell and Perl||Network troubleshooting||History||Humor|
|News||NFS overview||Recommended Links||Sun Documentation||Tutorials||Reference||HOWTO||FAQs||RFCs||nfsv4 mounts files and directories as nobody|
|rpcbind failure error||server not responding error||NFS client fails a reboot error||service not responding error||program not registered error||stale file handle error||unknown host error||mount point error||no such file error||No such file or directory|
The first thing is to determine with one version of protocol you encounter the problem. Is this NFS3 and NFS4? For example RHEL 6 by default uses NFS4 for mounting other RHEL 6 systems. So you can get NFS4 mount even if you do not suspect specifying NFS4 explicitly. This is due to the fact that nfs4 keyword in /etc/fstab is deprecated and now client starts asking for services starting with NFS4 not NFS3 like before.
NFS 3 is conceptually elegant and rather simple, and difficult to troubleshoot problems are more often due to the complexity of the environment then protocol itself. It's comely opposite with NFS4. NFS 4 is much more complex and less reliable. It was mostly created by NetApps guys who wanted to compete with SAN based appliances and they overcooked the protocol trying to improve the transmission speed. For example, NFS4 often demonstrate spontaneous problem with mount points, when umount operation just hangs and/or mounted directory became inaccessible. It is very difficult to escape from this situation in RHEL 6 and derivatives without rebooting the server. In such cases the best way to avoid headache is to downgrade version of NFS to 3.
We will limit ourselves discussion of NFS3 protocol as downgrading of NFS4 to NFS3 is kind of universal troubleshooting method for NFS4: if problem persists then you can problem=lsioot it in NFS 3 and the switch back. If problem disappears that it is doe to flaky implementation of NFS3. So everywhere below NFS means NFS3. For example NFS 4 is more picky and can mount directories with nobody:nobody permissions when NFS3 mounts it correctly. See nfsv4 mounts files and directories as nobody
Most problem with NFS3 are not connected with protocol per se, but more like environment, infrastructure within which it operates. Among "infrastructure induced" NFS problems are:
Among common NFS3 errors we can specifically list the following:
The error in accessing the server is due to:
NFS server server2 not responding, still trying
Possible causes for the server not responding error are:
Setting default interface for multicast: add net 184.108.40.206: gateway:
these symptoms might indicate that a client is requesting an NFS mount using an entry in the /etc/vfstab file, specifying a foreground mount from a non-operational NFS server.
To solve this error, complete the following steps:
1. To interrupt the failed client node press Stop-A, and boot the client into single-user mode.
2. Edit the /etc/vfstab file to comment out the NFS mounts.
3. To continue booting to the default run level (normally run level 3), press Control-D.
4. Determine if all the NFS servers are operational and functioning properly.
5. After you resolve problems with the NFS servers, remove the comments from the /etc/vfstab file.
Note – If the NFS server is not available, an alternative to commenting out
the entry in the /etc/vfstab file is to use the bg mount option so that the
boot sequence can proceed in parallel with the attempt to perform the NFS mount.
nfs mount: dbserver: NFS: Service not responding
nfs mount: retrying: /mntpoint
To solve the service not responding error condition, complete the following steps:
nfs mount: dbserver: RPC: Program not registered
nfs mount: retrying: /mntpoint
To solve the program not registered error condition, complete the following steps:
nfs mount: sserver1:: RPC: Unknown host
To solve the unknown host error condition, verify the host name in the hosts
database that supports the client node. Note – The preceding example misspelled
the node name server1 as sserver1.
mount: mount-point /DS9 does not exist.
To solve the mount point error condition, check that the mount point exists
on the client. Check the spelling of the mount point on the command line or
in the /etc/vfstab file on the
client, or comment out
the entry and reboot the system.
|Bulletin||Latest||Past week||Past month||
NFS is conceptually simple. On the server (the box that is allowing others to use its disk space), you place a line in /etc/exports to enable its use by clients. This is called sharing. For instance, to share /home/myself for both read and write on subnet 192.168.100, netmask 255.255.255.0, place the following line in /etc/exports on the server:
/home/myself 192.168.100.0/24(rw)To share it read only, change the (rw)to (ro).
On the client wanting to access the shared directory, use the mount command to access the share:
mkdir /mnt/test mount -t nfs -o rw 192.168.100.85:/home/myself /mnt/testThe preceding must be performed by user root, or else as a sudo command. Another alternative is to place the mount in /etc/fstab. That will be discussed later in this document.
As mentioned, NFS is conceptually simple, but in practice you'll encounter some truly nasty gotchas:
To minimize troubleshooting time, quickcheck for each of these problems. Each of these gotchas is explained in detail in this document.
- Non-running NFS or portmap daemons
- Differing uid's for the same usernames
- NFS hostile firewall
- Defective DNS causes timeouts
Following are a few known problems with NFS and suggested workarounds.
a. Time Synchronization
NFS does not synchronize time between client and server, and offers no mechanism for the client to determine what time the server thinks it is. What this means is that a client can update a file, and have the timestamp on the file be either some time long in the past, or even in the future, from its point of view.
While this is generally not an issue if clocks are a few seconds or even a few minutes off, it can be confusing and misleading to humans. Of even greater importance is the affect on programs. Programs often do not expect time difference like this, and may end abnormally or behave strangely, as various tasks timeout instantly, or take extraordinarily long while to timeout.
Poor time synchronization also makes debugging problems difficult, because there is no easy way to establish a chronology of events. This is especially problematic when investigating security issues, such as break in attempts.
Workaround: Use the Network Time Protocol (NTP) religiously. Use of NTP can result in machines that have extremely small time differences.
Note: The NFS protocol version 3 does have support for the client specifying the time when updating a file, but this is not widely implemented. Additionally, it does not help in the case where two clients are accessing the same file from machines with drifting clocks.
b. File Locking Semantics
Programs use file locking to insure that concurrent access to files does not occur except when guaranteed to be safe. This prevents data corruption, and allows handshaking between cooperative processes.
In Unix, the kernel handles file locking. This is required so that if a program is terminated, any locks that it has are released. It also allows the operations to be atomic, meaning that a lock cannot be obtained by multiple processes.
Because NFS is stateless, there is no way for the server to keep track of file locks - it simply does not know what clients there are or what files they are using. In an effort to solve this, a separate server, the lock daemon, was added. Typically, each NFS server will run a lock daemon.
The combination of lock daemon and NFS server yields a solution that is almost like Unix file locking. Unfortunately, file locking is extremely slow, compared to NFS traffic without file locking (or file locking on a local Unix disk). Of greater concern is the behaviour of NFS locking on failure.
In the event of server failure (e.g. server reboot or lock daemon restart), all client locks are lost. However, the clients are not informed of this, and because the other operations (read, write, and so on) are not visibly interrupted, they have no reliable way to prevent other clients from obtaining a lock on a file they think they have locked.
In the event of client failure, the locks are not immediately freed. Nor is there a timeout. If the client process terminates, the client OS kernel will notify the server, and the lock will be freed. However, if the client system shuts down abnormally (e.g. power failure or kernel panic), then the server will not be notified. When the client reboots and remounts the NFS exports, the server is notified and any client locks are freed.
If the client does not reboot, for example if a frustrated user hits the power switch and goes home for the weekend, or if a computer has had a hardware failure and must wait for replacement parts, then the locks are never freed! In this unfortunate scenario, the server lock daemon must be restarted, with the same effects as a server failure.
Workaround: If possible (given program source and skill with code modification), remove locking and insure no inconsistency occurs via other mechanisms, possibly using atomic file creation (see below) or some other mechanism for synchronization. Otherwise, build platforms never fail and have a staff trained on the implications of NFS file locking failure. If NFS is used only for files that are never accessed by more than a single client, locking is not an issue.
Note: A status monitor mechanism exists to monitor client status, and free client locks if a client is unavailable. However, clients may chose not to use this mechanism, and in many implementations do not.
c. File Locking API
In Unix, there are two flavours of file locking, flock() from BSD and lockf() from System V. It varies from system to system which of these mechanisms work with NFS. In Solaris, Sun's Unix variant, lockf() works with NFS, and flock() is implemented via lockf(). On other systems, the results are less consistent. For example, on some systems, lockf() is not implemented at all, and flock() does not support NFS; while on other systems, lockf() supports NFS but flock() does not.
Regardless of the system specifics, programs often assume that if they are unable to obtain a lock, it is because another program has the lock. This can cause problems as programs wait for the lock to be freed. Since the reason the lock fails is because locking is unsupported, the attempt to obtain a lock will never work. This results in either the applications waiting forever, or aborting their operation.
These results will also vary with the support of the server. While typically the NFS server runs an accompanying lock daemon, this is not guaranteed.
Workaround: Upgrade to the latest versions of all operating systems, as they usually have improved and more consistent locking support. Also, use the lock daemon. Additionally, try to use only programs written to handle NFS locking properly, veified either by code review or a vendor compliance statement.
d. Exclusive File Creation
In Unix, when a program creates a file, it may ask for the operation to fail if the file already exists (as opposed to the default behaviour of using the existing file). This allows programs to know that, for example, they have a unique file name for a temporary file. It is also used by various daemons for locking various operations, e.g. modifying mail folders or print queues.
Unfortunately, NFS does not properly implement this behaviour. A file creation will sometimes return success even if the file already exists. Programs written to work on a local file system will experience strange results when they attempt to update a file after using file creation to lock it, only to discover another file is modifying it (I have personally seen mailboxes with hundreds of mail messages corrupted because of this), because it also "locked" the file via the same mechanism.
Workaround: If possible (given program source and skill with code modification), use the following method, as documented in the Linux open() manual page:The solution for performing atomic file locking using a lockfile is to create a unique file on the same fs (e.g., incorporating hostname and pid), use link(2) to make a link to the lockfile and use stat(2) on the unique file to check if its link count has increased to 2. Do not use the return value of the link() call.
This still leaves the issue of client failure unanswered. The suggested solution for this is to pick a timeout value and assume if a lock is older than a certain application-specific age that it has been abandoned.
e. Delayed Write CachingIn an effort to improve efficiency, many NFS clients cache writes. This means that they delay sending small writes to the server, with the idea that if the client makes another small write in a short amount of time, the client need only send a single message to the server.
Unix servers typically cache disk writes to local disks the same way. The difference is that Unix servers also keep track of the state of the file in the cache memory versus the state on disk, so programs are all presented with a single view of the file.
In NFS caching, all applications on a single client will typically see the same file contents. However, applications accessing the file from different clients will not see the same file for several seconds.
Workaround: It is often possible to disable client write caching. Unfortunately, this frequently causes unacceptably slow performance, depending on the application. (Applications that perform I/O of large chunks of data should be unaffected, but applications that perform lots of small I/O operations will be severely punished.) If locking is employed, applications can explicitly cooperate and flush files from the local cache to the server, but see the previous sections on locking when employing this solution.
f. Read Caching and File Access TimeUnix file systems typically have three times associated with a file: the time of last modification (file creation or write), the time of last "change" (write or change of inode information), or the time of last access (file execution or read). NFS file systems also report this information.
NFS clients perform attribute caching for efficiency reasons. Reading small amounts of data does not update the access time on the server. This means a server may report a file has been unaccessed for a much longer time than is accurate.
This can cause problems as administrators and automatic cleanup software may delete files that have remained unused for a long time, expecting them to be stale lock files, abandoned temporary files and so on.
Workaround: Attribute caching may be disabled on the client, but this is usually not a good idea for performance reasons. Administrators should be trained to understand the behaviour of NFS regarding file access time. Any programs that rely on access time information should be modified to use another mechanism.
g. Indestructible FilesIn Unix, when a file is opened, the data of that file is accessible to the process that opened it, even if the file is deleted. The disk blocks the file uses are freed only when the last process which has it open has closed it.
An NFS server, being stateless, has no way to know what clients have a file open. Indeed, in NFS clients never really "open" or "close" files. So when a file is deleted, the server merely frees the space. Woe be unto any client that was expecting the file contents to be accessible as before, as in the Unix world!
In an effort to minimize this as much as possible, when a client deletes a file, the operating systems checks if any process on the same client box has it open. If it does, the client renames the file to a "hidden" file. Any read or write requests from processes on the client that were to the now-deleted file go to the new file.
This file is named in the form .nfsXXXX, where the XXXX value is determined by the inode of the deleted file - basically a random value. If a process (such as rm) attempts to delete this new file from the client, it is replaced by a new .nfsXXXX file, until the process with the file open closes it.
These files are difficult to get rid of, as the process with the file open needs to be killed, and it is not easy to determine what that process is. These files may have unpleasant side effects such as preventing directories from being removed.
If the server or client crashes while a .nfsXXXX file is in use, they will never be deleted. There is no way for the server or a client to know whether a .nfsXXXX file is currently being used by a client or not.
Workaround: One should be able to delete .nfsXXXX files from another client, however if a process writes to the file, it will be created at that time. It would be best to exit or kill processes using an NFS file before deleting it. Unfortunately, there is no way to know if an uncooperative process has a file open.
h. User and Group Names and NumbersNFS uses user and group numbers, rather than names. This means that each machine that accesses an NFS export needs (or at least should) have the same user and group identifiers as the NFS export has. Note that this problem is not unique to NFS, and also applies, for instance, to removable media and archives. It is most frequently an issue with NFS, however.
Workaround: Either the /etc/passwd and /etc/group files must be synchronized, or something like NIS needs to be used for this purpose.
i. Superuser AccountNFS has special handling of the superuser account (also known as the root account). By default, the root user may not update files on an NFS mount.
Normally on a Unix system, root may do anything to any file. When an NFS drive has been mounted, this is no longer the case. This can confuse scripts and administrators alike.
To clarify: a normal user (for example "shane" or "billg") can update files that the superuser ("root") cannot.
Workaround: Enable root access to specific clients for NFS exports, but only in a trusted environment since NFS is insecure. Therefore, this does not guarantee that unauthorized client will be unable to access the mount as root.
Groupthink : Understanding Micromanagers and Control Freaks : Toxic Managers : Bureaucracies : Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Two Party System as Polyarchy : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy
Skeptical Finance : John Kenneth Galbraith : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Oscar Wilde : Talleyrand : Somerset Maugham : War and Peace : Marcus Aurelius : Eric Hoffer : Kurt Vonnegut : Otto Von Bismarck : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Oscar Wilde : Bernard Shaw : Mark Twain Quotes
Vol 26, No.1 (January, 2013) Object-Oriented Cult : Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks: The efficient markets hypothesis : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law
Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history
The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite
Most popular humor pages:
Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor
The Last but not Least
|You can use PayPal to make a contribution, supporting hosting of this site with different providers to distribute and speed up access. Currently there are two functional mirrors: softpanorama.info (the fastest) and softpanorama.net.|
The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: April 22, 2014