
Unix System Calls


System calls are functions that a programmer can call to perform the services of the operating system. There are several online books that describe them at some length, for example Programming in C.

System calls can be roughly grouped into five major categories (System call - Wikipedia): process control, file management, device management, information maintenance, and communication.

We will reproduce Wikipedia classification with some modifications:

  1. Process Control (see separate page Process control).
  2. File management.
  3. Device Management.
  4. Information Maintenance.
  5. Communication.

Man pages should be used as a reference when you study Unix system calls.  The manual pages are divided into eight sections, with section 2 devoted to Unix system calls. They are organized as follows:

1. Commands This section provides information about user-level commands, such as ps and ls.

2. UNIX System Calls This section gives information about the library calls that interface with the UNIX operating system, such as open for opening a file, and exec for executing a program file. These are often accessed by C programmers.

3. Libraries This section contains the library routines that come with the system. An example library that comes with each system is the math library, containing such functions as fabs for absolute value. Like the system call section, this is relevant to programmers.

4. File Formats This section contains information on the file formats of system files, such as init, group, and passwd. This is useful for system administrators.

5. Miscellaneous This section contains information on various system characteristics. For example, a manual page exists here to display the complete ASCII character set (ascii).

6. Games This section usually contains directions for games that came with the system.

7. Device Drivers This section contains information on UNIX device drivers, such as scsi and floppy. These are usually pertinent to someone implementing a device driver, as well as to the system administrator.

8. System Maintenance This section contains information on commands that are useful for the system administrator, such as how to format a disk.

Section 2 can be very useful as a reference. When you invoke the man command, the output is sent through what is known as a pager. This is a command that lets you view the text one page at a time. The default pager for most UNIX systems is the more command. You can, however, specify a different one by setting the PAGER environment variable.

The second source of information for a particular call is Google. It usually can get you some useful links to the information for a particular call.

Some system calls involve access to data that users must not be permitted to corrupt or even change; that is precisely why such operations can be performed only through the kernel.

It's often difficult to determine what is a library routine (e.g., printf()) and what is a system call (e.g., sleep()). They are invoked in the same way, and the only way to tell is to remember which is which (or to check which manual section documents them).

To obtain information about a system call or library routine, how to use it, what it returns, what it does etc., you can read the on-line manual. If you are looking for the manual on read, you can read the manual by doing: 

    % man 2 read        (if read is a system call)
    % man 3 read        (if read is a library routine)

All of the entries in Section 2 of the manuals are system calls, and all of the entries in Section 3 are library routines; so if you don't know whether something is a system call, or a library routine, try looking it up in both Sections 2 and 3.

Here is an excerpt from Rochkind's book that introduces system calls and explains how to use them:

The subject of this book is UNIX system calls, which form the interface between the UNIX kernel and the user programs that run on top of it. Those who interact only with commands, like the shell, text editors, and other application programs, may have little need to know much about system calls, but a thorough knowledge of them is essential for UNIX programmers. System calls are the only way to access kernel facilities such as the file system, the multitasking mechanisms, and the interprocess communication primitives.

System calls define what UNIX is. Everything else -- subroutines and commands -- is built on this foundation. While the novelty of many of these higher-level programs has been responsible for much of UNIX's renown, they could as well have been programmed on any modern operating system. When one describes UNIX as elegant, simple, efficient, reliable, and portable, one is referring not to the commands (some of which are none of these things), but to the kernel. How hard is it to learn UNIX system calls? When I first started programming UNIX, in 1973, it wasn't very hard at all. UNIX -- and its programmer's manual -- was only a fraction of its present size and complexity. There weren't any programming examples in the manual, but all of the source code was on-line and it was easy to read through programs like the shell or the editor to see how system calls worked. Perhaps most important, there were more experienced people around to ask for help. Even Dennis Ritchie and Ken Thompson, the inventors of UNIX, took time out to help me.

Today's aspiring UNIX programmers have a tougher challenge than I did. UNIX is now so widely dispersed that an expert is unlikely to be nearby. Most computers running UNIX are licensed for the object code only, so the source code for commands is unavailable. There are twice as many system calls now as there were in 1973, and the quality of the manual has deteriorated markedly from the days when Ritchie and Thompson did all the system call write-ups. It's now full of grotesque paragraphs like this:

If the set-user-ID mode bit of the new process file is set (see chmod(2)), exec sets the effective user ID of the new process to the owner ID of the new process file. Similarly, if the set-group-ID mode bit of the new process file is set, the effective group ID of the new process is set to the group ID of the new process file. The real user ID and the real group ID of the new process remain the same as those of the calling process.

As an old-timer I understood what this meant when I first saw it, but a newcomer is sure to be completely baffled. And until now, there's been nowhere to turn. This book's goal is to allow any experienced programmer to learn UNIX system calls as easily as I did, and then to use them wisely and portably. It's packed with examples -- over 3500 lines of C code. Instead of just tactics (how the system calls are used), I've tried also to include strategies (why and when they're used). And there's lots of informal advice as well, based on my experiences programming UNIX over the past dozen years.

Flavors of Unix

The number of different flavors of Unix is amazing, and, what is worse, the system calls and their parameters change from flavor to flavor. One of the goals in writing Unix programs is to make them as portable as possible across all the flavors of Unix; obviously this isn't always possible.

The number of system calls has quadrupled, more or less, depending on what you mean by "system call." The first edition of Advanced UNIX Programming focused on only about 70 genuine kernel system calls—for example, open, read, and write; but not library calls like fopen, fread, and fwrite. The second edition includes about 300. (There are about 1,100 standard function calls in all, but many of those are part of the Standard C Library or are obviously not kernel facilities.) 

However, most of the original 70 Unix system calls haven't changed, so if you try to use these, you should be all right.

Historically there were several variants of Unix calls.

Using System Calls

How does a C programmer actually issue a system call? There is no difference between a system call and any other function call. For example, the read system call might be issued like this:

     amt = read(fd, buf, numbytes);

The implementation of the subroutine read varies with the UNIX implementation. It is usually an assembly language program that uses a machine instruction designed specifically for system calls, which isn't directly executable from C. Nowadays, it's safe to assume that system calls are simply C subroutines. Remember, though, that since a system call involves a context switch (from user to kernel and back), it takes much longer than a simple subroutine call within a process's own address space. So avoiding excessive system calls might be a wise strategy for programs that need to be tightly optimized.
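To make the cost concrete, here is a sketch (my own illustration, not from the original text; the function name is invented) of a byte-counting routine whose buffer size is a parameter. Calling it with a buffer of 1 issues one read system call, and thus one context switch, per byte; a 4096-byte buffer cuts the number of kernel crossings by more than three orders of magnitude:

```c
#include <fcntl.h>
#include <unistd.h>

/* Count the bytes in a file, issuing read() with a caller-chosen
   buffer size.  bufsize = 1 means one system call per byte;
   bufsize = 4096 means thousands of times fewer kernel crossings. */
long count_bytes(const char *path, size_t bufsize)
{
    char buf[4096];
    if (bufsize > sizeof buf)
        bufsize = sizeof buf;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    long total = 0;
    ssize_t amt;
    while ((amt = read(fd, buf, bufsize)) > 0)
        total += amt;
    close(fd);
    return (amt == -1) ? -1 : total;
}
```

Timing the two buffer sizes against a large file is an easy way to see the cost of a system call for yourself.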

Most system calls return a value. In the read example above, the number of bytes read is returned. To indicate an error, a system call returns a value that can't be mistaken for valid data, namely -1. Therefore, our read example should have been coded something like this:

    if ((amt= read(fd, buf, numbytes)) == -1)
      printf("Read failed\n");

Note that exit is a system call too, but it can't return an error.

There are lots of reasons why a system call that returns -1 might have failed. The global integer errno contains a code that indicates the reason. These error codes are defined at the beginning of the system call chapter of the UNIX manual [the pages titled ``intro(2)'']. Note that errno contains valid data only if a system call actually returns -1; you can't use errno alone to determine whether an error occurred.
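As a small sketch of the correct discipline (my own example, with an invented function name): test the return value first, and consult errno only after a call has actually failed.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open a file read-only; on failure, report and return the errno
   value the kernel set.  errno is examined only after open() has
   actually returned -1 -- never as a success/failure test by itself. */
int open_or_explain(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        fprintf(stderr, "open %s: %s\n", path, strerror(errno));
        return errno;            /* e.g. ENOENT, EACCES */
    }
    close(fd);
    return 0;
}
```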

The library routine perror takes as its argument a string, and prints out the string, a colon, and a description of the error condition stored in errno. So, a way of handling the error above that gives the programmer more information is:

    if ((amt = read(fd, buf, numbytes)) == -1)
        perror("read");

which might print out read: file does not exist on an error.

Reading System Call Man Pages

The manual pages for all Unix system calls give a declaration for the system call. This shows you what type of value the system call returns, what types of arguments it takes, and what header files you need to include before you can use the system call. As an example, here is part of the man page for the read() system call.


     #include <unistd.h>
     #include <sys/types.h>
     #include <sys/uio.h>

     int read(int d, char *buf, int nbytes)


Read() attempts to read nbytes of data from the object referenced by the file descriptor d into the buffer pointed to by buf.


If successful, the number of bytes actually read is returned. Upon reading end-of-file, zero is returned. Otherwise, a -1 is returned and the global variable errno is set to indicate the error.

The first part shows what header files you need to include. Then the declaration of the system call is given.

     int read(int d, char *buf, int nbytes)

read() takes three arguments: an int which is called d in the man page, a pointer to a character called buf (usually an array of characters), and another int called nbytes. read() returns an int as its result.

The names of the arguments given in the man pages need not be the same as the ones you use in your programs; they serve only to explain the function of the system call. For example, you could use the read() function in a program as follows:

  int main(void)
  {
      int i, count, desc;
      char array[500];

      desc = 0;                /* file descriptor 0: standard input */
      count = 500;
      i = read(desc, array, count);
      return 0;
  }

Process-IDs and Process Groups

Every process has a process-ID, which is a positive integer. At any instant this is guaranteed to be unique. Every process but one has a parent. The exception is process 0, which is created and used by the kernel itself, for swapping.

A process's system data also records its parent-process-ID, the process-ID of its parent. If a process is orphaned because its parent has terminated, its parent-process-ID is changed to 1. This is the process-ID of the initialization process ( init), which is the ancestor of all other processes. In other words, the initialization process adopts all orphans.

Sometimes programmers choose to implement a subsystem as a group of related processes instead of as a single process. For example, a complex database management system might be broken down into several processes to gain additional concurrency of disk I/O. The UNIX kernel allows these related processes to be organized into a process group.

One of the group members is the group leader. Each member of the group has the group leader's process-ID as its process-group-ID. The kernel provides a system call to send a signal to each member of a designated process group. Typically, this would be used to terminate the entire group as a whole, but any signal can be broadcast in this way.

Any process can resign from its process group, become a leader of its own group (of one) by making its process-group-ID the same as its own process-ID, and then spawn child processes to round out the new group. Hence, a single user could be running, say, 10 processes formed into, say, three process groups.
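The "resign and lead a group of one" maneuver described above maps onto the setpgid call in the modern POSIX interface; a minimal sketch (the function name is my own):

```c
#include <unistd.h>

/* Make the calling process the leader of a new process group of one,
   by setting its process-group-ID equal to its own process-ID.
   Children forked afterward inherit the new group membership. */
int lead_own_group(void)
{
    return setpgid(0, 0);    /* (0, 0) means "me, my own pid" */
}
```

After a successful call, getpgrp() returns the same value as getpid().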

A process group can have a control terminal, which is the first terminal device opened by the group leader. Normally, the control terminal for a user's processes is the terminal from which the user logged in. When a new process group is formed, the processes in the new group no longer have a control terminal.

The terminal device driver sends interrupt, quit, and hangup signals coming from a terminal to every process for which that terminal is the control terminal. Unless precautions are taken, hanging up a terminal, for example, will terminate all of the user's processes. To prevent this, a process can arrange to ignore hangups (this is what the nohup command does).

When a process group leader terminates for any reason, all processes with the same control terminal are sent a hangup signal, which, unless caught or ignored, terminates them too. This feature makes hard-wired terminals, which can't be physically hung up, behave like those that can. Thus, when a user logs off (terminating the shell, which is normally the process group leader), everything is cleaned up for the next user, just as it would be if the user actually hung up.
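Here is a sketch of what nohup arranges before launching its target (my own minimal example, with an invented function name): once SIGHUP's disposition is set to "ignore," even an explicitly raised hangup no longer terminates the process.

```c
#include <signal.h>
#include <stdio.h>

/* Roughly what the nohup command does before exec'ing its target:
   ignore SIGHUP so that a terminal hangup (or the process group
   leader's death) will not terminate this process. */
int shrug_off_hangups(void)
{
    if (signal(SIGHUP, SIG_IGN) == SIG_ERR)
        return -1;
    raise(SIGHUP);               /* would normally be fatal */
    puts("survived SIGHUP");     /* reached only because it's ignored */
    return 0;
}
```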

In summary, there are three process-IDs associated with each process: its own process-ID, its parent-process-ID, and its process-group-ID.

Unix Permissions

A user-ID is a positive integer that is associated with a user's login name in the password file ( /etc/passwd). When a user logs in, the login command makes this ID the user-ID of the first process created, the login shell. Processes descended from the shell inherit this user-ID.

Users are also organized into groups (not to be confused with process groups), which have IDs too, called group-IDs. A user's login group-ID is taken from the password file and made the group-ID of his or her login shell.

Groups are defined in the group file ( /etc/group). While logged in, a user can change to another group of which he or she is a member; this changes the group-ID of the process that handles the request (normally the shell, via the newgrp command), which then is inherited by all descendent processes.

These two IDs are called the real user-ID and the real group-ID because they are representative of the real user, the person who is logged in. Two other IDs are also associated with each process: the effective user-ID and the effective group-ID. These IDs are normally the same as the corresponding real IDs, but they can be different, as we shall see shortly. For now, we'll assume the real and effective IDs are the same.

The effective ID is always used to determine permissions; the real ID is used for accounting and user-to-user communication. One indicates the user's permissions; the other indicates the user's identity.

Each file (ordinary, directory, or special) has, in its i-node, an owner user-ID and an owner group-ID. The i-node also contains three sets of three permission bits (nine bits in all). Each set has one bit for read permission, one bit for write permission, and one bit for execute permission. A bit is 1 if the permission is granted and 0 if not. There is a set for the owner, for the owner group, and for others (the public). Here are the bit assignments (bit 0 is the rightmost bit):

    bit 8  owner read       bit 5  group read       bit 2  others read
    bit 7  owner write      bit 4  group write      bit 1  others write
    bit 6  owner execute    bit 3  group execute    bit 0  others execute

Permission bits are frequently specified using an octal number. For example, octal 775 would mean read, write, and execute permission for the owner and the group, and only read and execute permission for others. The ls command would show this combination of permissions as rwxrwxr-x; in binary it would be 111111101; in octal it would be 775.
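The octal-to-rwx correspondence can be checked mechanically; here is a small helper (my own illustration) that renders a nine-bit mode value the way ls does:

```c
#include <stdio.h>

/* Render nine permission bits as ls does: 0775 -> "rwxrwxr-x".
   Bit 8 is the owner's read bit; bit 0 is the others' execute bit. */
void mode_string(unsigned mode, char out[10])
{
    const char *sym = "rwxrwxrwx";
    for (int bit = 8; bit >= 0; bit--)
        out[8 - bit] = (mode & (1u << bit)) ? sym[8 - bit] : '-';
    out[9] = '\0';
}
```

For example, mode_string(0775, buf) yields "rwxrwxr-x", matching the octal example above.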

The permission system determines whether a given process can perform a desired action (read, write, or execute) on a given file. For ordinary files the meaning of the actions is obvious. For directories the meaning of read is obvious, since directories are stored in ordinary files (the ls command reads a directory, for example). ``Write'' permission on a directory means the ability to issue a system call that would modify the directory (add or remove a link). ``Execute'' permission means the ability to use the directory in a path (sometimes called ``search'' permission). For special files, read and write permissions mean the ability to execute the read and write system calls. What, if anything, that implies is up to the designer of the device driver. Execute permission on a special file is meaningless.

The permission system determines whether permission will be granted using this algorithm:

  1. If the effective user-ID is zero, permission is instantly granted (the effective user is the superuser).
  2. If the process's effective user-ID and the file's user-ID match, then the owner set of bits is used to see if the action will be allowed.
  3. If the process's effective group-ID and the file's group-ID match, then the group set of bits is used.
  4. If neither the user-IDs nor group-IDs match, then the process is an ``other'' and the third set of bits is used.
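The four steps translate almost directly into code; a sketch (the names are mine) with want as a 3-bit mask, 4 = read, 2 = write, 1 = execute:

```c
/* The classic UNIX permission check.  mode holds the nine permission
   bits; exactly one set of three is consulted, chosen by identity --
   a matching owner who lacks a bit does NOT fall through to group. */
int may_access(unsigned euid, unsigned egid,
               unsigned file_uid, unsigned file_gid,
               unsigned mode, unsigned want)
{
    if (euid == 0)
        return 1;                              /* superuser */
    if (euid == file_uid)
        return ((mode >> 6) & want) == want;   /* owner bits */
    if (egid == file_gid)
        return ((mode >> 3) & want) == want;   /* group bits */
    return (mode & want) == want;              /* other bits */
}
```

Note the asymmetry this sketch makes visible: a file with mode 0077 is unreadable by its owner but readable by everyone else, because only the owner set is consulted for the owner.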

Occasionally we want a user to temporarily take on the privileges of another user. For example, when we execute the passwd command to change our password, we would like the effective user-ID to be that of root (the traditional login name for the superuser), because only root can write into the password file. This is done by making root the owner of the passwd command (i.e., the ordinary file containing the passwd program), and then turning on another permission bit in the passwd command's i-node, called the set-user-ID bit. Executing a program with this bit on changes the effective user-ID to the owner of the file containing the program. Since it's the effective, rather than the real, user-ID that determines permissions, this allows a user to temporarily take on the permissions of someone else. The set-group-ID bit is used in a similar way.

Since both user-IDs (real and effective) are inherited from parent process to child process, it is possible to use the set-user-ID feature to run with an effective user-ID for a very long time.

System Calls to Get IDs

Here are the system calls to get the IDs mentioned above:

    int getuid()            /* Get the real user-ID */
                            /* Returns the ID */

    int getgid()            /* Get the real group-ID */
                            /* Returns the ID */

    int geteuid()           /* Get the effective user-ID */
                            /* Returns the ID */

    int getegid()           /* Get the effective group-ID */
                            /* Returns the ID */

    int getpid()            /* Get the process-ID */
                            /* Returns the ID */

    int getppid()           /* Get the parent process-ID */
                            /* Returns the ID */

    int getpgrp()           /* Get the process-group-ID */
                            /* Returns the ID */

Each of these system calls returns a single ID, as indicated by the comments following their function headers.
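Putting them together, a trivial demonstration (my own snippet) that prints every ID discussed above for the calling process:

```c
#include <stdio.h>
#include <unistd.h>

/* Print the real, effective, process, parent, and group IDs. */
void show_ids(void)
{
    printf("real uid %d, real gid %d\n", (int)getuid(),  (int)getgid());
    printf("eff  uid %d, eff  gid %d\n", (int)geteuid(), (int)getegid());
    printf("pid %d, ppid %d, pgrp %d\n",
           (int)getpid(), (int)getppid(), (int)getpgrp());
}
```

Running this from a set-user-ID program is an easy way to watch the real and effective user-IDs diverge.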

time System Call

    long time(timep)                      /* Get system time */
     long *timep;                         /* Pointer to time */

time returns the time, in seconds, since January 1, 1970. If the argument timep is not NULL, the current time is stored into the long integer to which it points. This is a carry-over from the days before the C language supported long integers; it is of no use now that a simple assignment statement can be used to capture the return value. The argument to time should always be NULL, i.e., value = time(NULL);
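For example (my own snippet):

```c
#include <stdio.h>
#include <time.h>

/* The modern idiom: pass NULL and capture the return value. */
void show_time(void)
{
    time_t now = time(NULL);         /* seconds since Jan 1, 1970 UTC */
    printf("epoch seconds: %ld\n", (long)now);
    printf("local time: %s", ctime(&now));   /* ctime appends '\n' */
}
```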



Old News ;-)


[Oct 08, 2019] How does converting from raid 5 to 6 work? (on the back end)

Oct 08, 2019


4 points · 13 hours ago

RAID 5 stripes the data over N disks with an additional stripe containing the parity, basically the XOR of all the other disks. RAID 6 uses the same parity as RAID 5 but also uses a different type of parity on an extra disk. So RAID 5 requires N+1 disks and RAID 6 requires N+2 disks. In theory you can just add another disk, fill it with the second parity, and you have RAID 6; however, it is not that simple. The parity disks on both RAID 5 and 6 rotate for each stripe: if the parity is stored on disk 1 for the first stripe, it is stored on disk 2 for the second, and so forth. So if you add an additional disk, all the stripes need to be rewritten in the new schema. Some RAID controllers have this functionality. The tricky thing is that you need to track how far you have gone, so that in the case of a power failure you can still retrieve the data. In any case it does require another disk.
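The XOR property the commenter relies on is easy to demonstrate; a toy sketch (mine, not from the thread) showing that any one lost block can be rebuilt from the survivors:

```c
#include <stddef.h>

/* RAID-5-style parity in miniature: the parity block is the XOR of
   the data blocks.  Because XOR is its own inverse, XORing the parity
   with the surviving data block reconstructs the lost one.  (RAID 6
   adds a second parity computed differently, surviving two losses.) */
void xor_blocks(const unsigned char *a, const unsigned char *b,
                unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] ^ b[i];
}
```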

OnARedditDiet Windows Admin 4 points · 14 hours ago

You're going to put your RAID in degraded mode, so you're basically causing it to be in a one-drive-failed scenario and then asking it to rewrite every disk. Is that something you want to do?

Dry_Soda 6 points · 12 hours ago

What could possibly go wrong? #YOLO

25cmshlong OCP DBA 12c, OCE 12c, OCP Solaris 11, RHCE, NCSE ONTAP, CCNA R&S 1 point · 9 hours ago

Not much. It is adding another parity disk so worst case array will be left in initial state - single parity (RAID5).

(Ofc truly worst case is that reading all the drives will overload the power supply and fry the whole disk subsystem. But it is better not to think about it, since RAID 6 will not help there either :)

OnARedditDiet Windows Admin 1 point · 9 hours ago

It's very well known that a full read/write pass that comes from rebuilding a degraded RAID can potentially crash the RAID by exposing existing hard drive issues. In this case nothing has failed, but you could crash the RAID by fixing a non-fault situation.

25cmshlong OCP DBA 12c, OCE 12c, OCP Solaris 11, RHCE, NCSE ONTAP, CCNA R&S 1 point · 8 hours ago
· edited 5 hours ago

EDIT: Oops, I remembered that in most implementations of RAID (i.e., not ZFS & WAFL) there are no dedicated parity/dparity drives, but rather rotating parity. So there will definitely be reading and rewriting on all disks of the array.

So text below is incorrect for most RAID subsystems

That's true, but not a concern while adding a parity disk. If some latent bad stripes appear, they can be recovered using the original parity. All writes during conversion go to the new parity disk; the original data on the drives stays intact.

drbluetongue Drunk while on-call 1 point · 5 hours ago

I don't know why you were downvoted - the most likely time you will get a disk failure is during a rebuild of an array. I've had one fail during rebuild that was from the same batch as the already failed disk in a RAID 6, thank god it was RAID 6...

Nowadays, at least at my old job, we made sure to ask the vendor for the disks for the SANs to be randomised across batches.

[Aug 31, 2019] The Linux Programming Interface

Aug 31, 2019

Michael Kerrisk has been the maintainer of the Linux Man Pages collection (man 7) for more than five years now, and it is safe to say that he has contributed to the Linux documentation available in the online manual more than any other author before. For this reason he was, a few years back, the recipient of a Linux Foundation fellowship meant to allow him to devote his full time to furthering this endeavor. His book is entirely focused on the system interface and environment Linux (and, to some extent, any *NIX system) provides to a programmer. My most obvious choice for a comparison of the same caliber is Michael K. Johnson and Eric W. Troan's venerable Linux Application Development, the second edition of which was released in 2004 and is somewhat in need of a refresh, lamentably because it is an awesome book that belongs on any programmer's shelf. While Johnson and Troan have introduced a whole lot of programmers to the pleasure of coding to Linux's APIs, their approach is that of a nicely flowing tutorial, not necessarily complete, but unusually captivating and very suitable to academic use. Michael's book is a different kind of beast: while the older tome selects exquisite material, it is nowhere as complete as his -- everything relating to the subject that I could reasonably think of is in the book, in a very thorough and maniacally complete yet enjoyably readable way -- I did find one humorous exception, more on that later. Keep reading for the rest of Federico's review.

The Linux Programming Interface
author Michael Kerrisk
pages 1552
publisher No Starch Press
rating 8/10
reviewer Federico Lucifredi
ISBN 9781593272203
summary The definitive guide to the Linux and UNIX programming interface
This book is an unusual, if not altogether unique, entry into the Linux programming library: for one, it is a work of encyclopedic breadth and depth, spanning in great detail concepts usually spread in a multitude of medium-sized books, but by this yardstick the book is actually rather concise, as it is neatly segmented in 64 nearly self-contained chapters that work very nicely as short, deep-dive technical guides. I have collected an extremely complete technical library over the years, and pretty much any book of significance that came out of the Linux and Bell Labs communities is in it -- it is about 4 shelves, and it is far from portable. It is very nice to be able to reach out and pick the definitive work on IPC, POSIX threads, or one of several socket programming guides -- not least because having read them, I know what and where to pick from them. But for those out there who have not invested so much time, money, and sweat moving so many books around, Kerrisk's work is priceless: any subject be it timers, UNIX signals, memory allocation or the most classical of topics (file I/O) gets its deserved 15-30 page treatment, and you can pick just what you need, in any order.

Weighing in at 1552 pages, this book is second only to Charles Kozierok's mighty TCP/IP Guide in length in the No Starch Press catalog. Anyone who has heard me comment about books knows I usually look askance at anything beyond the 500-page mark, regarding it as something defective in structure that fails the "I have no time to read all that" test. In the case of Kerrisk's work, however, just as in the case of Kozierok's, actually, I am happy to waive my own rule, as these heavyweights in the publisher's catalog are really encyclopedias, and despite my bigger library I will like to keep this single tome within easy reach of my desk to avoid having to fetch the other tomes for quick lookups -- yes, I still have lazy programmer blood in my veins.

There is another perspective to this: while writing, I took a break and while wandering around I found myself in Miguel's office (don't tell him ;-), and there spotted a Bell Labs book lying on his shelf that (incredibly) I have never heard of. After a quick visit to AbeBooks to take care of this embarrassing matter, I am back here writing to use this incident as a valuable example: the classic system programming books, albeit timeless in their own way, show their rust when it comes to newer and more esoteric Linux system calls (mmap and inotify are fair examples) and even entire subsystems in some cases -- and that's another place where this book shines: it is not only very complete, it is really up to date, a combination I cannot think of a credible alternative to in today's available book offerings.

One more specialized but particularly unique property of this book is that it can be quite helpful in navigating what belongs to what standard, be it POSIX, X/Open, SUS, LSB, FHS, and what not. Perhaps it is not entirely complete in this, but it is more helpful than anything else I have seen released since Donald Lewine's ancient POSIX Programmers Guide (O'Reilly). Standards conformance is a painful topic, but one you inevitably stumble into when writing code meant to compile and run not only on Linux but to cross over to the BSDs or farther yet to other *NIX variants. If you have to deal with that kind of divine punishment, this book, together with the Glibc documentation, is a helpful palliative as it will let you know what is not available on other platforms, and sometimes even what alternatives you may have, for example, on the BSDs.

If you are considering the purchase, head over to Amazon and check out the table of contents, you will be impressed. The Linux Programming Encyclopedia would have been a perfectly adequate title for it in my opinion. In closing, I mentioned that after thinking for a good while I found one thing to be missing in this book: next to the appendixes on tracing, casting the null pointer, parsing command-line options, and building a kernel configuration, a tutorial on writing man pages was sorely and direly missing! Michael, what were you thinking?

Federico Lucifredi is the maintainer of man (1) and a Product Manager for the SUSE Linux Enterprise and openSUSE distributions.

You can purchase The Linux Programming Interface from . Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

[Dec 15, 2010] "Beej" UNIX Inter Process Communication (IPC) tutorial By Brian Hall

Explains the different aspects of traditional UNIX Inter Process Communication (IPC). Brian Hall provides a lot of C code where you can compile / test these concepts yourself for a better understanding.

[Mar 21, 2007] Kernel command using Linux system calls

21 Mar 2007 (IBM Developerworks) Linux® system calls -- we use them every day. But do you know how a system call is performed from user-space to the kernel? Explore the Linux system call interface (SCI), learn how to add new system calls (and alternatives for doing so), and discover utilities related to the SCI.

A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a process of crossing the user-space/kernel boundary. The way you do this differs based on the particular architecture. For this reason, I'll stick to the most common architecture, i386.

In this article, I explore the Linux SCI, demonstrate adding a system call to the 2.6.20 kernel, and then use this function from user-space. I also investigate some of the functions that you'll find useful for system call development and alternatives to system calls. Finally, I look at some of the ancillary mechanisms related to system calls, such as tracing their usage from a given process.


The implementation of system calls in Linux is varied based on the architecture, but it can also differ within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user-space to kernel-space, but new IA-32 processors provide instructions that optimize this transition (using sysenter and sysexit instructions). Because so many options exist and the end-result is so complicated, I'll stick to a surface-level discussion of the interface details. See the Resources at the end of this article for the gory details.

You needn't fully understand the internals of the SCI to amend it, so I explore a simple version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The eax register is used to identify the particular system call that should be invoked, which is specified in the C library (per the call from the user-space application). When the C library has loaded the system call index and any arguments, a software interrupt is invoked (interrupt 0x80), which results in execution (through the interrupt handler) of the system_call function. This function handles all system calls, as identified by the contents of eax. After a few simple tests, the actual system call is invoked using the system_call_table and index contained in eax. Upon return from the system call, syscall_exit is eventually reached, and a call to resume_userspace transitions back to user-space. Execution resumes in the C library, which then returns to the user application.

[Jan 3, 2005] Has UNIX Programming Changed in 20 Years? By Marc Rochkind.

If all the basics are the same, what has changed? Well, these things:

More System Calls

The number of system calls has quadrupled, more or less, depending on what you mean by "system call." The first edition of Advanced UNIX Programming focused on only about 70 genuine kernel system calls-for example, open, read, and write; but not library calls like fopen, fread, and fwrite. The second edition includes about 300. (There are about 1,100 standard function calls in all, but many of those are part of the Standard C Library or are obviously not kernel facilities.) Today's UNIX has threads, real-time signals, asynchronous I/O, and new interprocess-communication features (POSIX IPC), none of which existed 20 years ago. This has caused, or been caused by, the evolution of UNIX from an educational and research system to a universal operating system. It shows up in embedded systems (parking meters, digital video recorders); inside Macintoshes; on a few million web servers; and is even becoming a desktop system for the masses. All of these uses were unanticipated in 1984.

More Languages

In 1984, UNIX applications were usually programmed in C, occasionally mixed with shell scripts, Awk, and Fortran. C++ was just emerging; it was implemented as a front end to the C compiler. Today, C is no longer the principal UNIX application language, although it's still important for low-level programming and as a reference language. (All the examples in both books are written in C.) C++ is efficient enough to have replaced C when the application requirements justify the extra effort, but many projects use Java instead, and I've never met a programmer who didn't prefer it over C++. Computers are fast enough so that interpretive scripting languages have become important, too, led by Perl and Python. Then there are the web languages: HTML, JavaScript, and the various XML languages, such as XSLT.

Even if you're working in one of these modern languages, though, you still need to know what's going on "down below," because UNIX still defines, and to a degree limits, what the higher-level languages can do. This is a challenge for many students who want to learn UNIX but don't want to learn C, and for their teachers, who tire of debugging memory problems and explaining the distinction between declarations and definitions.


To enable students to learn UNIX without first learning C, I developed a Java-to-UNIX system-call interface that I call Jtux. It allows almost all of the UNIX system calls to be executed from Java, using the same arguments and datatypes as the official C calls. You can find out more about Jtux and download its source code from its project page.

More Subsystems

The third area of change is that UNIX is both more visible than ever (sold by Wal-Mart!) and more hidden, underneath subsystems like J2EE and web servers, Apache, Oracle, and desktops such as KDE or GNOME. Many application programmers are programming for these subsystems, rather than for UNIX directly. What's more, the subsystems themselves are usually insulated from UNIX by a thin portability layer that has different implementations for different operating systems. Thus, many UNIX system programmers these days are working on middleware, rather than on the end-user applications that are several layers higher up.

More Portability

The fourth change is the requirement for portability between UNIX systems, including Linux and the BSD-derivatives, one of which is the Macintosh OS X kernel (Darwin). Portability was of some interest in 1984, but today it's essential. No developer wants to be locked into a commercial version of UNIX without the possibility of moving to Linux or BSD, and no Linux developer wants to be locked into only one distribution. Platforms like Java help a lot, but only serious attention to the kernel APIs, along with careful testing, will ensure that the code is really portable. Indeed, you almost never hear a developer say that he or she is writing for XYZ's UNIX. It's much more common to hear "UNIX and Linux," implying that the vendor choice will be made later. (The three biggest proprietary UNIX hardware companies-Sun, HP, and IBM-are all strong supporters of Linux.)

More Complete Standards

The requirement for portability is connected with the fifth area of change, the role of standards. In 1984, a UNIX standards effort was just starting. The IEEE's POSIX group hadn't yet been formed. Its first standard, which emerged in 1988, was a tremendous effort of exceptional quality and rigor, but it was of very little use to real-world developers because it left out too many APIs, such as those for interprocess communication and networking. That minimalist approach to standards changed dramatically when The Open Group was formed from the merger of X/Open and the Open Software Foundation in 1996. Its objective was to include all the APIs that the important applications were using, and to specify them as well as time allowed-which meant less precisely than POSIX did. They even named one of their standards Spec 1170, the number being the total of 926 APIs, 70 headers, and 174 commands. Quantity over quality, maybe, but the result meant that for the first time programmers would find in the standard the APIs they really needed. Today, The Open Group's Single UNIX Specification is the best guide for UNIX programmers who need to write portably.

[Nov 29, 2004] The Canberra University views of Processes and Process Management

UNIX System Call programming - List of Postings


[Aug 20, 2004] Manipulating Files And Directories In Unix Copyright (c) 1998-2002 by guy keren.

The following tutorial describes various common methods for reading and writing files and directories on a Unix system. Part of the information is common C knowledge, and is repeated here for completeness. Other information is Unix-specific, although DOS programmers will find some of it similar to what they saw in various DOS compilers. If you are a proficient C programmer, and know everything about the standard I/O functions, its buffering operations, and know functions such as fseek() or fread(), you may skip the standard C library I/O functions section. If in doubt, at least skim through this section, to catch up on things you might not be familiar with, and at least look at the standard C library examples.

  • This document is copyright (c) 1998-2002 by guy keren.

    The material in this document is provided AS IS, without any expressed or implied warranty, or claim of fitness for a particular purpose. Neither the author nor any contributors shall be liable for any damages incurred directly or indirectly by using the material contained in this document.

    Permission to copy this document (electronically or on paper, for personal or organization-internal use) or publish it on-line is hereby granted, provided that the document is copied as-is, this copyright notice is preserved, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

    Permission to make translations of this document is also granted, under these terms - assuming the translation preserves the meaning of the text, the copyright notice is preserved as-is, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

    For any questions about the document and its license, please contact the author.

  • UNIX System Calls

    [Apr 17, 2003] Exploring processes with Truss: Part 1 By Sandra Henry-Stocker

    The ps command can tell you quite a few things about each process running on your system. These include the process owner, memory use, accumulated time, the process status (e.g., waiting on resources) and many other things as well. But one thing that ps cannot tell you is what a process is doing - what files it is using, what ports it has opened, what libraries it is using and what system calls it is making. If you can't look at source code to determine how a program works, you can tell a lot about it by using a procedure called "tracing". When you trace a process (e.g., truss date), you get verbose commentary on the process' actions. For example, you will see a line like this each time the program opens a file:

    open("/usr/lib/", O_RDONLY) = 4

    The text on the left side of the equals sign clearly indicates what is happening. The program is trying to open the file /usr/lib/ and it's trying to open it in read-only mode (as you would expect, given that this is a system library). The right side is not nearly as self-evident. We have just the number 4. open is not a Unix command, of course, but a system call; that means you can only use it from within a program. Due to the nature of Unix, however, system calls are documented in man pages just like ls and pwd.

    To determine what this number represents, you can skip down in this column or you can read the man page. If you elect to read the man page, you will undoubtedly read a line that tells you that the open() function returns a file descriptor for the named file. In other words, the number, 4 in our example, is the number of the file descriptor referred to in this open call. If the process that you are tracing opens a number of files, you will see a sequence of open calls. With other activity removed, the list might look something like this:

    open("/dev/zero", O_RDONLY) = 3

    open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

    open("/usr/lib/", O_RDONLY) = 4

    open("/usr/lib/", O_RDONLY) = 4

    open64("./../", O_RDONLY|O_NDELAY) = 3

    open64("./../../", O_RDONLY|O_NDELAY) = 3

    open("/etc/mnttab", O_RDONLY) = 4

    Notice that the first file handle is 3 and that file handles 3 and 4 are used repeatedly. The initial file handle is always 3. This indicates that it is the first file handle following those that are the same for every process that you will run - 0, 1 and 2. These represent standard in, standard out and standard error.

    The file handles shown in the example truss output above are repeated only because the associated files are subsequently closed. When a file is closed, the file handle that was used to access it can be used again.

    The close commands include only the file handle, since that is enough to identify the file; a close command would, therefore, be something like close(3). One of the lines shown above displays a different response: Err#2 ENOENT. This "error" (the word is put in quotes because this does not necessarily indicate that the process is defective in any way) indicates that the file the open call is attempting to open does not exist. Read "ENOENT" as "No such file or directory".

    Some open calls place multiple restrictions on the way that a file is opened. The open64 calls in the example output above, for example, specify both O_RDONLY and O_NDELAY. Again, reading the man page will help you to understand what each of these specifications means and will present you with a list of other options as well.

    As you might expect, open is only one of many system calls that you will see when you run the truss command. Next week we will look at some additional system calls and determine what they are doing.

    Exploring processes with Truss: part 2 By Sandra Henry-Stocker

    While truss and its cousins on non-Solaris systems (e.g., strace on Linux and ktrace on many BSD systems) provide a lot of data on what a running process is doing, this information is only useful if you know what it means. Last week, we looked at the open call and the file handles that are returned by the call to open(). This week, we look at some other system calls and analyze what they are doing. You've probably noticed that the nomenclature for system functions is to follow the name of the call with a set of empty parentheses, for example open(). You will see this nomenclature in use whenever system calls are discussed.

    The fstat() and fstat64() calls obtain information about open files; "fstat" refers to "file status". As you might expect, this information is retrieved from the file's inode, including whether or not you are allowed to read the file's contents. If you trace the ls command (i.e., truss ls), for example, your trace will start with lines that resemble these:

    1 execve("/usr/bin/ls", 0x08047BCC, 0x08047BD4) argc = 1

    2 open("/dev/zero", O_RDONLY) = 3

    3 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBFA000

    4 xstat(2, "/usr/bin/ls", 0x08047934) = 0

    5 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

    6 sysconfig(_CONFIG_PAGESIZE) = 4096

    7 open("/usr/lib/", O_RDONLY) = 4

    8 fxstat(2, 4, 0x08047310) = 0


    28 lstat64(".", 0x080478B4) = 0

    29 open64(".", O_RDONLY|O_NDELAY) = 3

    30 fcntl(3, F_SETFD, 0x00000001) = 0

    31 fstat64(3, 0x0804787C) = 0

    32 brk(0x08057208) = 0

    33 brk(0x08059208) = 0

    34 getdents64(3, 0x08056F40, 1048) = 424

    35 getdents64(3, 0x08056F40, 1048) = 0

    36 close(3) = 0

    In line 31, we see a call to fstat64, but what file is it checking? The man page for fstat() and your intuition are probably both telling you that this fstat call is obtaining information on the file opened two lines before, "." (the current directory), and that it is referring to this file by the file handle (3) returned by the open64() call in line 29. Keep in mind that a directory is simply a file, though a different variety of file, so the same system calls are used as would be used to check a text file.

    You will probably also notice that the file being opened is called /dev/zero (again, see line 2). Most Unix sysadmins will immediately know that /dev/zero is a special kind of file, primarily because it is stored in /dev. And, if moved to look more closely at the file, they will confirm that the file that /dev/zero points to (it is itself a symbolic link) is a special character file. What /dev/zero provides to system programmers, and to sysadmins if they care to use it, is an endless stream of zeroes. This is more useful than it might first appear.

    To see how /dev/zero works, you can create a 10M-byte file full of zeroes with a command like this:

    /bin/dd < /dev/zero > zerofile bs=1024 seek=10240 count=1

    This command works well because it creates the needed file with only a single read and a single write: seek=10240 positions the output past the end of the new file, and the kernel supplies the intervening zeroes. In other words, it is very efficient.

    You can verify that the file is zero-filled with od.

    # od -x zerofile

    0000000 0000 0000 0000 0000 0000 0000 0000 0000

    *

    Each string of four zeros (0000) represents two bytes of data. The * on the second line of output indicates that all of the remaining lines are identical to the first.

    Looking back at the truss output above, we cannot help but notice that the first line includes the name of the command that we are tracing. The execve() system call executes a process; the first argument to execve() is the name of the file from which the new process image is to be loaded. The mmap() call which follows maps the process image into memory; in other words, it directly incorporates file data into the process address space. The getdents64() calls on lines 34 and 35 are extracting information from the directory file ("dents" refers to "directory entries").

    The sequence of steps that we see at the beginning of the truss output (executing the entered command, opening /dev/zero, mapping memory and so on) looks the same whether you are tracing ls, pwd, date or restarting Apache. In fact, the first dozen or so lines in your truss output will be nearly identical regardless of the command you are running. You should, however, expect to see some differences between different Unix systems and different versions of Solaris.

    Viewing the output of truss, you can get a solid sense of how the operating system works. The same insights are available if you are tracing your own applications or troubleshooting third party executables.


    Sandra Henry-Stocker

    Recommended Links

    See separate page Unix System Calls Links


    The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~ Archibald Putt, Ph.D.

    Copyright © 1996-2018 by Dr. Nikolai Bezroukov. The site was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

    This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

    You can use PayPal to make a contribution, supporting development of this site and speeding up access.


    The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    The site uses AdSense, so you need to be aware of the Google privacy policy. If you do not want to be tracked by Google, please disable Javascript for this site. This site is perfectly usable without Javascript.

    Last modified: November 02, 2019