Softpanorama
May the source be with you, but remember the KISS principle ;-)

Unix System Calls

System calls are functions that a programmer calls to request services from the operating system. Several online books describe them at some length, for example Programming in C.

System calls can be roughly grouped into five major categories (System call - Wikipedia): process control, file management, device management, information maintenance and communication.

We reproduce the Wikipedia classification here with some modifications:

  1. Process Control (see separate page Process control ).
  2. File management.
  3. Device Management.
  4. Information Maintenance.
  5. Communication.

Man pages should be used as a reference when you study Unix system calls.  The manual pages are divided into eight sections, with section 2 devoted to Unix system calls. They are organized as follows:

1. Commands This section provides information about user-level commands, such as ps and ls.

2. UNIX System Calls This section gives information about the library calls that interface with the UNIX operating system, such as open for opening a file, and exec for executing a program file. These are often accessed by C programmers.

3. Libraries This section contains the library routines that come with the system. An example library that comes with each system is the math library, containing such functions as fabs for absolute value. Like the system call section, this is relevant to programmers.

4. File Formats This section contains information on the file formats of system files, such as init, group, and passwd. This is useful for system administrators.

5. Miscellaneous This section contains information on various system characteristics. For example, a manual page exists here to display the complete ASCII character set (ascii).

6. Games This section usually contains directions for games that came with the system.

7. Device Drivers This section contains information on UNIX device drivers, such as scsi and floppy. These are usually pertinent to someone implementing a device driver, as well as the system administrator.

8. System Maintenance This section contains information on commands that are useful for the system administrator, such as how to format a disk.

Section 2 can be very useful as a reference. When you invoke the man command, the output is sent through what is known as a pager. This is a command that lets you view the text one page at a time. The default pager for most UNIX systems is the more command. You can, however, specify a different one by setting the PAGER environment variable.

The second source of information for a particular call is Google, which can usually find useful links for that call.

Some system calls involve access to data that users must not be permitted to corrupt or even change.

It's often difficult to determine what is a library routine (e.g., printf()) and what is a system call (e.g., sleep()). They are called in the same way, so the call itself does not tell you which is which.

To obtain information about a system call or library routine, how to use it, what it returns, what it does etc., you can read the on-line manual. If you are looking for the manual on read, you can read the manual by doing: 

    % man 2 read      (if read is a system call)

or

    % man 3 read      (if read is a library routine)

All of the entries in Section 2 of the manuals are system calls, and all of the entries in Section 3 are library routines; so if you don't know whether something is a system call, or a library routine, try looking it up in both Sections 2 and 3.

Here is an excerpt from Rochkind's book that introduces system calls and explains how to use them:

The subject of this book is UNIX system calls, which form the interface between the UNIX kernel and the user programs that run on top of it. Those who interact only with commands, like the shell, text editors, and other application programs, may have little need to know much about system calls, but a thorough knowledge of them is essential for UNIX programmers. System calls are the only way to access kernel facilities such as the file system, the multitasking mechanisms, and the interprocess communication primitives.

System calls define what UNIX is. Everything else -- subroutines and commands -- is built on this foundation. While the novelty of many of these higher-level programs has been responsible for much of UNIX's renown, they could as well have been programmed on any modern operating system. When one describes UNIX as elegant, simple, efficient, reliable, and portable, one is referring not to the commands (some of which are none of these things), but to the kernel. How hard is it to learn UNIX system calls? When I first started programming UNIX, in 1973, it wasn't very hard at all. UNIX -- and its programmer's manual -- was only a fraction of its present size and complexity. There weren't any programming examples in the manual, but all of the source code was on-line and it was easy to read through programs like the shell or the editor to see how system calls worked. Perhaps most important, there were more experienced people around to ask for help. Even Dennis Ritchie and Ken Thompson, the inventors of UNIX, took time out to help me.

Today's aspiring UNIX programmers have a tougher challenge than I did. UNIX is now so widely dispersed that an expert is unlikely to be nearby. Most computers running UNIX are licensed for the object code only, so the source code for commands is unavailable. There are twice as many system calls now as there were in 1973, and the quality of the manual has deteriorated markedly from the days when Ritchie and Thompson did all the system call write-ups. It's now full of grotesque paragraphs like this:

If the set-user-ID mode bit of the new process file is set (see chmod(2)), exec sets the effective user ID of the new process to the owner ID of the new process file. Similarly, if the set-group-ID mode bit of the new process file is set, the effective group ID of the new process is set to the group ID of the new process file. The real user ID and the real group ID of the new process remain the same as those of the calling process.

As an old-timer I understood what this meant when I first saw it, but a newcomer is sure to be completely baffled. And until now, there's been nowhere to turn. This book's goal is to allow any experienced programmer to learn UNIX system calls as easily as I did, and then to use them wisely and portably. It's packed with examples -- over 3500 lines of C code. Instead of just tactics (how the system calls are used), I've tried also to include strategies (why and when they're used). And there's lots of informal advice as well, based on my experiences programming UNIX over the past dozen years.

Flavors of Unix

The number of different flavors of Unix is amazing, and what is worse, the system calls and their parameters change from flavor to flavor. One of the goals in writing Unix programs is to make them as portable as possible across all the flavors of Unix; obviously this isn't always possible.

The number of system calls has quadrupled, more or less, depending on what you mean by "system call." The first edition of Advanced UNIX Programming focused on only about 70 genuine kernel system calls—for example, open, read, and write; but not library calls like fopen, fread, and fwrite. The second edition includes about 300. (There are about 1,100 standard function calls in all, but many of those are part of the Standard C Library or are obviously not kernel facilities.) 

However, most of the original 70 Unix system calls haven't changed, so if you try to use these, you should be all right.

Historically there were several variants of Unix system calls.

Using System Calls

How does a C programmer actually issue a system call? There is no difference between a system call and any other function call. For example, the read system call might be issued like this:

     amt = read(fd, buf, numbytes);

The implementation of the subroutine read varies with the UNIX implementation. It is usually an assembly language program that uses a machine instruction designed specifically for system calls, which isn't directly executable from C. Nowadays, it's safe to assume that system calls are simply C subroutines. Remember, though, that since a system call involves a context switch (from user to kernel and back), it takes much longer than a simple subroutine call within a process's own address space. So avoiding excessive system calls might be a wise strategy for programs that need to be tightly optimized.

Most system calls return a value. In the read example above, the number of bytes read is returned. To indicate an error, a system call returns a value that can't be mistaken for valid data, namely -1 . Therefore, our read example should have been coded something like this:

    if ((amt= read(fd, buf, numbytes)) == -1)
     {
      printf("Read failed\n");
      exit(1);
     }

Note that exit is a system call too, but it can't return an error.

There are lots of reasons why a system call that returns -1 might have failed. The global integer errno contains a code that indicates the reason. These error codes are defined at the beginning of the system call chapter of the UNIX manual [the pages titled ``intro(2)'']. Note that errno contains valid data only if a system call actually returns -1; you can't use errno alone to determine whether an error occurred.

The library routine perror takes as its argument a string, and prints out the string, a colon, and a description of the error condition stored in errno. So, a way of handling the error above that gives the programmer more information is:

    if ((amt= read(fd, buf, numbytes)) == -1)
     {
      perror("read");
      exit(1);
     }

which might print something like "read: Bad file descriptor" on an error.

Reading System Call Man Pages

The manual pages for all Unix system calls give a declaration for the system call. This shows you what type of value the system call returns, what types of arguments it takes, and what header files you need to include before you can use the system call. As an example, here is part of the man page for the read() system call.

SYNOPSIS

     #include <unistd.h>
     #include <sys/types.h>
     #include <sys/uio.h>

     int
     read(int d, char *buf, int nbytes)

DESCRIPTION

Read() attempts to read nbytes of data from the object referenced by the file descriptor d into the buffer pointed to by buf.

RETURN VALUES

If successful, the number of bytes actually read is returned. Upon reading end-of-file, zero is returned. Otherwise, a -1 is returned and the global variable errno is set to indicate the error.

The first part shows what header files you need to include. Then the declaration of the system call is given.

     int read(int d, char *buf, int nbytes)

read() takes three arguments: an int which is called d in the man page, a pointer to a character called buf (usually an array of characters), and another int called nbytes. read() returns an int as its result.

The names of the arguments given in the man pages need not be the same as the ones you use in your programs; they only explain the function of the system call. For example, you could use the read() function in a program as follows:

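  #include <unistd.h>

  int main(void)
   {
    int i, count, desc;
    char array[500];

    desc = 0; count = 500;          /* descriptor 0 is standard input */
    i = read(desc, array, count);   /* bytes read, 0 at end-of-file, -1 on error */
    return i == -1 ? 1 : 0;
   }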

Process-IDs and Process Groups

Every process has a process-ID, which is a positive integer. At any instant this is guaranteed to be unique. Every process but one has a parent. The exception is process 0, which is created and used by the kernel itself, for swapping.

A process's system data also records its parent-process-ID, the process-ID of its parent. If a process is orphaned because its parent has terminated, its parent-process-ID is changed to 1. This is the process-ID of the initialization process ( init), which is the ancestor of all other processes. In other words, the initialization process adopts all orphans.

Sometimes programmers choose to implement a subsystem as a group of related processes instead of as a single process. For example, a complex database management system might be broken down into several processes to gain additional concurrency of disk I/O. The UNIX kernel allows these related processes to be organized into a process group.

One of the group members is the group leader. Each member of the group has the group leader's process-ID as its process-group-ID. The kernel provides a system call to send a signal to each member of a designated process group. Typically, this would be used to terminate the entire group as a whole, but any signal can be broadcast in this way.

Any process can resign from its process group, become a leader of its own group (of one) by making its process-group-ID the same as its own process-ID, and then spawn child processes to round out the new group. Hence, a single user could be running, say, 10 processes formed into, say, three process groups.

A process group can have a control terminal, which is the first terminal device opened by the group leader. Normally, the control terminal for a user's processes is the terminal from which the user logged in. When a new process group is formed, the processes in the new group no longer have a control terminal.

The terminal device driver sends interrupt, quit, and hangup signals coming from a terminal to every process for which that terminal is the control terminal. Unless precautions are taken, hanging up a terminal, for example, will terminate all of the user's processes. To prevent this, a process can arrange to ignore hangups (this is what the nohup command does).

When a process group leader terminates for any reason, all processes with the same control terminal are sent a hangup signal, which, unless caught or ignored, terminates them too. This feature makes hard-wired terminals, which can't be physically hung up, behave like those that can. Thus, when a user logs off (terminating the shell, which is normally the process group leader), everything is cleaned up for the next user, just as it would be if the user actually hung up.

In summary, there are three process-IDs associated with each process: its own process-ID, its parent-process-ID, and its process-group-ID.

Unix Permissions

A user-ID is a positive integer that is associated with a user's login name in the password file ( /etc/passwd). When a user logs in, the login command makes this ID the user-ID of the first process created, the login shell. Processes descended from the shell inherit this user-ID.

Users are also organized into groups (not to be confused with process groups), which have IDs too, called group-IDs. A user's login group-ID is taken from the password file and made the group-ID of his or her login shell.

Groups are defined in the group file ( /etc/group). While logged in, a user can change to another group of which he or she is a member; this changes the group-ID of the process that handles the request (normally the shell, via the newgrp command), which then is inherited by all descendent processes.

These two IDs are called the real user-ID and the real group-ID because they are representative of the real user, the person who is logged in. Two other IDs are also associated with each process: the effective user-ID and the effective group-ID. These IDs are normally the same as the corresponding real IDs, but they can be different, as we shall see shortly. For now, we'll assume the real and effective IDs are the same.

The effective ID is always used to determine permissions; the real ID is used for accounting and user-to-user communication. One indicates the user's permissions; the other indicates the user's identity.

Each file (ordinary, directory, or special) has, in its i-node, an owner user-ID and an owner group-ID. The i-node also contains three sets of three permission bits (nine bits in all). Each set has one bit for read permission, one bit for write permission, and one bit for execute permission. A bit is 1 if the permission is granted and 0 if not. There is a set for the owner, for the owner group, and for others (the public). Here are the bit assignments (bit 0 is the rightmost bit):

    Bit   Meaning
    8     owner read
    7     owner write
    6     owner execute
    5     group read
    4     group write
    3     group execute
    2     others read
    1     others write
    0     others execute

Permission bits are frequently specified using an octal number. For example, octal 775 would mean read, write, and execute permission for the owner and the group, and only read and execute permission for others. The ls command would show this combination of permissions as rwxrwxr-x; in binary it would be 111111101; in octal it would be 775.

The permission system determines whether a given process can perform a desired action (read, write, or execute) on a given file. For ordinary files the meaning of the actions is obvious. For directories the meaning of read is obvious, since directories are stored in ordinary files (the ls command reads a directory, for example). ``Write'' permission on a directory means the ability to issue a system call that would modify the directory (add or remove a link). ``Execute'' permission means the ability to use the directory in a path (sometimes called ``search'' permission). For special files, read and write permissions mean the ability to execute the read and write system calls. What, if anything, that implies is up to the designer of the device driver. Execute permission on a special file is meaningless.

The permission system determines whether permission will be granted using this algorithm:

  1. If the effective user-ID is zero, permission is instantly granted (the effective user is the superuser).
  2. If the process's effective user-ID and the file's user-ID match, then the owner set of bits is used to see if the action will be allowed.
  3. If the process's effective group-ID and the file's group-ID match, then the group set of bits is used.
  4. If neither the user-IDs nor group-IDs match, then the process is an ``other'' and the third set of bits is used.

Occasionally we want a user to temporarily take on the privileges of another user. For example, when we execute the passwd command to change our password, we would like the effective user-ID to be that of root (the traditional login name for the superuser), because only root can write into the password file. This is done by making root the owner of the passwd command (i.e., the ordinary file containing the passwd program), and then turning on another permission bit in the passwd command's i-node, called the set-user-ID bit. Executing a program with this bit on changes the effective user-ID to the owner of the file containing the program. Since it's the effective, rather than the real, user-ID that determines permissions, this allows a user to temporarily take on the permissions of someone else. The set-group-ID bit is used in a similar way.

Since both user-IDs (real and effective) are inherited from parent process to child process, it is possible to use the set-user-ID feature to run with an effective user-ID for a very long time.

System Calls to Get IDs

Here are the system calls to get the IDs mentioned above:

    int getuid()            /* Get the real user-ID */
                            /* Returns the ID */

    int getgid()            /* Get the real group-ID */
                            /* Returns the ID */

    int geteuid()           /* Get the effective user-ID */
                            /* Returns the ID */

    int getegid()           /* Get the effective group-ID */
                            /* Returns the ID */

    int getpid()            /* Get the process-ID */
                            /* Returns the ID */

    int getppid()           /* Get the parent process-ID */
                            /* Returns the ID */

    int getpgrp()           /* Get the process-group-ID */
                            /* Returns the ID */

Each of these system calls returns a single ID, as indicated by the comments following their function headers. On current systems the declared return types are uid_t, gid_t, and pid_t rather than int.

time System Call

    long time(timep)                      /* Get system time */
     long *timep;                         /* Pointer to time */

time returns the time, in seconds, since January 1, 1970. If the argument timep is not NULL, the current time is also stored into the long integer to which it points. This is a carry-over from the days before the C language supported long integers. It is of no use now that a simple assignment statement can be used to capture the return value. The argument to time should always be NULL, i.e., value = time(NULL);


Old News ;-)

[Dec 15, 2010] “Beej” UNIX Inter Process Communication (IPC) tutorial By Brian Hall

Explains the different aspects of traditional UNIX Inter Process Communication (IPC). Brian Hall provides a lot of C code where you can compile / test these concepts yourself for a better understanding.

[Mar 21, 2007] Kernel command using Linux system calls

21 Mar 2007 (IBM Developerworks) Linux® system calls -- we use them every day. But do you know how a system call is performed from user-space to the kernel? Explore the Linux system call interface (SCI), learn how to add new system calls (and alternatives for doing so), and discover utilities related to the SCI.

A system call is an interface between a user-space application and a service that the kernel provides. Because the service is provided in the kernel, a direct call cannot be performed; instead, you must use a process of crossing the user-space/kernel boundary. The way you do this differs based on the particular architecture. For this reason, I'll stick to the most common architecture, i386.

In this article, I explore the Linux SCI, demonstrate adding a system call to the 2.6.20 kernel, and then use this function from user-space. I also investigate some of the functions that you'll find useful for system call development and alternatives to system calls. Finally, I look at some of the ancillary mechanisms related to system calls, such as tracing their usage from a given process.

The SCI

The implementation of system calls in Linux is varied based on the architecture, but it can also differ within a given architecture. For example, older x86 processors used an interrupt mechanism to migrate from user-space to kernel-space, but new IA-32 processors provide instructions that optimize this transition (using sysenter and sysexit instructions). Because so many options exist and the end-result is so complicated, I'll stick to a surface-level discussion of the interface details. See the Resources at the end of this article for the gory details.

You needn't fully understand the internals of the SCI to amend it, so I explore a simple version of the system call process (see Figure 1). Each system call is multiplexed into the kernel through a single entry point. The eax register is used to identify the particular system call that should be invoked, which is specified in the C library (per the call from the user-space application). When the C library has loaded the system call index and any arguments, a software interrupt is invoked (interrupt 0x80), which results in execution (through the interrupt handler) of the system_call function. This function handles all system calls, as identified by the contents of eax. After a few simple tests, the actual system call is invoked using the system_call_table and index contained in eax. Upon return from the system call, syscall_exit is eventually reached, and a call to resume_userspace transitions back to user-space. Execution resumes in the C library, which then returns to the user application.

[Jan 3, 2005] Has UNIX Programming Changed in 20 Years  By Marc Rochkind.

If all the basics are the same, what has changed? Well, these things:

More System Calls

The number of system calls has quadrupled, more or less, depending on what you mean by "system call." The first edition of Advanced UNIX Programming focused on only about 70 genuine kernel system calls—for example, open, read, and write; but not library calls like fopen, fread, and fwrite. The second edition includes about 300. (There are about 1,100 standard function calls in all, but many of those are part of the Standard C Library or are obviously not kernel facilities.) Today's UNIX has threads, real-time signals, asynchronous I/O, and new interprocess-communication features (POSIX IPC), none of which existed 20 years ago. This has caused, or been caused by, the evolution of UNIX from an educational and research system to a universal operating system. It shows up in embedded systems (parking meters, digital video recorders); inside Macintoshes; on a few million web servers; and is even becoming a desktop system for the masses. All of these uses were unanticipated in 1984.

More Languages

In 1984, UNIX applications were usually programmed in C, occasionally mixed with shell scripts, Awk, and Fortran. C++ was just emerging; it was implemented as a front end to the C compiler. Today, C is no longer the principal UNIX application language, although it's still important for low-level programming and as a reference language. (All the examples in both books are written in C.) C++ is efficient enough to have replaced C when the application requirements justify the extra effort, but many projects use Java instead, and I've never met a programmer who didn't prefer it over C++. Computers are fast enough so that interpretive scripting languages have become important, too, led by Perl and Python. Then there are the web languages: HTML, JavaScript, and the various XML languages, such as XSLT.

Even if you're working in one of these modern languages, though, you still need to know what's going on "down below," because UNIX still defines—and, to a degree, limits—what the higher-level languages can do. This is a challenge for many students who want to learn UNIX, but don't want to learn C. And for their teachers, who tire of debugging memory problems and explaining the distinction between declarations and definitions.

TIP

To enable students to learn UNIX without first learning C, I developed a Java-to-UNIX system-call interface that I call Jtux. It allows almost all of the UNIX system calls to be executed from Java, using the same arguments and datatypes as the official C calls. You can find out more about Jtux and download its source code from http://basepath.com/aup/.

More Subsystems

The third area of change is that UNIX is both more visible than ever (sold by Wal-Mart!) and more hidden, underneath subsystems like J2EE and web servers, Apache, Oracle, and desktops such as KDE or GNOME. Many application programmers are programming for these subsystems, rather than for UNIX directly. What's more, the subsystems themselves are usually insulated from UNIX by a thin portability layer that has different implementations for different operating systems. Thus, many UNIX system programmers these days are working on middleware, rather than on the end-user applications that are several layers higher up.

More Portability

The fourth change is the requirement for portability between UNIX systems, including Linux and the BSD-derivatives, one of which is the Macintosh OS X kernel (Darwin). Portability was of some interest in 1984, but today it's essential. No developer wants to be locked into a commercial version of UNIX without the possibility of moving to Linux or BSD, and no Linux developer wants to be locked into only one distribution. Platforms like Java help a lot, but only serious attention to the kernel APIs, along with careful testing, will ensure that the code is really portable. Indeed, you almost never hear a developer say that he or she is writing for XYZ's UNIX. It's much more common to hear "UNIX and Linux," implying that the vendor choice will be made later. (The three biggest proprietary UNIX hardware companies—Sun, HP, and IBM—are all strong supporters of Linux.)

More Complete Standards

The requirement for portability is connected with the fifth area of change, the role of standards. In 1984, a UNIX standards effort was just starting. The IEEE's POSIX group hadn't yet been formed. Its first standard, which emerged in 1988, was a tremendous effort of exceptional quality and rigor, but it was of very little use to real-world developers because it left out too many APIs, such as those for interprocess communication and networking. That minimalist approach to standards changed dramatically when The Open Group was formed from the merger of X/Open and the Open Software Foundation in 1996. Its objective was to include all the APIs that the important applications were using, and to specify them as well as time allowed—which meant less precisely than POSIX did. They even named one of their standards Spec 1170, the number being the total of 926 APIs, 70 headers, and 174 commands. Quantity over quality, maybe, but the result meant that for the first time programmers would find in the standard the APIs they really needed. Today, The Open Group's Single UNIX Specification is the best guide for UNIX programmers who need to write portably.

[Nov 29, 2004] The Canberra University views of Processes and Process Management

UNIX System Call programming - List of Postings

Threads

[Aug 20, 2004] Manipulating Files And Directories In Unix Copyright (c) 1998-2002 by guy keren.

The following tutorial describes various common methods for reading and writing files and directories on a Unix system. Part of the information is common C knowledge, and is repeated here for completeness. Other information is Unix-specific, although DOS programmers will find some of it similar to what they saw in various DOS compilers. If you are a proficient C programmer, and know everything about the standard I/O functions, its buffering operations, and know functions such as fseek() or fread(), you may skip the standard C library I/O functions section. If in doubt, at least skim through this section, to catch up on things you might not be familiar with, and at least look at the standard C library examples.

  • This document is copyright (c) 1998-2002 by guy keren.

    The material in this document is provided AS IS, without any expressed or implied warranty, or claim of fitness for a particular purpose. Neither the author nor any contributers shell be liable for any damages incured directly or indirectly by using the material contained in this document.

    permission to copy this document (electronically or on paper, for personal or organization internal use) or publish it on-line is hereby granted, provided that the document is copied as-is, this copyright notice is preserved, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

    Permission to make translations of this document is also granted, under these terms - assuming the translation preserves the meaning of the text, the copyright notice is preserved as-is, and a link to the original document is written in the document's body, or in the page linking to the copy of this document.

    For any questions about the document and its license, please contact the author.
  • UNIX System Calls

    [Apr 17, 2003] Exploring processes with Truss: Part 1 By Sandra Henry-Stocker

    The ps command can tell you quite a few things about each process running on your system. These include the process owner, memory use, accumulated time, the process status (e.g., waiting on resources) and many other things as well. But one thing that ps cannot tell you is what a process is doing - what files it is using, what ports it has opened, what libraries it is using and what system calls it is making. If you can't look at source code to determine how a program works, you can tell a lot about it by using a procedure called "tracing". When you trace a process (e.g., truss date), you get verbose commentary on the process' actions. For example, you will see a line like this each time the program opens a file:

    open("/usr/lib/libc.so.1", O_RDONLY) = 4

    The text on the left side of the equals sign clearly indicates what is happening. The program is trying to open the file /usr/lib/libc.so.1 and it's trying to open it in read-only mode (as you would expect, given that this is a system library). The right side is not nearly as self-evident. We have just the number 4. Open is not a Unix command, of course, but a system call. That means that you can only use the command within a program. Due to the nature of Unix, however, system calls are documented in man pages just like ls and pwd.

    To determine what this number represents, you can skip down in this column or you can read the man page. If you elect to read the man page, you will undoubtedly read a line that tells you that the open() function returns a file descriptor for the named file. In other words, the number, 4 in our example, is the number of the file descriptor referred to in this open call. If the process that you are tracing opens a number of files, you will see a sequence of open calls. With other activity removed, the list might look something like this:

    open("/dev/zero", O_RDONLY) = 3

    open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

    open("/usr/lib/libc.so.1", O_RDONLY) = 4

    open("/usr/lib/libdl.so.1", O_RDONLY) = 4

    open64("./../", O_RDONLY|O_NDELAY) = 3

    open64("./../../", O_RDONLY|O_NDELAY) = 3

    open("/etc/mnttab", O_RDONLY) = 4

    Notice that the first file handle is 3 and that file handles 3 and 4 are used repeatedly. The initial file handle is always 3. This indicates that it is the first file handle following those that are the same for every process that you will run - 0, 1 and 2. These represent standard in, standard out and standard error.

    The file handles shown in the example truss output above are repeated only because the associated files are subsequently closed. When a file is closed, the file handle that was used to access it can be used again. 
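The reuse of descriptor numbers can be reproduced with a few lines of C. The following is a minimal sketch, not code from the column: the function name reopen_fds is hypothetical, and it relies on the POSIX rule that open() returns the lowest-numbered descriptor not currently open in the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` read-only, close it, then open it again.
 * The two descriptor numbers are stored in *first and *second.
 * Returns 0 on success, -1 if any call fails. */
int reopen_fds(const char *path, int *first, int *second)
{
    assert(path != NULL && first != NULL && second != NULL);

    *first = open(path, O_RDONLY);
    if (*first == -1)
        return -1;
    if (close(*first) == -1)      /* the number becomes free again here */
        return -1;

    *second = open(path, O_RDONLY);
    if (*second == -1)
        return -1;
    return close(*second);
}
```

In a single-threaded program with no other descriptors opened or closed in between, both calls should yield the same number, which is the pattern visible in the truss listing.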

    The close calls include only the file handle, since the location of the file is known. A close call would, therefore, be something like close(3). One of the lines shown above displays a different response - Err#2 ENOENT. This "error" (the word is put in quotes because this does not necessarily indicate that the process is defective in any way) indicates that the file the open call is attempting to open does not exist. Read "ENOENT" as "No such file".
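From inside a program, the same condition shows up as a return value of -1 from open() with errno set to ENOENT. A small sketch follows; the helper name open_errno is an assumption made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to open `path` read-only.
 * Returns 0 if the open succeeded, otherwise the errno value set by open(). */
int open_errno(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;             /* e.g. ENOENT when the file does not exist */

    close(fd);
    return 0;
}
```

A caller would compare the result against ENOENT and, as the dynamic linker does with its optional configuration file, simply carry on when the file is absent.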

    Some open calls place multiple restrictions on the way that a file is opened. The open64 calls in the example output above, for example, specify both O_RDONLY and O_NDELAY. Again, reading the man page will help you to understand what each of these specifications means and will present you with a list of other options as well.
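Flags are combined with the bitwise OR operator, and the resulting status flags can be read back with fcntl(F_GETFL). The sketch below uses O_NONBLOCK, the POSIX name for non-blocking mode; O_NDELAY is an older spelling whose exact semantics differ between systems, so treating the two as interchangeable is an assumption here. The function name opened_nonblocking is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with two flags OR-ed together and
 * read the status flags back.
 * Returns 1 if the descriptor is read-only and non-blocking,
 * 0 if not, and -1 if open() or fcntl() fails. */
int opened_nonblocking(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFL);
    close(fd);
    if (flags == -1)
        return -1;

    /* O_ACCMODE masks out the access mode (read, write, or both). */
    return (flags & O_ACCMODE) == O_RDONLY && (flags & O_NONBLOCK) != 0;
}
```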

    As you might expect, open is only one of many system calls that you will see when you run the truss command. Next week we will look at some additional system calls and determine what they are doing.

    Exploring processes with Truss: part 2 By Sandra Henry-Stocker

    While truss and its cousins on non-Solaris systems (e.g., strace on Linux and ktrace on many BSD systems) provide a lot of data on what a running process is doing, this information is only useful if you know what it means. Last week, we looked at the open call and the file handles that are returned by the call to open(). This week, we look at some other system calls and analyze what these system calls are doing. You've probably noticed that the nomenclature for system functions is to follow the name of the call with a set of empty parentheses, for example, open(). You will see this nomenclature in use whenever system calls are discussed.

    The fstat() and fstat64() calls obtain information about open files - "fstat" refers to "file status". As you might expect, this information is retrieved from the files' inodes, including whether or not you are allowed to read the files' contents. If you trace the ls command (i.e., truss ls), for example, your trace will start with lines that resemble these:

    1 execve("/usr/bin/ls", 0x08047BCC, 0x08047BD4) argc = 1

    2 open("/dev/zero", O_RDONLY) = 3

    3 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBFA000

    4 xstat(2, "/usr/bin/ls", 0x08047934) = 0

    5 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

    6 sysconfig(_CONFIG_PAGESIZE) = 4096

    7 open("/usr/lib/libc.so.1", O_RDONLY) = 4

    8 fxstat(2, 4, 0x08047310) = 0

    ...

    28 lstat64(".", 0x080478B4) = 0

    29 open64(".", O_RDONLY|O_NDELAY) = 3

    30 fcntl(3, F_SETFD, 0x00000001) = 0

    31 fstat64(3, 0x0804787C) = 0

    32 brk(0x08057208) = 0

    33 brk(0x08059208) = 0

    34 getdents64(3, 0x08056F40, 1048) = 424

    35 getdents64(3, 0x08056F40, 1048) = 0

    36 close(3) = 0

    In line 31, we see a call to fstat64, but what file is it checking? The man page for fstat() and your intuition are probably both telling you that this fstat call is obtaining information on the file opened two lines before - "." or the current directory - and that it is referring to this file by its file handle (3) returned by the open64() call in line 29. Keep in mind that a directory is simply a file, though a different variety of file, so the same system calls are used as would be used to check a text file.
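The open-then-fstat sequence from the trace can be written directly in C. This is a sketch under the assumption that the caller only wants to know whether the opened file is a directory; the name fd_is_directory is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open `path`, call fstat() on the descriptor,
 * and report the file type.
 * Returns 1 for a directory, 0 for any other file type, -1 on error. */
int fd_is_directory(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    int rc = fstat(fd, &st);      /* fstat() takes the descriptor, not the name */
    close(fd);
    if (rc == -1)
        return -1;

    return S_ISDIR(st.st_mode) ? 1 : 0;
}
```

Calling fd_is_directory(".") follows the same path as lines 29 and 31 of the trace: the directory is opened like any other file and then inspected through its descriptor.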

    You will probably also notice that the file being opened is called /dev/zero (again, see line 2). Most Unix sysadmins will immediately know that /dev/zero is a special kind of file - primarily because it is stored in /dev. And, if moved to look more closely at the file, they will confirm that the file that /dev/zero points to (it is itself a symbolic link) is a special character file. What /dev/zero provides to system programmers, and to sysadmins if they care to use it, is an endless stream of zeroes. This is more useful than might first appear.
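A program can see the stream of zeroes for itself by reading from the device. In the following sketch the buffer is first filled with 0xFF so that any byte left untouched would be easy to spot; the function name read_dev_zero is an assumption for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fill buf[0..len) with 0xFF, then overwrite it
 * with bytes read from /dev/zero.
 * Returns 0 when the whole buffer was read, -1 on any failure. */
int read_dev_zero(unsigned char *buf, size_t len)
{
    assert(buf != NULL);

    memset(buf, 0xFF, len);       /* make stale bytes easy to spot */

    int fd = open("/dev/zero", O_RDONLY);
    if (fd == -1)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n <= 0) {             /* error, or an unexpected end of file */
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }

    close(fd);
    return 0;
}
```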

    To see how /dev/zero works, you can create a 10M-byte file full of zeroes with a command like this:

    /bin/dd < /dev/zero > zerofile bs=1024 seek=10240 count=1

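    This command works well because it creates the needed file with only a few read and write operations; in other words, it is very efficient. The seek=10240 operand skips 10240 output blocks of 1024 bytes without writing them, and count=1 then copies a single block from /dev/zero, so the file ends up 10M bytes plus 1K bytes long. The skipped region was never written, and reads from it return zeroes.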

    You can verify that the file is zero-filled with od.

    # od -x zerofile

    0000000 0000 0000 0000 0000 0000 0000 0000 0000

    *

    50002000

    Each string of four zeros (0000) represents two bytes of data. The * on the second line of output indicates that all of the remaining lines are identical to the first.
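What dd does with seek= and count=1 corresponds to one lseek() followed by one write(). The sketch below mirrors the block size and block count of the command above; the function name make_zero_file and the decision to report the final size through fstat() are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create `path`, skip 10240 blocks of 1024 bytes,
 * and write one block of zeroes after the gap.
 * Returns the resulting file size in bytes, or -1 on failure. */
off_t make_zero_file(const char *path)
{
    assert(path != NULL);

    char block[1024] = {0};       /* one block of zero bytes */

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    if (lseek(fd, (off_t)10240 * 1024, SEEK_SET) == (off_t)-1 ||
        write(fd, block, sizeof block) != (ssize_t)sizeof block) {
        close(fd);
        return -1;
    }

    struct stat st;
    int rc = fstat(fd, &st);
    close(fd);
    return rc == -1 ? (off_t)-1 : st.st_size;
}
```

The size works out to 10240 * 1024 + 1024 = 10486784 bytes, which is 50002000 in octal, the final offset printed by od.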

    Looking back at the truss output above, we cannot help but notice that the first line of the truss output includes the name of the command that we are tracing. The execve() system call executes a program. The first argument to execve() is the name of the file from which the new process image is to be loaded. The mmap() call that follows maps a file into the process address space; in other words, it makes file data directly accessible as memory. In line 3 the file being mapped is descriptor 3, that is /dev/zero, so the result is a page of zero-filled memory. The getdents64() calls on lines 34 and 35 extract information from the directory file - "dents" refers to "directory entries".
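Application code rarely calls getdents64() itself. The portable interface is opendir(), readdir() and closedir(), which C libraries typically implement on top of a getdents-style system call, so tracing a program like the sketch below should show calls similar to lines 34 and 35. The function name dir_has_entry is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: scan the entries of `dirpath` for `name`.
 * Returns 1 if an entry with that name exists, 0 if it does not,
 * and -1 if the directory cannot be opened. */
int dir_has_entry(const char *dirpath, const char *name)
{
    assert(dirpath != NULL && name != NULL);

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    int found = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {   /* NULL marks the end of the directory */
        if (strcmp(ent->d_name, name) == 0) {
            found = 1;
            break;
        }
    }

    closedir(dir);
    return found;
}
```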

    The sequence of steps that we see at the beginning of the truss output - executing the entered command, opening /dev/zero, mapping memory and so on - looks the same whether you are tracing ls, pwd, date or restarting Apache. In fact, the first dozen or so lines in your truss output will be nearly identical regardless of the command you are running. You should, however, expect to see some differences between different Unix systems and different versions of Solaris.

    Viewing the output of truss, you can get a solid sense of how the operating system works. The same insights are available if you are tracing your own applications or troubleshooting third party executables.

    -------------------

    Sandra Henry-Stocker

    Recommended Links

    See separate page Unix System Calls Links

    Etc

    FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.

    The Last but not Least


    Copyright © 1996-2014 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. The site uses AdSense, so you need to be aware of the Google privacy policy. Copyright in original materials belongs to the respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

    You can use PayPal to make a contribution, supporting hosting of this site with different providers to distribute and speed up access. Currently there are two functional mirrors: softpanorama.info (the fastest) and softpanorama.net.

    Disclaimer:

    The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    Last modified: February 19, 2014