May the source be with you, but remember the KISS principle ;-)


Unix Kernel Internals


The Unix kernel is pretty old: it is probably the fourth oldest surviving kernel in existence (if we count VMS, MVS and VM/CMS). The first version of Unix was developed in 1969 by Ken Thompson, with strong influence from Multics. After several years of internal development, a team from Berkeley led by Bill Joy made important contributions. The first major Berkeley contribution was the addition, in 1978, of virtual memory and demand paging; the result is widely known as 3BSD UNIX.

This work convinced DARPA to fund Berkeley to develop a standard Unix system for government use that included the networking protocols now known as TCP/IP. The result was 4BSD, which could communicate uniformly across a diverse set of networks, including LANs (Ethernet and token ring) as well as wide area networks. 4.2BSD was released in 1983 and 4.3BSD in 1986. The quality of these implementations and their free availability were probably among the most important reasons for the popularity of networking and the rapid growth of the Internet. The last Berkeley release was finalized in June 1993. It included the BSD Fast File System (FFS) and NFS (originated by Sun).

Paradoxically, this coincided with Microsoft's self-imposed withdrawal from the Unix scene: in October 1988 Dave Cutler, the architect of VAX/VMS, was hired by Microsoft and tasked with developing a new OS, which became the world-famous Windows NT. After Microsoft's withdrawal, Sun Microsystems became the main developer and promoter of Unix. Sun introduced several important enhancements to the OS, such as the /proc filesystem, the virtual filesystem layer (required for the NFS implementation), RPC, and several others.

From the early 1980s, the commercial part of the Unix story was dominated by Microsoft, which produced XENIX, and Sun, which produced SunOS (1984). In 1989 the ANSI standard for the C language was approved, and Unix was ported to this new version of C. Microsoft's withdrawal from Unix development also provided a nice opening for Linux, the kernel reimplementation project that originally started in Finland and then moved to the USA.

Although it started as a free software project, Linux soon became part of the commercial story of Unix. Two factors drove this: enterprise distributions like Red Hat and SUSE, and the peculiarities of its license (GPL), which permitted "brute-force/largest player survives" commercialization. In essence Linux was a reimplementation of the Unix kernel. Like any reimplementation, it helped polish certain areas, and it stimulated the established Unix players to upgrade their offerings and make them more compatible. Linux soon became the lowest common denominator of the Unix world.

The kernel provides the following key functions:

They are all provided via system calls.

These days the Unix kernel has become complex and has strayed from its original design goals. Neither simplicity nor orientation toward programmers as the main users survived commercial success; those goals quietly died. Here is an interesting quote from an interview with Solaris kernel developer Andy Tucker:

The nature of OS research has changed over the years. In the 80's and early 90's, there was a lot of "big systems" research; universities and industry labs would start by building an operating system, and then use that as a platform for investigating new ideas. So CMU had Mach, Berkeley had Sprite, Stanford had the V System, etc.. This meant that there was a lot of re-examination of basic OS constructs --- how to best build an OS from the ground up. As a result we had work on distributed systems, microkernels, etc. --- but the systems were all aimed at supporting the same applications, essentially the ones running on the researchers' desktops.

Now most of the research I see is based on existing OS platforms, usually Linux or one of the *BSDs. The focus is often on improving support for new types of applications --- multimedia, mobility, etc.. So we have fewer people looking at the basic structure of operating systems (with some notable exceptions), but more looking at how to make operating systems perform better from a user's point of view. The use of existing OS platforms also removes some of the barriers to entry for OS research --- universities with small OS groups and budgets can do interesting research without having to build an entirely new operating system.

30 years after UNIX was recoded in C, most people still use C (or in some cases a little bit of C++) for the OS kernel. Is C perfectly adequate, or do they see some of the newer languages (C#, Java, or even modern C++ paradigms) being applied to OS design?

Andy Tucker: There have been various experiments in this area; as an example, Sun has developed operating systems in both C++ (SpringOS) and Java (JavaOS). While object-oriented languages offer a number of advantages in terms of ease of development for higher-level programming abstractions, this doesn't always benefit OS kernels as much as it would user applications. Since the kernel is the piece of software that most directly interacts with the hardware, the benefits of having a simple mapping between the language and machine instructions is often more compelling than ease-of-development features like garbage collection and templates. There are also issues like runtime support requirements that can be extensive, depending on the language. What we often wind up doing instead is taking some of the concepts from object-oriented languages, such as polymorphism, and finding creative ways to implement them in non-OO languages like C.

How do you feel Solaris process management technologies such as the Fair Share Scheduler will stack up against the Linux O(1) scheduler? Furthermore, has Sun ever attempted to implement an O(1) scheduler for Solaris, and if so, what problems or drawbacks did it encounter that kept it out of the released kernels?

Andy Tucker: Solaris has actually had an O(1) scheduler for a number of years. The run queues are also per-CPU to maximize scalability. This isn't a secret, but we haven't talked about the technology itself much; we've been mostly focused on the results.

The "fair-share scheduler" is one of several scheduling policies in Solaris, which control how priorities are assigned to individual processes. This is separate from the scheduler, which handles dispatching processes onto processors in priority order.

The fair-share scheduler allows the allocation of CPU in the system to be divided among groups of processes according to proportions defined by an administrator. For example, on a system running both a mail server and a web server, the administrator might decide that if the system is busy, 2/3 of the CPU should go to the mail server, and 1/3 should go to the web server. Although in the past the fair-share scheduler was available only as a separate product (Solaris Resource Manager), we decided that it was an important enough technology for our customers to bundle it in the core operating system.

What does the future hold for Solaris 10? What enhancements are in store at the OS and kernel level? Are there any plans to integrate the Gridengine into Solaris rather than keeping it as a separate application?

Andy Tucker: Solaris 10 will have a number of new features that we think are pretty exciting. One is Solaris Zones --- this takes an idea that was initially developed for FreeBSD (jails) and extends it to address the needs of our customers. It allows administrators to divide up a single system into a number of separate application environments, called zones, where processes in one zone are not able to see or interact with those in other zones. This means that multiple applications can run on the same system without conflicting with each other, but the administrator only has to deal with one OS kernel for backups, patches, etc..

We're also looking at ways to improve system reliability and observability. Solaris 10 will include tools that allow tracing not only what's going on at user level, but also what's going on in the kernel. So a developer trying to understand why their application is performing poorly can get information from the whole software stack and get a much better picture of what's really going on. We're also using these tools internally to improve the performance and reliability of Solaris and other Sun software.

A nice overview of the Linux kernel is provided in the "Linux Kernel Architecture" section of Performance Tuning for Linux: An Introduction to Kernels. Here is an extended quote from that sample chapter:

Let’s begin this section by discussing the architecture of the Linux kernel, including responsibilities of the kernel, its organization and modules, services of the kernel, and process management.

Kernel Responsibilities

The kernel (also called the operating system) has two major responsibilities:

Some operating systems allow applications to directly access hardware components, although this capability is very uncommon nowadays. UNIX-like operating systems hide all the low-level hardware details from an application. If an application wants to make use of a hardware resource, it must make a request to the operating system. The operating system then evaluates the request and, only if it is valid, interacts with the hardware component on behalf of the application. To enforce this kind of scheme, the operating system needs to depend on hardware capabilities that forbid applications from interacting with the hardware directly.

Organization and Modules

Like many other UNIX-like operating systems, the Linux kernel is monolithic. This means that even though Linux is divided into subsystems that control various components of the system (such as memory management and process management), all of these subsystems are tightly integrated to form the whole kernel. In contrast, microkernel operating systems provide bare, minimal functionality, and all other operating system layers are performed on top of microkernels as processes. Microkernel operating systems are generally slower due to message passing between the various layers. However, microkernel operating systems can be extended very easily.

Linux kernels can be extended by modules. A module is a kernel feature that provides the benefits of a microkernel without a penalty. A module is an object that can be linked to the kernel at runtime.

Using Kernel Services

The kernel provides a set of interfaces for applications running in user mode to interact with the system. These interfaces, also known as system calls, give applications access to hardware and other kernel resources. System calls not only provide applications with abstracted hardware, but also ensure security and stability.

Most applications do not use system calls directly. Instead, they are programmed to an application programming interface (API). It is important to note that there is no one-to-one correspondence between API functions and system calls. APIs are provided as part of libraries for applications to make use of. These APIs are generally implemented through the use of one or more system calls.

/proc File System—External Performance View

The /proc file system provides the user with a view of internal kernel data structures. It also lets you look at and change some of the kernel internal data structures, thereby changing the kernel's behavior. The /proc file system provides an easy way to fine-tune system resources to improve the performance not only of applications but of the overall system.

/proc is a virtual file system that is created dynamically by the kernel to provide data. It is organized into various directories. Each of these directories corresponds to tunables for a given subsystem. Appendix A explains in detail how to use the /proc file system to fine-tune your system.

Another essential of the Linux system is memory management. In the next section, we’ll cover five aspects of how Linux handles this management.

Memory Management

The various aspects of memory management in Linux include address space, physical memory, memory mapping, paging, and swapping.



Old News ;-)

Become a Linux Kernel Hacker and Write Your Own Module

Soulskill :

M-Saunders (706738) writes "It might sound daunting, but kernel hacking isn't a mysterious black art reserved for the geekiest of programmers. With a bit of background knowledge, anyone with a grounding in C can implement a new kernel module and understand how the kernel works internally. Linux Voice explains how to write a module that creates a new device node, /dev/reverse, that reverses a string when it's written to it. Sure, it's not the most practical example in the world, but it's a good starting point for your own projects, and gives you an insight into how it all fits together."

MindPrison (864299) | yesterday | (#47102547)

Very true... (4, Interesting)

...I remember my first meeting with Slackware, it was a Linux distro that provoked any user to learn stuff from scratch, and you HAD to use the command line (bash/shell) to install it if you wanted to use it. This forced me to learn Linux. (At least some of the basics)

It also came with a Kernel compilation system + all the needed libraries and packages, so compiling to your own computer was a few commands and worked right out of the box. And then my curiosity got piqued and this drove me to go into the configuration and find out how I could optimize my kernel to fit my needs. In the beginning it was a lot of trial and error, and it looked real daunting, but after a few tries - it wasn't nearly as scary. Before you knew it, I was coding my first stuff in C++. A lot of fun, actually.

So yeah, by all means - if you guys have the time, the curiosity, do go ahead and code something, but do yourself a favor - start off easy.

ADRA (37398) | yesterday | (#47102567)

Umm (4, Insightful)

Well yes, any C developer (already a minority in the umbrella of 'programmers' these days) can write code for the kernel, but just because one can write software for the kernel doesn't mean they can write anything meaningful to be done in kernel space vs. anywhere else. If you're expecting a slew of new driver hackers reverse engineering chipsets, and implementing better drivers, testing all corner cases (because devs LOVE testing) I think you're barking up a very small tree, but all the luck to you, because what's good for Linux is good for me, you, us all.

shoor (33382) | yesterday | (#47103871)

Re:First Tutorial I've seen with Goto... (2)

I got my intro to programming in the mid 1960s with 'the college computer' a PDP-8 that we programmed in Fortran using punched cards. In those days, just getting access to a computer was a pretty big deal, but things were changing, so 'programming paradigms' started appearing, and the first one that I remember was 'structured programming'. This is where I first heard the mantra of 'goto-less' programming. (Before that, the mantra was not to write self-modifying code, which was something you almost had to be writing assembly language code to be able to do, though COBOL had an 'alters' statement as I recall.)

I remember being somewhat startled by the idea of excluding gotos. How could you write non trivial code without any goto statements? I actually thought of it almost as a challenge to figure out how to do so. The opposite of structured code was 'spaghetti code'. Anyway, it's become a conventional bit of wisdom that I suppose is just automatically passed down to each generation of students without anyone ever seriously questioning it, except those who find they really need it sometimes. At some point I started defiantly putting an occasional goto in my code again, but not often.

Eravnrekaree (467752) | yesterday | (#47103379)

Writing modules near impossible (3, Interesting)

While the article shows a cute little example on how to write a useless module, it does not show anyone how to actually write a serious kernel module. The Linux kernel has never been known for documenting kernel internals, such documentation is scant at best and simply not sufficient to write a module.

It is safe to say that due to the poor practices of Kernel developers who consistently ignore good practice by not Documenting Their Crap, the kernel is an elite club of developers with knowledge that is secret. The practices of the Linux kernel development is just sheer sloppiness, horribly bad practice.

They could have easily set up a Wiki and documented the interfaces and their architecture. What we see with the kernel developers is that they do not care about anyone else, not users, and not even outside techies, so why would they care about whether or not an outsider can understand the kernel, just as why would they care if a user can upgrade kernel versions without having all of their device drivers blow up.

As anyone well versed in computer science knows, computer code is rarely self documenting, especially the kernel, and trying to reverse document a large software project is an outrageous waste of time and can be enough of a problem that it keeps even seasoned programmers away from the project. A huge piece of undocumented code is just not worth the effort to learn.


[Mar 04, 2011]   Linux Scheduler simulation by M. Tim Jones

See also Inside the Linux Scheduler (developerWorks, June 2006)

Scheduling is one of the most complex—and interesting—aspects of the Linux kernel. Developing schedulers that provide suitable behavior for single-core machines to quad-core servers can be difficult. Luckily, the Linux Scheduler Simulator (LinSched) hosts your Linux scheduler in user space (for scheduler prototyping) while modeling arbitrary hardware targets to validate your scheduler across a spectrum of topologies. Learn about LinSched and how to experiment with your scheduler for Linux.

Operating Systems Lecture Notes Lecture 6 CPU Scheduling by Martin C. Rinard

Permission is granted to copy and distribute this material for educational purposes only, provided that the following credit line is included: "Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard." Permission is granted to alter and distribute this material provided that the following credit line is included: "Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard."

Anatomy of Linux process management by M. Tim Jones

Dec 20,  2008 | developerWorks

Linux is a very dynamic system with constantly changing computing needs. The representation of the computational needs of Linux centers around the common abstraction of the process. Processes can be short-lived (a command executed from the command line) or long-lived (a network service). For this reason, the general management of processes and their scheduling is very important.

From user-space, processes are represented by process identifiers (PIDs). From the user's perspective, a PID is a numeric value that uniquely identifies the process. A PID doesn't change during the life of a process, but PIDs can be reused after a process dies, so it's not always ideal to cache them.

In user-space, you can create processes in any of several ways. You can execute a program (which results in the creation of a new process) or, within a program, you can invoke a fork or exec system call. The fork call results in the creation of a child process, while an exec call replaces the current process context with the new program. I discuss each of these methods to understand how they work.

For this article, I build the description of processes by first showing the kernel representation of processes and how they're managed in the kernel, then review the various means by which pro

Linux kernel advances

What's new in 2.6.28?

Linux kernel 2.6.28 was released on December 24, 2008 (at release 5 as of early February, 2009). This first release of 2.6.28 includes a large number of changes—so large that the change-log text file is itself almost 6MB in size. This release is viewed as so stable that it's the kernel of the next Ubuntu distribution, version 9.04, Jaunty Jackalope.

The fourth extended file system

The fourth extended file system (ext4) was renamed from ext4dev to ext4, which means that it's stable enough for regular use. Ext4 is the successor to the standard third extended file system (ext3) available today, but with better performance, features, and reliability. Ext4 permits exabyte file systems that can support larger numbers of files, larger files, and deeper directory structures. It also includes extents with multi-block and delayed block allocation for performance. Ext4 is both forward and backward compatible: you can mount an ext3 file system as ext4 and, depending upon the features used, vice versa. You can also gradually migrate a file system from ext3 to ext4 online with a mass change. For links to more information about the ext4 file system, see Resources.

And although ext4 will be the new standard Linux file system for some time to come, other file systems are coming that offer even better scalability and features. One such file system, Btrfs, is available in an experimental form in the 2.6.29 kernel. Btrfs is a Linux-compatible file system (read GNU General Public License [GPL]) that competes in features with the well-known ZFS.

Graphics Execution Manager memory management

One of the areas that has seen solid improvements over the past year is the Linux graphics stack. Not surprisingly, it's also an area where graphics processing units (GPUs) provide useful assists for rendering. In many cases, GPUs are more powerful than the central processing units (CPUs) they assist.

To support the GPUs of today and tomorrow, one area of the Linux graphics stack that needed improvement was memory management, including buffer management, page mapping, placement, and caching. This was necessary because graphics applications—particularly three-dimensional applications—can consume a vast amount of memory. The Graphics Execution Manager (GEM) helps here by providing ways to manage graphics data that blends into the kernel using the existing kernel subsystems (such as using the shared memory file system, or shmfs, to manage graphic objects).

Boot tracer

Although the time required to boot Linux has shrunk over time, expectations are still that it takes too long. For that reason, boot times remain under scrutiny. This kernel includes a new feature to measure and record the timings of init calls. The timings can be used later to visualize the flow and performance of the boot process. This process is configurable (it must be enabled to collect the data), but once collected, the data can be analyzed using offline scripts (including graphical depictions), which will ultimately lead to better boot times and a more optimized boot process. This update incorporates the process identifier (PID) of the calling thread so that the parallelism of the boot process can be viewed.


Based conceptually on the idea of suspending an operating system for the purpose of migrating it to a new host (for example, virtual machine, or VM, migration), a new capability called freezing (and thawing) has been committed. This new feature allows either a group of tasks or a file system to be frozen and kept in its freeze-time state, later to be thawed to reintroduce the task group or file system.

You freeze tasks in the context of a container, which is a scheme that virtualizes operating systems at the user-space level (a single kernel supports multiple user spaces). This new functionality is a step in the direction of migrating a set of processes between hosts, which can be very useful for load balancing. You can also freeze file systems to support snapshots for file system backup. Currently, file system freezing is achieved through an ioctl with an argument of FIFREEZE or FITHAW.

Outside of containers, this new freeze/thaw scheme can find uses in checkpointing. In this application, you could freeze a collection of related processes at specific intervals (checkpoints), then thaw a particular epoch as a way to roll back to a known good state.

Improved virtual memory scalability

As Linux finds increasing use in virtualized systems—particularly those with many processors and vast amounts of memory—the ability to scale memory usage becomes critical to performance. Kernel 2.6.28 includes a number of scalability enhancements related to memory. For example, this kernel maintains separate Least Recently Used (LRU) lists, one for pages backed by files and another for pages backed by swap. This allows the kernel to focus on swap-backed pages, which are more likely to be written to disk, and pay less attention to file-backed pages.

Another change separates the evictable pages from the unevictable pages (such as those that were locked through mlock). In this way, the pageout code does not need to iterate over unevictable pages in the LRU list, leading to improved performance in systems with very large numbers of pages.

[Sep 09, 2008] Kernel tuning with sysctl by Federico Kereki

The Linux kernel is flexible, and you can even modify the way it works on the fly by dynamically changing some of its parameters, thanks to the sysctl command. Sysctl provides an interface that allows you to examine and change several hundred kernel parameters in Linux or BSD. Changes take effect immediately, and there's even a way to make them persist after a reboot. By using sysctl judiciously, you can optimize your box without having to recompile your kernel, and get the results immediately.

To start getting a taste of what sysctl can modify, run sysctl -a and you will see all the possible parameters. The list can be quite long: on my current box there are 712 possible settings.

$ sysctl -a
kernel.panic = 0
kernel.core_uses_pid = 0
kernel.core_pattern = core
kernel.tainted = 129
...many lines snipped...

If you want to get the value of just a single variable, use something like sysctl vm.swappiness, or just sysctl vm to list all variables that start with "vm." Add the -n option to output just the variable values, without the names; -N has the opposite effect, and produces the names but not the values.

You can change any variable by using the -w option with the syntax sysctl -w variable=value. For example, sysctl -w net.ipv6.conf.all.forwarding=1 sets the corresponding variable to true (0 equals "no" or "false"; 1 means "yes" or "true"), thus allowing IPv6 forwarding. You may not even need the -w option; it seems to be deprecated. Do some experimenting on your own to confirm that.

For more information, run man sysctl to display the standard documentation.

sysctl and the /proc directory

The /proc/sys virtual directory also provides an interface to the sysctl parameters, allowing you to examine and change them. For example, the /proc/sys/vm/swappiness file is equivalent to the vm.swappiness parameter in sysctl.conf; just forget the initial "/proc/sys/" part, substitute dots for the slashes, and you get the corresponding sysctl parameter. (By the way, the substitution is not actually required; slashes are also accepted, though it seems everybody goes for the notation with the dots instead.) Thus, echo 10 >/proc/sys/vm/swappiness is exactly the same as sysctl -w vm.swappiness=10. But as a rule of thumb, if a /proc/sys file is read-only, you cannot set it with sysctl either.

sysctl values are loaded at boot time from the /etc/sysctl.conf file. This file can have blank lines, comments (lines starting either with a "#" character or a semicolon), and lines in the "variable=value" format. For example, my own sysctl.conf file is listed below. If you want to apply it at any time, you can do so with the command sysctl -p.

# Disable response to broadcasts.
net.ipv4.icmp_echo_ignore_broadcasts = 1
# enable route verification on all interfaces
net.ipv4.conf.all.rp_filter = 1
# enable ipV6 forwarding
net.ipv6.conf.all.forwarding = 1
# increase the number of possible inotify(7) watches
fs.inotify.max_user_watches = 65536

Getting somewhere?

With so many tunable parameters, how do you decide what to do? Alas, this is a sore point with sysctl: most of the relevant documentation is hidden in the many source files of the Linux kernel, and isn't easily available, and it doesn't help that the explanations given are sometimes arcane and difficult to understand. You may find something in the /usr/src/linux/Documentation/sysctl directory, but most (if not all) files there refer to kernel 2.2, and seemingly haven't been updated in the last several years.

Looking around for books on the subject probably won't help much. I found hack #71 in O'Reilly's Linux Server Hacks, Volume 2, from 2005, but that was about it. Several other books include references to sysctl, but as to specific parameters or hints, you are on your own.

As an experiment, I tried looking for information on the swappiness parameter, which can optimize virtual memory management. The /usr/src/linux/Documentation/sysctl/vm.txt file didn't even refer to it, probably because this parameter appeared around version 2.6 of the kernel. Doing a general search in the complete /usr/src/linux directory turned up five files that mention "swappiness": three "include" (.h) files in include/linux, plus kernel/sysctl.c and mm/vmscan.c. The latter file included the information:

/*
 * From 0 .. 100.  Higher means more swappy.
 */
int vm_swappiness = 60;

That was it! You can see the default value (60) and a minimal reference to the field meaning. How helpful is that?

My suggestion would be to use sysctl -a to learn the available parameters, then Google around for extra help. You may find, say, an example of changing the shared memory allocation to solve a video program problem, or an explanation on vm.swappiness, or even more suggestions for optimizing IPv4 network traffic.

sysctl shows yet another aspect of the great flexibility of Linux systems. While documentation for it is not widely available, learning its features and capabilities on your own can help you get even more performance out of your box. That's system administration at its highest (or lowest?) level.



LPI exam 201 prep, Topic 201: Linux kernel

In this tutorial, David Mertz begins preparing you to take the Linux Professional Institute Intermediate Level Administration (LPIC-2) Exam 201. In this first of a series of eight tutorials, you will learn to understand, compile, and customize a Linux kernel.

Linux Kernel Compiling - Intel® Software Network

Linux* kernel compilation presents a workload that represents a common software development task, and is included in standard benchmark suites by trade publications to test CPU and system performance.

The purpose of this document is two-fold: to demonstrate parallel build of the Linux kernel; and to evaluate the Intel® Extended Memory 64 Technology (Intel EM64T) performance benefit on the Intel processors. This study is based on 3.6 GHz Intel Xeon® processor with Intel EM64T.

Intel EM64T is an enhancement to Intel IA-32 architecture. An IA-32 processor equipped with this technology is compatible with the existing IA-32 software. This enables the software to access more memory address space, and allows for the co-existence of software written for the 32-bit linear address space with software capable of accessing the 64-bit linear address space.

A minor configuration change on the Intel EM64T platforms, enabling Hyper-Threading Technology (HT Technology) and building the Linux kernel in multistream mode (by adding a single parameter to the build process), delivers significant performance benefit over the default configuration and build process. Several key results indicate a performance benefit with HT Technology turned on, and from Intel EM64T.

Linux kernel 2.6.4*, which is freely available, is evaluated in this study. Red Hat EL 3.0 distribution is used on all hosts. All Intel platforms considered in this study are enabled with the HT Technology and include DP 3.6GHz Nocona, and 3.2GHz Intel Xeon platforms.

Following are the key objectives of this paper:

The Process Model of Linux Application Development

One of Unix's hallmarks is its process model. It is the key to understanding access rights, the relationships among open files, signals, job control, and most other low-level topics in this book. Linux adopted most of Unix's process model and added new ideas of its own to allow a truly lightweight threads implementation.

10.1 Defining a Process

What exactly is a process? In the original Unix implementations, a process was any executing program. For each program, the kernel kept track of

A process was also the basic scheduling unit for the operating system. Only processes were allowed to run on the CPU.

10.1.1 Complicating Things with Threads

Although the definition of a process may seem obvious, the concept of threads makes all of this less clear-cut. A thread allows a single program to run in multiple places at the same time. All the threads created (or spun off) by a single program share most of the characteristics that differentiate processes from each other. For example, multiple threads that originate from the same program share information on open files, credentials, current directory, and memory image. As soon as one of the threads modifies a global variable, all the threads see the new value rather than the old one.

Many Unix implementations (including AT&T's canonical System V release) were redesigned to make threads the fundamental scheduling unit for the kernel, and a process became a collection of threads that shared resources. As so many resources were shared among threads, the kernel could switch between threads in the same process more quickly than it could perform a full context switch between processes. This resulted in most Unix kernels having a two-tiered process model that differentiates between threads and processes.

10.1.2 The Linux Approach

Linux took another route, however. Linux context switches had always been extremely fast (on the same order of magnitude as the new "thread switches" introduced in the two-tiered approach), suggesting to the kernel developers that rather than change the scheduling approach Linux uses, they should allow processes to share resources more liberally.

Under Linux, a process is defined solely as a scheduling entity and the only thing unique to a process is its current execution context. It does not imply anything about shared resources, because a process creating a new child process has full control over which resources the two processes share (see the clone() system call described on page 153 for details on this). This model allows the traditional Unix process management approach to be retained while allowing a traditional thread interface to be built outside the kernel.

Luckily, the differences between the Linux process model and the two-tiered approach surface only rarely. In this book, we use the term process to refer to a set of (normally one) scheduling entities which share fundamental resources, and a thread is each of those individual scheduling entities. When a process consists of a single thread, we often use the terms interchangeably. To keep things simple, most of this chapter ignores threads completely. Toward the end, we discuss the clone() system call, which is used to create threads (and can also create normal processes).

Sys Admin Magazine: DTrace -- Most Exposing Solaris Tool Ever, by Peter Baer Galvin

DTrace is a powerful new tool that's part of the Solaris 10 release and is available in pre-release via the Software Express for Solaris mechanism discussed in the April 2004 Solaris Companion. Because it is unique, DTrace is a bit difficult to describe. In this column, I'll summarize the features of DTrace, but I'll leave it to the Solaris kernel engineers who wrote DTrace to explore it with me in a series of questions and answers. I think that by the time you are finished hearing the engineers talk about DTrace, and once you experience it yourself, you'll agree with me that it's a brilliant piece of work that adds greatly to the ability to understand the workings of Solaris.

Interview with Solaris Kernel Engineer Andy Tucker

1. Why have the other commercial Unixes all pretty much bitten the dust? Is Solaris that much better, or is it just more important to Sun than HP-UX was to HP, AIX to IBM or IRIX to SGI?

 Andy Tucker: I think the most important thing Sun has done to ensure the success of Solaris is simply to remain committed to it. Even in the early days of Solaris, when most Sun customers were still running SunOS 4.x and other companies with UNIX implementations were starting to look at NT, Sun stayed focused on Solaris.

You can also look at some of the "big bets" that were made early in Solaris development. One of the most significant was that of designing in support for multithreading and multiprocessing from the ground up. Doing this work up front allowed Solaris to easily scale on large multiprocessors, and to handle the multithreaded workloads that are increasingly common.

2. Do you think that the proprietary, company-supported development effort you're a part of has any specific benefits over the Linux kernel's Linus-and-his-henchmen method?

Andy Tucker: The main advantage Sun has is that we can make sure our efforts are well integrated and are focused on the needs of Sun's customers. There's a lot of great stuff available for Linux, but the decentralized development model means that someone who's looking for, say, both a fair-share CPU scheduler and network QoS support has to pull the pieces out of different places, build them into a kernel, and hope they work together. Solaris has these as built-in, integrated components that just need to be switched on.

3. Technically speaking, what do you think of the Linux kernel and the Mach kernel? Also, how does FreeBSD 5.x compare to Solaris?

Andy Tucker: I think they're all fine operating systems, each with their strengths and weaknesses. Mach broke a lot of new ground: it was the first microkernel OS to get widespread use and introduced some basic concepts (such as processor sets) that we've since borrowed in Solaris. Linux obviously has a huge developer base, and as a result there's a tremendous amount of activity and energy around it. FreeBSD (and the other *BSD implementations) are inheritors of the BSD legacy and have been the source of a lot of interesting ideas.

I don't really like to do head-to-head comparisons, since I like to think of OS development as a collaborative exercise. We're all working to improve the state of the art and to make life easier for our users. The open source operating systems are often a source of new and interesting ideas; I hope the developers of those operating systems see Solaris similarly.

4. Solaris has some very complex algorithms. STREAMS, page coloring, and multi-level scheduling are all more complex than what is usually implemented in UNIX kernels. In retrospect, which Solaris features have really paid off, despite their complexity, and which ones have not?

 Andy Tucker: I'll note that most modern operating systems incorporate some sort of page coloring and multi-level scheduling algorithms; Solaris is hardly unique in this regard. I think that in most cases the significant work we've done has paid off; the complexity (if any) is usually required to meet the customer requirements. We're also happy to rewrite things if we find a better or simpler way to do something.

On the other hand, there are obviously some features that haven't really succeeded in the customer base, such as NIS+. And there are also some cases where we took a direction with the underlying technology that turned out to be a mistake. An example is the two-level thread scheduling model, where thread scheduling happens both at user level and in the kernel. Although this approach had some theoretical advantages in terms of thread creation and context switch time, it turned out to be enormously complicated, particularly when dealing with traditional Unix process semantics like signals. In Solaris 8, we made an "alternate" version of the threads library available that relied solely on kernel-based scheduling; it turned out to be not only much simpler and easier to maintain, but also faster in almost every case. It particularly sped up Java code, which is obviously important to us. In Solaris 9 (and later) we switched over to the single-level library as the only one available.

5. What do you think about the Cathedral vs. Bazaar idea when applied to OS kernels, where the programming model is rather different from that of regular application programs?

Andy Tucker: In some ways the Cathedral vs. Bazaar distinction seems a bit artificial. I don't know about other OS companies, but within Sun we have hundreds of engineers from all over the company working on different parts of the operating system. Many of these people aren't actually part of the Solaris engineering organization; they work on different hardware platforms, or on storage devices, or in the research labs, or on some other product that touches on Solaris in some way. We continuously release the latest code for internal use throughout the development cycle, and do beta tests to get feedback from customers. So in a way we're doing "Bazaar" style development, even though it's a commercial product and all developers are Sun employees.

The difficulty with this type of development, particularly on a large complex piece of software like an OS kernel, is ensuring that changes are architecturally consistent, well integrated, and of appropriate quality. This doesn't mean there can't be a large development community; it just means there needs to be some person or persons who check proposed changes to make sure they're not going to cause a problem. In Linux, this role is filled by Linus and some of the other folks working with him, who review the changes going into the official kernel base. Within Sun, we have groups of senior engineers who similarly review proposed changes for quality, appropriateness, completeness, etc.

Solaris Kernel Tuning

sysdef -i reports on several system resource limits. Other parameters can be checked on a running system using adb -k:

adb -k /dev/ksyms /dev/mem
(to exit)

More information on kernel tuning is available in Sun's online documentation.


The maxusers kernel parameter is the one most often tuned. By default, it is set to the number of megabytes of physical memory or 1024, whichever is lower. It cannot be set higher than 2048.

Several kernel parameters are derived from maxusers unless they are explicitly overridden in the /etc/system file. Some of these formulas differ between versions of Solaris:


Solaris 8 dynamically sizes the number of ptys available to a system, so you are less likely to run into pty starvation than was the case under Solaris 2.5.1-7. There are still hard system limits that are set based upon hardware configuration, and it may be necessary to increase the number of ptys manually as in Solaris 2.5.1-7.

If the system is suffering from pty starvation, the number of ptys available can be increased by increasing pt_cnt above the default of 48. Solaris 2.5.1 and 2.6 systems should not have pt_cnt set higher than 3844 due to limitations with the telnet and rlogin daemons. Solaris 7 does not have this restriction, but there may be other system issues that prevent setting pt_cnt arbitrarily high. Once pt_cnt is increased, a reconfiguration boot (boot -r) is required to build the ptys.

If pt_cnt is increased, some sources recommend that other variables be set at the same time. Other sources (such as the Solaris2 FAQ) suggest that this advice is spurious and results in a needless consumption of resources. See the notes below before making any of these changes; setting the values too high may result in wasted memory. In any case, one form of these recommendations is:

npty limits the number of BSD ptys. These are not usually used by applications, but may need to be increased on a system running a special service. In addition to setting npty in the /etc/system file, the /etc/iu.ap file will need to be edited to substitute the value npty-1 in the third field of the ptsl line. After both changes are made, a boot -r is required for the changes to take effect. Note that Solaris does not support more than 176 BSD ptys in any case.

sadcnt sets the number of STREAMS addressable devices and nautopush sets the number of STREAMS autopush entries. nautopush should be set to twice sadcnt. Whether or not these values need to be increased as above depends on the types of activity on the system.

RAM Tuneables

See the Memory/Swapping page for a discussion of parameters related to RAM and paging.

Disk I/O Tuneables

See the Disk I/O page for a full discussion of disk I/O-related tuneables.

IPC Tuneables

Check the IPC Tuning page for InterProcess Communication-related resource parameters.

File Descriptors

See the File Descriptors page for more discussion regarding tuning issues.

File descriptors are retired when the file is closed or the process terminates. Opens always choose the lowest-numbered file descriptor available. Available file descriptors are allocated as follows:

Misc Tuneables

A quick guide for repairing your kernel from a live CD

Posted by special contributor Ben Hughes on 2004-10-05 19:16:46 UTC

GNU/Linux, like all other operating systems, is built around a kernel, which controls hardware access and manages CPU and RAM by deciding when, and how much of them, programs get to use. The difference between Linux and most other operating systems (closed-source ones at least; with BSD and other open-source OSes you can do this too) is that you can compile the kernel to meet your needs.

Step 1. Basics of the kernel.

I will most likely never have to use an old serial modem or something like it, so I would not compile in the drivers for it. Linux also supports modules, which are drivers that don't load until you tell them to. Modules are useful for things you don't use much. For example, I don't use ReiserFS personally, but if a friend who does needs me to retrieve data from a hard drive, I don't want to have to recompile my kernel to help; instead I just type modprobe reiserfs. Compiling a Linux kernel is fairly easy if you know roughly what you are doing, and that is what this article hopes to explain.

If you have a working system and just want a new kernel to improve performance, to get up to date, or for bragging rights, go down to Step 3.

If you f00barred your system and need to install a new kernel from a live CD, keep on reading.

Step 2. Chrooting from Knoppix

Okay, this step is very easy: it involves opening a Konsole and typing, as root:

mount -o rw /dev/ /mnt/linux

mount -o rw /dev/ /mnt/linux/

chroot /mnt/linux

Well, that concludes this step. Basically, you mount all your required Linux partitions (yes, you have to know what those are; if you feel you are going to b0rk your install soon and still have normal access to the computer, just print out your /etc/fstab), and then you chroot into the mounted system.

Step 3. Configuring and Compiling the Kernel

Configuring the kernel is the hardest part of this. Before going into it, know your hardware. That said, download the sources for the latest kernel version from  or, if you are using Gentoo (if you are, you should have read the manual, but anyway...), emerge the version of the kernel sources you want (such as gentoo-dev-sources, gentoo-gaming-sources, or whatever). Once they are downloaded, decompress and untar them into /usr/src and then create a linux symlink:

tar -xvjf .bz2 -C /usr/src

cd /usr/src

rm linux

ln -s linux

cd linux

Now you are in your kernel source directory, and it's time for the magic to happen. Type:

make menuconfig

This will launch a rather nice interface for configuring the kernel. I will tell you what every system *needs* to function. First, go under File systems and select all the ones you use, and under Pseudo filesystems select all of them (NOTE: DO NOT build the filesystems you use constantly as modules; if you do, the computer may not be able to boot). Next, go into Processor type and features and select the applicable options. Now it's time to explore the device drivers. These are rather important, so go crazy here: make sure you include support for your network cards, block devices, sound cards, whatever. For the most part you should now be done, but look through the other categories to make sure everything is happy. Once you are satisfied with your configuration, save and exit. Now it is time to actually compile the beast. Depending on your system this could take a while, so call the pizza guy if you must. Type:

make && make modules_install

Now wait for it. While you are waiting, let's go over the next step: actually installing the kernel. You have to copy the bzImage into your /boot directory, but you do not have to call it bzImage; you can call it Bob or John or Alice or whatever (I usually just call it gentoo). Okay, the commands to install it are:

cp arch/i386/boot/bzImage /boot/

cp /boot/

cp .config /boot/.config

Once that is done, all you have left to do is edit /etc/lilo.conf (or grub.conf, but I don't know much about GRUB; there is some good information about it online). For LILO, simply update lilo.conf. Mine looks like this because I do some fancy things with it:

boot=/dev/sda                 # Install LILO in the MBR
prompt                        # Give the user the chance to select another section
timeout=50                    # Wait 5 seconds (LILO counts in tenths of a second) before booting the default section
default=gentoo                # When the timeout has passed, boot the "gentoo" section
install=/boot/boot-bmp.b      # Use the graphical boot menu
bitmap=/boot/handy_128.bmp    # Background image path
bmp-colors=38,68,53,112,38,25 # Text colors
bmp-table=114p,347p,2,7       # Label position on the screen (p = pixels)
bmp-timer=470p,336p,25,0,11   # Timer position on the screen (p = pixels)

# This is where you put kernel information for Linux
image=/boot/gentoo            # Image name (what you named the bzImage)
    label=gentoo              # Name we give to this section
    read-only                 # Start with a read-only root. Do not alter!
    root=/dev/sda7            # Location of the root filesystem

# The next two lines are only if you dualboot with a Windows system.
# In this case, Windows is hosted on /dev/hda6.



Once that is edited to include the latest information, simply run, as root:


Then everything should be happy, if you did everything right. Now boot into your normal system and see if it works; if the kernel panics, try again. This takes a bit of practice, but once you understand it, it becomes easy.

About the Author
I am SchleyFox and I use Gentoo GNU/Linux. I go to to get linux help and so should you.

[Mar 22, 2000] LinuxWorld: Customizing the FreeBSD Kernel - FreeBSD for the Linux administrator

 "This step-by-step guide includes a discussion of some of the core differences between the FreeBSD kernel and the Linux kernel; descriptions of the kernel configuration, build processes, and common kernel options; ways you can gather more information; and steps to take if you have trouble."

Recommended Links

Kernel in general


Memory management

TCP stack


Random Findings






The Last but not Least

Copyright © 1996-2014 by Dr. Nikolai Bezroukov. was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. The site uses AdSense, so you need to be aware of Google's privacy policy. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...



The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: July 18, 2014