The Unix kernel is quite old, probably the third-oldest surviving kernel in existence (if we count VMS and VM/CMS as one). The first version of Unix was developed in 1969 by Ken Thompson under the strong influence of Multics. After several years of internal development, a team from Berkeley led by Bill Joy made important contributions. The first major Berkeley contribution was the addition in 1978 of virtual memory and on-demand paging; the result is widely known as 3BSD UNIX. This work convinced DARPA to fund Berkeley to develop a standard Unix system for government use that included the networking protocols now known as TCP/IP. The result was 4BSD, which could communicate uniformly over a diverse set of networks, including LANs (Ethernet and token ring) as well as wide area networks. 4.2BSD was released in 1983 and 4.3BSD in 1986. The quality of those implementations and their free availability were probably among the most important reasons for the popularity of networking and the rapid growth of the Internet. The last Berkeley release was finalized in June 1993. It included the BSD Fast Filesystem (FFS) and NFS (which originated at Sun).
That paradoxically coincided with Microsoft's self-imposed withdrawal from the Unix scene: in October 1988 Dave Cutler, the architect of VAX/VMS, was hired by Microsoft and tasked with developing a new OS, which became the world-famous Windows NT. After Microsoft's withdrawal, Sun Microsystems became the main developer and promoter of Unix. It introduced several important enhancements to the OS, such as the /proc filesystem, the virtual filesystem layer (required for the NFS implementation), RPC, and several others.
The commercial part of the Unix story from the early 1980s was dominated by Microsoft, which produced XENIX, and Sun, which produced SunOS (1984). In 1989 the ANSI standard for the C language was approved and Unix was ported to this new version of C. Microsoft also abandoned its Unix distribution, and Sun became the dominant player. Microsoft's withdrawal from Unix development provided an opening for Linux, a development project originally started in Finland that later moved to the USA. While it began as a free software project, it became part of the commercial story of Unix because of enterprise distributions such as Red Hat and SUSE, as well as the peculiarities of its license (the GPL), which permitted "brute-force/largest player survives" commercialization. While Linux was in essence a reimplementation of the Unix kernel, like any reimplementation it helped to polish certain areas and also served as a stimulus for established Unix players to upgrade their offerings and make them more compatible. Linux soon became the lowest common denominator in the Unix world.
The kernel provides the following key functions:
They are all provided via system calls.
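As a minimal illustration of reaching kernel services through system calls, the following Python sketch (an addition, not part of the original page) uses thin wrappers from the standard os module. The function name pipe_roundtrip is hypothetical; on a Unix-like system the calls it makes correspond to the pipe, write, read and close system calls.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Send bytes through a kernel pipe and read them back."""
    read_fd, write_fd = os.pipe()        # pipe(2): the kernel creates the channel
    try:
        os.write(write_fd, message)      # write(2): copy data into the kernel
        return os.read(read_fd, len(message))  # read(2): copy it back out
    finally:
        os.close(read_fd)                # close(2): release both descriptors
        os.close(write_fd)


if __name__ == "__main__":
    print("pid reported by getpid(2):", os.getpid())
    print(pipe_roundtrip(b"hello kernel"))
```

Every step here crosses the user/kernel boundary; the Python process never touches the pipe buffer directly.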
These days the Unix kernel has become complex and has strayed from the original design goals. Neither simplicity nor the orientation toward programmers as the main users survived commercial success. Those goals quietly died. Here is an interesting quote from an interview with Solaris kernel developer Andy Tucker:
The nature of OS research has changed over the years. In the 80's and early 90's, there was a lot of "big systems" research; universities and industry labs would start by building an operating system, and then use that as a platform for investigating new ideas. So CMU had Mach, Berkeley had Sprite, Stanford had the V System, etc.. This meant that there was a lot of re-examination of basic OS constructs --- how to best build an OS from the ground up. As a result we had work on distributed systems, microkernels, etc. --- but the systems were all aimed at supporting the same applications, essentially the ones running on the researchers' desktops.
Now most of the research I see is based on existing OS platforms, usually Linux or one of the *BSDs. The focus is often on improving support for new types of applications --- multimedia, mobility, etc.. So we have fewer people looking at the basic structure of operating systems (with some notable exceptions), but more looking at how to make operating systems perform better from a user's point of view. The use of existing OS platforms also removes some of the barriers to entry for OS research --- universities with small OS groups and budgets can do interesting research without having to build an entirely new operating system.
30 years after UNIX was recoded in C, most people still use C (or in some cases a little bit of C++) for the OS kernel. Is C perfectly adequate, or do they see some of the newer languages (C#, Java, or even modern C++ paradigms) being applied to OS design?
Andy Tucker: There have been various experiments in this area; as an example, Sun has developed operating systems in both C++ (SpringOS) and Java (JavaOS). While object-oriented languages offer a number of advantages in terms of ease of development for higher-level programming abstractions, this doesn't always benefit OS kernels as much as it would user applications. Since the kernel is the piece of software that most directly interacts with the hardware, the benefits of having a simple mapping between the language and machine instructions is often more compelling than ease-of-development features like garbage collection and templates. There are also issues like runtime support requirements that can be extensive, depending on the language. What we often wind up doing instead is taking some of the concepts from object-oriented languages, such as polymorphism, and finding creative ways to implement them in non-OO languages like C.
How do you feel Solaris process management technologies like the Fair Share Scheduler will stack up to the Linux O(1) scheduler. Furthermore, has Sun ever attempted to implement an O(1) scheduler for Solaris and if so, what problems/drawbacks they encountered which kept it out of the released kernels.
Andy Tucker: Solaris has actually had an O(1) scheduler for a number of years. The run queues are also per-CPU to maximize scalability. This isn't a secret, but we haven't talked about the technology itself much; we've been mostly focused on the results.
The "fair-share scheduler" is one of several scheduling policies in Solaris, which control how priorities are assigned to individual processes. This is separate from the scheduler, which handles dispatching processes onto processors in priority order.
The fair-share scheduler allows the allocation of CPU in the system to be divided among groups of processes according to proportions defined by an administrator. For example, on a system running both a mail server and a web server, the administrator might decide that if the system is busy, 2/3 of the CPU should go to the mail server, and 1/3 should go to the web server. Although in the past the fair-share scheduler was available only as a separate product (Solaris Resource Manager), we decided that it was important enough technology for our customers to bundle in the core operating system.
What does the future hold for Solaris 10? What enhancements are in store at the OS and kernel level? Are there any plans to integrate the Gridengine into Solaris rather than being a separate application?
Andy Tucker: Solaris 10 will have a number of new features that we think are pretty exciting. One is Solaris Zones --- this takes an idea that was initially developed for FreeBSD (jails) and extends it to address the needs of our customers. It allows administrators to divide up a single system into a number of separate application environments, called zones, where processes in one zone are not able to see or interact with those in other zones. This means that multiple applications can run on the same system without conflicting with each other, but the administrator only has to deal with one OS kernel for backups, patches, etc..
We're also looking at ways to improve system reliability and observability. Solaris 10 will include tools that allow tracing not only what's going on at user level, but also what's going on in the kernel. So a developer trying to understand why their application is performing poorly can get information from the whole software stack and get a much better picture of what's really going on. We're also using these tools internally to improve the performance and reliability of Solaris and other Sun software.
In this tutorial, David Mertz begins preparing you to take the Linux Professional Institute Intermediate Level Administration (LPIC-2) Exam 201. In this first of a series of eight tutorials, you will learn to understand, compile, and customize a Linux kernel.
Linux* kernel compilation presents a workload that represents a common software development task, and is included in standard benchmark suites by trade publications to test CPU and system performance.
The purpose of this document is two-fold: to demonstrate parallel build of the Linux kernel; and to evaluate the Intel® Extended Memory 64 Technology (Intel EM64T) performance benefit on the Intel processors. This study is based on 3.6 GHz Intel Xeon® processor with Intel EM64T.
Intel EM64T is an enhancement to Intel IA-32 architecture. An IA-32 processor equipped with this technology is compatible with the existing IA-32 software. This enables the software to access more memory address space, and allows for the co-existence of software written for the 32-bit linear address space with software capable of accessing the 64-bit linear address space.
A minor configuration change on the Intel EM64T platforms, enabling Hyper-Threading Technology (HT Technology) and building the Linux kernel in multistream mode (by adding a single parameter to the build process), delivers significant performance benefit over the default configuration and build process. Several key results indicate a performance benefit with HT Technology turned on, and from Intel EM64T.
Linux kernel 2.6.4*, which is freely available, is evaluated in this study. Red Hat EL 3.0 distribution is used on all hosts. All Intel platforms considered in this study are enabled with the HT Technology and include DP 3.6GHz Nocona, and 3.2GHz Intel Xeon platforms.
Following are the key objectives of this paper:
- To evaluate the HT Technology benefit with Intel processors for multistream Linux kernel build.
- To review Linux kernel build performance on Intel processors with Intel EM64T.
One of Unix's hallmarks is its process model. It is the key to understanding access rights, the relationships among open files, signals, job control, and most other low-level topics in this book. Linux adopted most of Unix's process model and added new ideas of its own to allow a truly lightweight threads implementation.
10.1 Defining a Process
What exactly is a process? In the original Unix implementations, a process was any executing program. For each program, the kernel kept track of
- The current location of execution (such as waiting for a system call to return from the kernel), often called the program's context
- Which files the program had access to
- The program's credentials (which user and group owned the process, for example)
- The program's current directory
- Which memory space the program had access to and how it was laid out
A process was also the basic scheduling unit for the operating system. Only processes were allowed to run on the CPU.
10.1.1 Complicating Things with Threads
Although the definition of a process may seem obvious, the concept of threads makes all of this less clear-cut. A thread allows a single program to run in multiple places at the same time. All the threads created (or spun off) by a single program share most of the characteristics that differentiate processes from each other. For example, multiple threads that originate from the same program share information on open files, credentials, current directory, and memory image. As soon as one of the threads modifies a global variable, all the threads see the new value rather than the old one.
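To make the shared-global behaviour concrete, here is a small Python sketch (an addition, not taken from the book). The names counter, bump and run_in_thread are hypothetical. It assumes a standard CPython build, where threading.Thread creates a thread inside the current process, so both threads see the same module-level variable.

```python
import threading

counter = 0  # module-level global, visible to every thread in this process


def bump() -> None:
    """Runs in a second thread and modifies the shared global."""
    global counter
    counter += 1


def run_in_thread() -> int:
    """Start a worker thread, wait for it, and return the value the caller sees."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()   # wait so the update is complete before reading it
    return counter


if __name__ == "__main__":
    print(run_in_thread())
```

The calling thread observes the worker's update without any explicit copying, because both run against the same memory image.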
Many Unix implementations (including AT&T's canonical System V release) were redesigned to make threads the fundamental scheduling unit for the kernel, and a process became a collection of threads that shared resources. As so many resources were shared among threads, the kernel could switch between threads in the same process more quickly than it could perform a full context switch between processes. This resulted in most Unix kernels having a two-tiered process model that differentiates between threads and processes.
10.1.2 The Linux Approach
Linux took another route, however. Linux context switches had always been extremely fast (on the same order of magnitude as the new "thread switches" introduced in the two-tiered approach), suggesting to the kernel developers that rather than change the scheduling approach Linux uses, they should allow processes to share resources more liberally.
Under Linux, a process is defined solely as a scheduling entity and the only thing unique to a process is its current execution context. It does not imply anything about shared resources, because a process creating a new child process has full control over which resources the two processes share (see the clone() system call described on page 153 for details on this). This model allows the traditional Unix process management approach to be retained while allowing a traditional thread interface to be built outside the kernel.
Luckily, the differences between the Linux process model and the two-tiered approach surface only rarely. In this book, we use the term process to refer to a set of (normally one) scheduling entities which share fundamental resources, and a thread is each of those individual scheduling entities. When a process consists of a single thread, we often use the terms interchangeably. To keep things simple, most of this chapter ignores threads completely. Toward the end, we discuss the clone() system call, which is used to create threads (and can also create normal processes).
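As a contrast to threads, the following Python sketch (an addition, not from the book) shows a child created with fork(): it gets its own copy of the memory image while still inheriting open file descriptors. Python's standard library does not wrap clone() directly, so os.fork() is used here as a stand-in; the names value and child_sees_copy are hypothetical.

```python
import os

value = "parent"  # lives in the parent's memory image


def child_sees_copy() -> tuple:
    """Fork, let the child change `value`, and return (child's view, parent's view)."""
    global value
    read_fd, write_fd = os.pipe()       # descriptors are inherited across fork()
    pid = os.fork()                     # new scheduling entity with a copied memory image
    if pid == 0:                        # child branch
        os.close(read_fd)
        value = "child"                 # changes only the child's copy
        os.write(write_fd, value.encode())
        os._exit(0)                     # exit immediately, skipping parent cleanup code
    os.close(write_fd)                  # parent branch
    child_view = os.read(read_fd, 64).decode()
    os.close(read_fd)
    os.waitpid(pid, 0)                  # reap the child
    return child_view, value


if __name__ == "__main__":
    print(child_sees_copy())
```

The pipe works because file descriptors are shared across the fork, while the assignment to value stays private to the child. That is one fixed point on the sharing spectrum that clone() lets the caller choose from.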
DTrace is a powerful new tool that's part of the Solaris 10 release and is available in pre-release via the Software Express for Solaris mechanism discussed in the April 2004 Solaris Companion. Because it is unique, DTrace is a bit difficult to describe. In this column, I'll summarize the features of DTrace, but I'll leave it to the Solaris kernel engineers who wrote DTrace to explore it with me in a series of questions and answers. I think that by the time you are finished hearing the engineers talk about DTrace, and once you experience it yourself, you'll agree with me that it's a brilliant piece of work that adds greatly to the ability to understand the workings of Solaris.
1. Why have the other commercial Unixes all pretty much bitten the dust? Is Solaris that much better, or is it just more important to Sun than HP-UX was to HP, AIX to IBM or IRIX to SGI?
Andy Tucker: I think the most important thing Sun has done to ensure the success of Solaris is simply to remain committed to it. Even in the early days of Solaris, when most Sun customers were still running SunOS 4.x and other companies with UNIX implementations were starting to look at NT, Sun stayed focused on Solaris.
You can also look at some of the "big bets" that were made early in Solaris development. One of the most significant was that of designing in support for multithreading and multiprocessing from the ground up. Doing this work up front allowed Solaris to easily scale on large multiprocessors, and to handle the multithreaded workloads that are increasingly common.
2. Do you think that the proprietary, company-supported development effort that you're a part in has any specific benefits over the Linux kernel's Linus-and-his-henchmen method?
Andy Tucker: The main advantage Sun has is that we can make sure our efforts are well integrated and are focused on the needs of Sun's customers. There's a lot of great stuff available for Linux, but the decentralized development model means that someone who's looking for, say, both a fair-share CPU scheduler and network QoS support has to pull the pieces out of different places, build them into a kernel, and hope they work together. Solaris has these as built-in, integrated components that just need to be switched on.
3. Technically-speaking, what do you think of the Linux kernel and the Mach kernel? Also, how FreeBSD 5.x compares to Solaris?
Andy Tucker: I think they're all fine operating systems, each with their strengths and weaknesses. Mach broke a lot of new ground: it was the first microkernel OS to get widespread use and introduced some basic concepts (such as processor sets) that we've since borrowed in Solaris. Linux obviously has a huge developer base, and as a result there's a tremendous amount of activity and energy around it. FreeBSD (and the other *BSD implementations) are inheritors of the BSD legacy and have been the source of a lot of interesting ideas.
I don't really like to do head-to-head comparisons, since I like to think of OS development as a collaborative exercise. We're all working to improve the state of the art and to make life easier for our users. The open source operating systems are often a source of new and interesting ideas; I hope the developers of those operating systems see Solaris similarly.
4. Solaris has some very complex algorithms. STREAMS, page coloring, and multi-level scheduling are all more complex than what is usually implemented in UNIX kernels. In retrospect, which Solaris features have really paid off, despite their complexity, and which ones have not?
Andy Tucker: I'll note that most modern operating systems incorporate some sort of page coloring and multi-level scheduling algorithms; Solaris is hardly unique in this regard. I think that in most cases the significant work we've done has paid off; the complexity (if any) is usually required to meet the customer requirements. We're also happy to rewrite things if we find a better or simpler way to do something.
On the other hand, there are obviously some features that haven't really succeeded in the customer base, such as NIS+. And there are also some cases where we took a direction with the underlying technology that turned out to be a mistake. An example is the two-level thread scheduling model, where thread scheduling happens both at user level and in the kernel. Although this approach had some theoretical advantages in terms of thread creation and context switch time, it turned out to be enormously complicated, particularly when dealing with traditional Unix process semantics like signals. In Solaris 8, we made an "alternate" version of the threads library available that relied solely on kernel-based scheduling; it turned out to be not only much simpler and easier to maintain, but also faster in almost every case. It particularly sped up Java code, which is obviously important to us. In Solaris 9 (and later) we switched over to the single-level library as the only one available.
5. What do you think about the Cathedral vs Bazaar idea when applied to OS kernels, where the programming model is rather different than that of regular application programs?
Andy Tucker: In some ways the Cathedral vs. Bazaar distinction seems a bit artificial. I don't know about other OS companies, but within Sun we have hundreds of engineers from all over the company working on different parts of the operating system. Many of these people aren't actually part of the Solaris engineering organization; they work on different hardware platforms, or on storage devices, or in the research labs, or on some other product that touches on Solaris in some way. We continuously release the latest code for internal use throughout the development cycle, and do beta tests to get feedback from customers. So in a way we're doing "Bazaar" style development, even though it's commercial product and all developers are Sun employees.
The difficulty with this type of development, particularly on a large complex piece of software like an OS kernel, is ensuring that changes are architecturally consistent, well integrated, and of appropriate quality. This doesn't mean there can't be a large development community, it just means there needs to be some person or persons that are checking proposed changes to make sure they're not going to cause a problem. In Linux, this role is filled by Linus and some of the other folks working with him, who review the changes going into the official kernel base. Within Sun, we have groups of senior engineers who similarly review proposed changes for quality, appropriateness, completeness, etc..
sysdef -i reports on several system resource limits. Other parameters can be checked on a running system using adb -k:
adb -k /dev/ksyms /dev/mem
parameter-name/D
^D (to exit)
More information on kernel tuning is available in Sun's online documentation.
maxusers
The maxusers kernel parameter is the one most often tuned. By default, it is set to the number of MB of physical memory or 1024, whichever is lower. It cannot be set higher than 2048.
Several kernel parameters are set when maxusers is set unless otherwise overridden by the /etc/system file. Some of these formulas differ between different versions of Solaris:
- max_nprocs: Number of processes = 10 + (16 x maxusers)
- ufs_ninode: Inode cache size = (17 x maxusers) + 90 (Solaris 2.5.1) or 4 x (maxusers + max_nprocs) + 320 (Solaris 2.6-8). See the Disk I/O page for more information.
- ncsize: Name lookup cache size = (17 x maxusers) + 90 (Solaris 2.5.1) or 4 x (maxusers + max_nprocs) + 320 (Solaris 2.6-8). See the Disk I/O page for more information.
- ndquot: Quota table size = (maxusers x 10) + max_nprocs
- maxuproc: User process limit = max_nprocs - 5
ptys
Solaris 8 dynamically sizes the number of ptys available to a system, so you are less likely to run into pty starvation than was the case under Solaris 2.5.1-7. There are still hard system limits that are set based upon hardware configuration, and it may be necessary to increase the number of ptys manually as in Solaris 2.5.1-7.
If the system is suffering from pty starvation, the number of ptys available can be increased by increasing pt_cnt above the default of 48. Solaris 2.5.1 and 2.6 systems should not have pt_cnt set higher than 3844 due to limitations with the telnet and rlogin daemons. Solaris 7 does not have this restriction, but there may be other system issues that prevent setting pt_cnt arbitrarily high. Once pt_cnt is increased, a reconfiguration boot (boot -r) is required to build the ptys.
If pt_cnt is increased, some sources recommend that other variables be set at the same time. Other sources (such as the Solaris2 FAQ) suggest that this advice is spurious and results in a needless consumption of resources. See the notes below before making any of these changes; setting the values too high may result in wasted memory. In any case, one form of these recommendations is:
- npty: Set to pt_cnt (see the note below)
- nautopush: Set to twice the value of pt_cnt
- sadcnt: Set to same value as pt_cnt
npty limits the number of BSD ptys. These are not usually used by applications, but may need to be increased on a system running a special service. In addition to setting npty in the /etc/system file, the /etc/iu.ap file will need to be edited to substitute the value npty-1 in the third field of the ptsl line. After both changes are made, a boot -r is required for the changes to take effect. Note that Solaris does not support any more than 176 BSD ptys in any case.
sadcnt sets the number of STREAMS addressable devices and nautopush sets the number of STREAMS autopush entries. nautopush should be set to twice sadcnt. Whether or not these values need to be increased as above depends on the types of activity on the system.
RAM Tuneables
See the Memory/Swapping page for a discussion of parameters related to RAM and paging.
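The maxusers and pt_cnt relationships given in the two preceding sections can be checked with a short calculation. The following Python sketch is an illustration only: the function names are hypothetical, the formulas are copied from the lists above, and the values on a real Solaris system may differ by release or be overridden in /etc/system.

```python
def default_maxusers(physical_mb: int) -> int:
    """Default described above: MB of physical memory or 1024, whichever is lower."""
    return min(physical_mb, 1024)


def maxusers_derived(maxusers: int, solaris_2_5_1: bool = False) -> dict:
    """Values derived from maxusers, using the formulas quoted in the text."""
    max_nprocs = 10 + 16 * maxusers
    if solaris_2_5_1:
        cache_size = 17 * maxusers + 90                   # Solaris 2.5.1 formula
    else:
        cache_size = 4 * (maxusers + max_nprocs) + 320    # Solaris 2.6-8 formula
    return {
        "max_nprocs": max_nprocs,
        "ufs_ninode": cache_size,   # inode cache size
        "ncsize": cache_size,       # name lookup cache size
        "ndquot": maxusers * 10 + max_nprocs,
        "maxuproc": max_nprocs - 5,
    }


def pty_derived(pt_cnt: int) -> dict:
    """One form of the disputed pt_cnt recommendations quoted in the text."""
    return {
        "npty": pt_cnt,             # the text notes at most 176 BSD ptys are supported
        "nautopush": 2 * pt_cnt,
        "sadcnt": pt_cnt,
    }


if __name__ == "__main__":
    print(maxusers_derived(default_maxusers(512)))
    print(pty_derived(256))
```

Keeping the two Solaris formula variants behind one flag makes it easy to see how much larger the caches become on 2.6-8 for the same maxusers value.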
Disk I/O Tuneables
See the Disk I/O page for a full discussion of disk I/O-related tuneables.
IPC Tuneables
Check the IPC Tuning page for InterProcess Communication-related resource parameters.
File Descriptors
See the File Descriptors page for more discussion regarding tuning issues.
File descriptors are retired when the file is closed or the process terminates. Opens always choose the lowest-numbered file descriptor available. Available file descriptors are allocated as follows:
- rlim_fd_cur: It is dangerous to set this value higher than 256 due to limitations with the stdio library. If programs require more file descriptors, they should use setrlimit directly.
- rlim_fd_max: It is dangerous to set this value higher than 1024 due to limitations with select. If programs require more file descriptors, they should use setrlimit directly.
Misc Tuneables
- dump_cnt: Size of dumps.
- rstchown: Posix/restricted chown enabled (default=1)
- ngroups_max: Maximum number of supplementary groups per user (default=32).
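Returning to the File Descriptors notes above, the following Python sketch illustrates two points made there: opens pick the lowest-numbered free descriptor, and a program that needs more descriptors can raise its own soft limit with setrlimit rather than relying on system-wide defaults. It is an illustration for a generic Unix-like system, not Solaris-specific code. The function names are hypothetical, and the mapping of rlim_fd_cur and rlim_fd_max to the soft and hard RLIMIT_NOFILE limits is stated here as an assumption.

```python
import os
import resource


def lowest_fd_is_reused() -> bool:
    """Close a descriptor and check that the next open() gets the same number."""
    first = os.open(os.devnull, os.O_RDONLY)
    second = os.open(os.devnull, os.O_RDONLY)
    os.close(first)                          # retire the lower descriptor
    reopened = os.open(os.devnull, os.O_RDONLY)
    try:
        return reopened == first             # lowest free number is chosen again
    finally:
        os.close(reopened)
        os.close(second)


def raise_soft_fd_limit(wanted: int) -> int:
    """Raise this process's soft descriptor limit toward `wanted`; return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft                          # already unlimited, nothing to raise
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)           # the soft limit may not exceed the hard limit
    if wanted > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("lowest descriptor reused:", lowest_fd_is_reused())
    print("soft limit now:", raise_soft_fd_limit(1024))
```

Raising the limit per process keeps programs built around stdio or select on the conservative defaults while letting the one application that needs many descriptors ask for them explicitly.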
quick guide for repairing your kernel from a live CD
Posted by special contributor Ben Hughes on 2004-10-05 19:16:46 UTC
GNU/Linux, and all other operating systems, are based around a kernel which controls hardware access and maximizes CPU and RAM efficiency by controlling when and how much of them programs get to use. The difference between Linux and most other operating systems (closed source ones at least; you can do this with BSD and other open source OS's too) is that you can compile the kernel to meet your needs.
Step 1. Basics of the kernel
I will most likely never have to use an old serial modem or something, so I would not compile in the drivers for it. Also, Linux supports modules, which are drivers that don't load until you tell them to. Modules can be useful for things that you don't use much. For example, I don't use ReiserFS personally, but if my friend who does needs me to retrieve data from a hard drive, I don't want to have to recompile my kernel to help; instead I just type modprobe reiserfs. Compiling a kernel in Linux is fairly easy if you know basically what you are doing, and that is what this article hopes to explain.
If you have a working system and just want a kernel to improve performance, get you up to date, or for bragging rights, go down to Step 3.
If you f00barred your system and need to install a new kernel from a live cd, keep on reading.
Step 2. Chrooting from Knoppix
Okay, this step is very easy. It involves opening a konsole and typing as root
mount /dev/ -rw /mnt/linux
mount /dev/ -rw /mnt/linux/
chroot /mnt/linux
Well, that basically concludes that step. Basically you just mount all your required linux partitions. (Yes, you have to know what those are; if you feel like you are going to b0rk your install soon and still have normal access to the computer, just print out your /etc/fstab.) Then, you simply chroot into it.
Step 3. Configuring and Compiling the Kernel
Configuring the kernel is the hardest part of this. Before going into this, know your hardware. That said, download the sources for the latest kernel version from www.kernel.org, or if you are using Gentoo (if you are, you should have read the manual, but anyway...) emerge the version of kernel sources you want (such as gentoo-dev-sources, gentoo-gaming-sources or whatever). Once they are downloaded, decompress and untar them to /usr/src and then create a linux symlink.
tar -xvjf .bz2 -C /usr/src
cd /usr/src
rm linux
ln -s linux
cd linux
Now you are in your kernel source directory, and now it's time for the magic to happen. Type
make menuconfig
This will launch a rather nice interface for configuring the kernel. I will tell you what every system *needs* to function. First off, you are going to want to go under file systems and select all the ones you use, and under pseudo-filesystems select all of them (NOTE: DO NOT set any of the ones that you use constantly to modules; this will make it so that the computer cannot boot). Now go into processor type and features and select the applicable options. Now it's time to explore the device drivers. These are rather important, so go crazy here: make sure you include support for your network cards, block devices, sound cards, whatever. Now for the most part it should be done, but look through the other categories to make sure everything is happy. Once you are satisfied with your config, save and exit. Now it is time to actually compile the beast. Depending on your system this could take a while, so call the pizza guy if you must. Type
make && make modules_install
Now wait for it. While you are waiting, let's go over the next step, actually installing the kernel. What you have to do is copy the bzImage into your /boot directory, but you do not have to call it bzImage; you can call it Bob or John or Alice or whatever. I usually just call it gentoo. Okay, the code to install is
cp arch/i386/boot/bzImage /boot/
cp System.map /boot/System.map
cp .config /boot/.config
Once that is done, all you have left to do is edit /etc/lilo.conf (or grub.conf, but I don't know much about grub; there is some good information online about it). For LILO, simply update lilo.conf (mine looks like this because I do some fancy things with it):
boot=/dev/sda # Install LILO in the MBR
prompt # Give the user the chance to select another section
timeout=500 # Wait 50 seconds (LILO counts in tenths of a second) before booting the default section
default=gentoo # When the timeout has passed, boot the "gentoo" section
install=/boot/boot-bmp.b # means you will use the graphical version
bitmap=/boot/handy_128.bmp # background path
bmp-colors=38,68,53,112,38,25 # text color
bmp-table=114p,347p,2,7 # label position on the screen p=pixel
bmp-timer=470p,336p,25,0,11 # timer position on the screen p=pixel
#This is where you put kernel information for linux
image=/boot/gentoo #image name (what you named the bzImage)
label=gentoo # Name we give to this section
read-only # Start with a read-only root. Do not alter!
root=/dev/sda7 # Location of the root filesystem
# The next two lines are only if you dualboot with a Windows system.
# In this case, Windows is hosted on /dev/sda1.
other=/dev/sda1
label=windows
Once that is edited to include the latest information, simply run as root
lilo
Then everything should be happy if you did everything right. Now boot into your normal system and see if it works; if it kernel panics, try again. This takes a bit of practice, but once you understand it, it becomes easy.
About the Author
I am SchleyFox and I use Gentoo GNU/Linux. I go to www.usalug.org to get linux help and so should you.
[Mar 22, 2000] LinuxWorld: Customizing the FreeBSD Kernel - FreeBSD for the Linux administrator
"This step-by-step guide includes a discussion of some of the core differences between the FreeBSD kernel and the Linux kernel; descriptions of the kernel configuration, build processes, and common kernel options; ways you can gather more information; and steps to take if you have trouble."
Recommended Links
Linux kernel - Wikipedia, the free encyclopedia
The Linux Kernel Hackers' Guide