Softpanorama |
May the source be with you, but remember the KISS principle ;-)
The best performance specialists are good at what they do for two basic reasons:
One of the simplest performance monitoring packages for Linux is Sysstat, which includes sar.
Here are major areas for tuning (adapted from Server Oriented System Tuning Info ):
For example:
/dev/rd/c0d0p3 /test ext2 noatime 1 2
The disk I/O elevator is another kernel tunable that can be tweaked for improved disk I/O in some cases.
For the Adaptec aic7xxx series cards (2940's, 7890's, *160's, etc.), tagged command queuing (TCQ) can be enabled with a module option like:
aic7xxx=tag_info:{{0,0,0,0,}}
This enables the default tagged command queuing on the first device, on the first 4 SCSI IDs.
options aic7xxx aic7xxx=tag_info:{{24.24.24.24.24.24}}
in /etc/modules.conf will set the TCQ depth to 24.
You probably want to check the driver documentation for your particular SCSI modules for more info.
Making sure the cards are running in full duplex mode is also very often critical to benchmark performance. Depending on the networking hardware used, some of the cards may not autosense properly and may not run full duplex by default.
Many cards include module options that can be used to force the cards into full duplex mode. Some examples for common cards include
alias eth0 eepro100
options eepro100 full_duplex=1
alias eth1 tulip
options tulip full_duplex=1
Though full duplex gives the best overall performance, I've seen some circumstances where setting the cards to half duplex will actually increase throughput, particularly in cases where the data flow is heavily one-sided.
If you think you're in a situation where that may help, I would suggest trying it and benchmarking it.
In order to optimize TCP performance for this situation, I would suggest tuning the following parameters.
echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range
Allows more local ports to be available. Generally not an issue, but in a benchmarking scenario you often need more ports available. A common example is clients running `ab` or `http_load` or similar software.
In the case of firewalls, or other servers doing NAT or masquerading, you may not be able to use the full port range this way, because of the need for high ports for use in NAT.
Increasing the amount of memory associated with socket buffers can often improve performance. Things like NFS in particular, or apache setups with large buffers configured, can benefit from this.
echo 262143 > /proc/sys/net/core/rmem_max
echo 262143 > /proc/sys/net/core/rmem_default
This will increase the amount of memory available for socket input queues. The "wmem_*" values do the same for output queues.
Note: With 2.4.x kernels, these values are supposed to "autotune" fairly well, and some people suggest instead just changing the values in:
/proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_wmem
There are three values here, "min default max".
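The "min default max" layout of tcp_rmem and tcp_wmem can be read programmatically. The following is a minimal sketch, not an official tool: the function name `parse_tcp_mem` and the sample numbers are assumptions made for illustration, and real values differ between kernels and machines.

```python
from typing import NamedTuple


class TcpMem(NamedTuple):
    """The three fields of tcp_rmem / tcp_wmem, in bytes."""
    minimum: int
    default: int
    maximum: int


def parse_tcp_mem(text: str) -> TcpMem:
    """Parse the whitespace-separated 'min default max' triple."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 values, got {len(fields)}")
    return TcpMem(*(int(f) for f in fields))


# Hypothetical sample contents; on a live system you would read
# /proc/sys/net/ipv4/tcp_rmem instead.
sample = "4096\t87380\t174760"
limits = parse_tcp_mem(sample)
print(f"min={limits.minimum} default={limits.default} max={limits.maximum}")
```

Keeping the parser separate from the file read makes it easy to compare the values before and after a tuning change.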
These reduce the amount of work the TCP stack has to do, so they are often helpful in this situation.
echo 0 > /proc/sys/net/ipv4/tcp_sack
echo 0 > /proc/sys/net/ipv4/tcp_timestamps
But the basic tuning steps include:
Try using NFSv3 if you are currently using NFSv2. There can be very significant performance increases with this change.
Increasing the read/write block size. This is done with the rsize and wsize mount options. They need to be set in the mount options used by the NFS clients. Values of 4096 and 8192 reportedly increase performance a lot. But see the notes in the HOWTO about experimenting and measuring the performance implications. The limits on these are 8192 for NFSv2 and 32768 for NFSv3.
Another approach is to increase the number of nfsd threads running. This is normally controlled by the nfsd init script. On Red Hat Linux machines, the value "RPCNFSDCOUNT" in the nfs init script controls this value. The best way to determine if you need this is to experiment. The HOWTO mentions a way to determine thread usage, but that doesn't seem to be supported in all kernels.
Another good tool for getting some handle on NFS server performance is `nfsstat`. This util reads the info in /proc/net/rpc/nfs[d] and displays it in a somewhat readable format. Some info is intended for tuning Solaris, but is useful for its description of the nfsstat format.
See also the tcp tuning info
Make sure you are starting a ton of initial daemons if you want good benchmark scores.
Something like:
#######
MinSpareServers 20
MaxSpareServers 80
StartServers 32
# this can be higher if apache is recompiled
MaxClients 256
MaxRequestsPerChild 10000
Note: Starting a massive amount of httpd processes is really a benchmark
hack. In most real world cases, setting a high number for max servers, and a
sane spare server setting will be more than adequate. It's just the instant
on load that benchmarks typically generate that the StartServers helps with.
MaxRequestsPerChild should be bumped up if you are sure that your httpd processes do not leak memory. Setting this value to 0 will cause the processes to never reach a limit.
One of the best resources on tuning these values, especially for app servers, is the mod_perl performance tuning documentation.
Bumping the number of available httpd processes
Apache sets a maximum number of possible processes at compile time. It is set to 256 by default, but in this kind of scenario, can often be exceeded.
To change this, you will need to change the hardcoded limit in the apache source code, and recompile it. An example of the change is below:
--- apache_1.3.6/src/include/httpd.h.prezab Fri Aug 6 20:11:14 1999
+++ apache_1.3.6/src/include/httpd.h Fri Aug 6 20:12:50 1999
@@ -306,7 +306,7 @@
  * the overhead.
  */
 #ifndef HARD_SERVER_LIMIT
-#define HARD_SERVER_LIMIT 256
+#define HARD_SERVER_LIMIT 4000
 #endif
 /*
To make use of this many apaches, however, you will also need to boost the number of processes supported, at least for 2.2 kernels. See the section on kernel process limits for info on increasing this.
The biggest scalability problem with apache, 1.3.x versions at least, is its model of using one process per connection. In cases where there are large numbers of concurrent connections, this can require a large amount of resources. These resources can include RAM, scheduler slots, ability to grab locks, database connections, file descriptors, and others. In cases where each connection takes a long time to complete, this is only compounded. Connections can be slow to complete because of large amounts of CPU or I/O usage in dynamic apps, large files being transferred, or just talking to clients on slow links.
There are several strategies to mitigate this. The basic idea being to free up heavyweight apache processes from having to handle slow to complete connections.
Static Content Servers
For purely static content, some of the other smaller more lightweight web servers can offer very good performance. They aren't nearly as powerful or as flexible as apache, but for very specific performance-crucial tasks, they can be a big win.
If you need even more ExtremeWebServerPerformance, you probably want to take a look at TUX, written by Ingo Molnar. This is the current world record holder for SpecWeb99. It probably owns the right to be called the world's fastest web server.
The easiest approach is probably to use mod_proxy and the "ProxyPass" directive to pass content to another server. mod_proxy supports a degree of caching that can offer a significant performance boost. But another advantage is that since the proxy server and the web server are likely to have a very fast interconnect, the web server can quickly serve up large content, freeing up an apache process, while the proxy slowly feeds out the content to clients. This can be further enhanced by increasing the amount of socket buffer memory that's available to the kernel. See the section on tcp tuning for info on this.
proxy links
ListenBacklog
The apache ListenBacklog parameter lets you specify what backlog parameter is passed to listen(). By default on linux, this can be as high as 128.
Increasing this allows a limited number of httpd's to handle a burst of attempted connections.
There are some experimental patches from SGI that accelerate apache. More info at:
I haven't really had a chance to test the SGI patches yet, but I've been told they are pretty effective.
The first one is to rebuild it with mmap support. In cases where you are serving up a large amount of small files, this seems to be particularly useful. You just need to add a "--with-mmap" to the configure line.
You also want to make sure the following options are enabled in the /etc/smb.conf file:
read raw = no
read prediction = true
level2 oplocks = true
One of the better resources for tuning samba is the "Using Samba" book from O'Reilly. The chapter on performance tuning is available online.
I use the values:
cachesize 10000
dbcachesize 100000
sizelimit 10000
loglevel 0
dbcacheNoWsync
index cn,uid
index uidnumber
index gid
index gidnumber
index mail
If you add the following parameters to /etc/openldap/slapd.conf before entering the info into the database, they will all get indexed and performance will increase.
But aside from that, a good set of benchmarking utilities is often very helpful in doing system tuning work. It is impossible to duplicate "real world" situations, but that isn't really the goal of a good benchmark. A good benchmark typically tries to measure the performance of one particular thing very accurately. If you understand what the benchmarks are doing, they can be very useful tools.
Some of the common and useful benchmarks include:
Check Doug Ledford's list of benchmarks for more info on Bonnie. There is also a somewhat newer version of Bonnie called Bonnie++ that fixes a few bugs, and includes a couple of extra tests.
Dbench is available at The Samba ftp site and mirrors
http_load is available from ACME Labs
dkftpbench is available from Dan Kegel's page
The tiobench site.
dt does a lot. disk io, process creation, async io, etc.
dt is available at The dt page
Info: http://www.netperf.org/netperf/NetperfPage.html
Download: ftp://ftp.sgi.com/sgi/src/netperf/
Info provided by Bill Hilf.
Info:
http://www.hpl.hp.com/personal/David_Mosberger/httperf.html
Download: ftp://ftp.hpl.hp.com/pub/httperf/
Info provided by Bill Hilf.
Info: http://www.xenoclast.org/autobench/
Download: http://www.xenoclast.org/autobench/downloads/
Info provided by Bill Hilf.
Here's a sample vmstat output on a lightly used desktop:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
 1  0  0   5416   2200   1856  34612   0   1     2     1  140   194   2   1  97
And here's some sample output on a heavily used server:
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo   in    cs  us  sy  id
16  0  0   2360 264400  96672   9400   0   0     0     1   53    24   3   1  96
24  0  0   2360 257284  96672   9400   0   0     0     6 3063 17713  64  36   0
15  0  0   2360 250024  96672   9400   0   0     0     3 3039 16811  66  34   0
The interesting numbers here start with the first one, r: the number of processes on the run queue. This value shows how many processes are ready to be executed but cannot run at the moment because other processes need to finish. For lightly loaded systems, this is almost never above 1-3, and numbers consistently higher than 10 indicate the machine is getting pounded.
Other interesting values include the "system" numbers for in and cs. The in value is the number of interrupts per second a system is getting. A system doing a lot of network or disk I/O will have high values here, as interrupts are generated every time something is read or written to the disk or network.
The cs value is the number of context switches per second. A context switch is when the kernel has to take the executable code for one program out of memory, and switch in another. It's actually _way_ more complicated than that, but that's the basic idea. Lots of context switches are bad, since it takes some fairly large number of cycles to perform a context switch, so if you are doing lots of them, you are spending all your time changing jobs and not actually doing any work. I think we can all understand that concept.
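The rule of thumb about the run queue can be checked mechanically. Here is a rough sketch that parses old-style vmstat output (the 16-column layout shown in the heavily used server sample) and flags rows where r is above 10. The function name `parse_vmstat` and the threshold constant are illustrative assumptions, and newer vmstat versions print a different set of columns.

```python
def parse_vmstat(text: str) -> list:
    """Turn vmstat output into a list of {column: value} dicts.

    The column-name row is taken to be the first line whose first
    field is 'r'; every later non-empty line is treated as a sample.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header_index = next(i for i, fields in enumerate(rows) if fields[0] == "r")
    names = rows[header_index]
    return [dict(zip(names, map(int, fields)))
            for fields in rows[header_index + 1:]]


# Sample taken from the heavily used server output in the text.
SAMPLE = """\
procs memory swap io system cpu
 r b w swpd free buff cache si so bi bo in cs us sy id
16 0 0 2360 264400 96672 9400 0 0 0 1 53 24 3 1 96
24 0 0 2360 257284 96672 9400 0 0 0 6 3063 17713 64 36 0
15 0 0 2360 250024 96672 9400 0 0 0 3 3039 16811 66 34 0
"""

RUN_QUEUE_LIMIT = 10  # assumed threshold, from the rule of thumb in the text

samples = parse_vmstat(SAMPLE)
busy = [s for s in samples if s["r"] > RUN_QUEUE_LIMIT]
for s in busy:
    print(f"run queue {s['r']}, {s['cs']} context switches/s, {s['id']}% idle")
```

A single high reading means little; the text's advice is about values that stay high across many samples.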
netstat
One of the more useful options is:
netstat -pa
The `-p` option tells it to try to determine what program has the socket open, which is often very useful info. For example, someone nmaps their system and wants to know what is using port 666. Running netstat -pa will show you it's satand running on that tcp port.
One of the most twisted, but useful invocations is:
netstat -a -n|grep -E "^(tcp)"| cut -c 68-|sort|uniq -c|sort -n
This will show you a sorted list of how many sockets are in each connection state. For example:
9 LISTEN
21 ESTABLISHED
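The same per-state count can be sketched without depending on fixed character columns, which `cut -c 68-` does. This is an illustrative alternative, not the original author's method: the function name `count_tcp_states` and the sample lines are made up, and it assumes the usual Linux `netstat -an` layout where the state is the sixth field.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP sockets per connection state."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local address, foreign address, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample input; on a live system you would feed in the
# output of `netstat -a -n`.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:22             10.0.0.9:51234          ESTABLISHED
udp        0      0 0.0.0.0:68              0.0.0.0:*
"""

counts = count_tcp_states(SAMPLE)
# Ascending by count, like the final `sort -n` in the shell pipeline.
for state, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{n:7d} {state}")
```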
ps
ps -eo pid,%cpu,vsz,args,wchan
Shows every process with its pid, % of CPU, memory size, command line, and the kernel function it is currently waiting in (wchan). Nifty.
For Suse 11
Simply type yum install iotop.
Iotop is licensed under the terms of the GNU GPL. The latest version is Iotop 0.3.2 (NEWS), available here: iotop-0.3.2.tar.bz2 or iotop-0.3.2.tar.gz. Freshmeat project page to stay informed: http://freshmeat.net/projects/iotop.
September 11, 2009 | Levent Serinol's Blog
Linux kernel 2.6.20 and later supports per-process I/O accounting. You can access every process's or thread's I/O read/write values through the /proc filesystem. You can check whether your kernel was built with I/O accounting by simply checking for the /proc/self/io file. If it exists, then you have I/O accounting built in.
$ cat /proc/self/io
rchar: 3809
wchar: 0
syscr: 10
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
Field Descriptions:
rchar - bytes read
wchar - bytes written
syscr - number of read syscalls
syscw - number of write syscalls
read_bytes - number of bytes this process caused to be read from underlying storage
write_bytes - number of bytes this process caused to be written to underlying storage
As you know, every process is represented by its pid number under the /proc directory. You can access any process's I/O accounting values by looking at the /proc/#pid/io file. There is a utility called iotop which collects these values and shows them in a top-like display. You can see your processes' I/O activity with the iotop utility.
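To illustrate how a tool in the spirit of iotop might consume these counters, here is a small sketch that parses the "name: value" lines of a /proc/#pid/io file. The function name `parse_proc_io` is an assumption for this example, and it says nothing about how iotop itself is implemented.

```python
def parse_proc_io(text: str) -> dict:
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip blank or malformed lines
            counters[name.strip()] = int(value)
    return counters


# Sample taken from the `cat /proc/self/io` output shown in the text.
SAMPLE = """\
rchar: 3809
wchar: 0
syscr: 10
syscw: 0
read_bytes: 0
write_bytes: 0
cancelled_write_bytes: 0
"""

io = parse_proc_io(SAMPLE)
print(f"{io['rchar']} bytes read in {io['syscr']} read syscalls")
```

Reading the file twice and subtracting the counters gives a per-interval rate, which is the kind of figure a top-like display needs.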
Abstract
Over the past few years, Linux has made its way into the data centers of many corporations all over the globe. The Linux operating system has become accepted by both the scientific and enterprise user population. Today, Linux is by far the most versatile operating system. You can find Linux on embedded devices such as firewalls and cell phones and mainframes. Naturally, performance of the Linux operating system has become a hot topic for both scientific and enterprise users. However, calculating a global weather forecast and hosting a database impose different requirements on the operating system. Linux has to accommodate all possible usage scenarios with the most optimal performance. The consequence of this challenge is that most Linux distributions contain general tuning parameters to accommodate all users.
IBM® has embraced Linux, and it is recognized as an operating system suitable for enterprise-level applications running on IBM systems. Most enterprise applications are now available on Linux, including file and print servers, database servers, Web servers, and collaboration and mail servers.
With use of Linux in an enterprise-class server comes the need to monitor performance and, when necessary, tune the server to remove bottlenecks that affect users. This IBM Redpaper describes the methods you can use to tune Linux, tools that you can use to monitor and analyze server performance, and key tuning parameters for specific server applications. The purpose of this redpaper is to understand, analyze, and tune the Linux operating system to yield superior performance for any type of application you plan to run on these systems.
The tuning parameters, benchmark results, and monitoring tools used in our test environment were executed on Red Hat and Novell SUSE Linux kernel 2.6 systems running on IBM System x servers and IBM System z servers. However, the information in this redpaper should be helpful for all Linux hardware platforms.
Update 4/2008: Typos corrected
09.30.2008
You've just had your first cup of coffee and have received that dreaded phone call. The system is slow. What are you going to do? This article will discuss performance bottlenecks and optimization in Red Hat Enterprise Linux (RHEL5).
Before getting into any monitoring or tuning specifics, you should always use some kind of tuning methodology. This is one which I've used successfully through the years:
1. Baseline The first thing you must do is establish a baseline, which is a snapshot of how the system appears when it's performing well. This baseline should not only compile data, but also document your system's configuration (RAM, CPU and I/O). This is necessary because you need to know what a well-performing system looks like prior to fixing it.
2. Stress testing and monitoring This is the part where you monitor and stress your systems at peak workloads. It's the monitoring which is key here as you cannot effectively tune anything without some historic trending data.
3. Bottleneck identification This is where you come up with the diagnosis for what is ailing your system. The primary objective of step 2 is to determine the bottleneck. I like to use several monitoring tools here. This allows me to cross-reference my data for accuracy.
4. Tune Only after you've identified the bottleneck can you tune it.
5. Repeat Once you've tuned it, you can start the cycle again but this time start from step 2 (monitoring) as you already have your baseline.
It's important to note that you should only make one change at a time. Otherwise, you'll never know exactly what impacted any changes which might have occurred. It is only by repeating your tests and consistently monitoring your systems that you can determine if your tuning is making an impact.
RHEL monitoring tools
Before we can begin to improve the performance of our system, we need to use the monitoring tools available to us to baseline. Here are some monitoring tools you should consider using:
Oprofile
This tool (made available in RHEL5) utilizes the processor to retrieve kernel system information about system executables. It allows one to collect samples of performance data every time a counter detects an interrupt. I like the tool also because it carries little overhead which is very important because you don't want monitoring tools to be causing system bottlenecks. One important limitation is that the tool is very much geared towards finding problems with CPU limited processes. It does not identify processes which are sleeping or waiting on I/O.
The steps used to start up Oprofile include setting up the profiler, starting it and then dumping the data.
First we'll set up the profile. This option assumes that one wants to monitor the kernel.
# opcontrol --setup --vmlinux=/usr/lib/debug/lib/modules/`uname -r`/vmlinux
Then we can start it up.
# opcontrol --start
Finally, we'll dump the data.
# opcontrol --stop/--shutdown/--dump
SystemTap
This tool (introduced in RHEL5) collects data by analyzing the running kernel. It really helps one come up with a correct diagnosis of a performance problem and is tailor-made for developers. SystemTap eliminates the need for the developer to go through the recompile and reinstallation process to collect data.
Frysk
This is another tool which was introduced by Red Hat in RHEL5. What does it do for you? It allows both developers and system administrators to monitor running processes and threads. Frysk differs from Oprofile in that it uses 100% reliable information (similar to SystemTap) - not just a sampling of data. It also runs in user mode and does not require kernel modules or elevated privileges. Allowing one to stop or start running threads or processes is also a very useful feature.
Some more general Linux tools include top and vmstat. While these are considered more basic, often I find them much more useful than more complex tools. Certainly they are easier to use and can help provide information in a much quicker fashion.
Top provides a quick snapshot of what is going on in your system in a friendly character-based display.
It also provides information on CPU, Memory and Swap Space.
Let's look at vmstat, one of the oldest but more important Unix/Linux tools ever created. Vmstat allows one to get a valuable snapshot of process, memory, swap, I/O and overall CPU utilization.
Now let's define some of the fields:
Memory
swpd The amount of virtual memory used
free The amount of free memory
buff Amount of memory used for buffers
cache Amount of memory used as page cache
Process
r number of runnable processes
b number of processes sleeping. Make sure this number does not exceed the number of runnable processes, because when this condition occurs it usually signifies that there are performance problems.
Swap
si the amount of memory swapped in from disk
so the amount of memory swapped out.
This is another important field you should be monitoring; if you are swapping out data, you will likely be having performance problems with virtual memory.
CPU
us The % of time spent in user-level code.
It is preferable for you to have processes which spend more time in user code rather than system code. Time spent in system level code usually means that the process is tied up in the kernel rather than processing real data.
sy the time spent in system level code
id the amount of time the CPU is idle
wa The amount of time the system is spending waiting for I/O.
If your system is waiting on I/O everything tends to come to a halt. I start to get worried when this is > 10.
There is also:
Free This tool provides memory information, giving you data around the total amount of free and used physical and swap memory.
Now that we've analyzed our systems, let's look at what we can do to optimize and tune them.
CPU Overhead: Shutting Down Running Processes
Linux starts up all sorts of processes which are usually not required. This includes processes such as autofs, cups, xfs, nfslock and sendmail. As a general rule, shut down anything that isn't explicitly required. How do you do this? The best method is to use the chkconfig command.
Here's how we can shut these processes down.
# chkconfig --del xfs
You can also use the GUI - /usr/bin/system-config-services to shut down daemon processes.
Tuning the kernel
To tune your kernel for optimal performance, start with:
sysctl This is the command we use for changing kernel parameters. The parameters themselves are found in /proc/sys/kernel
Let's change some of the parameters. We'll start with the msgmax parameter. This parameter specifies the maximum allowable size of a single message in an IPC message queue. Let's view how it currently looks.
# sysctl kernel.msgmax
kernel.msgmax = 65536
There are three ways to make these kinds of kernel changes. One way is to change this using the echo command.
# echo 131072 >/proc/sys/kernel/msgmax
# sysctl kernel.msgmax
kernel.msgmax = 131072
Another parameter that is changed quite frequently is SHMMAX, which is used to define the maximum size (in bytes) for a shared memory segment. In Oracle this should be set large enough for the largest SGA size. Let's look at the default parameter:
# sysctl kernel.shmmax
kernel.shmmax = 268435456
This is in bytes, which translates to 256 MB. Let's change this to 512 MB, using the -w flag.
# sysctl -w kernel.shmmax=536870912
kernel.shmmax = 536870912
The final method for making changes is to use a text editor such as vi, directly editing the /etc/sysctl.conf file to manually make our changes.
To allow the parameter to take effect dynamically without a reboot, issue the sysctl command with the -p parameter.
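The relationship between the three methods is easier to see side by side: a dotted sysctl name maps onto a path under /proc/sys, and the same name is used in /etc/sysctl.conf. The sketch below also does the byte arithmetic for shmmax (256 MB is 268435456 bytes and 512 MB is 536870912 bytes). The helper names are invented for illustration, and the simple dot-to-slash mapping is an assumption that does not cover names whose components themselves contain dots.

```python
def sysctl_to_proc_path(name: str) -> str:
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")


def megabytes_to_bytes(megabytes: int) -> int:
    """Convert binary megabytes (1 MB = 1024 * 1024 bytes) to bytes."""
    return megabytes * 1024 * 1024


def sysctl_conf_line(name: str, value: int) -> str:
    """Format a 'name = value' line as used in /etc/sysctl.conf."""
    return f"{name} = {value}"


shmmax = megabytes_to_bytes(512)
print(sysctl_to_proc_path("kernel.shmmax"))       # file the echo method writes to
print(f"sysctl -w kernel.shmmax={shmmax}")        # the -w method
print(sysctl_conf_line("kernel.shmmax", shmmax))  # the sysctl.conf method
```

The script only prints the commands and paths; it deliberately changes nothing on the running system.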
Obviously, there is more to performance tuning and optimization than we can discuss in the context of this small article; entire books have been written on Linux performance tuning. For those of you first getting your hands dirty with tuning, I suggest you tread lightly and spend time working on development, test and/or sandbox environments prior to deploying any changes into production. Ensure that you monitor the effects of any changes that you make immediately; it's imperative to know the effect of your change. Be prepared for the possibility that fixing your bottleneck has created another one. This is actually not a bad thing in itself, as long as your overall performance has improved and you understand fully what is happening.
Performance monitoring and tuning is a dynamic process which does not stop after you have fixed a problem. All you've done is established a new baseline. Don't rest on your laurels, and understand that performance monitoring must be a routine part of your role as a systems administrator.
About the author: Ken Milberg is a systems consultant with two decades of experience working with Unix and Linux systems. He is a SearchEnterpriseLinux.com Ask the Experts advisor and columnist.
Before you learn how to configure your system, you should learn how to gather essential system information. For example, you should know how to find the amount of free memory, the amount of available hard drive space, how your hard drive is partitioned, and what processes are running. This chapter discusses how to retrieve this type of information from your Red Hat Enterprise Linux system using simple commands and a few simple programs.
1. System Processes
The ps ax command displays a list of current system processes, including processes owned by other users. To display the owner alongside each process, use the ps aux command. This list is a static list; in other words, it is a snapshot of what was running when you invoked the command. If you want a constantly updated list of running processes, use top as described below. The ps output can be long. To prevent it from scrolling off the screen, you can pipe it through less:
ps aux | less
You can use the ps command in combination with the grep command to see if a process is running. For example, to determine if Emacs is running, use the following command:
ps ax | grep emacs
The top command displays currently running processes and important information about them including their memory and CPU usage. The list is both real-time and interactive. An example of output from the top command is provided as follows:
To exit top press the q key. Useful interactive commands that you can use:
Space  Immediately refresh the display.
h      Display a help screen.
k      Kill a process. You are prompted for the process ID and the signal to send to it.
n      Change the number of processes displayed. You are prompted to enter the number.
u      Sort by user.
M      Sort by memory usage.
P      Sort by CPU usage.
For more information, refer to the top(1) manual page.
redhat.com
The short summary of our study indicates that there is no SINGLE answer to which I/O scheduler is best. The good news is that with Red Hat Enterprise Linux 4 an end-user can customize their scheduler with a simple boot option. Our data suggests the default Red Hat Enterprise Linux 4 I/O scheduler, CFQ, provides the most scalable algorithm for the widest range of systems, configurations, and commercial database users. However, we have also measured other workloads whereby the Deadline scheduler out-performed CFQ for large sequential read-mostly DSS queries. Other studies referenced in the section "References" explored using the AS scheduler to help interactive response times. In addition, noop has proven to free up CPU cycles and provide adequate I/O performance for systems with intelligent I/O controllers which provide their own I/O ordering capabilities.
In conclusion, we recommend baselining an application with the default CFQ. Use this article and its references to match your application to one of the studies. Then adjust the I/O scheduler via the simple command line re-boot option if seeking additional performance. Make only one change at a time, and use performance tools to validate the results.
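Before changing schedulers, it helps to confirm which one is active. The excerpt above only mentions the boot option; as an addition not taken from it, 2.6-era and later kernels typically also expose the choice per device in a sysfs file such as /sys/block/<device>/queue/scheduler, where the active scheduler is shown in square brackets. The sketch below parses that format. The function name and the sample string are assumptions for illustration.

```python
def active_scheduler(text: str) -> str:
    """Return the scheduler name marked with [brackets] in the sysfs format."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in input")


def available_schedulers(text: str) -> list:
    """Return every scheduler listed, with the brackets stripped."""
    return [token.strip("[]") for token in text.split()]


# Hypothetical contents of a scheduler file on a RHEL 4 era system.
SAMPLE = "noop anticipatory deadline [cfq]"

print("active:", active_scheduler(SAMPLE))
print("available:", ", ".join(available_schedulers(SAMPLE)))
```

Recording this value alongside each benchmark run makes it clear which scheduler produced which result.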
Articles - Kernel Tuning
vm.swappiness is a tunable kernel parameter that controls how much the kernel favors swap over RAM. At the source code level, it's also defined as the tendency to steal mapped memory. A high swappiness value means that the kernel will be more apt to unmap mapped pages. A low swappiness value means the opposite, the kernel will be less apt to unmap mapped pages. In other words, the higher the vm.swappiness value, the more the system will swap.
The default value I've seen on both enterprise level Red Hat and SLES servers is 60.
To find out what the default value is on a particular server, run:
sysctl vm.swappiness
The value is also located in /proc/sys/vm/swappiness.
What reason might there be to change the value of this parameter? Like all other tunable kernel parameters, there may not be a compelling reason to change the default value, but having a facility that allows one to manipulate how the linux kernel behaves without modifying source code is indispensable.
If there were reasons to change the vm.swappiness kernel parameter, one might be to decrease the parameter if swapping is undesirable. I've seen enterprise configurations where servers had a swap to RAM ratio of 1:125. It's evident in this case that there is no interest in ever using anything but physical memory, so why not make the kernel privy to this information? Whether the vm.swappiness parameter is set to 0, 20, 40, or any other value, the owner of the server should perform due diligence to see what effect this has on the server and applications. For an under-the-cover look at the effect of changing the parameter, one only needs to look at the vmscan.c source file and the swap_tendency algorithm.
swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
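A few sample numbers make the formula concrete. The sketch below reproduces the arithmetic with integer division, as C would do it. The comparison against 100 is an assumption based on my recollection of 2.6-era vmscan.c, where mapped pages start being reclaimed once swap_tendency reaches 100; check the source for your kernel version before relying on it. The function names are illustrative only.

```python
RECLAIM_THRESHOLD = 100  # assumed from 2.6-era vmscan.c; verify for your kernel


def swap_tendency(mapped_ratio: int, distress: int, vm_swappiness: int) -> int:
    """Compute swap_tendency using C-style integer division."""
    return mapped_ratio // 2 + distress + vm_swappiness


def reclaims_mapped(mapped_ratio: int, distress: int, vm_swappiness: int) -> bool:
    """True when the kernel would begin unmapping mapped pages (assumed rule)."""
    return swap_tendency(mapped_ratio, distress, vm_swappiness) >= RECLAIM_THRESHOLD


# With no memory pressure (distress 0) and the default swappiness of 60,
# mapped memory has to reach 80% of RAM before the threshold is met.
for mapped in (40, 80):
    value = swap_tendency(mapped, 0, 60)
    print(f"mapped_ratio={mapped}: swap_tendency={value}, "
          f"reclaim mapped={reclaims_mapped(mapped, 0, 60)}")
```

With vm_swappiness at 0 and no distress, the result can never exceed 50, which is consistent with the idea that low values make the kernel reluctant to swap.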
On the flip side, one may consider increasing the vm.swappiness parameter beyond the default if a particular system has physical memory constraints.
"Systems with memory constraints that run batch jobs (processes that sleep for long time) might benefit from an aggressive swapping behavior." http://unixfoo.blogspot.com/2007/11/linux-performance-tuning.html
Andrew Morton sets his workstation vm.swappiness parameter to 100. "My point is that decreasing the tendency of the kernel to swap stuff out is wrong. You really don't want hundreds of megabytes of BloatyApp's untouched memory floating about in the machine. Get it out on the disk, use the memory for something useful."
The following is an excerpt of a benchmark obtained using different vm.swappiness values while performing dd on a 2.6.5-7.97-default kernel (http://lwn.net/Articles/100978/):
vm.swappiness   Total I/O     Avg Swap
0               273.57 MB/s   0 MB
20              273.75 MB/s   0 MB
40              273.52 MB/s   0 MB
60              229.01 MB/s   23068 MB
80              195.63 MB/s   25587 MB
100             184.30 MB/s   26006 MB
To read more information on the vm.swappiness kernel tunable, you may find these links helpful.
About:
Sysprof is a sampling CPU profiler that uses a Linux kernel module to profile the entire system, not just a single application. It handles shared libraries, and applications do not need to be recompiled. It profiles all running processes, not just a single application, has a nice graphical interface, shows the time spent in each branch of the call tree, can load and save profiles, and is easy to use.
Release focus: Minor bugfixes
Changes:
This version compiles with recent kernels.
Author:
Søren Sandmann
The kernel summit was two weeks ago, and at the end of that I got one of the new 80GB solid state disks from Intel. Since then, I've been wanting to talk to people about it because I'm so impressed with it, but at the same time I don't much like using the kernel mailing list as some kind of odd public publishing place that isn't really kernel-related, so since I'm testing this whole blogging thing, I might as well vent about it here.
That thing absolutely rocks.
I've been impressed by Intel before (Core 2), but they've had their share of total mistakes and idiotic screw-ups too (Itanic), but the things Intel tends to have done well are the things where they do incremental improvements. So it's a nice thing to be able to say that they can do new things very well too. And while I often tend to get early access to technology, seldom have I looked forward to it so much, and seldom have things lived up to my expectations so well.
In fact, I can't recall the last time that a new tech toy I got made such a dramatic difference in performance and just plain usability of a machine of mine.
So what's so special about that Intel SSD, you ask? Sure, it gets up to 250MB/s reads and 70MB/s writes, but fancy disk arrays can certainly do as well or better. Why am I not gushing about some nice NAS box? I didn't even put the thing into a laptop, after all, it's actually in Tove's Mac Mini (running Linux, in case anybody was confused ;), so a RAID NAS box would certainly have been a lot bigger and probably have more features.
But no, forget about the throughput figures. Others can match - or at least come close - to the throughput, but what that Intel SSD does so well is random reads and writes. You can do small random accesses to it and still get great performance, and quite frankly, that's the whole point of not having some stupid mechanical latencies as far as I'm concerned.
And the sad part is that other SSD's generally absolutely suck when it comes to especially random write performance. And small random writes is what you get when you update various filesystem meta-data on any normal filesystem, so it really does matter. For example, a vendor who shall remain nameless has an SSD disk out there that they were also hawking at the Kernel Summit, and while they get fine throughput (something like 50+MB/s on big contiguous writes), they benchmark a pitiful 10 (yes, that's ten, as in "how many fingers do you have") small random writes per second. That is slower than a rotational disk.
In contrast, the Intel SSD does about 8,500 4kB random writes per second. Yeah, that's over eight thousand IOps on random write accesses with a relevant block size, rather than some silly and unrealistic contiguous write test. That's what I call solid-state media.
The whole thing just rocks. Everything performs well. You can put that disk in a machine, and suddenly you almost don't even need to care whether things were in your page cache or not. Firefox starts up pretty much as snappily in the cold-cache case as it does hot-cache. You can do package installation and big untars, and you don't even notice it, because your desktop doesn't get laggy or anything.
So here's the deal: right now, don't buy any other SSD than the Intel ones, because as far as I can tell, all the other ones are pretty much inferior to the much cheaper traditional disks, unless you never do any writes at all (and turn off 'atime', for that matter).
So people - ignore the manufacturer write throughput numbers. They don't mean squat. The fact that you may be able to push 50MB/s to the SSD is meaningless if that can only happen when you do big, aligned, writes.
If anybody knows of any reasonable SSDs that work as well as Intel's, let me know.
sarvant analyzes files from the sysstat utility "sar" and produces graphs of the collected data using gnuplot. It supports user-defined data source collection, debugging, start and end times, interval counting, and output types (Postscript, PDF, and PNG). It's also capable of using gnuplot's graph smoothing capability to soften spiked line graphs. It can analyze performance data over both short and long periods of time.
You will find here a tutorial describing a few use cases for some sysstat commands. The first section below concerns the sar and sadf commands. The second one concerns the pidstat command. Of course, you should really have a look at the manual pages to know all the features and how these commands can help you to monitor your system (follow the Documentation link above for that).
- Section 1: Using sar and sadf
- Section 2: Using pidstat
Section 1: Using sar and sadf
sar is the system activity reporter. By interpreting the reports that sar produces, you can locate system bottlenecks and suggest some possible solutions to those annoying performance problems.
The Linux kernel maintains internal counters that keep track of requests, completion times, I/O block counts, etc. From this and other information, sar calculates rates and ratios that give insight into where the bottlenecks are.
The key to understanding sar is that it reports on system activity over a period of time. You must take care to collect sar data at an appropriate time (not at lunch time or on weekends, for example). Here is one way to invoke sar:
$ sar -u -o datafile 2 3
The -u option specifies our interest in the CPU subsystem. The -o option will create an output file that contains binary data. Finally, we will take 3 samples at two-second intervals. Upon completion of the sampling, sar will report the results to the screen. This provides us with a snapshot of current system activity.
The above example uses sar in interactive mode. You can also invoke sar from cron. In this case, cron would run the /usr/lib/sa/sa1 shell script and create a daily log file. The /usr/lib/sa/sa2 shell script is run to format the log into human-readable form. These scripts may be invoked by a crontab run by root (although I prefer to use adm). Here is the crontab, located in /etc/cron.d directory and using Vixie cron syntax, that makes this happen:
# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib/sa/sa1 -d 1 1
# 0 * * * * root /usr/lib/sa/sa1 -d 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib/sa/sa2 -A
In reality, the sa1 script initiates a related utility called sadc. sa1 gives sadc several arguments to specify the amount of time to wait between samples, the number of samples, and the name of a file into which the binary results should be written.
A new file is created each day so that we can easily interpret daily results. The sa2 script calls sar, which formats the binary data into human-readable form.
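As a small illustration of the one-file-per-day scheme, the Python sketch below builds the name of the daily data file for a given date. It assumes the saDD naming (day of month, zero-padded) and the /var/log/sa directory that appear in the sa29 example later in this tutorial; the helper name is hypothetical, and some distributions keep these files in a different directory.

```python
from datetime import date


def daily_sa_file(day, log_dir="/var/log/sa"):
    """Return the assumed path of the binary data file written for `day`."""
    return f"{log_dir}/sa{day.day:02d}"


print(daily_sa_file(date(2006, 3, 29)))  # /var/log/sa/sa29
```

A path built this way can be passed to sar with the -f option to read back that day's data.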
Let's think of our system as being composed of three interdependent subsystems: CPU, disk and memory. Our goal is to find out which subsystem is responsible for any performance bottleneck. By analyzing sar's output, we can achieve that goal.
Listing below represents the report produced by initiating the sar -u command. Initiating sar in this manner produces a report from the daily log file produced by sadc.
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM CPU %user %nice %system %iowait %steal %idle
09:10:00 PM all 96.18 0.00 0.42 0.00 0.00 3.40
09:20:00 PM all 97.99 0.00 0.36 0.00 0.00 1.65
09:30:00 PM all 97.59 0.00 0.38 0.00 0.00 2.03
...
The %user and %system columns simply specify the amount of time the CPU spends in user and system mode. The %iowait and %idle columns are of interest to us when doing performance analysis. The %iowait column specifies the amount of time the CPU spends waiting for I/O requests to complete. The %idle column tells us how much of the time the CPU has nothing to do, and therefore how much spare capacity remains. A %idle time near zero indicates a CPU bottleneck, while a high %iowait value indicates unsatisfactory disk performance.
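The rule of thumb for reading sar -u output (%idle near zero suggests a CPU bottleneck, high %iowait suggests slow disks) can be sketched in a few lines of Python. The thresholds of 10% idle and 20% iowait are arbitrary assumptions chosen for illustration, not values recommended by sysstat, and the parser expects the 12-hour timestamp layout shown in the listing.

```python
SAR_U_SAMPLE = """\
09:10:00 PM all 96.18 0.00 0.42 0.00 0.00 3.40
09:20:00 PM all 97.99 0.00 0.36 0.00 0.00 1.65
09:30:00 PM all 97.59 0.00 0.38 0.00 0.00 2.03
"""


def classify(line, idle_floor=10.0, iowait_ceiling=20.0):
    """Label one sar -u data line as 'disk-bound', 'cpu-bound' or 'ok'."""
    fields = line.split()
    # After time, AM/PM and CPU: %user %nice %system %iowait %steal %idle
    user, nice, system, iowait, steal, idle = map(float, fields[3:9])
    if iowait > iowait_ceiling:
        return "disk-bound"
    if idle < idle_floor:
        return "cpu-bound"
    return "ok"


for line in SAR_U_SAMPLE.splitlines():
    print(line.split()[0], classify(line))
```

Every sample in the listing falls into the CPU-bound case, because almost all of the time is spent in user mode and %iowait is zero.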
Additional information can be obtained by the sar -q command, which displays the run queue length, total number of processes, and the load averages for the past one, five and fifteen minutes:
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
09:10:00 PM 2 121 2.22 2.17 1.45
09:20:00 PM 6 137 2.79 2.48 1.73
09:30:00 PM 5 129 3.31 2.83 1.95
...
This example shows that the system is busy (since more than one process is runnable at any given time) and rather overloaded.
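One way to express the "more than one process is runnable" observation is to compare runq-sz with the number of CPUs. The Python sketch below does that for the three samples in the sar -q listing; the single-CPU default is an assumption, since the listing does not say how many processors the machine has.

```python
# (timestamp, runq-sz) pairs copied from the sar -q listing
SAMPLES = [("09:10:00 PM", 2), ("09:20:00 PM", 6), ("09:30:00 PM", 5)]


def overloaded_samples(samples, cpu_count=1):
    """Return the timestamps whose run queue length exceeds the CPU count."""
    return [ts for ts, runq_sz in samples if runq_sz > cpu_count]


print(overloaded_samples(SAMPLES))               # assumes one CPU
print(overloaded_samples(SAMPLES, cpu_count=4))  # same data on a four-CPU box
```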
sar also lets you monitor memory utilization. Have a look at the following example produced by sar -r:
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
09:10:00 PM 591468 444388 42.90 19292 227412 1632920 0 0.00 0
09:20:00 PM 546860 488996 47.21 21844 243900 1632920 0 0.00 0
09:30:00 PM 538268 497588 48.04 25308 267228 1632920 0 0.00 0
...
This listing shows that the system has plenty of free memory. Swap space is not used. So memory is not a problem here. You can double-check this by using sar -W to get swapping statistics:
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM pswpin/s pswpout/s
09:10:00 PM 0.00 0.00
09:20:00 PM 0.00 0.00
09:30:00 PM 0.00 0.00
...
sar can also help you to monitor disk activity. sar -b displays I/O and transfer rate statistics grouped for all block devices:
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM tps rtps wtps bread/s bwrtn/s
09:10:00 PM 6.37 2.32 4.05 126.84 61.41
09:20:00 PM 4.03 0.74 3.29 54.49 46.04
09:30:00 PM 6.71 3.11 3.59 80.13 49.18
...
sar -d enables you to get more detailed information on a per device basis. It displays statistics data similar to those displayed by iostat:
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 AM DEV tps rd_sec/s wr_sec/s avgrq-sz avgqu-sz await svctm %util
09:10:00 AM sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
09:10:00 AM sdb 18.09 0.00 160.80 8.89 0.01 0.67 0.19 0.35
09:20:00 AM sda 2.51 0.00 52.26 20.80 0.00 0.60 0.40 0.10
09:20:00 AM sdb 18.91 0.00 141.29 7.47 0.02 0.92 0.21 0.40
09:30:00 AM sda 26.87 11.94 291.54 11.30 0.12 4.33 1.07 2.89
09:30:00 AM sdb 7.00 0.00 54.00 7.71 0.00 0.50 0.14 0.10
...
sar has numerous other options that enable you to gather statistics for every part of your system. You will find useful information about them in the manual page.
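The columns of the per-device sar -d listing are related to one another. For instance, avgrq-sz (average request size in sectors) should be close to the sectors transferred per second divided by the requests per second. The Python sketch below checks that relationship against four rows of the listing; the formula is my reading of the column definitions rather than something stated in this tutorial, and small differences are expected because the printed values are rounded.

```python
def avg_request_size(tps, rd_sec, wr_sec):
    """Average request size in sectors; 0.0 when the device was idle."""
    return (rd_sec + wr_sec) / tps if tps else 0.0


# (device, tps, rd_sec/s, wr_sec/s, reported avgrq-sz) copied from the listing
ROWS = [
    ("sdb", 18.09, 0.00, 160.80, 8.89),
    ("sda", 2.51, 0.00, 52.26, 20.80),
    ("sdb", 18.91, 0.00, 141.29, 7.47),
    ("sda", 26.87, 11.94, 291.54, 11.30),
]

for dev, tps, rd, wr, reported in ROWS:
    computed = avg_request_size(tps, rd, wr)
    print(f"{dev}: computed {computed:.2f}, reported {reported:.2f}")
```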
OK. As a last example, let's show how the sadf command can help us to produce some graphs.
We use the command sar -B to display paging statistics from daily data file sa29 (see example below).
# sar -B -f /var/log/sa/sa29
Linux 2.6.8.1-27mdkcustom (localhost) 03/29/2006
09:00:00 PM pgpgin/s pgpgout/s fault/s majflt/s
09:10:00 PM 63.42 30.71 267.35 0.45
09:20:00 PM 27.25 23.02 281.88 0.26
09:30:00 PM 40.06 24.59 246.51 0.32
09:40:00 PM 43.58 26.11 265.25 0.34
09:50:00 PM 34.12 28.38 271.54 0.37
Average: 41.69 26.56 266.51 0.35
sadf -d extracts data in a format that can be easily ingested by a relational database:
# sadf -d /var/log/sa/sa29 -- -B
localhost;601;2006-03-29 19:10:00 UTC;63.42;30.71;267.35;0.45
localhost;600;2006-03-29 19:20:00 UTC;27.25;23.02;281.88;0.26
localhost;600;2006-03-29 19:30:00 UTC;40.06;24.59;246.51;0.32
localhost;600;2006-03-29 19:40:00 UTC;43.58;26.11;265.25;0.34
localhost;600;2006-03-29 19:50:00 UTC;34.12;28.38;271.54;0.37
If we save this as a text file, both Excel and Open Office will allow us to specify a semicolon as a field delimiter. Then we can generate our performance report and graph.
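The semicolon-delimited sadf -d output is also easy to process without a spreadsheet. Here is a minimal Python sketch using the standard csv module. The field names are my own labels: the last four are borrowed from the sar -B header, while hostname, interval and timestamp are assumptions about what the first three columns mean.

```python
import csv
import io

SADF_OUTPUT = """\
localhost;601;2006-03-29 19:10:00 UTC;63.42;30.71;267.35;0.45
localhost;600;2006-03-29 19:20:00 UTC;27.25;23.02;281.88;0.26
localhost;600;2006-03-29 19:30:00 UTC;40.06;24.59;246.51;0.32
localhost;600;2006-03-29 19:40:00 UTC;43.58;26.11;265.25;0.34
localhost;600;2006-03-29 19:50:00 UTC;34.12;28.38;271.54;0.37
"""

# Assumed labels for the seven columns of the sample above
FIELDS = ["hostname", "interval", "timestamp",
          "pgpgin/s", "pgpgout/s", "fault/s", "majflt/s"]


def parse_sadf(text):
    """Turn sadf -d style lines into a list of dicts with numeric values."""
    rows = []
    for record in csv.reader(io.StringIO(text), delimiter=";"):
        if not record or record[0].startswith("#"):
            continue  # skip blank lines and any comment/header line
        row = dict(zip(FIELDS, record))
        row["interval"] = int(row["interval"])
        for name in FIELDS[3:]:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = parse_sadf(SADF_OUTPUT)
mean_majflt = sum(r["majflt/s"] for r in rows) / len(rows)
print(f"{len(rows)} samples, mean majflt/s = {mean_majflt:.2f}")
```

In practice the text would come from a saved file or from the output of the sadf command rather than from an embedded string.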
Section 2: Using pidstat
The pidstat command is used to monitor processes and threads currently being managed by the Linux kernel. It can also monitor the children of those processes and threads.
With its -d option, pidstat can report I/O statistics, providing that you have a recent Linux kernel (2.6.20+) with the option CONFIG_TASK_IO_ACCOUNTING compiled in. So imagine that your system is undergoing heavy I/O and you want to know which tasks are generating them. You could then enter the following command:
$ pidstat -d 2
Linux 2.6.20 (localhost) 09/26/2007
10:13:31 AM PID kB_rd/s kB_wr/s kB_ccwr/s Command
10:13:31 AM 15625 1.98 16164.36 0.00 dd
10:13:33 AM PID kB_rd/s kB_wr/s kB_ccwr/s Command
10:13:33 AM 15625 4.00 20556.00 0.00 dd
10:13:35 AM PID kB_rd/s kB_wr/s kB_ccwr/s Command
10:13:35 AM 15625 0.00 10642.00 0.00 dd
...
This report tells us that there is only one task (a "dd" command with PID 15625) which is responsible for this I/O.
When no PID's are explicitly selected on the command line (as in the case above), the pidstat command examines all the tasks managed by the system but displays only those whose statistics are varying during the interval of time. But you can also indicate which tasks you want to monitor. The following example reports CPU statistics for PID 8197 and all its threads:
$ pidstat -t -p 8197 1 3
Linux 2.6.8.1-27mdkcustom (localhost) 09/26/2007
10:40:05 AM PID TID %user %system %CPU CPU Command
10:40:06 AM 8197 - 71.29 1.98 73.27 0 procthread
10:40:06 AM - 8197 71.29 1.98 73.27 0 |__procthread
10:40:06 AM - 8198 0.00 0.99 0.99 0 |__procthread
10:40:06 AM PID TID %user %system %CPU CPU Command
10:40:07 AM 8197 - 67.00 2.00 69.00 0 procthread
10:40:07 AM - 8197 67.00 2.00 69.00 0 |__procthread
10:40:07 AM - 8198 1.00 1.00 2.00 0 |__procthread
10:40:07 AM PID TID %user %system %CPU CPU Command
10:40:08 AM 8197 - 56.00 6.00 62.00 0 procthread
10:40:08 AM - 8197 56.00 6.00 62.00 0 |__procthread
10:40:08 AM - 8198 2.00 1.00 3.00 0 |__procthread
Average: PID TID %user %system %CPU CPU Command
Average: 8197 - 64.78 3.32 68.11 - procthread
Average: - 8197 64.78 3.32 68.11 - |__procthread
Average: - 8198 1.00 1.00 1.99 - |__procthread
As a last example, let me show you how pidstat helped me to detect a memory leak in the pidstat command itself. At that time I was testing the very first version of pidstat I wrote for sysstat 7.1.4 and fixing the last remaining bugs. Here is the command I entered on the command line and the output I got:
$ pidstat -r 2
Linux 2.6.8.1-27mdkcustom (localhost) 09/26/2007
10:59:03 AM PID minflt/s majflt/s VSZ RSS %MEM Command
10:59:05 AM 14364 113.66 0.00 2480 1540 0.15 pidstat
10:59:05 AM PID minflt/s majflt/s VSZ RSS %MEM Command
10:59:07 AM 7954 150.00 0.00 27416 19448 1.88 net_applet
10:59:07 AM 14364 120.00 0.00 3048 2052 0.20 pidstat
10:59:07 AM PID minflt/s majflt/s VSZ RSS %MEM Command
10:59:09 AM 14364 116.00 0.00 3488 2532 0.24 pidstat
10:59:09 AM PID minflt/s majflt/s VSZ RSS %MEM Command
10:59:11 AM 7947 0.50 0.00 27044 18356 1.77 mdkapplet
10:59:11 AM 14364 116.00 0.00 3928 3012 0.29 pidstat
10:59:11 AM PID minflt/s majflt/s VSZ RSS %MEM Command
10:59:13 AM 7954 155.50 0.00 27416 19448 1.88 net_applet
10:59:13 AM 14364 115.50 0.00 4496 3488 0.34 pidstat
...
I noticed that pidstat had a memory footprint (VSZ and RSS fields) that was constantly increasing as the time went by. I quickly found that I had forgotten to close a file descriptor in a function of my code and that was responsible for the memory leak...!
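The "constantly increasing" pattern in the RSS column can be checked mechanically. The Python sketch below applies a deliberately naive test (every sample larger than the previous one) to the RSS values reported for PID 14364 in the pidstat -r output; the function names and the minimum of four samples are assumptions, and a real leak detector would need a longer observation window.

```python
# (timestamp, RSS in kB) for PID 14364, copied from the pidstat -r output
RSS_SAMPLES = [("10:59:05", 1540), ("10:59:07", 2052), ("10:59:09", 2532),
               ("10:59:11", 3012), ("10:59:13", 3488)]


def looks_like_leak(samples, min_samples=4):
    """True if there are enough samples and RSS grows at every step."""
    values = [rss for _, rss in samples]
    if len(values) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def growth_per_sample(samples):
    """Average RSS growth in kB between consecutive samples."""
    values = [rss for _, rss in samples]
    return (values[-1] - values[0]) / (len(values) - 1)


print(looks_like_leak(RSS_SAMPLES), growth_per_sample(RSS_SAMPLES))
```

A process that is simply warming up will also pass this test, so steady growth is a hint to investigate rather than proof of a leak.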
Over the past few years, Linux has made its way into the data centers of many corporations all over the globe. The Linux operating system has become accepted by both the scientific and enterprise user population. Today, Linux is by far the most versatile operating system. You can find Linux on embedded devices such as firewalls and cell phones, as well as on mainframes. Naturally, performance of the Linux operating system has become a hot topic for both scientific and enterprise users. However, calculating a global weather forecast and hosting a database impose different requirements on the operating system. Linux has to accommodate all possible usage scenarios with optimal performance. The consequence of this challenge is that most Linux distributions contain general tuning parameters to accommodate all users.
IBM® has embraced Linux, and it is recognized as an operating system suitable for enterprise-level applications running on IBM systems. Most enterprise applications are now available on Linux, including file and print servers, database servers, Web servers, and collaboration and mail servers.
With use of Linux in an enterprise-class server comes the need to monitor performance and, when necessary, tune the server to remove bottlenecks that affect users. This IBM Redpaper describes the methods you can use to tune Linux, tools that you can use to monitor and analyze server performance, and key tuning parameters for specific server applications. The purpose of this redpaper is to understand, analyze, and tune the Linux operating system to yield superior performance for any type of application you plan to run on these systems.
The tuning parameters, benchmark results, and monitoring tools used in our test environment were executed on Red Hat and Novell SUSE Linux kernel 2.6 systems running on IBM System x servers and IBM System z servers. However, the information in this redpaper should be helpful for all Linux hardware platforms.
http://linuxperf.nl.linux.org/
http://www.citi.umich.edu/projects/citi-netscape/
http://home.att.net/~jageorge/performance.html
http://www.psc.edu/networking/perf_tune.html#Linux
Need to stress out an ftp server, or measure how many users it can support? dkftpbench can do it.
Want to write your own highly efficient networking software, but annoyed by having to support very different code for Linux, FreeBSD, and Solaris? libPoller can help.
dklimits
This is part of the dkftpbench package.
fd-limit
thread-limit
Copyright © 1996-2009 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License (OPL). Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
Disclaimer:
Last modified: September 24, 2009