Softpanorama


uptime command


Utility uptime is the standard way to display your system’s load average. Please note that /proc/loadavg gives the same average load values without the need to call uptime.

The problem is that these figures have nothing to do with "load" in the sense of CPU utilization. So what do the three figures mean? Simply put, each is the number of processes in the run queue (running plus runnable processes) averaged over three different periods (the last 1, 5 and 15 minutes). "Running plus runnable" means that they are either currently running on a CPU or ready to run and waiting in the queue for a free CPU.

Again, the load average has nothing to do with the percentage of the CPU used, although it is correlated with it. So a load average of 4 in no way means that the CPU is loaded at 400%.

Adrian Cockcroft, in Sun Performance and Tuning [Coc95], in the section on p.97 entitled "Understanding and Using the Load Average", states:

The load average is the sum of the run queue length and the number of jobs currently running on the CPUs.

That just means that, with a load average of 4, there are on average 4 processes either running or waiting in the scheduler queue. Imagine a McDonald's with 4 cash registers and people lining up for lunch. Each cashier serves a customer for exactly 10 seconds. If a customer can't get his order in 10 seconds, he needs to return to the line and wait for a free register again to continue processing his order. The length of the line plus the number of customers at the registers, averaged over 1, 5 and 15 hours (hours used instead of minutes here), would be a crude, half-decent analogy of what is happening in measuring load averages (they are also available directly from the /proc/loadavg pseudo-file).

So if the load average is 4 on a 4-CPU machine, processes on average do not wait before they are assigned a CPU to run their quantum of time (say, 10ms). A load average of 8 means that on average 4 processes are waiting for a CPU, which indicates some kind of bottleneck that prevents processes from being assigned to CPUs.
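The arithmetic in the previous paragraph fits in a tiny C helper. It is a crude sketch, assuming the load average reflects CPU demand only (running plus runnable processes) and that the CPU count is known; the function name excess_runnable is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: crude estimate of how many processes are, on average,
 * queued for a CPU instead of running on one. Assumes the load average
 * counts only CPU demand (running + runnable processes). */
double excess_runnable(double load_avg, int ncpus)
{
    assert(ncpus > 0);
    double excess = load_avg - (double)ncpus;
    return excess > 0.0 ? excess : 0.0;  /* fewer tasks than CPUs: nobody waits */
}
```

With 4 CPUs, excess_runnable(8.0, 4) returns 4.0 and excess_runnable(4.0, 4) returns 0.0, matching the two cases above.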

The Linux man page is misleading and incorrect:

uptime gives a one line display of the following information. The current time, how long the system has been running, how many users are currently logged on, and the system load averages for the past 1, 5, and 15 minutes.

This is the same information contained in the header line displayed by w(1).

The Sun man page is correct:

The uptime command prints the current time, the length of time the system has been up, and the average number of jobs in the run queue over the last 1, 5 and 15 minutes. It is, essentially, the first line of a w(1) command.

Here is a good explanation from "Examining Load Average" (Linux Journal); see also "Load (computing)" on Wikipedia:

The load-average calculation is best thought of as a moving average of processes in Linux's run queue marked running or uninterruptible. The words “thought of” were chosen for a reason: that is how the measurements are meant to be interpreted, but not exactly what happens behind the curtain. It is at this juncture in our journey that the reality of it all, like quantum mechanics, seems not to fit our intuition.

The load averages that the top and uptime commands display are obtained directly from /proc. If you are running Linux kernel 2.4 or later, you can read those values yourself with the command cat /proc/loadavg. However, it is the Linux kernel that produces those values in /proc. Specifically, timer.c and sched.h work together to do the computation. To understand what timer.c does for a living, the concept of time slicing and the jiffy counter help round out the picture.
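Reading the values yourself is straightforward. The sketch below assumes the usual Linux layout of /proc/loadavg described in proc(5): the three averages, then runnable/total scheduling entities, then the most recently created PID, as in the procinfo line 0.15 0.03 0.01 2/58 8557 shown further down this page. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one /proc/loadavg line, e.g. "0.15 0.03 0.01 2/58 8557". */
struct loadavg {
    double avg[3];   /* 1-, 5- and 15-minute load averages */
    int runnable;    /* currently runnable scheduling entities */
    int total;       /* scheduling entities that currently exist */
    int last_pid;    /* PID of the most recently created process */
};

/* Parses a /proc/loadavg-style line. Returns 1 on success, 0 on failure. */
int parse_loadavg(const char *line, struct loadavg *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, "%lf %lf %lf %d/%d %d",
                   &out->avg[0], &out->avg[1], &out->avg[2],
                   &out->runnable, &out->total, &out->last_pid);
    return n == 6;
}

/* Reads the live file (Linux only). Returns 1 on success, 0 on failure. */
int read_loadavg(struct loadavg *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return 0;
    int ok = fgets(buf, sizeof buf, f) != NULL && parse_loadavg(buf, out);
    fclose(f);
    return ok;
}
```

Keeping the parser separate from the file access makes it easy to test against captured lines and to reuse on output saved from another machine.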

In the Linux kernel, each dispatchable process is given a fixed amount of time on the CPU per dispatch. By default, this amount is 10 milliseconds, or 1/100th of a second. For that short time span, the process is assigned a physical CPU on which to run its instructions and allowed to take over that processor. More often than not, the process will give up control before the 10ms are up through socket calls, I/O calls or calls back to the kernel. (On an Intel 2.6GHz processor, 10ms is enough time for approximately 50-million instructions to occur. That's more than enough processing time for most application cycles.) If the process uses its fully allotted CPU time of 10ms, an interrupt is raised by the hardware, and the kernel regains control from the process. The kernel then promptly penalizes the process for being such a hog. As you can see, that time slicing is an important design concept for making your system seem to run smoothly on the outside. It also is the vehicle that produces the load-average values.

The 10ms time slice is an important enough concept to warrant a name for itself: quantum value. There is not necessarily anything inherently special about 10ms, but there is about the quantum value in general, because whatever value it is set to (it is configurable, but 10ms is the default), it controls how often at a minimum the kernel takes control of the system back from the applications. One of the many chores the kernel performs when it takes back control is to increment its jiffies counter. The jiffies counter measures the number of quantum ticks that have occurred since the system was booted. When the quantum timer pops, timer.c is entered at a function in the kernel called timer.c:do_timer(). Here, all interrupts are disabled so the code is not working with moving targets. The jiffies counter is incremented by 1, and the load-average calculation is checked to see if it should be computed. In actuality, the load-average computation is not truly calculated on each quantum tick, but driven by a variable value that is based on the HZ frequency setting and tested on each quantum tick. (HZ is not to be confused with the processor's MHz rating. This variable sets the pulse rate of particular Linux kernel activity: there are HZ ticks per second, so with HZ set to 100 one tick, or quantum, lasts 10ms.) Although the HZ value can be configured in some versions of the kernel, it is normally set to 100. The calculation code uses the HZ value to determine the calculation frequency. Specifically, the timer.c:calc_load() function will run the averaging algorithm every 5 * HZ ticks, or roughly every five seconds. Following is that function in its entirety:

unsigned long avenrun[3];

   static inline void calc_load(unsigned long ticks)
   {
      unsigned long active_tasks; /* fixed-point */
      static int count = LOAD_FREQ;

      count -= ticks;
      if (count < 0) {
         count += LOAD_FREQ;
         active_tasks = count_active_tasks();
         CALC_LOAD(avenrun[0], EXP_1, active_tasks);
         CALC_LOAD(avenrun[1], EXP_5, active_tasks);
         CALC_LOAD(avenrun[2], EXP_15, active_tasks);
      }
   }
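To see the timing that calc_load() produces, here is a user-space replay of its countdown. It is a sketch assuming HZ = 100 and that update_times() hands calc_load() one tick per timer interrupt; recalcs_in is a hypothetical name, and the avenrun[] updates themselves are left out.

```c
#include <assert.h>

#define HZ        100          /* assumed tick rate (the usual default) */
#define LOAD_FREQ (5 * HZ)     /* 5 sec intervals, as in sched.h */

/* Hypothetical replay of the countdown in calc_load(): returns how many times
 * the three averages would be recomputed during `ticks` timer ticks,
 * assuming update_times() passes ticks == 1 on every call. */
int recalcs_in(long ticks)
{
    assert(ticks >= 0);
    int count = LOAD_FREQ;       /* static int count = LOAD_FREQ; */
    int recalcs = 0;
    for (long t = 0; t < ticks; t++) {
        count -= 1;              /* count -= ticks; */
        if (count < 0) {
            count += LOAD_FREQ;  /* rearm for the next 5-second window */
            recalcs++;           /* CALC_LOAD would run here for avenrun[0..2] */
        }
    }
    return recalcs;
}
```

Because the test is count < 0 rather than count <= 0, the first recomputation lands on tick 501: recalcs_in(500) returns 0, recalcs_in(501) returns 1, and a full minute (6,000 ticks) yields 11 updates, with the 12th arriving one tick later.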

The avenrun array contains the three averages we have been discussing. The calc_load() function is called by update_times(), also found in timer.c, and is the code responsible for supplying the calc_load() function with the ticks parameter. Unfortunately, this function does not reveal its most interesting aspect: the computation itself. However, that can be located easily in sched.h, a header used by much of the kernel code. In there, the CALC_LOAD macro and its associated values are available:

   extern unsigned long avenrun[];	/* Load averages */

   #define FSHIFT   11		/* nr of bits of precision */
   #define FIXED_1  (1<<FSHIFT)	/* 1.0 as fixed-point */
   #define LOAD_FREQ (5*HZ)	/* 5 sec intervals */
   #define EXP_1  1884		/* 1/exp(5sec/1min) as fixed-point */
   #define EXP_5  2014		/* 1/exp(5sec/5min) */
   #define EXP_15 2037		/* 1/exp(5sec/15min) */

   #define CALC_LOAD(load,exp,n) \
      load *= exp; \
      load += n*(FIXED_1-exp); \
      load >>= FSHIFT;

Here is where the tires meet the pavement. It should now be evident that reality does not appear to match the illusion. At least, this is certainly not the type of averaging most of us are taught in grade school. But it is an average nonetheless. Technically, it is an exponential decay function and is the moving average of choice for most UNIX systems as well as Linux. Let's examine its details.

The macro takes in three parameters: the load-average bucket (one of the three elements in avenrun[]), a constant exponent and the number of running/uninterruptible processes currently on the run queue. The possible exponent constants are listed above: EXP_1 for the 1-minute average, EXP_5 for the 5-minute average and EXP_15 for the 15-minute average. The important point to notice is that the value decreases with age. The constants are magic numbers calculated by the function $y = 2^{11} / e^{5/(60x)}$, rounded to the nearest integer, where $x$ is the averaging period in minutes and $5/(60x)$ is the 5-second sampling interval expressed as a fraction of that period:

When x=1, then y=1884; when x=5, then y=2014; and when x=15, then y=2037. The purpose of the magic numbers is to let the CALC_LOAD macro use a fixed-point representation of fractions. The magic numbers are then nothing more than multipliers used against the running load average to make it a moving average. (The mathematics of fixed-point representation are beyond the scope of this article, so I will not attempt an explanation.) The purpose of the exponential decay function is that it not only smooths the dips and spikes by maintaining a useful trend line, but it also progressively decreases the weight of what it measures as activity ages. As time moves forward, each new sample counts for more than the older ones. This is what we want, because recent CPU activity probably has more of an impact on the current state than ancient events. In the end, the load averages give a smooth trend from 15 minutes through the current minute and give us a window into not only the CPU usage but also the average demand for the CPUs. The further the load average rises above the number of physical CPUs, the more the CPUs are being used and the more demand there is for them. And, as it recedes, the less demand there is. With this understanding, the load average can be used with the CPU percentage to obtain a more accurate view of CPU activity.
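The aging described above can be made concrete. Each update multiplies the old average by EXP_n/FIXED_1, so a single sample's share of the average shrinks by that factor every 5 seconds. The sketch below assumes the constants quoted above and ignores fixed-point rounding, computing the remaining weight in floating point; sample_weight is a hypothetical name.

```c
#include <assert.h>

#define FIXED_1 2048.0   /* 1.0 in the kernel's 11-bit fixed point */

/* Per-update decay factors EXP_1, EXP_5 and EXP_15 expressed as fractions. */
static const double decay[3] = { 1884 / FIXED_1, 2014 / FIXED_1, 2037 / FIXED_1 };

/* Fraction of its original weight that one sample still carries after
 * `updates` further 5-second updates of average `which`
 * (0 = 1-minute, 1 = 5-minute, 2 = 15-minute). Ignores rounding. */
double sample_weight(int which, unsigned updates)
{
    assert(which >= 0 && which < 3);
    double w = 1.0;
    for (unsigned i = 0; i < updates; i++)
        w *= decay[which];
    return w;
}
```

After one minute (12 updates), sample_weight(0, 12) is about 0.37, roughly 1/e, while sample_weight(2, 12) is still about 0.94: the 15-minute average has barely begun to forget the same sample.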

It is my hope that this serves not only as a practical interpretation of Linux's load averages but also illuminates some of the dark mathematical shadows behind them. For more information, a study of the exponential decay function and its applications would shed more light on the subject. But for the more practical-minded, plotting the load average vs. a controlled number of processes (that is, modeling the effects of the CALC_LOAD algorithm in a controlled loop) would give you a feel for the actual relationship and how the decaying filter applies.

Ray Walker is a consultant specializing in UNIX kernel-level code. He has been a software developer for more than 25 years, working with Linux since 1995. He can be contacted at ray.rwalk2730@gmail.com.


Old News ;-)

/proc/loadavg gives average load values without the need to call uptime

Interpretation of load average vs. CPU utilization

Hi,

I have a server with 16 CPUs (4 sockets with quad-core processors) and a multi-threaded (32 threads) batch program that is expected to work mainly on the CPU.

When I check CPU usage, it shows less than 50%.

Cpu(s): 55.2% us,  2.0% sy,  0.0% ni, 42.0% id,  0.0% wa,  0.0% hi,  0.7% si
So that shows a very under-utilized system.

However load average is about 13:

load average: 13.78, 13.21, 13.41
13 out of 16 CPU cores: that is about 80%. That's different.

That was from top.

Can you please explain where my interpretation is wrong?

I checked with sar as well:

$ sar -p
09:20:01 AM       CPU     %user     %nice   %system   %iowait     %idle
09:00:02 AM       all     49.68      0.00      2.46      0.01     47.85
09:10:01 AM       all     49.73      0.00      2.63      0.01     47.64
09:20:01 AM       all     49.71      0.00      2.50      0.01     47.78
09:30:01 AM       all     51.00      0.00      2.72      0.01     46.27

$ sar -q
09:20:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:00:02 AM        15      1218     13.58     13.23     13.31
09:10:01 AM        13      1218     13.99     14.21     13.72
09:20:01 AM        11      1212     14.19     13.74     13.66
09:30:01 AM        13      1216     13.86     13.79     13.60
And I have one more question: runq-sz is defined as the number of processes waiting for run time. Does that mean that at 9:30 there were 13 processes waiting for an available CPU while the CPU was 46% idle? Or is runq-sz included in the load average, meaning that I have 13 processes running on the CPUs, so I should expect CPU usage of about 80%?

Thanks,
Franck.

1) The load is calculated in different ways,
but this link http://www.teamquest.com/resources/gunther/display/5/index.htm presents it nicely:
The load average is the sum of the run queue length and the number of jobs currently running on the CPUs.

so
(30, 0, 15) load is 15 = 0 + 15
(30, 8, 7) load is 15 = 8 + 7
(30, 8, 6) load is 14 = 8 + 6

2) Yes.
Plus:
It's quite complicated and happens really, really fast in real life.
I recommend you read Solaris Internals (http://www.solarisinternals.com/) and the book that goes with it.
It's about Solaris, but in the end all kernels are based on the same ideas with different implementations.

3) Yes, 30,000 is a large number of syscalls.
You can use the "strace" command to get an idea of which syscalls are called most often.

It looks like there is software called kerneltrap that might help you (I searched for "dtrace linux" on Google).

[Aug 21, 2010] Understanding Linux CPU Load - when should you be worried by Andre

July 31

A single-core CPU is like a single lane of traffic. Imagine you are a bridge operator ... sometimes your bridge is so busy there are cars lined up to cross. You want to let folks know how traffic is moving on your bridge. A decent metric would be how many cars are waiting at a particular time. If no cars are waiting, incoming drivers know they can drive across right away. If cars are backed up, drivers know they're in for delays.

So, Bridge Operator, what numbering system are you going to use? How about:

[Illustrations: bridge traffic at a load of 1.00, at a load of 0.50 and at a load of 1.70]

This is basically what CPU load is. "Cars" are processes using a slice of CPU time ("crossing the bridge") or queued up to use the CPU. Unix refers to this as the run-queue length: the sum of the number of processes that are currently running plus the number that are waiting (queued) to run.

Like the bridge operator, you'd like your cars/processes to never be waiting. So, your CPU load should ideally stay below 1.00. Also like the bridge operator, you are still ok if you get some temporary spikes above 1.00 ... but when you're consistently above 1.00, you need to worry.

So you're saying the ideal load is 1.00?

Well, not exactly. The problem with a load of 1.00 is that you have no headroom. In practice, many sysadmins will draw a line at 0.70:

What about Multi-processors? My load says 3.00, but things are running fine!

Got a quad-processor system? It's still healthy with a load of 3.00.

On a multi-processor system, the load is relative to the number of processor cores available. The "100% utilization" mark is 1.00 on a single-core system, 2.00 on a dual-core, 4.00 on a quad-core, etc.

If we go back to the bridge analogy, the "1.00" really means "one lane's worth of traffic". On a one-lane bridge, that means it's filled up. On a two-lane bridge, a load of 1.00 means it's at 50% capacity: only one lane is full, so there's another whole lane that can be filled.

[Illustration: a load of 2.00 on a two-lane road]

Same with CPUs: a load of 1.00 is 100% CPU utilization on a single-core box. On a dual-core box, a load of 2.00 is 100% CPU utilization.

Multicore vs. multiprocessor

While we're on the topic, let's talk about multicore vs. multiprocessor. For performance purposes, is a machine with a single dual-core processor basically equivalent to a machine with two processors with one core each? Yes. Roughly. There are lots of subtleties here concerning amount of cache, frequency of process hand-offs between processors, etc. Despite those finer points, for the purposes of sizing up the CPU load value, the total number of cores is what matters, regardless of how many physical processors those cores are spread across.

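Which leads us to two new rules of thumb: on a multi-core system, your load should not exceed the number of cores available; and cores are cores, meaning that how the cores are spread out over physical CPUs doesn't matter.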

Bringing It Home

Let's take a look at the load averages output from uptime:

~ $ uptime
23:05 up 14 days, 6:08, 7 users, load averages: 0.65 0.42 0.36

This is on a dual-core CPU, so we've got lots of headroom. I won't even think about it until load gets and stays above 1.7 or so.

Now, what about those three numbers? 0.65 is the average over the last minute, 0.42 is the average over the last five minutes, and 0.36 is the average over the last 15 minutes. Which brings us to the question:

Which average should I be observing? One, five, or 15 minute?

For the numbers we've talked about (1.00 = fix it now, etc.), you should be looking at the five or 15-minute averages. Frankly, if your box spikes above 1.0 on the one-minute average, you're still fine. It's when the 15-minute average goes north of 1.0 and stays there that you need to snap to. (Obviously, as we've learned, adjust these numbers to the number of processor cores your system has.)
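The per-core rules of thumb can be folded into one check. This is a sketch assuming the 0.70 and 1.00 per-core thresholds quoted in this article and a 15-minute average as input; the enum and classify_load are hypothetical names, and the thresholds are judgment calls, not kernel facts.

```c
#include <assert.h>

enum load_status { LOAD_OK, LOAD_WATCH, LOAD_FIX_NOW };

/* Normalizes a load average by core count, then applies the rule-of-thumb
 * thresholds: above 0.70 per core, look into it; above 1.00, fix it now. */
enum load_status classify_load(double load15, long ncores)
{
    assert(ncores > 0);
    double per_core = load15 / (double)ncores;  /* 1.00 means every core is busy */
    if (per_core > 1.00)
        return LOAD_FIX_NOW;
    if (per_core > 0.70)
        return LOAD_WATCH;
    return LOAD_OK;
}
```

For the dual-core example above, a 15-minute average of 0.36 gives LOAD_OK, 1.5 (0.75 per core) gives LOAD_WATCH, and 2.5 gives LOAD_FIX_NOW. Dividing by the core count first is what lets the same thresholds serve single-core and many-core boxes.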

So # of cores is important to interpreting load averages ... how do I know how many cores my system has?

Run cat /proc/cpuinfo to get info on each processor in your system. Note: /proc/cpuinfo is not available on OS X; search for alternatives. To get just a count, run it through grep and word count: grep 'model name' /proc/cpuinfo | wc -l
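The same count can be taken from C. The sketch below assumes the Linux /proc/cpuinfo layout with one "model name" line per logical processor and counts matching lines in an in-memory copy of the text, much like grep -c; it also shows sysconf(_SC_NPROCESSORS_ONLN), supported by glibc and macOS, which avoids parsing altogether. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Counts lines of /proc/cpuinfo-style text that start with `key`,
 * roughly what grep -c 'model name' /proc/cpuinfo reports. */
int count_lines_with_prefix(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    int count = 0;
    const char *line = text;
    while (*line != '\0') {
        if (strncmp(line, key, klen) == 0)
            count++;
        const char *nl = strchr(line, '\n');
        if (nl == NULL)
            break;               /* last line had no trailing newline */
        line = nl + 1;
    }
    return count;
}

/* Alternative without parsing: ask the C library for online CPUs. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;        /* fall back to 1 if the query fails */
}
```

Both approaches count logical CPUs, so hyperthreads are included in the total.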

Monitoring Linux CPU Load with Scout

Scout provides two ways to monitor the CPU load. Our original server load plugin and Jesse Newland's Load-Per-Processor plugin both report the CPU load and alert you when the load peaks and/or is trending in the wrong direction.


Comments

  1.  John Liptak said 3 days later:

    I would add a paragraph about CPU bound applications. For example, if you have a ray tracing app that will use all of CPU that you can give it, your load average will always be > 1.0. It’s not a problem if you planned it that way.

  2.  A real linux admin said 4 days later:

    You forgot to mention that any process in IOWAIT is counted as if it were on the CPU. So you could have one process spending most of its time waiting on a slow NFS share that could drive your load average up to 1.0, even though your CPU is virtually idle.

  3.  Buttson said 4 days later:

    grep -c will give you a count without needing to pipe through wc.

  4.  Stephen said 4 days later:

    In addition to the CPU-bound processes, you’ve not mentioned nice. If you have processes that are very low priority (high nice value), they may not interfere with interactive use of the machine. For example, the machine I’m typing on is quite responsive. The load average is a little over 4. It’s a dual CPU. 4 of the processes are niced as far as they can go. These can be ignored. A little over 4 minus 4 is a little over zero. So, there’s no idle time, but the machine does what I want it to do, when I want it to do it.

    And, I have 4 processes running because I want to make use of both CPUs. But the code is written to either run single-threaded or use 4 cooperating threads. Well, that’s the breaks.

  5. chkno said 4 days later:

    The above is accurate for CPU-bound loads, but there are other resources in play. For example, if you have a one-CPU one-disk system with a load average of 15, do not just run out and buy a 16-core one-disk system. If all those processes are waiting on disk I/O, you would be much better off with a one-CPU 16-disk system (or an SSD).

  6.  Chris said 4 days later:

    While I understood the basics of the load rule (over .7 watch out) I never truly got how it scaled to multi-core processors or how it broke it down into multiple minute time frames. Thanks!

    One question though: the Pentium 4 had hyperthreading, which “faked” a second core and could help or seriously hinder performance depending upon the code. It’s come back now with the i7/Xeon derivatives (not sure about the new i5’s). How does this affect the values to be wary of? Or does it? Since it’s a “virtual” core, in a sense it can help, but too much or the wrong kind of work can starve your main core and cause serious problems.

    Any thoughts?

  7.  oldschool said 4 days later:

    One slight disagreement… in large-scale processing applications, a load of .7 is considered wasteful. While a web server may need available headroom for unexpected traffic spikes, a server dedicated to processing data should be worked as hard as possible. Consider the difference between the plumbing to and from your house (and neighborhood) versus an oil pipeline from Alaska. You expect the plumbing for sporadically used water or sewage to and from your house to handle unexpected usage patterns. The pipeline from Alaska should be running at full capacity at all times to avoid costly down time. Large grid computing makes use of this model by allowing your servers to have the additional capacity for spikes, while consuming the idle CPU cycles when possible.

Andy Millar: Linux Load Average explained

load average: 0.00, 0.00, 0.00

How do you get this output?

To get your system’s Load Average, run the command uptime. It will show you the current time, how long your system has been powered on for, the number of users logged in, and finally the system’s load average.
What does it mean?

Simply, it is the number of blocking processes in the run queue averaged over a certain time period.

Time periods:

load average: 1min, 5min, 15min

What is a blocking process?

A blocking process is a process that is waiting for something to continue. Typically, a process is waiting for the CPU, for disk I/O or for network I/O.

What does a high load average mean?

A high load average typically means that your server is under-specified for what it is being used for, or that something has failed (like an externally mounted disk).

How do I diagnose a high load average?

Typically, a server with a high load average is unresponsive and slow — and you want to reduce the load and increase responsiveness. But how do you go about working out what is causing your high load?

Let's start with the simplest one: are we waiting for the CPU? Run the Linux command top.

Check the CPU usage percentages (user and system time) in the header of top's output. They represent what percentage of its total time the CPU is spending processing stuff. If these numbers are constantly around 99-100%, then chances are the problem is related to your CPU, almost certainly that it is underpowered. Consider upgrading your CPU.

The next thing to look for is whether the CPU is waiting on I/O. Now check the I/O wait percentage (wa) in the same header. If this number is high (above 80% or so), then you have problems. This means that the CPU is spending a LOT of time waiting on I/O. This could mean that you have a failing hard disk or network card, or that your applications are trying to access data on either of them at a rate significantly higher than the throughput they are designed for.

To find out what applications are causing the load, run the command ps faux. This will list every process running on your system, and the state it is in.

You want to look in the STAT column. The common flags that you should be looking for are:

So, look for any processes with a STAT of D, and you can go from there to diagnose the problem.
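The first letter of the STAT column corresponds to the state field of /proc/[pid]/stat (see proc(5)). As a sketch, assuming that file's layout (PID, command name in parentheses, then the one-letter state), the helpers below extract the letter and flag the two states that Linux counts in the load average, as described elsewhere on this page: R (running or runnable) and D (uninterruptible sleep). The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns the one-letter state from a /proc/[pid]/stat line, or '?' if the
 * line is malformed. The command name sits in parentheses and may itself
 * contain spaces or ')', so scan from the LAST ')'. */
char stat_state(const char *stat_line)
{
    assert(stat_line != NULL);
    const char *p = strrchr(stat_line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* R (running/runnable) and D (uninterruptible) count toward the load. */
int counts_toward_load(char state)
{
    return state == 'R' || state == 'D';
}
```

Scanning from the last parenthesis matters because a process can name itself almost anything; a naive sscanf on the second field would be thrown off by a command name such as "a) b".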

Further Diagnosis

To diagnose further, you can use the following programs

This entry was posted on Sunday, December 24th, 2006 at 5:22 pm and is filed under Geekery.


Interpretation of the uptime command - UNIX for Dummies Questions & Answers - The UNIX and Linux Forums

The load average is not a percentage. It can be thought of as the number of processes waiting to run. So a load average of 0.25 means on average there is 1/4 of a process waiting to run, in other words most of the time none are waiting but a quarter of the time one is.

The 15.97 means on average there are about 16 processes waiting to run. If you have a medium to large multiprocessor box this might be ok. If it is a single processor box you'll have terrible performance.

It depends on what the processors are as well. I've had a 12-processor Sun box with 400 MHz UltraSPARC II processors start slowing down at a load average of 20, while a 4-processor box with 1.05 GHz UltraSPARC III processors ran fine with a load average of 30. So a rule of thumb like "load average of 2x number of procs" won't help much; you need to compare to your system's historical behavior.

UNIX Load Average Part 1: How It Works, by Dr. Neil Gunther


Have you ever wondered how those three little numbers that appear in the UNIX® load average (LA) report are calculated?

This TeamQuest online column explains how the load average (LA) is calculated and how it can be reorganized to do better capacity planning.

In this two part-series I want to explore the use of averages in performance analysis and capacity planning. There are many manifestations of averages e.g., arithmetic average (the usual one), moving average (often used in financial planning), geometric average (used in the SPEC CPU benchmarks), harmonic average (not used enough), to name a few.

More importantly, we will be looking at averages over time, or time-dependent averages. A particular example of such a time-dependent average is the load average metric that appears in certain UNIX commands. In Part 1 I shall look at what the load average is and how it gets calculated. In Part 2 I'll compare it with other averaging techniques as they apply in capacity planning and performance analysis. This article does not assume you are familiar with UNIX commands, so I will begin by reviewing those commands which display the load average metric. By Section 4, however, I'll be submerging into the UNIX kernel code that does all the work.

1 UNIX Commands

Actually, load average is not a UNIX command in the conventional sense. Rather, it's an embedded metric that appears in the output of other UNIX commands like uptime and procinfo. These commands are commonly used by UNIX sysadmins to observe system resource consumption. Let's look at some of them in more detail.

1.1  Classic Output

The generic ASCII textual format appears in a variety of UNIX shell commands. Here are some common examples.
uptime
The uptime  shell command produces the following output:
[pax:~]% uptime
9:40am  up 9 days, 10:36,  4 users,  load average: 0.02, 0.01, 0.00
It shows the current time, the time since the system was last booted, the number of logged-in users and something called the load average.
procinfo
On Linux systems, the procinfo  command produces the following output:
[pax:~]% procinfo
Linux 2.0.36 (root@pax) (gcc 2.7.2.3) #1 Wed Jul 25 21:40:16 EST 2001 [pax]

Memory:      Total        Used        Free      Shared     Buffers      Cached
Mem:         95564       90252        5312       31412       33104       26412
Swap:        68508           0       68508

Bootup: Sun Jul 21 15:21:15 2002    Load average: 0.15 0.03 0.01 2/58 8557
...
The load average appears in the lower left corner of this output.
w
The w(ho)  command produces the following output:
[pax:~]% w
  9:40am  up 9 days, 10:35,  4 users,  load average: 0.02, 0.01, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
mir      ttyp0    :0.0             Fri10pm  3days  0.09s  0.09s  bash
neil     ttyp2    12-35-86-1.ea.co  9:40am  0.00s  0.29s  0.15s  w
...
Notice that the first line of the output is identical to the output of the uptime  command.
top
The top  command is a more recent addition to the UNIX command set that ranks processes according to the amount of CPU time they consume. It produces the following output:
  4:09am  up 12:48,  1 user,  load average: 0.02, 0.27, 0.17
58 processes: 57 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.5% user,  0.9% system,  0.0% nice, 98.5% idle
Mem:   95564K av,  78704K used,  16860K free,  32836K shrd,  40132K buff
Swap:  68508K av,      0K used,  68508K free                 14508K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT  LIB %CPU %MEM   TIME COMMAND
 5909 neil      13   0   720  720   552 R       0  1.5  0.7   0:01 top
    1 root       0   0   396  396   328 S       0  0.0  0.4   0:02 init
    2 root       0   0     0    0     0 SW      0  0.0  0.0   0:00 kflushd
    3 root     -12 -12     0    0     0 SW<     0  0.0  0.0   0:00 kswapd
...

In each of these commands, note that there are three numbers reported as part of the load average output. Quite commonly, these numbers show a descending order from left to right. Occasionally, however, an ascending order appears, as in the top output above.

1.2  GUI Output

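The load average can also be displayed as a time series like that shown here in some output from a tool called ORCA.

[Figure 1: ORCA time-series plot of the 1-, 5- and 15-minute load averages (green, blue and red curves) over one day]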

Although such visual aids help us to see that the green curve is spikier and has more variability than the red curve, and allow us to see a complete day's worth of data, it's not clear how useful this is for capacity planning or performance analysis. We need to understand more about how the load average metric is defined and calculated.

2  So What Is It?

So, exactly what is this thing called load average that is reported by all these various commands? Let's look at the official UNIX documentation.

2.1  The man Page

[pax:~]% man "load average"
No manual entry for load average
Oops! There is no man page! The load average  metric is an output embedded in other commands so it doesn't get its own man entry. Alright, let's look at the man page for uptime, for example, and see if we can learn more that way.
...
DESCRIPTION
       uptime  gives a one line display of the following informa-
       tion.  The current time, how long the system has been run-
       ning, how many users are currently logged on, and the sys-
       tem load averages for the past 1, 5, and 15 minutes.
...
So, that explains the three metrics. They are the "... load averages for the past 1, 5, and 15 minutes."
 
Which are the GREEN, BLUE and RED curves, respectively, in Figure 1 above.
Unfortunately, that still raises the question: "What is the load?"

2.2  What the Gurus Have to Say

Let's turn to some UNIX hot-shots for more enlightenment.
Tim O'Reilly and Crew
The book UNIX Power Tools [POL97] tells us on p.726, in the section "The CPU":
 

The load average tries to measure the number of active processes at any time. As a measure of CPU utilization, the load average is simplistic, poorly defined, but far from useless.

That's encouraging! Anyway, it does help to explain what is being measured: the number of active processes. On p.720, in section 39.07 "Checking System Load: uptime", it continues ...
 

... High load averages usually mean that the system is being used heavily and the response time is correspondingly slow.

What's high? ... Ideally, you'd like a load average under, say, 3, ... Ultimately, 'high' means high enough so that you don't need uptime to tell you that the system is overloaded.

Hmmm ... where did that number "3" come from? And which of the three averages (1, 5, 15 minutes) are they referring to?
Adrian Cockcroft on Solaris
In Sun Performance and Tuning [Coc95] in the section on p.97 entitled: Understanding and Using the Load Average, Adrian Cockcroft states:
 

The load average is the sum of the run queue length and the number of jobs currently running on the CPUs. In Solaris 2.0 and 2.2 the load average did not include the running jobs but this bug was fixed in Solaris 2.3.

So, even the "big boys" at Sun can get it wrong. Nonetheless, the idea that the load average is associated with the CPU run queue is an important point.

O'Reilly et al. also note some potential gotchas with using load average ...

...different systems will behave differently under the same load average. ... running a single cpu-bound background job .... can bring response to a crawl even though the load avg remains quite low.

As I will demonstrate, this depends on when you look. If the CPU-bound process runs long enough, it will drive the load average up because it's always either running or runnable. The obscurities stem from the fact that the load average is not your average kind of average. As we alluded to in the above introduction, it's a time-dependent average. Not only that, but it's a damped time-dependent average. To find out more, let's do some controlled experiments.

3  Performance Experiments

The experiments described in this section involved running some workloads in the background on a single-CPU Linux box. The test had a duration of 1 hour (3,600 seconds).

A Perl script sampled the load average every 5 minutes using the uptime  command. Here are the details.

3.1  Test Load

Two hot-loops were fired up as background tasks on a single CPU Linux box. There were two phases in the test:
  1. The CPU is pegged by these tasks for 2,100 seconds.
  2. The CPU is (relatively) quiescent for the remaining 1,500 seconds.

The 1-minute average reaches a value of 2 around 300 seconds into the test. The 5-minute average reaches 2 around 1,200 seconds into the test and the 15-minute average would reach 2 at around 3,600 seconds but the processes are killed after 35 minutes (i.e., 2,100 seconds).

3.2  Process Sampling

As the authors [BC01] explain about the Linux kernel, because both of our test processes are CPU-bound they will be in a TASK_RUNNING state. This means they are either running (currently executing on the CPU) or runnable (waiting in the run queue for the CPU).

The Linux kernel also checks to see if there are any tasks in a short-term sleep state called TASK_UNINTERRUPTIBLE. If there are, they are also included in the load average sample. There were none in our test load.

The following source fragment reveals more details about how this is done.

/*
 * Nr of active tasks - counted in fixed-point numbers
 */
static unsigned long count_active_tasks(void)
{
        struct task_struct *p;
        unsigned long nr = 0;

        read_lock(&tasklist_lock);
        for_each_task(p) {
                /* running or runnable, or in short-term uninterruptible sleep */
                if ((p->state == TASK_RUNNING ||
                     (p->state & TASK_UNINTERRUPTIBLE)))
                        nr += FIXED_1;  /* each such task counts as 1.0 in fixed point */
        }
        read_unlock(&tasklist_lock);
        return nr;
}
The kernel takes this count every 5 seconds, which is the Linux kernel's intrinsic timebase for updating the load average calculations.

4 Kernel Magic

An Addendum

Now let's go inside the Linux kernel and see what it is doing to generate these load average numbers.

unsigned long avenrun[3];

static inline void calc_load(unsigned long ticks)
{
        unsigned long active_tasks; /* fixed-point */
        static int count = LOAD_FREQ;   /* ticks left until the next update */

        count -= ticks;
        if (count < 0) {
                count += LOAD_FREQ;     /* re-arm the countdown */
                active_tasks = count_active_tasks();
                /* update the 1-, 5- and 15-minute averages */
                CALC_LOAD(avenrun[0], EXP_1, active_tasks);
                CALC_LOAD(avenrun[1], EXP_5, active_tasks);
                CALC_LOAD(avenrun[2], EXP_15, active_tasks);
        }
}
The countdown is over a LOAD_FREQ of 5*HZ ticks. How often is that? On the kernels this code comes from, HZ is 100:

  HZ        =   100 ticks per second
  5*HZ      =   500 ticks
  1 tick    =    10 milliseconds
  500 ticks =  5000 milliseconds (or 5 seconds)
So, LOAD_FREQ = 5*HZ means that CALC_LOAD is invoked every 5 seconds.

4.1  Magic Numbers

CALC_LOAD is not actually a function but a macro, defined in sched.h:
extern unsigned long avenrun[];         /* Load averages */

#define FSHIFT          11              /* nr of bits of precision */
#define FIXED_1         (1<<FSHIFT)     /* 1.0 as fixed-point */
#define LOAD_FREQ       (5*HZ)          /* 5 sec intervals */
#define EXP_1           1884            /* 1/exp(5sec/1min) as fixed-point */
#define EXP_5           2014            /* 1/exp(5sec/5min) */
#define EXP_15          2037            /* 1/exp(5sec/15min) */

#define CALC_LOAD(load,exp,n) \
        load *= exp; \
        load += n*(FIXED_1-exp); \
        load >>= FSHIFT;

A notable curiosity is the appearance of those magic numbers: 1884, 2014, 2037. What do they mean? If we look at the preamble to the code we learn,

/*
 * These are the constant used to fake the fixed-point load-average
 * counting. Some notes:
 *  - 11 bit fractions expand to 22 bits by the multiplies: this gives
 *    a load-average precision of 10 bits integer + 11 bits fractional
 *  - if you want to count load-averages more often, you need more
 *    precision, or rounding will get you. With 2-second counting freq,
 *    the EXP_n values would be 1981, 2034 and 2043 if still using only
 *    11 bit fractions.
 */

These magic numbers are a result of using a fixed-point (rather than a floating-point) representation.

Using the 1-minute average as an example, the conversion of exp(5/60) into base 2 with 11 bits of precision occurs like this:

$$e^{5/60} = 2^{5\log_2(e)/60} \quad\longrightarrow\quad 2^{11}\, e^{5/60} \tag{1}$$

But EXP_M represents the reciprocal, exp(-5/60) in the 1-minute case. Therefore, we can calculate these magic numbers directly from the formula

$$\mathrm{EXP\_M} = \frac{2^{11}}{2^{5\log_2(e)/(60M)}} \tag{2}$$

where M = 1 for the 1-minute average (and M = 5 or 15 for the other two). Table 1 summarizes some relevant results.
T        EXP_T = 2^11 e^(-T)    Rounded
5/60     1884.25                1884
5/300    2014.15                2014
5/900    2036.65                2037
2/60     1980.86                1981
2/300    2034.39                2034
2/900    2043.45                2043
Table 1: Load Average magic numbers.

These numbers are in complete agreement with those mentioned in the kernel comments above. The fixed-point representation is used presumably for efficiency reasons since these calculations are performed in kernel space rather than user space.

One question still remains, however. Where do the ratios like exp(5/60) come from?

4.2  Magic Revealed

Taking the 1-minute average as the example, CALC_LOAD is identical to the mathematical expression:

$$\mathrm{load}(t) = \mathrm{load}(t-1)\, e^{-5/60} + n\,\left(1 - e^{-5/60}\right) \tag{3}$$

If we consider the case n = 0, eqn.(3) becomes simply:

$$\mathrm{load}(t) = \mathrm{load}(t-1)\, e^{-5/60} \tag{4}$$

If we iterate eqn.(4) between t_0 and t_T we get:

$$\mathrm{load}(t_T) = \mathrm{load}(t_0)\, e^{-5t/60} \tag{5}$$

where t is the number of 5-second updates between t_0 and t_T. This is pure exponential decay, just as we see in Fig. 2 for times between t0 = 2100 and tT = 3600.

Conversely, when n = 2, as it was in our experiments, the load average is dominated by the second term of eqn.(3), since the contribution of load(t_0) decays away (and is essentially zero when the test starts on an idle system), such that:

$$\mathrm{load}(t_T) = 2\,\left(1 - e^{-5t/60}\right) \tag{6}$$

which is a monotonically increasing function just like that in Fig. 2 between t0 = 0 and tT = 2100.

5  Summary

So, what have we learned? Those three innocuous looking numbers in the LA triplet have a surprising amount of depth behind them.

The triplet is intended to provide you with some kind of information about how much work has been done on the system in the recent past (1 minute), the past (5 minutes) and the distant past (15 minutes).

As you will have discovered if you tried the LA Triplets quiz, there are problems:

  1. The "load" is not the utilization but the total queue length.
  2. They are point samples of three different time series.
  3. They are exponentially-damped moving averages.
  4. They are in the wrong order to represent trend information.

These inherited limitations are significant if you try to use them for capacity planning purposes. I'll have more to say about all this in the next online column Load Average Part II: Not Your Average Average.

References

[BC01]
D. P. Bovet and M. Cesati. Understanding the Linux Kernel. O'Reilly & Assoc. Inc., Sebastopol, California, 2001.

[Coc95]
A. Cockcroft. Sun Performance and Tuning. SunSoft Press, Mountain View, California, 1st edition, 1995.

[Gun01]
N. J. Gunther. Performance and scalability models for a hypergrowth e-Commerce Web site. In R. Dumke, C. Rautenstrauch, A. Schmietendorf, and A. Scholz, editors, Performance Engineering: State of the Art and Current Trends, volume 2047, pages 267-282. Springer-Verlag, Heidelberg, 2001.

[POL97]
J. Peek, T. O'Reilly, and M. Loukides. UNIX Power Tools. O'Reilly & Assoc. Inc., Sebastopol, California, 2nd edition, 1997.

UNIX Load Average Part 2: Not Your Average Average

1 Recap of Part 1

This is the second in a two-part series where I explore the use of averages in performance analysis and capacity planning. There are many manifestations of averages, e.g., arithmetic average (the usual one), moving average (often used in financial planning), geometric average (used in the SPEC CPU benchmarks) and harmonic average (not used enough), just to name a few.

In Part 1, I described some simple experiments that revealed how the load averages (the LA Triplets) are calculated in the UNIX® kernel (well, the Linux kernel anyway since that source code is available online). We discovered a C-macro called CALC_LOAD  that does all the work. Taking the 1-minute average as the example, CALC_LOAD  is identical to the mathematical expression:

$$\mathrm{load}(t) = \mathrm{load}(t-1)\, e^{-5/60} + n\,\left(1 - e^{-5/60}\right) \tag{1}$$

which corresponds to an exponentially-damped moving average. It says that your current load is equal to the load you had last time (decayed by an exponential factor appropriate for 1-minute reporting) plus the number of currently active processes (weighted by an exponentially increasing factor appropriate for 1-minute reporting). The only difference between the 1-minute load average shown here and the 5- and 15-minute load averages is the value of the exponential factors: the magic numbers I discussed in Part 1.

Another point I made in Part 1 was that we, as performance analysts, would be better off if the LA Triplets were reported in the reverse order: 15, 5, 1, because that ordering concurs with the usual convention that temporal order flows left to right. In this way it would be easier to read the LA Triplets as a trend (which was part of the original intent, I suspect). Trending information could be enhanced even further by representing the LA Triplets using animation (of the type I showed in the Quiz).

Here, in Part 2, I'll compare the UNIX load averaging approach with other averaging methods as they apply to capacity planning and performance analysis.

2  Exponential Smoothing

Exponential smoothing (also called filtering by electrical engineering types) is a general-purpose way of prepping highly variable data before further analysis. Filters of this type are available in most data analysis tools, such as EXCEL, Mathematica, and Minitab.

The smoothing equation is an iterative function that has the general form:

$$\underbrace{Y(t)}_{\text{smoothed}} = Y(t-1) + \underbrace{\alpha}_{\text{constant}}\,\big[\,\underbrace{X(t)}_{\text{raw}} - Y(t-1)\,\big] \tag{2}$$

where X(t) is the input raw data, Y(t - 1) is the value due to the previous smoothing iteration and Y(t) is the new smoothed value. If it looks a little incestuous, it's supposed to be.

2.1  Smooth Loads

Expressing the UNIX load average method (see equation (1)) in the same format produces:
$$\mathrm{load}(t) = \mathrm{load}(t-1) + \left(1 - \mathrm{EXP\_R}\right)\left[\, n(t) - \mathrm{load}(t-1) \,\right] \tag{3}$$

Eqn.(3) is equivalent to (2) if we choose EXP_R = 1 - α. The constant α is called the smoothing constant and can range between 0.0 and 1.0 (in other words, you can think of it as a percentage). EXCEL uses the term damping factor for the quantity (1 - α).

The value of α determines the percentage by which the current smoothing iteration should correct for changes in the data that produced the previous smoothing iteration. Larger values of α yield a more rapid response to changes in the data but produce coarser rather than smoother resultant curves (less damped). Conversely, smaller values of α produce much smoother curves but take much longer to compensate for fluctuations in the data (more damped). So, what value of α should be used?

2.2  Critical Damping

EXCEL documentation suggests 0.20 to 0.30 are "reasonable" values to choose for α. This is a patently misleading statement because it does not take into account how much variation in the data (e.g., error) you are prepared to tolerate.

From the analysis in Section 1 we can now see that EXP_R plays the role of a damping factor in the UNIX load average. The UNIX load average is therefore equivalent to an exponentially-damped moving average. The more usual moving average (of the type often used by financial analysts) is just a simple arithmetic average over some number of data points.

The following Table 1 shows the respective smoothing and damping factors that are based on the magic numbers described in Part 1.

LA Factor    Damping (1 - α_R)    Correction (α_R)
EXP_1        0.9200 (≈ 92%)       0.0800 (≈ 8%)
EXP_5        0.9835 (≈ 98%)       0.0165 (≈ 2%)
EXP_15       0.9945 (≈ 99%)       0.0055 (≈ 1%)

Table 1: UNIX load average damping factors.

The value of α_R is calculated from α_R = 1 - exp(-5/(60R)), where R = 1, 5 or 15. From Table 1 we see that the bigger the correction for variation in the data (i.e., α_R), the more responsive the result is to those variations and therefore the less damping (1 - α_R) we see in the output.

This is why the 1-minute reports respond more quickly to changes in load than do the 15-minute reports. Note also that the largest correction for the UNIX load average is about 8% (for the 1-minute report), nowhere near the 20% or 30% suggested by EXCEL.

3  Other Averages

Next, we compare these time-dependent smoothed averages with some of the more familiar forms of averaging used in performance analysis and capacity planning.

3.1  Steady-State Averages

The average most commonly used in capacity planning, benchmarking and other kinds of performance modeling is the steady-state average.

In terms of the UNIX load average, this would correspond to observing the reported loads over a sufficiently long time (T) much as shown in Fig. 1.

Note that sysadmins almost never use the load average metrics in this way. Part of the reason for that avoidance lies in the fact that the LA metrics are embedded inside other commands (which vary across UNIX platforms) and need to be extracted. TeamQuest View is an excellent example of the way in which such classic limitations in traditional UNIX performance tools have been partially circumvented.

... ... ...

4  Summary

So, what have we learnt from all this? Those three little numbers tucked away innocently in certain UNIX commands are not so trivial after all. The first point is that load in this context refers to run-queue length (i.e., the sum of the number of processes waiting in the run-queue plus the number currently executing). Therefore, the number is absolute (not relative) and thus it can be unbounded, unlike utilization (AKA "load" in queueing theory parlance).

Moreover, they have to be calculated in the kernel and therefore they must be calculated efficiently. Hence the use of fixed-point arithmetic, which gives rise to those very strange looking constants in the kernel code. At the end of Part 1 I showed you that the magic numbers are really just exponential decay and rise constants expressed in fixed-point notation.

In Part 2 we found out that these constants are actually there to provide exponential smoothing of the raw instantaneous load values. More formally, the UNIX load average is an exponentially smoothed moving average function. In this way sudden changes can be damped so that they don't contribute significantly to the longer term picture. Finally, we compared the exponentially damped average with the more common type of averages that appear as metrics in benchmarks and performance models.

On average, the UNIX load average metrics are certainly not your average average.


Etc

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusively for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.



The Last but not Least


Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.


Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.


This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: October 20, 2015