Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

Virtual memory

Note: this is, by and large, an obsolete page from my 1996 class Operating Systems Architecture. Most links are broken, but they can be recovered via Google.

Memory is central to the operation of modern computer systems. To execute a program (or a collection of parallel tasks), it must be stored in main memory. Once the program is loaded, execution begins when its starting address is supplied to the CPU (or to the nodes of a multiprocessor system). The CPU then sends the address of each instruction or data item it needs to the memory unit. The memory unit does not know where the address came from, and it does not distinguish between requests for instructions and requests for data.

The address may come from the PC (program counter) for an instruction fetch, from an index register (for array access), from a literal address (for a fixed location such as a well-known port address), and so on. To manage all of this, the operating system provides a component called the memory manager. It is responsible for handling memory requests from the moment the user asks for a program to be executed until the program terminates, including the dynamic memory requests the program generates while it runs.

Modern computer systems run multiple programs at the same time. This requires keeping several programs in main memory in a ready-to-run state, a concept usually called multiprogramming. For multiprogramming to work, a program must be able to execute successfully regardless of where in main memory it is loaded. It follows that the addresses generated by the program (logical, or virtual, addresses) are not the same as the addresses used to access information in main memory (physical addresses). In other words, the program generates logical addresses, whereas the memory unit requires physical addresses.
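A minimal Python sketch of the simplest way hardware can separate logical from physical addresses: a relocation (base) register plus a limit register. The class and exception names are hypothetical and chosen for illustration only; the segmentation and paging schemes discussed below generalize this idea.

```python
class AddressError(Exception):
    """Raised when a logical address falls outside the program's limit."""


class RelocationMMU:
    """Base/limit relocation: check each logical address, then add the base."""

    def __init__(self, base: int, limit: int) -> None:
        self.base = base    # physical address where the program was loaded
        self.limit = limit  # size of the program's logical address space

    def translate(self, logical: int) -> int:
        if not 0 <= logical < self.limit:
            raise AddressError(f"logical address {logical:#x} exceeds limit {self.limit:#x}")
        return self.base + logical


# The same program, issuing the same logical address, loaded at two places.
first = RelocationMMU(base=0x40000, limit=0x10000)
second = RelocationMMU(base=0x90000, limit=0x10000)
print(hex(first.translate(0x1234)))   # 0x41234
print(hex(second.translate(0x1234)))  # 0x91234
```

The program itself never changes; only the base register differs, which is exactly what lets it run "irrespective of memory location".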

Here is a good overview from the Metakernel Theory page:

In order to allow multiple programs to run at the same time, an OS kernel must be able to enforce protections on memory. Additionally, kernels typically implement a number of optimizations using the memory protection mechanisms. Two important ideas in the area of systems are the notion of caching and lazy evaluation. Caching refers to keeping the contents of a slow storage medium (like a disk) in a faster one (like main memory). Lazy evaluation refers to delaying work until it is absolutely necessary, possibly avoiding it altogether.

Segmentation

The most primitive means of memory protection and management is called "segmentation". In segmentation, the system maintains a segment table, which contains the base address, size, and permissions of various segments of memory. A program accesses memory by providing the segment, as well as an offset into the segment. Segmentation is limiting because it requires programs to consciously take it into account. Furthermore, segmentation makes it difficult to load additional program components at runtime, as segments must be resized. Finally, since segments can be any size, allocating them tends to be difficult, and memory tends to become "fragmented", or broken into many small pieces. Because of this, segmentation is generally not used in modern operating systems.
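The segment-table lookup described above can be sketched in Python as follows. The Segment record, its perms string, and the SegmentFault exception are illustrative assumptions, not any particular CPU's descriptor format (on IA-32, for instance, segment descriptors are packed 8-byte entries).

```python
from dataclasses import dataclass
from typing import List


class SegmentFault(Exception):
    """Invalid segment, offset beyond the segment size, or permission violation."""


@dataclass(frozen=True)
class Segment:
    base: int   # physical start address
    size: int   # length in bytes
    perms: str  # subset of "rwx"


def translate(table: List[Segment], segment: int, offset: int, access: str) -> int:
    """Translate a (segment, offset) pair into a physical address."""
    if not 0 <= segment < len(table):
        raise SegmentFault(f"no segment {segment}")
    seg = table[segment]
    if not 0 <= offset < seg.size:
        raise SegmentFault(f"offset {offset:#x} beyond segment size {seg.size:#x}")
    if access not in seg.perms:
        raise SegmentFault(f"'{access}' access not permitted on segment {segment}")
    return seg.base + offset


table = [
    Segment(base=0x10000, size=0x4000, perms="rx"),  # code
    Segment(base=0x20000, size=0x2000, perms="rw"),  # data
]
print(hex(translate(table, 1, 0x10, "w")))  # 0x20010
```

Note that the program must name the segment explicitly in every access, which is the "consciously take it into account" burden the text mentions.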

Paging

A more advanced scheme is called "paging". In paging, each process has its own address space, separate from those of other processes. This address space is called a "virtual address space"; actual memory is called "physical memory". Both physical and virtual memory are divided into units called "pages" (on IA-32, pages are 4 kilobytes in size). Each address space has a structure called a "page table", which maps virtual memory pages to physical memory pages.

When a program supplies a memory address, the hardware uses the page table to determine the physical address. The lowest binary digits (bits) of the address are used as an offset into the page itself. The other, higher bits are used as an index into the page table. Each entry in the page table contains a physical address, as well as the permissions for that page. If the type of access matches the protections on the page, the hardware continues with the operation. Otherwise, the hardware generates a protection fault, and the kernel takes over. Additionally, since not every page may be mapped in a given address space, the page table entry can indicate that a given page is invalid. Accessing an invalid page causes another kind of fault, called a page fault. Finally, most architectures keep bits in the page table entry that indicate whether a page has been written to (the "dirty" bit) and whether it has been accessed (the "accessed" or "referenced" bit).
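A simplified Python sketch of the single-level lookup just described, assuming 4 KB pages (so the low 12 bits are the offset). A dictionary stands in for the page table and a missing key plays the role of an invalid entry; the PTE class, its fields, and the two exception names are hypothetical. Real hardware packs these fields into a single word and sets the accessed and dirty bits itself.

```python
from typing import Dict

PAGE_SHIFT = 12                  # 4 KiB pages, as on IA-32
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """The page-table entry for this address is invalid (page not mapped)."""


class ProtectionFault(Exception):
    """The access type is not allowed by the page's permissions."""


class PTE:
    """One page-table entry: frame number, permission, and status bits."""

    def __init__(self, frame: int, writable: bool) -> None:
        self.frame = frame       # physical page (frame) number
        self.writable = writable
        self.accessed = False    # set on any access
        self.dirty = False       # set on a write


def translate(page_table: Dict[int, PTE], vaddr: int, write: bool = False) -> int:
    vpn = vaddr >> PAGE_SHIFT     # high bits: index into the page table
    offset = vaddr & OFFSET_MASK  # low bits: offset within the page
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault(f"page fault at {vaddr:#x}")
    if write and not pte.writable:
        raise ProtectionFault(f"write to read-only page at {vaddr:#x}")
    pte.accessed = True
    if write:
        pte.dirty = True
    return (pte.frame << PAGE_SHIFT) | offset


page_table = {0x400: PTE(frame=0x1A2, writable=False),
              0x401: PTE(frame=0x0B7, writable=True)}
print(hex(translate(page_table, 0x400ABC)))              # 0x1a2abc
print(hex(translate(page_table, 0x401010, write=True)))  # 0xb7010
```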

On 32-bit and 64-bit architectures (meaning most of the world), a flat, single-level page table would be huge: with 4 KB pages, even a 32-bit address space needs 2^20 (about a million) entries per process, most of them unused. For this reason, it is common to use several "levels" of page tables. The top-level page table's entries point to other page tables, whose entries point to physical pages. 64-bit architectures often use four or five levels of page tables.
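To show the cost of a flat table and how a multi-level walk avoids it, here is a Python sketch using the classic IA-32 (non-PAE) layout, in which a 32-bit address splits into a 10-bit page-directory index, a 10-bit page-table index, and a 12-bit offset. Nested dictionaries stand in for the directory and the tables; the walk function is hypothetical and ignores permission bits. (x86-64 applies the same idea with four levels of 9-bit indices.)

```python
from typing import Dict, Optional

PAGE_SHIFT = 12                     # 4 KiB pages
LEVEL_BITS = 10                     # IA-32 without PAE: 10 + 10 + 12 = 32 bits
LEVEL_MASK = (1 << LEVEL_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1

# Cost of a flat, single-level table for one 32-bit address space.
entries = 1 << (32 - PAGE_SHIFT)    # 2**20 entries
flat_bytes = entries * 4            # 4-byte entries
print(entries, flat_bytes // (1 << 20), "MiB")  # 1048576 4 MiB


def walk(directory: Dict[int, Dict[int, int]], vaddr: int) -> Optional[int]:
    """Two-level walk: directory index -> page-table index -> frame number."""
    dir_index = (vaddr >> (PAGE_SHIFT + LEVEL_BITS)) & LEVEL_MASK  # bits 31..22
    table_index = (vaddr >> PAGE_SHIFT) & LEVEL_MASK               # bits 21..12
    offset = vaddr & OFFSET_MASK                                   # bits 11..0
    table = directory.get(dir_index)
    if table is None:
        # No second-level table: a whole 4 MiB region is unmapped and costs
        # no table memory at all. This is where the savings come from.
        return None  # page fault
    frame = table.get(table_index)
    if frame is None:
        return None  # page fault
    return (frame << PAGE_SHIFT) | offset


# Only one second-level table exists, covering virtual addresses 0x400000-0x7fffff.
directory = {1: {2: 0x345}}
print(hex(walk(directory, 0x00402678)))  # 0x345678
print(walk(directory, 0x00001000))       # None
```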

Especially with multi-level page tables, address translation becomes expensive. To avoid this cost, hardware uses a translation cache called a "TLB", or translation lookaside buffer, which holds the results of recent address translations. When the kernel switches address spaces, it must flush the TLB (unless the entries are tagged with an address-space identifier); otherwise entries from the old address space would remain and cause incorrect address translations. Likewise, whenever mappings in an address space change, the TLB entries for the affected pages must be invalidated. On a multiprocessor machine, the TLBs on other CPUs must also be invalidated if those CPUs are running in the same address space. This is called a "TLB shootdown".
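A toy model of a TLB in Python, assuming a small fully associative cache with LRU replacement (real TLBs vary in size, associativity, and replacement policy). The TLB class and the shootdown helper are hypothetical; in a real kernel, a shootdown is carried out by sending inter-processor interrupts to the other CPUs, which then invalidate their own entries.

```python
from collections import OrderedDict
from typing import Callable, List


class TLB:
    """Fully associative TLB with LRU replacement, caching VPN -> frame."""

    def __init__(self, capacity: int, walk: Callable[[int], int]) -> None:
        self.capacity = capacity
        self.walk = walk                      # slow path: full page-table walk
        self.entries: "OrderedDict[int, int]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int) -> int:
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)     # mark as most recently used
            return self.entries[vpn]
        self.misses += 1
        frame = self.walk(vpn)
        self.entries[vpn] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return frame

    def invalidate(self, vpn: int) -> None:
        """Drop one entry after the kernel changes that page's mapping."""
        self.entries.pop(vpn, None)

    def flush(self) -> None:
        """Drop every entry, e.g. on an address-space switch without ASIDs."""
        self.entries.clear()


def shootdown(tlbs: List[TLB], vpn: int) -> None:
    """Invalidate vpn in the TLB of every CPU running this address space."""
    for tlb in tlbs:
        tlb.invalidate(vpn)


page_table = {0x10: 0x2A, 0x11: 0x2B}
cpu0 = TLB(capacity=2, walk=page_table.__getitem__)
cpu1 = TLB(capacity=2, walk=page_table.__getitem__)
cpu0.lookup(0x10)
cpu0.lookup(0x10)                 # hit: served from the TLB
cpu1.lookup(0x10)
page_table[0x10] = 0x3C           # the kernel remaps the page...
shootdown([cpu0, cpu1], 0x10)     # ...so stale entries must go on both CPUs
print(cpu0.lookup(0x10), cpu1.lookup(0x10))  # 60 60
```

Without the shootdown call, both CPUs would keep returning the old frame 0x2A from their caches.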

Semaphore and Shared Segment Kernel Parameters

For an Oracle 10g database, some kernel parameters need to be changed to meet the requirements listed in the Oracle Database Quick Installation Guide 10g Release 1 (10.1.0.3) for Linux x86. For Oracle 10g, the following kernel parameters must be set to values greater than or equal to the recommended values; they can be changed through the /proc filesystem:

shmmax  = 2147483648     (To verify, execute: cat /proc/sys/kernel/shmmax)
shmall  = 2097152        (To verify, execute: cat /proc/sys/kernel/shmall)
shmmni  = 4096           (To verify, execute: cat /proc/sys/kernel/shmmni)

semmsl  = 250            (To verify, execute: cat /proc/sys/kernel/sem | awk '{print $1}')
semmns  = 32000          (To verify, execute: cat /proc/sys/kernel/sem | awk '{print $2}')
semopm  = 100            (To verify, execute: cat /proc/sys/kernel/sem | awk '{print $3}')
semmni  = 128            (To verify, execute: cat /proc/sys/kernel/sem | awk '{print $4}')

file-max = 65536         (To verify, execute: cat /proc/sys/fs/file-max)

ip_local_port_range = 1024 65000  (To verify, execute: cat /proc/sys/net/ipv4/ip_local_port_range)
To see the above kernel parameters with one command, you can type:
su - root
sysctl -a | grep -E "shmmax|shmall|shmmni|sem|file-max|ip_local_port_range"

For ip_local_port_range, Oracle recommends setting the local port range for outgoing connections to "1024 65000", which is needed on high-usage systems. This kernel parameter defines the range of local ports that TCP and UDP choose from for outgoing traffic.
For more information on shmmax, shmmni, and shmall, see Setting Shared Memory.
For more information on semmsl, semmni, semmns, and semopm, see Setting Semaphores.
For more information on file-max, see Setting File Handles.

NOTE: Do not change the value of any kernel parameter on a system where it is already higher than the listed minimum requirement.
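The rule in the note above (raise a parameter to the minimum, never lower it) can be checked with a short Python script that reads the current values from /proc/sys. The MINIMUMS table copies the Oracle values listed above; the function names are hypothetical. ip_local_port_range is deliberately left out: it is a range rather than a minimum, so a field-by-field "at least" comparison does not apply to it.

```python
from pathlib import Path
from typing import Dict, List

# Oracle 10g minimums from the list above: path under /proc/sys -> minimum values.
# Multi-value files (kernel/sem) are compared field by field.
MINIMUMS: Dict[str, List[int]] = {
    "kernel/shmmax": [2147483648],
    "kernel/shmall": [2097152],
    "kernel/shmmni": [4096],
    "kernel/sem": [250, 32000, 100, 128],  # semmsl semmns semopm semmni
    "fs/file-max": [65536],
}


def recommend(current: List[int], minimum: List[int]) -> List[int]:
    """Raise each field to its minimum, but never lower a value that is already higher."""
    return [max(cur, low) for cur, low in zip(current, minimum)]


def check(proc_root: Path = Path("/proc/sys")) -> Dict[str, str]:
    """Return a sysctl.conf line for each parameter that is below the minimum."""
    changes: Dict[str, str] = {}
    for rel, minimum in MINIMUMS.items():
        current = [int(v) for v in (proc_root / rel).read_text().split()]
        wanted = recommend(current, minimum)
        if wanted != current:
            key = rel.replace("/", ".")  # e.g. kernel/sem -> kernel.sem
            changes[key] = f"{key}={' '.join(map(str, wanted))}"
    return changes
```

Reading /proc/sys does not require root; only applying the values does. Printing the values returned by check() on a Linux system gives the lines to add to /etc/sysctl.conf.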

For SLES-9, SLP-9.1, SLP-9.2, and SLP-9.3 I had to increase the kernel parameters shmmax, semopm, file-max, and ip_local_port_range to meet the minimum requirements. To change these kernel parameters permanently, add the following lines to the configuration file /etc/sysctl.conf. This file is read during the boot process to change default kernel settings. Note that on SLES-9 and SLP-9.1 the /etc/sysctl.conf file does not exist; simply create it if it is missing on your system.

net.ipv4.ip_local_port_range=1024 65000
kernel.sem=250 32000 100 128
kernel.shmmax=2147483648
fs.file-max=65536
Or simply run the following command to add new kernel settings:
su - root 
cat >> /etc/sysctl.conf << EOF
kernel.shmmax=2147483648
kernel.sem=250 32000 100 128
fs.file-max=65536
net.ipv4.ip_local_port_range=1024 65000
EOF

In SLES-9 and SLP-9.1 you also have to instruct SUSE Linux to read the /etc/sysctl.conf file during the boot process. This is done by enabling the boot.sysctl system service:
su - root
# chkconfig boot.sysctl
boot.sysctl  off
# chkconfig boot.sysctl on
# chkconfig boot.sysctl
boot.sysctl  on
#

To load the new kernel settings from the /etc/sysctl.conf file without reboot, execute the following command:
su - root
# sysctl -p
kernel.shmmax = 2147483648
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
#


Old News

[Feb 11, 2009] redhat.com Red Hat Magazine - Understanding Virtual Memory

Old article based on RHEL 3 VM

One of the most important aspects of an operating system is the Virtual Memory Management system. Virtual Memory (VM) allows an operating system to perform many of its advanced functions, such as process isolation, file caching, and swapping. As such, it is imperative that an administrator understand the functions and tunable parameters of an operating system's Virtual Memory Manager so that optimal performance for a given workload may be achieved. After reading this article, the reader should have a rudimentary understanding of the data the Red Hat Enterprise Linux (RHEL3) VM controls and the algorithms it uses. Further, the reader should have a fairly good understanding of general Linux VM tuning techniques. It is important to note that Linux as an operating system has a proud legacy of overhaul. Items which no longer serve useful purposes or which have better implementations as technology advances are phased out. This implies that the tuning parameters described in this article may be out of date if you are using a newer or older kernel. Fear not however! With a well grounded understanding of the general mechanics of a VM, it is fairly easy to convert knowledge of VM tuning to another VM. The same general principles apply, and documentation for a given kernel (including its specific tunable parameters) can be found in the corresponding kernel source tree under the file Documentation/sysctl/vm.txt.

[Nov 21, 2008] How the Linux Kernel Manages Virtual Memory - Virtual Memory is Fundamental to OS Performance by Charlie Schluting

November 21, 2008

To optimally configure your Virtual Memory Manager (VMM), it's necessary to understand how it does its job. We're using Linux for example's sake, but the concepts apply across the board, though some slight architectural differences will exist between the Unixes.

Nearly every VMM interaction involves the MMU, or Memory Management Unit, excluding the disk subsystem. The MMU allows the operating system to access memory through virtual addresses by using data structures to track these translations. Its main job is to translate these virtual addresses into physical addresses, so that the right section of RAM is accessed.

The Zoned Buddy Allocator interacts directly with the MMU, providing valid pages when the kernel asks for them. It also manages lists of pages and keeps track of different categories of memory addresses.
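A toy binary buddy allocator in Python, to illustrate the split-and-merge idea behind the buddy allocator. It manages a single region of 2**max_order page frames; a free block of 2**k pages starting at frame b has its "buddy" at b XOR 2**k. The class is hypothetical and much simpler than Linux's, which keeps separate free lists per zone (for example DMA, Normal, and HighMem on 32-bit x86) along with many other refinements.

```python
from typing import Dict, List, Optional, Set


class BuddyAllocator:
    """Toy binary buddy allocator over page frames 0 .. 2**max_order - 1."""

    def __init__(self, max_order: int) -> None:
        self.max_order = max_order
        # free_lists[k] holds the first frame of each free block of 2**k pages
        self.free_lists: List[Set[int]] = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)
        self.allocated: Dict[int, int] = {}  # first frame -> order

    def alloc(self, order: int) -> Optional[int]:
        # find the smallest free block that is large enough
        for k in range(order, self.max_order + 1):
            if self.free_lists[k]:
                block = min(self.free_lists[k])
                self.free_lists[k].remove(block)
                # split it, putting the upper halves ("buddies") on the free lists
                while k > order:
                    k -= 1
                    self.free_lists[k].add(block + (1 << k))
                self.allocated[block] = order
                return block
        return None  # no free block is large enough

    def free(self, block: int) -> None:
        order = self.allocated.pop(block)
        # merge with the buddy for as long as the buddy is also free
        while order < self.max_order:
            buddy = block ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            block = min(block, buddy)
            order += 1
        self.free_lists[order].add(block)


buddy = BuddyAllocator(max_order=4)  # 16 page frames
a = buddy.alloc(0)                   # 1 page:  frame 0
b = buddy.alloc(1)                   # 2 pages: frames 2-3
buddy.free(a)
buddy.free(b)
print(a, b, buddy.free_lists[4])     # 0 2 {0}
```

After both frees, the pieces coalesce back into the single 16-page block, which is what keeps the buddy scheme resistant to external fragmentation.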

The Slab Allocator is another layer in front of the Buddy Allocator, and provides the ability to create caches of memory objects. On x86 hardware, pages of memory must be allocated in 4KB blocks, but the Slab Allocator allows the kernel to store objects of different sizes, and it manages and allocates the underlying pages appropriately.
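A toy slab-style object cache in Python: each "slab" is one 4 KB page carved into equal-sized slots, and freed objects go back on a free list to be reused without returning to the page allocator. SlabCache and its methods are hypothetical; the kernel's slab allocator also handles per-CPU caches, object constructors, alignment and colouring, and returning empty slabs to the buddy allocator. With 600-byte objects, six fit in a page and 496 bytes are left over.

```python
from typing import List, Tuple

Slot = Tuple[int, int]  # (slab number, slot number within the slab)


class SlabCache:
    """Toy object cache that carves 4 KiB pages into equal-sized slots."""

    PAGE_SIZE = 4096

    def __init__(self, name: str, obj_size: int) -> None:
        if not 0 < obj_size <= self.PAGE_SIZE:
            raise ValueError("object must fit in one page")
        self.name = name
        self.obj_size = obj_size
        self.per_slab = self.PAGE_SIZE // obj_size  # objects per page
        self.slabs = 0                              # pages taken from the page allocator
        self.free_slots: List[Slot] = []

    def _grow(self) -> None:
        # A real kernel would get this page from the buddy allocator.
        slab = self.slabs
        self.slabs += 1
        self.free_slots.extend((slab, i) for i in range(self.per_slab))

    def alloc(self) -> Slot:
        if not self.free_slots:
            self._grow()
        return self.free_slots.pop()

    def free(self, slot: Slot) -> None:
        self.free_slots.append(slot)  # handed out again by the next alloc()


cache = SlabCache("example-600", obj_size=600)
objs = [cache.alloc() for _ in range(7)]
print(cache.per_slab, cache.slabs)  # 6 2
cache.free(objs[0])
print(cache.alloc() == objs[0])     # True: the freed slot is reused
```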

Finally, a few kernel tasks run to manage specific aspects of the VMM. Bdflush manages block device pages (disk IO), and kswapd handles swapping pages to disk.

Pages of memory are either Free (available to allocate), Active (in use), or Inactive. Inactive pages are either dirty or clean, depending on whether their contents have been synced to disk yet. An inactive, dirty page is no longer in use, but is not yet available for reuse. The operating system must scan for dirty pages and decide to deallocate them. Once they are guaranteed to have been synced to disk, inactive pages may be "clean," or ready for reuse.
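The page life cycle described above can be sketched as a small state machine in Python. This is a simplified model assuming the four states named in the text; the event names (allocate, age, writeback, reference, reclaim) are illustrative labels, not kernel function names. It encodes the key point of the paragraph: an inactive dirty page cannot be reclaimed until it has been written back.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class PageState(Enum):
    FREE = "free"
    ACTIVE = "active"
    INACTIVE_DIRTY = "inactive dirty"
    INACTIVE_CLEAN = "inactive clean"


# (current state, event) -> next state, for this simplified model only.
TRANSITIONS: Dict[Tuple[PageState, str], PageState] = {
    (PageState.FREE, "allocate"): PageState.ACTIVE,
    (PageState.ACTIVE, "age"): PageState.INACTIVE_DIRTY,                # not referenced recently
    (PageState.INACTIVE_DIRTY, "reference"): PageState.ACTIVE,          # used again before reclaim
    (PageState.INACTIVE_DIRTY, "writeback"): PageState.INACTIVE_CLEAN,  # contents synced to disk
    (PageState.INACTIVE_CLEAN, "reference"): PageState.ACTIVE,
    (PageState.INACTIVE_CLEAN, "reclaim"): PageState.FREE,              # now safe to reuse
}


def run(events: Iterable[str], state: PageState = PageState.FREE) -> PageState:
    """Apply events in order, rejecting any transition the model does not allow."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {event!r} a page that is {state.value}")
        state = TRANSITIONS[key]
    return state


print(run(["allocate", "age", "writeback", "reclaim"]))  # PageState.FREE
```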

Tunable parameters may be adjusted in real time via the proc file system, but to persist across a reboot, /etc/sysctl.conf is the preferred method. Parameters can be set in real time with the sysctl command and then recorded in the configuration file so that they persist across reboots.

You can adjust everything from the interval at which pages are scanned to the amount of memory to reserve for pagecache use. Let's see a few examples.

Often we'll want to optimize a system for IO performance. A busy database server, for example, is generally only going to run the database, and it doesn't matter whether the user experience is good or not. If the system doesn't require much memory for user applications, decreasing the available bdflush tunables is beneficial. The specific parameters being adjusted are just too lengthy to explain here, but definitely look into them if you wish to adjust the values further. They are fully explained in vm.txt, usually located at /usr/src/linux/Documentation/sysctl/vm.txt.

In general, an IO-heavy server will benefit from the following settings in sysctl.conf:

vm.bdflush="100 5000 640 2560 150 30000 5000 1884 2"

The pagecache values control how much memory is used for pagecache. The amount of pagecache allowed translates directly to how many programs and open files can be held in memory.

The vm.pagecache tunable takes three values, each a percentage of total system memory: min, borrow, and max.

On a file server, we'd want to increase the amount of pagecache available, so that data isn't moved to disk as often. Using vm.pagecache="10 50 100" provides more caching, allowing larger and less frequent disk writes for file IO intensive work loads.

On a single-user machine, say your workstation, large values will keep more pages in memory, allowing programs to execute faster. Once the upper limit is reached, however, you will start swapping constantly.

Conversely, a server with many users that frequently executes many different programs will not want high amounts of pagecache. The pagecache can easily eat up available memory if it's too large, so something like vm.pagecache="10 20 30" is a good compromise.
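As a worked example of what these percentages mean, here is a short Python helper that converts a vm.pagecache triple into megabytes for a given amount of RAM. It assumes, as in the 2.4-era vm.txt, that the three values are min, borrow, and max percentages of total system memory; the function name and the 4096 MB machine are hypothetical.

```python
from typing import Dict


def pagecache_limits(setting: str, total_mb: int) -> Dict[str, int]:
    """Convert a vm.pagecache "min borrow max" percentage triple to MB (rounded down)."""
    values = [int(v) for v in setting.split()]
    if len(values) != 3 or not all(0 <= v <= 100 for v in values):
        raise ValueError("expected three percentages between 0 and 100")
    return {name: total_mb * pct // 100
            for name, pct in zip(("min", "borrow", "max"), values)}


for setting in ("10 50 100", "10 20 30"):
    print(setting, pagecache_limits(setting, total_mb=4096))
# 10 50 100 {'min': 409, 'borrow': 2048, 'max': 4096}
# 10 20 30 {'min': 409, 'borrow': 819, 'max': 1228}
```

On such a 4096 MB server, the file-server setting lets the page cache grow to all of memory, while the multi-user compromise caps it at about 1.2 GB.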

Finally, the swappiness and vm.overcommit parameters are also very powerful. The overcommit settings can be used to allow more memory to be allocated than there is RAM. Programs that have a habit of trying to allocate many gigabytes of memory are a hassle, and frequently they don't use nearly that much memory. Upping the overcommit factor will allow these allocations to happen, but if the application really does use all the RAM, you'll be swapping like crazy in no time (or worse: running out of swap).
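In mainline Linux 2.6 and later, vm.overcommit_memory selects the policy (0 = heuristic overcommit, the default; 1 = always allow; 2 = strict accounting). Only in mode 2 does vm.overcommit_ratio apply: allocations fail once committed memory would exceed CommitLimit = swap + (RAM - huge pages) * overcommit_ratio / 100, a value reported in /proc/meminfo. The Python helper below is a hypothetical worked example of that arithmetic, not a kernel interface, and the tunables on an older kernel such as RHEL3's may differ.

```python
def commit_limit_kb(ram_kb: int, swap_kb: int, overcommit_ratio: int = 50,
                    hugetlb_kb: int = 0) -> int:
    """CommitLimit under vm.overcommit_memory=2 (strict accounting), in kB."""
    return (ram_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_kb


GIB_IN_KB = 1024 * 1024
ram, swap = 16 * GIB_IN_KB, 4 * GIB_IN_KB  # hypothetical 16 GiB RAM, 4 GiB swap
for ratio in (50, 100):
    limit = commit_limit_kb(ram, swap, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit = {limit // GIB_IN_KB} GiB")
# overcommit_ratio=50: CommitLimit = 12 GiB
# overcommit_ratio=100: CommitLimit = 20 GiB
```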

The swappiness concept is heavily debated. If you want to decrease the amount of swapping done by the system, just echo a small number in the range 0-100 into /proc/sys/vm/swappiness. You don't generally want to play with this, as it is more mysterious and non-deterministic than the advanced parameters described above. In general, you want pages that applications are not using to be swapped out, so that idle data does not occupy memory for no reason. Task-specific servers, where you know the amount of RAM and the application requirements, are best suited for swappiness tuning (using a low number to decrease swapping).

These parameters all require a bit of testing, but in the end, you can dramatically increase the performance of many types of servers. The common case of disappointing disk performance stands to gain the most: Give the settings a try before going out and buying a faster disk array.

When he's not writing for Enterprise Networking Planet or riding his motorcycle, Charlie Schluting is the Associate Director of Computing Infrastructure at Portland State University. Charlie also operates OmniTraining.net, and recently finished Network Ninja, a must-read for every network engineer.

Linux Memory Management

Summary Shared Memory-IPC settings msg#00105 os.solaris.managers.summaries

I want to say thanks to everyone who responded. I really appreciate your help in this matter. All of the answers were excellent. Thanks to Charlotte_Ratliff, Jon Andrews, secroft, JESSE CARROLL

First, semmap is obsolete in Solaris 8. I had searched docs.sun.com and found it in the Sys Admin Guide Vol 2, so I thought it was still in use. I had also searched the Solaris Tunable Parameters manual but couldn't find it there before I received an email indicating it was in the manual. I still haven't located c2audit:audit_load = 1 or abort_enable = 0 in the Solaris Tunable Parameters manual.

I knew the system had to be rebooted and had done that, but I hadn't mentioned that fact in my e-mail, so that could have been the correct answer.

I added a forceload: sys/msgsys line to the system file, so more details are displayed when I issue the sysdef command.

Jon describes what audit_load and abort_enable do below, but the puzzling thing was that he said he found the answer with Google. I had also used Google before asking for help, but simply missed the Functions of the Basic Security Module Script document.

Following are the individual answers.

I've attached the Solaris Tunable Parameters Manual, which you can also download from docs.sun.com. It contains descriptions of the tunables in Solaris 8, most of which are also in the older releases. I hope this helps. JC

Not to be obvious, but to answer the first question: you have to reboot for the changes to take effect. The second I have not seen. Scott


The first one is because the module is not loaded. You can load it by hand: modload sys/msgsys should do it.

The last two are not shared memory settings. A quick Google search yields:
131 echo "set c2audit:audit_load = 1" >> /tmp/etc.system.$$
132 echo "set abort_enable = 0" >> /tmp/etc.system.$$

Line 132 disables the "Stop-A" keyboard sequence. Without this line in /etc/system, any user can halt the system with the aforementioned keyboard sequence.

Line 131 enables auditing in the kernel and on the system. A value of 1 enables c2 auditing, while a value of 0 would disable it. c2audit is a kernel module, which implements event auditing within the Solaris OE. (The name c2 originates from a government-defined security level. In relation to the Solaris OE, c2 is used as another word for audit.)

*********************************
semmap was made obsolete in Solaris 8, I believe. You can verify this by going to the docs.sun.com website and searching for "tunable parameters solaris 8". It should give you information about the configurations and the changes to the parameters. Hope this helps. -Charlotte

Original Question: Shared Memory/IPC settings

Our DBA is requesting some shared memory settings that I cannot find documented anywhere. So this may be an easy question, but I swear I have tried to figure this out.

The very latest info on 9iAS version 2 says the kernel parameters should be as follows (there were others, but I was able to increase their values):
SEMMAP = 64
c2audit:audit_load = 1
abort_enable = 0

For SEMMAP = 64, I added the following line to /etc/system:
set semsys:seminfo_semmap=64
Question Number 1: When I issue a sysdef -I command, I can't see the change I made. Sysdef output below (last few lines):
* Streams Tunables
*
9 maximum number of pushes allowed (NSTRPUSH)
65536 maximum stream message size (STRMSGSZ)
1024 max size of ctl part of message (STRCTLSZ)
*
* IPC Messages module is not loaded
*
*
* IPC Semaphores
*
4096 semaphore identifiers (SEMMNI)
4096 semaphores in system (SEMMNS)
4096 undo structures in system (SEMMNU)
4096 max semaphores per id (SEMMSL)
4096 max operations per semop call (SEMOPM)
64 max undo entries per process (SEMUME)
32767 semaphore maximum value (SEMVMX)
16384 adjust on exit max value (SEMAEM)
*
* IPC Shared Memory
*
4294967295 max shared memory segment size (SHMMAX)
1 min shared memory segment size (SHMMIN)
512 shared memory identifiers (SHMMNI)
128 max attached shm segments per process (SHMSEG)
*
* Time Sharing Scheduler Tunables
*
60 maximum time sharing user priority (TSMAXUPRI)
SYS system class name (SYS_NAME)

Question Number 2: I cannot find any information about the last two settings. I have seen references to them in the /etc/system file, but I don't like adding them to /etc/system without knowing what they do.

Any help or guidance will be appreciated.


=====
Kathy Ange
Virginia Department of Agriculture & Consumer Services
Information Systems
(804) 786-1340 Voice Mail
(804) 786-2110 FAX

Recommended Links

History

Capability Addressing, Protection Capabilities, Single Virtual Address Space, & Protection Rings

Distributed Shared Memory, & The Mach VM

Memory Consistency, & Consistency Models Requiring & Not Requiring Synchronization Operations

NUMA, Replication Of Memory, Achieving Sequential Consistency, & Synchronization in DSM Systems

Management of Available Storage, Swapping and Paging, & Inverted Page Tables

Performance of Demand Paging, Replacement Strategies, Stack Algorithms and Priority Lists, Approximations to LRU Replacement, Page vs. Segment Replacement, & Page Replacement in DSM Systems

Locality of Reference, User-Level Memory Managers,The Working Set Model, Load Control in UNIX, & Performance of Paging Algorithms




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree...


Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.

Last modified: March 12, 2019