Softpanorama

May the source be with you, but remember the KISS principle ;-)
Contents Bulletin Scripting in shell and Perl Network troubleshooting History Humor

Solaris UFS File System

News Recommended Links Solaris Volume Manager (SVM) Solaris Volume Manager - Soft Partitioning Explained Disk Partitions in Solaris
Solaris TMPFS NFS Floppy FAT filesystem CdRom Solaris snapshots
Swap Space and Virtual Memory mount Mount Options  Humor Etc

A file system consists of blocks of data. The number of bytes constituting a block varies depending on the OS. The internal physical structure of a hard disk consists of cylinders. The hard disk is divided into groups of cylinders known as cylinder groups, further divided into blocks.

The file system is comprised of five main blocks (boot block, superblock, Inode block, data block,

  1. Boot block. The boot block is part of the disk label that contains a loader used to boot the operating system.

  2. Super block. All partitions within the Unix filing system usually contain a special block called the super block. The super block contains the basic information about the entire file system. It stores the following details about the file system:

  3. Inode block. Inode is a kernel structure that contains a pointer to the disk blocks that store data. This pointer points to information such as file type, permission type, owner and group information, file size, file modification time, and so on. Note that the inode does not contain the filename as part of the information. The filename is listed in a directory that contains a list of filenames and related inodes associated with the file. When a user attempts to access a given file by name, the name is looked up in the directory where the corresponding inode is found. Inode stores the following information about every file:

    Each inode has a unique number associated with it, called the inode number. The -li option of the ls command displays the inode number of a file:

    # ls -li

    When a user creates a file in the directory or modifies it, the following events occur:

  4. Data block

    The data block is the storage unit of data in the Solaris file system. The default size of a data block in the Solaris file system is 8192 bytes. After a block is full, the file is allotted another block. The addresses of these blocks are stored as an array in the Inode.

    The first 12 pointers in the array are direct addresses of the file; that is, they point to the first 12 data blocks where the file contents are stored. If the file grows larger than these 12 blocks, then a 13th block is added, which does not contain data. This block, called an indirect block, contains pointers to the addresses of the next set of direct blocks.

    If the file grows still larger, then a 14th block is added, which contains pointers to the addresses of a set of indirect blocks. This block is called the double indirect block. If the file grows still larger, then a 15th block is added, which contains pointers to the addresses of a set of double indirect blocks. This block is called the triple indirect block.
     

  5. Vnodes. A Virtual Node or vnode is a data structure that represents an open file, directory, or device that appears in the file system namespace. A vnode does not render the physical file system it implements. The vnode interface allows high-level operating system modules to perform uniform operations on vnodes.

Links

Hard and soft links are a great features of Unix. It is a reference in a directory to a file stored in another directory. In case of soft links it can be a reference to a directory. There might be multiple links to a file. Links eliminate redundancy because you do not need to store multiple copies of a file.

Links are of two types: hard and soft (also known as symbolic).

To create a symbolic link, you must use the -s option with the ln command. Files that are soft linked contain an l symbol at the first bit of the access permission bits displayed by the ls -l command, whereas those that are hard linked do not contain the l symbol. A directory is symbolically linked to a file. However, it cannot be hard linked.

It is obvious that no file exists with a link count less than one.  Relative pathnames . or .. are nothing but links for the current directory and its parent directory.  These are present in every directory: any directory stores the two links ., .. and the Inode numbers of the files. They can be listed by the ls -lia option. A directory must have a minimum of two links. The number of links increases as the number of sub-directories increase. Whenever you issue a command to list the file attributes, it refers to the Inode block with the Inode number and the corresponding data is retrieved.

Solaris File Systems and Their Functions

Each file system used in Solaris is intended for a specific purpose.

The root file system is at the top of an inverted tree structure. It is the first file system that the kernel mounts during booting. It contains the kernel and device drivers. The / directory is also called the mount point directory of the file system. All references in the file system are relative to this directory. The entire file system structure is attached to the main system tree at the root directory during the process of mounting, and hence the name. During the creation of the file system, a lost + found directory is created within the mount point directory. This directory is used to dump into the file system any unredeemed files that were found during the customary file system check, which you do with the fsck command.

/ (root)

The directory located at the top of the Unix file system. It is represented by the "/" (forward slash) character.

You create file systems with the newfs command. The newfs command accepts only logical raw device names. The syntax is as follows:

newfs [ -v ] [ mkfs-options ] raw-special-device

For example, to create a file system on the disk slice c0t3d0s4, the following command is used:

# newfs -v /dev/rdsk/c0t3d0s4

The -v option prints the actions in verbose mode. The newfs command calls the mkfs command to create a file system. You can invoke the mkfs command directly by specifying a -F option followed by the type of file system.

Mounting File Systems

Mounting file systems is the next logical step to creating file systems. Mounting refers to naming the file system and attaching it to the inverted tree structure. This enables access from any point in the structure. A file system can be mounted during booting, manually from the command line, or automatically if you have enabled the automount feature.

With remote file systems, the server shares the file system over the network and the client mounts it.

The / and /usr file systems, as mentioned earlier, are mounted during booting. To mount a file system, attach it to a directory anywhere in the main inverted tree structure. This directory is known as the mount point. The syntax of the mount command is as follows:

# mount <logical block device name>   <mount point>

The following steps mount a file system c0t2d0s7 on the /export/home directory:

# mkdir /export/home
# mount  /dev/dsk/c0t2d0s7 /export/home

You can verify the mounting by using the mount command, which lists all the mounted file systems.

Note:  If the mount point directory has any content prior to the mounting operation, it is hidden and remains inaccessible until the file system is unmounted.

Data is stored and retrieved from the physical disk where the file system is mounted.

Although there are no defined specifications for creating the file systems on the physical disk, slices are usually allocated as following:

    0.  Root or /— Files and directories of the OS.

  1. Swap— Virtual memory space.
  2. Refers to the entire disk.
  3. /export— Different OS versions.
  4. /export/swap— Unused. Left to user's choice.
  5. /opt— Application software added to a system.
  6. /usr— OS commands by users.
  7. /home— Files created by users.

The slices shown above are all allocated on a single single disk. However, there is no restriction that all file systems need to be located on a single disk. They can also span across multiple disks. Slice 2 refers to the entire disk. Hence, if you want to allocate an entire disk for a file system, you can do so by creating it on slice 2. The mount command supports a variety of useful options.

Option

Description

-o largefiles

Files larger than 2GB are supported in the file system.

-o nolargefiles

Does not mount file systems with files larger than 2GB.

-o rw

File system is mounted with read and write permissions.

-o ro

File system is mounted with read-only permission.

-o bg

Repeats mount attempts in the background. Used with non-critical file systems.

-o fg

Repeats mount attempts in the foreground. Used with critical file systems.

-p

Prints the list of mounted file systems in /etc/vfstab format.

-m

Mounts without making an entry in /etc/mnt /etc/tab file.

-O

Performs an Overlay mount. Mounts over an existing mount point.


The mountall command mounts all file systems that have the mount at boot field in the /etc/vfstab file set to yes. It can also be used anytime after booting.

Unmounting File Systems

A file system can be unmounted with the umount command. The following is the syntax for umount:

umount <mount-point or logical block device name >
File systems cannot be unmounted when they are in use or when the umount command is issued from any subdirectory within the file system mount point.

Note: A file system can be unmounted forcibly if you use the -f option of the umount command. Please refer to the man page to learn about the use of these options.

The umountall command is used to unmount a group of file systems. The umountall command unmounts all file systems in the /etc/mnttab file except the /, /usr, /var, and /proc file systems. If you want to unmount all the file systems from a specified host, use the -h option. If you want to unmount all the file systems mounted from remote hosts, use the -r option.

/etc/vfstab File

The /etc/vfstab (Virtual File System Table) file plays a very important role in system operations. This file contains one record for every device that has to be automatically mounted when the system enters run level 2.

Column Name

Description

device to mount

The logical block name of the device to be mounted. It can also be a remote resource name for NFS.

device to fsck

The logical raw device name to be subjected to the fsck check during booting. It is not applicable for read-only file systems, such as High Sierra File System (HSFS) and network File systems such as NFS.

Mount point

The mount point directory.

FS type

The type of the file system.

fsck pass

The number used by fsck to decide whether the file system is to be checked.

0— File system is not checked.

1— File system is checked sequentially.

2— File system is checked simultaneously along with other file systems where this field is set to 2.

Mount at boot

The file system to be mounted by the mount all command at boot time is determined by this field. The options are either yes or no.

Mount options

The mount options to be supported by the mount command while the particular file system is mounted.

 

Note the no values in this field for the root, /usr, and /var file systems. These are mounted by default. The fd field refers to the floppy disk and the swap field refers to the tmpfs in the /tmp directory.

A sample vfstab file looks like:

#device         device          mount           FS      fsck    mount   mount
#to mount       to fsck         point           type    pass   at boot options
#
fd      -       /dev/fd fd      -       no      -
/proc   -       /proc   proc    -       no      -
/dev/dsk/c0t0d0s4       -       -       swap    -       no      -
/dev/dsk/c0t0d0s0       /dev/rdsk/c0t0d0s0      /       ufs     1     no
-
/dev/dsk/c0t0d0s6       /dev/rdsk/c0t0d0s6      /usr    ufs     1     no
-
/dev/dsk/c0t0d0s3       /dev/rdsk/c0t0d0s3      /var    ufs     1     no
-
/dev/dsk/c0t0d0s7       /dev/rdsk/c0t0d0s7      /export/home    ufs   2
yes     -
/dev/dsk/c0t0d0s5       /dev/rdsk/c0t0d0s5      /opt    ufs     2     yes
-
/dev/dsk/c0t0d0s1       /dev/rdsk/c0t0d0s1      /usr/openwin    ufs   2 yes  -
swap    -   /tmp    tmpfs   -       yes     -

Finding Information About the Mounted File Systems

The /etc/mnttab file comprises a table that defines which partitions and/or disks are currently mounted by the system.

The /etc/mnttab file contains the following details about each mounted file system:

A sample mnttab file:

/dev/dsk/c0t0d0s0       /       ufs     rw,intr,largefiles,xattr,onerror=panic,s
uid,dev=2200000 1014366934
/dev/dsk/c0t0d0s6       /usr    ufs     rw,intr,largefiles,xattr,onerror=panic,s
uid,dev=2200006 1014366934
/proc   /proc   proc    dev=4300000     1014366933
mnttab  /etc/mnttab     mntfs   dev=43c0000     1014366933
fd      /dev/fd fd      rw,suid,dev=4400000     1014366935
/dev/dsk/c0t0d0s3       /var    ufs     rw,intr,largefiles,xattr,onerror=panic,s
uid,dev=2200003 1014366937
swap    /var/run        tmpfs   xattr,dev=1     1014366937
swap    /tmp    tmpfs   xattr,dev=2     1014366939
/dev/dsk/c0t0d0s5       /opt    ufs     rw,intr,largefiles,xattr,onerror=panic,s
uid,dev=2200005 1014366939
/dev/dsk/c0t0d0s7       /export/home    ufs     rw,intr,largefiles,xattr,onerror
=panic,suid,dev=2200007 1014366939
/dev/dsk/c0t0d0s1       /usr/openwin    ufs     rw,intr,largefiles,xattr,onerror
=panic,suid,dev=2200001 1014366939
-hosts  /net    autofs  indirect,nosuid,ignore,nobrowse,dev=4580001     10143669
44
auto_home       /home   autofs  indirect,ignore,nobrowse,dev=4580002    10143669
44
-xfn    /xfn    autofs  indirect,ignore,dev=4580003     1014366944
sun:vold(pid295)        /vol    nfs     ignore,dev=4540001      1014366950
#

Restricting the File Size

Some applications and processes create temporary files that occupy a lot of hard disk space. As a result, it is necessary to impose a restriction on the size of the files that are created.

Solaris provides tools to control the storage. They are:

ulimit Command

The ulimit command is a built-in shell command, which displays the current file size limit. The default value for the maximum file size, set inside the kernel, is 1500 blocks. The following syntax displays the current limit:

$ ulimit -a
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 256
memory(kbytes) unlimited

If the limit is not set, it reports as unlimited.

The system administrator and the individual users change this value to set the file size at the system level and at the user level, respectively. The following is the syntax of the ulimit command:

ulimit <value>

For example, the following syntax sets the file size limit to 1600 blocks:

# ulimit  1600

# ulimit -a
time(seconds) unlimited
file(blocks) 1600
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 256
memory(kbytes) unlimited
#

The file size can be limited at the system level or the user level. To set it at the system level, change the value of the ulimit variable in the /etc/profile file. To set it at the user level, change the value in the .profile file present in the user's home directory. The user-level setting always takes precedence over the system-level setting. It is the user's profile file that sets the working environment.

Note: The ulimit values set at the user level and system level cannot exceed the default ulimit value set in the kernel.


Top Visited
Switchboard
Latest
Past week
Past month

NEWS CONTENTS

Old News ;-)

[Aug 11, 2007] Blog O’ Matty » Growing Solaris UFS file systems

Blogged by matty as Solaris Storage — matty Sat 29 Jan 2005 12:14 am

I recently needed to grow a Solaris UFS file system, and accomplished this with the growfs(1m) utility. The growfs(1m) utility takes two arguments. The first argument to growfs ( the value passed to “-M” ) is the mount point of the file system to grow. The second argument is the raw device that backs this mount point. The following example will grow “/test” to the maximum size available on the meta device d100:

$ growfs -M /test /dev/md/rdsk/d100

To see how many sectors will be available on d100 after the grow operation completes, you can run newfs with the “-N” option, and compare that with the current value of df (1m):

$ newfs -N /dev/md/dsk/d100
/dev/md/rdsk/d0: 232331520 sectors in 56944 cylinders of 16 tracks, 255 sectors
113443.1MB in 2191 cyl groups (26 c/g, 51.80MB/g, 6400 i/g)

This will report the number of sectors, cylinders and MBs that would be allocated if a new file system was created on meta device d100. As always, test everything on a non critical system prior to making changes to critical boxen.

[Aug 10, 2007] Padraig's Blog Creating a UFS File System on an External Hard Drive with Solaris 10

Recently, I wanted to create a UFS file system on a Maxtor OneTouch II external hard drive I have. I wanted to use the external hard drive for storing some large files and I was going to use the drive exclusively with one of my Solaris systems. Now, I didn't find much information on the web about how to perform this with Solaris (maybe I wasn't searching very well or something) so I thought I would post the procedure I followed here so I'll know how to do it again if I need to.

After plugging the hard drive into my system via one of the USB ports, we can verify that the disk was recognized by the OS by examining the /var/adm/messages file. With the hard drive I was using, I saw entries like the following:
Mar  2 13:10:33 solaris-filer usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7100) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2u
sb0 at bus address 2
Mar  2 13:10:33 solaris-filer usba: [ID 349649 kern.info]       Maxtor OneTouch II L60LHYQG
Mar  2 13:10:33 solaris-filer genunix: [ID 936769 kern.info] scsa2usb0 is /pci@0,0/pci1028,11d@1d,7/storage@3
Mar  2 13:10:33 solaris-filer genunix: [ID 408114 kern.info] /pci@0,0/pci1028,11d@1d,7/storage@3 (scsa2usb0) online
Mar  2 13:10:33 solaris-filer scsi: [ID 193665 kern.info] sd1 at scsa2usb0: target 0 lun 0

The dmesg command could also be used to see similar information. Also, we could use the rmformat command (this lists removable media) to see this information in a much nicer format like so:
 
# rmformat -l
Looking for devices...
     1. Logical Node: /dev/rdsk/c1t0d0p0
        Physical Node: /pci@0,0/pci-ide@1f,1/ide@1/sd@0,0
        Connected Device: QSI      CDRW/DVD SBW242U UD25
        Device Type: DVD Reader
     2. Logical Node: /dev/rdsk/c2t0d0p0
        Physical Node: /pci@0,0/pci1028,11d@1d,7/storage@3/disk@0,0
        Connected Device: Maxtor   OneTouch II      023g
        Device Type: Removable
#
Now that we now the drive has been identified by Solaris (as /dev/rdsk/c2t0d0p0) we need to create one Solaris partition (this is Solaris 10 running on the x86 architecture) that uses the whole disk. This accomplished by passing the -B flag to the fdisk command, like so:
# fdisk -B /dev/rdsk/c2t0d0p0

Now we will print the disk table to standard out like so:
# fdisk -W - /dev/rdsk/c2t0d0p0

This will output the following information to the screen for the hard drive I am using:
* /dev/rdsk/c2t0d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*     63 sectors/track
*    255 tracks/cylinder
*   36483 cylinders
*
* systid:
*    1: DOSOS12
*    2: PCIXOS
*    4: DOSOS16
*    5: EXTDOS
*    6: DOSBIG
*    7: FDISK_IFS
*    8: FDISK_AIXBOOT
*    9: FDISK_AIXDATA
*   10: FDISK_0S2BOOT
*   11: FDISK_WINDOWS
*   12: FDISK_EXT_WIN
*   14: FDISK_FAT95
*   15: FDISK_EXTLBA
*   18: DIAGPART
*   65: FDISK_LINUX
*   82: FDISK_CPM
*   86: DOSDATA
*   98: OTHEROS
*   99: UNIXOS
*  101: FDISK_NOVELL3
*  119: FDISK_QNX4
*  120: FDISK_QNX42
*  121: FDISK_QNX43
*  130: SUNIXOS
*  131: FDISK_LINUXNAT
*  134: FDISK_NTFSVOL1
*  135: FDISK_NTFSVOL2
*  165: FDISK_BSD
*  167: FDISK_NEXTSTEP
*  183: FDISK_BSDIFS
*  184: FDISK_BSDISWAP
*  190: X86BOOT
*  191: SUNIXOS2
*  238: EFI_PMBR
*  239: EFI_FS
*

* Id    Act  Bhead  Bsect  Bcyl    Ehead  Esect  Ecyl    Rsect    Numsect
  191   128  0      1      1       254    63     1023    16065    586083330
We now need to calculate the maximum amount of usable storage. This is done by multiplying bytes/sectors (512 in my case) by the number of sectors listed at the bottom of the output shown above. We then divide this number by 1024*1024 to yield MBs.

So in my case, this will work out as 286173.5009765625 MB.

Now, we need to setup a partition table file. This will be a regular text file and you can name it whatever you like. For the sake of this post, I will name it disk_slices.txt. The contents of this file are:
 
slices: 0 = 2MB, 286170MB, "wm", "root" :
        1 = 0, 1MB, "wu", "boot" :
        2 = 0, 286172MB, "wm", "backup"
To create these slices on the disk, we run:
# rmformat -s disk_slices.txt /dev/rdsk/c2t0d0p0
# devfsadm
# devfsadm -C

To create the UFS file system on the newly created slice, I run the following and the output from running this command is also shown:
# newfs /dev/rdsk/c2t0d0s0
newfs: construct a new file system /dev/rdsk/c2t0d0s0: (y/n)? y
/dev/rdsk/c2t0d0s0:     586076160 sectors in 95390 cylinders of 48 tracks, 128 sectors
        286170.0MB in 5962 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
Initializing cylinder groups:
...............................................................................
........................................
super-block backups for last 10 cylinder groups at:
 585105440, 585203872, 585302304, 585400736, 585499168, 585597600, 585696032,
 585794464, 585892896, 585991328
# 

And now I'm finished, I now have a UFS file system created on my USB hard drive which can be mounted by my Solaris system. To mount this file system, I can just:
# mount -F ufs /dev/rdsk/c2t0d0p0 /u01

[Aug 9, 2007] Alan Hargreaves' Weblog Weblog

I should add is that anyone who tries to mount an unknown ufs filesystem without at least running "fsck -n" over it probably deserves what they get.

[Jun 19, 2006] Tuning UFS in the Solaris OS for Use With Many Small Files

This methodology utilizes a tmpfs volume, and it can speed up operations approximately three times.

This document describes a methodology for configuring a fast file system that handles several small files on the Solaris Operating System. This could be used for building a Java technology-based product or for handling many operations on a large amount of small files. This methodology utilizes a tmpfs volume, and it can speed up operations approximately three times.

The requirements are as follows:

Warning: Do not develop on a tmpfs volume. A tmpfs volume is only persistent while the system is powered up, so a power loss or system problem will cause you to lose any changes to that volume.

Procedure

Solaris tmpfs volumes are easy to create, but require a significant amount of RAM and swap space. It is recommended that you have at least 1 Gbyte of RAM, but there have also been major performance gains on systems with 512 Mbytes of RAM. In addition, you should add twice as much swap space as the tmpfs volume you are creating. That is, for a 2-Gbyte tmpfs volume, add 4 Gbytes of swap space to the system. Feel free to experiment with these values.

The following examples are for a 2-Gbyte tmpfs volume, which is approximately what is needed to do a developer build. Replace <swapfilename> with the absolute path to a swapfile (such as /disk1/swapfile), and <mountpoint> with the absolute path to where you want the tmpfs volume mounted (such as /ramdisk).

Add swap space to your workstation:

root# /usr/sbin/mkfile 2000m <swapfilename>

Create a mount point for the tmpfs volume:

root# mkdir <mountpoint>

Edit your /etc/vfstab file to use the swap and create the tmpfs volume at boot time. Add the following two lines:

<swapfilename> - - swap - no -
RAMDISK - <mountpoint> tmpfs - yes size=2000m

Note that on the Solaris 7 OS you may not make a single tmpfs volume larger than 2 Gbytes.

Edit your kernel parameters to increase the number of files you can create in the tmpfs volume. Add the following line to your /etc/system file. (We've had the most success using this value.)

set tmpfs:tmpfs_maxkmem=250000000

Reboot your workstation. Then verify that the tmpfs volume exists at the size you specified:

% df -k <mountpoint>

Make the tmpfs volume writable. Note: This step is necessary after each reboot of the workstation.

root# chmod 777 <mountpoint>

UFS and the Solaris VM Subsystem

http://www.google.com/search?q=Structure+of+UFS+filesystem&hl=en&rlz=1T4GZHZ_enUS227US227&start=20&sa=N
 Wednesday Jun 01, 2005

More UFS technical tidbits in anticipation of OpenSolaris. Today's talk is about UFS I/O. It is a complicated beast and has many different parts and paths it can take.

Overview of file system I/O in Solaris:




The interaction of UFS and the VM subsystem has been the cause of numerous bugs, and hard to find problems. Today's blog is an overview of the UFS I/O, with particular attention paid to the VM subsystem interaction. Details on the paths taken when a read() system call is initiated are to show the interaction of UFS and the VM subsystem. I am making some assumptions here that the readers of this blog will have some basic Solaris file system knowledge, or at a minimum some of the basic Solaris file system terminology is understood.

Basic Solaris VM facts

Solaris virtual memory is demand paged, and globally managed. There is integrated file caching and it is layered to allow VM to describe multiple memory types. The paging vnode cache is the unification of file and memory management by use of a vnode object. 1 page of memory == <vnode, offset> tuple. The UFS file system uses this relationship to implement caching for vnodes. The paging vnode cache provides a set of functions for cache management and I/O for vnodes.

The paging vnode cache functions are specified with a pvn_ <xxx> title. The source code for this is located at: xxxx. Some of the more important paging vnode functions are listed below, with basic function descriptions. Also shown is pointers to the code so you can get more detailed data about each of these.

Some important paging vnode cache functions:

pvn_read_kluster():

pvn_write_kluster():

pvn_vplist_dirty():


What is a seg_map and why do you care?

The seg_map segment maintains mappings of pieces of files into kernel address space. It is only used by file systems and it allows copying of data to or from user to kernel address space. At any given time, seg_map segment has some portion of total file system cache mapped in to the kernel address space. The seg_map segment driver divides the segment in to file system block sized slots.

Some important seg_map functions:

segmap_getmap() && segmap_getmapflt():

segmap_release():

segmap_pagecreate():

Important in the mapping and getting data from the segmap driver is the fbuf structure. It is defined as follows:

struct fbuf {

caddr_t fb_addr;

u int_t fb_count;

};

This structure is used to get a mapping to part of a file via the segkmap interfaces. It is also used by the pseudo bio functions(shown below) for reading and writing of data. fbuf is used by directory reading to get on UFS on disk contents via a call to blkatoff().

seg_vn and UFS and memory mapped I/O:

Memory mapping allows for a file to be mapped in the a processes address space. This mapping is done via the VOP_MAP call and the seg_vn memory driver. File pages are read when a fault occurs in the address space. The seg_vn driver enables I/O's without process initiated system calls. I/O is performed ,,in units of pages, upon reference to the pages mapped into the address space. Reads are initiated by a memory access, writes are initiated as the VM subsystem finds dirty pages in the mapped address space.

So, why not use the seg_vn driver for non mmap'd I/O as well.? It could be used for mapping the file in to the kernel's address space, but seg_vn is a complex segment driver that manages the mapping of protections, copy-on-write fault handling, shared memory, etc...This is too heavy weight for what is needed for read and write system calls, so the seg_map driver was developed. Read and write system calls only require a few basic mapping functions since they do not map files into a process's address space. seg_map reduces locking complexity and gives better performance.

Pseudo bio functions:

Solaris has a set of interfaces which are considered buffered I/O interfaces, but that are used to read and write buffers containing directory entries only. These interfaces all use the seg_map driver for mapping to address file data. The functions are fbread(), fbwrite(), fbrelese(), fbdwrite(), fbiwrite(), fbzero(). Although these are not directly shown in the picture above, they are important enough to be worth mentioning.

A UFS/VM example, read() system call - non mmap'd:

Note: In general UFS caches the pages for write, but will also cache pages for reads if they are frequently reusable.

read()->ufs_read()->rdip():

Technorati Tag: Solaris


=====================================================================================

1UFS directio will be saved for a later post.

2freebehind is always set to 1.

3The i_contents lock is a krwlock_t which is part of the ufs inode data structure. It protects most of the inodes contents. See my previous blog posting on UFS locking for more details.

4See my previous blog post regarding the use of maxcontig in UFS

5seqmode is determined from the i_nextr field in the current working inode. i_nextr represents the next byte offset for reads. If i_nextr == current offset and we are not creating a page, then we set seqmode == 1.

 

Recommended Links

Softpanorama hot topic of the month

Softpanorama Recommended

Internal

External

The history of "Solaris UFS bug"



Etc

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of environmental, political, human rights, economic, democracy, scientific, and social justice issues, etc. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit exclusivly for research and educational purposes.   If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner. 

ABUSE: IPs or network segments from which we detect a stream of probes might be blocked for no less then 90 days. Multiple types of probes increase this period.  

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism  : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda  : SE quotes : Language Design and Programming Quotes : Random IT-related quotesSomerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose BierceBernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 :  Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method  : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds  : Larry Wall  : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOSProgramming Languages History : PL/1 : Simula 67 : C : History of GCC developmentScripting Languages : Perl history   : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-MonthHow to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least


Copyright © 1996-2016 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License.

The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting development of this site and speed up access. In case softpanorama.org is down you can use the at softpanorama.info

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: February 19, 2014