Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)



Ext2/Ext3 File System


Designed for educational purposes, the original Linux file system was limited to 64 MB in size and supported file names up to 14 characters. In 1992, the ext file system was created, raising the file system size limit to 2 GB and the file name length to 255 characters. However, file access, modification, and creation times were missing from its data structures, and performance tended to be low. Modeled after the Berkeley Fast File System, the ext2 file system used a better on-disk layout, extended the file system size limit to 4 TB and file name sizes to 255 bytes, delivered improved performance, and emerged as the de facto standard file system for Linux environments.

An evolution of the ext2 file system, the ext3 file system added logging (journaling) capabilities to facilitate fast reboots following system crashes. More information on the logging capabilities of the ext3 file system can be found in EXT3, Journaling File System by Dr. Stephen Tweedie (http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html). Key features of the ext3 file system include:

Logging in the ext3 File System

The ext3 file system supports different levels of journaling, which can be specified as mount options. These options can affect data integrity and performance.

Here is the ext3.txt document that comes with the kernel:

Ext3 Filesystem
===============

Ext3 was originally released in September 1999. Written by Stephen Tweedie
for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.

Ext3 is the ext2 filesystem enhanced with journalling capabilities.

Options
=======

When mounting an ext3 filesystem, the following options are accepted:
(*) == default

journal=update          Update the ext3 file system's journal to the current
                        format.

journal=inum            When a journal already exists, this option is ignored.
                        Otherwise, it specifies the number of the inode which
                        will represent the ext3 file system's journal file.

journal_dev=devnum      When the external journal device's major/minor numbers
                        have changed, this option allows the user to specify
                        the new journal location.  The journal device is
                        identified through its new major/minor numbers encoded
                        in devnum.

noload                  Don't load the journal on mounting.

data=journal            All data are committed into the journal prior to being
                        written into the main file system.

data=ordered    (*)     All data are forced directly out to the main file
                        system prior to its metadata being committed to the
                        journal.

data=writeback          Data ordering is not preserved, data may be written
                        into the main file system after its metadata has been
                        committed to the journal.

commit=nrsec    (*)     Ext3 can be told to sync all its data and metadata
                        every 'nrsec' seconds. The default value is 5 seconds.
                        This means that if you lose your power, you will lose
                        as much as the latest 5 seconds of work (your
                        filesystem will not be damaged though, thanks to the
                        journaling).  This default value (or any low value)
                        will hurt performance, but it's good for data-safety.
                        Setting it to 0 will have the same effect as leaving
                        it at the default (5 seconds).
                        Setting it to very large values will improve
                        performance.

barrier=1               This enables/disables barriers.  barrier=0 disables
                        it, barrier=1 enables it.

orlov           (*)     This enables the new Orlov block allocator. It is
                        enabled by default.

oldalloc                This disables the Orlov block allocator and enables
                        the old block allocator.  Orlov should have better
                        performance - we'd like to get some feedback if it's
                        the contrary for you.

user_xattr              Enables Extended User Attributes.  Additionally, you
                        need to have extended attribute support enabled in the
                        kernel configuration (CONFIG_EXT3_FS_XATTR).  See the
                        attr(5) manual page and http://acl.bestbits.at/ to
                        learn more about extended attributes.

nouser_xattr            Disables Extended User Attributes.

acl                     Enables POSIX Access Control Lists support.
                        Additionally, you need to have ACL support enabled in
                        the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL).
                        See the acl(5) manual page and http://acl.bestbits.at/
                        for more information.

noacl                   This option disables POSIX Access Control List
                        support.

reservation

noreservation

bsddf           (*)     Make 'df' act like BSD.
minixdf                 Make 'df' act like Minix.

check=none              Don't do extra checking of bitmaps on mount.
nocheck

debug                   Extra debugging information is sent to syslog.

errors=remount-ro (*)   Remount the filesystem read-only on an error.
errors=continue         Keep going on a filesystem error.
errors=panic            Panic and halt the machine if an error occurs.

grpid                   New objects get the group ID of the directory in
bsdgroups               which they are created.

nogrpid         (*)     New objects have the group ID of their creator.
sysvgroups

resgid=n                The group ID which may use the reserved blocks.

resuid=n                The user ID which may use the reserved blocks.

sb=n                    Use alternate superblock at this location.

quota
noquota
grpquota
usrquota

bh              (*)     ext3 associates buffer heads to data pages to
nobh                    (a) cache disk block mapping information
                        (b) link pages into transaction to provide
                            ordering guarantees.
                        "bh" option forces use of buffer heads.
                        "nobh" option tries to avoid associating buffer
                        heads (supported only for "writeback" mode).


Specification
=============
Ext3 shares all disk implementation with the ext2 filesystem, and adds
transaction capabilities to ext2.  Journaling is done by the Journaling Block
Device layer.

Journaling Block Device layer
-----------------------------
The Journaling Block Device layer (JBD) isn't ext3 specific.  It was designed
to add journaling capabilities to a block device.  The ext3 filesystem code
will inform the JBD of modifications it is performing (called a transaction).
The journal supports starting and stopping transactions, and in case of a crash,
the journal can replay the transactions to quickly put the partition back into
a consistent state.

Handles represent a single atomic update to a filesystem.  JBD can handle an
external journal on a block device.
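The transaction-and-replay idea can be illustrated with a toy write-ahead journal. The Python sketch below is a simplification assuming an in-memory "disk" (a dict mapping block numbers to contents); it does not model JBD's on-disk format, descriptor blocks, or checkpointing, and the class and method names (`ToyJournal`, `start`, `log_block`, `stop`, `replay`) are hypothetical. Changed blocks are logged first, a commit record marks the transaction complete, and replay applies only committed transactions, so an update interrupted by a crash leaves no partial changes behind.

```python
from dataclasses import dataclass, field


@dataclass
class ToyJournal:
    """Hypothetical write-ahead journal: log blocks first, then a commit record."""
    log: list = field(default_factory=list)    # sequential journal records
    disk: dict = field(default_factory=dict)   # block number -> contents
    _next_tid: int = 1

    def start(self) -> int:
        """Begin a transaction and return its transaction ID."""
        tid = self._next_tid
        self._next_tid += 1
        return tid

    def log_block(self, tid: int, block: int, data: bytes) -> None:
        """Record an updated block in the journal (not yet on the main disk)."""
        self.log.append(("data", tid, block, data))

    def stop(self, tid: int) -> None:
        """Write the commit record; only now does the transaction count."""
        self.log.append(("commit", tid, None, None))

    def replay(self) -> None:
        """After a crash, copy blocks of committed transactions to the disk."""
        committed = {tid for kind, tid, _, _ in self.log if kind == "commit"}
        for kind, tid, block, data in self.log:
            if kind == "data" and tid in committed:
                self.disk[block] = data
        self.log.clear()


j = ToyJournal()
t1 = j.start()
j.log_block(t1, 10, b"inode")
j.log_block(t1, 11, b"bitmap")
j.stop(t1)
t2 = j.start()
j.log_block(t2, 12, b"dirent")   # simulated crash: stop(t2) never happens
j.replay()
print(j.disk)                    # {10: b'inode', 11: b'bitmap'}
```

The key design point is the commit record: until it is written, a transaction does not exist as far as recovery is concerned.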

Data Mode
---------
There are 3 different data modes:

* writeback mode
In data=writeback mode, ext3 does not journal data at all.  This mode provides
a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
mode - metadata journaling.  A crash+recovery can cause incorrect data to
appear in files which were written shortly before the crash.  This mode will
typically provide the best ext3 performance.

* ordered mode
In data=ordered mode, ext3 only officially journals metadata, but it logically
groups metadata and data blocks into a single unit called a transaction.  When
it's time to write the new metadata out to disk, the associated data blocks
are written first.  In general, this mode performs slightly slower than
writeback but significantly faster than journal mode.

* journal mode
data=journal mode provides full data and metadata journaling.  All new data is
written to the journal first, and then to its final location.
In the event of a crash, the journal can be replayed, bringing both data and
metadata into a consistent state.  This mode is the slowest, except when data
needs to be read from and written to disk at the same time, in which case it
outperforms all other modes.
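The practical difference between the three modes is the order in which data, metadata, and journal writes reach the disk. The Python sketch below summarizes that order for a single update as described above; it is a simplification assuming one data block and one metadata block, and the step labels are illustrative, not kernel output.

```python
def write_order(mode: str) -> list[str]:
    """Simplified order of disk writes for one update (data + metadata)."""
    if mode == "writeback":
        # Only metadata is journaled; the data block is not ordered
        # relative to the commit and may reach disk after it.
        return ["journal metadata", "commit",
                "write data home (unordered, may follow the commit)",
                "write metadata home"]
    if mode == "ordered":
        # Data is forced to its final location before the metadata commits.
        return ["write data home", "journal metadata", "commit",
                "write metadata home"]
    if mode == "journal":
        # Data and metadata are both logged, then copied to their homes,
        # so data is written twice.
        return ["journal data", "journal metadata", "commit",
                "write data home", "write metadata home"]
    raise ValueError(f"unknown data mode: {mode!r}")


for m in ("writeback", "ordered", "journal"):
    print(f"data={m}: " + " -> ".join(write_order(m)))
```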

Compatibility
-------------

Ext2 partitions can easily be converted to ext3 with `tune2fs -j <dev>`.
Ext3 is fully compatible with Ext2.  Ext3 partitions can easily be mounted as
Ext2.


External Tools
==============
See manual pages to learn more.

tune2fs: create an ext3 journal on an ext2 partition with the -j flag.
mke2fs: create an ext3 partition with the -j flag.
debugfs: ext2 and ext3 file system debugger.
ext2online: online (mounted) ext2 and ext3 filesystem resizer


References
==========

kernel source: <file:fs/ext3/>
<file:fs/jbd/>

programs: http://e2fsprogs.sourceforge.net/
http://ext2resize.sourceforge.net

useful links: http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html
http://www-106.ibm.com/developerworks/linux/library/l-fs7/
http://www-106.ibm.com/developerworks/linux/library/l-fs8/
 

Old News ;-)

Anatomy of Linux journaling file systems by M. Tim Jones

04 Jun 2008

You can define journaling file systems in many ways, but let's get right to the point. Journaling file systems are for people who tire of watching the boot-time fsck, or file system consistency check process. (Journaling file systems are also for anyone who likes the idea of a fault-resilient file system.) When a system using a traditional, non-journaling file system is improperly shut down, the operating system detects this and performs a consistency check using the fsck utility. This utility scans the file system (which can take a considerable amount of time) and fixes any issues that can be safely corrected. In some cases, the file system can be in such bad shape that the operating system boots into single user mode to allow the user to further the repair process.


To add insult to injury, the fsck process can be initiated automatically by the operating system at mount time to ensure that the file system metadata is correct (even if no corruption is detected). Therefore, removing the need for file system consistency checks is an obvious area for improvement.

So, now you know for whom journaling file systems were created, but how do they obviate the need for fsck? In general, journaling file systems avoid file system corruption by maintaining a journal. The journal is a special file that logs the changes destined for the file system in a circular buffer. At periodic intervals, the journal is committed to the file system. If a crash occurs, the journal can be used as a checkpoint to recover unsaved information and avoid corrupting file system metadata.
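A circular buffer is what lets a fixed-size journal be reused indefinitely: new records are written at the head, and space at the tail is freed once its changes have been committed (checkpointed) to the file system proper. The Python sketch below assumes fixed-size records and a hypothetical `checkpoint()` step that takes a callback; real journals such as ext3's JBD work on disk blocks with their own on-disk format.

```python
class CircularJournal:
    """Hypothetical fixed-size circular log of pending file system changes."""

    def __init__(self, capacity: int) -> None:
        self.slots = [None] * capacity
        self.head = 0   # next slot to write
        self.tail = 0   # oldest record not yet checkpointed
        self.used = 0

    def append(self, record: str) -> None:
        """Log a change at the head, wrapping around at the end of the buffer."""
        if self.used == len(self.slots):
            raise RuntimeError("journal full: checkpoint before logging more")
        self.slots[self.head] = record
        self.head = (self.head + 1) % len(self.slots)
        self.used += 1

    def checkpoint(self, apply) -> None:
        """Apply logged changes to the file system, oldest first, freeing slots."""
        while self.used:
            apply(self.slots[self.tail])
            self.slots[self.tail] = None
            self.tail = (self.tail + 1) % len(self.slots)
            self.used -= 1

    def pending(self) -> list:
        """Records still in the journal, oldest first (what replay would use)."""
        n = len(self.slots)
        return [self.slots[(self.tail + i) % n] for i in range(self.used)]


j = CircularJournal(3)
for rec in ("a", "b", "c"):
    j.append(rec)
applied = []
j.checkpoint(applied.append)
j.append("d")                       # reuses slot 0 after wrapping
print(applied, j.pending(), j.head)  # ['a', 'b', 'c'] ['d'] 1
```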

To sum up, journaling file systems are fault-resilient file systems that use a journal to log changes before they're committed to the file system to avoid metadata corruption (see Figure 1). But like many Linux solutions, more than one option is available to you. Let's take a short walk through journaling file system history, and then review the file systems available and how they differ.

... ... ...

Fourth extended file system

The fourth extended journaling file system (ext4fs) is the evolution of ext3fs. The ext4 file system is designed as a backward- and forward-compatible replacement for ext3fs, but with many new advanced features (some of which break compatibility). This means that you can mount an ext4fs partition as ext3fs or vice versa.

First, ext4fs is a 64-bit file system and is designed to support very large volumes (1 exabyte). It has also been designed to use extents, but if extents are used, compatibility with ext3fs is lost. Like XFS and Reiser4, ext4fs includes delayed allocation to allocate blocks on the disk only when needed (which reduces fragmentation). The contents of the journal are also checksummed to make the journal more reliable. Instead of the standard B+ or B* tree, ext4fs uses a variation of the B tree, called the H tree, which allows many more subdirectories (ext3 was limited to 32,000 subdirectories per directory).
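To see why extents help, the Python sketch below compares a per-block map (one pointer per block, as in ext2/ext3 block mapping) with a list of extents, each a (logical start, physical start, length) triple. It assumes a simplified model: on-disk structure sizes, the indirect-block tree, and ext4's real limits (such as a maximum length per extent) are ignored, and the helpers `to_extents` and `lookup` are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int, int]  # (logical_start, physical_start, length)


def to_extents(block_map: List[int]) -> List[Extent]:
    """Collapse a per-block map (logical index -> physical block) into extents."""
    extents: List[Extent] = []
    for logical, physical in enumerate(block_map):
        if extents:
            lstart, pstart, length = extents[-1]
            # Grow the current extent if this block continues it on disk.
            if logical == lstart + length and physical == pstart + length:
                extents[-1] = (lstart, pstart, length + 1)
                continue
        extents.append((logical, physical, 1))
    return extents


def lookup(extents: List[Extent], logical: int) -> int:
    """Translate a logical block number into a physical one."""
    for lstart, pstart, length in extents:
        if lstart <= logical < lstart + length:
            return pstart + (logical - lstart)
    raise KeyError(logical)


# A 1000-block file stored in two contiguous runs on disk.
block_map = list(range(5000, 5600)) + list(range(9000, 9400))
extents = to_extents(block_map)
print(len(block_map), "block pointers vs", len(extents), "extents")
print(extents)               # [(0, 5000, 600), (600, 9000, 400)]
print(lookup(extents, 700))  # 9100
```

The lookup is a linear scan for brevity; ext4 keeps extents in a small tree so that large, fragmented files are still resolved quickly.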

Although the delayed allocation method reduces fragmentation, over time, a large file system can become fragmented. An online defragmentation tool (e4defrag) has been developed to address this. You can use the tool to defragment individual files or an entire file system.

Another interesting difference between ext3fs and ext4fs is the date resolution for files. In ext3, the minimum resolution for timestamp was one second. Ext4fs is looking toward the future: Where processor and interface speeds continue to increase, better resolution is needed. For this reason, the minimum timestamp resolution in ext4 is 1 nanosecond.

Ext4fs has been in the Linux kernel since 2.6.19 but is yet to be called stable. Development continues on this next generation; given its heritage, it will be the next generation in Linux journaling file systems.


Linux Today - Linux File Systems You Get What You Pay For

To the first few replies to this article, have you ever had to build a multi-GB/s filesystem that can handle arbitrary workloads and stay up at least 99.9% of the time? Henry has. I have complaints about his article, but he brings up good points:

  1. Other filesystems besides ext3 and XFS aren't supported or tested as well as necessary for the types of loads he is describing (and ZFS/FUSE is alpha code). XFS is very good, but the biggest Linux vendor (Red Hat) doesn't even support it.
  2. Yes, you can fix Linux code yourself, but filesystems are hard. You can't expect some random code jockey to pick up the kernel source and understand filesystems. What about the XFS+NFS bug that existed in Linux around 2004 (http://www.linux.sgi.com/archives/xfs/2004-06/msg00100.html)? It took over a year for SGI to fix the problem. Open source worked, because the original patch came from a guy at Sony in Japan, but for a year there was silent corruption on any XFS filesystem that was exported via NFS.

He says that an LTO-4 tape drive can push 240 MB/s. Now put 20 of those drives in your system (4.8 GB/s). Now design your filesystem with enough extra capacity that you can still interact with it while the tape drives are banging away (4.8 x 2 = 9.6 GB/s). This is much more than your home software-RAID setup storing your MP3s and pictures.

This is high-performance I/O. This is a very common setup at many of the HPC centers around the world (except they may not be using LTO drives, but enterprise drives). Expecting Linux to push data at that rate is a stretch.

However, it is getting better. I would still rather use Linux than bring in one Solaris server that I would have to hire or retrain staff for, because we migrated to Linux a decade ago.

[May 9, 2008] Linux File Systems You Get What You Pay For

On mid-intensity enterprise loads Ext3 is OK.  Not sure about databases...

I am frequently asked by potential customers with high I/O requirements if they can use Linux instead of AIX or Solaris.

No one ever asks me about high-performance I/O — high IOPS or high streaming I/O — on Windows or NTFS because it isn't possible. Windows and the NTFS file system, which hasn't changed much since it was released almost 10 years ago, can't scale given its current structure. The NTFS file system layout, allocation methodology and structure do not allow it to efficiently support multi-terabyte file systems, much less file systems in the petabyte range, and that's no surprise since it's not Microsoft's target market.

And what was Linux's initial target market? A Microsoft desktop replacement, of course. Linux has since moved from the desktop to run on many large SMP servers from Sun, IBM and SGI. But can Linux as an operating system and Linux file systems meet the challenge of high-performance I/O?

You may think you don't need high-performance I/O, but every server needs this type of I/O performance for something as simple as backup and restoration. Current LTO-4 tape drives can operate at 120 MB/sec without compression and can support data rates up to 240 MB/sec with compression. If your file system cannot support I/O at these streaming data rates, backup and restore will take much longer than expected. For large environments with multiple tape drives, not being able to use the drives at their full data rate might require additional tape drives to meet the backup window, and it slows restores as well. Therefore, it seems to me that everyone should be interested in the performance of Linux file systems, if only for backup and restore.
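The backup-window argument is simple arithmetic. The Python sketch below assumes decimal units (1 TB = 10^6 MB) and a hypothetical 100 TB data set with an 8-hour window (neither figure is from the article); it uses the LTO-4 rates quoted above and assumes the file system can actually feed every drive at full speed, which is exactly the assumption the author questions.

```python
import math


def backup_hours(data_tb: float, drives: int, mb_per_s: float) -> float:
    """Hours to stream data_tb terabytes across `drives` drives in parallel."""
    total_mb = data_tb * 1_000_000          # decimal units: 1 TB = 10**6 MB
    return total_mb / (drives * mb_per_s) / 3600


def drives_needed(data_tb: float, window_h: float, mb_per_s: float) -> int:
    """Minimum number of drives to finish within window_h hours."""
    total_mb = data_tb * 1_000_000
    return math.ceil(total_mb / (mb_per_s * window_h * 3600))


# Hypothetical example: 100 TB to back up within an 8-hour window.
for rate in (120, 240):  # LTO-4 native and compressed rates, MB/s
    print(f"{rate} MB/s: {drives_needed(100, 8, rate)} drives for 8 h, "
          f"{backup_hours(100, 10, rate):.1f} h with 10 drives")
```

If the file system only sustains half the drive rate, the effective `mb_per_s` halves and the number of drives needed roughly doubles.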

Can Linux file systems, which I will define as ext-4, XFS and xxx, match the performance of file systems on other UNIX-based large SMP servers such as IBM and Sun? Some might also inquire about SGI, but SGI has something called ProPack, which has a number of optimizations to Linux for high-speed I/O, and SGI also has their open proprietary Linux file system called CxFS, which is not part of standard Linux distributions. Because SGI ProPack and CxFS are not part of standard Linux distributions, we won't consider them here. We'll stick to standard Linux because that is what most people use.

We'll focus on two areas:

  1. Linux as an operating system, and
  2. Linux file systems.

Linux Operating System Issues

We'll set aside what might happen with Linux in the future and instead focus on what is available today. Linux has a number of features that match the I/O performance of AIX and Solaris, such as direct I/O, but the bottom line is that Linux wasn't designed around high-performance multi-threaded I/O.

There are a number of areas that limit performance in Linux, such as page size compared with other operating systems, the restrictions Linux places on direct I/O and page alignment, and the fact that Linux does not allow direct I/O automatically by request size — I have seen Linux kernels break large (greater than 512 KB) I/O requests into 128 KB requests. Since the Linux I/O performance and file system were designed for a desktop replacement for Windows, none of this comes as much of a surprise.

Linux has other issues, as I see it; for starters, the lack of someone to take charge or responsibility. With Linux, if you find a problem, groups of people are going to have to agree to fix it, and the people writing Linux might not necessarily be responsive to the problems you're facing. If a large vendor of Linux agrees with your problem and provides a fix, that doesn't mean it will be accepted — or accepted anytime soon — by the Linux community. And getting a patch for your problem could pose maintenance problems.

The goals for Linux file systems and the Linux kernel design seem to be trying to address a completely different set of problems than AIX or Solaris, and IBM and Sun are far more directly responsible than the Linux community if you have a problem. If you run AIX or Solaris and complain to IBM or Sun, they can't say we have no control.

Linux File Systems

Remember that most Linux file systems were designed around replacing NTFS, not around competing with high-performance file systems such as GPFS (IBM), StorNext (Quantum) or QFS (Sun). Those file systems were designed for streaming I/O, which we now know is important for everyone, for some high-IOPS workloads, and in some cases for database access.

The Linux file systems that are commonly used today (ext-3 today, and likely soon ext-4 and xfs) have not had huge structural changes in a long time. Ext-4 improves on ext-3 and ext-2 with some better allocation, but simple things like aligning the superblock and the first metadata allocation to the RAID stripe are not considered.

Additionally, things like alignment of additional file system metadata regions to the RAID stripe are not considered, nor are simple things like indirect allocations (see File Systems and Volume Managers: History and Usage). These are fixed in size, and because the supported allocations are small (4 KB maximum), large numbers of them are required. Take a 200 TB file system, which will require 53.7 billion allocations to represent the 200 TB using the largest allocation size of 4 KB supported by ext-3. Using 8 MB, which is feasible on enterprise file systems, it becomes a manageable 26.2 million allocations. The bitmap or allocation map could even fit in memory for this number of allocations! The xfs file system has very similar characteristics to ext-3. Yes, allocations can be larger, up to 64 KB if the Linux page size is 64 KB, but the alignment issues for the superblock, metadata regions and other issues still exist.
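The allocation counts above are easy to verify; they match exactly if TB and MB are read as binary units (TiB and MiB). The Python check below is an illustration, not part of the original article. It also shows how large a bitmap with one bit per allocation would be in each case, which supports the point about the allocation map fitting in memory.

```python
TIB = 2 ** 40
MIB = 2 ** 20
KIB = 2 ** 10

fs_size = 200 * TIB
for label, alloc in (("4 KB", 4 * KIB), ("8 MB", 8 * MIB)):
    count = fs_size // alloc           # number of allocation units
    bitmap_mib = count / 8 / MIB       # one bit per allocation unit
    print(f"{label}: {count:,} allocations, bitmap {bitmap_mib:,.3f} MiB")

# 4 KB: 53,687,091,200 allocations, bitmap 6,400.000 MiB
# 8 MB: 26,214,400 allocations, bitmap 3.125 MiB
```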

Linux Has Its Place

That's not to say I am anti-Linux, just as I am not pro-AIX or pro-Solaris. I am not even anti-Windows, since I use a Windows laptop as my main computer. But I do believe that the default Linux file systems are not yet up to the task of replacing the high-performance, highly scalable SMP file systems. Computers are tools, and operating systems and file systems are also tools in the toolbox. No one uses a chainsaw in place of a jigsaw, and the same analogy can be used for operating systems, file systems and the hardware they run on.

Many of the people I deal with daily use MS Word, MS Excel, MS PowerPoint and MS Visio. I could run some if not all of these applications on a Windows emulator from someone, but I routinely get incompatibilities with fonts, and I just decided long ago to live with Windows until someone can prove to me that it all works together with no problems. My point here is that every computer is a tool and has its use. Currently no single computer or file system can meet all application requirements. This should not come as a surprise. Linux has a place, but as far as I can tell, that place does not support single instances of large file systems and scaling well from large to small file systems with high-performance requirements. And I don't see this changing anytime soon.

Henry Newman, a regular Enterprise Storage Forum contributor, is an industry consultant with 27 years experience in high-performance computing and storage.

Migrating to ext4

This is still work in progress... See the main ext4 wiki for additional information and links.
Table 1. Current and upcoming features of ext4 that provide advantages over ext3
 
Feature: Advantage

* Larger file systems: Ext3 tops out at 32-tebibyte (TiB) file systems and 2-TiB files, but practical limits may be lower than this depending on your architecture and system settings, perhaps as low as 2-TiB file systems and 16-gibibyte (GiB) files. Ext4, by contrast, permits file systems of up to 1024 pebibytes (PiB), or 1 exbibyte (EiB), and files of up to 16 TiB. This may not be important (yet!) for the average desktop computer or server, but it is important to users with large disk arrays.
* Extents: An extent is a way to improve the efficiency of on-disk file descriptors, reducing deletion times for large files, among other things.
* Persistent preallocation: If an application needs to allocate disk space before actually using it, most file systems do so by writing 0s to the not-yet-used disk space. Ext4 permits preallocation without doing this, which can improve the performance of some database and multimedia tools.
* Delayed allocation: Ext4 can delay allocating disk space until the last moment, which can improve performance.
* More subdirectories: If you've ever felt constrained by the fact that a directory can only hold 32,000 subdirectories in ext3, you'll be relieved to know that this limit has been eliminated in ext4.
* Journal checksums: Ext4 adds a checksum to the journal data, which improves reliability and performance.
* Online defragmentation: Although ext3 isn't prone to excessive fragmentation, files stored on it are likely to become at least a little fragmented. Ext4 adds support for online defragmentation, which should improve overall performance.
* Undelete: Although it hasn't been implemented yet, ext4 may support undelete, which, of course, is handy whenever somebody accidentally deletes a file.
* Faster file system checks: Ext4 adds data structures that permit fsck to skip unused parts of the disk in its checks, thus speeding up file system checks.
* Nanosecond timestamps: Most file systems, including ext3, include timestamp data that is accurate to a second. Ext4 extends the accuracy of this data to a nanosecond. Some sources also indicate that the ext4 timestamps support dates through April 25, 2514, versus January 18, 2038, for ext3.
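Applications request persistent preallocation through the fallocate family of calls. The minimal Python sketch below uses the standard library's `os.posix_fallocate` (available on Linux and other Unix systems, not on Windows); the file name and the 16 MiB size are arbitrary. On file systems with native support, such as ext4, the space is reserved without writing zeros; where the file system lacks support, the C library may emulate the call by writing the blocks out, which is the slow path the table describes.

```python
import os
import tempfile


def preallocate(path: str, size: int) -> int:
    """Reserve `size` bytes for `path` and return the resulting file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Allocate blocks for the byte range [0, size); the file grows to size.
        os.posix_fallocate(fd, 0, size)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "prealloc.dat")      # arbitrary example name
    size = preallocate(target, 16 * 1024 * 1024)  # reserve 16 MiB
    allocated = os.stat(target).st_blocks * 512   # st_blocks: 512-byte units
    print(f"size={size} bytes, allocated on disk={allocated} bytes")
```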

Gentoo Forums: Some ext3 Filesystem Tips

Copyright (c) 2005 Peter Gordon

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license can be found here.

Overview
I'm a big fan of the Third Extended ("ext3") filesystem. Its in-kernel and userspace code has been tried, tested, fixed, and improved upon more than almost every other Linux-compatible filesystem. It's simple, robust, and extensible. In this article I intend to explain some tips that can improve both the performance and the reliability of the filesystem.

In the document, /dev/hdXY will be used as a generic partition. You should replace this with the actual device node for your partition, such as /dev/hdb1 for the first partition of the primary slave disk or /dev/sda2 for the second partition of your first SCSI or Serial ATA disk.

I: Using The tune2fs and e2fsck Utilities

Before we begin, we need to make sure you are comfortable with using the tune2fs utility to alter the filesystem options of an ext2 or ext3 partition. Please make sure to read the tune2fs man page:

Code:
$ man tune2fs
It's generally a good idea to run a filesystem check using the e2fsck utility after you've completed the alterations you wish to make on your filesystem. This will verify that your filesystem is clean and fix it if needed. You should also read the manual page for the e2fsck utility if you have not yet done so:
Code:
$ man e2fsck

WARNING: Make sure any filesystems are cleanly unmounted before altering them with the tune2fs or e2fsck utilities! (Boot from a LiveCD such as Knoppix if you need to.) Altering or tuning a filesystem while it is mounted can cause severe corruption! You have been warned!

II: Using Directory Indexing

This feature improves file access in large directories, or directories containing many files, by using hashed B-trees to store the directory information. It's perfectly safe to use, and it provides a fairly substantial improvement in most cases, so it's a good idea to enable it:
 
Code:
# tune2fs -O dir_index /dev/hdXY

This will only take effect with directories created on that filesystem after tune2fs is run. In order to apply this to currently existing directories, we must run the e2fsck utility to optimize and reindex the directories on the filesystem:
Code:
# e2fsck -D /dev/hdXY
 

Note: This should work with both ext2 and ext3 filesystems. Depending on the size of your filesystem, this could take a long time. Perhaps you should go get some coffee ;-)
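The idea behind dir_index can be sketched in a few lines: instead of scanning every entry in a directory, hash the name and look only in the part of the index the hash selects. This Python sketch is only an illustration; ext3's real HTree uses its own hash functions (legacy, half-MD4, TEA) and a shallow on-disk tree keyed by hash, none of which is modeled here. CRC-32 stands in for the hash, and the class name `HashedDirectory` is hypothetical.

```python
import zlib


class HashedDirectory:
    """Hypothetical hashed directory: names are spread over hash buckets."""

    def __init__(self, buckets: int = 64) -> None:
        self.buckets = [[] for _ in range(buckets)]
        self.probes = 0  # entries compared during the most recent lookup

    def _bucket(self, name: str) -> list:
        # Any well-mixed hash works for the illustration; CRC-32 is used here.
        return self.buckets[zlib.crc32(name.encode()) % len(self.buckets)]

    def add(self, name: str, inode: int) -> None:
        self._bucket(name).append((name, inode))

    def lookup(self, name: str) -> int:
        """Return the inode for `name`, scanning only one bucket."""
        self.probes = 0
        for entry_name, inode in self._bucket(name):
            self.probes += 1
            if entry_name == name:
                return inode
        raise FileNotFoundError(name)


d = HashedDirectory()
for i in range(10_000):
    d.add(f"file{i}.txt", 1000 + i)   # hypothetical names and inode numbers
inode = d.lookup("file9999.txt")
print(inode, "found after", d.probes, "comparisons (a linear scan: up to 10,000)")
```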

III: Enable Full Journaling

By default, ext3 partitions mount with the 'ordered' data mode. In this mode, all data is written to the main filesystem and its metadata is committed to the journal, whose blocks are logically grouped into transactions to decrease disk I/O. This tends to be a good default for most people. However, I've found a method that increases both reliability and performance (in some situations): journaling everything, including the file data itself (known as 'journal' data mode). Normally, one would think that journaling all data would decrease performance, because the data is written to disk twice: once to the journal then later committed to the main filesystem, but this does not seem to be the case. I've enabled it on all nine of my partitions and have only seen a minor performance loss in deleting large files. In fact, doing this can actually improve performance on a filesystem where much reading and writing is to be done simultaneously. See this article written by Daniel Robbins on IBM's website for more information:

http://www-106.ibm.com/developerworks/linux/library/l-fs8.html#4

In fact, putting /usr/portage on its own ext3 partition with journal data mode seems to have decreased the time it takes to run emerge --sync significantly. I've also seen slight improvements in compile time.

There are two different ways to activate journal data mode. The first is by adding data=journal as a mount option in /etc/fstab. If you do it this way and want your root filesystem to also use it, you should also pass rootflags=data=journal as a kernel parameter in your bootloader's configuration. In the second method, you will use tune2fs to modify the default mount options in the filesystem's superblock:
Code:
# tune2fs -O has_journal -o journal_data /dev/hdXY
Please note that the second method may not work for older kernels. In particular, Linux 2.4.20 and earlier will likely disregard the default mount options in the superblock. If you're feeling adventurous, you may also want to tweak the journal size. (I've left the journal size at the default.) A larger journal may give you better performance (at the cost of more disk space and longer recovery times). Please be sure to read the relevant section of the tune2fs manual before doing so:
Code:
# tune2fs -J size=$SIZE /dev/hdXY

IV: Disable Lengthy Boot-Time Checks

WARNING: Only do this on a journalling filesystem such as ext3. This may or may not work on other journalling filesystems such as ReiserFS or XFS, but has not been tested. Doing so may damage or otherwise corrupt other filesystems. You do this AT YOUR OWN RISK.

Hmm... It seems that our ext3 filesystems are still being checked every 30 mounts or so. This is a good default for many because it helps prevent filesystem corruption when you have hardware issues, such as bad IDE/SATA/SCSI cabling, power supply failures, etc. One of the driving forces for creating journalling filesystems was that the filesystem could easily be returned to a consistent state by recovering and replaying the needed journalled transactions. Therefore, we can safely disable these mount-count- and time-dependent checks if we are certain the filesystem will be quickly checked to recover the journal if needed to restore filesystem and data consistency. Before you do this, please make sure your filesystem's entry in /etc/fstab has a positive integer in its 6th field (pass) so that it is checked at boot time automatically. You can then disable the periodic checks with the following command:
Code:
# tune2fs -c 0 -i 0 /dev/hdXY
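The fstab check recommended above is easy to script. The Python sketch below is one way to do it; the helper name `fsck_pass` and the sample entries are hypothetical. It assumes the usual whitespace-separated fstab format, treats a missing 6th field as 0 (no check), and only matches plain device names, so UUID= or LABEL= entries would need to be looked up separately.

```python
def fsck_pass(fstab_text: str, device: str) -> int:
    """Return the fsck pass number (6th field) for `device`, 0 if absent."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.split()
        if fields[0] == device:
            return int(fields[5]) if len(fields) >= 6 else 0
    raise LookupError(f"{device} not found in fstab")


sample = """
# <fs>      <mountpoint>  <type>  <opts>                <dump> <pass>
/dev/hda3   /             ext3    noatime,data=journal  0      1
/dev/hda5   /home         ext3    noatime               0      2
/dev/hda2   none          swap    sw                    0      0
"""
for dev in ("/dev/hda3", "/dev/hda5", "/dev/hda2"):
    ok = fsck_pass(sample, dev) > 0
    print(dev, "checked at boot" if ok else "NOT checked at boot")

# On a real system: fsck_pass(open("/etc/fstab").read(), "/dev/hdXY")
```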


V: Checking The Filesystem Options Using tune2fs

Well, now that we've tweaked our filesystem, we want to make sure those tweaks are applied, right? :) Surprisingly, we can do this quite easily using the tune2fs utility. To list all the contents of the filesystem's superblock, we can pass the "-l" (lowercase "L") option to tune2fs:
Code:
# tune2fs -l /dev/hdXY
Unlike the other tune2fs calls, this can be run on a mounted filesystem without harm, since it only reads the superblock and does not change the filesystem.

This will give you a lot of information about the filesystem, including the block/inode information, as well as the filesystem features and default mount options, which we are looking for. If all goes well, the relevant part of the output should include "dir_index" and "has_journal" flags in the Filesystem features listing, and should show a default mount option of "journal_data".
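Checking those flags by eye is easy, but it can also be scripted. The minimal Python sketch below assumes `tune2fs` is installed and run as root; it relies on the "Filesystem features:" and "Default mount options:" lines of `tune2fs -l` output, the helper names `parse_tune2fs` and `superblock_options` are hypothetical, and the sample text is illustrative rather than captured from a real system.

```python
import subprocess


def parse_tune2fs(output: str) -> dict:
    """Map each 'Key: value' line of `tune2fs -l` output to its value."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")   # split at the first colon only
        if sep:
            info[key.strip()] = value.strip()
    return info


def superblock_options(device: str) -> tuple[set, set]:
    """Run `tune2fs -l` on device; return (features, default mount options)."""
    out = subprocess.run(["tune2fs", "-l", device], capture_output=True,
                         text=True, check=True).stdout
    info = parse_tune2fs(out)
    return (set(info.get("Filesystem features", "").split()),
            set(info.get("Default mount options", "").split()))


sample = """Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super
Default mount options:    journal_data
"""
info = parse_tune2fs(sample)
features = set(info["Filesystem features"].split())
print({"dir_index", "has_journal"} <= features,
      info["Default mount options"] == "journal_data")
```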

This concludes this filesystem tweaking guide for now. Happy hacking! :D
Comments

Nice thing!
Just one thing: wouldn't it be:
 

Code:

tune2fs -O dir_index,has_journal /dev/hdXY
tune2fs -o journal_data /dev/hdXY

 

in place of:
 
Code:

tune2fs -O dir_index /dev/hdXY
tune2fs -o has_journal,journal_data /dev/hdXY

 


Edited for clarification: note that the 'has_journal' parameter is valid for the '-O' (upper case 'o') option, not for '-o' (lower case 'o').

====

jetsaredim wrote:
is there a way to list the ext3 options that have been set for a particular filesystem?

`tune2fs -l /dev/hdXY` will list the current contents of the filesystem's superblock.

===

Is there any way to defragment ext3? My experience is that heavily used ext3 partitions become slower and slower, even though the number of files in the filesystem and the used disk space don't change significantly.

Posted: Fri Apr 08, 2005 4:21 pm

ext3 tries to limit fragmentation automatically as you use it. The best thing I can think of is to try re-optimizing the filesystem's directories by running the following (-D tells e2fsck to optimize directories):
Code:
# e2fsck -D /dev/hdXY
(/dev/hdXY should be unmounted as explained in the first post in order to avoid filesystem corruption.)
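You can script a guard against running `e2fsck -D` on a mounted device by checking whether the device appears in /proc/mounts. The Python sketch below assumes the usual /proc/mounts layout, with the device as the first field. The helper names are hypothetical. The root filesystem may be listed under another name (for example /dev/root or a symlink), so a name match is only a safety net, not proof that the device is free.

```python
def is_mounted(device, mounts_text):
    """True if `device` is the source field of any line in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Illustrative sample; on a real system use read_proc_mounts().
    sample = "/dev/hda1 / ext3 rw,data=ordered 0 0\nproc /proc proc rw 0 0\n"
    for dev in ("/dev/hda1", "/dev/hda2"):
        status = "mounted, do not run e2fsck -D" if is_mounted(dev, sample) else "not mounted"
        print(dev, status)
```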

====

Q: When does one benefit from using orlov?
And what exactly does commit=9999 mean?

A: You do not need to use orlov as a mount option, since, according to the mount(8) man page, it is the default if neither oldalloc nor orlov is specified. This option would tell ext2/ext3 whether to use the old inode allocator or the new Orlov inode allocator.

I also highly recommend against using commit=9999. This mount option specifies how often (in seconds) to sync the data to disk. Setting it too high may cause excessive usage of memory and possibly CPU/swap resources, and after a power failure you could lose up to that many seconds of work. It really is not needed and, in my experience, will not give you a large performance increase at all.

===

What command do you use to set the immutable attribute under ReiserFS? The man pages for chattr and lsattr indicate that they only work with ext2/ext3.

A:

Code:
hera etc # grep /dev/hdc3 /etc/fstab
/dev/hdc3               /               reiserfs                noatime,notail,acl                              0 0
hera etc # chattr +i /etc/shadow
hera etc # lsattr /etc/shadow
----i-------- /etc/shadow
hera etc #
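The lsattr output in the session above puts a flag field first and the path second, with the letter "i" marking an immutable file. The Python sketch below extracts immutable paths from such output. It assumes lsattr's default short listing. The helper names are hypothetical, and `run_lsattr` simply shells out to the lsattr command.

```python
import subprocess


def immutable_files(lsattr_output):
    """Return the paths whose lsattr flag field contains 'i' (immutable)."""
    paths = []
    for line in lsattr_output.splitlines():
        flags, _, path = line.partition(" ")  # e.g. "----i-------- /etc/shadow"
        if path and "i" in flags:
            paths.append(path.strip())
    return paths


def run_lsattr(*paths):
    """Run lsattr on the given paths and return its standard output."""
    result = subprocess.run(["lsattr", *paths], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = "----i-------- /etc/shadow\n------------- /etc/passwd\n"
    print(immutable_files(sample))
```

This only reports the flag. As discussed below, root can still clear it with chattr -i unless something like a raised secure level prevents that.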

Another little tidbit that needs to be discussed when talking about these immutable bits is the following.

Now I can simply chattr -i /etc/shadow anytime I want as root, and it'll be like it never happened. However, with seclvl (a Linux implementation of BSD Secure Levels) the behavior mimics that of BSD. So when using these attributes, remember to echo "2" > /sys/seclvl/seclvl if you have this support built into your kernel.

The hardened kernel series, as far as I know, supports this, and using it is always a great idea in any secure server implementation.

OK, ACLs and the immutable flag are completely different. ACL stands for access control list; that's just a major enhancement to rwx permissions. The file system flags maintained by chattr are just like the BSD ones that interact with the secure level. If you chattr +i a file, it becomes immutable, but a simple chattr -i can make that entire concept null. Using the BSD secure level implementation for Linux actually enforces the rules you set on the file.

[May 6, 2008] How to Increase ext3 and ReiserFS filesystems Performance -- Ubuntu Geek

  • xenoterracide Says:

    data journaling on ext3 is much better than writeback.

    The "ext3 tips" page will tell you how to do it. It's not specific to Gentoo either: it will work on any Linux distribution that supports ext3, which I believe is all of them (unless they are using really, really old kernels).

    [May 6, 2008] How to Increase ext3 and ReiserFS filesystems Performance -- Ubuntu Geek

    Features of ext3 File System

    The ext3 file system is essentially an enhanced version of the ext2 file system. These improvements provide the following advantages:

    Availability

    After an unexpected power failure or system crash (also called an unclean system shutdown), each mounted ext2 file system on the machine must be checked for consistency by the e2fsck program. This is a time-consuming process that can delay system boot time significantly, especially with large volumes containing a large number of files. During this time, any data on the volumes is unreachable.

    The journaling provided by the ext3 file system means that this sort of file system check is no longer necessary after an unclean system shutdown. The only time a consistency check occurs using ext3 is in certain rare hardware failure cases, such as hard drive failures. The time to recover an ext3 file system after an unclean system shutdown does not depend on the size of the file system or the number of files; rather, it depends on the size of the journal used to maintain consistency. The default journal size takes about a second to recover, depending on the speed of the hardware.

    Data Integrity

    The ext3 file system provides stronger data integrity in the event that an unclean system shutdown occurs. It allows you to choose the type and level of protection that your data receives. By default, most Linux distributions configure ext3 volumes to keep a high level of data consistency with regard to the state of the file system.

    Speed

    Despite writing some data more than once, ext3 has a higher throughput than ext2 in most cases because ext3’s journaling optimizes hard drive head motion. You can choose from three journaling modes to optimize speed, but doing so means trade-offs with regard to data integrity.

    Easy Transition

    It is easy to change from ext2 to ext3 and gain the benefits of a robust journaling file system without reformatting. See the Section called Converting to an ext3 File System for more on how to perform this task.
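The three journaling modes mentioned under Speed correspond to the data= mount option (journal, ordered, writeback; ordered is the usual default). tune2fs -o can store any of them as the filesystem's default, under the names used later on this page. The Python sketch below only builds option strings and argument lists and runs nothing. The descriptions paraphrase the kernel's ext3.txt, and the helper names are hypothetical.

```python
# ext3 data= journaling modes: what each journals, and the tune2fs -o name
# that stores the mode as the filesystem's default mount option.
MODES = {
    "journal": ("data and metadata", "journal_data"),
    "ordered": ("metadata; data is flushed before its metadata commits", "journal_data_ordered"),
    "writeback": ("metadata only; data write order is not preserved", "journal_data_writeback"),
}


def mount_option(mode):
    """Option string for /etc/fstab or rootflags=, e.g. 'data=writeback'."""
    if mode not in MODES:
        raise ValueError(f"unknown ext3 data mode: {mode!r}")
    return f"data={mode}"


def tune2fs_argv(mode, device):
    """argv that stores `mode` as the default mount option in the superblock."""
    if mode not in MODES:
        raise ValueError(f"unknown ext3 data mode: {mode!r}")
    return ["tune2fs", "-o", MODES[mode][1], device]


if __name__ == "__main__":
    for mode, (journaled, _) in MODES.items():
        cmd = " ".join(tune2fs_argv(mode, "/dev/hdXY"))
        print(f"{mount_option(mode):<15} journals {journaled}; default via: {cmd}")
```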

     

    [May 6, 2008] Q ~ Tuning ext3 reads down - Ubuntu Forums

    Q ~ Tuning ext3 reads down

    Hello,

    I'm using the ext3 file system and it schedules a "read" from my disk every 5 seconds, which is annoying. How do you adjust this to something like once every minute? Will tune2fs do this? TIA.
    Reply by ChrisNiemy, September 9th, 2007:
    Re: Q ~ Tuning ext3 reads down

    Hi there!

    Here's the solution (I guess):

    The 5 seconds are the commit interval. This is the standard behaviour. You can check this in your syslog.
    Here is the relevant part of ext3.txt from the kernel documentation (<kernel dir>/Documentation/filesystems/ext3.txt):
    Quote:
    (...)commit=nrsec
    Ext3 can be told to sync all its data and metadata
    every 'nrsec' seconds. The default value is 5 seconds.
    This means that if you lose your power, you will lose
    as much as the latest 5 seconds of work (your
    filesystem will not be damaged though, thanks to the
    journaling). This default value (or any low value)
    will hurt performance, but it's good for data-safety.
    Setting it to 0 will have the same effect as leaving
    it at the default (5 seconds).
    Setting it to very large values will improve
    performance.
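The quoted rules can be expressed in a few lines: the default is 5 seconds, commit=0 behaves like the default, and the interval is roughly how much recent work can be lost on power failure. Below is a minimal Python sketch that reads the interval out of a mount option string. It assumes that a later commit= overrides an earlier one. The function name is hypothetical.

```python
DEFAULT_COMMIT_SECONDS = 5  # ext3 default, per the ext3.txt excerpt above


def effective_commit_interval(mount_options):
    """Commit interval in seconds implied by a comma-separated option string.

    commit=0 behaves like the default; if commit= appears more than once,
    this sketch lets the last occurrence win.
    """
    interval = DEFAULT_COMMIT_SECONDS
    for opt in mount_options.split(","):
        name, sep, value = opt.strip().partition("=")
        if name == "commit" and sep:
            seconds = int(value)
            if seconds < 0:
                raise ValueError("commit interval must not be negative")
            interval = seconds or DEFAULT_COMMIT_SECONDS
    return interval


if __name__ == "__main__":
    for opts in ("defaults", "noatime,commit=100", "commit=0", "commit=9999"):
        secs = effective_commit_interval(opts)
        print(f"{opts!r}: up to about {secs} s of recent work at risk on power loss")
```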

    (The following mini-HOWTO adds more performance options. If you don't want them, add only the "commit=seconds" option, following the same steps.)

    1st step
    Open your /etc/fstab and add these options for your root (/) partition (and/or /home, etc.):
    Code:
    (previous options...),noatime,nodiratime,nobh,data=writeback,commit=100
    I guess you will also be very happy with the "data=writeback" and "nobh" options. This works for ext3. I guess it works for ReiserFS too, but please check this first.
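If you prefer to script the fstab change above rather than edit by hand, merge the new options into the fourth (options) field and replace any same-named option, such as an old commit= or data= value. The Python sketch below works on a single fstab line. It assumes the line is a normal entry without a trailing comment. It does not reconcile opposite pairs such as atime/noatime, and it rejoins fields with tabs, so column spacing changes. The function name is hypothetical.

```python
def add_mount_options(fstab_line, new_options):
    """Merge options into the 4th field of an fstab entry.

    An existing option with the same name (the part before '=') is replaced,
    so an old commit=5 or data=ordered gives way to the new value.
    """
    fields = fstab_line.split()
    if len(fields) < 4:
        raise ValueError("not an fstab entry: " + fstab_line)
    current = fields[3].split(",")
    for opt in new_options:
        name = opt.split("=", 1)[0]
        current = [c for c in current if c.split("=", 1)[0] != name]
        current.append(opt)
    fields[3] = ",".join(current)
    return "\t".join(fields)  # original column spacing is not preserved


if __name__ == "__main__":
    line = "/dev/hdc2  /  ext3  defaults,errors=remount-ro  0  1"
    print(add_mount_options(line, ["noatime", "nodiratime", "nobh", "data=writeback", "commit=100"]))
```

Running it twice with the same options gives the same line, so it is safe to re-apply.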

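    2nd step
    To make data=writeback and the new commit interval take effect for the root filesystem, edit your /boot/grub/menu.lst (the kernel mounts the root filesystem before /etc/fstab is read, so the options must also be passed as rootflags= on the kernel command line).
    Find the "defoptions=" line and add the rootflags option (e.g. after "ro quiet splash") so that it reads:
    Code:
    quiet splash rootflags=data=writeback,nobh,commit=100
    Also add "rootflags=data=writeback" (only that) to the altoptions= line!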

    Then
    Code:
    sudo update-grub
    3rd step
    For data=writeback, the last step before rebooting is the following (it works on a mounted filesystem):
    Code:
    sudo tune2fs -o journal_data_writeback /dev/hd(...)
    Do this for all your partitions, e.g. if you have / and /home on separate partitions.

    Finally...
    Reboot. However, the specific option you were looking for is the "commit=sec" option. The value is measured in seconds.

    Caution!
    I had several crashes (not Linux's fault) and my data is still there, although these options increase the risk of data loss!!! Note: you are not disabling journaling with this, so it's still pretty safe (but do it at your own risk).

    Appendix
    PS: My posting seems quite confusing, I guess. So here are the specific example lines/files:

    /etc/fstab:
    Code:
    /dev/hdc2    /   ext3   defaults,errors=remount-ro,data=writeback,noatime,nodiratime,nobh,commit=100     0     1
    (do the same if you have a separate /home)

    /boot/grub/menu.lst
    Code:
    (...)
    ## additional options to use with the default boot option, but not with the
    ## alternatives
    ## e.g. defoptions=vga=791 resume=/dev/hda5
    # defoptions=quiet splash rootflags=data=writeback,nobh,commit=100
    (...)
    ## altoption boot targets option
    ## multiple altoptions lines are allowed
    ## e.g. altoptions=(extra menu suffix) extra boot options
    ##      altoptions=(recovery) single
    # altoptions=(recovery mode) single rootflags=data=writeback
    #### for the alt options, only the data=writeback option is necessary
    (...)
    Don't forget to run "sudo update-grub"!

    Be sure to have, e.g., a live CD to access the system in case you make a typing error in one of these config files.

    WARNING (again): possible severe data loss. Do this at your own risk.
    This is recommended for laptops and/or desktop systems. Don't do this on servers!

    DON'T MAKE A TYPING ERROR BY MIXING UP tune2fs WITH mke2fs!!! This happened to me once, and it will erase all your data.

    More information
    Kernel documentation (mostly <directories to kernel>/Documentation/filesystems/ext3.txt): very interesting

    Man pages: tune2fs

    Solaris ZFS and Red Hat Enterprise Linux Ext3 File System Performance White Paper

    data=writeback: While the writeback option provides lower data consistency guarantees than the journal or ordered modes, some applications show very significant speed improvement when it is used. For example, speed improvements can be seen when heavy synchronous writes are performed, or when applications create and delete large volumes of small files, such as delivering a large flow of short email messages. The results of the testing effort described in Chapter 3 illustrate this topic.

    When the writeback option is used, data consistency is similar to that provided by the ext2 file system. However, file system integrity is maintained continuously during normal operation in the ext3 file system.

    In the event of a power failure or system crash, the file system may not be recoverable if a significant portion of data was held only in system memory and not on permanent storage. In this case, the filesystem must be recreated from backups. Often, changes made since the file system was last backed up are inevitably lost.

    How to make ext3 or reiserfs use journal data writeback

    First, back up your fstab file using the following command:

    sudo cp /etc/fstab /etc/fstab.orig

    Edit the /etc/fstab file using the following command:

    sudo vi /etc/fstab

    Add the data=writeback option to the mount options of your fstab root line:

    /dev/hda1 / ext3 defaults,errors=remount-ro,atime,auto,rw,dev,exec,suid,nouser,data=writeback 0 1

    Save that file and exit

    Next, back up your GRUB menu file using the following command:

    sudo cp /boot/grub/menu.lst /boot/grub/menu.lst.orig

    Now edit the GRUB menu list file using the following command:

    sudo vi /boot/grub/menu.lst

    Look for the following two lines:

    # defoptions=quiet splash
    # altoptions=(recovery mode) single

    Change them to:

    # defoptions=quiet splash rootflags=data=writeback
    # altoptions=(recovery mode) single rootflags=data=writeback
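The same menu.lst edit can be scripted. On Debian/Ubuntu the single-"#" `# defoptions=` and `# altoptions=` lines are read by update-grub, while the "##" lines are only examples. The Python sketch below appends a rootflags= token to those two lines and leaves everything else untouched. It assumes the menu.lst layout shown above. It does not merge with a different rootflags= value that may already be present, and it skips a line that already contains the token. The function name is hypothetical.

```python
def add_rootflags(menu_lst, def_flags="data=writeback", alt_flags="data=writeback"):
    """Append rootflags=... to the '# defoptions=' and '# altoptions=' lines.

    '## ...' example lines are left alone, and a line that already contains
    the same rootflags token is not changed again.
    """
    out = []
    for line in menu_lst.splitlines():
        for prefix, flags in (("# defoptions=", def_flags), ("# altoptions=", alt_flags)):
            token = "rootflags=" + flags
            if line.startswith(prefix) and token not in line:
                line = line.rstrip() + " " + token
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "## e.g. defoptions=vga=791 resume=/dev/hda5\n"
        "# defoptions=quiet splash\n"
        "# altoptions=(recovery mode) single\n"
    )
    print(add_rootflags(sample), end="")
```

The result still has to be written back to /boot/grub/menu.lst, and update-grub still has to be run afterwards.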

    Save that file and exit

    Now update GRUB using the following command:

    sudo update-grub

    The update-grub script will automatically add these flags to the kernel line, and they will stay there after a kernel update.

    Changes to Ext3 FileSystem Only

    Note: tune2fs works only for ext2/ext3; ReiserFS can’t change its journaling mode with it.

    Before rebooting, switch the filesystem's default journaling mode to writeback using the following command:

    sudo tune2fs -o journal_data_writeback /dev/hda1

    Check whether the change took effect (look for journal_data_writeback under "Default mount options") using the following command:

    sudo tune2fs -l /dev/hda1

    Remove update of access time for files

    Having the modification time change you can understand, but having the system update the access time every time a file is read is not to my liking. According to the manual, the only thing that might happen if you turn this off is that make might need that information when compiling certain things.

    To change this, do the following:

    sudo vi /etc/fstab

    and change atime to noatime in the root line:

    /dev/hda1 / ext3 defaults,errors=remount-ro,noatime,auto,rw,dev,exec,suid,nouser,data=writeback 0 1

    Now reboot and enjoy a much faster system.

     

    [Aug 7, 2007] Linux Replacing atime

    August 7, 2007 | KernelTrap | Last updated 06/12/2008 17:18:17

    Submitted by Jeremy on August 7, 2007 - 9:26am.

    In a recent lkml thread, Linus Torvalds was involved in a discussion about mounting filesystems with the noatime option for better performance, "'noatime,data=writeback' will quite likely be *quite* noticeable (with different effects for different loads), but almost nobody actually runs that way."

    He noted that he set O_NOATIME when writing git, "and it was an absolutely huge time-saver for the case of not having 'noatime' in the mount options. Certainly more than your estimated 10% under some loads."

    The discussion then looked at using the relatime mount option to improve the situation, "relative atime only updates the atime if the previous atime is older than the mtime or ctime. Like noatime, but useful for applications like mutt that need to know when a file has been read since it was last modified."

    Ingo Molnar stressed the significance of fixing this performance issue, "I cannot over-emphasize how much of a deal it is in practice. Atime updates are by far the biggest IO performance deficiency that Linux has today. Getting rid of atime updates would give us more everyday Linux performance than all the pagecache speedups of the past 10 years, _combined_." He submitted some patches to improve relatime, and noted about atime:

    "It's also perhaps the most stupid Unix design idea of all times. Unix is really nice and well done, but think about this a bit: 'For every file that is read from the disk, lets do a ... write to the disk! And, for every file that is already cached and which we read from the cache ... do a write to the disk!'"
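The difference between the atime policies discussed in the thread comes down to one decision made on every read. The Python sketch below models only the relatime rule as quoted ("only updates the atime if the previous atime is older than the mtime or ctime"). The actual kernel code differs in details, and later kernel versions also refresh a relatime atime that is more than a day old, which is not modeled here. The function name and policy strings are illustrative.

```python
def should_update_atime(policy, atime, mtime, ctime):
    """Decide whether reading a file should write a new atime.

    Timestamps are numbers on the same scale (e.g. seconds since the epoch).
    'relatime' follows the rule quoted above: update only if the stored atime
    is older than mtime or ctime.
    """
    if policy == "atime":      # classic behaviour: every read writes an atime
        return True
    if policy == "noatime":    # never write the atime on read
        return False
    if policy == "relatime":
        return atime < mtime or atime < ctime
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # A mailbox last read at t=100 and then modified by new mail at t=200:
    print(should_update_atime("relatime", atime=100, mtime=200, ctime=200))
    # After that read, the atime is 300; further reads cause no writes:
    print(should_update_atime("relatime", atime=300, mtime=200, ctime=200))
```

The first call returns True, so a program like mutt can still tell that new mail arrived since the last read. The second returns False, which is where the I/O saving comes from.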

    Linux Ext2 filesystem for Windows NT driver

    Ext2 0.04 for NT4 read-write
    
    Contacts and feedback: Andrey Shedel andreys@cr.cyco.com 
    Primary site: http://www.chat.ru/~ashedel
    
    CAUTION!!! this is nt kernel-mode driver and you are using it at your
    own risk. It is highly recommended to use sync utility to flush regular
    volumes first.
    
    >> You should be aware of the fact that ext2.sys might   <<
    >> damage the data stored on your hard disks.            <<
    
    If you cannot agree to these conditions, you should NOT use ext2.sys !
    
    Installation (you must be a member of the Administrators group):
    
    copy ext2.sys to your %systemroot%\system32\drivers directory 
    merge ext2.reg file 
    reboot to update driver information 
    edit go.cmd to point to your Linux drive 
    run go.cmd
    
    
    Known features:
    Non-regular files are converted to regular at first write attempt.
    
    Mounting partitions:
    
     NT4: 
    
    Instead of loading the driver manually or automatically (by setting
    startup mode to 1) you can use fs_rec.sys (recognizer driver). This
    driver is a superset of the recognizer that comes with NT4 and can be
    used instead of it. In addition to CDFS, NTFS and FAT (standard set for
    NT4) it includes recognition modules for HPFS (for pinball.sys from
    NT3.51), FAT32-enabled fastfat and Ext2. It is not recommended to use
    this recognizer on NT5 because support for UDFS is not included.
    Unfortunately even in this case you still have to set persistent links
    in DosDevices namespace UNLESS YOU ARE USING NT5. For example:
    
    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\DOS Devices]
    "E:"="\\Device\\Harddisk0\\Partition2"
    
    NT5:
    
    On NT5 you can use Disk Management utility to assign drive letter.
    
    Files included:
    readme.txt - this file
    ext2.sys   - driver  
    dosdev.exe - Define/RemoveDosDevice utility.
    kloader.exe - utility to load kernel-mode driver.  
    ln.exe      - hardlink creation utility.  
    SYNC.EXE    - Flush write-behind cache utility. 
    fs_rec.sys  - Recognizer driver.
    
    Changes:
    0.04:
    pagefile support
    initial security implementation.
    

    [Dec 12, 1999] Slashdot Ask Slashdot EXT3

    A great way to follow kernel development is to read the excellent kernel mailing list synopses written by Zack Brown at:

    http://kt.linuxcare.com

    Ext3fs is a journaled version of ext2fs written by Stephen Tweedie. It's in beta form right now but works pretty well. Stephen and Ted Ts'o talked about ext3fs at our Linux Storage Management Workshop in Darmstadt, Germany (you can get the slides for this workshop at ftp://linux.msede.com/lsmws_talks/).

    Early alphas of the ext3 filesystem are ready (version 0.0.2c, the excitement!!). Development takes place on the linux-fsdevel mailing list.

    Hello, I've been running ext3 on my laptop computer for about two months now. It works great. Just sync the disks and turn it off. No shutdown. No data loss either.

    If you look at, e.g., Solaris DiskSuite, you are able to control where you store your metadata. Say that you want to journal file data too; this normally slows the system down. But if you can specify that all file metadata should be on a separate solid-state disk (naturally mirrored for safety), then journaling of file data will be quick and swift. This is in my view quite important. If I understand everything correctly, you can do that with ext3.

    One of the major problems with ext2fs (IMHO) is that it doesn't resize well. This is because there is a copy of every group descriptor in every group [a g.d. contains metadata for a group of blocks/inodes, typically 8M in size]. Therefore enlarging or shrinking the drive causes a major reshuffle of ALL the data; so far, the only utility I know that can do this is resize2fs, which comes with Partition Magic (there are no doubt others now).

    This redundancy is good in theory (backups), but keeping a copy of a constant number of group descriptors (perhaps the previous and next 32) in a given group would still give you a lot of redundancy plus make resizing simpler.

    Granted, resizing isn't something you do a lot, but having had my system lock up and die while resizing and having to recover using Turbo C++ and the ext2fs spec (code and info on my ext2fs page), it would be nice if ext3fs (or XFS) made this easier.

    The Reiser Filesystems by Hans Reiser, a very ambitious project to not only improve performance and add journaling, but to redefine the filesystem as a storage repository for arbitrarily complex objects. Reiserfs is faster than ext2/3 because it uses balanced trees for its directory structures.
    The project is now released for 2.2.11 - 2.2.13. Mailing list archive here.

    The Xfs site has some docs. The work to unencumber the code is accelerating, and February is the target date for source code release. XFS is the one that I think has the most potential. It's a full logging filesystem from the ground up, not an extension (not that EXT3 or DTFS are bad or misguided efforts). I'm betting it will be the highest performance filesystem for Linux when it goes gold. I think the tight integration of the log could be a huge plus. It's been a while since filesystem 101, but I would think that there are a ton of ways to optimize performance with log write-back tricks and usage optimizations. You could include a hit counter in metadata and have an optimizer that moves higher-hit files closer to the log in the center of the disk, making your more frequently used files closer to where the head is supposed to be. Those kinds of optimizations (if practical, maybe I'm full of it) wouldn't be nearly as easy with ext3, since the FS doesn't have any knowledge of the log. Plus XFS has ACLs and big file support already.
     

    Hi, ext3fs is a journaled version of ext2fs written by Stephen Tweedie. It's in beta form right now but works pretty well. Stephen and Ted Ts'o talked about ext3fs at our Linux Storage Management Workshop in Darmstadt, Germany (you can get the slides for this workshop at ftp://linux.msede.com/lsmws_talks/).

    Stephen also gave a talk on ext3fs at the Linux Kongress in Augsburg, Germany. He is predicting Summer 2000 for production use of ext3fs. Nice features include the fact that ext3fs is backwards compatible with older versions of ext2. In addition, ext3fs uses asynchronous journaling, which means the performance will be as good as or better than that of ext2fs.

    I am involved with the SGI effort to port XFS to Linux. The work to unencumber the code is accelerating, and February is the target date for source code release. The read path is working at this time. More work remains, however, so stay tuned to:

    http://oss.sgi.com

    From Slashdot

            Q: I hate these "/dev/hda5 has reached maximal mount count; check forced". I hope they too go away with journaling...

    A: Easy fix: raise the max-mount-counts and interval-between-checks for the filesystem with tune2fs.

    Example: tune2fs -c 200 -i 700 /dev/sda1 (check after 200 mounts or 700 days, whichever comes first)

    The -l flag will show you, among other things, the current settings. Be aware you are defeating a built-in safeguard to protect your data.
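The "maximal mount count reached" message comes from the periodic check that tune2fs -c and -i control. The Python function below is a simplified model of those two triggers. It assumes the values have already been read, for example from the "Mount count", "Maximum mount count", "Last checked" and "Check interval" lines of tune2fs -l. Real e2fsck also forces checks for other reasons (such as a filesystem marked as having errors), and those are not modeled. The function name is hypothetical.

```python
DAY = 24 * 60 * 60  # tune2fs -i takes days by default; the superblock stores seconds


def periodic_check_due(mount_count, max_mount_count, last_checked, check_interval, now):
    """Simplified model of the mount-count and time-based check triggers.

    max_mount_count <= 0 disables the mount-count trigger (tune2fs -c 0 or -1);
    check_interval == 0 disables the time trigger (tune2fs -i 0).
    Times are seconds since the epoch.
    """
    by_count = max_mount_count > 0 and mount_count >= max_mount_count
    by_time = check_interval > 0 and now - last_checked >= check_interval
    return by_count or by_time


if __name__ == "__main__":
    # Settings from "tune2fs -c 200 -i 700": 200 mounts or 700 days.
    print(periodic_check_due(150, 200, 0, 700 * DAY, now=100 * DAY))  # neither limit reached
    print(periodic_check_due(200, 200, 0, 700 * DAY, now=100 * DAY))  # mount count reached
    print(periodic_check_due(10, 200, 0, 700 * DAY, now=700 * DAY))   # interval reached
```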

    Recommended Links



    Gentoo Forums View topic - Some ext3 Filesystem Tips

    Solaris ZFS and Red Hat Enterprise Linux Ext3 File System Performance White Paper

    Explores the performance characteristics and differences of Solaris ZFS and the ext3 file system through a series of benchmarks based on use cases derived from common scenarios, as well as the IOzone File System Benchmark (IOzone benchmark) which tests specific I/O patterns.

    Ext2fs Home Page

    Design and Implementation of the EXT/2 Filesystem - by Rémy Card - HTML


    Linux is a Unix-like operating system, which runs on PC-386 computers. It was implemented first as an extension to the Minix operating system [Tanenbaum 1987] and its first versions included support for the Minix filesystem only. The Minix filesystem contains two serious limitations: block addresses are stored in 16 bit integers, thus the maximal filesystem size is restricted to 64 megabytes, and directories contain fixed-size entries and the maximal file name is 14 characters.

    We have designed and implemented two new filesystems that are included in the standard Linux kernel. These filesystems, called ``Extended File System'' (Ext fs) and ``Second Extended File System'' (Ext2 fs) raise the limitations and add new features. 

    In this paper, we describe the history of Linux filesystems. We briefly introduce the fundamental concepts implemented in Unix filesystems. We present the implementation of the Virtual File System layer in Linux and we detail the Second Extended File System kernel code and user mode tools. Last, we present performance measurements made on Linux and BSD filesystems and we conclude with the current status of Ext2fs and the future directions.
     

    Analysis of the Ext2fs structure - Table of Contents Copyright (C) 1994 Louis-Dominique Dubeau.

    A Non-Technical Look Inside the EXT2 File System Issue 21

    Ext2fs Undeletion of Directory Structures mini-HOWTO

    Transparent compression for the ext2 filesystem

    EXT2 Futures -- by Theodore Ts'o

    These slides focus specifically on the EXT/2 filesystem, talking about its evolution, philosophy, planned new features, and relation to other linux filesystems.

    Design and Implementation of the Second Extended Filesystem -- an excellent paper

    A tour of the Linux VFS

    ext2fs Utilities (utilities for the ext2 file system - Linux' primary fs) by Theodore Ts'o ext2 Partitions Re ext2 partitions

    Filesystem Hierarchy Standard - This page is the home of two standards, the Linux Filesystem Standard (FSSTND) and its successor, the Filesystem Hierarchy Standard (FHS).



    Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Standard disclaimer: The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

    Last modified: June 12, 2008-