Unix was a program gone bad. Born into poverty,
its parents, the phone company, couldn't afford more than a roll of teletype paper a year, so
Unix never had decent documentation and its source files had to go without any comments whatsoever.
Year after year, Papa Bell would humiliate itself asking for rate increases so that it could
feed its child. Still, unix had to go to school with only two and three letter command names
because the phone company just couldn't afford any better. At school, the other operating systems
with real command names, and even command completion, would taunt poor little Unix for not having
any job or terminal management facilities or for having to use its file system for interprocess
communication and locking.
... ... ...
Then one evening Unix watched television, an event which would
.... ... ...
The worst strain was on Unix's mind. Unable to assimilate all
Ian Horswill, rec.humor.funny Jan 25, 1993
Comparing two very complex systems is a difficult and unrewarding task, and in no way can the author claim to understand all the complex issues involved. So the notes below are a patchwork of weakly connected observations. We will consider each operating system as consisting of seven major components.
First of all, Solaris and Linux are two different OSes that share a common set of standards (POSIX) and a common set of system calls. Solaris is officially POSIX-compliant; Linux seems to be reasonably POSIX-compatible too, although it never got official certification.
Having different internal architectures, each OS has its strong and weak points. Generally a clean, layered design has portability and maintenance (especially compatibility) advantages, while a quick-and-dirty design usually has speed advantages (and huge compatibility disadvantages). This classic dichotomy has been taught in any decent university OS architecture course since probably the early 1970s; I just want to stress here that compatibility is a very important criterion for a large enterprise environment.
Solaris is a heterogeneous OS that performs well and has a noticeable market share on two architectures: UltraSPARC and Intel/AMD. Historically Power, Itanium and z/VM ports also existed. Due to Sun's efforts Solaris on Intel became by and large competitive with Linux, though less so as a development workstation or desktop. The latter is a weak spot of Linux too, with Apple calling the shots in the Unix-based desktop space. Essentially Linux is now marginalized on the desktop, with the temporary revival due to netbooks coming to an end. VMware, which made it possible to run Linux in parallel with Windows, put the last nail into the coffin. Still, it is important to understand that while Linux is unsuccessful in the "mainstream" desktop and laptop market, it has niches where it performs reasonably well. One such niche is netbooks, where Ubuntu has a marginal presence.
As a server OS, Linux is mainly limited to Intel/AMD. Outside Intel/AMD it also has a major presence in stripped-down versions designed for smartphones (Android). On servers it has no major presence outside the x86 platform, although IBM is pushing it for Power. There is also some vitality in the IBM S3xx port (used strictly under VM). In addition there are several abandoned ports:
IBM is trying to create a viable presence for Linux on Power, but the results remain to be seen. As for mainframes, in this environment Linux works under VM, which performs key OS functions (first of all virtual memory management) much better than Linux ever did. That means that on mainframes Linux is actually relegated to the role of DOS (you can imagine a picture of a dozen or two featherless penguins locked in steel VM cages ;-).
In the remaining part of the paper we limit ourselves to the Intel implementation of Linux vs. the SPARC implementation of Solaris, as we try to stress the differences between the OSes while disregarding or de-emphasizing their large similarities. You need to keep perspective and remember that, after all, both are versions of the Unix OS: Linux, while it cannot officially be called Unix, is by and large a POSIX-compatible OS. For a different assessment of the same topic see [Bruning2005].
As a side note, we should remember that the base instruction set of all x86-64 CPUs, including Opteron, is the old Intel x86 instruction set. With several generations of extensions it became a very complex, non-orthogonal and not very compiler-friendly instruction set. Intel CPUs use little-endian byte order, typical for early microcomputers. SPARC belongs to the big-endian architectures, which also include IBM's famous System/3xx and Power5. Currently endianness is masked by compilers, but generally big-endian is a more natural byte order with some implicit advantages: it makes, for example, the analysis of hex dumps and the generation of IP packets more natural. For example, the string "UNIX", packed with two bytes per 16-bit integer, will look like "NUXI" in the dump of a machine with a little-endian architecture. Big-endian numbers are easier to read in hex when debugging a program. The IP protocol defines a standard big-endian network byte order. This byte order is used for all numeric values in the packet headers and by many higher-level protocols and file formats. That means that on a little-endian architecture those values need "translation". Not that this fact saves SPARC, but it is still worth mentioning that there are other characteristics of an architecture than the price/performance ratio and SPEC benchmarks.
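The "NUXI" effect and the network byte order are easy to reproduce. Below is a minimal Python sketch; the function names are mine and purely illustrative, and port 2049 is just a convenient sample value. It relies on the standard struct module, where ">" means big-endian, "<" little-endian and "!" network order, and it assumes the input has an even number of bytes.

```python
import struct


def swap_as_16bit_words(data: bytes) -> bytes:
    """Read data as big-endian 16-bit integers, then store the same
    integers the way a little-endian machine lays them out in memory."""
    count = len(data) // 2  # assumption: len(data) is even
    words = struct.unpack(f">{count}H", data)
    return struct.pack(f"<{count}H", *words)


def to_network_order(value: int) -> bytes:
    """Pack a 16-bit value in network (big-endian) byte order."""
    return struct.pack("!H", value)


if __name__ == "__main__":
    print(swap_as_16bit_words(b"UNIX"))    # b'NUXI'
    print(to_network_order(2049).hex())    # 0801, reads naturally in a hex dump
    print(struct.pack("<H", 2049).hex())   # 0108, the little-endian memory layout
```

On a big-endian host the in-memory layout and the wire layout coincide, which is the "no translation needed" point made above.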
|legacy: (Electronics & Computer Science / Computer Science) (modifier) surviving computer systems, hardware, or software: legacy network; legacy application|
With twenty or more years under its belt any OS is considered a legacy OS, unless you assume that computer science stagnated and did not produce anything valuable in this area for the last 20 years. Both Linux and Solaris rest on key ideas and key design elements that are approximately 50 years old, although both kernels have experienced substantial modernization since version 1. In both cases version 2 has almost nothing to do with version 1, so the actual age is slightly less. But as Bill Joy noted (Salon.com):
"If I had to rewrite Unix from scratch, I could do it in a summer, easily," says Joy. "And it would be much better. A much, much better job. The ideas are old."
Actually this is completely true. Kernels mutate as they mature, and one of the most significant influences is the necessity to accommodate new hardware. For example, kernel internals underwent significant reorganization with the wide adoption of SMP. With Sun Solaris that happened earlier, while Linux could rely on a well-beaten trail.
Solaris is Unix (the initial Solaris, Solaris 4.x, is more than legacy: it is essentially a classic Unix. The initial version of Solaris was the continuation of the groundbreaking work on Unix done at Berkeley with Bill Joy as the principal architect), and the POSIX standard reflects Solaris technical design decisions, not the other way around. If we put aside a cute fat mascot, Linux did not contribute much to the field (as I mentioned above, Sun adapted the /proc VFS from Plan 9 in 1985; it also introduced RPC, NFS, NIS/NIS+ and PAM, to name just a few contributions).
It is interesting to note that Linux was a legacy OS from the very beginning: from the very start it was a very conservative design, an attempt by a Finnish student to single-handedly reimplement the basic functionality of the Unix kernel in order to run GNU utilities. In twenty years this original modest attempt has grown into a more or less complete reimplementation of the POSIX standard. It actually took the Linux kernel fifteen years to get into more or less POSIX-compatible shape. Even now the Linux kernel is still cutting corners in some areas. It's actually pretty funny when anybody talks about Linux as an upstart OS.
Nobody can deny the Linux kernel team's contribution, and they definitely deserve respect. But if you think about the amount of money spent on Linux kernel development, you might come to the unpleasant conclusion that the technical brilliance of Linus Torvalds is an exaggeration or a mere media creation. He is a solid, talented reimplementer of somebody else's ideas, but that's about it. Much like Microsoft, the Linux kernel development team waits until some design feature is implemented and polished by competitors and then reimplements it in the kernel. In many cases Solaris was the source of inspiration for Linux kernel developers. That happened with the VFS layer from Solaris, the scheduler from Solaris, the proc file system from Solaris and many other things about which I don't know because I am not a kernel developer. Wait, cherry-pick and include: that is how such a strategy works. To the very core this is a derivative project. And in this case you can be in no hurry, as it is difficult to beat the Linux price. This ultimate dumping strategy works wonders in the Intel server marketplace.
In all twenty years of Linux kernel development Linus Torvalds did not author any significant idea extending Unix kernel design. Not a single significant idea. And it is an indisputable fact that the FreeBSD team beats the Linux kernel development team using a tiny fraction of the resources. For example, it was FreeBSD that got a lightweight VM implementation first (jails were added to FreeBSD in 1999). Later they were adopted by all major Unixes, and in my view they represent one of the few truly innovative extensions of the Unix kernel that appeared recently. Jails offered "lightweight virtual machines" isolating parts of the file system, process tree, and networking namespaces, and removed all super-user privileges for objects outside the jail. In a Web-hosting environment they provided 99% of the functionality of a full partitioning solution such as VMware with much lower overhead and performance impact.
With the Solaris 10 release the impression is that the Solaris 10 development team also managed to beat the Linux kernel development team. Zones, the new RBAC implementation and ZFS were impressive achievements that discredited the "Cathedral and Bazaar" fairy tale that it is the open source development model which is the source of innovation in the Unix kernel design space. Actually the opposite is true. The source of innovation is not a particular development paradigm but talented developers. They can be open source developers (like Poul-Henning Kamp, who introduced jails in FreeBSD) or proprietary developers (the Plan 9 development team invented the /proc pseudo-filesystem, Solaris teams designed ZFS, etc.), but it's talent, not a development model, that matters most. See also [Bezroukov 1999a] and [Bezroukov 1999b]. Some people, such as Bill Joy, made contributions both as free/open source developers and as proprietary developers.
Both BSD and Solaris are more server-biased distributions: the designers do not care much about desktop users, to say nothing of laptop users. Still, both Solaris and BSD are OK desktops for professionals like me. Sun staff use Solaris on Acer laptops. I would say that for any professional a Unix server is an acceptable desktop ;-).
A kernel must fulfill two main objectives:
Interact with the hardware components, servicing all low-level programmable components included in the hardware platform.
Provide an execution environment to the applications that run on the computer system (user-space programs).
The Linux kernel is a traditional monolithic kernel (Wikipedia) that from the beginning was designed as the lowest common denominator (get GNU utilities running), not as a state-of-the-art project. That is also connected with Linus Torvalds's obsession with speed of execution (as Don Knuth noted, premature optimization is the root of all evil). If we remember the old critique stated in Andy Tanenbaum's letter, in which he suggested that microkernel-based kernels are in some ways a more scalable design, that might still be an issue almost 20 years after the debate occurred.
It is true that monolithic kernels are robust and much faster. It is also true that their structure is better studied and more established. But that does not mean that a monolithic kernel can scale to the huge size that the current Linux kernel suffers from. The Achilles' heel of a monolithic kernel is the stability of drivers, and for a long time this was a weak point of Linux. I experienced huge problems with the stability of Linux drivers on Dell servers in 2006. Later the situation improved.
But returning to scalability, the number of bugs in modern Linux kernels suggests that after a monolithic kernel exceeds a certain critical mass, quantity of code turns into quality, in the sense that it becomes almost impossible to debug. And the suggestion by Andrew Morton, the current maintainer of the stable version of the kernel, to devote a whole development cycle to the elimination of bugs sounds like pure pragmatism on his part [Barr2006, Vaughan-Nichols2006]:
"Is the Linux kernel getting buggier? According to Andrew Morton, the lead maintainer of the Linux 2.6 kernel, in a CNET report from the LinuxTag conference in Germany, there's getting to be too much bad code in the kernel.
"'I believe the 2.6 kernel is slowly getting buggier. It seems we're adding bugs at a higher rate than we're fixing them,' said Morton..."
This impression is shared by ISPs, which are probably the largest and the most challenging environment in which Linux is deployed. For example Matt Heaton, President of Bluehost.com, recently wrote in his blog:
Problem - Major linux kernels problems with Redhat Enterprise 4. This affected about 40 servers. Solution - Finally built our own custom kernel that solved our multitude of issues.
Even from the theoretical standpoint, the view that "the whole argument that microkernels are somehow `more secure' or `more stable' is also total crap" (to cite Linus's apt definition ;-) is now open to review, and it might be that the fearless leader of Linux kernel development was wrong and, like Ivan Susanin, navigated his troops into the woods because of unanticipated overcomplexity-related issues in very large monolithic kernels. In other words, he failed to predict the level of success of his design and the amount of money that large firms would dump into the development of the kernel and drivers (which in a monolithic design work in the same address space and can take everything down). True, microkernels might be "total crap" if you compare them with a simple uniprocessor kernel of the size of Linux kernel 1.0. But the situation might be a little bit different with SMP support and the related additional complexity of all key subsystems, which made kernel 2.6 many times bigger. There are few Linux administrators who have not experienced system instability related to drivers. Spontaneous reboots of Linux servers are the object of pretty nasty jokes about Linus Torvalds among professional sysadmins who administer more than one flavor of Unix.
In his May 11, 2006 article "Debunking Linus's Latest", Dr. Jonathan Shapiro of the Systems Research Laboratory at the Department of Computer Science, Johns Hopkins University, suggested that the problem might be the level of componentization (please take into account that he is not an unbiased participant and represents the pro-microkernel faction of kernel designers). He raises an interesting point: the level of componentization of the kernel is the key to coping with complexity in large kernels, and after a certain size microkernel-based OS kernels might have an advantage here over the monolithic design [Shapiro2006]:
What modern microkernel advocates claim is that properly component-structured systems are engineerable, which is an entirely different issue. There are many supporting examples for this assertion in hardware, in software, in mechanics, in construction, in transportation, and so forth. There are no supporting examples suggesting that unstructured systems are engineerable. In fact, the suggestion flies in the face of the entire history of engineering experience going back thousands of years. The triumph of 21st century software, if there is one, will be learning how to structure software in a way that lets us apply what we have learned about the systems engineering (primarily in the fields of aeronautics and telephony) during the 20th century.
Linus argues that certain kinds of systemic performance engineering are difficult to accomplish in component-structured systems. At the level of drivers this is true, and this has been an active topic of research in the microkernel community in recent years. At the level of applications, it is completely false. The success of things like GNOME and KDE rely utterly on the use of IDL-defined interfaces and separate component construction. Yes, these components share an address space when they are run, but this is an artifact of implementation. The important point here is that these applications scale because they are component structured.
Ultimately, Linus is missing the point. The alternative to structured systems is unstructured systems. The type of sharing that Linus advocates is the central source of reliability, engineering, and maintenance problems in software systems today. The goal is not to do sharing efficiently. The goal is to structure a system in such a way that sharing is minimized and carefully controlled. Shared-memory concurrency is extremely hard to manage. Consider that thousands of bugs have been found in the Linux kernel in this area alone. In fact, it is well known that this approach cannot be engineered for robustness, and shared memory concurrency is routinely excluded from robust system designs for this reason.
Yes, there are areas where shared memory interfaces are required for performance reasons. These are much fewer than Linus supposes, but they are indeed hard to manage (see: Vulnerabilities in Synchronous IPC Designs). The reasons have to do with resource accountability, not with system structure.
When you look at the evidence in the field, Linus's statement ``the whole argument that microkernels are somehow `more secure' or `more stable' is also total crap'' is simply wrong. In fact, every example of stable or secure systems in the field today is microkernel-based. There are no demonstrated examples of highly secure or highly robust unstructured (monolithic) systems in the history of computing.
The important thing to understand is that from the point of view of kernel architecture Linux is just one of several free Unix kernels, and from the architectural standpoint it is not the best. It is definitely the base of the most popular free OS, which exists in hundreds of different distributions, but as for kernel architecture, Solaris has a more modern and more robust free kernel than Linux.
Based on my limited knowledge of Linux kernel development, it looks like Linux development suffered from a classic case of premature optimization disease, a disease that due to Linus's "numero uno" personality (and the related "cult of personality" problem) became much more pronounced than in other free kernel development teams. Just look at Linux scheduler development from version 1 to 2.6. In this area commercial development teams like the Solaris team might have some advantage over open source developers due to a higher level of discipline and better architecture. Receiving a salary from an abstract entity called a company helps to diminish ego-related problems, as the problem of ownership becomes much less personal, and helps to suppress envy. While Linux development teams changed scheduler implementations like a woman changes gloves, and still have problems in this area, the Solaris team proved able to implement a more reliable and at the same time more advanced architecture. In this area the Solaris kernel definitely has an edge, and the edge is quite noticeable under high server loads.
Actually my critique of Linux kernel development efforts versus commercial kernel development teams is slightly unfair. The Linux kernel stopped being a volunteer development long ago (after version 1.0 if not before that). Since then it has been structured as a cooperative, commercially subsidized development (with IBM and Intel as key sources of funds) with highly paid key developers (Linus is probably the most highly paid Unix kernel developer on the planet). Those professional developers sometimes use volunteers as a support force, but all key decisions are limited to the close circle of highly paid Linus lieutenants. This tendency toward establishing a "kernel oligarchy" became even more pronounced after Linus moved from Transmeta to the Open Source Development Labs (OSDL) due to the SCO-IBM lawsuit.
The Solaris kernel also belongs to the monolithic kernels, but in contrast to Linux, the Solaris kernel is organized as a set of kernel threads. A kernel thread is an execution context that can be independently scheduled; it may be associated with a user program, or it may run only some kernel functions. Context switches between kernel threads are usually much less expensive than context switches between ordinary processes, because the former operate in a common address space.
A multithreaded kernel allows concurrency across multiple processors. This architecture is a departure from traditional UNIX kernel schedulers. In Solaris, kernel threads are the unit of CPU scheduling. They allow multiple streams of execution within a single virtual memory environment; switching execution between threads is inexpensive because no virtual memory context switch is required.
As Max Bruning noted in his paper:
One of the big differences between Solaris and the other two OSes is the capability to support multiple "scheduling classes" on the system at the same time. All three OSes support Posix [...] SCHED_RR typically result in "realtime" threads. (Note that Solaris and Linux support kernel preemption in support of realtime threads.) Solaris has support for a "fixed priority" class, a "system class" for system threads (such as page-out threads), an "interactive" class used for threads running in a windowing environment under control of the X server, and the Fair Share Scheduler in support of resource management. See priocntl(1) for information about using the classes, as well as an overview of the features of each class. See FSS(7) for an overview specific to the Fair Share Scheduler. The scheduler on FreeBSD is chosen at compile time, and on Linux the scheduler depends on the version of Linux.
The ability to add new scheduling classes to the system comes with a price. Everywhere in the kernel that a scheduling decision can be made (except for the actual act of choosing the thread to run) involves an indirect function call into scheduling class-specific code. For instance, when a thread is going to sleep, it calls scheduling-class-dependent code that does whatever is necessary for sleeping in the class. On Linux and FreeBSD, the scheduling code simply does the needed action. There is no need for an indirect call. The extra layer means there is slightly more overhead for scheduling on Solaris (but more features).
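The indirect call that Bruning describes can be illustrated with a toy dispatch table. This is only a sketch of the idea in Python, not Solaris code: the class names "TS" and "FX", the priority numbers and the sleep policies are my own assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict

TS_MAX_PRIORITY = 59  # illustrative ceiling, an assumption for this sketch


@dataclass
class Thread:
    name: str
    sched_class: str
    priority: int
    state: str = "running"


def ts_sleep(thread: Thread) -> None:
    # Hypothetical timeshare policy: a sleeping thread gets a priority boost.
    thread.priority = min(thread.priority + 10, TS_MAX_PRIORITY)
    thread.state = "sleeping"


def fx_sleep(thread: Thread) -> None:
    # Hypothetical fixed-priority policy: the priority never changes.
    thread.state = "sleeping"


# One table of operations per scheduling class.
CLASS_OPS: Dict[str, Dict[str, Callable[[Thread], None]]] = {
    "TS": {"sleep": ts_sleep},
    "FX": {"sleep": fx_sleep},
}


def register_class(name: str, ops: Dict[str, Callable[[Thread], None]]) -> None:
    """Adding a new class needs no change to the generic code below."""
    CLASS_OPS[name] = ops


def thread_sleep(thread: Thread) -> None:
    """Generic code path: the table lookup is the extra, indirect call."""
    CLASS_OPS[thread.sched_class]["sleep"](thread)


if __name__ == "__main__":
    a = Thread("a", "TS", 55)
    b = Thread("b", "FX", 30)
    thread_sleep(a)
    thread_sleep(b)
    print(a, b)
```

A design with a single built-in scheduler would inline the body of one such function into thread_sleep and skip the lookup: slightly less overhead, but no way to plug in another class.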
The Solaris kernel is compiled using Sun's proprietary compiler, which produces reasonably optimized code. The Sun Studio 11 compiler beats GCC on SPARC in all major tests. That means that GCC-compiled kernels or software packages on SPARC need to overcome a significant handicap to perform even equally. I have no data about the behavior of the Solaris compiler on Intel, but I suspect that it is noticeably worse than the Intel optimizing compiler, probably somewhere in between the Intel compiler and GCC. 64-bit support is probably the only area where significant differences can exist. Otherwise the compilers probably produce code of pretty close quality, taking into account the typical kernel programmer's style of programming (for "applications" programmers optimizations might make the differences much more pronounced).
The Solaris kernel is a very stable, well-engineered kernel, and due to this fact Solaris 8 and 9 on UltraSPARC might be the two most stable flavors of Unix that I have ever encountered. Actually all BSD-derived kernels are amazingly stable and are slightly more secure than Linux, if only because they are less popular ;-). Linux is still in catch-up mode in terms of standards, quality of crucial components (scheduler, memory subsystem, multithreading) and, especially, the number and rate of publication of new exploits, which makes patching of Linux servers almost as time-consuming as patching Windows. The mere fact of running an OS on UltraSPARC means some additional protection (a "security via obscurity" layer, which is nothing to complain about ;-). Also, Linux still has a tendency to cheat, or to implement a standard only partially. It's getting better.
Until approximately 2003 Solaris was the only fully preemptive kernel on the market. Linux 2.6 managed to close the gap as preemption is required for multicore CPUs which became dominant in server space. Here is a structure of Solaris kernel as described in
But the main difference is stability. For some reason the Solaris development team managed to solve the problem that bothers the Linux kernel development team, and the stability of the kernel is close to exceptional. It also behaves better under high loads.
All this means that the Linux kernel is fully competitive with the Solaris kernel, but in no way can one or the other be claimed to be superior. It also looks like the design of the Solaris kernel is a little bit cleaner, and that entails some additional overhead (there is no free lunch). The Linux kernel was designed later than Solaris, and many design decisions were inspired by Solaris solutions, Solaris being the major competitor in the server space. It is newer and has less legacy baggage. So the fact that it failed to "outsmart" its teacher is not a positive development and creates some level of skepticism toward the Linux kernel development team.
If we classify both Linux and Solaris as legacy OSes, as both have 20 or more years of development under their belts and considerable legacy baggage, we need to examine their compatibility records.
Linux always played fast and loose with compatibility with previous versions. Actually the level of binary compatibility of applications between versions of the Linux kernel is simply horrible. In other words, it continues to reinvent itself as if it were still a student project at the University of Helsinki, not a production-level OS.
Solaris has a very good compatibility record, one of the best in industry. Software written 15 years ago for Solaris 2.6 works on Solaris 9 and 10 without any problems. That's a real achievement.
Both Solaris 10 x86 and Linux use GRUB, a very versatile and powerful bootloader. Before GRUB, SPARC systems used to have an edge with their elegant, Forth-based OpenBoot subsystem (which beat LILO hands down). But that is no longer true. GRUB actually beats OpenBoot in many (but not all) areas. For example, the recovery of an unbootable system is much easier with GRUB (one advantage is that you can specify init=/bin/bash to get to a shell even if one or several critical configuration files are completely hosed).
GRUB uses a two-stage process for booting:
When a computer is turned on, the computer's BIOS finds the primary bootable device (usually the computer's hard disk) and loads the initial bootstrap program from the master boot record (MBR), the first 512 bytes of the hard disk, and then transfers control to this code.
The MBR contains GRUB stage 1. Given the small size of the MBR, Stage 1 does little more than load the next stage of GRUB (which may reside physically elsewhere on the disk). Stage 1 can either load Stage 2 directly, or it can load stage 1.5: GRUB Stage 1.5 is located in the first 30 kilobytes of hard disk immediately following the MBR. Stage 1.5 loads Stage 2.
When GRUB Stage 2 receives control, it presents an interface to the user in order to select which operating system to boot. This normally takes the form of a graphical menu, although if this is not available or the user wishes further control, GRUB has its own command prompt, where the user can manually specify the boot parameters. GRUB can also be set to automatically load a particular kernel after a timeout period.
Once boot options have been selected, GRUB loads the selected kernel into memory and passes control on to the kernel, which then continues to start itself. At this stage GRUB can also pass control of the boot process to another loader, using chain loading, for operating systems such as Windows that do not support the Multiboot standard. In this case, copies of the other system's boot programs have been saved by GRUB; instead of a kernel, the other system is loaded as though it had been started from the MBR. This may be yet another boot manager, such as the Microsoft boot menu, allowing further selection of non-Multiboot operating systems. (This behavior is often automatic when modern Linux distributions are installed "on top of" existing Windows systems, allowing the user to retain the original operating system without modification, including systems that contain multiple versions of Windows.)
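To make the "first 512 bytes" concrete, here is a small Python sketch that dissects an MBR sector. It assumes the classic PC MBR layout (446 bytes of boot code, where GRUB Stage 1 lives, then four 16-byte partition entries, then the 0x55 0xAA boot signature). The function name and the synthetic sector are mine; reading a real disk would require root access, so the example builds a fake sector in memory.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446          # room for the first-stage boot code
PARTITION_ENTRY_SIZE = 16
PARTITION_COUNT = 4
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR


def parse_mbr(sector: bytes) -> dict:
    """Split a 512-byte MBR into boot code, partition entries and signature."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    partitions = []
    for i in range(PARTITION_COUNT):
        start = BOOT_CODE_SIZE + i * PARTITION_ENTRY_SIZE
        entry = sector[start:start + PARTITION_ENTRY_SIZE]
        partitions.append({
            "bootable": entry[0] == 0x80,  # status byte: 0x80 marks the active partition
            "type": entry[4],              # partition type ID, e.g. 0x83 for Linux
        })
    return {
        "boot_code": sector[:BOOT_CODE_SIZE],
        "partitions": partitions,
        "valid": sector[510:512] == BOOT_SIGNATURE,
    }


def make_fake_mbr() -> bytes:
    """Build a synthetic sector with one active Linux partition."""
    sector = bytearray(SECTOR_SIZE)
    sector[BOOT_CODE_SIZE] = 0x80
    sector[BOOT_CODE_SIZE + 4] = 0x83
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    info = parse_mbr(make_fake_mbr())
    print(info["valid"], info["partitions"][0])
```

The 446-byte limit on boot code is the reason Stage 1 can do little more than load the next stage.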
Historically, Solaris is one of the most network-oriented flavors of Unix. Along with pioneering the remote procedure call concept and NFS, Sun started the (often misunderstood) practice of making the /home directory a default automount point (meaning you can access the same home directory on any machine that you use, if it can mount it). That's why Sun uses /export/home for the actual location of local home directories: it presupposes that they will be NFS-mounted to /home.
I am not in a position to provide a definitive analysis of this topic, and my observations should be taken with a grain of salt.
I have a subjective impression that networking in Linux is less sophisticated and less stable even in the best enterprise distributions. I can support this impression with just a couple of observations.
Linux networking was significantly improved after the famous Mindcraft fiasco. The latter took place in early 1999 and was related to the results of Microsoft-sponsored (and Mindcraft-executed) tests that showed that, despite Raymondism claims, Linux 2.2 had problems in the application area where it is most widely used: as a web server. Windows NT simply wiped the floor with Linux. Here are some important results from the test (which was devastating to the pride of linuxoids, and served as an important stimulus for the improvement of the subsystem):
My impression is that in such complex subsystems as the TCP/IP stack, which require careful, painstaking research and tuning, the open source model per se is irrelevant and may even serve as a disadvantage, as the "benevolent dictator" is not a specialist in networking. My discussions with Unix specialists from companies which deploy both Linux and one or several commercial flavors of Unix also support the impression that the Linux networking stack is still less mature and less stable than in all three major commercial flavors of Unix (AIX, HP-UX and Solaris).
In comparison with Solaris 10 x86 it might also be slower. According to Sun, FireEngine Phase 1, integrated in Solaris 10, contained significant improvements in TCP/IP:
● Achieved 45% gain on web like workload on SPARC
● Achieved 43% gain on web like workload on x86
● Other gains (just due to FireEngine):
– 25% SSL
– 15% fileserving
– 40% throughput (ttcp)
● On v20z, Solaris is faster than RHEL 4.3 by 10-20% using Apache or Sun One Webserver on a web based workload
Sun claims that Solaris 10 can fully saturate a 1Gb link with only 8% of a single 2.2GHz Opteron and can drive a 10Gb link at 7.3Gbps (limited by PCI-X bandwidth) using two 2.2GHz Opteron CPUs utilized at less than 50%.
The default Linux networking stack provides only BSD sockets; for applications requiring STREAMS, a third-party package is needed (see for example www.gcom.com/home/linux/lis/). STREAMS are rarely used, so the difference is smaller than it looks.
Also, many network protocol implementations, for example the Linux NFS and automounter implementations, are weaker. One fact supporting this hypothesis is that the NFS group at Sun has run into interesting situations where a customer has deployed both Solaris and Linux clients but can't use the more advanced Solaris automounter features.
Although Sun gave up control of NFS to the IETF, it is still the king of the hill, with probably the best implementation of NFSv4 in the industry. I would like to stress that one feature of Solaris 10 that often slips under the radar in OS comparisons is the quality of its NFSv4 implementation. Yes, Linux also has an NFSv4 implementation, but some corners are cut. Actually this advantage is not limited to Solaris; it is true for all commercial Unixes. For example, AIX supports NFSv4 as well as Solaris does and in some benchmarks comes out on top. As Eric Kurzal aptly noted in his blog:
One new feature of Solaris 10 that has slipped under the radar is NFSv4. I work on the client side for Solaris. You can find the rfc here and the community website here. Original Design Considerations. So what's the big deal of NFSv4 anyways?
● NFSv4 makes security mandatory. NFS IS SECURE!
● Sun gave up control of NFS to the IETF.
● A common ACL model (Unix and Windows working together before Gates and Papadopulos made it popular).
● File level delegation to reduce network latency and server workload for a class of applications.
● COMPOUND procedure to allow for flexible request construction and reduction of network latency and server workload.
● Built in minor versioning to extend the protocol through the auspices of the IETF protocol process.
● Integrated locking (no need of a separate protocol - NLM, and we work with Windows locking semantics).
● One protocol and one port (2049) for all functionality.
So who else is implementing NFSv4? University of Michigan/CITI has putback to Linux 2.6. Back in 1999/2000, this is where I spent my last year in school working. Netapp has already released their implementation. IBM too.
Have Windows in your environment? Hummingbird can hook you up today. Rick at the University of Guelph is implementing a server for BSD.
I'll go into details for some of the features of NFSv4 in future posts.
We hold regular interoperability-fests about every 4 months via Connectathon and (NFSv4 only) bake-a-thons ( last one).
Both Solaris and Linux also support SMB (MS Windows) file sharing, allowing files to be shared with Windows hosts. I know nothing about the relative quality of the two subsystems but suspect that Linux might well have an edge in this area (due to Novell's participation in the development).
The NIS implementation (if anyone cares to use it, for example, for automount) is better on Solaris, with security extensions and authentication handled using Pluggable Authentication Modules (PAM).
As for LDAP quality, I suspect that Suse integrates better with Microsoft AD, which is very important for large enterprises. Generally much depends on which directory the enterprise is using, but Suse might have an edge over Solaris in the case of both AD and Novell NDS. In Solaris, "poor man's" integration with Active Directory can be achieved via NIS emulation in Microsoft SFU 3.5. It is actually very stable.
Sun is working on networking in several directions that altogether might prevent Linux from compensating for its current weaknesses. According to Sun's presentation to the BayLISA group in December 2005 [Tripathi2005], new features include (the term Squeue used below is explained in the presentation):
Dynamically switch between Interrupt and Polling (and packet chaining)
Networking interrupts are bad because writers get pinned, context switches occur, etc.
Bind a NIC to a Squeue and let the Squeue own the NIC
On backlog, Squeue turns the NIC interrupts off
Squeue can retrieve packets from the ring (in chains) after the backlog is cleared (poll mode)
If no backlog, Squeue switches the NIC back to interrupt mode
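The switching policy in the list above can be pictured as a small state machine. The sketch below is a toy Python model, not Solaris code: the class name `Squeue` is borrowed from the presentation, while `budget`, `arrive`, `service` and `tick` are hypothetical names invented for the illustration.

```python
from collections import deque


class Squeue:
    """Toy model of a serialization queue that owns one NIC (hypothetical names)."""

    def __init__(self, budget=4):
        self.ring = deque()      # packets waiting in the NIC receive ring
        self.budget = budget     # packets the squeue handles per service pass
        self.mode = "interrupt"  # the NIC starts out interrupt-driven
        self.interrupts = 0      # interrupts actually taken
        self.processed = 0       # packets handed up the stack

    def arrive(self, n):
        """n packets land in the ring; an interrupt fires only in interrupt mode."""
        self.ring.extend(range(n))
        if self.mode == "interrupt" and n > 0:
            self.interrupts += 1
            self.service()

    def service(self):
        """Process one chain of packets, then choose the mode for the next pass."""
        chain = min(self.budget, len(self.ring))
        for _ in range(chain):
            self.ring.popleft()
        self.processed += chain
        # Backlog left over: keep NIC interrupts off and poll.
        # Ring drained: switch the NIC back to interrupt mode.
        self.mode = "poll" if self.ring else "interrupt"

    def tick(self):
        """In poll mode the squeue pulls the next chain itself, with no interrupt."""
        if self.mode == "poll":
            self.service()


sq = Squeue(budget=4)
sq.arrive(10)            # one interrupt, then a backlog forces poll mode
sq.arrive(3)             # arrives while polling: no additional interrupt
while sq.mode == "poll":
    sq.tick()            # drain the ring chain by chain
print(sq.interrupts, sq.processed, sq.mode)
```

In this model thirteen packets cost a single interrupt, which is the effect the presentation attributes to the design: under load the interrupt rate stops growing with the packet rate.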
They promise another 25% improvement on x86 and 20% on SPARC platforms on web workloads due to the decrease in interrupts, context switches, mutex contentions.
Solaris supports trunking (aggregation of several NICs into one higher-speed link). You can create trunks of 1Gb NICs or 10Gb NICs. Each member of the trunk is owned by an individual Squeue. Sun demonstrated pretty linear scalability for a trunk of four 1Gb NICs: 3.6Gbps. Sun plans to achieve a combined bandwidth of 30Gbps from a trunk of 4 x 10Gb NICs sometime in 2007. Using trunking, in 2005 Solaris 10 set a new LAN record during the Internet2 Land Speed Record challenge by transferring 14Gbps over 2 x 10Gbps links on a v20z server (a pretty outdated server by December 2006 standards).
Sun's achievements in network stack virtualization will be a distant target for Linux vendors for years to come, because the networking community in Linux is pretty fragmented and there is no single "brain center" that pushes this area forward unless really bad results are shown on some benchmark against Microsoft. Over the years it has demonstrated a largely reactive approach to major events in networking.
The Sun network development team can boast of the presence of such world-class specialists as Dr. Radia Perlman. They were able to virtualize the 1Gb and 10Gb NICs based on protocol, service, or container with the following features:
They achieved that by developing the so-called "Crossbow Architecture". According to Sun the key features of this architecture include:
Use the NIC to separate out the incoming traffic and divide NIC memory amongst the virtual stacks
Assign MSI interrupt per virtual stack
The FireEngine Squeue controls the rate of packet arrival into the virtual stack by dynamically switching between interrupt & polling
Incoming B/W is controlled by pulling only the allowed number of packets per second
Virtual stack priority is controlled by the squeue thread which does the Rx/Tx processing
Each Solaris container can have its own virtual stack with a private routing table, firewall, etc., and the container administrator can tune it according to his/her requirements.
There are also interesting protection mechanisms against DoS attacks. Only the impacted service or virtual machine takes the hit instead of the entire grid. Under attack, impacted services switch to starting all new connections on a limited-resource, low-priority stack. Connections transition to the appropriate priority stacks only after application authentication.
All in all, Solaris 10 virtual stacks are pretty much isolated from each other and at the same time enjoy low performance overhead from virtualization (probably less than 10%).
The Linux I/O device driver layer, especially the IDE layer, is the Achilles' heel of Linux. It is very fragile and is the cause of an immense number of problems, including severe crashes of production servers ("blue screen of death" type crashes, if we reuse Windows terminology, with the only difference that no blue screen is produced), crashes which more than anything else discredit Linux as an enterprise-class OS. Some time ago Torvalds responded to Davidsen's post by writing [Linux.com Linux 2.6 and the ide-scsi module]:
On 6 Nov 2003, bill davidsen wrote:
> There is a problem with ide-scsi in 2.6, and rather than fix it someone
> came up with a patch to cdrecord to allow that application to work
> properly, and perhaps "better" in some way.
The "somebody" strongly felt that ide-scsi was not just ugly but _evil_, and that the syntax and usage of "cdrecord" was absolutely stupid.
That somebody was me.
ide-scsi has always been broken. You should not use it, and indeed there was never any good reason for it existing AT ALL. But because of a broken interface to cdrecord, cdrecord historically only wanted to touch SCSI devices. Ergo, a silly emulation layer that wasn't really worth it.
The fact that nobody has bothered to fix ide-scsi seems to be a result of nobody _wanting_ to really fix it.
So don't use it. Or if you do use it, send the fixes over.
Joerg Schilling, the author of cdrtools (which includes cdrecord), has an even harsher view of the situation than Linus Torvalds. He wrote:
Sorry, I did have to learn that the Linux kernel developers (and above all their loudest speaker Linus Torvalds) don't have the knowledge to discuss kernel internals :-(
The more I try to explain them how a decent SCSI transport interface should look, the more I fail. I never did check a 2.6 Linux kernel and as SuSE did stop giving away free SuSE distributions to developers more than half a year ago, it is very unlikely that I will install a newer Linux kernel.
Linux is the worst OS I am aware of if you compare SCSI transport implementations. Every even year, a new completely disjunct new kernel interface appears. Non of the new kernel interfaces includes the features that I like to have and documented since 1995. For this reason, it is not possible to develop cdrecord on Linux - I use Solaris where I get the needed features.
It looks like the situation had not improved four years later...
Before ZFS, Solaris 10 had a simple but fast volume manager (Solstice DiskSuite, renamed Solaris Volume Manager), and Linux LVM was definitely superior in capabilities as well as in ease of administration. Linux LVM 2 supports snapshots, software RAID and other advanced features (I think it was donated by IBM). Snapshots are extremely useful in an enterprise environment as they can help to cut downtime during backups. You just need to shut down the application, create a snapshot and restart the application; the actual backup can then be run from the snapshot, not from the original partition(s). Here is a short history of Linux LVM:
LVM is a Logical Volume Manager for the Linux operating system. There are now two version of LVM for Linux:
- LVM 2 - The latest and greatest version of LVM for Linux.
LVM 2 is almost completely backward compatible with volumes created with LVM 1. The exception to this is snapshots (You must remove snapshot volumes before upgrading to LVM 2)
LVM 2 uses the device mapper kernel driver. Device mapper support is in the 2.6 kernel tree and there are patches available for current 2.4 kernels.
- LVM 1 - The version that is in the 2.4 series kernel,
LVM 1 is a mature product that has been considered stable for a couple of years. The kernel driver for LVM 1 is included in the 2.4 series kernels, but this does not mean that your 2.4.x kernel is up to date with the latest version of LVM. Look at the README for the latest information about which kernels have the current code in them.
Truth be told, Solaris was usually used with Veritas Volume Manager, which was equal to or better than Linux LVM.
From the architectural standpoint there is no free lunch, and the levels of indirection introduced by volume managers can slow the boot process and complicate disaster recovery. The approach taken by ZFS looks more modern. For a comparison table of the major LVM implementations see Logical volume management - Wikipedia
While formally this is a different subsystem, in practice it is fairly integrated with the supported filesystems.
Both Solaris UFS and Linux ext2fs have a common ancestor: the classic BSD UFS filesystem [McKusick-Joy-Leffler-1984]. Being more than 20 years old, these filesystems show problems on modern servers with their huge filesystems and insane numbers of files, a pretty typical situation in badly architected enterprise systems that can work only by "brute force" (as in "pigs can fly"), using much more powerful servers and the most expensive disk subsystems available.
Linux I/O subsystem is a mess. This is the weakest part of the OS.
If we try to compare filesystems, the general impression is that, despite having a common ancestor, the current Solaris filesystem (UFS) is more reliable than Linux ext2fs/ext3fs. At the same time, feature-wise it is definitely more limited. Ext3 cannot compete with ZFS.
The Reiser filesystem is a more modern, NTFS-style filesystem which is faster but even less reliable (and now dead). The future of the Reiser filesystem is problematic because its architect is in deep trouble (he is a convicted killer). Recently it was removed as the default filesystem in Open Suse (Suse 10 still uses the Reiser filesystem as the default, but this might change). It is unclear how it compares with ZFS in Solaris.
As the assortment of drives supported by Solaris is pretty narrow, Solaris is friendlier to some advanced features like 4K sectors on SCSI, SAS and, especially, solid state drives. The cache size of SCSI or SAS controllers is generally larger on Intel hardware (I saw HP controllers with 1GB of cache).
One test of the OS filesystem layer that is interesting for a large enterprise environment is database performance. A recent test conducted by Sun [MySQL2006] showed that MySQL optimized for Solaris beats MySQL on RHEL ES by a considerable margin, a margin which is difficult to explain by vendor bias:
...the open source MySQL database running online transaction processing (OLTP) workload on 8-way Sun Fire V40z servers. The testing, which measured the performance of both read/write and read-only operations, showed that MySQL 5.0.18 running on the Solaris 10 Operating System (OS) executed the same functions up to 64 percent faster in read/write mode and up to 91 percent faster in read-only mode than when it ran on the Red Hat Enterprise Linux 4 Advanced Server Edition OS.
Driven by two Sun Fire V40z servers, powered by Dual-Core AMD Opteron(TM) Model 875 processors, the benchmark testing generated data points at 11 different load levels, starting with one concurrent user connection (CUC) and gradually doubling that number, up to a high of 1024 CUC.
The primary difference between the two servers was in the underlying operating system, keeping the hardware configuration and database properties the same. During the read/write test, both systems reached their saturation point at eight CUC, at which point the server running the Solaris 10 OS was 30 percent faster. Additionally, the Sun Fire V40z server running the Solaris 10 OS was running database queries at a 64 percent better rate on average, when compared to the server running Linux.
The Solaris advantage was magnified during the read-only test, where performance exceeded the Linux test case by 91 percent. Remarkably, in this experiment, the peak performance under the Solaris 10 OS was achieved with 16 CUC, while the less robust Red Hat Linux tapered off at only eight CUC. Despite running at twice the load during the peak phase, the Solaris 10-based server was performing 53 percent more transactions per second than the Linux-based server.
Independent tests show that Linux generally holds its own against Solaris, with the exception of some configurations (for example, 4-socket configurations). With MySQL, Linux has a higher peak but Solaris generally deteriorates less with increased concurrency. See also Database test Sun UltraSparc T1 vs. AMD Opteron (1-10) Tweakers.net. The authors of this study reviewed the relative performance of similarly priced UltraSparc T1 and Opteron systems and came to the following conclusion:
Testing the UltraSparc T1 was not a trivial task: we spent a good three months finding the optimal configuration for our tests, for which we worked together with people from Sun, who in turn worked with people from MySQL. In all, two billion queries were fired, spread out across 3,500 serial runs, which took more than nineteen days to complete. Our research showed that results can vary greatly, which taught us that this machine is not easy to tame. If you want to get it to achieve its maximum potential, you need the right combination of software and settings, which can demand a great deal of patience. We are reasonably convinced that we have done the best that we can do for now, but Sun is still researching our benchmark because the company believes that it can be improved. In response to the problems we found, the company has come up with an improvement for the Sun Studio compiler which may allow considerable gains at a later stage, but we did not want to wait for its release before publishing our results.
... ... ...
Unfortunately, we have little choice but to be disappointed in the UltraSparc T1's performance: even the perfectly scaling PostgreSQL allows the machine to be very convincingly overtaken by the 'average' Opteron server, costing just below half its price. It can only be hoped that the T2000 manages better in other situations, with performance gains that are large enough to justify the difference in price, since otherwise the radically designed chip is in danger of being trampled by competitors that with a more gradual and conservative approach to the switch to multicore architecture. It has to be said though that a plus of the T2000 is that it is very energy efficient: full loading only pulls 322W out of the mains, while the Opteron needs about 50% more, doing the task. When we look at the performance per Watt in PostgreSQL, we measure peaks of 1,17 requests per second per Watt for the Opteron, and 1,34 requests per second per Watt for the T2000; which translates to an advantage of about 15% for the Niagara. Sun also wants to take the server's height into consideration in its own SwaP measure, but since both of them were 2U, this does not change the picture.
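The energy-efficiency arithmetic in the quoted conclusion is easy to re-check. The short Python sketch below uses only the figures quoted above (written with decimal points instead of the review's decimal commas); the variable names are made up for the illustration.

```python
# Peak PostgreSQL requests per second per watt, as quoted in the review
# (the source writes them with decimal commas: 1,17 and 1,34).
opteron_rps_per_watt = 1.17
t2000_rps_per_watt = 1.34

advantage = t2000_rps_per_watt / opteron_rps_per_watt - 1.0
print(f"T2000 efficiency advantage: {advantage:.1%}")

# The review puts the T2000 at 322 W under full load and the Opteron at
# "about 50% more", which implies roughly this draw for the Opteron:
t2000_watts = 322
opteron_watts = t2000_watts * 1.5
print(f"Implied Opteron draw: {opteron_watts:.0f} W")
```

The ratio works out to about 14.5%, which the review rounds to "about 15%", and the implied Opteron draw is roughly 483 W.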
In file systems, a snapshot is a copy of a set of files and directories as they were at a particular point in the past. The term was coined by analogy with photography. A snapshot can be thought of as a concept similar to RAID 1 (mirroring). Rather than performing a block-by-block copy of a disk, and then performing all writes to both copies, a snapshot takes a shortcut. The snapshot starts from an original partition and, instead of copying all of the original blocks, creates a copy of the metadata structures. In essence, it has pointers to all the data blocks. Thus, a snapshot is relatively fast to create (a few seconds on a relatively idle filesystem).
Most modern OSes support this concept either at the filesystem level or at the LVM level. One of the first implementations that included a snapshot feature was the VERITAS File System. In Solaris this facility has been available since 2001, with the introduction of the fssnap command in Solaris 8. The similar Windows facility, Shadow Copy (also called Volume Snapshot Service or VSS), was introduced with Windows XP SP1 and Windows Server 2003 and is available in all subsequent releases of Microsoft Windows. Snapshots have also been available in the NSS (Novell Storage Services) file system on Netware since version 4.11, and more recently on Linux platforms in the Novell Open Enterprise Server product. Linux LVM is also snapshot-capable.
Solaris has a very elegant implementation of snapshots both in UFS and ZFS. We will discuss the UFS implementation first. My discussion of UFS snapshot capabilities is partially based on the article Free Snapshots by Peter Baer Galvin in Sys Admin.
The snapshot is placed within an existing file system or (theoretically) on a raw device. Changes to the snapped filesystem are then handled specially. For every block (metadata or normal data) that is to be written to the snapped filesystem, a copy of the original contents is created and placed on the snapshot and then the write is allowed to occur to the original file system. In this manner, the original source file system is kept up to date and its snapshot copy has the contents that the file system had when the snapshot occurred. In essence, the Solaris snapshot facility implements a perfect "one generation back" file recovery scheme at a very low cost. That means that if you created a snapshot in the morning, any destruction of files is easily reversible during the day, back to the state at the time of the snapshot. This is an extremely convenient feature to have on a workstation, and I personally cannot live a day without it, as I am a pretty absent-minded individual when working with a large number of similar files, as on the Softpanorama website.
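The copy-on-write rule described above fits in a few lines. The following is a toy Python model under stated assumptions, not the UFS implementation: the class `SnappedVolume` and its methods are hypothetical, blocks are plain Python values, and it assumes `snapshot()` is called before `read_snapshot()`.

```python
class SnappedVolume:
    """Toy block device with a single copy-on-write snapshot (hypothetical model)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # the live file system
        self.backing_store = None    # block number -> original contents

    def snapshot(self):
        # Creating the snapshot copies no data; the backing store starts empty.
        self.backing_store = {}

    def write(self, n, data):
        # First write to a block since the snapshot: save the old contents first.
        if self.backing_store is not None and n not in self.backing_store:
            self.backing_store[n] = self.blocks[n]
        self.blocks[n] = data

    def read_snapshot(self, n):
        # Changed blocks come from the backing store, unchanged ones from the live volume.
        return self.backing_store.get(n, self.blocks[n])


vol = SnappedVolume(["a", "b", "c"])
vol.snapshot()
vol.write(1, "B")    # original "b" is copied to the backing store, then overwritten
vol.write(1, "BB")   # block already saved, so nothing more is copied
print(vol.blocks)                                 # live view
print([vol.read_snapshot(i) for i in range(3)])   # view as of the snapshot
print(vol.backing_store)                          # only the changed block is stored
```

Only the first write to each block pays the extra copy, and the backing store grows with the number of distinct blocks changed rather than with the size of the file system.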
Because snapshots are fast and low overhead, they can be used extensively without great concern for system performance or disk use (although those aspects must also be considered).
The basic Solaris command to create a snapshot is:
# fssnap -o backing-store=/snap /
/dev/fssnap/0
In this command backing-store is the file system on which to put the snapshot, and the last argument / is the file system to snap. The command returns a device name, which is an access point to the snapshot file system. Of course, you can create multiple snapshots, one per file system on the system:
# fssnap -o backing-store=/snap /opt
/dev/fssnap/1
The snapshot operation on a quiet file system took a few seconds. The busier the file system, the longer the operation.
A snapshot can reside on any file system type, even NFS. An unmounted device can also be used as the backing store
Now we can check the status of a snapshot:
# fssnap -i /
Snapshot number               : 0
Block Device                  : /dev/fssnap/0
Raw Device                    : /dev/rfssnap/0
Mount point                   : /
Device state                  : idle
Backing store path            : /snap/snapshot0
Backing store size            : 2624 KB
Maximum backing store size    : Unlimited
Snapshot create time          : Wed Oct 31 10:20:18 2001
Copy-on-write granularity     : 32 KB
Note that there are several options on snapshot creation, including limiting the maximum amount of disk space that the snap can take on its backing store.
From the system point of view, the snapshot looks a bit strange. The disk use, at least at the initial snap, is minimal as would be expected:
# df -k
Filesystem            kbytes    used   avail capacity  Mounted on
/dev/dsk/c0t0d0s0    4030518 1411914 2578299    36%    /
/proc                      0       0       0     0%    /proc
fd                         0       0       0     0%    /dev/fd
mnttab                     0       0       0     0%    /etc/mnttab
swap                  653232      16  653216     1%    /var/run
swap                  653904     688  653216     1%    /tmp
/dev/dsk/c0t1d0s7    5372014  262299 5055995     5%    /opt
/dev/dsk/c0t0d0s7    4211158 2463312 1705735    60%    /export/home
/dev/dsk/c0t1d0s0    1349190    3313 1291910     1%    /snap
However, an ls shows seemingly very large files:
# ls -l /snap
total 6624
drwx------   2 root     root           8192 Oct 31 10:19 lost+found
-rw-------   1 root     other    4178771968 Oct 31 10:30 snapshot0
-rw-------   1 root     other    5500942336 Oct 31 10:24 snapshot1
These files are “holey”. Logically, they are the same size as the snapped file system. As changes are made to the original, the actual size of the snapshot grows as it holds the original versions of each block. However, almost all of the blocks are empty at the start, and so are left as “holes” in the file. The disk use is thus only the metadata and blocks that have changed.
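The gap between the size reported by ls and the space actually used can be reproduced with any sparse file. The Python sketch below is only an illustration on a temporary file (the 64 MiB size is arbitrary); it assumes a Unix-like system whose file system supports holes and reports `st_blocks` in 512-byte units, as Linux does.

```python
import os
import tempfile

LOGICAL_SIZE = 64 * 1024 * 1024  # 64 MiB, an arbitrary size for the demo

with tempfile.NamedTemporaryFile() as f:
    f.truncate(LOGICAL_SIZE)   # extend the file without writing any data blocks
    f.flush()
    st = os.stat(f.name)
    logical = st.st_size               # what ls -l reports
    allocated = st.st_blocks * 512     # what du and df account for (512-byte units assumed)

print(f"logical size:   {logical} bytes")
print(f"allocated size: {allocated} bytes")
```

On a file system with sparse-file support the allocated size stays near zero until blocks are actually written, which is the same effect that makes snapshot0 and snapshot1 look enormous in the ls listing while df shows almost no space used on /snap.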
The performance impact of a snapshot is that any write to a snapped-file system first has the original block written to the snap, so writes are 2X non-snapped file systems. This is similar to the overhead of RAID-1. Typically, RAID-1 writes are done synchronously to both mirror devices. That is, the writes must make it to both disks before the write operation is considered to be complete. This extra overhead makes writes more expensive. It is not clear whether snapfs commands are done synchronously or asynchronously, although it is likely the former.
What can be done once a snapshot is created? Certainly a backup can be made of the snapshot, solving the previously ever-present “how to back up a live file system consistently” problem. In fact, fssnap has built-in options to make it trivial to use in conjunction with ufsdump:
# ufsdump 0ufN /dev/rmt/0 `fssnap -F ufs -o raw,bs=/snap,unlink \
/dev/rdsk/c0t0d0s0`
This command will snapshot the root partition, ufsdump it to tape, and then unlink the snapshot so the snapshot file is removed when the command finishes (or at least the file should be removed). In testing, the unlink option does indeed unlink the snapshot file, but the fssnap -d command is required to terminate the use of the snapshot and actually free up the disk space. Thus, this would be the full command:
# ufsdump 0ufN /dev/rmt/0 `fssnap -F ufs -o raw,bs=/snap,unlink \
/dev/rdsk/c0t0d0s0`
# fssnap -d /
fssnap gets interesting when the snapshot itself is mounted for access, as in:
# mount -o ro /dev/fssnap/0 /mnt
Now we can create a file in /, and see that it does not appear in /mnt:
# touch /foo
# ls -l /foo
-rw-r--r--   1 root     other          0 Nov  5 12:25 /foo
# ls -l /mnt/foo
/mnt/foo: No such file or directory
Unfortunately, there does not appear to be any method to promote the snapshot to replace the current active file system. For example, if a systems administrator was about to attempt something complicated, such as a system upgrade, she could perform a snapshot first. If she did not like the result, she could restore the system to the snapshot version. (the “live upgrade” in Solaris provides a similar functionality.)
The backing store can be deleted manually after it’s finished. fssnap -d “deletes” the snap, but that is probably the wrong terminology. Rather, it stops the use of the snapshot, more like “detaching” it from the source file system. To actually remove the snapshot, the snapshot file must also be deleted via rm.
Alternately, the unlink option can be specified when the snap is created. This prevents a file system directory entry from being made for the file. In essence, it is then like an open, deleted file. Once the file is closed, the inode and its data are automatically removed. Unlinked files are not visible in the file system via ls and similar commands, making them harder to manage than normal “linked” file systems.
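The "open, deleted file" behaviour that the unlink option relies on is ordinary Unix semantics and can be seen with any file. The Python sketch below is an illustration on a temporary directory, not a demonstration of fssnap itself; the file name snapshot0 is reused only as a hypothetical label.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "snapshot0")   # hypothetical file name

f = open(path, "w+b")
f.write(b"saved original blocks")
f.flush()

os.unlink(path)                    # remove the directory entry
visible = os.path.exists(path)     # False: ls no longer shows the file
f.seek(0)
still_readable = f.read()          # the open handle still reaches the data

f.close()                          # last reference gone: the space is reclaimed
os.rmdir(workdir)

print(visible, still_readable)
```

As long as some process (or, in the fssnap case, the kernel) holds the file open, its blocks stay allocated even though no name points to it, which is why the disk space is freed only after fssnap -d.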
Apparently only one active snapshot of a file system can exist. This limits the utility of UFS snapshots to be a kind of safety net for users or systems administrators. For instance, a snapshot could be made once a night, but only one day’s worth of old data would then be available.
On the whole, fssnap is a welcome addition to the Unix filesystems functionality. Sun is obviously paying attention to its file systems, and adding features to make it more competitive.
With ZFS you can not only create a snapshot but also create a clone of a snapshot. The ZFS Administration Guide describes a clone thus:
A clone is a writable volume or file system whose initial contents are the same as the dataset from which it was created. As with snapshots, creating a clone is nearly instantaneous, and initially consumes no additional disk space. In addition, you can snapshot a clone.
The discussion of linux capabilities was partially borrowed from Linux.org - Taking a Backup Using Snapshots
A snapshot volume is a special type of volume that presents all the data that was in the volume at the time the snapshot was created. This means we can back up that volume without having to worry about data being changed while the backup is going on, and we don't have to take the database volume offline while the backup is taking place.
A snapshot volume can be as large or as small as you like, but it must be large enough to hold all the changes that are likely to happen to the original volume during the lifetime of the snapshot. In the example below we are allocating 592 megabytes for changes to the volume:
# lvcreate -L592M -s -n dbbackup /dev/ops/databases
lvcreate -- WARNING: the snapshot must be disabled if it gets full
lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/ops/dbbackup"
lvcreate -- doing automatic backup of "ops"
lvcreate -- logical volume "/dev/ops/dbbackup" successfully created
# mkdir /mnt/ops/dbbackup
# mount /dev/ops/dbbackup /mnt/ops/dbbackup
mount: block device /dev/ops/dbbackup is write-protected, mounting read-only
If the filesystem on the volume is XFS, add the nouuid option to the mount command:
# mount /dev/ops/dbbackup /mnt/ops/dbbackup -onouuid,ro
Solaris UFS definitely has a better ACL support record: it has supported ACLs for ten years (since version 2.5, released in 1995), while Linux only recently added limited and somewhat inconsistent support (it was incorporated into the kernel only in 2.6; in previous versions it was available only with a patch). Moreover, in Linux ACLs are still more a vulnerability than a security feature due to the behavior of GNU utilities (most of them do not yet understand ACLs). Interoperability of Linux and Solaris for ACLs is limited. NFS works correctly with ACLs only on Solaris. Using GNU utilities like ls and tar on files with ACLs on Solaris can lead to strange results, with wrong permissions displayed or set.
Linux lagged behind in ACL support for quite a long time, which caused a lot of grief for administrators who were used to working with ACLs on other OSes. But this recently changed: ACLs are fully implemented in RHEL 5.2 and later, as well as in Suse 10 SP2 and later. ACLs are additional sets of read/write/execute triplets (rwx) that can be added to files, directories, devices, or any other file system objects on a per-user or per-group basis.
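To make the idea of extra per-user and per-group rwx triplets concrete, here is a simplified Python sketch of how an access check over such entries can work. It follows the evaluation order of the POSIX.1e draft model (owner, named users, groups, other, with a mask limiting everything except owner and other); the dictionary layout and the function name `acl_allows` are assumptions made for this illustration and are not the data structures of either OS.

```python
R, W, X = 4, 2, 1  # permission bits, as in chmod


def acl_allows(acl, uid, gids, want):
    """Return True if all bits in `want` are granted (simplified POSIX.1e draft order)."""
    mask = acl.get("mask", R | W | X)   # the mask caps named-user and group entries
    if uid == acl["owner"]:
        return (acl["owner_perms"] & want) == want
    if uid in acl.get("users", {}):
        return (acl["users"][uid] & mask & want) == want
    group_entries = [acl["group_perms"]] if acl["group"] in gids else []
    group_entries += [p for g, p in acl.get("groups", {}).items() if g in gids]
    if group_entries:
        # One matching group entry must grant the whole request by itself.
        return any((p & mask & want) == want for p in group_entries)
    return (acl["other_perms"] & want) == want


# Hypothetical file: owner 1000 rw-, owning group 100 r--, other ---,
# plus a named-user entry giving uid 1001 rw-, capped by a mask of r--.
acl = {
    "owner": 1000, "owner_perms": R | W,
    "group": 100, "group_perms": R,
    "other_perms": 0,
    "users": {1001: R | W},
    "mask": R,
}
print(acl_allows(acl, 1001, [200], R))   # named user, read survives the mask
print(acl_allows(acl, 1001, [200], W))   # write is removed by the mask
print(acl_allows(acl, 1003, [300], R))   # falls through to "other"
```

The mask is the part that most often surprises administrators: an entry may list rw- and still yield only r-- in effect, which is one reason utilities that ignore ACLs can display misleading permissions.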
Now we can assume that the implementations are more or less equal, with the usual slight Solaris advantage in stability and fewer bugs.
Support of extended attributes is limited in both OSes. See Extended file attributes - Wikipedia, the free encyclopedia:
In Linux, the ext2, ext3, ext4, JFS, ReiserFS and XFS filesystems support extended attributes (abbreviated xattr) if the libattr feature is enabled in the kernel configuration. Any regular file may have a list of extended attributes. Each attribute is denoted by a name and the associated data. The name must be a null-terminated string, and must be prefixed by a namespace identifier and a dot character. Currently, four namespaces exist: user, trusted, security and system. The user namespace has no restrictions with regard to naming or contents. The system namespace is primarily used by the kernel for access control lists. The security namespace is used by SELinux, for example.
Extended attributes are not widely used in user-space programs in Linux, although they are supported in the 2.6 and later versions of the kernel. Beagle does use extended attributes, and freedesktop.org publishes recommendations for their use.
... ... ...The Solaris operating system version 9 and later allows files to have "extended attributes", which are effectively forks. Internally, they are actually stored and accessed like normal files, so their names cannot contain "/" characters, their size is practically unlimited and their ownership and permissions can differ from those of the parent file.
Version 4 of the Network File System supports extended attributes in much the same way as Solaris.
Neither implementation has the quality and flexibility of the NTFS implementation:
Windows NT supports extended attributes on FAT and HPFS filesystems in the same way as OS/2 does. The NTFS file system was also designed to store them, as one of the many possible file forks, to accommodate the OS/2 subsystem. OS/2 extended attributes are accessible to any OS/2 programs the same way as in native OS/2 and to any Windows program through the BackupRead and BackupWrite system calls. They are notably used by the NFS server of the Interix POSIX subsystem in order to implement Unix-like permissions.
The Solaris UFS filesystem does not implement some innovations introduced by later versions of the BSD filesystem, such as the immutable attribute for files, while Linux ext2fs/ext3fs implements them incorrectly. The immutable attribute was a really interesting innovation that originated in the BSD camp. It eliminates the "god-like" status of root: it is not bound to the UID but depends purely on the run level and security level. As the name implies, files with this attribute set can only be read. What is important is that even the root user at higher runlevels cannot write or delete them. The system first needs to be switched to single user mode to perform those operations. This attribute is perfect for military-grade protection of sensitive configuration files and executables: for most such servers patching can and probably should be done in single user mode. I am amazed that Solaris never implemented this concept.
Servers with high requirement for uptime might represent a problem but here one probably needs to use clusters, anyway.
Also immutable file or directory can 't be renamed, no further link can be created to it and it cannot be removed. Note that this also prevents changes to access time, so files with immutable attribute have "no access time" attribute activated implicitly and as such can be accessed faster.
The second impresting additional attribute introduced by BSD can was append-only files: a weaker version of immutable attribute with similar semantic. Append-only files can be opened in write mode, but data is always appended at the end of the file. Like immutable files, they cannot be deleted or renamed. This is especially useful for log files which can only grow. For a directory, this means that you can only add files to it, but you cannot rename or delete any existing file. That means that for directories they actually represent a useful variant of immutable attribute: you can only add files but cannot tough any existing files.
In BSD, access to the filesystem depends on an additional global flag called the securelevel. It is a one-way street: once the securelevel is raised, it cannot be lowered on a running system. At higher securelevels not even root can access the disk directly, which is a classic method of bypassing all other protection mechanisms once one has root access. In some sense securelevels are similar to runlevels.
Linux replicated BSD-style attributes in the ext2/ext3 filesystems, but they are implemented incorrectly, as the key idea behind the BSD solution (that attributes are privileges associated with the run level, not with the UID) is missing. BTW, what Linux implemented is really a Windows-style solution. Here is the list of ext2fs attributes:
There is a third-party patch for the 2.6 kernel that makes the behavior identical to BSD (see Linux-Kernel Archive [PATCH] BSD Secure Levels LSM (1-3)). See also Improving the Unix API.
Without tuning, the native Solaris filesystem behaves badly when serving a huge number of small files concentrated in a few directories (a situation typical for corporate mail servers running Sendmail as well as some spam filters). Here a B-tree based filesystem like Reiser might have an edge, but as I mentioned before, for all practical purposes the Reiser filesystem is dead.
I suspect that Linux can be tuned to perform substantially better in this environment, but the availability of NAS makes this point rather moot in the enterprise environment. In this case the NFS implementation is the one that really matters.
The newer ZFS has not yet been deployed long enough in large enterprise environments for full-fledged comparisons. Still, it is a really modern filesystem with a lot of promise in the performance area. One more thing: the level of total complexity looks very promising, as ZFS appears to be less complex than UFS implementation-wise:
A lot of comparisons have been done, and will continue to be done, between ZFS and other filesystems. People tend to focus on performance, features, and CLI tools as they are easier to compare. I thought I'd take a moment to look at differences in the code complexity between UFS and ZFS. It is well known within the kernel group that UFS is about as brittle as code can get. 20 years of ongoing development, with feature after feature being bolted on tends to result in a rather complicated system. Even the smallest changes can have wide ranging effects, resulting in a huge amount of testing and inevitable panics and escalations. And while SVM is considerably newer, it is a huge beast with its own set of problems. Since ZFS is both a volume manager and a filesystem, we can use this script written by Jeff to count the lines of source code in each component. Not a true measure of complexity, but a reasonable approximation to be sure. Running it on the latest version of the gate yields:
-------------------------------------------------
UFS:   kernel= 46806 user= 40147 total= 86953
SVM:   kernel= 75917 user=161984 total=237901
TOTAL: kernel=122723 user=202131 total=324854
-------------------------------------------------
ZFS:   kernel= 50239 user= 21073 total= 71312
-------------------------------------------------
The numbers are rather astounding. Having written most of the ZFS CLI, I found the most horrifying number to be the 162,000 lines of userland code to support SVM. This is more than twice the size of all the ZFS code (kernel and user) put together! And in the end, ZFS is about 1/5th the size of UFS and SVM. I wonder what those ZFS numbers will look like in 20 years...
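The counting script mentioned in the quote is not reproduced in this page. The following is only a rough stand-in, assuming that "lines of source code" means the raw line count of `*.c` and `*.h` files under a directory; the function names `count_source_lines` and `percent_of` are hypothetical.

```shell
# Hypothetical stand-in for the line-counting script mentioned in the quote.
# Counts raw lines (comments and blank lines included) in *.c and *.h files.
count_source_lines() {
    # $1: root directory of a source tree
    find "$1" -type f \( -name '*.c' -o -name '*.h' \) -exec cat {} + |
        wc -l | tr -d ' '
}

# Integer percentage of $1 relative to $2 (truncated, not rounded).
percent_of() {
    echo $(( $1 * 100 / $2 ))
}
```

With the totals quoted above, `percent_of 71312 324854` prints 21, which is consistent with the "about 1/5th" estimate in the quote.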
ZFS also has some interesting ideas in CLI design:
... One of the hardest parts of designing an effective CLI is to make it simple enough for new users to understand, but powerful enough so that veterans can tweak everything they need to. With that in mind, we adopted a common design philosophy:
"Simple enough for 90% of the users to understand, powerful enough for the other 10% to use"
A good example of this philosophy is the 'zfs list' command. I plan to delve into some of the history behind its development at a later point, but you can quickly see the difference between the two audiences. Most users will just use 'zfs list':
$ zfs list
NAME       USED  AVAIL  REFER  MOUNTPOINT
tank      55.5K  73.9G   9.5K  /tank
tank/bar     8K  73.9G     8K  /tank/bar
tank/foo     8K  73.9G     8K  /tank/foo
But a closer look at the usage reveals a lot more power under the hood:
list [-rH] [-o property[,property]...] [-t type[,type]...] [filesystem|volume|snapshot] ...
In particular, you can ask questions like 'what is the amount of space used by all snapshots under tank/home?' We made sure that sufficient options existed so that power users could script whatever custom tools they wanted.
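The snapshot question in the quote can be sketched as a small pipeline. This is a minimal sketch, not the ZFS authors' tooling: it assumes the options from the usage line above (`-r`, `-H`, `-o`, `-t`) and that sizes are printed with K/M/G/T suffixes as powers of 1024. The helper name `sum_sizes` is hypothetical.

```shell
# Sum human-readable sizes (e.g. 8K, 55.5K, 1.2G) read one per line from stdin
# and print the total in bytes. The suffix handling is an assumption about
# the output format, not something taken from the ZFS documentation.
sum_sizes() {
    awk '
        {
            n = $1 + 0                      # numeric prefix, e.g. 55.5
            u = substr($1, length($1), 1)   # last character, possibly a unit
            if      (u == "K") n *= 1024
            else if (u == "M") n *= 1024 * 1024
            else if (u == "G") n *= 1024 * 1024 * 1024
            else if (u == "T") n *= 1024 * 1024 * 1024 * 1024
            total += n
        }
        END { printf "%.0f\n", total }
    '
}

# Intended use on a system with ZFS (not run here):
#   zfs list -rH -o used -t snapshot tank/home | sum_sizes
```

Keeping the summing step separate from the `zfs` call means it can be tried on sample data without a pool.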
Solution driven error messages
Having good error messages is a requirement for any reasonably complicated system. The Solaris Fault Management Architecture has proved that users understand and appreciate error messages that tell you exactly what is wrong in plain English, along with how it can be fixed.
A great example of this is through the 'zpool status' output. Once again, I'll go into some more detail about the FMA integration in a future post, but you can quickly see how basic FMA integration really allows the user to get meaningful diagnostics on their pool:
$ zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool online' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0s0  ONLINE       0     0     3
            c0d0s0  ONLINE       0     0     0
Consistent command syntax
When it comes to command line syntax, everyone seems to have a different idea of what makes the most sense. When we started redesigning the CLI, we took a look at a bunch of other tools in Solaris, focusing on some of the more recent ones which had undergone a more rigorous design. In the end, our primary source of inspiration was the SMF (Service Management Facility) commands. To that end, every zfs(1M) and zpool(1M) command has the following syntax:
<command> <verb> <options> <noun> ...
There are no "required options". We tried to avoid positional parameters at all costs, but there are certain subcommands (zfs get, zfs clone, zpool replace, etc) that fundamentally require multiple operands. In these cases, we try to direct the user with informative error messages indicating that they may have forgotten a parameter:
# zpool create c1d0 c0d0
cannot create 'c1d0': pool name is reserved
pool name may have been omitted
If you mistype something and find that the error message is confusing, please let us know - we take error messages very seriously. We've already had some feedback for certain commands (such as 'zfs clone') that we're working on.
Modular interface design
On a source level, the initial code had some serious issues around interface boundaries. The problem is that the user/kernel interface is managed through ioctl(2) calls to /dev/zfs. While this is a perfectly fine solution, we wound up with multiple consumers all issuing these ioctl() calls directly, making it very difficult to evolve this interface cleanly. Since we knew that we were going to have multiple userland consumers (zpool and zfs), it made much more sense to construct a library (libzfs) which was responsible for managing this direct interface, and have it present a unified object-based access method for consumers. This allowed us to centralize logic in one place, and the commands themselves became little more than glorified argument parsers around this library.
The Linux kernel also has several interesting enhancements that increase the flexibility of filesystems. Among them
And that's not surprising. The key developers of Linux are approaching "greybeard" mode, and in ten years the key designer might well be on his way to a personal yacht and the other entertainments of dot-com millionaires. In other words, the situation can be aptly characterized by the immortal words of the anarchist Zheleznyakov: "the guard is tired".
As much as I do not like Java, it did become the new Cobol and the most common development language for enterprise developers. Here Solaris has a definite edge over Linux in the quality of implementation: Java is native to the Solaris environment, and the amount of attention developers pay to this environment is second only to Windows. Therefore Solaris has a home-field advantage over Linux in this area, which is very important for large enterprises. Solaris has an additional edge due to its excellent support of threads. Linux provides several third-party threads packages. Information on a variety of threads implementations available for Linux can be found in the Linux Documentation Project, www.tldp.org/FAQ/Threads-FAQ/.
The most common package is the LinuxThreads package based on the 2.4 kernels, which is present in GNU libc version 2 and provided as part of all current distributions. While similar to the POSIX threads implementation, it has a number of shortcomings. For more information see http://pauillac.inria.fr/~xleroy/linuxthreads . Detailed API comparisons can be found at www.mvista.com/developer/shared/SolMig.pdf .
There is also the newer Native POSIX Threads Library (NPTL). The NPTL implementation is much closer to the POSIX standard and is based on the 2.6 kernel. This version is also included in GNU libc and has been backported to some distributions with the RH kernel; see www.redhat.com/whitepapers/developer/POSIX_Linux_Threading.pdf
HP provides its customers with a Solaris threads-compatible threads library for Linux; see www.opensource.hp.com/the_source/linux_papers/scl_solaris.htm . The project homepage is located at www.sourceforge.net/projects/sctl .
Solaris has the concept of processor sets (CPU sets). CPU sets let you restrict which processors are utilized by various processes or process groups. This is a very useful feature on systems where you can benefit from static allocation of CPU resources, and it can help to prevent certain types of denial of service attacks.
Linux does not have this capability in standard kernels, but some third-party tools are available; see www.bullopensource.org/cpuset/ . HP provides its Process Resource Manager (PRM) for its customers: www.hp.com/go/prm
It looks like on servers with 4 CPUs Linux is still competitive. The Sun Fire V40z holds several world benchmark records, some of them, paradoxically, achieved under Linux. It demonstrated a SPECint_rate2000 score of 89.9:
SPEC CPU2000 is an industry-standard benchmark that measures CPU and memory intensive computing tasks. It is made up of two benchmark suites focused on integer and floating point performance of the processor, memory and compiler on the tested system. The Sun Fire V40z server, equipped with four AMD Opteron(TM) Model 856 processors and running SuSE Linux (SLES9), achieved the record breaking SPECint_rate2000 score of 89.9.
Still, the world record for 4-CPU floating point throughput performance on x86 systems belongs to Solaris 10:
The combination of the Solaris 10 OS and Sun(TM) Studio 11 software enabled the Sun Fire V40z server, equipped with four AMD Opteron Model 856 processors, to generate the SPECfp_rate2000 result of 106, effectively more than doubling the score of 52.5 produced by the competing Dell PowerEdge 6850 server, equipped with four Intel Xeon processors.
The difference from Linux is marginal, though (106 vs. 100.37):
Based on real world applications, the SPEC CPU2000 suite measures the performance of the processor, memory and compiler on the tested system. The Sun Fire V40z server, equipped with four AMD Opteron Model 856 processors and running SuSE Linux (SLES9), beats other 4-CPU x86 Linux systems with SPECfp_rate2000 result of 100.37.
The situation is more favorable for Solaris on servers with 8 CPUs, but the difference is still marginal:
On the floating point throughput component of the compute-intensive SPEC CPU2000 benchmark, the Sun Fire V40z server, equipped with the latest multi-core AMD Opteron 880 processors, demonstrates linear scalability with the processor frequency, when compared to the previously posted result. By utilizing the latest Sun Studio 11 software running on the Solaris 10 OS, Sun's server achieved the score of 153 and surpassed the previous HP record of 144 by over 6%.
All in all, SMP advantages in Solaris are not pronounced on fewer than 8 CPUs.
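To put the word "marginal" into numbers, the relative gaps between the quoted scores can be computed directly. This is a small sketch; the helper name `pct_gain` is hypothetical, and the figures are only those quoted in the excerpts above.

```shell
# Percentage by which score $1 exceeds score $2, to two decimal places.
pct_gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) * 100 / b }'
}

# Figures quoted above:
#   pct_gain 106 100.37   Solaris 10 vs. SLES9 on the same 4-CPU V40z
#   pct_gain 106 52.5     V40z vs. the Dell PowerEdge 6850
#   pct_gain 153 144      V40z (Opteron 880) vs. the previous HP record
```

The first call gives about 5.6%, the third 6.25% (the "over 6%" of the quote), while the second is just above 100%, i.e. the "more than doubling" claimed against the Xeon-based server.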
Sun provides a set of POSIX utilities for Solaris, and they look pretty basic. This is a serious weakness. The GNU set of utilities, while far from perfect, is definitely more flexible and more sysadmin-friendly. For example:
Solaris does not have the chpasswd utility, and that unnecessarily complicates mass changing of passwords (it requires scripting via expect and its derivatives or deployment of a custom implementation of chpasswd). That's a definite disadvantage compared with Linux and some other commercial Unixes (for example, AIX does provide chpasswd). Moreover, the Linux passwd command (at least on Red Hat-derived distributions) has a --stdin option that permits root to change passwords supplied via standard input. In March 2006 Stephen Hahn promised to write chpasswd for OpenSolaris, but I have not seen results yet. There is also a Perl module, Passwd::Solaris.
The best current workaround for Solaris is to deploy the Perl module Expect.pm (it requires IO::Tty, also available from CPAN, and optionally IO::Stty). If Expect.pm is present on all the boxes, then the required password changes can be done automatically from a central location.
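For comparison, this is roughly what the Linux side looks like. chpasswd reads "user:password" pairs from standard input, so a batch change is a single pipeline. The wrapper below is only a sketch: the function name `batch_passwd` and the `CHPASSWD` override (used for a dry run without root) are assumptions made for illustration.

```shell
# Sketch of a batch password change on Linux.
# Input: "user:password" lines on stdin.
# CHPASSWD is a hypothetical override hook, e.g. CHPASSWD=cat for a dry run.
batch_passwd() {
    input=$(cat)
    # Refuse the whole batch if any line lacks a non-empty user or password.
    if printf '%s\n' "$input" | grep -qv '^[^:][^:]*:..*$'; then
        echo "batch_passwd: malformed line in input" >&2
        return 1
    fi
    printf '%s\n' "$input" | ${CHPASSWD:-chpasswd}
}

# Intended use as root (not run here):
#   batch_passwd < new_passwords.txt
```

Validating the whole batch before anything is handed to chpasswd avoids a half-applied change when one line is broken.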
In Solaris the "native" grep does not support Perl regular expressions, which is pretty annoying if you need to work with both Solaris and Linux.
Solaris find is much weaker than GNU find. You can use the GNU versions, but their understanding of RBAC and ACLs leaves much to be desired.
The advantage of the Solaris set of utilities is good, clean integration with the OS: they are an integral part of the OS, and it shows. That means full understanding and support of ACLs, RBAC, etc.
But it would be an oversimplification to say that Linux is completely free of similar problems. First of all, despite their additional options, GNU utilities are a mixed blessing, as they smell of Microsoft: arbitrary extensions and a Christmas-tree approach to architecture are clearly visible and probably unavoidable within the framework of GPL licensing. Some extensions are good, but a lot are either unnecessary or plain bad. Integration of GNU utilities and Linux is sometimes pretty spotty. And this goes far beyond complex issues related to the treatment of ACLs. For example, here is how GNU find behaves on Suse 10 (the error was corrected in OpenSuse 10.3, so the SLES find can be replaced with the OpenSuse 10.3 version to avoid this problem):
# find / -name apache
... ... ...
find: /proc/0: No such file or directory
find: WARNING: Hard link count is wrong for /: this may be a bug in your filesystem driver. Automatically turning on find's -noleaf option. Earlier results may have failed to include directories that should have been searched.
ns2:/etc #
This bug is connected with the fact that the /proc filesystem is dynamic. It is true that this "Automatically turning on find's -noleaf option" bug can be avoided using the -prune predicate. For example,
find / -wholename '/proc' -prune -o -name file_to_be_found -print
But for sysadmins this is pretty annoying, and if you administer a lot of Linux boxes it is better to create an alias ff (or, even better, a function ff, but an alias is simpler and we will limit ourselves to this solution):
if [[ `uname` == "Linux" ]] ; then
   # with "find ." paths start with "./", so /proc is seen as ./proc when run from /
   alias ff='find . -wholename "./proc" -prune -o -name '
else
   alias ff='find . -name ' # non-GNU find does not support -wholename
fi
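The function variant mentioned above could look roughly like this. It is a sketch under a few assumptions: it uses -path, the POSIX spelling of GNU find's -wholename; it probes whether the local find accepts that predicate instead of checking uname; and `FF_PRUNE` is a hypothetical variable naming the directory to skip (default /proc).

```shell
# Function variant of the ff alias.
# Usage: ff name_pattern [start_directory]
ff() {
    name=$1
    start=${2:-.}
    # Probe once whether this find understands -path at all.
    if find /dev/null -path /dev/null >/dev/null 2>&1; then
        # Skip the FF_PRUNE tree; the explicit -print keeps the pruned
        # directory itself out of the results.
        find "$start" -path "${FF_PRUNE:-/proc}" -prune -o -name "$name" -print
    else
        # Assumption: an older find without -path; fall back to a plain search.
        find "$start" -name "$name" -print
    fi
}
```

For example, `ff apache /` searches the whole system while skipping /proc. Unlike the alias, the function takes the start directory as an argument, so the prune pattern matches full paths and does not depend on the current directory.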
Linux beats Solaris hands down in package management. While the RPM format is a derivative of the Solaris package format (the Posix package format, to be exact), additional tools like yum make the difference very pronounced.
You cannot talk about internals without talking about system documentation, even if it actually belongs to the applications area. While Unix pioneered online documentation with man, after so many years the man format looks backward and completely obsolete. HTML is the way to go, and Microsoft understood that simple fact early on. Unfortunately neither Solaris nor Linux developers can get that yet. The tools provided for documentation leave much to be desired in both Solaris and Linux. In the past Solaris used to provide HTML documentation with a small web server, but this subsystem disappeared due to some security holes. At least Sun maintains its documentation in XML, so it is translatable into any format imaginable. In addition to man, Linux has documentation in another semi-obsolete format, texinfo (based on Donald Knuth's famous TeX typesetter). This is now a "dead-on-arrival" GNU project, and the inability of GNU folks to face reality and switch to HTML is simply amazing. For Linux one can say that not only does the documentation suck, the formats and tools suck too. Of course you can assemble a pretty decent set of tools yourself (elinks is a very good command-line web browser that can be used for this purpose), but we are talking about what is delivered with a "stock" OS distribution. Again, both commercial Linux distributors and Sun demonstrate an amazing level of incompetence in this area.
Now let's talk briefly about the content of the online documentation freely available with each of the OSes. Solaris has the best online documentation of all free OSes. Just compare the quality of Solaris man pages with Linux man pages to see the difference. While Solaris man pages are far from perfect, they are current, workable documents. In many cases Linux man pages provide just an illusion of information, and some of them in no way correspond to the version of the software installed.
In addition to man pages, Solaris has extensive online documentation. In a modern web-enabled environment only masochistic Solaris administrators ever use the internal Solaris man pages when an HTML version is available from the web. This does not excuse the stupidity of not supplying an HTML version of man pages with the OS, but still it is better than nothing.
Sun's forte is actually in midsize documents called blueprints, which are somewhat similar to the long-dead Linux How-to project. Each blueprint, usually provided in PDF rather than HTML, is devoted to one specific topic. More than a dozen out of approximately a hundred published blueprints are of extremely high technical quality (equal to the best O'Reilly books), and each of them addresses important areas of Solaris deployment, saving many hours for Solaris system administrators. In my opinion the level of fragmentation of Linux into semi-compatible distributions puts the brakes on any meaningful work in this area.
As for the number of free "full-size" books, Sun also looks very good. It provides administrators with free electronic versions of all major books on Solaris administration. Only IBM rivals Sun in the amount and quality of electronic books provided to administrators (IBM's world-famous "Redbooks" series).
Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Created Jan 2, 2005. Last modified: September 12, 2017