Softpanorama May the source be with you, but remember the KISS principle ;-)	Home	Switchboard	Unix Administration	Red Hat	TCP/IP Networks	Neoliberalism	Toxic Managers
	(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix

Softpanorama Bulletin. Vol 31, No. 03 (July, 2019)

Bulletin	1998	1999	2000	2001	2002	2003	2004	2005	2006	2007
2008	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018
2019
Jan	Feb	Mar	Apr	May	Jun	Jul	Sept	Oct	Nov	Dec

The tar pit of Red Hat overcomplexity

Dr. Nikolai Bezroukov

Version 1.62

Introduction
Five flavors of Red Hat
Mutation of Red Hat techsupport onto Knowledgebase's indexing service: further deterioration of the quality of tech support in comparison with RHEL6 caused by overcomplexity
Human competence vs excessive automation
Regression to the previous system image as a troubleshooting method
RHEL 7 fiasco: adding desktop features in a server-oriented distribution
RHEL 7 denies the usefulness of previous knowledge which makes overcomplexity more painful
Note about moving directories /bin, /lib, /lib64, and /sbin to /usr
RHEL licensing
Other problems with RHEL7
Avoiding useless daemons during installation
Conclutions
Acknowledgment

Abstract

The key problem with RHEL7 is its adoption of systemd as well as multiple and significant changes in standard utilities and daemons. Troubleshooting skills needed are different from RHEL6. It also "obsoletes "pre-RHEL 7" books including many (or even most) good books from O'Reilly and other publishers.

The distance between RHEL6 and RHEL7 is approximately the same as the distance between RHEL6 and Suse 12 so that we can view RHEL7 as a new flavor of Linux introduced into an enterprise environment. Which increases the load of a sysadmin who previously supported RHEL5,6 and Suse Enterprise 12 30% or more. The amount of headache and hours you need to spend on the job increased, but salaries not so much. The staff of IT departments continues to shrink, despite the growth of the overcomplexity.

Passing RHCSA exam is now also more difficult than for RHEL 6 and requires at least a year of hands-on experience along with self-study even for those who were certified in RHEL6. "Red Hat engineer" certification became useless as complexity is such that no single person can master the whole system. Due to growing overcomplexity tech support for RHEL7 mutated into Red Hat Knowledgebase indexing service. Most problems now require extensive research to solve, hours of reading articles in Red Hat Knowledgebase, documentation, posts in Stack Overflow and similar sites. In this sense "Red Hat engineer" certification probably should be renamed to "Red Hat detective", or something like that.

This behavior of Red Hat, which amounts to the intentional destruction of the existing server ecosystem by misguided "desktop Linux" enthusiasts within Red Hat developer community, and by Red Hat brass greed, requires some adaptation. As support became increasingly self-support anyway, it might make sense to switch to wider use CentOS on the low end, non-critical servers too, which allows buying Premium licenses for the critical servers.

Introduction

Without that discipline, too often, software teams get lost in what are known in the field as "boil-the-ocean" projects -- vast schemes to improve everything at once. That can be inspiring, but in the end we might prefer that they hunker down and make incremental improvements to rescue us from bugs and viruses and make our computers easier to use.

Idealistic software developers love to dream about world-changing innovations; meanwhile, we wait and wait for all the potholes to be fixed.

Frederick Brooks, Jr,
The Mythical Man-Month, 1975

Imagine a language in which both grammar and vocabulary is changing each decade. Add to this that the syntax is complex, the vocabulary is huge and each verb has a couple of dozens of suffixes that often change its meaning, sometimes drastically. This is the situation of "complexity trap" that we have in enterprise Linux. In this sense troubles with RHEL7 are just a tip of the iceberg. You can learn some subset when you closely work with the particular subsystem (package installation, networking, nfsd, httpd, sshd, Puppet/Ansible, Nagios, and so on and so forth) only to forget vital details after a couple quarters, or a year. I have a feeling that RHEL is spinning out of control at an increasingly rapid pace. Many older sysadmins now have the sense that they no longer can understand the OS they need to manage and were taken hostages by desktop enthusiasts faction within Red Hat. They now need to work in a kind of hostile environment. Many complain that they feel overwhelmed with this avalanche of changes, and can't troubleshoot problems that they used to able to troubleshoot before. In RHEL7 it is easier for an otherwise competent sysadmin to miss a step, or forget something important in the stress and pressure of the moment even for tasks that he knows perfectly well. That means that SNAFU become more common. Overcomplexity of RHEL7, of course, increases the importance of checklists and personal knowledgebase (most often set of files, or a simple website, or a wiki).

But creation of those requires time that is difficult to find and thus is often skipped: it is not uncommon to spent more time on creating documentation to the problem, than on its resolution. So "document everything" mantra is definitely increasing the overload.

The situation reached the stage of the inability to remember the set of information necessary for productive work. Now even "the basic set" is way too large for mere mortals, especially for sysadmins, who need to maintain one additional flavor of Linux (say, Suse Enterprise). Any uncommon task became a research: you need to consult man pages, as well as available on the topic Web pages such as documents at Red Hat portal, Stack overflow discussion, and other relevant to the problem in hand sites. Often you can't perform again the task that you performed a quarter or two ago without consulting your notes and documentation.

Even worse is the resulting from overcomplexity "primitivization" of your style of work. Sometimes you discover an interesting and/or more productive way to perform an important task. If you do not perform this task frequently and do not write this up, it will soon be displaced in your memory with other things: you will completely forget about it and degrade to the "basic" scheme of doing things. This is very typically for any environment that is excessively complex. That's probably why so few enterprise sysadmin those days have their personal .profile and .bashrc files and often simply use defaults. Similarly the way Ansible/Puppet/Chef are used is often a sad joke -- they are used on so primitive level that execution of scripts via NFS would do the same thing with less fuss. The same is true for Nagios which in many cases is "primitivised" to a glorified ping.

RHEL6 was complex enough to cause problems due to overcomplexity. But RHEL7 increased the level of complexity to such a level that it become painful to work with, and especially to troubleshoot complex problems. That's probably why the quality of Red Hat support deteriorated so much (it essentially became referencing service to Red Hat advisories) -- they are overwhelmed, and no longer can concentrate of a single ticket in this river of tickets related to RHEL7 that they receive daily. In many cases it is clear from their answers that they did not even tried to understand the problem you face and just searched the database for keywords.

While adoption of hugely complex, less secure (with a new security exploit approximately once a quarter), and unreliable systemd instead of init probably played a major role in this situation (and if you patch you server periodically you will notice that the lion share of updates includes updates to systemd), other changes were also significant enough to classify RHEL 7 as a new flavor of Linux, distinct from RHEL4-RHEL6 family: along with systemd, multiple new utilities and daemons were introduced as well; also many methods of going common tasks changed, often dramatically. Especially in the area of troubleshooting.

RHEL7 should be viewed as a new flavor of Linux, not as a version upgrade. As such productivity of sysadmins juggling multiple flavors of Linux drops. The value of many (or most) high quality books about Linux was destroyed by this distribution, and there is no joy to see this vandalism

It might well be the resulting complexity crosses some "red line" (as in "The straw that broke the camel's back ") and instantly started to be visible and annoying to vastly more people then before. In other words, with the addition of systemd quantity turned into quality Look, for example, at the recovery of forgotten sysadmin password in RHEL7 vs. RHEL6. This made this version of RHEL dramatically less attractive choice, as learning curve is steep and troubleshooting skills needed are different from RHEL6. The value of many high quality books about Linux was destroyed by this distribution, and there is no joy to see this vandalism. And you feel somewhat like a prisoner of somebody else decisions: no matter whether you approve those changes or not, the choice is not yours: after 2020 most enterprise customers need to switch, as RHEL6 will be retired... You have essentially no control.

RHEL7 might well create the situation in which overcomplexity started to be visible and annoying to vastly more sysadmins then before.
(as in "The straw that broke the camel's back "). In other words, with the addition of systemd quantity turned into quality

Still it will be an exaggeration to say that all sysadmin (or RHEL 7 users in general) resent systemd. Many don't really care. Not much changed for "click-click-click" sysadmins who limit themselves to GUI tools, and other categories of users who use the RHEL7 "as is", such Linux laptop users. But even for laptop users (who mostly use Ubuntu, not RHEL), there are doubts that they are benefitting from faster boot ( systemd is at least 10 times larger then classic init and SSD drives accelerated loading of daemons dramatically, making delayed approach adopted by systemd obsolete). It is also easier for entry level sysadmins who are coming to the enterprises right now and for whom RHEL7 is their first system. Most use Ubuntu at home and at university and they do not know or use any system without systemd. Also, they use much smaller subset of services and utilities, and generally are not involved with problem connected with more complex problems connected with hanged boots, botched networking, and crashed daemons. But enterprise sysadmins who often are simultaneously "level 3 support" specialists were hit hard. Now they understand internals much less.

In a way, introduction of systemd signify "Microsoftization" of Red Hat. This daemon replaces init in a way I find problematic: the key idea is to replace a language in which init scripts were written (which provides programmability and has its own flaws, which still were fixable) with the fixed set of all singing, all dancing parameters in so called "unit files", removing programmability.

The key idea of systemd is to replace a language in which init scripts were written (which provides programmability and has its own flaws, which still were fixable) with the fixed set of all singing, all dancing parameters grouped in so called "unit files" and the interpret of this "ad hoc" new non-procedural language written in C. This is a flawed approach from the architectural standpoint, and no amount of systemd fixes can change that.

Systemd is essentially an implementation of the interpreter of implicit non-procedural language defined by those parameters by a person, who never have written an interpreter from a programming language in his life, and never studied writing compilers and interpreters as a discipline. Of course, that shows.

It supports over a hundred various parameters which serve as keywords of this ad-hoc language, which we can call "Poettering init language." You can verify the number of introduced keywords yourself, using a pipe like the following one:

cat /lib/systemd/system/[a-z]* | egrep -v "^#|^ |^\[" | cut -d '=' -f 1 | sort | uniq -c | sort -rn

Yes, this is "yet another language" (as if sysadmins did not have enough of them already), and a badly designed one judging from to the total number of keywords/parameters introduced. The most popular are the following 35 (in reverse frequency of occurrence order):

    234 Description
    215 Documentation
    167 After
    145 ExecStart
    125 Type
    124 DefaultDependencies
    105 Before
     78 Conflicts
     68 WantedBy
     56 RemainAfterExit
     55 ConditionPathExists
     47 Requires
     37 Wants
     28 KillMode
     28 Environment
     22 ConditionKernelCommandLine
     21 StandardOutput
     21 AllowIsolate
     19 EnvironmentFile
     19 BusName
     18 ConditionDirectoryNotEmpty
     17 TimeoutSec
     17 StandardInput
     17 Restart
     14 CapabilityBoundingSet
     13 WatchdogSec
     13 StandardError
     13 RefuseManualStart
     13 PrivateTmp
     13 ExecStop
     12 ProtectSystem
     12 ProtectHome
     12 ConditionPathIsReadWrite
     12 Alias
     10 RestartSec

Another warning sign about systemd is paying outsize attention for the subsystem that loads/unloads the initial set of daemons and then manages the working set of daemon, replacing init scripts and runlevels with a different and dramatically more complex alternative. Essentially they replaced reasonably simple and understandable subsystem with some flaws, by a complex and opaque subsystem with non-procedural language defined by the multitude of parameters. While the logic of application of those parameters is hidden within systemd code. Which due to its size has a lot more flaws. As well as side effects because proliferation of those parameters and sub parameters is never ending process: the answer to any new problem discovered in the systemd is the creation of additional parameters or, in best case, modification of existing. Which looks to me like a self-defeating, never ending spiral of adding complexity to this subsystem, which requires incorporation into systemd many things that simply do not belong to the init. Thus making it "all signing all dancing" super daemon. In other words, systemd expansion might be a perverted attempt to solve problems resulting from the fundamental flaws of the chosen approach.

This fascinating story of personal and corporate ambition gone awry still wait for its researcher. When we are thinking about some big disasters like, for example, Titanic, it is not about the outcome -- yes, the ship sank, lives were lost -- or even the obvious cause (yes, the iceberg), but to learn more about WHY. Why the warnings were ignored ? Why such a dangerous course was chosen? One additional and perhaps most important for our times question is: Why the lies that were told were believed?

Currently customers on the ruse are assured that difficulties with systemd (and RHEL7 in general) are just temporary, until the software could be improved and stabilized. While some progress was made, that day might never come due to the architectural flaws of such an approach and the resulting increase in complexity as well as the loss of flexibility, as programmability is now is more limited. If you run the command find /lib/systemd/system | wc -l on a "pristine" RHEL system (just after the installation) you get something like 327 unit files. Does this number raise some questions about the level of complexity of systemd? Yes it does. It's more then three hundred files and with such a number it is reasonable to assume that some of them might have some hidden problem in generation of the correct init file logic on the fly.

You can think of systemd as the "universal" init script that on the fly is customized by supplied unit file parameters. Previously part of this functionality was implemented as a PHP-style pseudo-language within the initial comment block of the regular bash script. While the implementation was very weak (it was never written as a specialized interpreter with formal language definition, diagnostics and such), this approach was not bad at all in comparison with the extreme, "everything is a parameter" approach taken by systems, which eliminated bash from init file space. And it might make to return to such a "mixed" approach in the future on a new level, as in a way systemd provides the foundation for such a language. Parameters used to deal with dependencies and such can be generated and converted into something less complex and more manageable.

One simple and educational experiment that shows brittleness of system approach how is to replace on a freshly installed RHEL7 VM /etc/passwd /etc/shadow and /etc/group with files from RHEL 6 and see what happens during the reboot (BTW such error can happen in any organization with novice sysadmins, who are overworked and want to cut corners during the installation of the new system, so such behaviour is not only of theoretical interest)

Now about the complexity: the growth of codebase (in lines of codes) is probably more then ten times. I read that the systemd has around 100K lines of C code (or 63K if you exclude journald, udevd, etc). In comparison sysvinit has about 7K lines of C code. The total number of lines in all systemd-related subsystems is huge and by some estimates is close to a million (Phoronix Forums). It is well known that the number of bugs grows with the total lines of codes. At a certain level of complexity "quantity turns in quality": the number of bugs became infinite in a sense that the system can't be fully debugged due to intellectual limitations of authors and maintainers in understanding the codebase as well as gradual deterioration of the conceptual integrity with time.

At a certain level of complexity "quantity turns in quality": the number of bugs became infinite in a sense that the system can't be fully debugged due to intellectual limitations of authors and maintainers in understanding the codebase as well as gradual deterioration of the conceptual integrity with time.

It is also easier to implant backdoors in complex code, especially in privileged complex code. In this sense a larger init system means a larger attack surface, which on the current level of Linux complexity is already substantial with the never ending security patches for major subsystems (look at security patches in CentOS Pulse Newsletter, January 2019 #1901). as well as the parallel stream of Zero Day exploits for each major version of Linux on which such powerful organization as NSA is working day and night, as it is a now a part of their toolbox. As the same systemd code is shared by all four major Linux distributions (RHEL, SUSE, Debian, and Ubuntu), systemd represents a lucrative target for Zero Day exploits.

As boot time does not matter for servers (which often are rebooted just a couple of times a year) systemd raised complexity level and made RHEL7 drastically different from RHEL6, while providing nothing constructive in return. The distance between RHEL6 and RHEL7 is approximately the same as distance between RHEL6 and Suse, so we can speak of RHEL7 as a new flavor of Linux and about introduction of a new flavor of Linux into enterprise environment. Which, as any new flavor of Linux, raises the cost of system administration (probably around 20-30%, if the particular enterprise is using mainly RHEL6 with some SLES instances)

Systemd and other changes made RHEL7 as different from RHEL6 as Suse distribution is from Red Hat. Which, as any new flavor of Linux, raises the cost of system administration (probably around 20-30%, if the particular enterprise is using RHEL6 and Suse12 right now)

Hopefully before 2020 we do not need to upgrade to RHEL7, and have time to explore this new monster. But eventually, when support of RHEL 6 ends, all RHEL enterprise users either need to switch to RHEL7 or to alternative distribution such as Oracle Linux or SUSE Enterprise. Theoretically the option to abandon RHEL for other distribution that does not use systemd also exists, but corporate inertia is way to high and such a move is too risky to materialize. Some narrow segments, such as research probably can use Debian without systemd, but in general that answer is no. Because this is a Catch 22 -- adopting distribution without systemd raised the level complexity (via increased diversification of Linux flavors) to the same levels as adopting RHEL 7 with system.

Still, at least theoretically, this is actually an opportunity for Oracle to bring Solaris solution of this problem to Linux, but I doubt that they want to spend money on it. They probably will continue their successful attempts to clone Red Hat (under the name of Oracle Linux) improving kernel and some other subsystems to work better with Oracle database. They might provide some extension of the useful life of RHEL6 though, as they did with RHEL 5.

Taking into account dominant market share of RHEL (which became Microsoft of Linux) finding an alternative ("no-systemd) distribution is a difficult proposition. But Red Hat acquisition by IBM might change that. Neither HP, nor Oracle have any warm feelings toward IBM and preexisting Red Hat neutrality to major hardware and enterprise software vendors now disappeared, making IBM de-facto owner of enterprise Linux. As IBM enterprise software directly competes with enterprise software from HP and Oracle, that fact gives IBM a substantial advantage.

IBM acquisition of Red Hat made IBM the de-facto owner of Enterprise Linux

One emerging alternative to CentOS in research is Devuan (a clone of Debian created specifically to avoid using systemd). If this distribution survives till 2020, it can be used by customers who do not depend much on commercial software (for example, for genomic computational clusters). But the problem is that right now this is not an enterprise distribution per se, as there no commercial support for it. I hope that they will create some startup to provide it. Still it is pretty reliable and can compete with CentOS. Latest versions of Debian also can be installed without systemd and Gnome, but systemd remains the default option.

But most probably you will be forced to convert in the same way as you was forced to convert from Windows 7 to Windows 10 on enterprise desktop. So much about open source as something providing a choice. This is a choice similar to famous quote from Henry Ford days: "A customer can have a car painted any color he wants as long as it’s black". With the current complexity of Linux the key question is "Open for whom?" For hackers? Availability of the source code does not affect much most enterprise customers as the main mode of installation of OS and packages is via binary, precompiled RPMs, not the source complied on the fly like in Gentoo or FreeBSD/OpenBSD. I do not see RHEL7 in this respect radically different from, say, Solaris 11.4, which is not supplied in source form for most customers. BTW while more expensive (and with the uncertain future) is also more secure (due to RBAC and security via obscurity ;-) and has a better polished light weight virtual machines (aka zones, including branded zones ), DTrace, as well as a better filesystem (ZFS), although XFS is not bad iether, although it is unable fully utilize advantages of SSD disks.

Actually Lennart Poettering is an interesting Trojan horse within the Linux developers community. This is an example how one talented, motivated and productive Apple (or Windows) desktop biased C programmer can cripple a large open source project facing no any organized resistance. Of course, that means that his goals align with the goals of Red Hat management, which is to control the Linux environment in a way similar to Microsoft -- via complexity (Microsoft can be called the king of software complexity) providing for lame sysadmins GUI based tools for "click, click, click" style administration. Also standardization on GUI-based tools for the administration provides more flexibility to change internals without annoying users. The tools inner working of which they do not understand and are not interested in understanding. He single-handedly created a movement, which I would call "Occupy Linux" ;-). See Systemd Humor.

And respite wide resentment, I did not see the slogan "Boycott RHEL 7" too often and none on major IT websites jointed the "resistance". The key for stowing systemd down the developers throats was its close integration with Gnome (both Suse and Debian/Ubuntu adopted systemd). It is evident, that the absence of "architectural council" in projects like Linux is a serious weakness. It also suggests that developers from companies representing major Linux distribution became uncontrolled elite of Linux world, a kind of "open source nomenklatura", if you wish. In other words we see the "iron law of oligarchy" in action here. See also Systemd invasion into Linux distributions

Five flavors of Red Hat

Like ice-cream Red Hat exist in multiple flavors:

RHEL -- leading commercial distribution of Linux. The lowest cost is HPC node license which is $100 per year (needs a headnode license; provides only patches -- self-support -- with restricted number of packages in repositories). Patches only regular Enterprise version (self-support license) is $350 per year. Standard license which provides patches and "mostly" web-based support is $800 per server per year (two sockets max; 4 sockets are extra). Premium license that provides phone support for severity 1 and 2 (24 x 7) is more expensive (~ $1300 per year). See for example CDW prices

Three year licenses are also available and are slightly cheaper. From HP and Dell you can by five years licenses as well. There is also HPC license in which you license the headnode with regular, or premium license and then each computational node is $100 each (with severely limited set of packages, and no GUI). I think Oracle Linux self-support license is a better deal then RHEL HPC license (see below)

Red Hat also went "IBM way" and now is charging for 4 socket systems more then for two socket systems. Which now with 16 core CPU available (which means 32 core in a two socket server) are needed only for special, mostly computational and database applications. Few applications scale well above 32 cores as memory management became more convoluted with each additional core (although some categories of user do not understand that and thus belong to the bizarre pseudo-scientific cult -- with the main article of faith "the more cores, the better" ;-)

Oracle Linux. For all practical purposes this is a distribution identical to RHEL, but slightly cheaper. Self-support for enterprise version is almost three times cheaper ( ~$120 per year). It can be used either with Red Hat stock kernel or with (free) Oracle supplied kernel, which supposedly is less buggy and better optimized for running Oracle database and similar application (MySQL is one). The Oracle move to milk weaknesses of GPL was pretty unique and it proved to be shrewd and highly successful.

CentOS -- A community supported distribution that now is sponsored by Red Hat. Both distribution itself and patches are free and are of reasonable, usable quality. Although the general quality is less then Oracle Linux. Suffers from limited resources including access to servers to download patches (kind of "tragedy of commons" type of problems with any free distributions -- in a way CentOS is a victim of its own success). Releases of a clone of the most recent Red Hat version is typically delayed by several months, which is actually a good thing, except for beta-addicts. Here is how they define it in wiki:

CentOS is an Enterprise Linux distribution based on the freely available sources from Red Hat Enterprise Linux. Each CentOS version is supported for 7 years (by means of security updates). A new CentOS version is released every 2 years and each CentOS version is periodically updated (roughly every 6 months) to support newer hardware. This results in a secure, low-maintenance, reliable, predictable and reproducible Linux environment.

Scientific Linux is RHEL based distribution produced by Fermi National Accelerator Laboratory and the European Organization for Nuclear Research (CERN). The web site is Scientific Linux. Like CentOS it aims to be "as close to the commercial enterprise distribution as we can get it." It is a very attractive option for computational nodes of HPC clusters. It also appeals to individual application programmers in such areas as genomic and molecular modeling as it provides a slightly better environment for software development then stock Red Hat.

Fedora -- community distribution, which generally can be considered as a beta version of RHEL. Attractive mostly for beta addicts. Red Hat Enterprise Linux 6 was forked from Fedora 12 and contains back ported features from Fedora 13 and 14. It was the first distribution to adopt systemd: versions of Fedora, starting with Fedora 15 use systemd daemon by Lennart Poettering (who is a well-known desktop Linux enthusiast and Red Hat developer (mainly in C) and who produced, saying politely, "mixed feelings" in server sysadmin community; along with systemd he also authored two other desktop oriented daemons -- pulseaudio and avahi, which also found their way into the server space)

We can also mention SUSE Enterprise, which historically was derived from RHEL and still is by and large compatible with it. While using RPM packages it uses a different package manager. It also allow to use a very elegant AppArmor kernel module for enhanced security (RHEL does not allow to use this module).

Mutation of Red Hat techsupport onto Knowledgebase's indexing service: further deterioration of the quality of tech support in comparison with RHEL6 caused by overcomplexity

The quality of Red Hat tech support progressively deteriorated from RHEL 4 to RHEL 6. So further deterioration is nothing new. It is just a new stage of the same process caused by growing overcomplexity of OS and correspondingly difficulties in providing support to the billion of features present in it.

But while this is nothing new, the drop in case of RHEL 7 was pretty noticeable and pretty painful. Despite several level of support included in licenses (with premium supposedly to be higher level) technical support for really complex cases for RHEL7 is uniformly weak. It degenerated into "looking in database" type of service: an indexing service for Red Hat vast Knowledgebase.

While Knowledgebase is a useful tool and along with Stack overflow often provides good tips were to look, it fails short of classic notion of tech support. Only in rare cases the article recommended exactly describes your problem. Usually this is just a starting point of your own research, no more no less. In some cases references to articles provided are simply misleading and it is completely clear that engineers on the other end have no time or desire seriously word on your tickers and help you.

I would also dispute the notion that the title of Red Hat engineer makes sense for RHEL 7. This distribution is so complex that is it clearly above the ability to comprehend it for even talanted and dedicated professionals. Each area such as networking, process management, installation of packages and patching, security, etc is incredibly complex.   Certification exam barely scratched the surface of this complexity.

Of course, much depends on individual and probably it does serve a useful role providing some assurance that the particular person can learn further, but in no way passing it means that the person has the ability to troubleshoot complex problem connected with RHEL7 (the usual meaning of the term engineer). This is just an illusion. He might be able to cover "in depth" one or two area, for example troblshooing of netwroking and package installation. or troubleshooting of web servers or troubleshooting Docker. In all other areas he will be much less advanced tepoblshooter, almost a novice. And in no way the person can possesses "universal" troubleshooting skill for Red Hat. just network stack is so complex that it now requires not a vendor certification (like Cisco in the past), but PHD in computer science in this area from the top university. And even that might not be enough, especially in environment where proxy is present and some complex staff like bonded interfaces are used. Just making Docker to pull images from repositories, if your network uses Web proxy is a non-trivial exercise ;-).

If you have a complex problem, you are usually stuck, although premium service provide you an opportunity to talk with a live person, which might help. for appliances and hardware you now need to resort to helpdesk of OEM (which means getting extended 5 years warraty from OEM is now a must, not an option).

In a way, unless you buy premium license, the only way to use RHEL7 is "as is". While possible for most production servers, this is not a very attractive proposition for a research environment.

Of course, this deterioration is only particularly connected with RHEL7 idiosyncrasies. Linux generally became a more complex OS. But the trend from RHEL4 to RHEL7 is a clear trend to a Byzantine OS that nobody actually knows well (Even a number of utilities now is such that nobody knows probably more then 30% or 50% of them; just yum and installation of packages in general represent such a complex maze, that you might learn it all your professional career). Other parts in IT infrastructure are also growing in complexity and additional security measures throw an additional monkey wrench

The level of complexity definitely reached human limits and I observed several times that even if you learn some utility during particular case of troubleshooting you will soon forget it if the problem doers not reoccur, say with a year. Look like acquired knowledge is displaced by new and wiped out die to limited capacity of human brain to hold information. In this sense the title "Red Hat Engineer" became a sad joke.   The title probably should be called "Red Hat detective", as many problems are really mysteries which deserve new Agatha Christy to tell the world.

With RHEL 7 the "Red Hat Certified Engineer" certification became a sad joke as any such "engineer" ability to troubleshoot complex cases deteriorated significantly. This is clearly visible when you are working with certified Red Hat engineers from Dell or HP, when using their Professional Services

The premium support level RHEL7 is very expensive for small and medium firms. But now there no free lunch and if you are using commercial version of RHEL you simply need to pay this annual maintenance for some critical servers. Just as insurance. But, truth be told, for many servers you might well be OK with just patches (and/or buy higher level. premium support license only for one server out of bunch of similar servers). Standard support RHEL7 is only marginally better that self-support using Google and access to Red Hat knowledgebase.

Generally the less static your datacenter is, and the more unique type of servers you use, the more premium support licenses you need. But you rarely need more then one for each distinct type of the servers (aka "server group"). Moreover for certain type of problems, for example driver related problems, now you need such level of support not from Red Hat, but from you hardware vendor as they provide better quality for problems related to hardware, as they know it much better.

For database servers getting a license from Oracle makes sense too as Oracle engineers are clearly superior in this area. So diversification of licensing and minimization and strategic placement of number of the most expensive premium licenses now makes perfect sense and provides you is an opportunity to save some money.   This approach is used for a long time for computational clusters where typically only one node (the headnode) gets the premium support license, and all other nodes get the cheapest license possible or even running Academic Linux or CentOS. Now it is time to generalize this approach to other part of the enterprise datacenter.

Even if you learned something important today you will soon forget if you do not use it as there way too may utilities, application, configuration files, etc. You name it. Keeping your own knowledgebase as a set of html files or private wiki is now the necessity. Lab journal type of logs are now not enough

Another factor that makes this "selective licensing" necessary is that prices increased from RHEL5 to RHEL7 (especially is you use virtual guests a lot; see discussion at RHEL 6 how much for your package (The Register). Of course, Xen is preferable for creation of virtual machines, so usage of "native" RHEL VM is just an issue due to inertia. Also now instead of full virtualization in many cases can use light weight virtualization (or special packaging, if you wish) via Docker (as long as major version of the kernel needed is the same). In any case this Red Hat attempt to milk the cow by charging for each virtual guest is so IBM-like that it can generate nothing but resentment, and the resulting from it desire to screw Red Hat in return. The feeling for a long time known for IBM enterprise software customers ;-).

First I was disappointed with the quality of RHEL7 support and even complained, but with time I start to understand that they have no choice but screw customers -- the product is way over their heads and the number of tickets to be resolved is too high resulting in overload. The fact that the next day other person can work in your ticker adds insult to the injury. That's why for a typical ticket their help now is limited to pointing you to relevant (or semi-relevant ;-) articles in the knowledgebase. Premium support still is above average, and they can switch you to a live engineer on a different continent in critical cases in later hours, so if server downtime is important this is a kind of (weak) insurance. In any case, my impression is that Red Hat support is probably overwhelmed by problems with RHEL7 and even for subsystems fully developed by Red Hat such as subscription manager and yum. Unless you are lucky and accidentally get really knowledgeable guy, who is willing to help. Attempt to upgrade the ticket to higher level of severity sometimes help, but usually does not.

The other problem (that I already mentioned) is that ticket are now assigned to a new person each day, so if the ticket in not resolved by the first person, you get in treadmill with each new person starting from scratch. That's why now it is important to submit with each ticket a good write-up and SOS file from the very beginning, as this slightly increase you chances that the first support person will give you a valid advice, not just as semi-useless reference to the article in the knowledgebase. Otherwise their replies often demonstrates complete lack of understanding of what problem you are experiencing.

It the ticket is important for you, in no way you can just submit a description of the problem now. You always need to submit SOS tarball, and, preferably, some result of your initial research. Do not expect they will be looked closely. But just the process of organizing all the information you know for submission greatly help to improve you understanding of the problem, and sometimes led to resolution of ticket by yourself.   As RHEL engineers get used to work with plain vanilla RHEL installed you generally need to point what deviations your install has (proxy is one example --- it should be mentioned and configuration specifically listed; complex cards like Intel four port 10Gbit cards, or Mellanox Ethernet cards with Infiniband layer from HP need to be mentioned too)

Human expert competence vs excessive automation

Linux is a system complexity of which is far beyond the ability of mere mortals. Even seasoned sysadmins with, say, 20 years under the best know of tiny portion of the system -- the portion which their previous experience allows them to study and learn. And in this sense introduction of more and more complex subsystems into Linux is a crime. It simply makes already bad situation worse.

The paradox of complex system with more "intelligence" that now tend to replace simple systems that existed before is that in "normal circumstances" they accommodate incompetence. As Guardian noted they are made easy to operate by automatically correcting mistakes.

Earl Wiener, a cult figure in aviation safety, coined what is known as Wiener’s Laws of aviation and human error. One of them was: “Digital devices tune out small errors while creating opportunities for large errors.” We might rephrase it as:

“Automation will routinely tidy up ordinary messes, but occasionally create an extraordinary mess.” It is an insight that applies far beyond aviation.

The behaviour of Anaconda installer in RHEL7 is a good example of this trend. Its behaviour is infuriating for an expert (constant "hand holding"), but is a life saver for a novice and most decisions are taken automatically (it even prevents novices from accidentally wiping out existing linux partitions) and while the system is most probably suboptimal both in disk structure (I think it is wrong to put root partition under LVM: that complicates troubleshooting and increases the possibility of data loss) and in selection of packages (that can be corrected), but it probably will boot after the installation.

Because of this, a unqualified operator can function for a long time before his lack of skill becomes apparent – his incompetence is a hidden and if everything is OK such situation can persist almost indefinitely. The problems starts as soon as situation goes out of control, for example server crashes or for some reason became unbootable. And God forbid if such system contains valuable data. There is some subtle analogy in what Red Hat did with RHEL7 and what Boeing did with 737 MAX. Both tried to hide the flaw in architecture and overcomplexity via introduction of additional of software to make it usable for unqualified operator.

Even if a sysadmin was an expert, automatic systems erode their skills by removing the need for practice. It also diminish the level of understanding by introducing an additional intermediary in situations where previously sysadmin operated "directly". This is the case with systemd -- it does introduce a powerful intermediary for many operations that previously were manual and make booting process much less transparent and more difficult to tprblshoot if something goes wrong. In such instances systemd becomes an enemy of sysadmin instead of a helper.   This is a similar situation as with HP RAID controllers which when they lose configuration happily wipe out your disk array creating a new just to help you.

But such systems tend to fail either in unusual situations or in ways that produce unusual situations, requiring a particularly skilful response. A more capable and reliable automatic system is, the more chances that the situation in case of disaster will be worse.

When something goes wrong in such situations, it is hard to deal with a situation that is very likely to be bewildering.

As aside note there is a way to construct interface in a way that it can even train system in command line.

Regression to the previous system image as a troubleshooting method

Overcomplexity of RHEL7 also means also that you need to spend real money on configuration management (or hire a couple of dedicated guys to create your own system) as comparison of "states" or the system and regression to the previous state often are now the only viable method of resolving, or at least mitigating the problem you face.

In view of those complications, the ability to return to the "previous system image" now is a must and software such as ReaR (Relax-and-Recover ) needs to be known very well by each RHEL7 sysadmin.   Capacity of low profile USB 3.0 drives (FIT drives) now reached 256GB (and Samsung SSD is also small enough that it can hang from the server USB port without much troubles which increase this local space to 1TB), and they are now capable to store at least a one system image locally on the server (if you exclude large data directories), providing bare metal recovery to "known good" state of the system.

The option of booting to the last good image is what you might desperately need when you can't resolve the problem, which now happen more and more often as in many cases troubleshooting leads nowhere. For those who are averse to USB "solution" as "unclean", can , of course, stored images on NFS.

In any case this is the capability that you will definitely need in case of some "induced" problem such as botched upgrades. Dell servers have flash cards in the enterprise version of DRAC, which can be used as perfect local repository of configuration information (via git or any other suitable method), which will survive the crash of the server. And configuration information stored in /etc, /boot, /root and few other places is the most important information for the recovery of the system. What is important is that this repositories be updated continuously with each change of the state the system.

Using GIT or Subversion (or any other version control system that you are comfortable with) also mow makes more sense GIT is not well suited for sysadmin changes management control as by default it does not preserve attributes and ACLs, but some packages like etckeeper add this capability. Of course, etckeeper is far from being perfect but at least it can serve as a starting point). That capitalizes on the fact that after the installation OS usually works ;-) It just deteriorates with time. So we are starting to observe the phenomenon who is well known to Windows administrators: self-disintegration of OS with time ;-) The problems typically comes later with additional packages, libraries, etc, which often are not critical for system functionality. "Keep it simple stupid" is a good approach here. Although for servers that are in research this mantra is impossible to implement.

Due to overcomplexity, you should be extremely frugal with packages that you keep on the server. For example, for many servers you do not need X11 to be installed (and Gnome is a horrible package, if you ask me. LXDE (which is default on Knoppix) is smaller and is adequate for most sysadmins (even Cockpit, which has even smaller footprint might be adequate). That cuts a lot of packages, cuts the size of /etc directory and a lot of complexity. Avoiding a complex package in favor of simpler and leaner alternative is almost always worthwhile approach.

RHEL 7 fiasco: adding desktop features in a server-oriented distribution

RHEL 7 looks like strange mix of desktop functionality artificially added to distribution which is oriented strictly on the server space (since the word "Enterprise" in the name.) This increase in complexity does not provide any noticeable return on the substantial investment in learning of yet another flavor of Linux.   Which is absolutely necessary as in no way RHEL6 and RHEL7 can be viewed as a single flavor of Linux. The difference is similar to the difference between RHEL6 and SLES 10.

Emergence of Devian was a strong kick in the chin of Red Hat honchos, and they doubled their efforts in pushing systemd and resolving existing problem with some positive results. Improvement of systemd during the life of RHEL 7 are undeniable and its quality is higher in recent RHEL 7 releases, starting with RHEL7.5. But architectural problems can't be solved by doubling the efforts to debug existing problems. They remain, and people who can compare systemd with Solaris 10 solution instantly can see all the defects of the chosen architecture.

Similarly Red Hat honchos either out of envy to Apple (and/or Microsoft) success in desktop space, or for some other reason ( such as the ability to dictate the composition of enterprise Linux) make this server oriented Linux distribution a hostage of desktop Linux enthusiasts whims, and by doing so broke way too many things. And systemd is just one, although the most egregious example of this activity (see, for example, the discussion at Systemd invasion into Linux Server space.)

You can browse horror stories about systemd using any search engine you like (they are very educational). Here is random find from Down The Tech Rabbit Hole – systemd, BSD, Dockers (May 31, 2018):

...The comment just below that is telling… (Why I like the BSD forums – so many clueful in so compact a space…)

Jan 31, 2018
#22
CoreOS _heavily_ relies on and makes use of systemd and provides no secure multi-tenancy as it only leverages cgroups and namespaces and lots of wallpaper and duct tape (called e.g. docker or LXC) over all the air gaps in-between…

Most importantly, systemd can’t be even remotely considered production ready (just 3 random examples that popped up first and/or came to mind…), and secondly, cgroups and namespaces (combined with all the docker/LXC duct tape) might be a convenient toolset for development and offer some great features for this use case, but all 3 were never meant to provide secure isolation for containers; so they shouldn’t be used in production if you want/need secure isolation and multi-tenancy (which IMHO you should always want in a production environment).

SmartOS uses zones, respectively LX-zones, for deployment of docker containers. So each container actually has his own full network stack and is safely contained within a zone. This provides essentially the same level of isolation as running a separate KVM VM for each container (which seems to be the default solution in the linux/docker world today) – but zones run on bare-metal without all the VM and additional kernel/OS/filesystem overhead the fully-fledged KVM VM drags along.[…]

The three links having tree horror stories of systemD induced crap. Makes me feel all warm and fuzzy that I rejected it on sight as the wrong idea. Confirmation, what a joy.

First link:

10 June 2016
Postmortem of yesterday’s downtime

Yesterday we had a bad outage. From 22:25 to 22:58 most of our servers were down and serving 503 errors. As is common with these scenarios the cause was cascading failures which we go into detail below.

Every day we serve millions of API requests, and thousands of businesses depend on us – we deeply regret downtime of any nature, but it’s also an opportunity for us to learn and make sure it doesn’t happen in the future. Below is yesterday’s postmortem. We’ve taken several steps to remove single point of failures and ensure this kind of scenario never repeats again.

Timeline

While investigating high CPU usage on a number of our CoreOS machines we found that systemd-journald was consuming a lot of CPU.

Research led us to https://github.com/coreos/bugs/issues/1162 which included a suggested fix. The fix was tested and we confirmed that systemd-journald CPU usage had dropped significantly. The fix was then tested on two other machines a few minutes apart, also successfully lowering CPU use, with no signs of service interruption.

Satisfied that the fix was safe it was then rolled out to all of our machines sequentially. At this point there was a flood of pages as most of our infrastructure began to fail. Restarting systemd-journald had caused docker to restart on each machine, killing all running containers. As the fix was run on all of our machines in quick succession all of our fleet units went down at roughly the same time, including some that we rely on as part of our service discovery architecture. Several other compounding issues meant that our architecture was unable to heal itself. Once key pieces of our infrastructure were brought back up manually the services were able to recover...

I would just to add that any sysadmin who is working behind the proxy in enterprise environment would be happy to smash a cake or two into systemd developers face. Here is an explanation of what in involved in adding proxy to Docker configuration ( Control Docker with systemd Docker Documentation ):

HTTP/HTTPS proxy

The Docker daemon uses the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environmental variables in its start-up environment to configure HTTP or HTTPS proxy behavior. You cannot configure these environment variables using the daemon.json file.

This example overrides the default docker.service file. If you are behind an HTTP or HTTPS proxy server, for example in corporate settings, you need to add this configuration in the Docker systemd service file.

Create a systemd drop-in directory for the docker service:
$ sudo mkdir -p /etc/systemd/system/docker.service.d

Create a file called /etc/systemd/system/docker.service.d/http-proxy.conf that adds the HTTP_PROXY environment variable:
[Service] Environment="HTTP_PROXY=http://proxy.example.com:80/"

Or, if you are behind an HTTPS proxy server, create a file called /etc/systemd/system/docker.service.d/https-proxy.conf that adds the HTTPS_PROXY environment variable:

[Service] Environment="HTTPS_PROXY=https://proxy.example.com:443/"

If you have internal Docker registries that you need to contact without proxying you can specify them via the NO_PROXY environment variable:
[Service] Environment="HTTP_PROXY=http://proxy.example.com:80/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"

Or, if you are behind an HTTPS proxy server:

[Service] Environment="HTTPS_PROXY=https://proxy.example.com:443/" "NO_PROXY=localhost,127.0.0.1,docker-registry.somecorporation.com"

Flush changes:
$ sudo systemctl daemon-reload

Restart Docker:
$ sudo systemctl restart docker

Verify that the configuration has been loaded:
$ systemctl show --property=Environment docker Environment=HTTP_PROXY=http://proxy.example.com:80/

Or, if you are behind an HTTPS proxy server:

$ systemctl show --property=Environment docker Environment=HTTPS_PROXY=https://proxy.example.com:443/

In other words you now has deal with various idiosyncrasies that did not existed with the regular startup scripts. As everything in systemd is "yet another setting" it adds opaque and difficult to understand layer of indirection. With problems amplified by poorly written documentation. The fact that you no longer can edit shell scripts to do such a simple change as adding a couple of environment variables to set a proxy (some applications need a different proxy that "normal applications", for example a proxy without authentication) is somewhat anxiety-producing factor.

Revamping Anaconda in "consumer friendly" fashion in RHEL 7 was another "enhancement", which can well be called a blunder. But that can be compensated by usage of kickstart files. There is also an option to modify anaconda itself, but that's requires a lot of work. See Red Hat Enterprise Linux 7 Anaconda Customization Guide. Road to hell is paved with good intention and in their excessive zeal to protect users from errors in deleting existing partitions (BTW RHEL stopped displaying labels for partitions for some time, and now displayed UUIDs only) and make anaconda more "fool proof" they made the life professional sysadmins really miserable. Recently I tried to install RHEL 7.5 on a server with existing partition (which required unique install and Kickstart was not too suitable for this) and discovered that in RHEL 7 installer GUI there is not easy way to delete existing partitions. I was forced to exist installation boot from RHEL6 ISO delete partition and then continue (at this point I realized that choosing RHEL7 for the particular server was a mistake and corrected it ;-). Later I have found the following post from 2014 that describes the same situation that I encountered in 2019 very well:

Can not install RHEL 7 on disk with existing partitions - Red Hat Customer Portal

May 12 2014 | access.redhat.com
Can not install RHEL 7 on disk with existing partitions. Menus for old partitions are grayed out. Can not select mount points for existing partitions. Can not delete them and create new ones. Menus say I can but installer says there is a partition error. It shows me a complicated menu that warns be what is about to happen and asks me to accept the changes. It does not seem to accept "accept" as an answer. Booting to the recovery mode on the USB drive and manually deleting all partitions is not a practical solution.
You have to assume that someone who is doing manual partitioning knows what he is doing. Provide a simple tool during install so that the drive can be fully partitioned and formatted. Do not try so hard to make tools that protect use from our own mistakes. In the case of the partition tool, it is too complicated for a common user to use so don't assume that the person using it is an idiot.

The common user does not know hardware and doesn't want to know it. He expects security questions during the install and everything else should be defaults. A competent operator will backup anything before doing a new OS install. He needs the details and doesn't need an installer that tells him he can't do what he has done for 20 years.

RHEL7 is doing a lot of other unnecessary for server space things. Most of those changes also increase complexity by hiding "basic" things with the additional layers of "indirection". For example, try to remove Network Manager in RHEL 7. Now explain to me why we need it for a server room with servers that use "forever" attached by 10 Mbit/sec cables and which does not run any virtual machines or Docker.

RHEL 7 denies the usefulness of previous knowledge which makes overcomplexity more painful

But what is really bad is that RHEL7 in general and systemd in particular denies usefulness of previous knowledge of Red Hat and destroys the value of several dozens of good or even excellent books related to system administration and published in 1998-2015. That's a big and unfortunate loss, as many of authors of excellent Linux books will never update them for RHEL7.   In a way, deliberate desire to break with the past demonstrated by systemd might well be called vandalism.

And Unix is almost 50 Years OS (1973-2019). Linux was created around 1991 and in 2019 will be 28 years old. In other words this is a mature OS with more then two decades of history under the belt. Any substantial changes for OS of this age are not only costly, they are highly disruptive.

systemd denies usefulness of previous knowledge of Red Hat and destroys the value of books related to system administration and published in 1998-2015

This utter disrespect to people who spend years learning Red Hat and writing about it increased my desire to switch to CentOS or Oracle Linux. I do not want to pay Red Hat money any longer and Red hat support is now outsourced and deteriorated to the level when it is almost completely useless "database query with human face", unless you buy premium support. And even in this case much depends on to what analyst your ticket is assigned.

Moreover the architectural quality of RHEL 7 is low. Services mixed into the distribution are produced by different authors and as such have no standards on configuration including such important for the enterprise thing as the enable the WEB proxy for those services that need to deal with Internet (yum is one example). At the same time the complexity of this distribution is considerably higher then RHEL6 with almost zero return on investment. Troubleshooting it different and much more difficult. Seasoned sysadmin switching to RHEL 7 feel like a novice on the skating ring for several months. And as for troubleshooting complex problem much longer then that.

It is clear that RHEL 7 main problem is not systemd by itself but the fact that the resulting distribution suffers from overcomplexity. It is way too complex to administer and requires regular human having to remember way so many things. Things that they can never fit into one head. This means this is by definition a brittle system, the elephant that nobody understands completely. Much like Microsoft Windows. And I am talking not only about systemd, which was introduced in this version. Other subsystems suffer from overcomplexity as well. Some of them came directly from RHEL, some not (Biosdevname, which is a perversion, came from Dell).

Troubleshooting became not only more difficult. In complex cases you also are provided with less information (especially cases where the system does not boot).   Partially this is due to generally complexity of modern Linux environment (library hell, introduction of redundant utilities, etc), partially due to increase complexity of hardware,. More one of the key reasons is the lack of architectural vision on the part of Red Hat.

Add to this multiple examples of sloppy programming (dealing with proxies is still not unified and individual packages have their own settings which can conflict with the environment variable settings) and old warts (RPM format now is old and partially outlived its usefulness, creating rather complex issues with patching and installing software, issues that take a lot of sysadmin time to resolve) and you get the picture.

A classic example of Red Hat inaptitude is how they handle proxy settings. Even for software fully controlled by Red Hat such as yum and subscription manager they use proxy settings in each and every configuration file. Why not to put them /etc/sysconfig/network and always, consistently read them from the environment variables first and this this file second? Any well behaving application needs to read environment variables which should take precedence over settings in configuration files. They do not do it. God knows why.

They even manage to embed proxy settings from /etc/rhsm/rhsm.conf into yum file /etc/yum.repos.d/redhat.repo, so the proxy value is taken from this file. Not from your /etc/yum.conf settings, as you would expect. Moreover this is done without any elementary checks for consistency: if you make a pretty innocent mistake and specify proxy setting in /etc/rhsm/rhsm.conf as

proxy http://yourproxy.yourdomain.com

The Red Hat registration manager will accept this and will work fine. But for yum to work properly /etc/rhsm/rhsm.conf proxy specification requires just DNS name without prefix http:// or https:// -- prefix https will be added blindly (and that's wrong) in redhat.repo without checking if you specified http:// (or https://) prefix or not. This SNAFU will lead to generation in redhat.repo the proxy statement of the form https://http://yourproxy.yourdomain.com

At this point you are up for a nasty surprise -- yum will not work with any Redhat repository. And there is no any meaningful diagnostic messages. Looks like RHEL managers are iether engaged in binge drinking, or watch too much porn on the job ;-).

Yum which started as a very helpful utility also gradually deteriorated. It turned into a complex monster which requires quite a bit of study and introduces by itself a set of very complex bugs, some of which are almost features.

SELinux was never a transparent security subsystems and has a lot of quirks of its own. And its key idea is far from being elegant like were the key ideas of AppArmor ( per application assigning umask for key directories and config files) which actually disappeared from the Linux security landscape. Many sysadmins simply disable SElinux leaving only firewall to protect the server. Some application require disabling SELinux for proper functioning, and this is specified in the documentation.

The deterioration of architectural vision within Red Hat as a company is clearly visible in the terrible (simply terrible, without any exaggeration) quality of the customer portal, which is probably the worst I ever encountered. Sometimes I just put tickets to understand how to perform particular operation on the portal. Old or Classic as they call it RHEL customer portal actually was OK, and even has had some useful features. Then for some reason they tried to introduced something new and completely messes the thing. As of August 2017 quality somewhat improves, but still leaves much to be desired. Sometimes I wonder why I am still using the distribution, if the company which produced it (and charges substantial money for it ) is so tremendously architecturally inapt, that it is unable to create a usable customer portal. And in view of existence of Oracle Linux I do not really know the answer to this question. Although Oracle has its own set of problems, thousands of EPEL packages, signed and built by Oracle, have been added to Oracle Linux yum server.

Note about moving directories /bin, /lib, /lib64, and /sbin to /usr

In RHEL 7 Red Hat decided to move four system directories ( /bin, /lib /lib64 and /sbin ) previously located at / into /usr. They were replaced with symlinks to preserve compatibility.

[0]d620@ROOT:/ # ll total 316K dr-xr-xr-x. 5 root root 4.0K Jan 17 17:46 boot/ drwxr-xr-x. 18 root root 3.2K Jan 17 17:46 dev/ drwxr-xr-x. 80 root root 8.0K Jan 17 17:46 etc/ drwxr-xr-x. 12 root root 165 Dec 7 02:12 home/ drwxr-xr-x. 2 root root 6 Apr 11 2018 media/ drwxr-xr-x. 2 root root 6 Apr 11 2018 mnt/ drwxr-xr-x. 2 root root 6 Apr 11 2018 opt/ dr-xr-xr-x. 125 root root 0 Jan 17 12:46 proc/ dr-xr-x---. 7 root root 4.0K Jan 10 20:55 root/ drwxr-xr-x. 28 root root 860 Jan 17 17:46 run/ drwxr-xr-x. 2 root root 6 Apr 11 2018 srv/ dr-xr-xr-x. 13 root root 0 Jan 17 17:46 sys/ drwxrwxrwt. 12 root root 4.0K Jan 17 17:47 tmp/ drwxr-xr-x. 13 root root 155 Nov 2 18:22 usr/ drwxr-xr-x. 20 root root 278 Dec 13 13:45 var/ lrwxrwxrwx. 1 root root 7 Nov 2 18:22 bin -> usr/bin/ lrwxrwxrwx. 1 root root 7 Nov 2 18:22 lib -> usr/lib/ lrwxrwxrwx. 1 root root 9 Nov 2 18:22 lib64 -> usr/lib64/ -rw-r--r--. 1 root root 292K Jan 9 12:05 .readahead lrwxrwxrwx. 1 root root 8 Nov 2 18:22 sbin -> usr/sbin/

You would think that RHEL RPMs now use new locations. Wrong. You can do a simple experiment of RHEL7 or CentOs7 based VM -- deleted those symlinks using the command

rm -f /bin /lib /lib64 /sbin

and see what will happen ;-). Actually recovery from this situation might represents a good interview question for candidates pretending on senior sysadmin positions.

Note of never ending stream of security patches

First and foremost not too much zeal
( Talleyrand advice to young diplomats)

In security patches Red Hat found its Eldorado. I would say that they greatly helped Red Hat to became a billion dollars' in revenue company, despite obvious weakness of GPL for the creating a stable revenue stream. Constantly shifting sands created by over-emphasizing security patches compensate weakness of GPL to the extent that profitability can approach the level of a typical commercial software company.

And Red Hat produces them at the rate of around a dozen a week or so ;-). As keeping up with patches is an easy metric to implement inthe Dilbertized neoliberal enterprise Sometimes even it becomes the key metric by which sysadmins is judged along with another completely fake metric -- average time to resolve tickets of a given category. So clueless corporate brass often play the role of Trojan Horse for Red Hat by enforcing too strict (and mostly useless) patching policies (which is such cases are often not adhered in practice, creating a gap between documentation and reality typical for any corporate bureaucracy). If this part of IT is outsourced, everything became the contract negotiation, including this one and the situation with the patching became even more complicated with additional levels of indirection and bureaucratic avoidance of responsibility.

Often patching is elevated to such high priority that it consumes lion share of sysadmin efforts (and is loved by sysadmins who are not good for anything else, as they can report their accomplishments each month ;-) why other important issue are lingering if obscurity and disrepair.

While not patching servers at all is questionable policy (which bounds on recklessness), too frequent patching of servers is another extreme, and extremes meet. As Talleyrand advised to young diplomats "First and foremost not too much zeal". this is fully applicable to the patching of RHEL servers.

Security of linux server (taking into account that RHEL is Microsoft Windows of Linux world and it the most attacked flavor of linux) is mostly architectural issue. Especially network architecture issue.

For example, using address translation (private subnet witin the enterprise) is now a must. Prosy for some often abused protocols is desirable.

As attach can happen not only via main interface, but also via management interface and you do not control or under remote management firmware, providing a different, tightly controlled subnet for DRAC assessable only for selected IPs belongs to sysadmin team improves security far above what patching of DRAC firmware can accomplish (see Linux Servers Most Affected by IPMI Enabled JungleSec Ransomware by Christine Hall ).

Similarly implementing jumpbox for accessing servers from DHCP segment (corporate desktops) via ssh also improves ssh security without any patches (and in the past Openssh was a source of very nasty exploits, as is a very complex subsystem). There is no reason why "regular" desktops on DHCP segment should be able to initiate login directly (typically IP addressee for sysadmins are made "pseudo-static" by supplying DHCP server with MAC addresses of those desktops.)

In any case, as soon as you see a email from the top IT honcho that congratulates subordinates for archiving 100% compliance with patching schedule in the enterprise that does not implements those measures, you know what kind of IT and enterprise is that.

I mentioned two no-nonsense measures (separate subnet for DRAC/ILO and jumpbox for ssh), but there are several other "common sense" measures which really effect security far more the any tight patching schedule. Among them:

Two factor authentication (and believe me there are some large enterprises which in 2019 do not still use two factor authentication for the accounts of the servers on their networks, while paying arm-and-leg for RHEL licenses, and, usually, to a couple of semi-useless security firms). When I heard that John Podesta did not use two factor authentication in his Gmail email account (for Gmail token costs around $20, I think) I instantly realized what quality of IT personal he has in his disposal. Moreover as he was former the head of staff under Bill Clinton ( a person that in many ways is as powerful, if not more powerful then the President of the USA, and position of which is to a large extend a security-related position ) it tells us something about John Podesta personality too ;-)

Hardening scripts which "normalize" configuration for each important groups of server. And the notion of "normal" configuration for a given group of servers for example webservers or computational cluster is first and foremost the architectural issue. this is level that is far above the level of which security department operates

So if those measures are not implemented, creating spreadsheets with the dates of installation of patches is just perversion of security, not a real security improvement measure. Yet another Dilbertalised enterprise activity. As well as providing as a measure that provides some justification for the existence of the corporate security department. In reality the level of security by-and-large is defined by IT network architecture and no amount of patching will improve bad Enterprise IT architecture.

I thing that this never ending stream of security patches serves dual purpose, with the secondary goal to entice customers to keep continues Red Hat subscription for the life of the server. For example for research servers, I am not so sure that updating to the next minor release (say from CentOS 7.4 -- to CentOS 7.5) provide less adequate level of security. And such schedule creates much less fuss, enabling people to concentrate on more important problems.

And if the server is internet facing then well thought out firewall policy and SElinux enforcing policy block most "remote exploits" even without any patching. Usually only HTTP port is opened (ssh port in most cases should be served via VPN, or internal DMZ subnet, if the server is on premises ) and that limits exposure patching or no patching, although of course does not eliminate this. Please understands that the mere existence of NSA, CIA with their own team of hackers (and their foreign counterparts) guarantees at any point of time availability of zero-day exploits for "lucrative" targets on the level of the network stack, or lower. Against these zero-day threats, reactive security is helpless (and patching is a reactive security measure.) In this sense, scanning of servers and running set of predefined exploits (mainly PHP oriented) by script kiddies is just a noise and does not qualify as an attack, no matter how dishonest security companies try to hype this threat. The real threat is quite different.

If you need higher security you probably need to switch to the distribution that supports AppArmor kernel module (Suse Enterprise in one). Or if you have competent sysadmins with certification in Solaris to Solaris (on Intel, or even Spark) which not only provides better overall security, but also implement "security via obscurity" defense which alone stops 99.95 of Linux exploit dead of tracks (and outside of encryption algorithms there is nothing shameful in using security via obscurity; and even for encryption algorithms it can serve a useful role too ;-). Solaris has RBAC (role based access control) implemented which allows to c ompartmentalize many roles (including root). Sudo is just a crude attempt to "bolt on" RBAC on Linux and Free BSD. Solaris also has a special hardened version called Trusted Solaris.

Too frequent patching of the server also sometimes introduces some subtle changes that break some applications, which are not that easy to detect and debug.

All-in-all "too much zeal" in patching the servers and keep up with the never ending stream of RHEL patches often represents a voodoo dance around the fire. Such thing happens often when that ability to understand a very complex phenomenon is beyond normal human comprehension. In this case easy "magic" solution, a palliative is adopted.

Also as I noted above, in this area the real situation on corporate servers most often does not corresponds to spreadsheets and presentations to corporate management.

RHEL licensing

We have two distinct problem with RHEL licensing:

1. Minimizing costs within RHEL licensing model

2. Obtaining better, higher quality of tech support, then it is possible with Red Hat alone.

Licensing model: four different types of RHEL licenses

RHEL is struggling to fence off "copycats" by complicating access to the source of patches, but the problem is that its licensing model in pretty much Byzantium. It is based of a half-dozen of different types of subscriptions. The most useful are pretty expensive. That dictates the necessity of diversification within this particular vendor and combining/complementing it with licenses from other vendors. With RHEL 7 is does not makes sense any longer to license all you server from Red Hat.

As for Byzantium structure of Red Hat licensing I resent paying Red Hat for our 4 socket servers to the extent that I stop using this type of servers and completely switched to two socket servers. Which with Intel CPUs rising core count was a easy escape from RHEL restrictions and greed. Currently Red Hat probably has most complex, the most Byzantine system of subscriptions comparable with IBM (which is probably the leader in "licensing obscurantism" and the ability to force customers to overpay for its software ;-).

And there are at least four different RHEL licenses for real (hardware-based) servers ( https://access.redhat.com/support/offerings/production/sla )

Self-support. If you have many identical or almost identical servers or virtual machines it does not make sense to buy standard or premium licenses for all. It can be bought for one server and all other can be used with self-support licenses, which provides access to patches). They sometimes are used for a group for servers, for example a park of webservers with only one server getting standard of premium licensing.

Self support provides the most restrictive set of repositories. For missing you can use Epel repository, or download packages from CentOs repositories. Including CentOs repositories in yum configuration breaks RHEL.

For scripting language and some other packages you can compesate weakness/conservatism of RHEL repositories by using SpecialInterestGroup-SCLo-CollectionsList - which is rebuilt of Red Hat Software Collections (RHSCL and DTS are not available for Red Hat Enterprise Linux Server, Self-Support

NOTE: In most cases it makes sense to buy Oracle Linux self-support license instead

Standard (actually means web only, although formally you can try chat and phone during business hours (if you manage to get to a support specialist). So this is mostly web-based support. This level of subscription provides better set of repositories, but Red Hat is playing games in this area and some of them now require additional payment.

Customers with Standard RHEL subscriptions can get access to RHSCL and DTS for free, but need to initiate a special request for it: See How to use Red Hat Software Collections (RHSCL) or Red Hat Developer Toolset (DTS)

Premium (Web and phone with phone 24 x7 for severity 1 and 2 problems). Here you really can get specialist on phone if you have a critical problems. Even in after hours.

HPC computational nodes. Those are the cheapest but they have limited number of packages in the distribution and repositories. In this sense using Oracle Linux self-support licenses is a better deal for computational nodes then this type of RHEL licenses; sometime CentOS can be used too which eliminates the licensing problem completely, but adds other problems (access to repositories sometimes is a problem as too many people are using too few mirrors). In any case, I have positive experience with using CentOS for computational clusters that run bioinformatics software. The headnode can be licensed from Red Hat or Oracle. Recently I have found that Dell provides pretty good, better the Red Hat, support for headnode type problems too. The same is probably true for the HP Red Hat support.

NOTE: In most cases it makes sense to buy Oracle Linux self-support license instead

No-Cost RHEL Developer Subscription is available from March 2016 -- I do not know much about this license.

There is also several restricted support contracts:

Red Hat Enterprise Linux AS Academic Edition provides a platform for applications such as network infrastructure, web hosting, and High Performance Computing (HPC) server farms. The cost of an annual subscription for an individual system is $50. Avalble only for qualified educational institutions

Red Hat Enterprise Linux Developer Support subscription. Red Hat^® Enterprise Linux^®Developer Support includes 25 Red Hat Enterprise Linux Developer Suite subscriptions with an unlimited number of incidents and either a 2-business-day or 4-business-hour support response. Just designate a single point of contact and we'll support Red Hat Enterprise Linux Server, Red Hat Software Collections, Red Hat Developer Toolset, and much more. This subscription is for development purposes only. The cost is $5K (professional) or $10K enterprise.

RHEL licensing scheme is based on so called "entitlements" which oversimplifying is one license for a 2 socket server. In the past they were "mergeable" so if your 4 socket license expired and you have two spare two socket licenses Red Hat was happy to accommodate your needs. Now they are not and use IBM approach to this issue. And that's a problem :-).

Now you need the right mix if you have different classes of servers. All is fine until you use mixture of licenses (for example some cluster licenses, some patch only ( aka self-support), some premium license -- 4 types of licenses altogether). In this case to understand where is the particular license landed in the past was almost impossible to predict. But if you use uniform server part this scheme worked reasonably well (actually better then the current model.) Red Hat fixed the problem of unpredictability where the particular license goes (now you have a full control) but created a new problem -- if you license expired it's now your problem -- with subscription manager it nor replaced by the license from "free license pool".

This path works well but to cut the costs and you now need to buy five year license with the server to capitalize the cost of the Red Hat license. But five year term means that you lack the ability to switch to a different falvour of Linux. Most often this is not a problem, but still.

This also a good solution for computational cluster licenses -- Dell and HP can install basic cluster software on the enclosure for minimal fee. They try to force upon an additional software which you might not want or need (Bright Cluster Manager in case of Dell), but that's a solvable issue (you just do not extend support contract after the evaluation period ends).

And believe me this HPC architecture is very useful outside computational tasks so HPC license for node can be used outside the areas, where computational cluster are used to cut the licensing costs. It is actually an interesting paradigm of managing heterogeneous datacenter. The only problem is that you need to learn to use it :-). For example SGE is a well engineered scheduler (originally from Sun, but later open sourced) that with some minimal additional software can be used as enterprise scheduler. While this is a free software it beats many commercial offerings and while it lacks calendar scheduling, any calendar scheduler can be used with it to compensate for this (even cron -- in this each cron task becomes SGE submit script). Another advantage is that it is no longer updated, as often updates of commercial software are often just directed on milking customers and add nothing or even subtract from the value of software. Other cluster schedulers can be used instead of SGE as well.

Using HPC-like config with the "headnode" and "computational nodes" is option to lower the fees if you use multiple similar servers which do not need any fancy software (for example, a blade enclosure with 16 blades used as HTTP server farm, or Oracle DB farm). It is relatively easy to organize a particular set of servers as a cluster with SGE (or other similar scheduler) installed on the head node and the common NFS (or GPFS) filesystem exported to all nodes. Such a common filesystem is ideal for complex maintenance tasks and it alone serves as a poor man software configuration system. Just add a few simple scripts and parallel execution software and in many cases you this is all you need for configuration management ;-). It is amazing that many aspects/advantages of this functionality is not that easy to replicate with Ansible, Puppet, or Chef.

BTW now Hadoop is a fashionable thing (while being just a simple case of distributed search) and you can always claim that this is a Hadoop type of service, which justifies calling such an architecture a cluster. In this case you can easily pay premium license for the headnode and one nodes, but all other computation nodes are $100 a year each or so. Although you can get the same self-support license from Oracle for almost the price ($119 per year the last time I checked) without any Red Hat restrictions, so from other point of view, why bother licensing self-support servers from Red Hat ?

As you can mix and match licenses it is recommended to buy Oracle self-support licenses instead of RHEL self-support licenses (Oracle license costs $119 per year the last time I checked). It provides a better set of repositories and does not entail any Red Hat restrictions, so from other point of view, why bother licensing self-support servers from Red Hat ?

The ides of Guinea Pig node

Rising costs also created strong preference creating server group so that only one server in the server group has "expensive license and used as guinea pig for problems, which all other enjoy self-support licenses. That allow you getting full support for complex problem by replicating them of Guinea Pig node. And outside financial industry companies now typically are tight on money for IT.

It is natural that at the certain, critical level of complexity, qualified support disappears. Read RHEL gurus are unique people with tremendous talent and they are exceedingly rare, dying our breed.   Now you what you get is low level support which mostly consist of pointing you to a relevant (or often to an irrelevant) to your case article in the Red Hat knowledgebase. With patience you can upgrade your ticket and get to proper specialist, but the amount of effort might nor justify that -- for the amount of time and money you might be better off using a good external consultants. For example, from universities, which have high quality people in need of additional money.

In most cases the level of support still can be viewed as acceptable only if you have Premium subscription. But at least for me this creates resentment; that why now I an trying to install CentOS or Scientific Linux instead if they work OK for a particular application. You can buy Premium license only for one node out of several similar saving on many and using this node of Guinea Pig for the problems.

New licensing scheme, while improving the situation in many areas, has a couple of serious drawbacks , such as dealing with 4 socket servers and expiration of licenses problems. We will talk about 4 socket servers later (and who now need them if Intel packs 16 or more cores into one CPU ;-). But too tight control of licenses by licensing manager that Red Hat probably enjoys, is a hassle for me: if the license expired now this is "your problem", as there is no automatic renewal from the pool of available licenses (which was one of the few positive thing about now discontinued RHN).

Instead you need to write scripts and deploy them on all nodes to be able to use patch command on all nodes simultaneously via some parallel execution tool (which is OK way to deploy security patches after testing).

And please note that typically large corporation overpay Red Hat for licensing, as they have bad or non-operational licensing controls and prefer to err on higher number of licenses then they need. This adds insult to injury -- why on the earth we can't use available free licenses to automatically replace them when the license expires but equivalent pool of licenses is available and not used ?

Diversification of RHEL licensing and tech support providers

Of course, diversifying licensing from Red Hat now must be the option that should be given a serious consideration. One important fact is that college graduated now comes to the enterprise with the knowledge of Ubuntu. And due to that tend to deploy application using Docker, as they are more comfortable, more knowledgably in Ubuntu then in CentOS or Red Hat enterprise. This is a factor that should be considered.

But there is an obvious Catch 22 with switching to another distribution: adding Debian/Devian to the mix also is not free -- it increases the cost of administration by around the same threshold as the switch to RHEL7: introducing to the mix "yet another Linux distribution" usually have approximately the same cost estimated around 20-30% of loss of productivity of sysadmins. As such "diversification" of flavors of Linux in enterprise environment generally should be avoided at all costs.

So while paying for continuing systemd development to Red Hat is not the best strategy, switching to alternative distribution which is not RHEL derivative and typically uses a different package manager entrain substantial costs and substantial uncertainty. Many enterprise applications simply do not support flavors other then Red Hat and they require licensed server for the installation. So the pressure to conform with the whims of Red Hat brass is high and most people are not ready to drop Red Hat just due to the problems with systemd. So mitigation of the damage caused by systemd strategies are probably the most valuable avenue of actions. One such strategy is diversification of RHEL licensing and providers of tech support, while not abandoning RHEL compatible flavors.

This diversification strategy first of all should include larger use of CentOS as well as Oracle linux as more cost effective alternatives. This allow to purchase more costly "premium" license for server that really matter. Taking into account the increase complexity buying such a license this is just a sound insurance against increased uncertainty. You do need at least one such license for a each given class of servers at the datacenter. As well as cluster headnode, if you have computational clusters in your datacenter.

So a logical step would be switching to "license groups" in which only one server is licensed with expensive premium, subscription and all other are used with minimal self-support license. This plan can be executed with Oracle even better then with Red Hat as it has lower price for self-support subscription.

The second issue is connected with using Red Hat as support provider. Here we also have alternatives as both Dell and HOP provides "in house" support of RHEL for their hardware. While less known and popular Suse provided RHEL support too.

It is important to understand that due to excessive complexity of RHEL7, and the flow of tickets related to systemd, Red Hat tech support mostly degenerated to the level of "pointing to relevant Knowledgebase article." Sometime the article is relevant and helps to solve the problem, but often it is just a "go to hell" type of response, an imitation of support, if you wish. In the past (in time of RHEL 4) the quality of support was much better and can even discuss your problem with the support engineer. Now it is unclear what we are paying for. My experience suggests that most complex problem typically are in one way or another are connected with the hardware interaction with OS. If this observation is true it might be better to use alternative providers, which in many cases provide higher quality tech support as they are more specialized.

So it you have substantial money allocated to support (and here I mean 100 or more systems to support) you probably should be thinking about third party that suit your needs too. There are two viable options here:

RHEL resellers (for example, Dell and HP). In case Dell or HP engineers provide support for RHEL they naturally know their hardware much better then RHEL engineers. So in this critical area where it is unclear whether this is OS/driver, or hardware problem they are more easy to work with. Dell actually helps you to compile and test a new driver in such cases (I have one case when 4 port Intel card that came with Dell blade has broken driver in regular RHEL distribution and it needed to be replaced. They also are noticeably better in debugging complex cases when the server can't start normally. And there are some very tricky cases here. For example the problem in Dell can be connected with DRAC but demonstrate itself on OS level.

Alternative distributions vendors. Although this is little known fact and is not too heavily advertized, but both Oracle and Suse support Red Hat distribution too

In the past for large customers SUSE used to provide a "dedicated engineer" who can serve as your liaison to developers and tier III level of support.

For Oracle it is easier to get to the engineer in case of complex problem that is the case with Red Hat.

Note about licensing management system

There are two licensing system used by Red Hat

Classic(RHN) -- old system that was phased out in mid 2017 (now has only historical interest)

RHEL6 registration in RHN ("Red Hat Classic") on proxy protected network

RHEL5 registration in RHN ("Red Hat Classic") on proxy protected network

Registering Red Hat Enterprise Linux 4

"New" (RHSM) -- a new, better, system used predominantly on RHEL 6 and 7. Obligatory from August 2017.

Migrating systems from RHN to RHNSM

Registering a server using Red Hat Subscription Manager (RHSM)

RHSM is complex and requires study. Many hours of sysadmin time are wasted on mastering its complexities, while in reality this is an overhead that allows Red Hat to charge money for the product. So the fact that they are NOT supporting it well tells us a lot about the level of deterioration of the company.   So those Red Hat honchos with high salaries essentially create a new job in enterprise environment -- a license administrator. Congratulations !

All-in-all Red Hat successful created almost un-penetrable mess of obsolete and semi obsolete notes, poorly written and incomplete documentation, dismal diagnostic and poor troubleshooting tools. And the level of frustration sometimes reaches such a level that people just abandon RHEL. I did for several non-critical system. If CentOS or Academic Linux works there is no reason to suffer from Red Hat licensing issues. Also that makes Oracle, surprisingly, more attractive option too :-). Oracle Linux is also cheaper. But usually you are bound by corporate policy here.

"New" subscription system (RHSM) is slightly better then RHN for large organizations, but it created new probalme, for example a problem with 4 socket servers which now are treated as a distinct entity from two socket servers. In old RHN the set of your "entitlements" was treated uniformly as licensing tokens and can cover various number of sockets (the default is 2). For 4 socket server it will just take two 2-socket licenses. This was pretty logical (albeit expensive) solution. This is not the case with RHNSM. They want you to buy specific license for 4 socket server and generally those are tiled to the upper levels on RHEL licensing (no self-support for 4 socket servers). In RHN, at least, licenses were eventually converted into some kind of uniform licensing tokens that are assigned to unlicensed systems more or less automatically (for example if you have 4 socket system then two tokens were consumed). With RHNSM this is not true, which creating for large enterprises a set of complex problems. In general licensing by physical socket (or worse by number of cores -- an old IBM trick) is a dirty trick step of Red Hat which point its direction in the future.

RHSM allows to assign specific license to specific box and list the current status of licensing. But like RHN it requires to use proxy setting in configuration file, it does not take them from the environment. If the company has several proxies and you have mismatch you can be royally screwed. In general you need to check consistency of your environment settings with conf files settings. The level of understanding of proxies environment by RHEL tech support is basic of worse, so they are using the database of articles instead of actually troubleshooting based on sosreport data. Moreover each day there might a new person working on your ticket, so there no continuity.

RHEL System Registration Guide (https://access.redhat.com/articles/737393) is weak and does not cover more complex cases and typical mishaps.

"New" subscription system (RHSM) is slightly better then old RHN in a sense that it creates for you a pool of licensing and gives you the ability to assign more expensive licensees to the most valuable servers. It allows to assign specific license to specific box and to list the current status of licensing.

Troubleshooting

How to troubleshoot subscription-manager and yum issues

RHSM Subscription Issues Troubleshooting Do's and Don'ts

Learn More

Red Hat Subscription Management

product documentation - Red Hat Subscription Management

Why should I use Red Hat Subscription Management?

Understanding Red Hat Subscription Management

Subscription-manager for the former Red Hat Network User - a multi-part series

How to subscribe a VMware or Hyper-V guest using "Red Hat Enterprise Linux for Virtual Datacenter subscription"

Red Hat Network (RHN) Classic

What's the difference between RHN Classic and Red Hat Subscription Management?

Learn how to Migrate to Red Hat Subscription Management

How to migrate a Red Hat Enterprise Linux System from RHN Classic to RHSM

The RHEL System Registration Guide (https://access.redhat.com/articles/737393) which outlines major options available for registering a system (and carefully avoids mentioning bugs and pitfalls, which are many). For some reason migration from RHN to RHNSM usually worked well

Also might be useful (to the extent any Red Hat purely written documentation is useful) is the document: How to register and subscribe a system to the Red Hat Customer Portal using Red Hat Subscription-Manager (https://access.redhat.com/solutions/253273). At least it tires to answers to some most basic questions.

Other problems with RHEL 7

Mitigating damage done by systemd

Some tips:

First of all those daemons that are not designed to work with systemd or have problems with systemd can be started directly from cron using @reboot directive. Stopping them represent some problems but it can be solved by creating "fake" systemd entries (see below)

You can use Docker and run image that does not have problems with your application that your "Native RHEL 7" has. Including those that does not use Systemd. In a way Docker represents a technology which allow selective use of Devian and Debian in enterprise environment.

You can also run XEN and multiple images with some of them representing OS without systems. But that increases complexity.

Sometime you just need to be inventive and add additional startup script to mitigate the damage. Here is one realistic example (systemd sucks):

How to umount NFS before killing any processes. or How to save your state before umounting NFS. or The blind and angry leading the blind and angry down a thorny path full of goblins. April 29, 2017

A narrative, because the reference-style documentation sucks.

So, rudderless Debian installed yet another god-forsaken solipsist piece of over-reaching GNOME-tainted garbage on your system: systemd. And you've got some process like openvpn or a userspace fs daemon or so on that you have been explicitly managing for years. But on shutdown or reboot, you need to run something to clean up before it dies, like umount. If it waits until too late in the shutdown process, your umounts will hang.

This is a very very very common variety of problem. To throw salt in your wounds, systemd is needlessly opaque even about the things that it will share with you.

"This will be easy."

Here's the rough framework for how to make a service unit that runs a script before shutdown. I made a file /etc/systemd/system/greg.service (you might want to avoid naming it something meaningful because there is probably already an opaque and dysfunctional service with the same name already, and that will obfuscate everything):

[Unit] Description=umount nfs to save the world After=networking.service [Service] ExecStart=/bin/true ExecStop=/root/bin/umountnfs TimeoutSec=10 Type=oneshot RemainAfterExit=yes

The man pages systemd.unit(5) and systemd.service(5) are handy references for this file format. Roughly, After= indicates which service this one is nested inside -- units can be nested, and this one starts after networking.service and therefore stops before it. The ExecStart is executed when it starts, and because of RemainAfterExit=yes it will be considered active even after /bin/true completes. ExecStop is executed when it ends, and because of Type=oneshot, networking.service cannot be terminated until ExecStop has finished (which must happen within TimeoutSec=10 seconds or the ExecStop is killed).

If networking.service actually provides your network facility, congratulations, all you need to do is systemctl start greg.service, and you're done! But you wouldn't be reading this if that were the case. You've decided already that you just need to find the right thing to put in that After= line to make your ExecStop actually get run before your manually-started service is killed. Well, let's take a trip down that rabbit hole.

The most basic status information comes from just running systemctl without arguments (equivalent to list-units). It gives you a useful triple of information for each service:

greg.service loaded active exited

loaded means it is supposed to be running. active means that, according to systemd's criteria, it is currently running and its ExecStop needs to be executed some time in the future. exited means the ExecStart has already finished.

People will tell you to put LogLevel=debug in /etc/systemd/system.conf. That will give you a few more clues. There are two important steps about unit shutdown that you can see (maybe in syslog or maybe in journalctl):

systemd[4798]: greg.service: Executing: /root/bin/umountnfs systemd[1]: rsyslog.service: Changed running -> stop-sigterm

That is, it tells you about the ExecStart and ExecStop rules running. And it tells you about the unit going into a mode where it starts killing off the cgroup (I think cgroup used to be called process group). But it doesn't tell you what processes are actually killed, and here's the important part: systemd is solipsist. Systemd believes that when it closes its eyes, the whole universe blinks out of existence.

Once systemd has determined that a process is orphaned -- not associated with any active unit -- it just kills it outright. This is why, if you start a service that forks into the background, you must use Type=forking, because otherwise systemd will consider any forked children of your ExecStart command to be orphans when the top-level ExecStart exits.

So, very early in shutdown, it transitions a ton of processes into the orphaned category and kills them without explanation. And it is nigh unto impossible to tell how a given process becomes orphaned. Is it because a unit associated with the top level process (like getty) transitioned to stop-sigterm, and then after getty died, all of its children became orphans? If that were the case, it seems like you could simply add to your After rule.

After=networking.service getty.target

For example, my openvpn process was started from /etc/rc.local, so systemd considers it part of the unit rc-local.service (defined in /lib/systemd/system/rc-local.service). So After=rc-local.service saves the day!

Not so fast! The openvpn process is started from /etc/rc.local on bootup, but on resume from sleep it winds up being executed from /etc/acpi/actions/lm_lid.sh. And if it failed for some reason, then I start it again manually under su.

So the inclination is to just make a longer After= line:

After=networking.service getty.target acpid.service

Maybe [email protected]? Maybe systemd-user-sessions.service? How about adding all the items from After= to Requires= too? Sadly, no. It seems that anyone who goes down this road meets with failure. But I did find something which might help you if you really want to:

systemctl status 1234

That will tell you what unit systemd thinks that pid 1234 belongs to. For example, an openvpn started under su winds up owned by /run/systemd/transient/session-c1.scope. Does that mean if I put After=session-c1.scope, I would win? I have no idea, but I have even less faith. systemd is meddlesome garbage, and this is not the correct way to pay fealty to it.

I'd love to know what you can put in After= to actually run before vast and random chunks of userland get killed, but I am a mere mortal and systemd has closed its eyes to my existence. I have forsaken that road.

I give up, I will let systemd manage the service, but I'll do it my way!

What you really want is to put your process in an explicit cgroup, and then you can control it easily enough. And luckily that is not inordinately difficult, though systemd still has surprises up its sleeve for you.

So this is what I wound up with, in /etc/systemd/system/greg.service:

[Unit] Description=openvpn and nfs mounts After=networking.service [Service] ExecStart=/root/bin/openvpn_start ExecStop=/root/bin/umountnfs TimeoutSec=10 Type=forking

Here's roughly the narrative of how all that plays out:

When my wlan-discovery script is ready to start openvpn, it executes systemctl start greg.service.

Then /root/bin/openvpn_start is executed:

#!/bin/bash openvpn --daemon --config ... --remote `cat /run/openvpn/remoteinfo` ( echo 'nameserver 10.1.0.1'; echo 'search myvpn' ) | resolvconf -a tun0 mount | grep -q nfsmnt || mount -t nfs -o ... server:/export /nfsmnt exit 0

Because of Type=forking, systemd considers greg.service to be active as long as the forked openvpn is running (note - the exit 0 is important, if it gets a non-zero exit code from the mount command, systemd doesn't consider the service to be running).

Then a multitude of events cause the wlan-discovery script to run again, and it does a killall -9 openvpn. systemd sees the SIGCHLD from that, and determines greg.service is done, and it invokes /root/bin/umountnfs:

#!/bin/sh if [ "$EXIT_STATUS" != "KILL" ] then umount.nfs -f /nfsmnt fi

umountnfs does nothing because $EXIT_STATUS is KILL (more on this later)

wlan-discovery finishes connecting and re-starts greg.service

Eventually, system shutdown occurs and stops greg.service, executing /root/bin/umountnfs, but this time without EXIT_STATUS=KILL, and it successfully umounts.

Then the openvpn process is considered orphaned and systemd kills it.

While /root/bin/umountnfs is executing, I think that all of your other shutdown is occurring in parallel.

So, this EXIT_STATUS hack... If I had made the NFS its own service, it might be strictly nested within the openvpn service, but that isn't actually what I desire -- I want the NFS mounts to stick around until we are shutting down, on the assumption that at all other times, we are on the verge of openvpn restoring the connection. So I use the EXIT_STATUS to determine if umountnfs is being called because of shutdown or just because openvpn died (anyways, the umount won't succeed if openvpn is already dead!). You might want to add an export > /tmp/foo to see what environment variables are defined.

And there is a huge caveat here: if something else in the shutdown process interferes with the network, such as a call to ifdown, then we will need to be After= that as well. And, worse, the documentation doesn't say (and user reports vary wildly) whether it will wait until your ExecStop completes before starting the dependent ExecStop. My experiments suggest Type=oneshot will cause that sort of delay...not so sure about Type=forking.

Fine, sure, whatever. Let's sing Kumbaya with systemd.

I have the idea that Wants= vs. Requires= will let us use two services and do it almost how a real systemd fan would do it. So here's my files:

/etc/systemd/system/greg-openvpn.service:

[Unit] Description=openvpn Requires=networking.service After=networking.service [Service] ExecStart=/root/bin/openvpn_start TimeoutSec=10 Type=forking

/etc/systemd/system/greg-nfs.service:

[Unit] Description=nfs mounts Wants=greg-openvpn.service After=greg-openvpn.service [Service] ExecStart=/root/bin/mountnfs ExecStop=/root/bin/umountnfs TimeoutSec=10 Type=oneshot RemainAfterExit=yes

/root/bin/openvpn_start:

#!/bin/bash openvpn --daemon --config ... --remote `cat /run/openvpn/remoteinfo` ( echo 'nameserver 10.1.0.1'; echo 'search myvpn' ) | resolvconf -a tun0

/root/bin/mountnfs:

#!/bin/sh mount | grep -q nfsmnt || mount -t nfs -o ... server:/export /nfsmnt exit 0

/root/bin/mountnfs:

#!/bin/sh umount.nfs -f /nfsmnt

Then I replace the killall -9 openvpn with systemctl stop greg-openvpn.service, and I replace systemctl start greg.service with systemctl start greg-nfs.service, and that's it.

The Requires=networking.service enforces the strict nesting rule. If you run systemctl stop networking.service, for example, it will stop greg-openvpn.service first.

On the other hand, Wants=greg-openvpn.service is not as strict. On systemctl start greg-nfs.service, it launches greg-openvpn.service, even if greg-nfs.service is already active. But if greg-openvpn.service stops or dies or fails, greg-nfs.service is unaffected, which is exactly what we want. The icing on the cake is that if greg-nfs.service is going down anyways, and greg-openvpn.service is running, then it won't stop greg-openvpn.service (or networking.service) until after /root/bin/umountnfs is done.

Exactly the behavior I wanted. Exactly the behavior I've had for 14 years with a couple readable shell scripts. Great, now I've learned another fly-by-night proprietary system.

GNOME, you're as bad as MacOS X. No, really. In February of 2006 I went through almost identical trouble learning Apple's configd and Kicker for almost exactly the same purpose, and never used that knowledge again -- Kicker had already been officially deprecated before I even learned how to use it. People who will fix what isn't broken never stop.

As an aside - Allan Nathanson at Apple was a way friendlier guy to talk to than Lennart Poettering is. Of course, that's easy for Allan -- he isn't universally reviled.

A side story

If you've had systemd foisted on you, odds are you've got Adwaita theme too.

rm -rf /usr/share/icons/Adwaita/cursors/

You're welcome. Especially if you were using one of the X servers where animated cursors are a DoS. People who will fix what isn't broken never stop.

[update August 10, 2017]

I found out the reason my laptop double-unsuspends and other crazy behavior is systemd. I found out systemd has hacks that enable a service to call into it through dbus and tell it not to be stupid, but those hacks have to be done as a service! You can't just run dbus on the commandline, or edit a config file. So in a fit of pique I ran the directions for uninstalling systemd.

It worked marvelously and everything bad fixed itself immediately. The coolest part is restoring my hack to run openvpn without systemd didn't take any effort or thought, even though I had not bothered to preserve the original shell script. Unix provides some really powerful, simple, and *general* paradigms for process management. You really do already know it. It really is easy to use.

I've been using sysvinit on my laptop for several weeks now. Come on in, the water's warm!

So this is still a valuable tutorial for using systemd, but the steps have been reduced to one: DON'T.

[update September 27, 2017]

systemd reinvents the system log as a "journal", which is a binary format log that is hard to read with standard command-line tools. This was irritating to me from the start because systemd components are staggeringly verbose, and all that shit gets sent to the console when the services start/stop in the wrong order such that the journal daemon isn't available. (side note, despite the intense verbosity, it is impossible to learn anything useful about why systemd is doing what it is doing)

What could possibly motivate such a fundamental redesign? I can think of two things off the top of my head: The need to handle such tremendous verbosity efficiently, and the need to support laptops. The first need is obviously bullshit, right -- a mistake in search of a problem. But laptops do present a logging challenge. Most laptops sleep during the night and thus never run nightly maintenance (which is configured to run at 6am on my laptop). So nothing ever rotates the logs and they just keep getting bigger and bigger and bigger.

But still, that doesn't call for a ground-up redesign, an unreadable binary format, and certainly not deeper integration. There are so many regular userland hacks that would resolve such a requirement. But nevermind, because.

I went space-hunting on my laptop yesterday and found an 800MB journal. Since I've removed systemd, I couldn't read it to see how much time it had covered, but let me just say, they didn't solve the problem. It was neither an efficient representation where the verbosity cost is ameliorated, nor a laptop-aware logging system.

When people are serious about re-inventing core Unix utilities, like ChromeOS or Android, they solve the log-rotation-on-laptops problem.

Pretty convoluted RPM packaging system which creates problems

The idea of RPM was to simplify installation of complex packages. But they created of a set of problem of their own. Especially connected with libraries (which not exactly Red Hat problem, it is Linux problem called "libraries hell"). One example is so called multilib problem that is detected by YUM:

--> Finished Dependency Resolution Error: Multilib version problems found. This often means that the root cause is something else and multilib version checking is just pointing out that there is a problem. Eg.: 1. You have an upgrade for libicu which is missing some      dependency that another package requires. Yum is trying to      solve this by installing an older version of libicu of the      different architecture. If you exclude the bad architecture      yum will tell you what the root cause is (which package      requires what). You can try redoing the upgrade with      --exclude libicu.otherarch ... this should give you an error      message showing the root cause of the problem. 2. You have multiple architectures of libicu installed, but      yum can only see an upgrade for one of those arcitectures.      If you don't want/need both architectures anymore then you      can remove the one with the missing update and everything      will work. 3. You have duplicate versions of libicu installed already.      You can use "yum check" to get yum show these errors. ...you can also use --setopt=protected_multilib=false to remove this checking, however this is almost never the correct thing to do as something else is very likely to go wrong (often causing much more problems). Protected multilib versions: libicu-4.2.1-14.el6.x86_64 != libicu-4.2.1-11.el6.i686

The idea of precompiled package is great until it is not. And that's what we have now. Important package such as R language or Infiniband drivers from Mellanox routinely prevent the ability to patch systems in RHEL 6.

The total number of packages installed is just way too high with many overlapping packages. Typically it is over one thousand, unless you use base system, or HPC computational node distribution. In the latter case it is still over six hundred. So in no way you can understand the packages structure in your system without special tools. It became a complex and not very transparent layer of indirection between you and installed binaries and packages that you have on the system. And the level on which you know this subsystem is now important indication of the qualification of sysadmin. Along with networking and LAMP stack, or its enterprise variations.

Number of daemons running in the default RHEL installation is also very high and few sysadmins understand what all those daemons are doing and why they are running after the startup. In other word we already entered Microsoft world. In other words RHEL is Microsoft Windows of the Linux word. And with systemd pushed through the throat of enterprise customers, you will understand even less.

And while Red Hat support is expensive, the help in such cases form Red Hat support is marginal. Looks like they afraid not only of customers, but of their own packages too. All those guys do and to look into database to see if a similar problem is described. That works some some problems, but for more difficult one it usually does not. Using free version of Linux such as CentOS is an escape, but with commercial applications you are facing troubles: they can easily blame Os for the the problem you are having and them you are holding the bag.

No efforts are make to consolidate those hundreds of overlapping packages (with some barely supported or unsupported). This "library hell" is a distinct feature of modern enterprise linux distribution.

When /etc/resolv.conf is no longer a valid DNS configuration file

In RHEL7 Network Manager is the default configuration tool for the entire networking stack including DNS resolution. One interesting problem with Network Manager is that in default installation it happily overwrites /etc/resolv.conf putting the end to the Unix era during which you can assume that if you write something into config file it will be intact and any script that generate such a file should be able to detect that it was changed manually and re-parse changes or produce a warning.

In RHEL6 most sysadmins simply deinstall Network Manager on servers, and thus did not face its idiosyncrasies. BTW Network Manager is a part of Gnome project, which is pretty symbolic taking into account that the word "Gnome" recently became a kind of curse among Linux users ( GNOME 3.x has alienated both users and developers ;-).

In RHEL 6 and before certain installation options excluded Network Manager from installation by default (Minimal server in RHEL6 is one). For all other you can deinstall it after the installation. This is no longer recommended in RHEL7, but still you can disable it. See How to disable NetworkManager on CentOS - RHEL 7 for details. Still as the Network Manager is the preferred by Red Hat intermediary for connecting to the network for RHEL7 (and now is it is present even in minimal installation) many sysadmin prefer to keep it on RHEL7. People who tried run into various types of problems if they have a setup that was more or less non-trivial. See for example How to uninstall NetworkManager in RHEL7 Packages like Docker expect it to be present as well.

Non-documented by Red Hat solution exists even with Network Manager running (CentOS 7 NetworkManager Keeps Overwriting -etc-resolv.conf )

To prevent Network Manager to overwrite your resolv.conf changes, remove the DNS1, DNS2, ... lines from /etc/sysconfig/network-scripts/ifcfg-*.

and

...tell NetworkManager to not modify the DNS settings:

/etc/NetworkManager/NetworkManager.conf [main] dns=none

After that you need to restart Network manager service. Red Hat does not addresses this problem properly even in training courses:

Alex Wednesday, November 16, 2016 at 00:38 - Reply ↓

Ironically, Redhat’s own training manual does not address this problem properly.

I was taking a RHEL 7 Sysadmin course when I ran into this bug. I used nmcli thinking it would save me time in creating a static connection. Well, the connection was able to ping IPs immediately, but was not able to resolve any host addresses. I noticed that /etc/resolv.conf was being overwritten and cleared of it’s settings.

No matter what we tried, there was nothing the instructor and I could do to fix the issue. We finally used the “dns=none” solution posted here to fix the problem.

Actually the behaviour is more complex and tricky then I described and hostnamectl produce the same effect of overwriting the file:

Brian Wednesday, March 7, 2018 at 01:08 -

Thank you Ken – yes – that fix finally worked for me on RHEL 7!

Set “dns=none” in the [man] section of /etc/NetworkManager/NetworkManager.conf
Tested: with “$ cat /etc/resolv.conf” both before and after “# service network restart” and got the same output!

Otherwise I could not find out how to reliably set the “search” domains list, as I did not see an option in the /etc/sysconfig/network-scripts/ifcfg-INT files.

Brian Wednesday, March 7, 2018 at 01:41 -

Brian again here… note that I also had “DNS1, DNS2” removed from /etc/sysconfig/nework-scripts/ifcfg-INT.

CAUTION: the “hostnamectl”[1] command will also reset /etc/resolv.conf rather bluntly… replacing the default “search” domain and deleting any “nameserver” entries. The file will also include the “# Generated by NetworkManager” header comment.

[1] e.g. #hostnamectl set-hostname newhost.domain –static; hostnamectl status
Then notice how that will overwrite /etc/resolv.conf as well

But this is troubling sign: now in addition to which configuration file you need to edit, you need to know which files you can edit and which you can't (or how you can disable the default behaviour). Which is a completely different environment which is just one step away from Microsoft Registry.

Moreover, even minimal installation of RHEL7 has over hundred of *.conf files.

Problems with architectural vision of Red Hat brass

Both architectural level of thinking of Red Hat brass (with daemons like avahi, systemd, network manager installed by default for all types of servers ) and clear attempts along the lines "Not invented here" in virtualization creates concerns. It is clear that Red Hat by itself can't become a major virtualization player like VMware. It just does not have enough money for development and marketing. That's why they now try to became a major player in "private cloud" space with Docker.

You would think that the safest bet is to reuse the leader among open source offerings, which is currently Xen. But Red Hat brass thinks differently and wants to play more dangerous poker game: it started promoting KVM, making it obligatory part of RHCSA exam. Actually, Red Hat has released Enterprise Linux 5 with integrated Xen and then changed their mind after RHEL 5.5 or so. In RHEL 6 Xen no longer present even as an option. It was replaced by KVM.

What is good that after ten years they eventually manage to some extent to re-implement Solaris 10 zones (without RBAC). In RHEL 7 they are more or less usable.

Security overkill with SELinux

RHEL contain security layer called SELinux, but in most cases of corporate deployment it is either disabled, or operates in permissive mode. The reason is that is notoriously difficult to configure correctly and in many cases the game does not worth the candles. This is a classic problem that overcomplexity created: functionality to me OS more secure is here, but it not used/configured properly, because very few sysadmin can operate on the required level of complexity.

Firewall which is a much simpler concept proved to be tremendously more usable in corporate deployments, especially in cases when you have obnoxious or incompetent security department (a pretty typical situation for a large corporation ;-) as it prevents a lot of stupid questions from utterly incompetent "security gurus" about opened ports and can stop dead scanning attempts of tools that test for known vulnerabilities and by using which security departments are trying to justify their miserable existence. Those tools sometimes crash production servers.

Generally it is dangerous to allow exploits used in such tools which local script kiddies (aka "security team") recklessly launch against your production server (as if checking for a particular vulnerability using internal script is inferior solution). Without understanding of their value (which is often zero) and possible consequences (which are sometimes non zero ;-)

Another interesting but seldom utilized option is to use AppArmor which is now integrated into Linux kernel. AppArmor is a security module for Linux kernel and it is part of the mainstream kernel since 2.6.36 kernel. It is considered as an alternative to SELinux and is IMHO more elegant, more understandable solution to the same problem. But you might need to switch to Suse in this case. Red Hat Enterprise Linux kernel doesn't have support for AppArmor security modules ( Does Red Hat Enterprise Linux support AppArmor )

To get the idea about the level of complexity of SE try to read the RHEL7 Deployment Guide. Full set of documentation is available from www.redhat.com/docs/manuals/enterprise/ So it is not accidental that in many cases SElinix is disabled in enterprise installations. Some commercial software packages explicitly recommend to disable it in their installation manuals.

Unfortunately AppArmor which is/was used in SLES (only by knowledgeable sysadmins ;-), never got traction and never achieved real popularity iether (SLES now has SELinux as well, and as such suffers from overcomplexity even more then RHEL ;-). It is essentially the idea of introducing "per-application" unmask for all major directories, which can block dead attempts to write to them and/or read the information from certain sensitive files, even if the application has a vulnerability.

Escaping potential exploits by using Solaris on Intel

As I mentioned above patching became high priority activity in large Red Hat enterprise customers. Guidelines now are strict and usually specify monthly or in best case quality application of security patches. This amount of efforts might be better applied elsewhere with much better return on investment.

Solaris has an interesting security subsystem called RBAC, which allow selectively grant part of root privileges and can be considered as a generalization of sudo. But Solaris is dying in enterprise environment as Oracle limits it to their own (rather expensive) hardware. But you can use Solaris for X86 via VM (XEN). IMHO if you need a really high level of security for a particular server, which does not have any fancy applications installed, this sometimes might be a preferable path to go despite being "security via obscurity" solution. There is no value of using Linux in highly security applications as Linux is the most hacked flavor of Unix. And this situation will not change in the foreseeable future.

This solution is especially attractive if you still have knowledgeable Solaris sysadmin from "the old guard" on the floor. Security via obscurity actually works pretty well in this case; add to this RBAC capabilities and you have a winner. The question here is why take additional risks with "zero day" Linux exploits, if you can avoid them. See Potemkin Villages of Computer Security for more detailed discussion.

I never understood IT managers who spend additional money on enhancing linux security. Especially via "security monitoring" solutions using third party providers, which more often then not is a complete, but expensive fake. Pioneered by ISS more then ten year ago.

Normal hardening scripts are OK, but to spend money of same fancy and expensive system to enhance Linux security is a questionable option in my opinion, as Linux will always be the most attacked flavor of Unix with the greatest number of zero time exploits. That is especially true for foreign companies operating in the USA. You can be sure that specialists in NSA are well ahead of any hackers in zero time exploits for Linux (and, if not Linux, then CISCO is good enough too ;-)

So instead of Sisyphus task of enhancing Linus security via keeping up with patching schedule (which is a typical large enterprise mantra, as this is the most obvious and understandable by corporate IT brass path) it makes sense to switch to a different OS for critical servers, especially Internet facing servers or to use a security appliance to block most available paths to the particular server group (for example one typical blunder is allocating IP addresses for DRAC/ILO remote controls on the same segment as the main server interface; those "internal appliances" rarely have up-to-date firmware, sometimes have default passwords, are in general are much more hackable and do need additional protection by a separate firewalls which limit access to selected sysadmins desktops with static IP addresses).

My choice would be Oracle Solaris as this is a well architectured by Sun Microsystems OS with an excellent security record and additional, unique security mechanisms (up to Trusted Solaris level). And a good thing is that Oracle (so far) did not spoil it with excessive updates, like Red Hat spoiled it distribution with RHEL 7. Your mileage may vary.

Also important in today's "semi-outsourced" IT environments is the level of competence and level of loyalty of the people who are responsible for selecting and implementing security solutions. For example low loyalty of contractor based technical personnel naturally leads to increased probability of security incidents and/or signing useless or harmful for the enterprise contracts: security represents the new Eldorado for snake oil sellers. A good recent example was (in the past) tremendous market success of ISS intrusion detection appliances. Which were as close to snake oil as one can get. May be that's why they were bought by IBM for completely obscene amount of money: list of fools has tremendous commercial value for such shrewd players as IBM.

In such an environment "security via obscurity" is probably the optimal path of increasing the security of both OS and typical applications. Yes, Oracle is more expensive. But there is no free lunch. I am actually disgusted with the proliferation of security products for Linux ;-)

Road to hell in paved with good intentions: biosdevname package on Dell servers

Loss of architectural integrity of Unix is now very pronounced in RHEL. Both RHEL6 and RHEL7 although 7 is definitely worse. And this is not only systemd fiasco. For example, recently I spend a day troubleshooting an interesting and unusual problem: one out of 16 identical (both hardware and software wise) blades in a small HPC cluster (and only one) failed to enable bonded interface on boot and this remained off line. Most sysadmins would think that something is wrong with hardware. For example, Ethernet card on the blade and/or switch port or even internal enclosure interconnect. I also initially thought his way. But this was not the case ;-)

Tech support from Dell was not able to locate any hardware problem although they did diligently upgraded CMC on enclosure and BIOS and firmware on the blade. BTW this blade has had similar problems in the past and Dell tech support once even replaced Ethernet card in it, thinking that it is culprit. Now I know that this was a completely wrong decision on their part, and waist of both time and money :-), They come to this conclusion by swapping the blade to a different slot and seeing that the problem migrated into new slot. Bingo -- the card is the root cause. The problem is that it was not. What is especially funny is that replacing the card did solve the problem for a while. After reading the information provided below you will be as puzzled as me why it happened.

To make long story short the card was changed but after yet another power outage the problem returned. This time I started to suspect that the card has nothing to do with the problem. After more close examination I discovered that in its infinite wisdom in RHEL 6 Red Hat introduced a package called biosdevname. The package was developed by Dell (the fact which seriously undermined my trust in Dell hardware ;-). This package renames interfaces to a new set of names, supposedly consistent with their etching on the case of rack servers. It is useless (or more correctly harmful) for blades. The package is primitive and does not understand that server is a rack server or blade. Moreover by doing this supposedly useful remaining this package introduces in 70-persistent-net.rules file a stealth rule:

KERNEL=="eth*", ACTION=="add", PROGRAM="/sbin/biosdevname -i %k", NAME="%c"

I did not look at the code, but from the observed behaviour it looks that in some cases in RHEL 6 (and most probably in RHEL 7 too) the package adds a "stealth" rule to the END (not the beginning but the end !!!) of /udev/rules.d/70-persistent-net.rules file, which means that if similar rule in 70-persistent-net.rules exits it is overwritten. Or something similar to this effect.

If you look at Dell knowledge base there are dozens of advisories related to this package (search just for biosdevname). Which suggests that there some deeply wrong with its architecture.

What I observed that on some (the key word is some, converting the situation in Alice in Wonderland environment) rules for interfaces listed in 70-persistent-net.rules file simply does not work, if this package is enabled. For example Dell Professional services in their infinite wisdom renamed interfaces back to eth0-eth4 for Inter X710 4-port 10Gb Ethernet card that we have on some blades. On 15 out of 16 blades in the Dell enclosure this absolutely wrong solution works perfectly well. But on blades 16 sometimes it does not. As a result this blade does not boot after power outage of reboot. When this happens is unpredictable. Sometimes it boots, but sometimes it does not. And you can't understand what is happening, no matter how hard you try because of stealth nature of changes introduced by biosdevname package.

Two interfaces on this blades (as you now suspect eth0 and eth1) were bonded on this blade. After around 6 hours of poking around the problem I discover that despite presence of rule for "eth0-eth4 in 70-persistent-net.rules file RHEL 6.7 still renames on boot all four interfaces to em0-em4 scheme and naturally bonding fails as eth0 and eth1 interfaces do not exit.

First I decided to deinstall biosdevname package and see what will happen. Did not work (see below why -- de-installation script in this RPM is incorrect and contains a bug -- it is not enough to remove files you also need to run the command update-initramfs -u (Hat tip to Oler). Searching for "Renaming em to eth" I found a post in which the author recommended disabling this "feature" via adding biosdevname=0 to kernel parameters /etc/grub.conf.That worked. So two days my life were lost for finding a way to disable this completely unnecessary for blades RHEL "enhancement".

Here is some information about this package

biosdevname Copyright (c) 2006, 2007 Dell, Inc. Licensed under the GNU General Public License, Version 2. biosdevname in its simplest form takes a kernel device name as an argument, and returns the BIOS-given name it "should" be. This is necessary on systems where the BIOS name for a given device (e.g. the label on the chassis is "Gb1") doesn't map directly and obviously to the kernel name (e.g. eth0).

The distro-patches/sles10/ directory contains a patch needed to integrate biosdevname into the SLES10 udev ethernet naming rules.This also works as a straight udev rule. On RHEL4, that looks like:

KERNEL=="eth*", ACTION=="add", PROGRAM="/sbin/biosdevname -i %k", NAME="%c"

This makes use of various BIOS-provided tables:

PCI Confuration Space PCI IRQ Routing Table ($PIR) PCMCIA Card Information Structure SMBIOS 2.6 Type 9, Type 41, and HP OEM-specific types

therefore it's likely that this will only work well on architectures that provide such information in their BIOS.

To add insult to injury this behaviour was demonstrated on only one of 16 absolutely identically configured Dell M630 blades with identical hardware and absolutely identical (cloned) OS instances. Which makes RHEL real "Alice in Wonderland" system. This is just one example. I have more similar stories to tell

I would like to repeat that while idea is not completely wrong and sometimes might even make sense, the package itself is very primitive and the utility that is included in this package does not understand that the target for installation is a blade (NOTE to DELL: there is no etchings on blade network interfaces ;-)

If you look at this topic using your favorite search engine (which should be Google anymore ;-) you will find dozens of posts in which people try to resolve this problem with various levels of competency and success. Such a tremendous waist of time and efforts. Among best that I have found were:

networking - interface device name em1 to eth0 linux 14.04 lts - Ask Ubuntu

Solved Renaming em1 to eth0 on Red Hat Enterprise Linux 6 SysArchitects (this might be original post (dated Feb 23, 2012) that introduced the solution biosdevname=0although it does not understand the influence of biosdevname package on initramfs and the necessity to run update-initramfs -u after the de-installation of the package.

ethernet - Change eth0 to em1 on Scientific Linux - Unix & Linux Stack Exchange

But this is not the only example of harmful packages installed. They also install audio packages on servers that have no audio card ;-)

Current versions and year of end of support

As of October 2018 supported versions of RHEL are 6.10 and 7.3-7.5. Usually a large enterprise uses a mixture of versions, so knowing the end of support for each is a common problem. Compatibility within a single version of RHEL is usually very good (I would say on par with Solaris) and the risk on upgrading from, say, 6.5 to 6.10 is minimal. There were some broken minor upgrades like RHEL 6.6, but this is a rare even, and this particular on is in the past.

Problems arise with major version upgrade. Usually total reinstallation is the best bet. Formally you can upgrade from RHEL 6.10 to RHEL 7.5 but only for a very narrow class of servers and not much installed (or much de-installed ;-) . You need to remove Gnome via yum groupremove gnome-desktop, if it works (how to remove gnome-desktop using yum ) and several other packages.

Against, here your mileage may vary and reinstallation might be a better option (RedHat 6 to RedHat 7 upgrade):

1. Prerequisites

1. Make sure that you are on running on the latest minor version (i.e rhel 6.8).

2. The upgrade process can handle only the following package groups and packages: Minimal (@minimal),
Base (@base), Web Server (@web-server), DHCP Server, File Server (@nfs-server), and Print Server
(@print-server).

3. Upgrading GNOME and KDE is not supported, So please uninstall the GUI desktop before the upgrade and
install it after the upgrade.

4. Backup the entire system to avoid potential data loss.

5. If in case you system is registered with RHN classic , you must first unregister from RHN classic and
register with subscription-manager.

6. Make sure /usr is not on separate partition.

2. Assessment

Before upgrading we need to assess the machine first and check if it is eligible for the upgrade and this can be
done by an utility called "Preupgrade Assistant".

Preupgrade Assistant does the following:

Assesses the system for any limitations such as package removal etc.

When run this will generate an assessment results in the html format.

This will not change anything on the system except for storing some information.

We can run this tool multiple times until all the issues get resolved.

2.1. Installing dependencies of Preupgrade Assistant:

Preupgrade Assistant needs some dependencies packages (openscap , openscap-engine-sce, openscap-
utils , pykickstart, mod_wsgi) all these packages can be installed from the installation media but "openscap-
engine-sce" package needs to be downloaded from the redhat portal.

See Red Hat Enterprise Linux - Wikipedia. and DistroWatch.com Red Hat Enterprise Linux.

Tip:

In linux there is no convention for determination which flavor of linux you are running. For Red Hat in order to determine which version is installed on the server you can use command

cat /etc/redhat-release

Oracle linux adds its own file preserving RHEL file, so a more appropriate command would be

cat /etc/*release

End of support issues

See Red Hat Enterprise Linux Life Cycle - Red Hat Customer Portal:

RHEL 7 was released in June 2014. Support should last till June 30, 2024. As version 7 is the most problematic for server sysadmins. Logs now are binary like in Windows and special utility journalctl is used to access them. To avoid incompatibility they forward log to the regular rsyslogd, but the format of many messages now is different. Also in case of crash of journald, log messages might nor be available. Runlevels are imitated with targets.

RHEL 6 was released in November 2010. So technical support and patching will last till November 30, 2020. It initially gave me impression of half-baked, rushed to customer distribution and may be signal internal crisis in RHEL development as in some areas it is worse then RHEL 5.6 and later. It stabilized around version 6.5. Bash version is 4.1. It updated Perl to 5.10.1. PHP to 5.3.2. It was uses NFT4 by default.

6.0 Red Hat Enterprise Linux 6 (Santiago), 10 November 2010 ) kernel 2.6.32-71)

6.1, also termed Update 1, 19 May 2011 (kernel 2.6.32-131)

6.2, also termed Update 2, 6 December 2011 (kernel 2.6.32-220)

6.3, also termed Update 3, 20 June 2012 (kernel 2.6.32-279)

6.4, also termed Update 4, 21 February 2013 (kernel 2.6.32-358)

6.5, also termed Update 5, 21 November 2013 (kernel 2.6.32-431)

6.6, also termed Update 6, 13 October 2014 (kernel 2.6.32-504)

6.7 (ended July 31, 2017)

6.8 -- unclear but probably 2020

6.9 -- unclear but probably 2020

6.10 -- unclear but probably 2020

RHEL 5 was released 14 March 2007. Support ended March 31, 2017. It is probably one of the most stable version of Red Hat I ever encountered. It still support more or less recent hardware (Oracle provides updated kernel if you want it). This is a very conservative distribution. For example, it still uses such really old (or obsolete, if you wish) versions as bash 3.2.25, Perl 5.8.8, and Python 2.4.3. The last release is 5.11

5.0. Red Hat Enterprise Linux 5 (Tikanga), 14 March 2007. (kernel 2.6.18-8)

5.1, also termed Update 1, 7 November 2007 (kernel 2.6.18-53)

5.2, also termed Update 2, 21 May 2008 (kernel 2.6.18-92)

5.3, also termed Update 3, 20 January 2009 (kernel 2.6.18-128)

5.4, also termed Update 4, 2 September 2009 (kernel 2.6.18-164)

5.5, also termed Update 5, 30 March 2010 (kernel 2.6.18-194)

5.6, also termed Update 6, 13 January 2011 (kernel 2.6.18-238)

5.7, also termed Update 7, 21 July 2011 (kernel 2.6.18-274)

5.8, also termed Update 8, 20 February 2012 (kernel 2.6.18-308)

5.9, also termed Update 9, 7 January 2013 (kernel 2.6.18-348)

5.10, also termed Update 10, 1 October 2013 (kernel 2.6.18-371)

5.11, also termed Update 11, 16 September 2014 (kernel 2.6.18-398)

Oracle produced improved kernel for 5.x versions based of later version of linux kernel then "stock" RHEL kernel. It might benefit stability if you are running Oracle applications. It is 64-bit only and is more capricious toward hardware then Red Hat stack kernel so your mileage can vary.

Avoiding useless daemons during installation

While new Anaconda sucks, you still can improve the typical for RHEL situation with a lot of useless daemons installed by carefully selecting packages and then reusing generated kickstart file. That can be done via advanced menu for one box and then using this kickstart file for all other boxes with minor modifications.

Kickstart still works, despite trend toward overcomplexity in other parts of distribution. They did not manage to screw it up yet ;-)

Acknowledgment. I would like to thank sysadmins who have sent me the feedback for the version 1.0 of this article. It definitely helped to improve it. Mistakes and errors in the current version are my own.

Recommended Links

Google matched content

Softpanorama Recommended

Top articles

Sites

Etc

Society

Groupthink : Two Party System as Polyarchy : Corruption of Regulators : Bureaucracies : Understanding Micromanagers and Control Freaks : Toxic Managers :   Harvard Mafia : Diplomatic Communication : Surviving a Bad Performance Review : Insufficient Retirement Funds as Immanent Problem of Neoliberal Regime : PseudoScience : Who Rules America : Neoliberalism : The Iron Law of Oligarchy : Libertarian Philosophy

Quotes

War and Peace : Skeptical Finance : John Kenneth Galbraith :Talleyrand : Oscar Wilde : Otto Von Bismarck : Keynes : George Carlin : Skeptics : Propaganda : SE quotes : Language Design and Programming Quotes : Random IT-related quotes : Somerset Maugham : Marcus Aurelius : Kurt Vonnegut : Eric Hoffer : Winston Churchill : Napoleon Bonaparte : Ambrose Bierce : Bernard Shaw : Mark Twain Quotes

Bulletin:

Vol 25, No.12 (December, 2013) Rational Fools vs. Efficient Crooks The efficient markets hypothesis : Political Skeptic Bulletin, 2013 : Unemployment Bulletin, 2010 : Vol 23, No.10 (October, 2011) An observation about corporate security departments : Slightly Skeptical Euromaydan Chronicles, June 2014 : Greenspan legacy bulletin, 2008 : Vol 25, No.10 (October, 2013) Cryptolocker Trojan (Win32/Crilock.A) : Vol 25, No.08 (August, 2013) Cloud providers as intelligence collection hubs : Financial Humor Bulletin, 2010 : Inequality Bulletin, 2009 : Financial Humor Bulletin, 2008 : Copyleft Problems Bulletin, 2004 : Financial Humor Bulletin, 2011 : Energy Bulletin, 2010 : Malware Protection Bulletin, 2010 : Vol 26, No.1 (January, 2013) Object-Oriented Cult : Political Skeptic Bulletin, 2011 : Vol 23, No.11 (November, 2011) Softpanorama classification of sysadmin horror stories : Vol 25, No.05 (May, 2013) Corporate bullshit as a communication method : Vol 25, No.06 (June, 2013) A Note on the Relationship of Brooks Law and Conway Law

History:

Fifty glorious years (1950-2000): the triumph of the US computer engineering : Donald Knuth : TAoCP and its Influence of Computer Science : Richard Stallman : Linus Torvalds : Larry Wall : John K. Ousterhout : CTSS : Multix OS Unix History : Unix shell history : VI editor : History of pipes concept : Solaris : MS DOS : Programming Languages History : PL/1 : Simula 67 : C : History of GCC development : Scripting Languages : Perl history : OS History : Mail : DNS : SSH : CPU Instruction Sets : SPARC systems 1987-2006 : Norton Commander : Norton Utilities : Norton Ghost : Frontpage history : Malware Defense History : GNU Screen : OSS early history

Classic books:

The Peter Principle : Parkinson Law : 1984 : The Mythical Man-Month : How to Solve It by George Polya : The Art of Computer Programming : The Elements of Programming Style : The Unix Hater’s Handbook : The Jargon file : The True Believer : Programming Pearls : The Good Soldier Svejk : The Power Elite

Most popular humor pages:

Manifest of the Softpanorama IT Slacker Society : Ten Commandments of the IT Slackers Society : Computer Humor Collection : BSD Logo Story : The Cuckoo's Egg : IT Slang : C++ Humor : ARE YOU A BBS ADDICT? : The Perl Purity Test : Object oriented programmers of all nations : Financial Humor : Financial Humor Bulletin, 2008 : Financial Humor Bulletin, 2010 : The Most Comprehensive Collection of Editor-related Humor : Programming Language Humor : Goldman Sachs related humor : Greenspan humor : C Humor : Scripting Humor : Real Programmers Humor : Web Humor : GPL-related Humor : OFM Humor : Politically Incorrect Humor : IDS Humor : "Linux Sucks" Humor : Russian Musical Humor : Best Russian Programmer Humor : Microsoft plans to buy Catholic Church : Richard Stallman Related Humor : Admin Humor : Perl-related Humor : Linus Torvalds Related humor : PseudoScience Related Humor : Networking Humor : Shell Humor : Financial Humor Bulletin, 2011 : Financial Humor Bulletin, 2012 : Financial Humor Bulletin, 2013 : Java Humor : Software Engineering Humor : Sun Solaris Related Humor : Education Humor : IBM Humor : Assembler-related Humor : VIM Humor : Computer Viruses Humor : Bright tomorrow is rescheduled to a day after tomorrow : Classic Computer Humor

The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D

Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...

You can use PayPal to to buy a cup of coffee for authors of this site

Disclaimer:

The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense so you need to be aware of Google privacy policy. You you do not want to be tracked by Google please disable Javascript for this site. This site is perfectly usable without Javascript.

Last modified: September 04, 2019