Softpanorama


Text files processing


You need to know several text utilities to be a good system administrator. They are also called filters, and they are often used as stages in pipes to create quick ad hoc programs. Combined with pipes, they constitute a powerful non-procedural language that any administrator should know and use. A pipe is a cascading redirection in which the output of one program (a stage of the pipe) serves as the input for another program (the next stage of the pipe). The symbol | separates stages of the pipe. Here is an example of a multistage pipe (it operates on HTTP logs and outputs the list of the most visited sites):

gzip -dc http_logs.gz | grep '" 200' | cut -d '"' -f 2 | cut -d '/' -f 3 | \
     tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn > most_frequent_sites.lst

There are over a hundred utilities in Red Hat that a sysadmin needs to know. Among them, two dozen or so are filters. The above example uses just five filters (grep, cut, tr, sort, and uniq) to accomplish a pretty complex task: creating, from an HTTP proxy log, the list of the most frequently accessed sites.
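To see what each stage of the pipe above contributes, it can help to run it on a tiny log. The following is only a sketch: the log entries and host names are invented, the log format is assumed to be a proxy-style access log with the full URL inside the quoted request, and awk is used at the end merely to normalize the padding that uniq -c adds.

```shell
# Build a tiny gzipped log; all entries and host names here are made up.
workdir=$(mktemp -d)
cat > "$workdir/http_logs" <<'EOF'
10.0.0.1 - - [11/Oct/2018:10:00:00 +0000] "GET http://www.Example.com/index.html HTTP/1.1" 200 512
10.0.0.2 - - [11/Oct/2018:10:00:01 +0000] "GET http://www.example.com/about.html HTTP/1.1" 200 734
10.0.0.1 - - [11/Oct/2018:10:00:02 +0000] "GET http://docs.example.org/ HTTP/1.1" 200 120
10.0.0.3 - - [11/Oct/2018:10:00:03 +0000] "GET http://missing.example.net/x HTTP/1.1" 404 0
EOF
gzip "$workdir/http_logs"

# grep keeps successful requests, the first cut extracts the quoted request,
# the second cut extracts the host, tr lowercases it, sort | uniq -c counts
# repeats, and sort -rn orders the counts numerically, largest first.
top_sites=$(gzip -dc "$workdir/http_logs.gz" \
  | grep '" 200' \
  | cut -d '"' -f 2 \
  | cut -d '/' -f 3 \
  | tr '[:upper:]' '[:lower:]' \
  | sort | uniq -c | sort -rn \
  | awk '{print $1, $2}')

printf '%s\n' "$top_sites"
rm -rf "$workdir"
```

The 404 entry is dropped by grep, and the two spellings of www.example.com are merged by tr before counting.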

In this lesson we will study just 9 filters.

  1. cat
  2. cut
  3. less
  4. head
  5. tail
  6. sort
  7. wc
  8. grep
  9. uniq

Please note that until Red Hat screwed up its flavor of Linux by introducing systemd (which includes journald with its log database), log files in Linux were regular text files. Truth be told, Red Hat partially corrected this blunder by redirecting logs from journald to the traditional log daemon, thus preserving the text file tradition for logs, which allows processing them with filters and pipes.

cat

The most basic command for reading files is cat. The cat filename command scrolls the text within the filename file. It also works with multiple filenames: it concatenates the files you list into one continuous output to your screen. You can redirect the output to a file of your choice.
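As a minimal sketch of concatenation and redirection with cat (the file names part1.txt, part2.txt, and combined.txt are hypothetical and live in a throwaway directory):

```shell
# Create two small files in a temporary directory (names are illustrative).
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/part1.txt"
printf 'beta\n'  > "$workdir/part2.txt"

# Concatenate both files and redirect the result into a third file.
cat "$workdir/part1.txt" "$workdir/part2.txt" > "$workdir/combined.txt"

# Read the combined file back.
combined=$(cat "$workdir/combined.txt")
printf '%s\n' "$combined"
rm -rf "$workdir"
```

The files are written to the output in the order they are listed on the command line.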

less and more

Larger files demand utilities that let you scroll through the file text at your leisure. These utilities are known as pagers, and the most common are more and less. With the more filename command, you can scroll through the text of a file, from start to finish, one screen at a time. With the less filename command, you can scroll in both directions through the same text with the PAGE UP, PAGE DOWN, and arrow keys. The more command does not offer this kind of free two-way scrolling. Both commands support vi-style searches.

As the less and more commands do not change files, they’re an excellent way to scroll through and search for items in a large text file such as an error log. For example, you can search for a pattern in the /var/log/messages file. With less you can also search across multiple files: if the search reaches the end of the current file without finding a match, it continues in the next file in the command line list (see the ^E or * search modifier in the command list below).

On Red Hat, less can read gzipped files without any additional parameters (this relies on the input preprocessor configured through the LESSOPEN environment variable), so it can be used to view compressed man page files as well as compressed /var/log/messages archives.

Less has a subset of commands that resemble vi, so any vi user is instantly at home using less. At any time you can switch to an editor by typing the v command.

To search forward, type / followed by the pattern. To search in the reverse direction, substitute a ? for /.

Here are the commands from the web page. Many of them I have never used in my life ;-)

In the following descriptions, ^X means control-X. ESC stands for the ESCAPE key; for example ESC-v means the two character sequence "ESCAPE", then "v".
h or H
Help: display a summary of these commands. If you forget all the other commands, remember this one.
SPACE or ^V or f or ^F
Scroll forward N lines, default one window (see option -z below). If N is more than the screen size, only the final screenful is displayed. Warning: some systems use ^V as a special literalization character.
z
Like SPACE, but if N is specified, it becomes the new window size.
ESC-SPACE
Like SPACE, but scrolls a full screenful, even if it reaches end-of-file in the process.
RETURN or ^N or e or ^E or j or ^J
Scroll forward N lines, default 1. The entire N lines are displayed, even if N is more than the screen size.
d or ^D
Scroll forward N lines, default one half of the screen size. If N is specified, it becomes the new default for subsequent d and u commands.
b or ^B or ESC-v
Scroll backward N lines, default one window (see option -z below). If N is more than the screen size, only the final screenful is displayed.
w
Like ESC-v, but if N is specified, it becomes the new window size.
y or ^Y or ^P or k or ^K
Scroll backward N lines, default 1. The entire N lines are displayed, even if N is more than the screen size. Warning: some systems use ^Y as a special job control character.
u or ^U
Scroll backward N lines, default one half of the screen size. If N is specified, it becomes the new default for subsequent d and u commands.
ESC-) or RIGHTARROW
Scroll horizontally right N characters, default half the screen width (see the -# option). If a number N is specified, it becomes the default for future RIGHTARROW and LEFTARROW commands. While the text is scrolled, it acts as though the -S option (chop lines) were in effect.
ESC-( or LEFTARROW
Scroll horizontally left N characters, default half the screen width (see the -# option). If a number N is specified, it becomes the default for future RIGHTARROW and LEFTARROW commands.
r or ^R or ^L
Repaint the screen.
R
Repaint the screen, discarding any buffered input. Useful if the file is changing while it is being viewed.
F
Scroll forward, and keep trying to read when the end of file is reached. Normally this command would be used when already at the end of the file. It is a way to monitor the tail of a file which is growing while it is being viewed. (The behavior is similar to the "tail -f" command.)
g or < or ESC-<
Go to line N in the file, default 1 (beginning of file). (Warning: this may be slow if N is large.)
G or > or ESC->
Go to line N in the file, default the end of the file. (Warning: this may be slow if N is large, or if N is not specified and standard input, rather than a file, is being read.)
p or %
Go to a position N percent into the file. N should be between 0 and 100, and may contain a decimal point.
P
Go to the line containing byte offset N in the file.
{
If a left curly bracket appears in the top line displayed on the screen, the { command will go to the matching right curly bracket. The matching right curly bracket is positioned on the bottom line of the screen. If there is more than one left curly bracket on the top line, a number N may be used to specify the N-th bracket on the line.
}
If a right curly bracket appears in the bottom line displayed on the screen, the } command will go to the matching left curly bracket. The matching left curly bracket is positioned on the top line of the screen. If there is more than one right curly bracket on the top line, a number N may be used to specify the N-th bracket on the line.
(
Like {, but applies to parentheses rather than curly brackets.
)
Like }, but applies to parentheses rather than curly brackets.
[
Like {, but applies to square brackets rather than curly brackets.
]
Like }, but applies to square brackets rather than curly brackets.
ESC-^F
Followed by two characters, acts like {, but uses the two characters as open and close brackets, respectively. For example, "ESC ^F < >" could be used to go forward to the > which matches the < in the top displayed line.
ESC-^B
Followed by two characters, acts like }, but uses the two characters as open and close brackets, respectively. For example, "ESC ^B < >" could be used to go backward to the < which matches the > in the bottom displayed line.
m
Followed by any lowercase letter, marks the current position with that letter.
'
(Single quote.) Followed by any lowercase letter, returns to the position which was previously marked with that letter. Followed by another single quote, returns to the position at which the last "large" movement command was executed. Followed by a ^ or $, jumps to the beginning or end of the file respectively. Marks are preserved when a new file is examined, so the ' command can be used to switch between input files.
^X^X
Same as single quote.
/pattern
Search forward in the file for the N-th line containing the pattern. N defaults to 1. The pattern is a regular expression, as recognized by the regular expression library supplied by your system. The search starts at the second line displayed (but see the -a and -j options, which change this).

Certain characters are special if entered at the beginning of the pattern; they modify the type of search rather than become part of the pattern:

^N or !
Search for lines which do NOT match the pattern.
^E or *
Search multiple files. That is, if the search reaches the END of the current file without finding a match, the search continues in the next file in the command line list.
^F or @
Begin the search at the first line of the FIRST file in the command line list, regardless of what is currently displayed on the screen or the settings of the -a or -j options.
^K
Highlight any text which matches the pattern on the current screen, but don't move to the first match (KEEP current position).
^R
Don't interpret regular expression metacharacters; that is, do a simple textual comparison.
?pattern
Search backward in the file for the N-th line containing the pattern. The search starts at the line immediately before the top line displayed.

Certain characters are special as in the / command:

^E or *
Search multiple files. That is, if the search reaches the beginning of the current file without finding a match, the search continues in the previous file in the command line list.
^F or @
Begin the search at the last line of the last file in the command line list, regardless of what is currently displayed on the screen or the settings of the -a or -j options.
^K
As in forward searches.
^R
As in forward searches.
ESC-/pattern
Same as "/*".
ESC-?pattern
Same as "?*".
n
Repeat previous search, for N-th line containing the last pattern. If the previous search was modified by ^N, the search is made for the N-th line NOT containing the pattern. If the previous search was modified by ^E, the search continues in the next (or previous) file if not satisfied in the current file. If the previous search was modified by ^R, the search is done without using regular expressions. There is no effect if the previous search was modified by ^F or ^K.
N
Repeat previous search, but in the reverse direction.
ESC-n
Repeat previous search, but crossing file boundaries. The effect is as if the previous search were modified by *.
ESC-N
Repeat previous search, but in the reverse direction and crossing file boundaries.
ESC-u
Undo search highlighting. Turn off highlighting of strings matching the current search pattern. If highlighting is already off because of a previous ESC-u command, turn highlighting back on. Any search command will also turn highlighting back on. (Highlighting can also be disabled by toggling the -G option; in that case search commands do not turn highlighting back on.)
&pattern
Display only lines which match the pattern; lines which do not match the pattern are not displayed. If pattern is empty (if you type & immediately followed by ENTER), any filtering is turned off, and all lines are displayed. While filtering is in effect, an ampersand is displayed at the beginning of the prompt, as a reminder that some lines in the file may be hidden.

Certain characters are special as in the / command:

^N or !
Display only lines which do NOT match the pattern.
^R
Don't interpret regular expression metacharacters; that is, do a simple textual comparison.
:e [filename]
Examine a new file. If the filename is missing, the "current" file (see the :n and :p commands below) from the list of files in the command line is re-examined. A percent sign (%) in the filename is replaced by the name of the current file. A pound sign (#) is replaced by the name of the previously examined file. However, two consecutive percent signs are simply replaced with a single percent sign. This allows you to enter a filename that contains a percent sign in the name. Similarly, two consecutive pound signs are replaced with a single pound sign. The filename is inserted into the command line list of files so that it can be seen by subsequent :n and :p commands. If the filename consists of several files, they are all inserted into the list of files and the first one is examined. If the filename contains one or more spaces, the entire filename should be enclosed in double quotes (also see the -" option).
^X^V or E
Same as :e. Warning: some systems use ^V as a special literalization character. On such systems, you may not be able to use ^V.
:n
Examine the next file (from the list of files given in the command line). If a number N is specified, the N-th next file is examined.
:p
Examine the previous file in the command line list. If a number N is specified, the N-th previous file is examined.
:x
Examine the first file in the command line list. If a number N is specified, the N-th file in the list is examined.
:d
Remove the current file from the list of files.
t
Go to the next tag, if there were more than one matches for the current tag. See the -t option for more details about tags.
T
Go to the previous tag, if there were more than one matches for the current tag.
= or ^G or :f
Prints some information about the file being viewed, including its name and the line number and byte offset of the bottom line being displayed. If possible, it also prints the length of the file, the number of lines in the file and the percent of the file above the last displayed line.
-
Followed by one of the command line option letters (see OPTIONS below), this will change the setting of that option and print a message describing the new setting. If a ^P (CONTROL-P) is entered immediately after the dash, the setting of the option is changed but no message is printed. If the option letter has a numeric value (such as -b or -h), or a string value (such as -P or -t), a new value may be entered after the option letter. If no new value is entered, a message describing the current setting is printed and nothing is changed.
--
Like the - command, but takes a long option name (see OPTIONS below) rather than a single option letter. You must press RETURN after typing the option name. A ^P immediately after the second dash suppresses printing of a message describing the new setting, as in the - command.
-+
Followed by one of the command line option letters this will reset the option to its default setting and print a message describing the new setting. (The "-+X" command does the same thing as "-+X" on the command line.) This does not work for string-valued options.
--+
Like the -+ command, but takes a long option name rather than a single option letter.
-!
Followed by one of the command line option letters, this will reset the option to the "opposite" of its default setting and print a message describing the new setting. This does not work for numeric or string-valued options.
--!
Like the -! command, but takes a long option name rather than a single option letter.
_
(Underscore.) Followed by one of the command line option letters, this will print a message describing the current setting of that option. The setting of the option is not changed.
__
(Double underscore.) Like the _ (underscore) command, but takes a long option name rather than a single option letter. You must press RETURN after typing the option name.
+cmd
Causes the specified cmd to be executed each time a new file is examined. For example, +G causes less to initially display each file starting at the end rather than the beginning.
V
Prints the version number of less being run.
q or Q or :q or :Q or ZZ
Exits less.

The following four commands may or may not be valid, depending on your particular installation.

v
Invokes an editor to edit the current file being viewed. The editor is taken from the environment variable VISUAL if defined, or EDITOR if VISUAL is not defined, or defaults to "vi" if neither VISUAL nor EDITOR is defined. See also the discussion of LESSEDIT under the section on PROMPTS below.
! shell-command
Invokes a shell to run the shell-command given. A percent sign (%) in the command is replaced by the name of the current file. A pound sign (#) is replaced by the name of the previously examined file. "!!" repeats the last shell command. "!" with no shell command simply invokes a shell. On Unix systems, the shell is taken from the environment variable SHELL, or defaults to "sh". On MS-DOS and OS/2 systems, the shell is the normal command processor.
| <m> shell-command
<m> represents any mark letter. Pipes a section of the input file to the given shell command. The section of the file to be piped is between the first line on the current screen and the position marked by the letter. <m> may also be ^ or $ to indicate beginning or end of file respectively. If <m> is . or newline, the current screen is piped.
s filename
Save the input to a file. This only works if the input is a pipe, not an ordinary file.

 

head and tail

The head and tail commands are separate tools that work in essentially the same way. By default, the head filename command displays the first 10 lines of a file; the tail filename command displays the last 10 lines. You can specify the number of lines shown with the -n switch followed by a number. For example, the tail -n 15 /etc/passwd command lists the last 15 lines of the /etc/passwd file.
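A quick way to check the defaults and the -n switch is to run both commands on a file with known contents. This sketch assumes the seq utility is available and uses a hypothetical numbers.txt file holding the numbers 1 through 20, one per line.

```shell
# A 20-line sample file (the file name is illustrative).
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"

# Defaults: first 10 and last 10 lines.
first_ten=$(head "$workdir/numbers.txt")
last_ten=$(tail "$workdir/numbers.txt")

# Explicit line count with -n.
last_three=$(tail -n 3 "$workdir/numbers.txt")

printf '%s\n' "$last_three"
rm -rf "$workdir"
```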

The head command has just two important options, which are easy to remember:

-c, --bytes=[-]K
print the first K bytes of each file; with the leading '-', print all but the last K bytes of each file
-n, --lines=[-]K
print the first K lines instead of the first 10; with the leading '-', print all but the last K lines of each file

 

The tail command can be especially useful for problems in progress. For example, if there’s an ongoing problem with failed login attempts, running tail -f on the relevant log file monitors that file and displays new lines on the screen as new log entries are recorded. Tail implements the -c and -n options similarly to head:

-c, --bytes=K
output the last K bytes; alternatively, use -c +K to output bytes starting with the Kth of each file
-n, --lines=K
output the last K lines, instead of the last 10; or use -n +K to output lines starting with the Kth
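The signed forms of K described above (a leading '-' for head, a leading '+' for tail) are easy to mix up, so here is a small sketch on known input. It assumes the GNU coreutils versions of head and tail, since head -n -K is a GNU extension; the file names are hypothetical.

```shell
workdir=$(mktemp -d)
seq 1 5 > "$workdir/five.txt"
printf 'abcdefgh\n' > "$workdir/letters.txt"

# head with a leading '-': everything except the last 2 lines.
all_but_last_two=$(head -n -2 "$workdir/five.txt")

# tail with a leading '+': start output at line 4.
from_fourth=$(tail -n +4 "$workdir/five.txt")

# Byte-oriented variants: the first 3 bytes, and from byte 6 onward.
first_bytes=$(head -c 3 "$workdir/letters.txt")
from_sixth_byte=$(tail -c +6 "$workdir/letters.txt")

printf '%s\n' "$all_but_last_two" "$from_fourth" "$first_bytes" "$from_sixth_byte"
rm -rf "$workdir"
```

A common use of tail -n +2 is to skip a one-line header before passing data to the next stage of a pipe.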

But tail has more options. Especially useful is -f (follow), which lets you watch a file that changes dynamically, such as a log file (--pid=PID terminates tail when the process with that PID dies):

-f, --follow[={name|descriptor}]
output appended data as the file grows; -f, --follow, and --follow=descriptor are equivalent
-F
same as --follow=name --retry
 
--max-unchanged-stats=N
with --follow=name, reopen a FILE which has not changed size after N (default 5) iterations to see if it has been unlinked or renamed (this is the usual case of rotated log files). With inotify, this option is rarely useful.
--pid=PID
with -f, terminate after process ID, PID dies
-q, --quiet, --silent
never output headers giving file names
--retry
keep trying to open a file even when it is or becomes inaccessible; useful when following by name, i.e., with --follow=name
-s, --sleep-interval=N
with -f, sleep for approximately N seconds (default 1.0) between iterations. With inotify and --pid=P, check process P at least once every N seconds.
-v, --verbose
always output headers giving file names
--help
display this help and exit
--version
output version information and exit
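Following a growing file is hard to show in a static example, but the header options from the list above (-q and -v) can be. This is a sketch assuming GNU tail and two hypothetical log files, a.log and b.log.

```shell
workdir=$(mktemp -d)
printf 'a1\na2\n' > "$workdir/a.log"
printf 'b1\nb2\n' > "$workdir/b.log"

# With several files, tail prints a "==> name <==" header before each one.
with_headers=$(cd "$workdir" && tail -n 1 a.log b.log)

# -q suppresses the headers.
quiet=$(cd "$workdir" && tail -q -n 1 a.log b.log)

# -v forces a header even for a single file.
verbose_single=$(cd "$workdir" && tail -v -n 1 a.log)

printf '%s\n' "$with_headers"
rm -rf "$workdir"
```

The same headers appear when tail -f follows several log files at once, which makes it easier to tell which file a new line came from.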

sort

You can sort the contents of a file in a number of ways. By default, the sort command sorts lines in alphabetical order (according to the current locale), comparing each line from its first character onward. For example, the sort /etc/passwd command would sort all users (including those associated with specific services and such) by username, because the username is the first field of each line.

You can specify the field to sort on (-k, with -t to set the field delimiter), the order (-r reverses it), and whether the comparison is numeric (-n).
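The difference between the default alphabetical sort and a numeric sort on a chosen field shows up clearly on passwd-style data. The three records below are invented for illustration and use only three colon-separated fields.

```shell
workdir=$(mktemp -d)
printf 'carol:x:1002\nalice:x:1000\nbob:x:999\n' > "$workdir/users.txt"

# Default: whole lines compared alphabetically, so the order follows the name.
by_name=$(sort "$workdir/users.txt")

# -t sets the delimiter, -k 3,3 restricts the key to field 3, n makes it numeric.
by_id=$(sort -t : -k 3,3n "$workdir/users.txt")

# Adding r to the key reverses the order: largest ID first.
by_id_desc=$(sort -t : -k 3,3nr "$workdir/users.txt")

printf '%s\n' "$by_id"
rm -rf "$workdir"
```

Without the n modifier, field 3 would be compared as text and 999 would sort after 1002.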

cut

When working with text files, it can be useful to extract specific fields. Imagine that you need to see a list of all users in the /etc/passwd file. In this file, several fields are defined, of which the first contains the username. To extract a specific field, the cut command is useful. Use the -d option to specify the field delimiter, followed by -f with the number of the field you want. So, the complete command is cut -d : -f 1 /etc/passwd if you want the first field of the /etc/passwd file. You can see the result in Listing 4.1.

Listing 4.1 Filtering Specific Fields with cut

[root@localhost ~]# cut -f 1 -d : /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
...
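The same idea extends to selecting several fields at once. The sketch below works on a two-line sample in passwd format (written to a hypothetical temporary file rather than reading the real /etc/passwd) so that the result is predictable.

```shell
workdir=$(mktemp -d)
printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' \
  > "$workdir/passwd.sample"

# Field 1 only: the user names.
names=$(cut -d : -f 1 "$workdir/passwd.sample")

# Fields 1 and 7: user name and login shell, still separated by the delimiter.
name_and_shell=$(cut -d : -f 1,7 "$workdir/passwd.sample")

printf '%s\n' "$name_and_shell"
rm -rf "$workdir"
```

When several fields are selected, cut prints them in file order, joined by the same delimiter given with -d.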

Counting Lines, Words, and Characters with wc

When working with text files, you sometimes get a large amount of output. Before deciding which approach works best in a specific case, you might want to have an idea about the amount of text you are dealing with. In that case, the wc command is useful. In its output, this command gives three different results: the number of lines, the number of words, and the number of characters (strictly speaking, bytes, which is the same thing for plain ASCII text).

Consider, for example, the ps aux command. When executed as root, this command gives a list of all processes running on a server. One way to count how many processes there are is to pipe the output of ps aux through wc, as in ps aux | wc (keep in mind that the line count includes the header line that ps prints). You can see the result of the command in Listing 4.3. In the result in Listing 4.3, you can see that the total number of lines is 90 and that there are 1,045 words and 7,583 characters in the command output.
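Because the output of ps aux changes from moment to moment, the three counts are easier to verify on a small fixed input. This is a sketch on a hypothetical two-line file; tr -d ' ' only strips the padding that some wc implementations add.

```shell
workdir=$(mktemp -d)
printf 'one two\nthree\n' > "$workdir/sample.txt"

# -l counts lines, -w counts words, -c counts bytes.
line_count=$(wc -l < "$workdir/sample.txt" | tr -d ' ')
word_count=$(wc -w < "$workdir/sample.txt" | tr -d ' ')
byte_count=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

printf 'lines=%s words=%s bytes=%s\n' "$line_count" "$word_count" "$byte_count"
rm -rf "$workdir"
```

In a pipe, wc -l alone is usually enough, for example to count the lines produced by an earlier stage.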

Listing 4.3 Counting the Number of Lines, Words, and Characters with wc

 

 


Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

 

Last modified: October 11, 2018