Softpanorama
May the source be with you, but remember the KISS principle ;-)


AWK Programming



awk /awk/ 1. n. [UNIX techspeak] An interpreted language for massaging text data developed by Alfred Aho, Peter Weinberger, and Brian Kernighan (the name derives from their initials). It is characterized by C-like syntax, a declaration-free approach to variable typing and declarations, associative arrays, and field-oriented text processing. See also {Perl}. 2. n. Editing term for an expression awkward to manipulate through normal {regexp} facilities (for example, one containing a {newline}). 3. vt. To process data using `awk(1)'.

Jargon File

AWK is a simple and elegant pattern scanning and processing language. I would call it the first and last simple scripting language. AWK is also the most portable scripting language in existence. It is the precursor and the main inspiration of Perl. Although it originated on Unix, it is available and widely used in the Windows environment too.

It was created in the late 1970s, almost simultaneously with the Bourne shell. The name is composed from the initial letters of its three original authors: Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. The team was more talented than Stephen Bourne and produced a higher quality product. Unfortunately, it was never well integrated into the shell. It is commonly used as a command-line filter in pipes to reformat the output of other commands.

AWK takes two inputs: a data file and a command file. The command file can be absent, and the necessary commands can be passed as arguments. As Ronald P. Loui aptly noted, awk is a very underappreciated language:

Most people are surprised when I tell them what language we use in our undergraduate AI programming class. That's understandable. We use GAWK. GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old pattern scanning language isn't even viewed as a programming language by most people. Like PERL and TCL, most prefer to view it as a "scripting language." It has no objects; it is not functional; it does no built-in logic programming. Their surprise turns to puzzlement when I confide that (a) while the students are allowed to use any language they want; (b) with a single exception, the best work consistently results from those working in GAWK. (footnote: The exception was a PASCAL programmer who is now an NSF graduate fellow getting a Ph.D. in mathematics at Harvard.) Programmers in C, C++, and LISP haven't even been close (we have not seen work in PROLOG or JAVA).

The main advantage of AWK is that, unlike Perl and other "scripting monsters," it is very slim, without the feature creep so characteristic of Perl, and thus can be used very efficiently in pipes. It also has a rather simple, clean syntax and, like the much heavier TCL, can be used with C for "dual-language" implementations.

Generally Perl might be better for really complex tasks, but this is not always the case. In reality AWK integrates much better with the Unix shell, and until probably 2004 there was no noticeable difference in speed for simple scripts, due to the additional time needed to load and initialize the huge Perl interpreter (Perl 5 still grows, but it now looks slim on a typical PC with a dual-core 3GHz CPU and 2GB of RAM, or on a server, which typically has at least a four-core CPU and 6GB or more of RAM).

Unfortunately, Larry Wall then decided to throw in the kitchen sink, and as a side effect sacrificed simplicity and orthogonality. I would agree that Perl added some nice things; it probably just added too many nice things :-). Perl 4 can probably be used as AWK++, but it's not as portable or universally supported. As I mentioned above, AWK is the most portable scripting language in existence.

IMHO the original book that describes AWK (Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger, The AWK Programming Language, Addison-Wesley, 1988) can serve as an excellent introduction to scripting. One chapter is available free: Chapter 11 The awk Programming Language

AWK has a unique blend of simplicity and power that is especially attractive for novices, who do not have to spend days and weeks learning all the intricacies of Perl before they become productive. In awk you can become productive in several hours. For instance, to print only the second and sixth fields of the date command--the month and year--with a space separating them, use:

  date | awk '{print $2 " " $6}'

The GNU Project produced the most popular version of awk, gawk. gawk has precompiled binaries for MS-DOS and Win32. It has some interesting and useful enhancements. Files can be read under the control of the powerful getline function. Unlike other implementations, GNU AWK contains the dgawk debugger, which is purposely modeled after GDB. GNU AWK 4.0 and higher has a "--sandbox" option that disables calls to system() and write access to the file system.
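A short sketch of getline reading a secondary file line by line from a BEGIN block (the file name and data here are invented for illustration):

```shell
# Build a small lookup file, then read it with getline; no main input
# is needed because all the work happens in the BEGIN block.
printf 'alice 10\nbob 20\n' > /tmp/scores.txt
awk 'BEGIN {
    total = 0
    while ((getline line < "/tmp/scores.txt") > 0) {
        split(line, f, " ")   # split the record into fields ourselves
        total += f[2]
    }
    close("/tmp/scores.txt")
    print "total:", total
}'
```

getline returns 1 while records remain and 0 at end of file, so the while loop consumes the whole file.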

The question arises: why use AWK if Perl is widely available and includes it as a subset? I would like to reproduce here the answer given in the newsgroup comp.lang.awk.

9. Why would anyone still use awk instead of perl?

...a valid question, since awk is a subset of perl (functionally, not necessarily syntactically); also, the authors of perl have usually known awk (and sed, and C, and a host of other Unix tools) very well, and still decided to move on.

...there are some things that perl has built-in support for that almost no version of awk can do without great difficulty (if at all); if you need to do these things, there may be no choice to make. for instance, no reasonable person would try to write a web server in awk instead of using perl or even C, if the actual socket programming has to be written in awk. keep in mind that gawk 3.1.0's /inet and ftwalk's built-in networking primitives should help this situation.

however, there are some things in awk's favor compared to perl:

Here is a nice intro to awk from the gawk manual (Getting Started with awk):

The basic function of awk  is to search files for lines (or other units of text) that contain certain patterns. When a line matches one of the patterns, awk  performs specified actions on that line. awk  keeps processing input lines in this way until it reaches the end of the input files.

Programs in awk  are different from programs in most other languages, because awk  programs are data-driven; that is, you describe the data you want to work with and then what to do when you find it. Most other languages are procedural; you have to describe, in great detail, every step the program is to take. When working with procedural languages, it is usually much harder to clearly describe the data your program will process. For this reason, awk  programs are often refreshingly easy to read and write.

When you run awk, you specify an awk  program that tells awk  what to do. The program consists of a series of rules. (It may also contain function definitions, an advanced feature that we will ignore for now. See User-defined.) Each rule specifies one pattern to search for and one action to perform upon finding the pattern.

Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in curly braces to separate it from the pattern. Newlines usually separate rules. Therefore, an awk  program looks like this:

pattern { action }
     pattern { action }
     ...
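As an illustration of this form, here is a tiny two-rule program run on invented input:

```shell
# Rule 1 has a pattern and an action; rule 2 is an END rule that
# runs once after all input has been processed.
printf 'error disk full\nok boot\nerror net down\n' |
awk '
/error/ { n++ }                       # count lines matching the pattern
END     { print n " error line(s)" }  # report after the last line
'
```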

1.1 How to Run awk  Programs

There are several ways to run an awk  program. If the program is short, it is easiest to include it in the command that runs awk, like this:

awk 'program' input-file1 input-file2 ...

When the program is long, it is usually more convenient to put it in a file and run it with a command like this:

awk -f program-file input-file1 input-file2 ...

This section discusses both mechanisms, along with several variations of each.

1.1.1 One-Shot Throwaway awk  Programs

Once you are familiar with awk, you will often type in simple programs the moment you want to use them. Then you can write the program as the first argument of the awk  command, like this:

     awk 'program' input-file1 input-file2 ...

where program consists of a series of patterns and actions, as described earlier.

This command format instructs the shell, or command interpreter, to start awk  and use the program to process records in the input file(s). There are single quotes around program so the shell won't interpret any awk  characters as special shell characters. The quotes also cause the shell to treat all of program as a single argument for awk, and allow program to be more than one line long.

This format is also useful for running short or medium-sized awk  programs from shell scripts, because it avoids the need for a separate file for the awk  program. A self-contained shell script is more reliable because there are no other files to misplace.
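A minimal sketch of such a self-contained script (the file name, location, and data are invented for the example):

```shell
# The awk program travels inside the shell script, so there is no
# separate program file to misplace.
cat > /tmp/sum-col.sh <<'EOF'
#!/bin/sh
# Sum the second column of standard input.
awk '{ sum += $2 } END { print sum + 0 }'
EOF
chmod +x /tmp/sum-col.sh
printf 'a 1\nb 2\n' | /tmp/sum-col.sh
```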

Very Simple, later in this chapter, presents several short, self-contained programs.

1.1.2 Running awk  Without Input Files

You can also run awk  without any input files. If you type the following command line:

awk 'program'

awk  applies the program to the standard input, which usually means whatever you type on the terminal. This continues until you indicate end-of-file by typing Ctrl-d. (On other operating systems, the end-of-file character may be different. For example, on OS/2 and MS-DOS, it is Ctrl-z.)

As an example, the following program prints a friendly piece of advice (from Douglas Adams's The Hitchhiker's Guide to the Galaxy), to keep you from worrying about the complexities of computer programming (BEGIN  is a feature we haven't discussed yet):

awk "BEGIN { print \"Don't Panic!\" }" 

This program does not read any input. The `\' before each of the inner double quotes is necessary because of the shell's quoting rules—in particular because it mixes both single quotes and double quotes.

This next simple awk  program emulates the cat  utility; it copies whatever you type on the keyboard to its standard output (why this works is explained shortly).

     $ awk '{ print }'
     Now is the time for all good men
     -| Now is the time for all good men
     to come to the aid of their country.
     -| to come to the aid of their country.
     Four score and seven years ago, ...
     -| Four score and seven years ago, ...
     What, me worry?
     -| What, me worry?
     Ctrl-d

1.1.3 Running Long Programs

Sometimes your awk  programs can be very long. In this case, it is more convenient to put the program into a separate file. In order to tell awk  to use that file for its program, you type:

awk -f source-file input-file1 input-file2 ...

The -f  instructs the awk  utility to get the awk  program from the file source-file. Any file name can be used for source-file. For example, you could put the program:

BEGIN { print "Don't Panic!" }

into the file advice. Then this command:

awk -f advice

does the same thing as this one:

awk "BEGIN { print \"Don't Panic!\" }"

This was explained earlier (see Read Terminal). Note that you don't usually need single quotes around the file name that you specify with -f, because most file names don't contain any of the shell's special characters. Notice that in advice, the awk  program did not have single quotes around it. The quotes are only needed for programs that are provided on the awk  command line.

If you want to identify your awk  program files clearly as such, you can add the extension .awk  to the file name. This doesn't affect the execution of the awk  program but it does make "housekeeping" easier.

1.1.4 Executable awk  Programs

Once you have learned awk, you may want to write self-contained awk  scripts, using the `#!' script mechanism. You can do this on many Unix systems7 as well as on the GNU system. For example, you could update the file advice  to look like this:

#! /bin/awk -f
     
     BEGIN { print "Don't Panic!" }

After making this file executable (with the chmod  utility), simply type `advice' at the shell and the system arranges to run awk  as if you had typed `awk -f advice':

chmod +x advice
     $ advice
     -| Don't Panic!

(We assume you have the current directory in your shell's search path variable (typically $PATH). If not, you may need to type `./advice' at the shell.)

Self-contained awk  scripts are useful when you want to write a program that users can invoke without their having to know that the program is written in awk.

Advanced Notes: Portability Issues with `#!'

Some systems limit the length of the interpreter name to 32 characters. Often, this can be dealt with by using a symbolic link.

You should not put more than one argument on the `#!' line after the path to awk. It does not work. The operating system treats the rest of the line as a single argument and passes it to awk. Doing this leads to confusing behavior—most likely a usage diagnostic of some sort from awk.

Finally, the value of ARGV[0]  (see Built-in Variables) varies depending upon your operating system. Some systems put `awk' there, some put the full pathname of awk  (such as /bin/awk), and some put the name of your script (`advice'). Don't rely on the value of ARGV[0]  to provide your script name.

1.1.5 Comments in awk  Programs

A comment is some text that is included in a program for the sake of human readers; it is not really an executable part of the program. Comments can explain what the program does and how it works. Nearly all programming languages have provisions for comments, as programs are typically hard to understand without them.

In the awk  language, a comment starts with the sharp sign character (`#') and continues to the end of the line. The `#' does not have to be the first character on the line. The awk  language ignores the rest of a line following a sharp sign. For example, we could have put the following into advice:

# This program prints a nice friendly message.  It helps
     # keep novice users from being afraid of the computer.
     BEGIN    { print "Don't Panic!" }

You can put comment lines into keyboard-composed throwaway awk  programs, but this usually isn't very useful; the purpose of a comment is to help you or another person understand the program when reading it at a later time.

Caution: As mentioned in One-shot, you can enclose small to medium programs in single quotes, in order to keep your shell scripts self-contained. When doing so, don't put an apostrophe (i.e., a single quote) into a comment (or anywhere else in your program). The shell interprets the quote as the closing quote for the entire program. As a result, usually the shell prints a message about mismatched quotes, and if awk  actually runs, it will probably print strange messages about syntax errors. For example, look at the following:

awk '{ print "hello" } # let's be cute'
     >

The shell sees that the first two quotes match, and that a new quoted object begins at the end of the command line. It therefore prompts with the secondary prompt, waiting for more input. With Unix awk, closing the quoted string produces this result:

awk '{ print "hello" } # let's be cute'
     > '
     error--> awk: can't open file be
     error-->  source line number 1

Putting a backslash before the single quote in `let's' wouldn't help, since backslashes are not special inside single quotes. The next subsection describes the shell's quoting rules.

1.1.6 Shell-Quoting Issues

For short to medium length awk  programs, it is most convenient to enter the program on the awk  command line. This is best done by enclosing the entire program in single quotes. This is true whether you are entering the program interactively at the shell prompt, or writing it as part of a larger shell script:

awk 'program text' input-file1 input-file2 ...

Once you are working with the shell, it is helpful to have a basic knowledge of shell quoting rules. The following rules apply only to POSIX-compliant, Bourne-style shells (such as bash, the GNU Bourne-Again Shell). If you use csh, you're on your own.

Mixing single and double quotes is difficult. You have to resort to shell quoting tricks, like this:

awk 'BEGIN { print "Here is a single quote <'"'"'>" }'
     -| Here is a single quote <'>

This program consists of three concatenated quoted strings. The first and the third are single-quoted, the second is double-quoted.

This can be "simplified" to:

awk 'BEGIN { print "Here is a single quote <'\''>" }'
     -| Here is a single quote <'>

Judge for yourself which of these two is the more readable.

Another option is to use double quotes, escaping the embedded, awk-level double quotes:

awk "BEGIN { print \"Here is a single quote <'>\" }"
     -| Here is a single quote <'>

This option is also painful, because double quotes, backslashes, and dollar signs are very common in awk  programs.

A third option is to use the octal escape sequence equivalents for the single- and double-quote characters, like so:

awk 'BEGIN { print "Here is a single quote <\47>" }'
     -| Here is a single quote <'>
     $ awk 'BEGIN { print "Here is a double quote <\42>" }'
     -| Here is a double quote <">

This works nicely, except that you should comment clearly what the escapes mean.

A fourth option is to use command-line variable assignment, like this:

awk -v sq="'" 'BEGIN { print "Here is a single quote <" sq ">" }'
     -| Here is a single quote <'>

If you really need both single and double quotes in your awk  program, it is probably best to move it into a separate file, where the shell won't be part of the picture, and you can say what you mean.

1.2 Data Files for the Examples

Many of the examples in this Web page take their input from two sample data files. The first, BBS-list, represents a list of computer bulletin board systems together with information about those systems. The second data file, called inventory-shipped, contains information about monthly shipments. In both files, each line is considered to be one record.

In the data file BBS-list, each record contains the name of a computer bulletin board, its phone number, the board's baud rate(s), and a code for the number of hours it is operational. An `A' in the last column means the board operates 24 hours a day. A `B' in the last column means the board only operates on evening and weekend hours. A `C' means the board operates only on weekends:

aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
camelot 555-0542 300 C
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C


The data file inventory-shipped  represents information about shipments during the year. Each record contains the month, the number of green crates shipped, the number of red boxes shipped, the number of orange bags shipped, and the number of blue packages shipped, respectively. There are 16 entries, covering the 12 months of last year and the first four months of the current year.

Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401

Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514

1.3 Some Simple Examples

The following command runs a simple awk  program that searches the input file BBS-list  for the character string `foo' (a grouping of characters is usually called a string; the term string is based on similar usage in English, such as "a string of pearls," or "a string of cars in a train"):

awk '/foo/ { print $0 }' BBS-list

When lines containing `foo' are found, they are printed because `print $0' means print the current line. (Just `print' by itself means the same thing, so we could have written that instead.)

You will notice that slashes (`/') surround the string `foo' in the awk  program. The slashes indicate that `foo' is the pattern to search for. This type of pattern is called a regular expression, which is covered in more detail later (see Regexp). The pattern is allowed to match parts of words. There are single quotes around the awk  program so that the shell won't interpret any of it as special shell characters.

Here is what this program prints:

awk '/foo/ { print $0 }' BBS-list
     -| fooey        555-1234     2400/1200/300     B
     -| foot         555-6699     1200/300          B
     -| macfoo       555-6480     1200/300          A
     -| sabafoo      555-2127     1200/300          C

In an awk  rule, either the pattern or the action can be omitted, but not both. If the pattern is omitted, then the action is performed for every input line. If the action is omitted, the default action is to print all lines that match the pattern.

Thus, we could leave out the action (the print  statement and the curly braces) in the previous example and the result would be the same: all lines matching the pattern `foo' are printed. By comparison, omitting the print  statement but retaining the curly braces makes an empty action that does nothing (i.e., no lines are printed).
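A quick sketch on invented input showing both omissions side by side:

```shell
# Full rule, pattern-only rule, and a rule with an explicitly empty
# action; the first two print the same lines, the third prints nothing.
printf 'foo\nbar\nfoobar\n' | awk '/foo/ { print $0 }'
printf 'foo\nbar\nfoobar\n' | awk '/foo/'
printf 'foo\nbar\nfoobar\n' | awk '/foo/ { }'
```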

Many practical awk  programs are just a line or two. Following is a collection of useful, short programs to get you started. Some of these programs contain constructs that haven't been covered yet. (The description of the program will give you a good idea of what is going on, but please read the rest of the Web page to become an awk  expert!) Most of the examples use a data file named data. This is just a placeholder; if you use these programs yourself, substitute your own file names for data. For future reference, note that there is often more than one way to do things in awk. At some point, you may want to look back at these examples and see if you can come up with different ways to do the same things shown here:
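As a small taste of "more than one way to do things," here are two equivalent one-liners that print only the non-empty lines of their input (sample data invented):

```shell
# NF > 0 is true for lines with at least one field;
# /./ matches lines containing at least one character.
printf 'a\n\nb\n' | awk 'NF > 0'
printf 'a\n\nb\n' | awk '/./'
```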

Dr. Nikolai Bezroukov



Old News ;-)

25 Best AWK Commands - Tricks

UrFix's Blog

stuck on OS X, so here’s a Perl version for the Mac:

curl -u username:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*<name>(.*)<\/name>.*$/$2 - $1/'

If you want to see the name of the last person, who added a message to the conversation, change the greediness of the operators like this:

curl -u username:password --silent "https://mail.google.com/mail/feed/atom" | tr -d '\n' | awk -F '<entry>' '{for (i=2; i<=NF; i++) {print $i}}' | perl -pe 's/^<title>(.*)<\/title>.*?<name>(.*?)<\/name>.*$/$2 - $1/'

5) Remove duplicate entries in a file without sorting.

awk '!x[$0]++' <file>

Using awk, find duplicates in a file without sorting, which would reorder the contents. awk will not reorder the lines, yet it still finds and removes duplicates, and you can then redirect the result into another file.
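The idiom works because x[$0] is 0 (false) the first time a line appears, so !x[$0]++ is true and the line prints; the post-increment then suppresses every later occurrence. A tiny demonstration on invented input:

```shell
# Only the first occurrence of each line survives; input order is kept.
printf 'a\nb\na\nc\nb\n' | awk '!x[$0]++'
```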

6) find geographical location of an ip address

lynx -dump http://www.ip-adress.com/ip_tracer/?QRY=$1|grep address|egrep 'city|state|country'|awk '{print $3,$4,$5,$6,$7,$8}'|sed 's\ip address flag \\'|sed 's\My\\'

I save this to bin/iptrace and run "iptrace ipaddress" to get the Country, City and State of an ip address using the http://ipadress.com service.

I add the following to my script to get a tinyurl of the map as well:

URL=`lynx -dump http://www.ip-adress.com/ip_tracer/?QRY=$1|grep details|awk '{print $2}'`

lynx -dump http://tinyurl.com/create.php?url=$URL|grep tinyurl|grep "19. http"|awk '{print $2}'

7) Block known dirty hosts from reaching your machine
wget -qO - http://infiltrated.net/blacklisted | awk '!/#|[a-z]/&&/./{print "iptables -A INPUT -s "$1" -j DROP"}'

Blacklisted is a compiled list of all known dirty hosts (botnets, spammers, bruteforcers, etc.), which is updated on an hourly basis. This command gets the list and creates the rules for you; if you want the hosts automatically blocked, append |sh to the end of the command line. It is more practical to block everything and allow in specifics; however, there are many who don't or can't do this, which is where this script comes in handy. For those using ipfw, a quick fix would be {print "add deny ip from "$1" to any"}. Posted in the sample output are the top two entries. Be advised that the blacklisted file itself filters out RFC1918 addresses (10.x.x.x, 172.16-31.x.x, 192.168.x.x); however, it is advisable that you check/parse the list before you implement the rules.
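A sketch of the same awk filter run on an invented blacklist, printing the rules rather than executing them: lines containing `#` (comments) or lowercase letters (hostnames) are skipped, and one iptables rule is emitted per bare IP address.

```shell
# Invented input standing in for the downloaded blacklist.
printf '# header\n1.2.3.4\nbad.example.com\n5.6.7.8\n' |
awk '!/#|[a-z]/ && /./ { print "iptables -A INPUT -s " $1 " -j DROP" }'
```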

8) Display a list of committers sorted by the frequency of commits
svn log -q|grep "|"|awk "{print \$3}"|sort|uniq -c|sort -nr

Use this command to find out a list of committers sorted by the frequency of commits.

9) List the number and type of active network connections
netstat -ant | awk '{print $NF}' | grep -v '[a-z]' | sort | uniq -c
10) View facebook friend list [hidden or not hidden]

lynx -useragent=Opera -dump 'http://www.facebook.com/ajax/typeahead_friends.php?u=4&__a=1' | gawk -F'\"t\":\"' -v RS='\",' 'RT{print $NF}' | grep -v '\"n\":\"' | cut -d, -f2

There’s no need to be logged in to Facebook. I could do more JSON filtering but you get the idea…

Replace u=4 (Mark Zuckerberg, Facebook creator) with desired uid.

Hidden or not hidden… Scary, isn’t it?

11) List recorded form fields of Firefox

cd ~/.mozilla/firefox/ && sqlite3 `cat profiles.ini | grep Path | awk -F= '{print $2}'`/formhistory.sqlite "select * from moz_formhistory" && cd - > /dev/null

When you fill in a form with Firefox, you see things you entered in previous forms with the same field names. This command lists everything Firefox has recorded. Using a "delete from", you can remove annoying Google queries, for example ;-)

12) Brute force discover

sudo zcat /var/log/auth.log.*.gz | awk '/Failed password/&&!/for invalid user/{a[$9]++}/Failed password for invalid user/{a["*" $11]++}END{for (i in a) printf "%6s\t%s\n", a[i], i|"sort -n"}'

Show the number of failed login attempts per account. If the user does not exist, the account is marked with *.
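The same counting idiom can be sketched on invented log lines (field positions are assumed from the message format): real users are keyed by the user-name field, invalid users by "*" plus the attempted name.

```shell
printf 'Failed password for root\nFailed password for root\nFailed password for invalid user guest\n' |
awk '/Failed password/ && !/invalid user/ { a[$4]++ }
     /Failed password for invalid user/  { a["*" $6]++ }
     END { for (i in a) print a[i], i }' | sort -rn
```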

13) Show biggest files/directories, biggest first with ‘k,m,g’ eyecandy
du --max-depth=1 | sort -r -n | awk '{split("k m g",v); s=1; while($1>1024){$1/=1024; s++}
	print int($1)" "v[s]"\t"$2}'

I use this on debian testing, works like the other sorted du variants, but i like small numbers and suffixes :)

14) Analyse an Apache access log for the most common IP addresses
tail -10000 access_log | awk '{print $1}' | sort | uniq -c | sort -n | tail

This uses awk to grab the IP address from each request and then sorts and summarises the top 10

15) copy working directory and compress it on-the-fly while showing progress
tar -cf - . | pv -s $(du -sb . | awk '{print $1}') | gzip > out.tgz

What happens here is we tell tar to create "-c" an archive of all files in the current dir "." (recursively) and output the data to stdout "-f -". Next we specify the size "-s" to pv of all files in the current dir. The "du -sb . | awk '{print $1}'" returns the number of bytes in the current dir, and it gets fed as the "-s" parameter to pv. Next we gzip the whole content and output the result to the out.tgz file. This way "pv" knows how much data is still left to be processed and shows us, for example, that it will take another 4 mins 49 secs to finish.

Credit: Peteris Krumins http://www.catonmat.net/blog/unix-utilities-pipe-viewer/

16) List of commands you use most often
history | awk '{print $2}' | sort | uniq -c | sort -rn | head
17) Identify long lines in a file
awk 'length>72' file

This command displays a list of lines that are longer than 72 characters. I use this command to identify those lines in my scripts and cut them short the way I like it.

18) Makes you look busy

alias busy='my_file=$(find /usr/include -type f | sort -R | head -n 1); my_len=$(wc -l $my_file | awk "{print \$1}"); let "r = $RANDOM % $my_len" 2>/dev/null; vim +$r $my_file'

This makes an alias for a command named ‘busy’.

The ‘busy’ command opens a random file in /usr/include to a random line with vim. Drop this in your .bash_aliases and make sure that file is initialized in your .bashrc.

19) Show me a histogram of the busiest minutes in a log file:

cat /var/log/secure.log | awk '{print substr($0,0,12)}' | uniq -c | sort -nr | awk '{printf("\n%s ",$0) ; for (i = 0; i<$1 ; i++) {printf("*")};}'

20) Analyze awk fields

awk '{print NR": "$0; for(i=1;i<=NF;++i)print "\t"i": "$i}'

Breaks down and numbers each line and its fields. This is really useful when you are going to parse something with awk but aren’t sure exactly where to start.

21) Browse system RAM in a human readable form
sudo cat /proc/kcore | strings | awk "length > 20" | less

This command lets you see and scroll through all of the strings that are stored in the RAM at any given time. Press space bar to scroll through to see more pages (or use the arrow keys etc).

Sometimes if you don’t save that file that you were working on or want to get back something you closed it can be found floating around in here!

The awk command only shows lines that are longer than 20 characters (to avoid seeing lots of junk that probably isn’t "human readable").

If you want to dump the whole thing to a file replace the final ‘| less’ with ‘> memorydump’. This is great for searching through many times (and with the added bonus that it doesn’t overwrite any memory…).

Here’s a neat example to show up conversations that were had in pidgin (will probably work after it has been closed)…

sudo cat /proc/kcore | strings | grep '([0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\})'

(depending on sudo settings it might be best to run `sudo su` first to get to a # prompt)

22) Monitor open connections for httpd including listen, count and sort it per IP
watch "netstat -plan | grep :80 | awk {'print \$5'} | cut -d: -f 1 | sort | uniq -c | sort -nk 1"

It’s not my code, but I found it useful to know how many open connections per request I have on a machine to debug connections without opening another http connection for it.

You can also decide to sort things out differently than the way they appear here.

23) Purge configuration files of removed packages on debian based systems
sudo aptitude purge `dpkg --get-selections | grep deinstall | awk '{print $1}'`

Purge all configuration files of removed packages

24) Quick glance at who’s been using your system recently
last | grep -v "^$" | awk '{ print $1 }' | sort -nr | uniq -c

This command takes the output of the ‘last’ command, removes empty lines, gets just the first field ($USERNAME), sorts the usernames in reverse order, and then gives a summary count of unique matches.

25) Number of open connections per ip.
netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Here is a command line to run on your server if you think it is under attack. It prints out a list of open connections to your server and sorts them by amount.

[Jul 05, 2011] GNU awk 4.0

Jul 05, 2011 | freshmeat.net

Changes: New options were added. All long options now have corresponding short options. The "--sandbox" option disables calls to system() and write access to the file system. The POSIX 2008 behavior for "sub" and "gsub" is now the default, bringing a change from the previous behavior. Regular expressions were enhanced. Many further enhancements, as well as bugfixes and code cleanups, were made.

[Jan 28, 2011] runawk by Aleksey Cheusov

runawk is a small wrapper for the AWK interpreter that helps one write standalone AWK scripts. Its main feature is to provide a module/library system for AWK which is somewhat similar to Perl's "use" command. It also allows one to select a preferred AWK interpreter and to set up the environment for AWK scripts. Dozens of ready-to-use .awk modules are also provided.

The GNU Awk User's Guide Cut Program

The awk  implementation of cut  uses the getopt  library function (see section Processing Command Line Options), and the join  library function (see section Merging an Array Into a String).

The program begins with a comment describing the options and a usage  function which prints out a usage message and exits. usage  is called if invalid arguments are supplied.

# cut.awk --- implement cut in awk
# Arnold Robbins, arnold@gnu.org, Public Domain
# May 1993

# Options:
#    -f list        Cut fields
#    -d c           Field delimiter character
#    -c list        Cut characters
#
#    -s        Suppress lines without the delimiter character

function usage(    e1, e2)
{
    e1 = "usage: cut [-f list] [-d c] [-s] [files...]"
    e2 = "usage: cut [-c list] [files...]"
    print e1 > "/dev/stderr"
    print e2 > "/dev/stderr"
    exit 1
}

The variables e1  and e2  are used so that the function fits nicely on the screen.

Next comes a BEGIN  rule that parses the command line options. It sets FS  to a single tab character, since that is cut's default field separator. The output field separator is also set to be the same as the input field separator. Then getopt  is used to step through the command line options. One or the other of the variables by_fields  or by_chars  is set to true, to indicate that processing should be done by fields or by characters respectively. When cutting by characters, the output field separator is set to the null string.

BEGIN    \
{
    FS = "\t"    # default
    OFS = FS
    while ((c = getopt(ARGC, ARGV, "sf:c:d:")) != -1) {
        if (c == "f") {
            by_fields = 1
            fieldlist = Optarg
        } else if (c == "c") {
            by_chars = 1
            fieldlist = Optarg
            OFS = ""
        } else if (c == "d") {
            if (length(Optarg) > 1) {
                printf("Using first character of %s" \
                " for delimiter\n", Optarg) > "/dev/stderr"
                Optarg = substr(Optarg, 1, 1)
            }
            FS = Optarg
            OFS = FS
            if (FS == " ")    # defeat awk semantics
                FS = "[ ]"
        } else if (c == "s")
            suppress++
        else
            usage()
    }

    for (i = 1; i < Optind; i++)
        ARGV[i] = ""

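The full program goes on to parse fieldlist before processing input. The cutting idea itself can be sketched in a self-contained way (no getopt or join libraries, with a hypothetical comma-separated field list passed in via -v):

```shell
# Minimal field-cutting sketch: print only the listed fields, in order.
# "list" is a hypothetical comma-separated field list, as cut's -f takes.
echo 'a:b:c:d' | awk -F: -v list='2,4' '
BEGIN { n = split(list, want, ",") }
{
    out = ""
    for (i = 1; i <= n; i++)
        out = out (i > 1 ? " " : "") $want[i]    # $want[i] coerces "2" to field 2
    print out
}'
```

This prints `b d`: split() turns the list into an array of field numbers, and the loop concatenates the requested fields with a space separator.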
Sectioning data with awk

Oct 30, 2005 | Blog O’ Matty

I was working on a shell script last week and wanted to grab just the CPU section from the Solaris prtdiag(1m) output. I was able to perform this operation with awk by checking $1 for one or more "=" characters and then setting a variable named SECTION to the value contained in the second position variable. If this variable equals the string CPUs, all subsequent lines are printed, up until the next block of "=" characters is detected. The awk script looked similar to the following:

$ prtdiag -v | awk '$1 ~ /^=+$/ {SECTION=$2} { if (SECTION == "CPUs") print }'

==================================== CPUs ====================================
               E$          CPU                  CPU
CPU  Freq      Size        Implementation       Mask    Status      Location
---  --------  ----------  -------------------  -----   ------      --------
  0   502 MHz  256KB       SUNW,UltraSPARC-IIe   1.4    on-line     +-board/cpu0

I really dig awk!

Posted by matty, filed under UNIX Shell. Date: October 29, 2005, 8:01 pm | No Comments
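The same sectioning technique can be tried on canned input. This sketch adds `next` so the "=====" header line itself is skipped (the prtdiag one-liner above prints the header too, as its sample output shows):

```shell
# Extract one named section from "===== Title =====" delimited input.
printf '===== CPUs =====\ncpu0 on-line\n===== Memory =====\n16GB\n' |
awk '$1 ~ /^=+$/ {SECTION = $2; next} SECTION == "CPUs" {print}'
```

This prints only `cpu0 on-line`: each delimiter line updates SECTION, and the body rule fires only while SECTION holds the wanted name.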

[Jul 03, 2008] Common threads Awk by example, Part 1

03 Jul 2008 | www.ibm.com/developerworks

Conditional statements

Awk also offers very nice C-like if statements. If you'd like, you could rewrite the previous script using an if  statement:

{ 
  if ( $5 ~ /root/ ) { 
          print $3 
  } 
} 
Both scripts function identically. In the first example, the boolean expression is placed outside the block, while in the second example, the block is executed for every input line, and we selectively perform the print command by using an if  statement. Both methods are available, and you can choose the one that best meshes with the other parts of your script.

Here's a more complicated example of an awk if  statement. As you can see, even with complex, nested conditionals, if  statements look identical to their C counterparts:

{ 
  if ( $1 == "foo" ) { 
           if ( $2 == "foo" ) { 
                    print "uno" 
           } else { 
                    print "one" 
           } 
  } else if ($1 == "bar" ) { 
           print "two" 
  } else { 
           print "three" 
  } 
} 
Using if statements, we can also transform this code:
! /matchme/ { print $1 $3 $4 }
to this:
{ 
  if ( $0 !~ /matchme/ ) { 
          print $1 $3 $4 
  } 
}
Both scripts will output only those lines that don't contain a matchme  character sequence. Again, you can choose the method that works best for your code. They both do the same thing.

Awk also allows the use of the boolean operators "||" (logical or) and "&&" (logical and) for the creation of more complex boolean expressions:

( $1 == "foo" ) && ( $2 == "bar" ) { print } 
This example will print only those lines where field one equals foo  and field two equals bar.
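To try the pattern on sample input:

```shell
# Only the line whose first field is "foo" AND second field is "bar" prints.
printf 'foo bar\nfoo baz\nbar bar\n' |
awk '($1 == "foo") && ($2 == "bar") {print}'
```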

Averaging a Column of Numbers By S. Lee Henry

This week, we're going to look at a technique for adding a column of numbers. This particular technique requires that the numbers line up vertically, because we're going to strip off the leftmost part of each line in the file using the cut command. The sample script examines only those lines that contain a particular tag, which enables us to easily ignore lines not containing the numbers we seek and process only those containing numeric data. Assume in this example that the numbers in question are timing measurements (the "ms:" tag indicates that the numbers are in milliseconds). The script isolates the tag and the columnar position of the data to be averaged at the top of the script, making these values obvious and easy to modify.

#!/bin/sh
#
# Compute the average of specified column in a file
TAG="ms:"
COL=29
if [ "$1" = "" ]; then
   echo "Usage: $0 <filename>"
   exit
else
   INFILE=$1
fi
grep $TAG $INFILE | cut -c$COL- | \
awk '{sum += $1;total += 1;printf"avg = %.2f\n", sum/total}' | \
tail -1

The name of the file containing the data to be averaged must be supplied as an argument when the script is run; otherwise, a usage statement is issued and the script exits. The usage statement includes $0 so that it reflects whatever name the user decides to give the script. I call mine addcol.

boson% ./addcol
Usage: ./addcol <filename>
boson%
A sample data file for this script might look like this:
date: 06/11/01 12:11:00 ms: 117
measurement from boson.particle.net
date: 06/11/01 12:21:00 ms: 132
measurement from boson.particle.net
date: 06/11/01 12:31:00 ms: 109
measurement from boson.particle.net
date: 06/11/01 12:41:00 ms: 121
measurement from boson.particle.net
This data file contains a time measurement taken once every ten minutes and a comment.
The grep command reduces this to:
date: 06/11/01 12:11:00 ms: 117
date: 06/11/01 12:21:00 ms: 132
date: 06/11/01 12:31:00 ms: 109
date: 06/11/01 12:41:00 ms: 121
The cut command further reduces it to:
117
132
109
121

By the time the data is passed to the awk command, only a list of numbers remains of the original data. The awk command therefore sees each of these numbers as $1 and computes a running sum and average (i.e., sum/total) for each data point. This output is then passed to the tail command, so that only the final computation appears on the user's screen.

You would use a different approach for numbers that don't appear in the same columnar position or for those that include a decimal.
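For instance, here is an awk-only sketch that locates the number by the "ms:" tag itself rather than by column position, so the data no longer needs to line up vertically (the sample lines mirror the data file above):

```shell
# Average the field that follows the "ms:" tag, wherever it appears on a line.
printf 'date: 06/11/01 12:11:00 ms: 117\nmeasurement from boson.particle.net\ndate: 06/11/01 12:21:00 ms: 132\n' |
awk '{for (i = 1; i < NF; i++) if ($i == "ms:") {sum += $(i+1); n++}}
     END {if (n) printf "avg = %.2f\n", sum/n}'
```

The loop scans each line for the tag and accumulates the field after it; the END rule prints the average once, so no grep, cut, or tail is needed.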

[Dec 20, 2006] SHELLdorado - Good Shell Coding Practices

A neat trick for concatenating quoted constants when only a few shell variables are used in an inline AWK program. Note the line substfile = "'"$SubstFile"'", in which the first single quote closes the previous single-quoted constant. We then insert the value of the shell variable as a double-quoted constant, and open another single-quoted constant to continue the inline program.

nawk '
BEGIN {
    # Read the whole substitution file
    # into the array tab[].
    # Format of the substitution file:
    #    oldword newword
    substfile = "'"$SubstFile"'"
    while ( getline < substfile ) {
        tab[$1] = $2    # fill conversion table
    }
    close(substfile)
}
{
    for ( i = 1; i <= NF; i++ ) {
        if ( tab[$i] != "" ) {
            # substitute old word
            $i = tab[$i]
        }
    }
    print
}
' "$@"
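Here is the same technique as a self-contained run, using a hypothetical substitution file created on the fly:

```shell
# Word-for-word substitution driven by a table file (hypothetical contents).
SubstFile=$(mktemp)
printf 'colour color\nfavourite favorite\n' > "$SubstFile"

echo 'my favourite colour' | awk '
BEGIN {
    substfile = "'"$SubstFile"'"     # shell value spliced into the awk program
    while ((getline < substfile) > 0)
        tab[$1] = $2                 # fill conversion table
    close(substfile)
}
{
    for (i = 1; i <= NF; i++)
        if (tab[$i] != "")
            $i = tab[$i]             # substitute old word
    print
}'

rm -f "$SubstFile"
```

This prints `my favorite color`; each input word that appears as a key in tab[] is replaced before the rebuilt line is printed.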

An alternative and IMHO more attractive way of doing the same is to use "variable assignment" parameters:
Pseudo-files

AWK knows another way to assign values to AWK variables, like in the following example:

awk '{ print "var is", var }' var=TEST file1 

This statement assigns the value "TEST" to the AWK variable "var" and then reads the file "file1". The assignment works because AWK interprets each command-line argument containing an equal sign ("=") as an assignment.

This example is very portable (even oawk understands this syntax), and easy to use. So why don't we use this syntax exclusively?

This syntax has two drawbacks. First, a variable assignment is interpreted by AWK at the point where the file in that position would have been read; the assignment takes place only then. Since the BEGIN action is performed before the first file is read, the variable is not available in the BEGIN action.

The second problem is that the relative order of the variable assignments and the file names is important. In the following example

awk '{ print "var is", var }' file1 var=TEST file2

the variable var is not defined during the read of file1, but during the reading of file2. This may cause bugs that are hard to track down.
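By contrast, with the -v option (supported by nawk and gawk) the value is available even in the BEGIN action, and its position relative to the file names does not matter:

```shell
# -v assigns the variable before the program starts, so BEGIN can see it.
awk -v var=TEST 'BEGIN { print "var is", var }' </dev/null
```

This prints `var is TEST`, whereas the assignment-parameter form would print an empty value from a BEGIN action.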

Squid Logfile Analysis: SCALAR, an AWK script developed by Yuri N. Fominov.

SCALAR (Squid Cache Advanced Log Analyzer & Reporter) produces many detailed reports; most of them are split into Requests, Traffic, Timeouts, and Denies statistics.

SCALAR is a highly customizable tool/script written in AWK; all settings can be defined inside the script header.

Sobell on the Bourne Again Shell and the Linux Command Line

LinuxPlanet

Here is the section of my book that talks about how to get gawk to communicate with a coprocess:

"Coprocess: Two-Way I/O

"A coprocess is a process that runs in parallel with another process. Starting with version 3.1, gawk can invoke a coprocess to exchange information directly with a background process. A coprocess can be useful when you are working in a client/server environment, setting up an SQL front end/back end, or exchanging data with a remote system over a network. The gawk syntax identifies a coprocess by preceding the name of the program that starts the background process with a |& operator.

"The coprocess command must be a filter (i.e., it reads from standard input and writes to standard output) and must flush its output whenever it has a complete line rather than accumulating lines for subsequent output. When a command is invoked as a coprocess, it is connected via a two-way pipe to a gawk program so that you can read from and write to the coprocess.

"When used alone the tr utility does not flush its output after each line. The to_upper shell script is a wrapper for tr that does flush its output; this filter can be run as a coprocess. For each line read, to_upper writes the line, translated to uppercase, to standard output. Remove the # before set -x if you want to_upper to display debugging output.

$ cat to_upper
#!/bin/bash
#set -x
while read arg
do
    echo "$arg" | tr '[a-z]' '[A-Z]'
done

$ echo abcdef | to_upper
ABCDEF

"The g6 program invokes to_upper as a coprocess. This gawk program reads standard input or a file specified on the command line, translates the input to uppercase, and writes the translated data to standard output.

$ cat g6
    {
    print $0 |& "to_upper"
    "to_upper" |& getline hold
    print hold
    }

$ gawk -f g6 < alpha
AAAAAAAAA
BBBBBBBBB
CCCCCCCCC
DDDDDDDDD

"The g6 program has one compound statement, enclosed within braces, comprising three statements. Because there is no pattern, gawk executes the compound statement once for each line of input.

"In the first statement, print $0 sends the current record to standard output. The |& operator redirects standard output to the program named to_upper, which is running as a coprocess. The quotation marks around the name of the program are required. The second statement redirects standard output from to_upper to a getline statement, which copies its standard input to the variable named hold. The third statement, print hold, sends the contents of the hold variable to standard output."
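A minimal coprocess round trip can also be sketched directly in gawk, without the to_upper wrapper: closing the "to" end of the two-way pipe with close(cmd, "to") makes even a block-buffering filter such as tr flush its output and terminate.

```shell
# One line out to a coprocess, one line back (gawk-only |& operator).
gawk 'BEGIN {
    cmd = "tr a-z A-Z"
    print "hello" |& cmd     # write to the coprocess
    close(cmd, "to")         # close the write end so tr flushes and exits
    cmd |& getline result    # read the translated line back
    print result
    close(cmd)
}'
```

This prints `HELLO`. Closing only the "to" end is the standard gawk idiom for filters that buffer their output when not talking to a terminal.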

[Nov 30, 2004] AWK programming lesson 6

Sometimes you just want to use awk as a formatter and dump the output straight to the user. The following script takes a list of users as its argument and uses awk to dump information about them out of /etc/passwd.

Note: observe where I unquote the awk expression, so that the shell does expansion of $1, rather than awk.

#!/bin/sh

while [ "$1" != "" ] ; do
	awk -F: '$1 == "'$1'" { print $1,$3} ' /etc/passwd
	shift
done
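The quote-splicing trick can be seen in isolation on canned passwd-style data (sample lines, not a real /etc/passwd):

```shell
# The '"'"$name"'"' splice: the shell expands $name before awk ever runs.
name=root
printf 'root:x:0:0:root:/root:/bin/sh\ndaemon:x:1:1::/:/usr/sbin/nologin\n' |
awk -F: '$1 == "'$name'" { print $1, $3 }'
```

This prints `root 0`: the single-quoted awk program is closed, the shell substitutes the value of name inside double quotes, and the program is reopened, so awk sees the literal string "root".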

Sometimes you just want to use awk as a quick way to set a value for a variable. Using the passwd theme, we have a way to grab the shell for a user, and see if it is in the list of official shells.

Again, be aware of how I unquote the awk expression, so that the shell does expansion of $1, rather than awk.

#!/bin/sh

user="$1"
if [ "$user" = "" ] ; then echo ERROR: need a username ; exit 1 ; fi

usershell=`awk -F: '$1 == "'$1'" { print $7} ' /etc/passwd`
grep -l $usershell /etc/shells
if [ $? -ne 0 ] ; then
	echo ERROR: shell $usershell for user $user not in /etc/shells
fi

Other alternatives:

# See "man regex"
usershell=`awk -F: '/^'$1':/ { print $7} ' /etc/passwd`

#Only modern awks take -v. You may have to use "nawk" or "gawk"
usershell=`awk -F: -v user=$1 '$1 == user { print $7} ' /etc/passwd`

The explanation of the extra methods above, is left as an exercise to the reader :-)

In a pipe-line

Sometimes, you just want to put awk in as a filter for data, either in a larger program, or just a quickie one-liner from your shell prompt. Here's a quickie to look at the "Referrer" field of weblogs and see which sites link to your top page.

#!/bin/sh

grep -h ' /index.html' $* | awk -F\" '{print $4}' | sort -u

[Nov 18, 2004] The awk programming language

awk is a programming language that gets its name from the three people who invented it (Aho, Weinberger, and Kernighan). Because it was developed on a Unix operating system, its name is usually printed in lower-case ("awk") instead of capitalized ("Awk"). awk is distributed as free software, meaning that you don't have to pay anything for it and you can get the source code to build awk yourself.

It's not an "I can do anything" programming language like C++ or VisualBasic, although it can do a lot. awk excels at handling text and data files, the kind that are created in Notepad or (for example) HTML files. You wouldn't use awk to modify a Microsoft Word document or an Excel spreadsheet. However, if you take the Word document and Save As "Text Only" or if you take the Excel spreadsheet and Save As tab-delimited (*.txt) or comma-delimited (*.csv) output files, then awk could do a good job at handling them.

I like awk because it's concise. The shortest awk program that does anything useful is just 1 character:

awk 1 yourfile

On a DOS/Windows machine, this converts Unix line endings (LF) to standard DOS line endings (CR,LF). awk programs are often called "scripts" because they don't require an intermediate stage of compiling the program into an executable form like an *.EXE file. In fact, awk programs are almost never compiled into *.EXE files (although I think it's possible to do this). Thus, many people refer to awk as a "scripting language" instead of a "programming language."

This doesn't mean that you couldn't run an awk program from an icon on the Windows desktop. It means that instead of creating a shortcut to something like "mywidget.exe", you'd create a shortcut to "awk -f mywidget.awk somefile.txt" when Windows prompts you for the Command Line.
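Why does `awk 1` work? "1" is a pattern that is always true, and awk's default action for a pattern with no action is { print }, so these two commands are equivalent:

```shell
# Both copy input to output unchanged (and normalize line endings on DOS).
echo hello | awk '1'
echo hello | awk '{ print }'
```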


Copyright © 1996-2014 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author's free time. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Site uses AdSense so you need to be aware of Google privacy policy. Original materials copyright belongs to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine. This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...

You can use PayPal to make a contribution, supporting hosting of this site with different providers to distribute and speed up access. Currently there are two functional mirrors: softpanorama.info (the fastest) and softpanorama.net.

Disclaimer:

The statements, views and opinions presented on this web page are those of the author and are not endorsed by, nor do they necessarily reflect, the opinions of the author present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.

Last modified: May 08, 2014