Softpanorama
(slightly skeptical) Open Source Software Educational Society

May the source be with you, but remember the KISS principle ;-)

Google   


Softpanorama University
Classic (pipable) Unix Tools

News

Recommended Books

Recommended Links

See Also

Reference

shell

Other Unix utilities

Solaris Selected
Manpages
regex pipes grep find awk sed ed mail pax
sort uniq tee join split  paste cat col xargs
exec head tail col fmt printenv man  fold fc
m4 expand
unexpand
dos2unix
unix2dos
ls  locate cd wc ngrep Netcat
tar dd diff gzip cut tr Beautifiers Humor Etc
 
There are many people who use UNIX or Linux but who IMHO do not understand UNIX. UNIX is not just an operating system, it is a way of doing things, and the shell plays a key role by providing the glue that makes it work. The UNIX methodology relies heavily on reuse of a set of tools rather than on building monolithic applications. Even perl programmers often miss the point, writing the heart and soul of the application as perl script without making use of the UNIX toolkit.

David Korn(bold italic is mine -- BNN)

"The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities."

-- Edsger Dijkstra

IMHO there are three Unix tools that can spell the difference between really good programmer or sysadmin and just above average one (even if the latter has solid knowledge of shell and Perl, knowledge of shell and Perl is necessary but not sufficient):

Unix has an impressive array of command line utilities that glues with pipes constitute a powerful file processing language. Shell serves as a glue to pipes and non-pipe utilities.  The main innovation of Unix was that these commands were able to communicate via pipes, the first component architecture in existence. The following commands are most often used in pipes:

Dr. Nikolai Bezroukov


Notes:
  • This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Some amount of grammar and spelling errors should be expected.
  • The site contain some broken links as it develops like a living tree... Please try to use Google, Open directory, etc. to find a replacement link (see HOWTO search the WEB for details). We would appreciate if you can mail us a correct link.
Google Search
Open directory

Research Index


Old News ;-)

UnixPower Tools excepts

eval: When You Need Another Chance
Ever want to use a variable to get a variable in a shell script or to construct a command on the fly? Find out how in this excerpt from Unix Power Tools, 2nd Edition.

Build Strings with { }
Save typing by expanding strings at the shell prompt. Learn hot to use the {} pattern-expansion characters in this excerpt from Unix Power Tools, 2nd Edition.

Handle Too-Long Command Lines with xargs
That command line getting too long? Conquer it with one of the tools that makes Unix "weird and wonderful" in this excerpt from Unix Power Tools, 2nd Edition.

xargs: Problems with Spaces and Newlines
Gnu's xargs patches up a sticky problem in the original - it choked on filenames with spaces or newlines. Find out how to take advantage of that patch in this excerpt from Unix Power Tools, 2nd Edition.

Using Standard Input and Output
A quick review of basic redirection techniques used by every Unix guru.

The () Subshell Operators
Learn why using parentheses to group commands is a useful shell trick in this modified excerpt from Unix Power Tools, 2nd Edition.

What Can You Do with an Empty File?
There are more uses for /dev/null than you might have imagined. Learn four of them in this excerpt from Unix Power Tools, 2nd Edition.

Copying Directory Trees with cp -r
Want to recursively copy everything under a given directory? Don't get caught by the gotcha's. Learn about cp -r in this excerpt from Unix Power Tools, 2nd Edition.

Copying Directory Trees with (tar | tar)
Not just for tape archives, tar can overcome several of the pitfalls of using cp -r. Find out how in this excerpt from Unix Power Tools, 2nd Edition.

Telling tar Which Files to Exclude or Include
Sometimes you don't want to tar just everything in a directory. Or maybe you want to include some subdirectories and exclude others. Find out how in this excerpt from Unix Power Tools, 2nd Edition.

Protecting Files with the Sticky Bit
Want to keep others from altering or deleting your files even if they have write permissions to your directory? Learn about the sticky bit in this excerpt from Unix Power Tools, 2nd Edition.

Checking Differences with diff
Quickly examine differences between similar files.

Comparing Three Different Versions with diff3
Got three similar files to compare? Use diff3!

Context diffs
Context diffs show the lines around changes in similar files.

ex Scripts Built by diff
diff can build automatic editing scripts you can use to change multiple files or to store a revision history.

Looking for Closure
A gawk script that can be used to make sure items that need to occur in pairs actually do so.

Change Many Files by Editing Just One
Use ed and diff to edit mulitple files.

patch: Generalized Updating of Files that Differ
There's an easy way to make changes based on diffs, use Larry Wall's patch utility.

Hacking on Characters with tr
Want to quickly strip special characters from a file or change a mac text file into a Unix text file? Learn how in this excerpt from Unix Power Tools, 2nd Edition.

Trapping Exits Caused by Interrupts
If your shell script is terminated prematurely it could get messy. Learn how to trap those unruly interrupts in this excerpt from Unix Power Tools, 2nd Edition.

Handling Command-Line Arguments with a for Loop
Need a shell script that can step through its command line arguments one by one? Read how to do it with a for loop in this excerpt from Unix Power Tools, 2nd Edition.

The exec Command
There is more than one use for exec, learn a couple of new ones in this excerpt from Unix Power Tools, 2nd Edition.

Standard Input to a for Loop
A for loop can be used to step through a list of arguments from standard input. Find out how in this excerpt from Unix Power Tools, 2nd Edition.

Making a for Loop with Multiple Variables
Got more than one variable you want to use in your for loop? Find out how in this excerpt from Unix Power Tools, 2nd Edition.

Sys Admin Magazine

Useful Solaris Commands

truss -c (Solaris >= 8): This astounding option to truss provides a profile summary of the command being trussed:

$ truss -c grep asdf work.doc
syscall              seconds   calls  errors
_exit                    .00       1
read                     .01      24
open                     .00       8      4
close                    .00       5
brk                      .00      15
stat                     .00       1
fstat                    .00       4
execve                   .00       1
mmap                     .00      10
munmap                   .01       3
memcntl                  .00       2
llseek                   .00       1
open64                   .00       1
                        ----     ---    ---
sys totals:              .02      76      4
usr time:                .00
elapsed:                 .05

It can also show profile data on a running process. In this case, the data shows what the process did between when truss was started and when truss execution was terminated with a control-c. It’s ideal for determining why a process is hung without having to wade through the pages of truss output.

truss -d and truss -D (Solaris >= 8): These truss options show the time associated with each system call being shown by truss and is excellent for finding performance problems in custom or commercial code. For example:

$ truss -d who
Base time stamp:  1035385727.3460  [ Wed Oct 23 11:08:47 EDT 2002 ]
 0.0000 execve(“/usr/bin/who”, 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0032 stat(“/usr/bin/who”, 0xFFBEFA98)                = 0
 0.0037 open(“/var/ld/ld.config”, O_RDONLY)             Err#2 ENOENT
 0.0042 open(“/usr/local/lib/libc.so.1”, O_RDONLY)      Err#2 ENOENT
 0.0047 open(“/usr/lib/libc.so.1”, O_RDONLY)            = 3
 0.0051 fstat(3, 0xFFBEF42C)                            = 0
. . .

truss -D is even more useful, showing the time delta between system calls:

 

Dilbert> truss -D who
 0.0000 execve(“/usr/bin/who”, 0xFFBEFD5C, 0xFFBEFD64)  argc = 1
 0.0028 stat(“/usr/bin/who”, 0xFFBEFA98)                = 0
 0.0005 open(“/var/ld/ld.config”, O_RDONLY)             Err#2 ENOENT
 0.0006 open(“/usr/local/lib/libc.so.1”, O_RDONLY)      Err#2 ENOENT
 0.0005 open(“/usr/lib/libc.so.1”, O_RDONLY)            = 3
 0.0004 fstat(3, 0xFFBEF42C)                            = 0

In this example, the stat system call took a lot longer than the others.

 

truss -T: This is a great debugging help. It will stop a process at the execution of a specified system call. (“-U” does the same, but with user-level function calls.) A core could then be taken for further analysis, or any of the /proc tools could be used to determine many aspects of the status of the process.

truss -l (improved in Solaris 9): Shows the thread number of each call in a multi-threaded processes. Solaris 9 truss -l finally makes it possible to watch the execution of a multi-threaded application.

Truss is truly a powerful tool. It can be used on core files to analyze what caused the problem, for example. It can also show details on user-level library calls (either system libraries or programmer libraries) via the “-u” option.

pkg-get: This is a nice tool (http://www.bolthole.com/solaris) for automatically getting freeware packages. It is configured via /etc/pkg-get.conf. Once it’s up and running, execute pkg-get -a to get a list of available packages, and pkg-get -i to get and install a given package.

plimit (Solaris >= 8): This command displays and sets the per-process limits on a running process. This is handy if a long-running process is running up against a limit (for example, number of open files). Rather than using limit and restarting the command, plimit can modify the running process.

coreadm (Solaris >= 8): In the “old” days (before coreadm), core dumps were placed in the process’s working directory. Core files would also overwrite each other. All this and more has been addressed by coreadm, a tool to manage core file creation. With it, you can specify whether to save cores, where cores should be stored, how many versions should be retained, and more. Settings can be retained between reboots by coreadm modifying /etc/coreadm.conf.

pgrep (Solaris >= 8): pgrep searches through /proc for processes matching the given criteria, and returns their process-ids. A great option is “-n”, which returns the newest process that matches.

preap (Solaris >= 9): Reaps zombie processes. Any processes stuck in the “z” state (as shown by ps), can be removed from the system with this command.

pargs (Solaris >= 9): Shows the arguments and environment variables of a process.

nohup -p (Solaris >= 9): The nohup command can be used to start a process, so that if the shell that started the process closes (i.e., the process gets a “SIGHUP” signal), the process will keep running. This is useful for backgrounding a task that should continue running no matter what happens around it. But what happens if you start a process and later want to HUP-proof it? With Solaris 9, nohup -p takes a process-id and causes SIGHUP to be ignored.

prstat (Solaris >= 8): prstat is top and a lot more. Both commands provide a screen’s worth of process and other information and update it frequently, for a nice window on system performance. prstat has much better accuracy than top. It also has some nice options. “-a” shows process and user information concurrently (sorted by CPU hog, by default). “-c” causes it to act like vmstat (new reports printed below old ones). “-C” shows processes in a processor set. “-j” shows processes in a “project”. “-L” shows per-thread information as well as per-process. “-m” and “-v” show quite a bit of per-process performance detail (including pages, traps, lock wait, and CPU wait). The output data can also be sorted by resident-set (real memory) size, virtual memory size, execute time, and so on. prstat is very useful on systems without top, and should probably be used instead of top because of its accuracy (and some sites care that it is a supported program).

trapstat (Solaris >= 9): trapstat joins lockstat and kstat as the most inscrutable commands on Solaris. Each shows gory details about the innards of the running operating system. Each is indispensable in solving strange happenings on a Solaris system. Best of all, their output is good to send along with bug reports, but further study can reveal useful information for general use as well.

vmstat -p (Solaris >= 8): Until this option became available, it was almost impossible (see the “se toolkit”) to determine what kind of memory demand was causing a system to page. vmstat -p is key because it not only shows whether your system is under memory stress (via the “sr” column), it also shows whether that stress is from application code, application data, or I/O. “-p” can really help pinpoint the cause of any mysterious memory issues on Solaris.

pmap -x (Solaris >= 8, bugs fixed in Solaris >= 9): If the process with memory problems is known, and more details on its memory use are needed, check out pmap -x. The target process-id has its memory map fully explained, as in:

# pmap -x 1779
1779:   -ksh
 Address  Kbytes     RSS    Anon  Locked Mode   Mapped File
00010000     192     192       -       - r-x--  ksh
00040000       8       8       8       - rwx--  ksh
00042000      32      32       8       - rwx--    [ heap ]
FF180000     680     664       -       - r-x--  libc.so.1
FF23A000      24      24       -       - rwx--  libc.so.1
FF240000       8       8       -       - rwx--  libc.so.1
FF280000     568     472       -       - r-x--  libnsl.so.1
FF31E000      32      32       -       - rwx--  libnsl.so.1
FF326000      32      24       -       - rwx--  libnsl.so.1
FF340000      16      16       -       - r-x--  libc_psr.so.1
FF350000      16      16       -       - r-x--  libmp.so.2
FF364000       8       8       -       - rwx--  libmp.so.2
FF380000      40      40       -       - r-x--  libsocket.so.1
FF39A000       8       8       -       - rwx--  libsocket.so.1
FF3A0000       8       8       -       - r-x--  libdl.so.1
FF3B0000       8       8       8       - rwx--    [ anon ]
FF3C0000     152     152       -       - r-x--  ld.so.1
FF3F6000       8       8       8       - rwx--  ld.so.1
FFBFE000       8       8       8       - rw---    [ stack ]
-------- ------- ------- ------- -------
total Kb    1848    1728      40       -

Here we see each chunk of memory, what it is being used for, how much space it is taking (virtual and real), and mode information.

df -h (Solaris >= 9): This command is popular on Linux, and just made its way into Solaris. df -h displays summary information about file systems in human-readable form:

$ df -h
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c0t0d0s0      4.8G   1.7G   3.0G    37%    /
/proc                    0K     0K     0K     0%    /proc
mnttab                   0K     0K     0K     0%    /etc/mnttab
fd                       0K     0K     0K     0%    /dev/fd
swap                   848M    40K   848M     1%    /var/run
swap                   849M   1.0M   848M     1%    /tmp
/dev/dsk/c0t0d0s7       13G    78K    13G     1%    /export/home

Conclusion

Each administrator has a set of tools used daily, and another set of tools to help in a pinch. This column included a wide variety of commands and options that are lesser known, but can be very useful. Do you have favorite tools that have saved you in a bind? If so, please send them to me so I can expand my tool set as well. Alternately, send along any tools that you hate or that you feel are dangerous, which could also turn into a useful column!

[13 Jan 2004] The art of writing Linux utilities Peter Seebach (developerworks@seebs.plethora.net)

What makes a good utility?

There is a wonderful discussion of this question in The UNIX Programming Environment, by Kernighan & Pike. A good utility is one that does its job as well as possible. It has to play well with others; it has to be amenable to being combined with other utilities. A program that doesn't combine with others isn't a utility; it's an application.

Utilities are supposed to let you build one-off applications cheaply and easily from the materials at hand. A lot of people think of them as being like tools in a toolbox. The goal is not to have a single widget that does everything, but to have a handful of tools, each of which does one thing as well as possible.

Some utilities are reasonably useful on their own, whereas others imply cooperation in pipelines of utilities. Examples of the former include sort and grep. On the other hand, xargs is rarely used except with other utilities, most often find.

What language to write in?
Most of the UNIX system utilities are written in C. The examples here are in Perl and sh. Use the right tool for the right job. If you use a utility heavily enough, the cost of writing it in a compiled language might be justified by the performance gain. On the other hand, for the fairly common case where a program's workload is light, a scripting language may offer faster development.

If you aren't sure, you should use the language you know best. At least when you're prototyping a utility, or figuring out how useful it is, favor programmer efficiency over performance tuning. Most of the UNIX system utilities are in C, simply because they're heavily used enough to justify the development cost. Perl and sh (or ksh) can be good languages for a quick prototype. Utilities that tie other programs together may be easier to write in a shell than in a more conventional programming language. On the other hand, any time you want to interact with raw bytes, C is probably looming on your horizon.

Designing a utility
A good rule of thumb is to start thinking about the design of a utility the second time you have to solve a problem. Don't mourn the one-off hack you write the first time; think of it as a prototype. The second time, compare what you need to do with what you needed to do the first time. Around the third time, you should start thinking about taking the time to write a general utility. Even a merely repetitive task might merit the development of a utility; for instance, many generalized file-renaming programs have been written based on the frustration of trying to rename files in a generalized way.

Here are some design goals of utilities; each gets its own section, below.

Do one thing well
Do one thing well; don't do multiple things badly. The best example of this doing one thing well is probably
sort. No utilities other than sort have a sort feature. The idea is simple; if you only solve a problem once, you can take the time to do it well.

Imagine how frustrating it would be if most programs sorted data, but some supported only lexographic sorts, while others supported only numeric sorts, and a few even supported selection of keys rather than sorting by whole lines. It would be annoying at best.

When you find a problem to solve, try to break the problem up into parts, and don't duplicate the parts for which utilities already exist. The more you can focus on a tool that lets you work with existing tools, the better the chances that your utility will stay useful.

You may need to write more than one program. The best way to solve a specialized task is often to write one or two utilities and a bit of glue to tie them together, rather than writing a single program to solve the whole thing. It's fine to use a 20-line shell script to tie your new utility together with existing tools. If you try to solve the whole problem at once, the first change that comes along might require you to rethink everything.

I have occasionally needed to produce two-column or three-column output from a database. It is generally more efficient to write a program to build the output in a single column and then glue it to a program that puts things in columns. The shell script that combines these two utilities is itself a throwaway; the separate utilities have outlived it.

Some utilities serve very specialized needs. If the output of ls in a crowded directory scrolls off the screen very quickly, it might be because there's a file with a very long name, forcing ls to use only a single column for output. Paging through it using more takes time. Why not just sort lines by length, and pipe the result through tail, as follows?

Listing 1. One of the smallest utilities anywhere, sl
 


#/usr/bin/perl -w
print sort { length $a <=> length $b } <>;

The script in Listing 1 does exactly one thing. It takes no options, because it needs no options; it only cares about the length of lines. Thanks to Perl's convenient <> idiom, this automatically works either on standard input or on files named on the command line.

Be a filter
Almost all utilities are best conceived of as filters, although a few very useful utilities don't fit this model. (For instance, a program that counts might be very useful, even though it doesn't work well as a filter. Programs that take only command-line arguments as input, and produce potentially complicated output, can be very useful.) Most utilities, though, should work as filters. By convention, filters work on lines of text. Most filters should have some support for running on multiple input files.

Remember that a utility needs to work on the command line and in scripts. Sometimes, the ideal behavior varies a little. For instance, most versions of ls automatically sort input into columns when writing to a terminal. The default behavior of grep is to print the file name in which a match was found only if multiple files were specified. Such differences should have to do with how users will want the utility to work, not with other agendas. For instance, old versions of GNU bc displayed an intrusive copyright notice when started. Please don't do that. Make your utility stick to doing its job.

Utilities like to live in pipelines. A pipeline lets a utility focus on doing its job, and nothing else. To live in a pipeline, a utility needs to read data from standard input and write data to standard output. If you want to deal with records, it's best if you can make each line be a "record." Existing programs such as sort and join are already thinking that way. They'll thank you for it.

One utility I occasionally use is a program that calls other programs iteratively over a tree of files. This makes very good use of the standard UNIX utility filter model, but it only works with utilities that read input and write output; you can't use it with utilities that operate in place, or take input and output file names.

Most programs that can run from standard input can also reasonably be run on a single file, or possibly on a group of files. Note that this arguably violates the rule against duplicating effort; obviously, this could be managed by feeding cat into the next program in the series. However, in practice, it seems to be justified.

Some programs may legitimately read records in one format but produce something entirely different. An example would be a utility to put material into columnar form. Such a utility might equate lines to records on input, but produce multiple records per line on output.

Not every utility fits entirely into this model. For instance, xargs takes not records but names of files as input, and all of the actual processing is done by some other program.

Generalize
Try to think of tasks similar to the one you're actually performing; if you can find a general description of these tasks, it may be best to try to write a utility that fits that description. For instance, if you find yourself sorting text lexicographically one day and numerically another day, it might make sense to consider attempting a general sort utility.

Generalizing functionality sometimes leads to the discovery that what seemed like a single utility is really two utilities used in concert. That's fine. Two well-defined utilities can be easier to write than one ugly or complicated one.

Doing one thing well doesn't mean doing exactly one thing. It means handling a consistent but useful problem space. Lots of people use grep. However, a great deal of its utility comes from the ability to perform related tasks. The various options to grep do the work of a handful of small utilities that would have ended up sharing, or duplicating, a lot of code.

This rule, and the rule to do one thing, are both corollaries of an underlying principle: avoid duplication of code whenever possible. If you write a half-dozen programs, each of which sorts lines, you can end up having to fix similar bugs half a dozen times instead of having one better-maintained sort program to work on.

This is the part of writing a utility that adds the most work to the process of getting it completed. You may not have time to generalize something fully at first, but it pays off when you get to keep using the utility.

Sometimes, it's very useful to add related functionality to a program, even when it's not quite the same task. For instance, a program to pretty-print raw binary data might be more useful if, when run on a terminal device, it threw the terminal into raw mode. This makes it a lot easier to test questions involving keymaps, new keyboards, and the like. Not sure why you're getting tildes when you hit the delete key? This is an easy way to find out what's really getting sent. It's not exactly the same task, but it's similar enough to be a likely addition.

The errno utility in Listing 2 below is a good example of generalizing, as it supports both numeric and symbolic names.

Be robust
It's important that a utility be durable. A utility that crashes easily or can't handle real data is not a useful utility. Utilities should handle arbitrarily long lines, huge files, and so on. It is perhaps tolerable for a utility to fail on a data set larger than it can hold in memory, but some utilities don't do this; for instance,
sort, by using temporary files, can generally sort data sets much larger than it can hold in memory.

Try to make sure you've figured out what data your utility can possibly run on. Don't just ignore the possibility of data you can't handle. Check for it and diagnose it. The more specific your error messages, the more helpful you are being to your users. Try to give the user enough information to know what happened and how to fix it. When processing data files, try to identify exactly what the malformed data was. When trying to parse a number, don't just give up; tell the user what you got, and if possible, what line of the input stream the data was on.

As a good example, consider the difference between two implementations of dc. If you run dc /home, one of them says "Cannot use directory as input!" The other just returns silently; no error message, no unusual exit code. Which of these would you rather have in your path when you make a typo on a cd command? Similarly, the former will give verbose error messages if you feed it the stream of data from a directory, perhaps by doing dc < /home. On the other hand, it might be nice for it to give up early on when getting invalid data.

Security holes are often rooted in a program that isn't robust in the face of unexpected data. Keep in mind that a good utility might find its way into a shell script run as root. A buffer overflow in a program such as find is likely to be a risk to a great number of systems.

The better a program deals with unexpected data, the more likely it is to adapt well to varied circumstances. Often, trying to make a program more robust leads to a better understanding of its role, and better generalizations of it.

Be new
One of the worst kinds of utility to write is the one you already have. I wrote a wonderful utility called
count. It allowed me to perform just about any counting task. It's a great utility, but there's a standard BSD utility called jot that does the same thing. Likewise, my very clever program for turning data into columns duplicates an existing utility, rs, likewise found on BSD systems except that rs is much more flexible and better designed. See Resources below for more information on jot and rs.

If you're about to start writing a utility, take a bit of time to browse around a few systems to see if there might be one already. Don't be afraid to steal Linux utilities for use on BSD, or BSD utilities for use on Linux; one of the joys of utility code is that almost all utilities are quite portable.

Don't forget to look at the possibility of combining existing applications to make a utility. It is possible, in theory, that you'll find stringing existing programs together is not fast enough, but it's very rare that writing a new utility is faster than waiting for a slightly slow pipeline.

An example utility
In a sense this program is a counterexample, in that it is never useful as a filter. It works very well as a command-line utility, however.

This program does one thing only. It prints out errno lines from /usr/include/sys/errno.h in a slightly pretty-printed format. For instance:

$ errno 22
EINVAL [22]: Invalid argument

Listing 2. Errno finder
 


    #!/bin/sh
    usage() {
        echo >&2 "usage: errno [numbers or error names]\n"
        exit 1
    }

    for i
    do
        case "$i" in
        [0-9]*)
            awk '/^#define/ && $3 == '"$i"' {
                for (i = 5; i < NF; ++i) {
                    foo = foo " " $i;
                }
                printf("%-22s%s\n", $2 " [" $3 "]:", foo);
                foo = ""
            }' < /usr/include/sys/errno.h
            ;;
        E*)
            awk '/^#define/ && $2 == "'"$i"'" {
                for (i = 5; i < NF; ++i) {
                    foo = foo " " $i;
                }
                printf("%-22s%s\n", $2 " [" $3 "]:", foo);
                foo = ""
            }' < /usr/include/sys/errno.h
            ;;
        *)
            echo >&2 "errno: can't figure out whether '$i' is a name or a number."
            usage
            ;;
        esac
    done

Does it generalize? Yes, nicely. It supports both numeric and symbolic names. On the other hand, it doesn't know about other files, such as /usr/include/sys/signal.h, that are likely in the same format. It could easily be extended to do that, but for a convenience utility like this, it's easier to just make a copy called "signal" that reads signal.h, and uses "SIG*" as the pattern to match a name.

This is just a tad more convenient than using grep on system header files, but it's less error-prone. It doesn't produce garbled results from ill-considered arguments. On the other hand, it produces no diagnostic if a given name or number is not found in the header. It also doesn't bother to correct some invalid inputs. Still, as a command-line utility never intended to be used in an automated context, it's okay.

Another example might be a program to unsort input (see Resources for a link to this utility). This is simple enough; read in input files, store them in some way, then generate a random order in which to print out the lines. This is a utility of nearly infinite applications. It's also a lot easier to write than a sorting program; for instance, you don't need to specify which keys you're not sorting on, or whether you want things in a random order alphabetically, lexicographically, or numerically. The tricky part comes in reading in potentially very long lines. In fact, the provided version cheats; it assumes there will be no null bytes in the lines it reads. It's a lot harder to get that right, and I was lazy when I wrote it.

Summary
If you find yourself performing a task repeatedly, consider writing a program to do it. If the program turns out to be reasonable to generalize a bit, generalize it, and you will have written a utility.

Don't design the utility the first time you need it. Wait until you have some experience. Feel free to write a prototype or two; a good utility is sufficiently better than a bad utility to justify a bit of time and effort on researching it. Don't feel bad if what you thought would be a great utility ends up gathering dust after you wrote it. If you find yourself frustrated by your new program's shortcomings, you just had another prototyping phase. If it turns out to be useless, well, that happens sometimes.

The thing you're looking for is a program that finds general application outside your initial usage patterns. I wrote unsort because I wanted an easy way to get a random series of colors out of an old X11 "rgb.txt" file. Since then, I've used it for an incredible number of tasks, not the least of which was producing test data for debugging and benchmarking sort routines.

One good utility can pay back the time you spent on all the near misses. The next thing to do is make it available for others, so they can experiment. Make your failed attempts available, too; other people may have a use for a utility you didn't need. More importantly, your failed utility may be someone else's prototype, and lead to a wonderful utility program for everyone.

Resources

[Jul 3, 2003] dunne.dyn.dhs.org/Using the m4 Macro Processor - updated link

"What is it about m4 that makes it so useful, and yet so overlooked? m4 -- a macro processor -- unfortunately has a dry name that disguises a great utility. A macro processor is basically a program that scans text and looks for defined symbols, which it replaces with other text or other symbols."

[Apr 17, 2003] Exploring processes with Truss: Part 1 By Sandra Henry-Stocker

The ps command can tell you quite a few things about each process running on your system. These include the process owner, memory use, accumulated time, the process status (e.g., waiting on resources) and many other things as well. But one thing that ps cannot tell you is what a process is doing - what files it is using, what ports it has opened, what libraries it is using and what system calls it is making. If you can't look at source code to determine how a program works, you can tell a lot about it by using a procedure called "tracing". When you trace a process (e.g., truss date), you get verbose commentary on the process' actions. For example, you will see a line like this each time the program opens a file:

open("/usr/lib/libc.so.1", O_RDONLY) = 4

The text on the left side of the equals sign clearly indicates what is happening. The program is trying to open the file /usr/lib/libc.so.1 and it's trying to open it in read-only mode (as you would expect, given that this is a system library). The right side is not nearly as self-evident. We have just the number 4. Open is not a Unix command, of course, but a system call. That means that you can only use the command within a program. Due to the nature of Unix, however, system calls are documented in man pages just like ls and pwd.

To determine what this number represents, you can skip down in this column or you can read the man page. If you elect to read the man page, you will undoubtedly read a line that tells you that the open() function returns a file descriptor for the named file. In other words, the number, 4 in our example, is the number of the file descriptor referred to in this open call. If the process that you are tracing opens a number of files, you will see a sequence of open calls. With other activity removed, the list might look something like this:

open("/dev/zero", O_RDONLY) = 3

open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

open("/usr/lib/libc.so.1", O_RDONLY) = 4

open("/usr/lib/libdl.so.1", O_RDONLY) = 4

open64("./../", O_RDONLY|O_NDELAY) = 3

open64("./../../", O_RDONLY|O_NDELAY) = 3

open("/etc/mnttab", O_RDONLY) = 4

Notice that the first file handle is 3 and that file handles 3 and 4 are used repeatedly. The initial file handle is always 3. This indicates that it is the first file handle following those that are the same for every process that you will run - 0, 1 and 2. These represent standard in, standard out and standard error.

The file handles shown in the example truss output above are repeated only because the associated files are subsequently closed. When a file is closed, the file handle that was used to access it can be used again. 

The close commands include only the file handle, since the location of the file is known. A close command would, therefore, be something like close(3). One of the lines shown above displays a different response - Err#2

ENOENT. This "error" (the word is put in quotes because this does not necessarily indicate that the process is defective in any way) indicates that the file the open call is attempting to open does not exist. Read "ENOENT" as "No such file".

Some open calls place multiple restrictions on the way that a file is opened. The open64 calls in the example output above, for example, specify both O_RDONLY and O_NDELAY. Again, reading the man page will help you to understand what each of these specifications means and will present with a list of other options as well.

As you might expect, open is only one of many system calls that you will see when you run the truss command. Next week we will look at some additional system calls and determine what they are doing.

Exploring processes with Truss: part 2 By Sandra Henry-Stocker

While truss and its cousins on non-Solaris systems (e.g., strace on Linux and ktrace on many BSD systems) provide a lot of data on what a running process is doing, this information is only useful if you know what it means. Last week, we looked at the open call and the file handles that are returned by the call to open(). This week, we look at some other system calls and analyze what these system calls are doing. You've probably noticed that the nomenclature for system functions is to follow the name of the call with a set of empty parentheses for example, open(). You will see this nomenclature in use whenever system calls are discussed.

The fstat() and fstat64() calls obtains information about open files - "fstat" refers to "file status". As you might expect, this information is retrieved from the files' inodes, including whether or not you are allowed to read the files' contents. If you trace the ls command (i.e., truss ls), for example, your trace will start with lines that resemble these:

1 execve("/usr/bin/ls", 0x08047BCC, 0x08047BD4) argc = 1

2 open("/dev/zero", O_RDONLY) = 3

3 mmap(0x00000000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xDFBFA000

4 xstat(2, "/usr/bin/ls", 0x08047934) = 0

5 open("/var/ld/ld.config", O_RDONLY) Err#2 ENOENT

6 sysconfig(_CONFIG_PAGESIZE) = 4096

7 open("/usr/lib/libc.so.1", O_RDONLY) = 4

8 fxstat(2, 4, 0x08047310) = 0

...

28 lstat64(".", 0x080478B4) = 0

29 open64(".", O_RDONLY|O_NDELAY) = 3

30 fcntl(3, F_SETFD, 0x00000001) = 0

31 fstat64(3, 0x0804787C) = 0

32 brk(0x08057208) = 0

33 brk(0x08059208) = 0

34 getdents64(3, 0x08056F40, 1048) = 424

35 getdents64(3, 0x08056F40, 1048) = 0

36 close(3) = 0

In line 31, we see a call to fstat64, but what file is it checking? The man page for the fstat() and your intuition are probably both telling you that this fstat call is obtaining information on the file opened two lines before – "." or the current directory - and that it is referring to this file by its file handle (3) returned by the open() call in line

2. Keep in mind that a directory is simply a file, though a different variety of file, so the same system calls are used as would be used to check a text file.

You will probably also notice that the file being opened is called /dev/zero (again, see line 2). Most Unix sysadmins will immediately know that /dev/zero is a special kind of file - primarily because it is stored in /dev. And, if moved to look more closely at the file, they

will confirm that the file that /dev/zero points to (it is itself a symbolic link) is a special character file. What /dev/zero provides to system programmers, and to sysadmins if they care to use it, is an endless stream of zeroes. This is more useful than might first appear.

To see how /dev/zero works, you can create a 10M-byte file full of zeroes with a command like this:

/bin/dd < /dev/zero > zerofile bs=1024 seek=10240 count=1

This command works well because it creates the needed file with only a few read and write operations; in other words, it is very efficient.

You can verify that the file is zero-filled with od.

# od -x zerofile

0000000 0000 0000 0000 0000 0000 0000 0000 0000

*

50002000

Each string of four zeros (0000) represents two bytes of data. The * on the second line of output indicates that all of the remaining lines are identical to the first.

Looking back at the truss output above, we cannot help but notice that the first line of the truss output includes the name of the command that we are tracing. The execve() system call executes a process. The first argument to execve() is the name of the file from which the new process

image is to be loaded. The mmap() call which follows maps the process image into memory. In

other words, it directly incorporates file data into the process address space. The getdents64() calls on lines 34 and 35 are extracting information from the directory file - "dents" refers to "directory entries'.

The sequence of steps that we see at the beginning of the truss output executing the entered command, opening /dev/zero, mapping memory and so on - looks the same whether you are tracing ls, pwd, date or restarting Apache. In fact, the first dozen or so lines in your truss output will be nearly identical regardless of the command you are running. You should, however, expect to see some differences between different Unix systems and different versions of Solaris.

Viewing the output of truss, you can get a solid sense of how the operating system works. The same insights are available if you are tracing your own applications or troubleshooting third party executables.

-------------------

Sandra Henry-Stocker

Linux.ie Using the ps command.

3.2. Displaying all processes owned by a specific user

$ ps ux
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
heyne      691  0.0  2.4 19272 9576 ?        S    13:35   0:00 kdeinit: kded    
heyne      700  0.1  1.0  5880 3944 ?        S    13:35   0:01 artsd -F 10 -S 40
... ... ... 


You can also use the syntax "ps U username".

As you can see, the ps command can give you a lot of interesting information. If you for example want to know what your friend actually does, just replace your login name with her/his name and you see all processe belonging to her/him.

3.3. Own output format

If you are bored by the regular output, you could simply change the format. To do so use the formatting characters which are supported by the ps command.
If you execute the ps command with the 'o' parameter you can tell the ps command what you want to see:
e.g.
Odd display with AIX field descriptors:
 

$ ps -o "%u : %U : %p : %a"
RUSER    : USER     :   PID : COMMAND
heyne    : heyne    :  3363 : bash
heyne    : heyne    :  3367 : ps -o %u : %U : %p : %a

developerWorks  Concatenating files with cat Cat has two useful options:

Dogs of the Linux Shell Posted on Saturday, October 19, 2002 by Louis J. Iacona  

Could the command-line tools you've forgotten or never knew save time and some frustration?

One incarnation of the so called 80/20 rule has been associated with software systems. It has been observed that 80% of a user population regularly uses only 20% of a system's features. Without backing this up with hard statistics, my 20+ years of building and using software systems tells me that this hypothesis is probably true. The collection of Linux command-line programs is no exception to this generalization. Of the dozens of shell-level commands offered by Linux, perhaps only ten commands are commonly understood and utilized, and the remaining majority are virtually ignored.

Which of these dogs of the Linux shell have the most value to offer? I'll briefly describe ten of the less popular but useful Linux shell commands, those which I have gotten some mileage from over the years. Specifically, I've chosen to focus on commands that parse and format textual content.

The working examples presented here assume a basic familiarity with command-line syntax, simple shell constructs and some of the not-so-uncommon Linux commands. Even so, the command-line examples are fairly well commented and straightforward. Whenever practical, the output of usage examples is presented under each command-line execution.

The following eight commands parse, format and display textual content. Although not all provided examples demonstrate this, be aware that the following commands will read from standard input if file arguments are not presented.

Table 1. Summary of Commands

Head/Tail

As their names imply, head and tail are used to display some amount of the top or bottom of a text block. head presents beginning of a file to standard output while tail does the same with the end of a file. Review the following commented examples:

## (1) displays the first 6 lines of a file
   head -6 readme.txt
## (2) displays the last 25 lines of a file
   tail -25 mail.txt

Here's an example of using head and tail in concert to display the 11th through 20th line of a file.

# (3)
head -20 file | tail -10 

Manual pages show that the tail command has more command-line options than head. One of the more useful tail option is -f. When it is used, tail does not return when end-of-file is detected, unless it is explicitly interrupted. Instead, tail sleeps for a period and checks for new lines of data that may have been appended since the last read.

## (4) display ongoing updates to the given
##     log file 

tail -f /usr/tmp/logs/daemon_log.txt

Imagine that a dæmon process was continually appending activity logs to the /usr/adm/logs/daemon_log.txt file. Using tail -f at a console window, for example, will more or less track all updates to the file in real time. (The -f option is applicable only when tail's input is a file).

If you give multiple arguments to tail, you can track several log files in the same window.

## track the mail log and the server error log
## at the same time.

tail -f /var/log/mail.log /var/log/apache/error_log

tac--Concatenate in Reverse

What is cat spelled backwards? Well, that's what tac's functionality is all about. It concatenates file order and their contents in reverse. So what's its usefulness? It can be used on any task that requires ordering elements in a last-in, first-out (LIFO) manner. Consider the following command line to list the three most recently established user accounts from the most recent through the least recent.

# (5) last 3 /etc/passwd records - in reverse
$ tail -3 /etc/passwd | tac
curly:x:1003:100:3rd Stooge:/homes/curly:/bin/ksh
larry:x:1002:100:2nd Stooge:/homes/larry:/bin/ksh
moe:x:1001:100:1st Stooge:/homes/moe:/bin/ksh

nl--Numbered Line Output

nl is a simple but useful numbering filter. I displays input with each line numbered in the left margin, in a format dictated by command-line options. nl provides a plethora of options that specify every detail of its numbered output. The following commented examples demonstrate some of of those options:

# (6) Display the first 4 entries of the password
#     file - numbers to be three columns wide and 
#     padded by zeros.
$ head -4 /etc/passwd | nl -nrz -w3
001	root:x:0:1:Super-User:/:/bin/ksh
002	daemon:x:1:1::/:
003	bin:x:2:2::/usr/bin:
004	sys:x:3:3::/:
#
# (7) Prepend ordered line numbers followed by an
#     '=' sign to each line -- start at 101.
$ nl -s= -v101 Data.txt
101=1st Line ...
102=2nd Line ...
103=3rd Line ...
104=4th Line ...
105=5th Line ...
  .......

fmt--Format

The fmt command is a simple text formatter that focuses on making textual data conform to a maximum line width. It accomplishes this by joining and breaking lines around white space. Imagine that you need to maintain textual content that was generated with a word processor. The exported text may contain lines whose lengths vary from very short to much longer than a standard screen length. If such text is to be maintained in a text editor (like vi), fmt is the command of choice to transform the original text into a more maintainable format. The first example below shows fmt being asked to reformat file contents as text lines no greater than 60 characters long.

# (8) No more than 60 char lines
$ fmt -w 60 README.txt > NEW_README.txt
# 
# (9) Force uniform spacing:
#     1 space between words, 2 between sentences
$ echo "Hello   World. Hello Universe." | fmt -u -w80 

Hello World.  Hello Universe.

fold--Break Up Input

fold is similar to fmt but is used typically to format data that will be used by other programs, rather than to make the text more readable to the human eye. The commented examples below are fairly easy to follow:

# (10) Format text in 3 column width lines
$ echo oxoxoxoxo | fold -w3 
oxo
xox
oxo
# (11) Parse by triplet-char strings - 
#      search for 'xox'
$ echo oxoxoxoxo | fold -w3 | grep "xox"
xox
# (12) One way to iterate through a string of chars
$ for i in $(echo 12345 | fold -w1)
> do
> ### perform some task ...
> print $i
> done
1
2
3
4
5

pr

pr shares features with simpler commands like nl and fmt, but its command-line options make it ideal for converting text files into a format that's suitable for printing. pr offers options that allow you to specify page length, column width, margins, headers/footers, double line spacing and more.

Aside from being the best suited formatter for printing tasks, pr also offers other useful features. These features include allowing you to view multiple files vertically in adjacent columns or columnizing a list in a fixed number of columns (see Listing 2).

Listing 2. Using pr

Miscellaneous

The following two commands are specialized parsers used to pick apart file path pieces.

Basename/Dirname

The basename and dirname commands are useful for presenting portions of a given file path. Quite often in scripting situations, it's convenient to be able to parse and capture a file name or the containing-directory name portions of a file path. These commands reduce this task to a simple one-line command. (There are other ways to approach this using the Korn shell or sed "magic", but basename and dirname are more portable and straightforward).

basename is used to strip off the directory, and optionally, the file suffix parts of a file path. Consider the following trivial examples:

:# (21) Parse out the Java Class name
$ basename
/usr/local/src/java/TheClass.java .java 
TheClass 
# (22) Parse out the file name.  
$ basename srcs/C/main.c 
main.c

dirname is used to display the containing directory path, as much of the path as is provided. Consider the following examples:

# (23) absolute and relative directory examples
$ dirname /homes/curly/.profile 
/homes/curly 
$ dirname curly/.profile
curly 
# 
# (24) From any korn-shell script, the following
#  line will assign the directory from where 
#  the script was launched 
SCRIPT_HOME="$(dirname $(whence $0))" 
# 
# (25)
# Okay, how about a non-trivial practical example?
#  List all directories (under $PWD that contain a  
#  file called 'core'.
$ for i in $(find $PWD -name core )^
> do 
> dirname $i
> done | sort -u
bin 
rje/gcc 
src/C

ttyrec a tty recorder

ttyrec is a tty recorder. Recorded data can be played back with the included ttyplay command. ttyrec is just a derivative of script command for recording timing information with microsecond accuracy as well. It can record emacs -nw, vi, lynx, or any programs running on tty.

Understanding Archivers

In the next few articles, I'd like to take a look at backups and archiving utilities. if you're like I was when I started using Unix, I was intimidated by the words tar, cpio and dump, and a quick peek at their respective man pages did not alleviate my fears.

Online Gnu Documentation

Links to the manuals for the Gnu tools most commonly used in embedded development: Using and Porting GNU CC * Using as, The GNU Assembler * GASP, an assembly preprocessor * Using ld, the GNU linker
 http://www.objsw.com/docs/

Slashdot Articles Free Books Online

 Matt Braithwaite writes "Answering RMS's call for free documentation, Karl Fogel has written a book on CVS that is free (GPLed) and available online. (The paper version has additional non-free material.) " Also, edinator wrote to say that ORA has put the Using Samba text online. The entire text of the Oreilly Docbook is downloadable www.docbook.org

 

TheLinuxGurus.org: Book Review: Professional Linux Programming

(Oct 21, 2000, 18:38 UTC) (116 reads) (0 talkbacks) (Posted by john)
"This book takes a different approach in that it steps through the development of a fictional application. The application you will build is an interface for a DVD rental store."  

FreeOS.com: RPM usage for newbies

(Oct 21, 2000, 18:03 UTC) (203 reads) (0 talkbacks) (Posted by john)
"The Red Hat Package Manager (RPM) has establised itself as one of the most popular distrubution formats for linux software today. A first time user may feel overwhelmed by the vast number of options available and this article will help a newbie to get familiar with usage of this tool."

Signal Ground: Stupid dd Tricks (or, Why We Didn't buy Norton Ghost)

"The company that employs Tom and me builds big pieces of food processing machinery that cost upwards of $400K. Each machine includes an embedded PCs running -- and I cringe -- NT 4. While the company's legacy currently dictates NT, those of us at the lower levels of the totem pole work to wedge Linux in wherever we can. What follows is a short story of a successful insertion that turned out to be (gasp!) financially beneficial to the company, too."

"...Ghost works well; it does exactly what we wanted it to. You boot off of a floppy (while the image medium is in another drive), and Ghost does the rest. The problem lies in Ghost's licensing. If you want to install in a situation like ours, you have to purchase a Value-Added Reseller (VAR) license from Symantec. And, every time you create a drive, you have to pay them about 17 dollars. When you also figure in the time needed to keep track of those licenses, that adds up in a hurry."

"It finally occurred to me that we could use Linux and a couple of simple tools (dd, gzip, and a shell script) to do the same thing as Ghost -- at least as far as our purposes go. ... The Results? We showed our little program to management, and they were impressed. We were able to create disk images almost as quickly as Norton Ghost, and we did it all in an afternoon using entirely free software. The rest is history."

Issue #87 Common Shell Tools - Focus On Linux - 05-25-00

"sort and uniq

The sort command is used to sort the lines in an input stream in alphanumeric or telephone book order. The simplest ways to use sort are to provide it with a filename to sort or an input stream whose data should be output in sorted form:

  sort myfile.txt
  cat myfile.txt | sort

This tool can be told to sort based on alternate fields and in several different orders. The uniq command is often used in conjunction with sort because it removes consecutive duplicate lines from and input stream before writing them to standard output. This provides a quick easy way to sort a pool of data and them remove duplicate entries.

A more in-depth discussion of sort can be found in the past QuickTip called Sort and Uniq.

tr

The tr command in its simplest form can be thought of as a simpler case of the sed command discussed earlier. It is used to replace all occurances of a single character in an input stream with an alternate character before writing to the output stream. For example, to change all percent (%) characters to spaces, you might use:

  tr '%' ' ' 
newfile.txt

Though sed can be used to accomplish the same task, it is often simpler to use tr when replacing a single character because the syntax is easy to remember and many special characters which must be escaped for sed can be supplied to tr without escaping.

wc

The wc, or "word count" command does just what its name implies: it counts words. As an added feature, tr also counts lines and bytes. The formats for counting words, lines, or bytes in a file or input stream are:

  $ wc -w myfile.txt
      897 myfile.txt
  $ wc -l myfile.txt
      193 myfile.txt
  $ wc -c myfile.txt
     5927 myfile.txt
  $

Notice that the output for wc normally includes the filename (when reading from a file) and always includes a number of spaces as well. Often, this behavior is undesirable, usually when a number is required without leading or trailing whitespace. In such cases, sed and cut can be used to eliminate them:

  $ wc -l myfile.txt | cut -d ' ' -f 1 | sed 's! !!g'
  193
  $

Note that other methods for removing spaces or filenames include using a more complex sed command alone or even using awk, which we won't discuss in this issue.

xargs

The xargs utility is used to break long input streams into groups of lines so that the shell isn't overloaded by command substitution. For example, the following command may fail if too many files are present in the current directory tree for BASH to substitute correctly:

  lpr $(find .)

However, using xargs, the desired effect can be obtained:

  find . | xargs lpr

More information on using xargs can be found in the QuickTip called Long Argument Lists and on the xargs manual page.

Linux Today PRNewswire SCO Contributes to the Open Source Community; Kicks Off Open Source Initiatives

"SCO is contributing source code for two developer tools -- "cscope" and "fur." The code is released under the terms of the BSD License and will be maintained by SCO. The first technology, cscope, is available to download at www.sco.com/opensource. Software developers can use cscope to help design and debug programs coded with the C programming language. The second technology, Fur, will be available to download in several weeks. Fur is a real-time analysis program used to optimize application and system binaries for more effective run time execution. Dramatic results have been seen in high-level applications and database systems using fur." 

[Jan 30, 2000] Use the Source, Luke Compiling and installing from source code LG #49

One of the greatest strengths of the Open Source movement is the availability of source code for almost every program. This article will discuss in general terms, with some examples, how to install a program from source code rather than a precompiled binary package. The primary audience for this article is the user who has some familiarity with installing programs from binaries, but isn't familiar with installing from source code. Some knowledge of compiling software is helpful, but not required.

[Jan 3, 2000] Advanced Programming in Expect A Bulletproof Interface LG #48

A very interesting and useful paper. See also: Ext2- Automating interactive tasks with expect and crontab
QCad the user-friendly CAD system for Linux is now open source PC Week PC Week Labs evaluates open-source apps 

Open Source Software Chronicles --  July-September, 1998


See Also


Recommended Links


In case of broken links please try to use Google search. If you find the page please notify us about new location
Google     


Reference


  • Humor

    Less sucks less more than more.
    That's why I use more less, and less more.

     


    Copyright © 1996-2008 by Dr. Nikolai Bezroukov. www.softpanorama.org was created as a service to the UN Sustainable Development Networking Programme (SDNP) in the author free time. Submit comments This document is an industrial compilation designed and created exclusively for educational use and is placed under the copyright of the Open Content License(OPL). Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

    Standard disclaimer:

    Last modified: April 05, 2009