
POWERTOOLS

The great awk

Samuel Palmer takes a look at the awk programming language, the perfect hacker’s tool for editing text files and analysing your data

Awk sounds like the noise made by a dying seagull, although the odd name, as is usual with such things, has relatively prosaic origins, deriving, as it does, from the initials of the surnames of each of the creators of awk, Alfred V Aho, Peter J Weinberger and Brian W Kernighan. Rightly or wrongly, the majority of the credit for awk is generally given to Kernighan, although the name would indicate that this shouldn’t necessarily be so.

Awk is, in fact, a powerful but simple programming language that, like grep and sed, has become an essential part of the Unix tool kit, and by extension, the Linux tool kit. Awk was designed to fulfill a common requirement on computer systems, to edit text files, especially where those files are used to store information, and to re-arrange, classify, validate and analyse the data contained in those files. Such work is laborious and subject to error when done manually. The alternative, writing programs in C or other high-level programming languages, is usually impractical and time consuming.

Awk could be said to be the godfather of the macro facilities that are provided with modern spreadsheet programs, but is far more powerful, far more versatile, and can be used as a rapid solution in more diverse scenarios.

From awk to gawk

The first incarnation of the language appeared in 1977, and originated, like Unix and C, under the aegis of Bell Laboratories, which has contributed a disproportionate share of the innovative technologies of the last half century. Kernighan is head of Bell’s Computing Structures Research Department, and is best known as the co-author of The C Programming Language with Dennis Ritchie.

There are several varieties of awk. The original specification, as released in 1977, is still referenced because it is the default version on some versions of Unix. A revised version, called nawk, or new awk, was finally released as part of Unix System V Release 3.1 in 1988, although it had already been in internal use within AT&T for several years. Nawk is the most usual implementation of awk on Unix. Nawk added some new features to the language and cleaned up some “dark corners”, as Effective Awk Programming puts it. The preferred adaption of the language for Linux, as implemented by the Free Software Foundation, is gawk, or GNU awk, which was written in 1986 by Paul Rubin and Jay Fenlason, and was reworked in 1989 by David Trueman and Arnold Robbins. Gawk contains a number of extensions to nawk that increase its functionality and power. The POSIX specification of awk includes feedback from both the gawk designers and the original awk designers.

Gawk, like so much of the work of the GNU project, is an essential feature of Linux, and is just one of the many tools that give some justification to Richard Stallman’s often disparaged claim that Linux should be known to the world as GNU/Linux. There is a further implementation of awk, mawk, or Mike’s awk implementation, which is also free software, and is available with some Linux distributions.

Mawk was written by Mike Brennan, who claims as the main benefit of mawk that it is “the fastest awk implementation I know. It’s even a lot faster than GNU awk (which is much faster than the awks that Unix vendors ship with their systems)”.

LinuxUser/July-August 2001 5 9

Get it from the source

Effective awk Programming is written by Arnold Robbins, one of the developers of gawk, and the co-author of sed & awk. This book is required reading for the Linux programmer who wants to explore the potential of awk and gawk. It is generally considered to give the most in-depth coverage of the many titles available on the subject. The book was written under the auspices of the Free Software Foundation and is also available electronically, in which form it can be freely copied and distributed under the terms of the Free Software Foundation’s Free Documentation Licence. A portion of the proceeds from sales of this book goes to the FSF to support further development of free and open source software. Effective awk Programming is a complete guide to the gawk 3.1 implementation of the language, and also contains the most up-to-date and thorough elucidation of the POSIX standard for awk available anywhere.

It has been said that The Awk Programming Language, by Aho, Kernighan and Weinberger, the originators of the language, “is to AWK what The C Programming Language is to C. It’s the bible”. As the original guide to the language it offers some insight into the intentions of the authors, and offers a complete set of examples.

“You should never use C if you can do it with a script, never use a script if you can do it with awk, never use awk if you can do it with sed, and never use sed if you can do it with grep”
Robert M Slade

On the command line

The purpose of awk is to allow complex pattern recognition and relatively complicated arithmetic functions in programs containing one or two lines. An awk program contains a sequence of patterns and actions. Unlike conventional programming languages, awk can be said to be data-driven. Awk searches a file, or a set of specified files, for a required pattern of data, and then takes the appropriate action (or set of actions), which may be quite complex. Awk is a natural extension of grep and sed, which can be used to perform similar tasks, and was conceived as such by the original designers, as a means of extending the processing capabilities of grep and sed to more complex forms of data. The difference is that awk has a much greater range of pattern recognition tools, can handle arithmetic processes, has the ability to control flow to any part of a program, can store values in user-defined variables that reference general storage locations, and supports user-defined functions.

Awk can perform relatively complex pattern matching, file editing and analysis tasks over multiple files. As such awk often removes the need to use a full programming language, and offers a rapid means of making global edits or analysing data. Typically awk might be used for one-off tasks, but an awk script can also be stored in a file, and awk is one of those classic Unix utilities that has all kinds of unpredictable uses far beyond the original remit of its design - a general purpose programming language that doesn’t need extensive programming experience to achieve the desired results.

Awk can be invoked in two forms, which can be conventionally defined as follows:

awk [options] 'script' var=value file(s)

awk [options] -f scriptfile var=value file(s)

The options are -F to define a field separator to be found in the data, and -v to assign a variable that can be used in the script. The script may be written on the command line, or contained in a file. Awk can be used to process multiple files that contain the defined pattern.
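
As a rough sketch of the two invocation forms, the shell session below builds a small colon-separated file and processes it once with an inline script and once with a script file. The file names staff.txt and report.awk, the field layout and the variable min are all invented for illustration and do not come from the article.

```shell
# Hypothetical sample data: name:department:salary
workdir=$(mktemp -d)
cat > "$workdir/staff.txt" <<'EOF'
alice:sales:2100
bob:support:1800
carol:sales:2600
EOF

# Form 1: the script is given on the command line.
# -F sets the field separator; min=2000 is a var=value assignment
# that awk performs just before it starts reading staff.txt.
inline=$(awk -F: '$3 > min {print $1}' min=2000 "$workdir/staff.txt")

# Form 2: the same script stored in a file and loaded with -f.
# -v assigns the variable before the script starts running.
cat > "$workdir/report.awk" <<'EOF'
$3 > min {print $1}
EOF
fromfile=$(awk -F: -v min=2000 -f "$workdir/report.awk" "$workdir/staff.txt")

echo "$inline"
echo "$fromfile"
```

Both commands should list the same names, since only the way the script and the variable reach awk differs. A var=value assignment placed among the file names takes effect only when awk reaches it in the argument list, so it is not yet set inside a BEGIN block, whereas -v is.
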

sed & awk was written by Dale Dougherty and Arnold Robbins, and attracts the same praise as the books above. The book progresses from a simple introduction to the benefits of both sed and awk, towards detailed descriptions of the tools, regular expression syntax and other intricacies. sed & awk is a standard textbook for Unix programmers and administrators. O’Reilly also publishes a Pocket Reference edition of sed & awk.

Patterns can be defined as combinations of regular expressions and comparison operations on strings, numbers, fields, variables, and array elements. Actions may perform arbitrary processing on selected lines. The language is C-like, but has no declarations, although strings and numbers are built-in data types. Some benefits of awk include automatic file handling, associative arrays, user-defined and built-in functions, recursion, regular expressions, multidimensional arrays, and formatted output using printf and sprintf. A rule may omit its pattern, in which case the action is applied to every line, or omit its action, in which case each matching line is printed. While typical examples of awk programs show one-line applications, awk can in fact be used to build quite complex operations, and a program that is being used to process data is more likely to be several lines long. Awk has the structures to support this. The simplest awk program might be as follows:

awk '/LinuxUser/ {print}' *.txt

This program will scan all files in the current directory with the
suffix .txt, search for any occurrence of the word LinuxUser,
and print to the terminal all lines containing that text.
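
The one-line program above pairs a regular expression pattern with a print action. A slightly longer sketch, using an invented three-column log (user, command, seconds) that is purely an assumption for illustration, shows a pattern without an action, a comparison pattern, and an action without a pattern that feeds an associative array:

```shell
# Hypothetical log: user, command, seconds taken
logfile=$(mktemp)
cat > "$logfile" <<'EOF'
sam grep 2
ann awk 5
sam awk 3
ann sed 1
EOF

# Pattern only (no action): the default action prints the whole line.
awk_lines=$(awk '/awk/' "$logfile")

# Comparison pattern plus action: print field 1 when field 3 exceeds 2.
slow_users=$(awk '$3 > 2 {print $1}' "$logfile")

# Action only (no pattern) runs on every line; END runs after the last one.
# total[] is an associative array indexed by user name.
report=$(awk '
    { total[$1] += $3 }
    END { printf "%s=%d %s=%d\n", "ann", total["ann"], "sam", total["sam"] }
' "$logfile")

echo "$awk_lines"
echo "$slow_users"
echo "$report"
```

No variable is declared anywhere: total[] comes into existence the first time it is indexed, and the fields are treated as numbers because they are used in arithmetic.
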


Short cuts

From a programmer’s point of view an awk program can be seen as a quick subroutine that can be invoked on its own without the requirement for the surrounding program superstructure. From a user’s point of view, awk allows the user with a rudimentary knowledge of programming structures to process data according to his or her own requirements. Awk is, in fact, a scripting language that was designed to achieve a limited number of tasks. Some may argue that, as a language, it has been superseded by Perl and other scripting languages, but it is simpler to master and quicker to use.

Because awk uses a syntax that looks very much like C, it makes itself attractive as a short cut for programmers to get a task done quickly. As such, awk is often used as a prototyping tool that lends itself to iterative testing of algorithms. Once the proof is working it is a relatively easy process to convert the awk program into another language, or to embed the program in a working script.

The authors claim that awk has been used for a diversity of applications “from databases to circuit design, from numerical analysis to graphics, from compilers to system administration, from a first language for non-programmers to the implementation language for software engineering courses”. The most typical application remains that for which it was originally designed, to scan and edit text files and to produce reports on the data held therein.

If you cannot determine when awk should be used in preference to other languages, take the advice that Robert M Slade gave in a review of the O’Reilly book, sed & awk: “The Enlightened Ones say that you should never use C if you can do it with a script, never use a script if you can do it with awk, never use awk if you can do it with sed, and never use sed if you can do it with grep.”

Awk and nawk and gawk

A classic definition of the capabilities and differences between the popular implementations of awk is given by Dale Dougherty and Arnold Robbins in sed & awk.

With original awk, you can:

• Think of a text file as made up of records and fields in a textual database.
• Perform arithmetic and string operations.
• Use programming constructs such as loops and conditionals.
• Produce formatted reports.

With nawk, you can also:

• Define your own functions.
• Execute Unix commands from a script.
• Process the results of Unix commands.
• Process command-line arguments more gracefully.
• Work more easily with multiple input streams.
• Flush open output files and pipes (latest Bell Labs awk).

In addition, with GNU awk (gawk), you can:

• Use regular expressions to separate records, as well as fields.
• Skip to the start of the next file, not just the next record.
• Perform more powerful string substitutions.
• Retrieve and format system time values.
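
Several of the nawk additions named in the “Awk and nawk and gawk” list (defining your own functions, executing a Unix command, processing its results, and reading command-line arguments) can be sketched in a few lines. This is a minimal illustration only, kept to POSIX awk on the assumption that gawk, mawk and nawk will all accept it; the function name twice, the echoed numbers and the argument report are made up for the example.

```shell
result=$(awk '
    # Define your own function.
    function twice(n) { return n * 2 }

    BEGIN {
        # Execute a Unix command and process its output with getline.
        cmd = "echo 4 5 6"
        while ((cmd | getline line) > 0) {
            n = split(line, num, " ")
            for (i = 1; i <= n; i++)
                sum += twice(num[i])
        }
        close(cmd)

        # Command-line arguments are visible in the ARGV array.
        printf "%d %s\n", sum, ARGV[1]
    }
' report)

echo "$result"
```

Because the program consists only of a BEGIN block, awk never tries to open report as an input file; it is simply read back from ARGV.
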