Advance Your Awk Skills With Two Easy Tutorials
By Dave Neary
October 31, 2019
Awk is one of the oldest tools in the Unix and Linux user's toolbox. Created in the
1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan (the A, W, and K of the
tool's name), awk was created for complex processing of text streams. It is a
companion tool to sed, the stream editor, which is designed for line-by-line
processing of text files. Awk allows more complex structured programs and is a
complete programming language.
This article will explain how to use awk for more structured and complex tasks,
including a simple mail merge application.
An awk program consists of a series of optional patterns, each followed by a block
of code in braces. Each block executes when the line in the input buffer matches
the pattern. If no pattern is included, the block executes on every line of the
input stream.
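A minimal sketch of this structure (the sample input and patterns here are illustrative):

```shell
# An awk program is a series of pattern { action } blocks, plus
# optional BEGIN and END blocks run before and after the input.
printf '1 apple\n2 banana\n3 cherry\n' | awk '
BEGIN { print "start" }        # runs once, before any input is read
/banana/ { print "found", $2 } # runs only on lines matching the pattern
END { print "done" }           # runs once, after the last line
'
```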
Also, the following syntax can be used to define functions in awk that can be called
from any block:
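A brief sketch of the function syntax (the function name and inputs here are illustrative):

```shell
# A function defined at the top level can be called from any block.
printf '3 4\n5 12\n' | awk '
function mysum(a, b) { return a + b }
{ print mysum($1, $2) }
'
```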
Awk also provides built-in variables such as FS (the field separator), NR (the
current record number), and NF (the number of fields in the current record).
There are many other variables that affect awk's behavior, but these are enough
to start with.
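A quick demonstration of these variables on sample input:

```shell
# NR is the current record number, NF the number of fields in the
# record, and $1..$NF the individual fields.
printf 'a b c\nd e\n' | awk '{ print NR, NF, $1 }'
```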
Awk one-liners
For a tool so powerful, it's interesting that most of awk's usage is basic one-liners.
Perhaps the most common awk program prints selected fields from an input line from
a CSV file, a log file, etc. For example, the following one-liner prints a list of
usernames from /etc/passwd:
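The one-liner itself did not survive this copy; based on the surrounding description (the -F option sets the field separator to : and $1 is the first field), it is presumably:

```shell
# Print the first :-separated field (the username) of every line
awk -F: '{ print $1 }' /etc/passwd
```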
As mentioned above, $1 is the first field in the current record. The -F option sets the
FS variable to the character :.
In the following example, every user whose shell is not /sbin/nologin can be printed
by preceding the block with a pattern match:
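The example command is missing here; given the description, a pattern comparing the seventh field (the login shell) is presumably what was shown:

```shell
# The pattern $7 != "/sbin/nologin" selects only records whose
# seventh field (the login shell) is not /sbin/nologin
awk -F: '$7 != "/sbin/nologin" { print $1 }' /etc/passwd
```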
Now that you have some of the basics, try delving deeper into awk with a more
structured example: creating a mail merge. You will need two files. The first is
an email template (called email_template.txt):
Dear {firstname},
We are pleased to inform you that your proposal has been successful. We
will contact you shortly with further information about the event
schedule.
Thank you,
The Program Committee
And the other is a CSV file (called proposals.csv) with the people you want to send
the email to:
firstname,lastname,email,title
Harry,Potter,[email protected],"Defeating your nemesis in 3 easy steps"
Jack,Reacher,[email protected],"Hand-to-hand combat for beginners"
Mickey,Mouse,[email protected],"Surviving public speaking with a squeaky voice"
Santa,Claus,[email protected],"Efficient list-making"
You want to read the CSV file, replace the relevant fields in the first file (skipping the
first line), then write the result to a file called acceptanceN.txt, incrementing N for
each record you process.
Write the awk program in a file called mail_merge.awk. Statements are separated by
; in awk scripts. The first task is to set the field separator variable and a couple of
other variables the script needs. You also need to read and discard the first line in the
CSV, or a file will be created starting with Dear firstname. To do this, use the special
function getline and reset the record counter to 0 after reading it.
BEGIN {
FS=",";
template="email_template.txt";
output="acceptance";
getline;
NR=0;
}
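A small demonstration of the header-skipping trick, on illustrative sample input:

```shell
# getline in BEGIN consumes the header line; resetting NR to 0 makes
# the first data record number 1 again.
printf 'first,last\nHarry,Potter\nJack,Reacher\n' | awk '
BEGIN { FS=","; getline; NR=0 }
{ print NR, $1 }
'
```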
The main function is very straightforward: for each line processed, a variable is set for
the various fields—firstname, lastname, email, and title. The template file is read
line by line, and the function sub is used to substitute any occurrence of the special
character sequences with the value of the relevant variable. Then the line, with any
substitutions made, is output to the output file.
Since you are dealing with the template file and a different output file for each line,
you need to clean up and close the file handles for these files before processing the
next record.
{
    # Read relevant fields from input file
    firstname=$1;
    lastname=$2;
    email=$3;
    title=$4;

    # Set the output filename from the record number
    outfile=(output NR ".txt");

    # Read a line from the template, substitute the special fields,
    # and print the result to the output file
    while ( (getline ln < template) > 0 ) {
        sub(/{firstname}/, firstname, ln);
        sub(/{lastname}/, lastname, ln);
        sub(/{email}/, email, ln);
        sub(/{title}/, title, ln);
        print(ln) > outfile;
    }

    # Close template and output file before the next record
    close(outfile);
    close(template);
}
Save the program and run it with awk -f mail_merge.awk proposals.csv or
awk -f mail_merge.awk < proposals.csv, and you will find text files generated in
the current directory.
Awk also supports associative arrays, which can be indexed by arbitrary strings.
For example, the fields of each record could be stored by name:

proposer["firstname"]=$1;
proposer["lastname"]=$2;
proposer["email"]=$3;
proposer["title"]=$4;
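A quick sketch of looking those fields back up by name (the sample input is illustrative):

```shell
# Fields stored in an associative array can be retrieved by key.
printf 'Harry,Potter\n' | awk -F, '
{
    proposer["firstname"]=$1;
    proposer["lastname"]=$2;
    print proposer["lastname"] ", " proposer["firstname"];
}
'
```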
This makes text processing very easy. A simple program using this concept is a
word frequency counter: parse a file, break out the words (ignoring punctuation)
in each line, increment a counter for each word, then output the 20 words that
occur most frequently in the text.
First, in a file called wordcount.awk, set the field separator to a regular expression
that includes whitespace and punctuation:
BEGIN {
# ignore 1 or more consecutive occurrences of the character
# in the character group below
FS="[ .,:;()<>{}@!\"'\t]+";
}
Next, the main loop function will iterate over each field, ignoring any empty fields
(which happens if there is punctuation at the end of a line), and increment the word
count for the words in the line.
{
for (i = 1; i <= NF; i++) {
if ($i != "") {
words[$i]++;
}
}
}
Finally, after the text is processed, use the END block to print the contents of the
array, piping awk's output into a shell command to sort numerically and print
the 20 most frequently occurring words:
END {
sort_head = "sort -k2 -nr | head -n 20";
for (word in words) {
printf "%s\t%d\n", word, words[word] | sort_head;
}
close (sort_head);
}
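Putting the three pieces together, a quick end-to-end run (the sample input is illustrative):

```shell
# Assemble wordcount.awk from the fragments above and run it on a
# short line of sample text.
cat > wordcount.awk <<'EOF'
BEGIN { FS="[ .,:;()<>{}@!\"'\t]+" }
{ for (i = 1; i <= NF; i++) if ($i != "") words[$i]++ }
END {
    sort_head = "sort -k2 -nr | head -n 20";
    for (word in words) printf "%s\t%d\n", word, words[word] | sort_head;
    close(sort_head);
}
EOF
printf 'the cat and the dog and the bird\n' | awk -f wordcount.awk
```

The first line of output is the most frequent word ("the", with a count of 3).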
Running this script on an earlier draft of this article produced a tab-separated
list of the 20 most common words and their counts.
What's next?
If you want to learn more about awk programming, I strongly recommend the book
sed & awk by Dale Dougherty and Arnold Robbins.
Another great resource for learning awk is the GNU awk user guide. It has a full
reference for awk's built-in function library, as well as lots of examples of simple and
complex awk scripts.
Dave Neary is a member of the Open Source and Standards team at Red Hat, helping
make open source projects important to Red Hat be successful. Dave has long been
active in the free and open source software world.