Advance Your Awk Skills With Two Easy Tutorials

This document provides two tutorials for advancing awk skills: a mail merge application and a word frequency counter. The mail merge uses awk to read a CSV file, substitute fields into an email template, and output customized emails. The word counter demonstrates using associative arrays to count word frequencies in a text file. Both examples show how to structure awk programs for readability and reuse with pattern blocks, functions, and file handling.

11/26/23, 7:12 PM Advance your awk skills with two easy tutorials | Opensource.com
https://opensource.com/article/19/10/advanced-awk

Advance your awk skills with two easy tutorials


Go beyond one-line awk scripts with mail merge and word
counting.

By Dave Neary

October 31, 2019 | 7 min read

Awk is one of the oldest tools in the Unix and Linux user's toolbox. Created in the
1970s by Alfred Aho, Peter Weinberger, and Brian Kernighan (the A, W, and K of the
tool's name), awk was created for complex processing of text streams. It is a
companion tool to sed, the stream editor, which is designed for line-by-line
processing of text files. Awk allows more complex structured programs and is a
complete programming language.

This article will explain how to use awk for more structured and complex tasks,
including a simple mail merge application.

Awk program structure


An awk script is made up of functional blocks surrounded by {} (curly brackets).
There are two special function blocks, BEGIN and END, that execute before
processing the first line of the input stream and after the last line is processed. In
between, blocks have the format:

pattern { action statements }

Each block executes when the line in the input buffer matches the pattern. If no
pattern is included, the function block executes on every line of the input stream.
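
As a minimal sketch of this structure, the following command runs a four-block program on three made-up input lines (the sample text and the variable names matched and total are assumptions chosen for illustration):

```sh
# Hypothetical sample input: three lines, two of which contain "awk".
out=$(printf 'awk is old\nsed is older\nawk is a language\n' | awk '
BEGIN   { print "start" }     # runs once, before any input is read
/awk/   { matched++ }         # runs only for lines that match /awk/
        { total++ }           # no pattern: runs for every line
END     { print matched " of " total " lines mention awk" }
')
echo "$out"
```

The BEGIN line is printed first, and the END block reports the two counters once all input has been read.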

Also, the following syntax can be used to define functions in awk that can be called
from any block:

function name(parameter list) { statements }

This combination of pattern-matching blocks and functions allows the developer to
structure awk programs for reuse and readability.
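
Here is a small sketch of a user-defined function called from a block. The function name capitalize and the input lines are hypothetical; toupper and substr are standard awk built-ins:

```sh
out=$(printf 'harry potter\njack reacher\n' | awk '
# capitalize is a hypothetical helper: upper-case the first letter of a word.
function capitalize(word) {
    return toupper(substr(word, 1, 1)) substr(word, 2)
}
# No pattern, so this block runs for every input line.
{ print capitalize($1), capitalize($2) }
')
echo "$out"
```

Because the function is defined outside any block, it could be called equally well from BEGIN, END, or any pattern block.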

How awk processes text streams


Awk reads text from its input file or stream one line at a time and uses a field
separator to parse it into a number of fields. In awk terminology, the current buffer is
a record. There are a number of special variables that affect how awk reads and
processes a file:

FS (field separator): By default, this is any whitespace (spaces or tabs)
RS (record separator): By default, a newline (\n)
NF (number of fields): When awk parses a line, this variable is set to the number
of fields that have been parsed
$0: The current record
$1, $2, $3, etc.: The first, second, third, etc. field from the current record
NR (number of records): The number of records that have been parsed so far by
the awk script

There are many other variables that affect awk's behavior, but this is enough to start
with.
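
To see these variables in action, here is a short sketch that uses the default separators on two made-up lines. It also uses $NF, which is the field whose number is NF, in other words the last field of the record:

```sh
out=$(printf 'one two three\nfour five\n' | awk '
{ print "record " NR " has " NF " fields; the last one is " $NF }
')
echo "$out"
```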

Awk one-liners
For a tool so powerful, it's interesting that most of awk's usage is basic one-liners.
Perhaps the most common awk program prints selected fields from an input line from
a CSV file, a log file, etc. For example, the following one-liner prints a list of
usernames from /etc/passwd:

awk -F":" '{print $1 }' /etc/passwd

As mentioned above, $1 is the first field in the current record. The -F option sets the
FS variable to the character :.

The field separator can also be set in a BEGIN function block:

awk 'BEGIN { FS=":" } {print $1 }' /etc/passwd

In the following example, a pattern match placed before the block prints only the
users whose line does not contain /sbin/nologin (in practice, every user whose shell
is not /sbin/nologin):

awk 'BEGIN { FS=":" } ! /\/sbin\/nologin/ {print $1 }' /etc/passwd
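
The regular expression in that one-liner is tested against the whole line, so it would also skip a user whose home directory or comment field happened to contain /sbin/nologin. A stricter variant, sketched here against made-up passwd-style data rather than the real /etc/passwd, compares only the seventh field, which holds the login shell:

```sh
# Hypothetical passwd-style sample data, not read from the real /etc/passwd.
out=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:2:2:daemon:/sbin:/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/zsh' |
  awk -F: '$7 != "/sbin/nologin" { print $1 }')
echo "$out"
```

A pattern can be any expression, not just a regular expression; the block runs whenever the expression is true.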

Advanced awk: Mail merge

Now that you have some of the basics, try delving deeper into awk with a more
structured example: creating a mail merge.

A mail merge uses two files, one (called in this example email_template.txt)
containing a template for an email you want to send:

From: Program committee <[email protected]>
To: {firstname} {lastname} <{email}>
Subject: Your presentation proposal

Dear {firstname},

Thank you for your presentation proposal:
{title}

We are pleased to inform you that your proposal has been successful
will contact you shortly with further information about the event
schedule.

Thank you,
The Program Committee

And the other is a CSV file (called proposals.csv) with the people you want to send
the email to:

firstname,lastname,email,title
Harry,Potter,[email protected],"Defeating your nemesis in 3 easy
Jack,Reacher,[email protected],"Hand-to-hand combat for beginners"
Mickey,Mouse,[email protected],"Surviving public speaking with a sq
Santa,Claus,[email protected],"Efficient list-making"

You want to read the CSV file (skipping its header line), replace the relevant fields
in the template, then write the result to a file called acceptanceN.txt, incrementing
N for each line you parse.

Write the awk program in a file called mail_merge.awk. Statements in awk scripts are
separated by newlines or by ;. The first task is to set the field separator variable
and a couple of other variables the script needs. You also need to read and discard
the first line in the CSV, or a file will be created starting with Dear firstname. To
do this, use the built-in getline command in the BEGIN block and reset the record
counter (NR) to 0 after reading it.

BEGIN {
    FS=",";
    template="email_template.txt";
    output="acceptance";
    getline;
    NR=0;
}
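
The effect of getline followed by NR=0 can be checked in isolation. This sketch feeds a made-up two-column CSV through the same kind of BEGIN block and prints the record number next to each data line:

```sh
out=$(printf 'firstname,lastname\nHarry,Potter\nJack,Reacher\n' | awk '
BEGIN {
    FS = ","
    getline      # reads the header line from the input and sets NR to 1
    NR = 0       # restart the count so the first data line is record 1
}
{ print NR ": " $1 " " $2 }
')
echo "$out"
```

The header never reaches the main block, and numbering starts at 1 with the first data line.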

The main block is very straightforward: for each line processed, a variable is set for
the various fields: firstname, lastname, email, and title. The template file is read
line by line, and the function sub is used to substitute the first occurrence on each
line of the special character sequences with the value of the relevant variable (use
gsub instead if a placeholder can appear more than once on a line). Then the line,
with any substitutions made, is output to the output file.

Since you are dealing with the template file and a different output file for each line,
you need to clean up and close the file handles for these files before processing the
next record.

{
    # Read relevant fields from input file
    firstname=$1;
    lastname=$2;
    email=$3;
    title=$4;

    # Set output filename
    outfile=(output NR ".txt");

    # Read a line from template, replace special fields, and
    # print result to output file
    while ( (getline ln < template) > 0 )
    {
        # The braces are escaped so that every awk treats them literally
        sub(/\{firstname\}/,firstname,ln);
        sub(/\{lastname\}/,lastname,ln);
        sub(/\{email\}/,email,ln);
        sub(/\{title\}/,title,ln);
        print(ln) > outfile;
    }

    # Close template and output file in advance of next record
    close(outfile);
    close(template);
}
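
One detail of sub is worth checking before you adapt this script to your own templates: it replaces only the first match in the target string, while gsub replaces every match. The sketch below uses a made-up line that repeats a placeholder to show the difference:

```sh
out=$(awk 'BEGIN {
    a = "Dear {firstname}, welcome {firstname}"
    b = a
    sub(/\{firstname\}/, "Harry", a)     # replaces the first match only
    gsub(/\{firstname\}/, "Harry", b)    # replaces every match
    print a
    print b
}')
echo "$out"
```

Also keep in mind that an & in the replacement string stands for the matched text, so a value containing & needs special care before it is passed to sub or gsub.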

You're done! Run the script on the command line with:

awk -f mail_merge.awk proposals.csv

or

awk -f mail_merge.awk < proposals.csv

and you will find text files generated in the current directory.
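
Because the script splits on every comma, the quotation marks around each title in proposals.csv stay part of the fourth field and end up in the generated emails. One possible way to drop them, sketched here with a made-up address, is to strip a leading and a trailing quote from the field. This is an assumption about what you might want, not part of the script above, and it does not help with titles that themselves contain commas:

```sh
out=$(printf 'Jack,Reacher,jack@example.com,"Hand-to-hand combat for beginners"\n' | awk -F, '
{
    title = $4
    gsub(/^"|"$/, "", title)    # drop a leading and a trailing double quote
    print title
}')
echo "$out"
```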

Advanced awk: Word frequency count


One of the most powerful features in awk is the associative array. In most
programming languages, array entries are typically indexed by a number, but in awk,
arrays are referenced by a key string. For example, you could store an entry from the
file proposals.csv from the previous section in a single associative array, like
this:

proposer["firstname"]=$1;
proposer["lastname"]=$2;
proposer["email"]=$3;
proposer["title"]=$4;
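
Once values are stored this way, you can look them up by key and test whether a key exists with the in operator. A minimal sketch, using a made-up CSV line and deliberately leaving out the email key:

```sh
out=$(printf 'Santa,Claus,santa@example.com,Efficient list-making\n' | awk -F, '
{
    proposer["firstname"] = $1
    proposer["lastname"]  = $2
    proposer["title"]     = $4
    print proposer["firstname"] " " proposer["lastname"] ": " proposer["title"]
    # The in operator tests for a key without creating it.
    msg = ("email" in proposer) ? "email stored" : "email not stored"
    print msg
}')
echo "$out"
```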

This makes text processing very easy. A simple program that uses this concept is a
word frequency counter. You can parse a file, break out words (ignoring
punctuation) in each line, increment the counter for each word in the line, then
output the 20 most frequent words in the text.

First, in a file called wordcount.awk, set the field separator to a regular expression
that includes whitespace and punctuation:

BEGIN {
    # treat 1 or more consecutive occurrences of the characters
    # in the bracket expression below as a single field separator
    FS="[ .,:;()<>{}@!\"'\t]+";
}
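
To get a feel for how this separator splits a line, here is a sketch you can run directly from the shell on one made-up sentence. Because the awk program sits inside single quotes on the command line, the single quote in the bracket expression is written as the octal escape \047; in the wordcount.awk file itself you can keep it as shown above:

```sh
out=$(printf 'Hello, world (again)!\n' | awk '
BEGIN { FS = "[ .,:;()<>{}@!\"\047\t]+" }
{ for (i = 1; i <= NF; i++) print i ": [" $i "]" }
')
echo "$out"
```

The fourth field is empty because the line ends with separator characters, which is why the counting loop has to skip empty fields.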

Next, the main block iterates over each field, ignoring any empty fields (which occur
when a line begins or ends with separator characters, such as punctuation at the end
of a line), and increments the word count for each word in the line.

{
    for (i = 1; i <= NF; i++) {
        if ($i != "") {
            words[$i]++;
        }
    }
}
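
Array keys are case-sensitive, so this loop counts We and we as two different words. If you would rather merge them, one option (an assumption about the behavior you want, not part of the script above) is to fold each word to lower case with the built-in tolower before counting. This sketch leaves the single quote out of the separator to keep the shell quoting simple:

```sh
out=$(printf 'We like awk. we use Awk!\n' | awk '
BEGIN { FS = "[ .,:;()<>{}@!\"\t]+" }
{
    for (i = 1; i <= NF; i++) {
        if ($i != "") {
            words[tolower($i)]++    # fold case before counting
        }
    }
}
END { print "we=" words["we"], "awk=" words["awk"] }
')
echo "$out"
```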

Finally, after the text is processed, use the END block to print the contents of the
array, then use awk's capability of piping output into a shell command to do a
numerical sort and print the 20 most frequently occurring words:

END {
    sort_head = "sort -k2 -nr | head -n 20";
    for (word in words) {
        printf "%s\t%d\n", word, words[word] | sort_head;
    }
    close (sort_head);
}
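
The pipe mechanism can be tried on its own with a tiny made-up input. In this sketch, the command string sort_top is a hypothetical variant of sort_head: it limits the numeric key to the second field, adds the word itself as a tie-breaker, and keeps only the top two entries:

```sh
out=$(printf 'b a c a b a\n' | awk '
{ for (i = 1; i <= NF; i++) words[$i]++ }
END {
    sort_top = "sort -k2,2nr -k1,1 | head -n 2"   # hypothetical variant of sort_head
    for (word in words) {
        printf "%s\t%d\n", word, words[word] | sort_top
    }
    close(sort_top)   # flush the pipe and wait for sort to finish
}')
echo "$out"
```

Every printf that uses the same command string writes to the same running pipeline, so sort sees all the lines before head prints the first two.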

Running this script on an earlier draft of this article produced this output:

[[email protected]]$ awk -f wordcount.awk < awk_arti


the 79
awk 41
a 39
and 33
of 32
in 27
to 26
is 25
line 23
for 23
will 22
file 21
we 16
We 15
with 12
which 12
by 12
this 11
output 11
function 11

What's next?

If you want to learn more about awk programming, I strongly recommend the book
sed & awk by Dale Dougherty and Arnold Robbins.

One of the keys to progressing in awk programming is mastering "extended regular
expressions." Awk offers several powerful additions to the sed regular expression
syntax you may already be familiar with.

Another great resource for learning awk is the GNU awk user guide. It has a full
reference for awk's built-in function library, as well as lots of examples of simple and
complex awk scripts.


Dave Neary
Dave Neary is a member of the Open Source and Standards
team at Red Hat, helping make Open Source projects
important to Red Hat be successful. Dave has been around
the free and open source software world, wearing many
different hats, since sending his first patch to the GIMP in
1999.

This work is licensed under a Creative Commons Attribution-Share Alike
4.0 International License.
