To Become An Expert AWK Programmer
To become an expert AWK programmer, you need to know its internals. AWK follows a
simple workflow: Read, Execute, and Repeat. The three steps are described below.
Read
AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.
Execute
All AWK commands are applied sequentially to the input. By default, AWK executes
commands on every line. We can restrict this by providing patterns.
Repeat
This process repeats until the end of the input is reached.
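The read, execute, and repeat cycle can be observed with a minimal sketch like the one below. It assumes a POSIX shell with printf, and the three input lines are made up for illustration. NR is AWK's built-in count of the records read so far, so the numbering shows that the action runs once for every line read.

```shell
# Feed three made-up lines to AWK; the action runs once for each line read.
out=$(printf 'alpha\nbeta\ngamma\n' | awk '{ print NR ": " $0 }')
printf '%s\n' "$out"
```

Each output line carries its record number, and the program ends when the input is exhausted.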
Program Structure
Let us now understand the program structure of AWK.
BEGIN Block
The syntax of the BEGIN block is as follows −
Syntax
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up. It executes only once. This is
a good place to initialize variables. BEGIN is an AWK keyword and hence it must be in
upper-case. Please note that this block is optional.
Body Block
The syntax of the body block is as follows −
Syntax
/pattern/ {awk-commands}
The body block applies AWK commands on every input line. By default, AWK executes
commands on every line. We can restrict this by providing patterns. Note that there are
no keywords for the Body block.
END Block
The syntax of the END block is as follows −
Syntax
END {awk-commands}
The END block executes at the end of the program. END is an AWK keyword and
hence it must be in upper-case. Please note that this block is optional.
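As a rough illustration of how the three blocks cooperate, the sketch below adds up the fourth field of two records. The inline records and the variable name total are chosen only for this example, and printf stands in for a real input file.

```shell
# Two sample records with four whitespace-separated fields.
out=$(printf '1) Amit Physics 80\n2) Rahul Maths 90\n' | awk '
BEGIN { total = 0 }                  # runs once, before any input is read
      { total += $4 }                # runs for every input line
END   { print "Total =", total }     # runs once, after the last line
')
printf '%s\n' "$out"
```

The BEGIN and END blocks each run a single time, while the unnamed body block in the middle runs once per record, so the result is Total = 170.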
Let us create a file marks.txt which contains the serial number, name of the student,
subject name, and number of marks obtained.
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us now display the file contents with a header by using an AWK script.
Example
[nag]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt
AWK is simple to use. We can provide AWK commands either directly from the
command line or in the form of a text file containing AWK commands.
Example
Consider a text file marks.txt with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us display the complete content of the file using AWK as follows −
Example
[nagarjuna]$ awk '{print}' marks.txt
Output
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
# BEGIN block(s)
BEGIN {
    printf "---|Header|--\n"
}
# Rule(s)
{
    print $0
}
# END block(s)
END {
    printf "---|Footer|---\n"
}
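A program like the one above can be saved in a file and run with awk's -f option. The following is a minimal sketch under a few assumptions: the file name command.awk is hypothetical, mktemp -d is assumed to be available for creating a scratch directory, and the input is a two-line sample rather than the full marks.txt.

```shell
# Work in a throwaway directory so no existing files are touched.
dir=$(mktemp -d)

# The program file: one BEGIN block, one rule, one END block.
cat > "$dir/command.awk" <<'EOF'
BEGIN { printf "---|Header|--\n" }
{ print $0 }
END { printf "---|Footer|---\n" }
EOF

# A two-line sample input file.
printf '1) Amit Physics 80\n2) Rahul Maths 90\n' > "$dir/marks.txt"

# -f tells awk to read the program from a file instead of the command line.
out=$(awk -f "$dir/command.awk" "$dir/marks.txt")
printf '%s\n' "$out"
rm -r "$dir"
```

Keeping the program in a file avoids shell quoting problems and makes longer scripts easier to maintain.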
This chapter describes several useful AWK commands and their appropriate examples.
Consider a text file marks.txt to be processed with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Example
[nagarjuna]$ awk '{print $3 "\t" $4}' marks.txt
Output
Physics 80
Maths 90
Biology 87
English 85
History 89
In the file marks.txt, the third column contains the subject name and the fourth column
contains the marks obtained in that subject. The above example prints these two columns
using the AWK print command; $3 and $4 represent the third and the fourth fields of
the input record, respectively.
Example
[nagarjuna]$ awk '/a/ {print $0}' marks.txt
Output
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
In the above example, we are searching for the pattern a. When a pattern match
succeeds, AWK executes the commands from the body block. In the absence of a body block,
the default action is taken, which is to print the record. Hence, the following command
produces the same result −
Example
[nagarjuna]$ awk '/a/' marks.txt
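The equivalence of the two forms can be checked with a small sketch like this one. It assumes a POSIX shell, and the two-record input is a cut-down sample used only for illustration.

```shell
# Only the second sample record contains a lowercase "a".
input='1) Amit Physics 80
2) Rahul Maths 90'

# Pattern with an explicit print action.
with_action=$(printf '%s\n' "$input" | awk '/a/ { print $0 }')

# Pattern alone: AWK falls back to the default action, which prints the record.
without_action=$(printf '%s\n' "$input" | awk '/a/')

printf '%s\n%s\n' "$with_action" "$without_action"
```

Both commands print the same record, because a pattern without an action behaves as if the action were { print $0 }.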
Example
[nagarjuna]$ awk '/a/ {print $3 "\t" $4}' marks.txt
Output
Maths 90
Biology 87
English 85
History 89
Example
[nagarjuna]$ awk '/a/ {print $4 "\t" $3}' marks.txt
Output
90 Maths
87 Biology
85 English
89 History
Example
[nagarjuna]$ awk '/a/{++cnt} END {print "Count =", cnt}' marks.txt
Output
Count = 4
In this example, we increment the counter each time a pattern match succeeds
and print its value in the END block. Note that, unlike in many other programming
languages, there is no need to declare a variable before using it.
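One consequence of undeclared variables is worth a short sketch. An AWK variable that was never assigned holds an empty string, which counts as zero in arithmetic. The example below assumes made-up input in which the pattern never matches, and adds 0 to the counter so that a number is printed instead of an empty value.

```shell
# No input line contains "z", so cnt is never assigned.
# Adding 0 forces the empty value to be treated as the number 0.
out=$(printf 'Amit\nRahul\n' | awk '/z/ { ++cnt } END { print "Count =", cnt + 0 }')
printf '%s\n' "$out"
```

Without the + 0, the same command would print Count = followed by nothing.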
Example
[nagarjuna]$ awk 'length($0) > 18' marks.txt
Output
3) Shyam Biology 87
4) Kedar English 85
AWK provides a built-in length function that returns the length of a string. The $0 variable
stores the entire line and, in the absence of a body block, the default action is taken, i.e.,
the print action. Hence, if a line has more than 18 characters, the comparison
evaluates to true and the line gets printed.
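To see the values that the comparison works with, the lengths can be printed next to the records. This is a small sketch that assumes the fields are separated by single spaces, as in the listing of marks.txt.

```shell
# Print the length of each record in front of the record itself.
out=$(printf '3) Shyam Biology 87\n5) Hari History 89\n' | awk '{ print length($0) ": " $0 }')
printf '%s\n' "$out"
```

The first record is 19 characters long and passes the test length($0) > 18, while the second is exactly 18 characters long and does not.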
Dot
It matches any single character except the end of line character. For instance, the
following example matches fin, fun, fan etc.
Example
[nagarjuna]$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'
Output
fun
fin
fan
Start of line
It matches the start of line. For instance, the following example prints all the lines that
start with pattern The.
Example
[nagarjuna]$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'
Output
There
Their
End of line
It matches the end of line. For instance, the following example prints the lines that end
with the letter n.
Example
[nagarjuna]$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'
Output
On executing this code, you get the following result −
fun
fin
fan
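Character set
A set of characters in square brackets matches any one of those characters. For
instance, the following example prints Call and Tall but not Ball.
Example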
[nagarjuna]$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'
Output
On executing this code, you get the following result −
Call
Tall
Exclusive set
In an exclusive set, the caret negates the set of characters in the square brackets. For
instance, the following example prints only Ball.
Example
[nagarjuna]$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'
Output
Ball
Alternation
A vertical bar allows regular expressions to be logically ORed. For instance, the
following example prints Call and Ball.
Example
[nagarjuna]$ echo -e "Call\nTall\nBall\nSmall\nShall" | awk '/Call|Ball/'
Output
Call
Ball
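Zero or one occurrence
A question mark matches zero or one occurrence of the preceding character. For
instance, the following example matches both Colour and Color.
Example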
[nagarjuna]$ echo -e "Colour\nColor" | awk '/Colou?r/'
Output
Colour
Color
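Zero or more occurrences
An asterisk matches zero or more occurrences of the preceding character. For
instance, the following example matches ca, cat, and catt.
Example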
[nagarjuna]$ echo -e "ca\ncat\ncatt" | awk '/cat*/'
Output
ca
cat
catt
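One or more occurrences
A plus sign matches one or more occurrences of the preceding character. For
instance, the following example prints the lines that contain one or more 2s.
Example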
[nagarjuna]$ echo -e "111\n22\n123\n234\n456\n222" | awk '/2+/'
Output
22
123
234
222
Grouping
Parentheses () are used for grouping and the character | is used for alternatives. For
instance, the following regular expression matches the lines containing either Apple
Juice or Apple Cake.
Example
[nagarjuna]$ echo -e "Apple Juice\nApple Pie\nApple Tart\nApple Cake" | awk '/Apple (Juice|Cake)/'
Output
Apple Juice
Apple Cake
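The operators described above can be combined. The sketch below anchors a group of alternatives at both ends so that only whole-line matches are printed. The word list is made up for illustration, and printf is used in place of echo -e because echo -e is not available in every shell.

```shell
# ^ and $ anchor the match to the whole line; the group lists the accepted words.
out=$(printf 'Call\nBall\nBallroom\nRecall\n' | awk '/^(Call|Ball)$/')
printf '%s\n' "$out"
```

Ballroom fails the end-of-line anchor and Recall fails the start-of-line anchor, so only Call and Ball are printed.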
AWK has associative arrays, and one of the best things about them is that the indexes need
not be a continuous set of numbers; you can use either a string or a number as an array
index. Also, there is no need to declare the size of an array in advance; arrays can
expand or shrink at runtime.
Its syntax is as follows −
Syntax
array_name[index] = value
Where array_name is the name of the array, index is the array index, and value is the
value assigned to that element of the array.
Creating Array
To gain more insight into arrays, let us create and access the elements of an array.
Example
[nagarjuna]$ awk 'BEGIN {
fruits["mango"] = "yellow";
fruits["orange"] = "orange"
print fruits["orange"] "\n" fruits["mango"]
}'
Output
orange
yellow
In the above example, we create an array named fruits whose index is the fruit name and whose
value is the color of the fruit. To access array elements, we
use the array_name[index] format.
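As a further sketch, the example below mixes a string index and a numeric index in one array and counts the elements with a for (key in array) loop. The element values are made up for illustration. The order in which such a loop visits the indexes is not specified, so the sketch only counts them instead of printing them.

```shell
out=$(awk 'BEGIN {
    fruits["mango"] = "yellow"    # string index
    fruits[1] = "first"           # numeric index in the same array
    n = 0
    for (key in fruits) n++       # visits every index; the order is unspecified
    print "elements:", n
    print "mango is", fruits["mango"]
}')
printf '%s\n' "$out"
```

No size was declared anywhere; the array simply grew as elements were assigned.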
Deleting Array Elements
The delete statement removes an element from an array.
Syntax
delete array_name[index]
The following example deletes the element orange. Hence the print statement finds no
value for that index and prints only an empty line.
Example
[nagarjuna]$ awk 'BEGIN {
fruits["mango"] = "yellow";
fruits["orange"] = "orange";
delete fruits["orange"];
print fruits["orange"]
}'
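An empty line is not a very clear signal that an element is gone. A common alternative, sketched below, is the in operator, which tests whether an index exists without creating it (merely referencing fruits["orange"] would create an empty element). The messages printed are chosen only for this illustration.

```shell
out=$(awk 'BEGIN {
    fruits["mango"] = "yellow"
    fruits["orange"] = "orange"
    delete fruits["orange"]
    # The in operator checks for an index without creating the element.
    if ("orange" in fruits) print "orange: present"; else print "orange: deleted"
    if ("mango" in fruits) print "mango: present"; else print "mango: deleted"
}')
printf '%s\n' "$out"
```

The deleted index is reported as missing, while the untouched element is still present.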
If statement
It simply tests the condition and performs certain actions depending upon the condition.
Given below is the syntax of if statement −
Syntax
if (condition)
action
We can also use a pair of curly braces as given below to execute multiple actions −
Syntax
if (condition) {
action-1
action-2
.
.
action-n
}
For instance, the following example checks whether a number is even or not −
Example
[nagarjuna]$ awk 'BEGIN { num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'
Output
10 is even number.
If Else Statement
In if-else syntax, we can provide a list of actions to be performed when a condition
becomes false.
The syntax of if-else statement is as follows −
Syntax
if (condition)
action-1
else
action-2
In the above syntax, action-1 is performed when the condition evaluates to true and
action-2 is performed when the condition evaluates to false. For instance, the following
example checks whether a number is even or not −
Example
[nagarjuna]$ awk 'BEGIN {
num = 11; if (num % 2 == 0) printf "%d is even number.\n", num;
else printf "%d is odd number.\n", num
}'
Output
11 is odd number.
If-Else-If Ladder
We can easily create an if-else-if ladder by using multiple if-else statements. The
following example demonstrates this −
Example
[nagarjuna]$ awk 'BEGIN {
a = 30;
if (a==10)
print "a = 10";
else if (a == 20)
print "a = 20";
else if (a == 30)
print "a = 30";
}'
Output
a = 30
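A ladder is most useful when the tested value comes from the input. The sketch below assigns a grade to each record based on its fourth field. The grade thresholds (90 and 85) and the variable name grade are hypothetical, chosen only for this illustration, and a final else is added as a catch-all branch.

```shell
out=$(printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Shyam Biology 87\n' | awk '{
    if ($4 >= 90)
        grade = "A"
    else if ($4 >= 85)
        grade = "B"
    else
        grade = "C"      # catch-all branch when no earlier condition matched
    print $2, grade
}')
printf '%s\n' "$out"
```

The conditions are tested from top to bottom and only the first matching branch runs, so the order of the tests matters.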
This chapter explains AWK's loops with suitable examples. Loops are used to execute a
set of actions repeatedly. The loop execution continues as long as the loop
condition is true.
For Loop
The syntax of for loop is −
Syntax
for (initialization; condition; increment/decrement)
action
Initially, the for statement performs the initialization action, then it checks the condition. If
the condition is true, it executes the action and then performs the increment or decrement
operation. The loop execution continues as long as the condition is true. For instance,
the following example prints 1 to 5 using a for loop −
Example
[nagarjuna]$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'
Output
1
2
3
4
5
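A for loop is often used to walk over the fields of a record. The following sketch sums all the fields on each input line. It relies on the built-in variable NF, which holds the number of fields in the current record, and the input numbers are made up for illustration.

```shell
out=$(printf '10 20 30\n1 2\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; ++i)   # NF is the number of fields in this record
        sum += $i               # $i is the i-th field
    print "Sum =", sum
}')
printf '%s\n' "$out"
```

Because the loop bound is NF, the same program handles lines with any number of fields.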
While Loop
The while loop keeps executing the action as long as a particular logical condition evaluates
to true. Here is the syntax of the while loop −
Syntax
while (condition)
action
AWK first checks the condition; if the condition is true, it executes the action. This
process repeats as long as the loop condition evaluates to true. For instance, the
following example prints 1 to 5 using while loop −
Example
[nagarjuna]$ awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'
Output
1
2
3
4
5
Do-While Loop
The do-while loop is similar to the while loop, except that the test condition is
evaluated at the end of the loop. Here is the syntax of the do-while loop −
Syntax
do
action
while (condition)
In a do-while loop, the action statement gets executed at least once, even when the
condition evaluates to false. For instance, the following example prints the numbers
1 to 5 using a do-while loop −
Example
[nagarjuna]$ awk 'BEGIN {i = 1; do { print i; ++i } while (i < 6) }'
Output
1
2
3
4
5
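The difference between the two loops only shows when the condition is false from the start. In the sketch below, i starts at 10 (a value chosen for illustration), so the condition i < 6 never holds.

```shell
out=$(awk 'BEGIN {
    i = 10
    while (i < 6) { print "while:", i; ++i }         # condition is false, body never runs
    do { print "do-while:", i; ++i } while (i < 6)   # body runs once before the test
}')
printf '%s\n' "$out"
```

Only the do-while line is printed, because its body runs before the condition is checked for the first time.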
Break Statement
As its name suggests, it is used to end the loop execution. Here is an example which
ends the loop when the sum becomes greater than 50.
Example
[nagarjuna]$ awk 'BEGIN {
sum = 0; for (i = 0; i < 20; ++i) {
sum += i; if (sum > 50) break; else print "Sum =", sum
}
}'
Output
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45
Continue Statement
The continue statement is used inside a loop to skip to the next iteration of the loop. It
is useful when you wish to skip the processing of some data inside the loop. For
instance, the following example uses the continue statement to print the even numbers
between 1 and 20.
Example
[nagarjuna]$ awk 'BEGIN {
for (i = 1; i <= 20; ++i) {
if (i % 2 == 0) print i ; else continue
}
}'
Output
2
4
6
8
10
12
14
16
18
20
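The same idea is frequently written the other way round: test for the values to be skipped, call continue, and let the rest of the loop body handle everything else. The following is a minimal sketch of that arrangement, with the range shortened to 1 to 6 for brevity.

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 6; ++i) {
        if (i % 2 != 0)
            continue      # odd number: skip the print below
        print i
    }
}')
printf '%s\n' "$out"
```

This form keeps the main work of the loop at the outer indentation level, which helps when the loop body is longer than a single print.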
Exit Statement
It is used to stop the execution of the script. It accepts an integer argument, which
is the exit status code for the AWK process. If no argument is supplied, exit returns status
zero. Here is an example that stops the execution when the sum becomes greater than
50.
Example
[nagarjuna]$ awk 'BEGIN {
sum = 0; for (i = 0; i < 20; ++i) {
sum += i; if (sum > 50) exit(10); else print "Sum =", sum
}
}'
Output
On executing this code, you get the following result −
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45
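The exit status itself does not appear in the output, but the calling shell can read it from the special parameter $?. A minimal sketch, assuming a POSIX shell, follows; the variable name status is chosen only for this example.

```shell
# Run a program that exits immediately with status 10 and capture that status.
# The "|| status=$?" form also works in scripts that run under "set -e".
status=0
awk 'BEGIN { exit(10) }' || status=$?
printf 'exit status: %s\n' "$status"
```

A calling script can test this value to decide what to do next, for example to tell normal completion (status 0) from an early exit.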