To Become An Expert AWK Programmer

Uploaded by

nagarjuna0595
Copyright
© © All Rights Reserved
To become an expert AWK programmer, you need to know its internals. AWK follows a
simple workflow − Read, Execute, and Repeat. Each step is described below −

Read
AWK reads a line from the input stream (file, pipe, or stdin) and stores it in memory.

Execute
All AWK commands are applied sequentially to the input. By default, AWK executes
commands on every line. We can restrict this by providing patterns.

Repeat

This process repeats until the file reaches its end.

Program Structure
Let us now understand the program structure of AWK.

BEGIN block
The syntax of the BEGIN block is as follows −
Syntax
BEGIN {awk-commands}
The BEGIN block gets executed at program start-up. It executes only once. This is a
good place to initialize variables. BEGIN is an AWK keyword and hence it must be in
upper-case. Please note that this block is optional.

Body Block
The syntax of the body block is as follows −
Syntax
/pattern/ {awk-commands}
The body block applies AWK commands on every input line. By default, AWK executes
commands on every line. We can restrict this by providing patterns. Note that there are
no keywords for the Body block.

END Block
The syntax of the END block is as follows −
Syntax
END {awk-commands}
The END block executes at the end of the program. END is an AWK keyword and
hence it must be in upper-case. Please note that this block is optional.
Let us create a file marks.txt which contains the serial number, name of the student,
subject name, and number of marks obtained.
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us now display the file contents with a header by using an AWK script.

Example
[nagarjuna]$ awk 'BEGIN{printf "Sr No\tName\tSub\tMarks\n"} {print}' marks.txt

When this code is executed, it produces the following result −


Output
Sr No Name Sub Marks
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
At the start, AWK prints the header from the BEGIN block. Then, in the body block, it
reads a line from the file and executes AWK's print command, which simply prints the
line on the standard output stream. This process repeats until the file reaches its
end.
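
To make the three blocks concrete, here is a minimal sketch that adds up the marks
column. It assumes marks.txt uses single spaces between fields, as listed above; the
variable name total is only illustrative.

```shell
# Recreate the sample file from the text (single-space separators are an assumption).
cat > marks.txt <<'EOF'
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
EOF

# BEGIN runs once before any input, the body runs once per line,
# and END runs once after the last line has been read.
awk 'BEGIN { total = 0 }
     { total += $4 }
     END { print "Total marks =", total }' marks.txt
```

With the five records above, this should print Total marks = 431.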

AWK is simple to use. We can provide AWK commands either directly from the
command line or in the form of a text file containing AWK commands.

AWK Command Line


We can specify an AWK program within single quotes on the command line as shown −
awk [options] 'program' file ...

Example
Consider a text file marks.txt with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
Let us display the complete content of the file using AWK as follows −
Example
[nagarjuna]$ awk '{print}' marks.txt

On executing this code, you get the following result −


Output

1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

AWK Program File


We can provide AWK commands in a script file as shown −
awk [options] -f program-file file ...
First, create a text file command.awk containing the AWK command as shown below −
{print}
Now we can instruct AWK to read commands from the text file and perform the
action. Here, we achieve the same result as shown in the above example.
Example
[nagarjuna]$ awk -f command.awk marks.txt

On executing this code, you get the following result −


Output
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

The following skeleton shows the overall structure of an AWK program file −
# BEGIN block(s)

BEGIN {
   printf "---|Header|--\n"
}

# Rule(s)

{
   print $0
}

# END block(s)

END {
   printf "---|Footer|---\n"
}
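
As a rough sketch of how a program file with all three kinds of block could be run,
the following saves one to a file and passes it to AWK with -f. The file name
structure.awk and the two input lines are hypothetical, chosen only for illustration.

```shell
# Write a small program file (the name structure.awk is an assumption).
cat > structure.awk <<'EOF'
# BEGIN block: runs once before any input is read
BEGIN { printf "---|Header|---\n" }

# Rule: runs for every input line
{ print $0 }

# END block: runs once after the last line
END { printf "---|Footer|---\n" }
EOF

# Feed two sample lines through the program file.
printf 'first line\nsecond line\n' | awk -f structure.awk
```

The header is printed first, then each input line, and finally the footer.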

This chapter describes several useful AWK commands and their appropriate examples.
Consider a text file marks.txt to be processed with the following content −
1) Amit Physics 80
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89

Printing Column or Field


You can instruct AWK to print only certain columns from the input. The following
example demonstrates this −

Example
[nagarjuna]$ awk '{print $3 "\t" $4}' marks.txt

On executing this code, you get the following result −

Output
Physics 80
Maths 90
Biology 87
English 85
History 89
In the file marks.txt, the third column contains the subject name and the fourth column
contains the marks obtained in that subject. Let us print these two columns
using the AWK print command. In the above example, $3 and $4 represent the third and
the fourth fields respectively from the input record.
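
The example above joins the two fields with an explicit "\t". As a small sketch of the
alternatives, the commands below show what print does with a comma and with plain
juxtaposition. The two input lines are copied from marks.txt and assume single-space
separators.

```shell
# A comma in print separates items with the output field separator (a space by default).
printf '1) Amit Physics 80\n2) Rahul Maths 90\n' | awk '{ print $2, $4 }'

# Without a comma, adjacent values are concatenated with nothing in between.
printf '1) Amit Physics 80\n2) Rahul Maths 90\n' | awk '{ print $2 $4 }'
```

The first command prints Amit 80 and Rahul 90; the second prints Amit80 and Rahul90.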

Printing All Lines


By default, AWK prints all the lines that match a pattern.

Example
[nagarjuna]$ awk '/a/ {print $0}' marks.txt

On executing this code, you get the following result −

Output
2) Rahul Maths 90
3) Shyam Biology 87
4) Kedar English 85
5) Hari History 89
In the above example, we are searching for the pattern a. When a pattern match
succeeds, AWK executes the commands in the body block. If no body block is given,
the default action is taken, which is to print the record. Hence, the following command
produces the same result −

Example
[nagarjuna]$ awk '/a/' marks.txt

Printing Columns by Pattern


When a pattern match succeeds, AWK prints the entire record by default. But you can
instruct AWK to print only certain fields. For instance, the following example prints the
third and fourth field when a pattern match succeeds.

Example
[nagarjuna]$ awk '/a/ {print $3 "\t" $4}' marks.txt

On executing this code, you get the following result −

Output
Maths 90
Biology 87
English 85
History 89

Printing Column in Any Order


You can print columns in any order. For instance, the following example prints the
fourth column followed by the third column.

Example
[nagarjuna]$ awk '/a/ {print $4 "\t" $3}' marks.txt

On executing the above code, you get the following result −

Output

90 Maths
87 Biology
85 English
89 History

Counting and Printing Matched Pattern


Let us see an example where you can count and print the number of lines for which a
pattern match succeeded.

Example
[nagarjuna]$ awk '/a/{++cnt} END {print "Count =", cnt}' marks.txt

On executing this code, you get the following result −

Output
Count = 4
In this example, we increment the value of the counter when a pattern match succeeds
and we print this value in the END block. Note that, unlike many other programming
languages, there is no need to declare a variable before using it.
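
As a brief sketch of what "no declaration" means in practice: a variable that has never
been assigned behaves as an empty string in string context and as zero in numeric
context. The variable names text, count, and cnt below are arbitrary.

```shell
# An unassigned variable is empty as a string and 0 as a number.
awk 'BEGIN { print "[" text "]", count + 0 }'

# If no line matches, cnt is never assigned; adding 0 forces a numeric 0.
printf 'x\ny\n' | awk '/z/ { ++cnt } END { print "Count =", cnt + 0 }'
```

The first command prints [] 0 and the second prints Count = 0.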

Printing Lines with More than 18 Characters


Let us print only those lines that contain more than 18 characters.

Example
[nagarjuna]$ awk 'length($0) > 18' marks.txt

On executing this code, you get the following result −

Output
3) Shyam Biology 87
4) Kedar English 85
AWK provides a built-in length function that returns the length of a string. The $0
variable stores the entire line, and in the absence of a body block the default action,
print, is taken. Hence, if a line has more than 18 characters, the comparison
evaluates to true and the line gets printed.
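
To see why some lines pass the test and others do not, a quick sketch can print each
line's length in front of it. The two input lines are copied from marks.txt and assume
single-space separators.

```shell
# Prefix each line with its length as reported by length($0).
printf '1) Amit Physics 80\n3) Shyam Biology 87\n' | awk '{ print length($0) ": " $0 }'
```

This prints 18: 1) Amit Physics 80 and 19: 3) Shyam Biology 87, so only the second
line satisfies length($0) > 18.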

AWK is very powerful and efficient in handling regular expressions. A number of
complex tasks can be solved with simple regular expressions. Any command-line
expert knows the power of regular expressions.
This chapter covers standard regular expressions with suitable examples.

Dot
It matches any single character except the end of line character. For instance, the
following example matches fin, fun, fan etc.

Example
[nagarjuna]$ echo -e "cat\nbat\nfun\nfin\nfan" | awk '/f.n/'

On executing the above code, you get the following result −

Output
fun
fin
fan

Start of line
It matches the start of line. For instance, the following example prints all the lines that
start with pattern The.

Example
[nagarjuna]$ echo -e "This\nThat\nThere\nTheir\nthese" | awk '/^The/'

On executing this code, you get the following result −

Output
There
Their

End of line

It matches the end of line. For instance, the following example prints the lines that end
with the letter n.

Example
[nagarjuna]$ echo -e "knife\nknow\nfun\nfin\nfan\nnine" | awk '/n$/'

On executing this code, you get the following result −

Output
fun
fin
fan

Match character set


It is used to match only one out of several characters. For instance, the following
example matches pattern Call and Tall but not Ball.

Example
[nagarjuna]$ echo -e "Call\nTall\nBall" | awk '/[CT]all/'

On executing this code, you get the following result −

Output
Call
Tall

Exclusive set
In an exclusive set, the caret negates the set of characters in the square brackets. For
instance, the following example prints only Ball.

Example
[nagarjuna]$ echo -e "Call\nTall\nBall" | awk '/[^CT]all/'

On executing this code, you get the following result −

Output

Ball

Alternation
A vertical bar allows regular expressions to be logically ORed. For instance, the
following example prints Ball and Call.

Example
[nagarjuna]$ echo -e "Call\nTall\nBall\nSmall\nShall" | awk '/Call|Ball/'

On executing this code, you get the following result −

Output
Call
Ball

Zero or One Occurrence


It matches zero or one occurrence of the preceding character. For instance, the
following example matches Colour as well as Color. We have made u optional
by following it with ?.

Example
[nagarjuna]$ echo -e "Colour\nColor" | awk '/Colou?r/'

On executing this code, you get the following result −

Output
Colour
Color

Zero or More Occurrence


It matches zero or more occurrences of the preceding character. For instance, the
following example matches ca, cat, catt, and so on.

Example
[nagarjuna]$ echo -e "ca\ncat\ncatt" | awk '/cat*/'

On executing this code, you get the following result −

Output
ca
cat
catt

One or More Occurrence


It matches one or more occurrences of the preceding character. For instance, the
following example matches lines that contain one or more occurrences of the digit 2.

Example
[nagarjuna]$ echo -e "111\n22\n123\n234\n456\n222" | awk '/2+/'

On executing the above code, you get the following result −

Output
22
123
234
222

Grouping
Parentheses () are used for grouping and the character | is used for alternatives. For
instance, the following regular expression matches the lines containing either Apple
Juice or Apple Cake.

Example
[nagarjuna]$ echo -e "Apple Juice\nApple Pie\nApple Tart\nApple Cake" | awk '/Apple (Juice|Cake)/'

On executing this code, you get the following result −

Output
Apple Juice
Apple Cake
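
The operators in this chapter can be combined. As a small sketch, the pattern below
uses the start-of-line and end-of-line anchors together with grouping so that only
whole lines equal to Call or Tall match. The input words are made up for illustration,
and printf is used instead of echo -e simply for portability.

```shell
# ^ and $ anchor the match to the whole line; (C|T) allows either first letter.
printf 'Call\nTall\nBall\nCalling\nSmall\n' | awk '/^(C|T)all$/'
```

This prints Call and Tall. Calling is rejected by the $ anchor, and Ball and Small are
rejected because they do not start with C or T followed directly by all.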

AWK has associative arrays. One of their best features is that the indexes need not
be a continuous set of numbers; you can use either a string or a number as an array
index. Also, there is no need to declare the size of an array in advance; arrays can
grow and shrink at runtime.
The syntax is as follows −

Syntax
array_name[index] = value
Here, array_name is the name of the array, index is the array index, and value is the
value assigned to that element of the array.

Creating Array
To gain more insight into arrays, let us create an array and access its elements.

Example
[nagarjuna]$ awk 'BEGIN {
fruits["mango"] = "yellow";
fruits["orange"] = "orange"
print fruits["orange"] "\n" fruits["mango"]
}'

On executing this code, you get the following result −

Output
orange
yellow
In the above example, we declare the array as fruits, whose index is the fruit name and
whose value is the color of the fruit. To access array elements, we use the
array_name[index] format.

Deleting Array Elements


For insertion, we used the assignment operator. Similarly, we can use the delete
statement to remove an element from the array. The syntax of the delete statement is
as follows −

Syntax
delete array_name[index]
The following example deletes the element orange and then prints it. Because the
element no longer exists, the command prints only an empty line.

Example
[nagarjuna]$ awk 'BEGIN {
fruits["mango"] = "yellow";
fruits["orange"] = "orange";
delete fruits["orange"];
print fruits["orange"]
}'
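
Printing a deleted element is an indirect check, and in AWK merely referencing a missing
element creates it with an empty value. A more direct sketch uses the standard in
operator, which is not covered in the text above, to test whether an index exists. The
messages printed are illustrative only.

```shell
awk 'BEGIN {
   fruits["mango"] = "yellow"
   fruits["orange"] = "orange"
   delete fruits["orange"]

   # The "in" operator tests for an index without creating the element.
   if ("orange" in fruits)
      print "orange is still present"
   else
      print "orange was deleted"

   if ("mango" in fruits)
      print "mango is still present"
}'
```

This prints orange was deleted followed by mango is still present.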

Like other programming languages, AWK provides conditional statements to control
the flow of a program. This chapter explains AWK's control statements with suitable
examples.

If statement
It simply tests the condition and performs certain actions depending upon the condition.
Given below is the syntax of if statement −

Syntax
if (condition)
action
We can also use a pair of curly braces as given below to execute multiple actions −

Syntax
if (condition) {
action-1
action-2
.
.
action-n
}
For instance, the following example checks whether a number is even or not −

Example

[nagarjuna]$ awk 'BEGIN {num = 10; if (num % 2 == 0) printf "%d is even number.\n", num }'

On executing the above code, you get the following result −

Output
10 is even number.

If Else Statement
In if-else syntax, we can provide a list of actions to be performed when a condition
becomes false.
The syntax of if-else statement is as follows −

Syntax
if (condition)
action-1
else
action-2
In the above syntax, action-1 is performed when the condition evaluates to true and
action-2 is performed when the condition evaluates to false. For instance, the following
example checks whether a number is even or not −

Example
[nagarjuna]$ awk 'BEGIN {
num = 11; if (num % 2 == 0) printf "%d is even number.\n", num;
else printf "%d is odd number.\n", num
}'

On executing this code, you get the following result −

Output
11 is odd number.

If-Else-If Ladder
We can easily create an if-else-if ladder by using multiple if-else statements. The
following example demonstrates this −

Example

[nagarjuna]$ awk 'BEGIN {
   a = 30;

   if (a == 10)
      print "a = 10";
   else if (a == 20)
      print "a = 20";
   else if (a == 30)
      print "a = 30";
}'

On executing this code, you get the following result −

Output
a = 30
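
The same ladder can be applied to input records rather than a fixed variable. The
sketch below assigns a grade to each student based on the fourth field. The grade
boundaries and the variable name grade are hypothetical, and the three input lines are
copied from marks.txt assuming single-space separators.

```shell
printf '1) Amit Physics 80\n2) Rahul Maths 90\n3) Shyam Biology 87\n' | awk '{
   # The grade boundaries are hypothetical, chosen only for illustration.
   if ($4 >= 90)
      grade = "A"
   else if ($4 >= 85)
      grade = "B"
   else
      grade = "C"
   print $2, $4, grade
}'
```

This prints Amit 80 C, Rahul 90 A, and Shyam 87 B, one per line.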

This chapter explains AWK's loops with suitable examples. Loops are used to execute a
set of actions repeatedly. The loop execution continues as long as the loop
condition is true.

For Loop
The syntax of for loop is −

Syntax
for (initialization; condition; increment/decrement)
action
Initially, the for statement performs the initialization action, then it checks the
condition. If the condition is true, it executes the action and then performs the
increment or decrement operation. The loop execution continues as long as the
condition is true. For instance, the following example prints 1 to 5 using a for loop −

Example
[nagarjuna]$ awk 'BEGIN { for (i = 1; i <= 5; ++i) print i }'

On executing this code, you get the following result −

Output
1
2
3
4
5

While Loop
The while loop keeps executing the action as long as a particular logical condition
evaluates to true. Here is the syntax of the while loop −

Syntax
while (condition)
action
AWK first checks the condition; if the condition is true, it executes the action. This
process repeats as long as the loop condition evaluates to true. For instance, the
following example prints 1 to 5 using while loop −

Example
[nagarjuna]$ awk 'BEGIN {i = 1; while (i < 6) { print i; ++i } }'

On executing this code, you get the following result −

Output
1
2
3
4
5

Do-While Loop
The do-while loop is similar to the while loop, except that the test condition is
evaluated at the end of the loop. Here is the syntax of the do-while loop −

Syntax
do
action
while (condition)
In a do-while loop, the action statement gets executed at least once, even when the
condition evaluates to false. For instance, the following example prints the numbers
1 to 5 using a do-while loop −

Example
[nagarjuna]$ awk 'BEGIN {i = 1; do { print i; ++i } while (i < 6) }'

On executing this code, you get the following result −

Output
1
2
3
4
5
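
The difference from the while loop only shows up when the condition is false from the
start. As a minimal sketch, both loops below begin with i = 10, so the condition
i < 6 is never true. The messages printed are illustrative only.

```shell
# do-while: the body runs once before the condition is checked.
awk 'BEGIN { i = 10; do { print "do-while ran with i =", i; ++i } while (i < 6) }'

# while: the condition is checked first, so the body never runs.
awk 'BEGIN { i = 10; while (i < 6) { print "while ran with i =", i; ++i } }'
```

Only the first command produces output: do-while ran with i = 10.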

Break Statement
As its name suggests, it is used to end the loop execution. Here is an example which
ends the loop when the sum becomes greater than 50.

Example
[nagarjuna]$ awk 'BEGIN {
sum = 0; for (i = 0; i < 20; ++i) {
sum += i; if (sum > 50) break; else print "Sum =", sum
}
}'

On executing this code, you get the following result −

Output
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45

Continue Statement

The continue statement is used inside a loop to skip to the next iteration of the loop. It
is useful when you wish to skip the processing of some data inside the loop. For
instance, the following example uses the continue statement to print the even numbers
between 1 and 20.

Example
[nagarjuna]$ awk 'BEGIN {
for (i = 1; i <= 20; ++i) {
if (i % 2 == 0) print i ; else continue
}
}'

On executing this code, you get the following result −

Output
2
4
6
8
10
12
14
16
18
20

Exit Statement
It is used to stop the execution of the script. It accepts an integer argument, which
is the exit status code of the AWK process. If no argument is supplied, exit returns
status zero. Here is an example that stops the execution when the sum becomes greater
than 50.

Example
[nagarjuna]$ awk 'BEGIN {
sum = 0; for (i = 0; i < 20; ++i) {
sum += i; if (sum > 50) exit(10); else print "Sum =", sum
}
}'

On executing this code, you get the following result −

Output
Sum = 0
Sum = 1
Sum = 3
Sum = 6
Sum = 10
Sum = 15
Sum = 21
Sum = 28
Sum = 36
Sum = 45
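
The output alone does not show the status code passed to exit. As a small sketch, the
shell can capture it after AWK finishes. The shell variable name status is arbitrary,
and the print statements from the example above are omitted to keep the output short.

```shell
# Capture the exit status of the awk process in a shell variable.
status=0
awk 'BEGIN {
   sum = 0
   for (i = 0; i < 20; ++i) {
      sum += i
      if (sum > 50) exit(10)
   }
}' || status=$?
echo "Exit status: $status"
```

This prints Exit status: 10, the value given to exit when the sum first exceeds 50.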
