Filters

The document discusses various filtering commands in UNIX such as cut, paste, head, and tail. Cut can extract sections of each line using byte positions, characters, or fields. Paste joins files horizontally by outputting corresponding lines together with a tab delimiter. Head prints the first few lines of a file, while tail prints the last few lines. Each command has options to customize the output, such as specifying the number of lines/bytes or changing delimiters.

Uploaded by

Allu Khan
UNIT V

Filters in UNIX

1. cut command
The cut command in UNIX extracts sections from each line of its input files and writes the
result to standard output. It can select parts of a line by byte position, character position,
or field. At least one of these selection options must be given, otherwise cut reports an error.
If more than one file name is provided, the output from each file is not preceded by its file
name.
Syntax:
cut OPTION... [FILE]...
Let us consider two files, state.txt and capital.txt, containing the names of five Indian
states and their capitals respectively.

$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Without any option specified, cut displays an error.
$ cut state.txt
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.

Options and their Description with examples:

1. -c (character): To cut by character position, use the -c option. It selects the characters at
the positions given in its list, which can be numbers separated by commas or ranges of numbers
separated by a hyphen (-). Tabs and backspaces are each treated as one character. A list of
character positions is required with this option; otherwise cut reports an error.

Syntax:

$cut -c [(k)-(n)/(k),(n)/(n)] filename


Here, k denotes the starting position and n the ending position of the characters in each line
when k and n are separated by "-"; otherwise each number is the position of a single character
in each line of the input file.

$ cut -c 2,5,7 state.txt


nr
rah
sm
ir
hti
Above cut command prints second, fifth and seventh character from each line of the file.

$ cut -c 1-7 state.txt


Andhra
Arunach
Assam
Bihar
Chhatti
Above cut command prints first seven characters of each line from the file.
cut uses a special form for selecting characters from a given position up to the end of the line:

$ cut -c 1- state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh

Above command prints starting from first character to end. Here in command only starting
position is specified and the ending position is omitted.

$ cut -c -5 state.txt
Andhr
Aruna
Assam
Bihar
Chhat

Above command prints starting position to the fifth character. Here the starting position
is omitted and the ending position is specified.
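
The list forms above can be combined in a single -c list. The following is a minimal sketch, assuming a one-line sample fed through printf (the variable names are only illustrative). It also shows that cut writes the selected characters in input order, whatever the order of the list.

```shell
# Sample line used only for illustration.
line='Andhra Pradesh'

# Characters 1-3 plus everything from character 8 to the end of the line.
picked=$(printf '%s\n' "$line" | cut -c 1-3,8-)

# The same list written in the opposite order selects the same characters,
# and cut still prints them in the order they occur in the line.
picked_reordered=$(printf '%s\n' "$line" | cut -c 8-,1-3)

echo "$picked"
echo "$picked_reordered"
```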
2. -f (field): The -c option is useful for fixed-length lines, but most UNIX files do not have
fixed-length lines. To extract the useful information from them, cut by fields rather than by
columns. The field numbers can be given as a comma-separated list or as ranges separated by a
hyphen (for example 1-4). cut uses tab as the default field delimiter, but it can work with
another delimiter given with the -d option.
Note: cut does not treat a space as a field delimiter unless it is specified with -d.

Syntax:
$cut -d "delimiter" -f (field number) file.txt
In the file state.txt the fields are separated by spaces. If the -d option is not used, cut
finds no tab in the line and prints the whole line:
$ cut -f 1 state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
If -d " " is used, the space is treated as the field separator:
$ cut -d " " -f 1 state.txt
Andhra
Arunachal
Assam
Bihar
Chhattisgarh
The following command prints the first through fourth fields of each line of the file.
Command:
$ cut -d " " -f 1-4 state.txt
Output:
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
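
As a further illustration of -d and -f, here is a small sketch on colon-separated records. The sample records are made up for this example. It also uses the -s option, which suppresses lines that do not contain the delimiter instead of printing them whole.

```shell
# Two sample lines: one colon-delimited record and one line without a colon.
records=$(printf 'root:x:0:0:root:/root:/bin/sh\nno delimiter here\n')

# Fields 1 and 7; the line without ':' is passed through unchanged.
with_all=$(printf '%s\n' "$records" | cut -d ':' -f 1,7)

# Same selection, but -s drops lines that contain no delimiter.
only_delimited=$(printf '%s\n' "$records" | cut -s -d ':' -f 1,7)

echo "$with_all"
echo "$only_delimited"
```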

3. --output-delimiter: By default the output delimiter is the same as the input delimiter
specified with the -d option. To change the output delimiter, use the option
--output-delimiter="delimiter" (available in GNU cut).

$ cut -d " " -f 1,2 state.txt --output-delimiter='%'


Andhra%Pradesh
Arunachal%Pradesh
Assam
Bihar
Chhattisgarh
Here the cut command writes % between the fields selected with the -f option in place of the
input delimiter.
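
The output delimiter does not have to be a single character. A minimal sketch, assuming GNU cut (the option is a GNU extension) and a made-up comma-separated line:

```shell
# Select fields 1 and 3 of a comma-separated line and join them with " | ".
# --output-delimiter is a GNU cut option and may be missing elsewhere.
fields=$(printf 'state,capital,code\n' | cut -d ',' -f 1,3 --output-delimiter=' | ')

echo "$fields"
```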

2. Paste

Paste command
The paste command joins files horizontally (parallel merging): it writes lines consisting of the
corresponding lines of each specified file, separated by tabs, to standard output. When no file
is specified, or when a dash ("-") is given in place of a file name, paste reads from standard
input and echoes it until end of input (Ctrl-D) or an interrupt (Ctrl-C) is given.

Syntax:

paste [OPTION]... [FILES]...


Let us consider three files named state, capital and number. The state and capital files contain
the names of five Indian states and their capitals respectively. The number file contains five
numbers.

$ cat state
Arunachal Pradesh
Assam
Andhra Pradesh
Bihar
Chhattisgrah

$ cat capital
Itanagar
Dispur
Hyderabad
Patna
Raipur
Without any option, paste merges the files in parallel: it writes the corresponding lines of the
files to the terminal with a tab as the delimiter.
$ paste number state capital
1 Arunachal Pradesh Itanagar
2 Assam Dispur
3 Andhra Pradesh Hyderabad
4 Bihar Patna
5 Chhattisgrah Raipur
In the above command, three files are merged by the paste command.
Options:

1. -d (delimiter): The paste command uses the tab delimiter by default when merging files. The
delimiter can be changed to any other character with the -d option. If more than one character
is specified, paste uses the characters in turn, in a circular fashion, to separate the files on
each line.
Only one character is specified
$ paste -d "|" number state capital
1|Arunachal Pradesh|Itanagar
2|Assam|Dispur
3|Andhra Pradesh|Hyderabad
4|Bihar|Patna
5|Chhattisgrah|Raipur

More than one character is specified


$ paste -d "|," number state capital
1|Arunachal Pradesh,Itanagar
2|Assam,Dispur
3|Andhra Pradesh,Hyderabad
4|Bihar,Patna
5|Chhattisgrah,Raipur

The first and second files are separated by '|' and the second and third by ','.
With more files, the delimiter list would be reused from the beginning once it is exhausted.
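
The circular reuse of the delimiter list is easier to see with more columns than delimiters. This is a small sketch that feeds eight numbers on standard input and uses four dashes so that paste builds four columns per line (the variable name is illustrative):

```shell
# Each "-" reads the next line of standard input, so four dashes give four
# columns. The two-character list "|," is used as |  ,  | within each line.
grid=$(printf '%s\n' 1 2 3 4 5 6 7 8 | paste -d '|,' - - - -)

echo "$grid"
```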

2. -s (serial): Files can be merged sequentially with the -s option. paste reads all the lines
of one file and joins them into a single line, with the original lines separated by tabs. The
resulting single lines, one per file, are separated by newlines.

$ paste -s number state capital


1 2 3 4 5
Arunachal Pradesh Assam Andhra Pradesh Bihar Chhattisgrah
Itanagar Dispur Hyderabad Patna Raipur
In the above command, paste first reads the number file and merges its lines into a single line
separated by tabs. It then writes a newline, starts reading the next file (state), and repeats
the process until all files have been read.
Combination of -d and -s: The following example shows how to specify a delimiter for
sequential merging of files:
$ paste -s -d ":" number state capital
1:2:3:4:5
Arunachal Pradesh:Assam:Andhra Pradesh:Bihar:Chhattisgrah
Itanagar:Dispur:Hyderabad:Patna:Raipur
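
Serial merging with a custom delimiter is a convenient way to build one line out of many. The sketch below joins numbers read from standard input with "+" and then lets the shell evaluate the result; the use of shell arithmetic is an addition for illustration and is not part of paste.

```shell
# Join five input lines into the single line 1+2+3+4+5.
sum_expr=$(printf '%s\n' 1 2 3 4 5 | paste -s -d '+' -)

# Let the shell evaluate the joined expression.
total=$(($sum_expr))

echo "$sum_expr = $total"
```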
3. HEAD Command

It is the complement of the tail command. The head command, as the name implies, prints the
first N lines of the given input. By default, it prints the first 10 lines of each specified
file. If more than one file name is provided, the data from each file is preceded by its file
name.

Syntax:

head [OPTION]... [FILE]...

Let us consider two files, state.txt and capital.txt, containing the names of all the Indian
states and their capitals respectively.

$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Jharkhand
Karnataka
Kerala
Madhya Pradesh
Maharashtra
Manipur
Meghalaya
Mizoram
Nagaland
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal

Without any option, it displays only the first 10 lines of the file specified.
Example:
$ head state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir

Options

1. -n num: Prints the first ‘num’ lines instead of the first 10 lines. num must be specified in
the command; otherwise head displays an error.

$ head -n 5 state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh

2. -c num: Prints the first ‘num’ bytes of the specified file. A newline counts as a single
character, so if head prints a newline, it counts it as a byte. num must be specified in the
command; otherwise head displays an error.

$ head -c 6 state.txt
Andhra
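
The difference between counting lines and counting bytes can be sketched on a tiny made-up input. Note how the newline after "one" uses up one of the five bytes:

```shell
# Three short lines used only for illustration.
text=$(printf 'one\ntwo\nthree\n')

# First two lines.
first_two=$(printf '%s\n' "$text" | head -n 2)

# First five bytes: "one", the newline, and the "t" of "two".
first_five_bytes=$(printf '%s\n' "$text" | head -c 5)

echo "$first_two"
echo "$first_five_bytes"
```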

3. -q: Used when more than one file is given. With this option, the data from each file is not
preceded by its file name.

Without using -q option


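$ head state.txt capital.txt
==> state.txt <==
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir

==> capital.txt <==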
Hyderabad
Itanagar
Dispur
Patna
Raipur
Panaji
Gandhinagar
Chandigarh
Shimla
Srinagar

With the -q option


$ head -q state.txt capital.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Hyderabad
Itanagar
Dispur
Patna
Raipur
Panaji
Gandhinagar
Chandigarh
Shimla
Srinagar

4. -v: By using this option, data from the specified file is always preceded by its file name.

$ head -v state.txt
==> state.txt <==
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
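
The effect of the file-name headers can be reproduced with two throwaway files. This is a sketch under the assumption that head supports -q (GNU and BSD versions do); the file names a.txt and b.txt are made up.

```shell
# Create two one-line files in a temporary directory.
workdir=$(mktemp -d)
printf 'alpha\n' > "$workdir/a.txt"
printf 'beta\n' > "$workdir/b.txt"

# With several files, each block is preceded by a "==> name <==" header.
with_headers=$(cd "$workdir" && head a.txt b.txt)

# -q suppresses the headers.
without_headers=$(cd "$workdir" && head -q a.txt b.txt)

rm -r "$workdir"

echo "$with_headers"
echo "$without_headers"
```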

4. TAIL
It is the complement of the head command. The tail command, as the name implies, prints the last
N lines of the given input. By default it prints the last 10 lines of each specified file. If
more than one file name is provided, the data from each file is preceded by its file name.

Syntax:

tail [OPTION]... [FILE]...

Let us consider two files, state.txt and capital.txt, containing the names of all the Indian
states and their capitals respectively.

$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Jharkhand
Karnataka
Kerala
Madhya Pradesh
Maharashtra
Manipur
Meghalaya
Mizoram
Nagaland
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal

Without any option, it displays only the last 10 lines of the specified file.
Example:

$ tail state.txt
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal

Options:

1. -n num: Prints the last ‘num’ lines instead of the last 10 lines. num must be specified in
the command; otherwise tail displays an error. The option can also be written without the ‘n’
character, but the ‘-’ sign is mandatory.

$ tail -n 3 state.txt
Uttar Pradesh
Uttarakhand
West Bengal
OR
$ tail -3 state.txt
Uttar Pradesh
Uttarakhand
West Bengal

The tail command also accepts a ‘+’ prefix on the number. With it, tail prints the data starting
from the specified line number of the file instead of counting from the end. For the command
tail -n +N file_name, printing starts at line number N and continues to the end of the file.
Older systems also accept the obsolete form tail +N file_name.

$ tail -n +25 state.txt


Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
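
Combining head and tail gives a simple way to print a range of lines from the middle of a file. A minimal sketch on a made-up five-line input:

```shell
# Five sample lines used only for illustration.
lines=$(printf '%s\n' one two three four five)

# Everything from line 3 to the end.
from_third=$(printf '%s\n' "$lines" | tail -n +3)

# Lines 3 and 4: keep the first four lines, then the last two of those.
third_and_fourth=$(printf '%s\n' "$lines" | head -n 4 | tail -n 2)

echo "$from_third"
echo "$third_and_fourth"
```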

2. -c num: Prints the last ‘num’ bytes of the specified file. A newline counts as a single
character, so if tail prints a newline, it counts it as a byte. The -c option must be followed
by num, optionally with a sign. With +num, tail displays all the data starting at byte num of
the file, that is, after skipping the first num-1 bytes; with -num, it displays the last num
bytes of the file.
Note: Without a positive or negative sign before num, the command displays the last num bytes
of the specified file.

With negative num


$ tail -c -6 state.txt
Bengal
OR
$ tail -c 6 state.txt
Bengal

With positive num


$ tail -c +263 state.txt
Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
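
How the trailing newline enters the byte count can be checked on a single line. In this sketch the input "West Bengal" is followed by a newline, so it is 12 bytes long:

```shell
# Last 6 bytes: "engal" plus the newline.
last_six=$(printf 'West Bengal\n' | tail -c 6)

# Last 7 bytes: "Bengal" plus the newline.
last_seven=$(printf 'West Bengal\n' | tail -c 7)

# Start output at byte 6, i.e. skip the 5 bytes of "West ".
from_sixth=$(printf 'West Bengal\n' | tail -c +6)

echo "$last_six"
echo "$last_seven"
echo "$from_sixth"
```

Command substitution strips the final newline, which is why the variables hold only the visible text.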

3. -q: Used when more than one file is given. With this option, the data from each file is not
preceded by its file name.

Without using -q option


$ tail state.txt capital.txt
==> state.txt <==
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
==> capital.txt <==
Dispur
Patna
Raipur
Panaji
Gandhinagar
Chandigarh
Shimla
Srinagar
Ranchi
Bengaluru
With the -q option
$ tail -q state.txt capital.txt
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West BengalDispur
Patna
Raipur
Panaji
Gandhinagar
Chandigarh
Shimla
Srinagar
Ranchi
Bengaluru

4. -f: This option is mainly used by system administrators to monitor the growth of log files
written by many UNIX programs while they are running. It shows the last ten lines of a file and
then keeps printing new lines as they are appended to it. The prompt does not return even after
the work is over, so the interrupt key (Ctrl-C) must be used to stop the command. In general,
applications write error messages to log files, and the -f option lets you see those messages
as and when they appear in the log file.

$ tail -f logfile
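
Following a file can be tried without a real log. The sketch below is only an illustration under several assumptions: it uses temporary files, runs tail -f in the background, appends one made-up message, and then stops tail with kill in place of the interrupt key. The one-second pauses are a crude way to give tail time to notice the new line.

```shell
# Temporary log file and a file that captures what tail prints.
log=$(mktemp)
out=$(mktemp)
echo "service started" > "$log"

# Follow the log in the background.
tail -f "$log" > "$out" &
tail_pid=$!
sleep 1

# Simulate an application appending an error message.
echo "error: disk full" >> "$log"
sleep 1

# Stop following (interactively you would press Ctrl-C instead).
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

followed=$(cat "$out")
rm -f "$log" "$out"
echo "$followed"
```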

5. -v: By using this option, data from the specified file is always preceded by its file name.

$ tail -v state.txt
==> state.txt <==
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal

6. --version: This option displays the version of tail that is installed on your system.
5. SORT command
The sort command sorts a file, arranging its records in a particular order. By default it
compares lines as text, using the collating order of the current locale (plain ASCII order in
the C locale). With options, it can also sort numerically.

• The sort command sorts the contents of a text file line by line.
• sort is a standard command-line program that prints the lines of its input, or the
concatenation of all files listed in its argument list, in sorted order.
• It supports sorting alphabetically, in reverse order, by number and by month, and it can also
remove duplicates.
• It can also sort by items that are not at the beginning of the line, ignore case, and report
whether a file is already sorted. Sorting is based on one or more sort keys extracted from each
line of input.
• By default, the entire line is taken as the sort key. Blank space is the default field
separator.

The sort command follows these rules (the exact ordering depends on the locale in effect):

1. Lines starting with a number appear before lines starting with a letter.
2. Lines starting with a letter that appears earlier in the alphabet appear before lines
starting with a letter that appears later in the alphabet.
3. When two lines differ only in letter case, most non-C locales place the lowercase form
first. In the C (POSIX) locale, all uppercase letters sort before all lowercase letters.

Examples

Suppose you create a data file with name file.txt

Command :
$ cat > file.txt
abhishek
chitransh
satish
rajan
naveen
divyam
harsh
Sorting a file : Now use the sort command
Syntax :

$ sort filename.txt
Command:
$ sort file.txt

Output :
abhishek
chitransh
divyam
harsh
naveen
rajan
satish

Note: This command does not actually change the input file, i.e. file.txt.

Sorting a file with mixed case : When a file contains both uppercase and lowercase letters, the
result depends on the locale. In a typical English UTF-8 locale, lines are compared
alphabetically first, and the lowercase form comes first only when two lines differ in case
alone.
Example:
Create a file mix.txt

Command :
$ cat > mix.txt
abc
apple
BALL
Abc
bat

Now use the sort command

Command :
$ sort mix.txt
Output :
abc
Abc
apple
BALL
bat
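
Because the ordering of mixed-case lines depends on the locale, it can help to pin the locale when a predictable result is needed. The sketch below forces the C locale with LC_ALL=C, where comparison is by byte value and every uppercase letter sorts before every lowercase one, and then shows the -f option, which folds lowercase to uppercase for the comparison.

```shell
# Same five lines as mix.txt, supplied inline.
c_order=$(printf '%s\n' abc apple BALL Abc bat | LC_ALL=C sort)

# -f ignores case when comparing; lines that compare equal
# (Abc and abc) fall back to byte order.
folded=$(printf '%s\n' abc apple BALL Abc bat | LC_ALL=C sort -f)

echo "$c_order"
echo "$folded"
```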
Options with the sort command

1. -o Option : To write the sorted output to a new file, for example output.txt, either
redirect the output with > or use the built-in sort option -o, which lets you specify an
output file.
For a separate output file, -o is functionally the same as redirecting the output.
Note: The -o option can also name the input file itself as the output, because sort reads all
of its input before writing. Redirecting with > to the input file would empty it before sort
reads it.
Example: The input file is the same as mentioned above.

Syntax :

$ sort inputfile.txt > filename.txt


$ sort -o filename.txt inputfile.txt
Command:
$ sort file.txt > output.txt
$ sort -o output.txt file.txt
$ cat output.txt

Output :
abhishek
chitransh
divyam
harsh
naveen
rajan
satish
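
One practical use of -o is sorting a file in place. This is a minimal sketch on a temporary file with made-up contents; naming the input file as the -o target is safe because sort reads all input before it opens the output.

```shell
# Temporary file with three unsorted names.
workfile=$(mktemp)
printf '%s\n' satish abhishek rajan > "$workfile"

# Sort the file in place. "sort file > file" would truncate it instead.
sort -o "$workfile" "$workfile"

sorted_in_place=$(cat "$workfile")
rm -f "$workfile"

echo "$sorted_in_place"
```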

2. -r Option: Sorting in reverse order : You can perform a reverse-order sort using the -r
flag. The -r flag sorts the input file in reverse, i.e. descending, order.
Example: The input file is the same as mentioned above.
Syntax :

$ sort -r inputfile.txt
Command :
$ sort -r file.txt
Output :
satish
rajan
naveen
harsh
divyam
chitransh
abhishek
-n Option : To sort a file numerically, use the -n option. It compares lines by their numeric
value instead of character by character, so it is the option to use for files of numbers.
Example :
Let us consider a file with numbers:

Command :
$ cat > file1.txt
50
39
15
89
200

Syntax :

$ sort -n filename.txt
Command :
$ sort -n file1.txt
Output :
15
39
50
89
200

3. -nr option : To sort a file with numeric data in reverse order we can use the
combination of two options as stated below.
Example :The numeric file is the same as above.
Syntax :

$ sort -nr filename.txt


Command :
$ sort -nr file1.txt
Output :
200
89
50
39
15
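
Why -n matters becomes clear when the same numbers are sorted as plain text. In this sketch the text sort is run in the C locale so that the comparison is strictly character by character:

```shell
# Same numbers as file1.txt, supplied inline.
numbers=$(printf '%s\n' 50 39 15 89 200)

# Text comparison: "200" sorts before "39" because "2" comes before "3".
as_text=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# Numeric comparison.
as_numbers=$(printf '%s\n' "$numbers" | sort -n)

echo "$as_text"
echo "$as_numbers"
```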

4. -k Option : sort can order a table by any column using the -k option.
For example, use "-k 2" to sort on a key that starts at the second column. Adding n, as in
"-k 2n", compares that column numerically.
Example :
Let us create a table with 2 columns

$ cat > employee.txt


manager 5000
clerk 4000
employee 6000
peon 4500
director 9000
guard 3000

Syntax :

$ sort -k column_number filename.txt
Command :
$ sort -k 2n employee.txt
guard 3000
clerk 4000
peon 4500
manager 5000
employee 6000
director 9000
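
The -k option is often combined with -t, which sets the field separator. The sketch below uses made-up comma-separated records and restricts the key to the second field only (2,2), sorted numerically in reverse, so the highest value comes first.

```shell
# Comma-separated sample records (name,salary).
staff=$(printf '%s\n' 'manager,5000' 'clerk,4000' 'director,9000' 'guard,3000')

# -t ',' sets the separator; -k 2,2nr sorts on field 2 only,
# numerically (n) and in reverse (r).
highest_paid_first=$(printf '%s\n' "$staff" | sort -t ',' -k 2,2nr)

echo "$highest_paid_first"
```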

5. -c option : This option checks whether the given file is already sorted. If a line is out
of order, sort reports the first such line on standard error and exits with a non-zero status.
It does not sort the file.
Example :
Suppose a file exists with a list of cars called cars.txt.

Audi
Cadillac
BMW
Dodge

Syntax :

$ sort -c filename.txt
Command :
$ sort -c cars.txt
Output :
sort: cars.txt:3: disorder: BMW
Note : If there is no output then the file is considered to be already sorted
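
In scripts, the exit status of sort -c is usually more useful than its message. A minimal sketch, with the car names supplied on standard input and the C locale forced so the result does not depend on the environment:

```shell
# sort -c exits with status 0 when the input is sorted and non-zero otherwise.
if printf '%s\n' Audi Cadillac BMW Dodge | LC_ALL=C sort -c 2>/dev/null; then
    cars_status=sorted
else
    cars_status=unsorted
fi

if printf '%s\n' Audi BMW Cadillac Dodge | LC_ALL=C sort -c 2>/dev/null; then
    fixed_status=sorted
else
    fixed_status=unsorted
fi

echo "$cars_status $fixed_status"
```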

6. -u option : To sort and remove duplicates, pass the -u option to sort. It writes a sorted
list to standard output with duplicate lines removed.
This option is helpful because removing the duplicates gives a list without redundant entries.
Example : Suppose a file called cars.txt contains a list of cars.

Audi
BMW
Cadillac
BMW
Dodge
Syntax :

$ sort -u filename.txt
Command :
$ sort -u cars.txt
Output :
Audi
BMW
Cadillac
Dodge

6. tr command
The tr command in UNIX is a command line utility for translating or deleting characters. It
supports a range of transformations including uppercase to lowercase, squeezing repeating
characters, deleting specific characters and basic find and replace. It can be used with UNIX
pipes to support more complex translation. tr stands for translate.

Syntax :

$ tr [OPTION] SET1 [SET2]

Options

-c : complements the set of characters in SET1, i.e., operations apply to characters not in the
given set
-d : deletes the characters in SET1 from the output
-s : replaces each run of repeated characters listed in SET1 with a single occurrence
-t : truncates SET1 to the length of SET2

Sample Commands

1. How to convert lower case to upper case


To convert from lower case to upper case the predefined sets in tr can be used.

$cat greekfile

Output:

WELCOME TO
GeeksforGeeks
$ cat greekfile | tr 'a-z' 'A-Z'

Output:

WELCOME TO
GEEKSFORGEEKS

or

$ cat greekfile | tr '[:lower:]' '[:upper:]'

Output:

WELCOME TO
GEEKSFORGEEKS
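
Since tr reads only standard input, the file can also be supplied with input redirection instead of cat. A small sketch using a temporary file with the same two lines:

```shell
# Temporary file standing in for the sample file.
workfile=$(mktemp)
printf 'WELCOME TO\nGeeksforGeeks\n' > "$workfile"

# tr takes no file arguments, so the file is connected to its standard input.
upper=$(tr '[:lower:]' '[:upper:]' < "$workfile")

rm -f "$workfile"
echo "$upper"
```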

2. How to translate white-space to tabs


The following command will translate all the white-space to tabs

$ echo "Welcome To GeeksforGeeks" | tr '[:space:]' '\t'

Output:

Welcome To GeeksforGeeks

3. How to translate braces into parenthesis


You can also translate from and to a file. In this example we will translate braces in a file with
parenthesis.

$cat greekfile

Output:

{WELCOME TO}
GeeksforGeeks
$ tr '{}' '()' < greekfile > newfile.txt

Output:

(WELCOME TO)
GeeksforGeeks

The above command reads each character from "greekfile", translates it if it is a brace, and
writes the output to "newfile.txt".
4. How to squeeze repeated characters using -s
To squeeze repeated occurrences of the characters specified in a set, use the -s option. It
replaces each run of a repeated character with a single instance.
For example, you can convert multiple consecutive spaces into a single space:

$ echo "Welcome    To    GeeksforGeeks" | tr -s '[:space:]' ' '

Output:

Welcome To GeeksforGeeks
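
Squeezing and translating can be done in one step when both sets are given with -s. This sketch turns every run of white-space into a single newline, which puts each word on its own line:

```shell
# Translate white-space to newlines, then squeeze repeated newlines.
words=$(printf 'Welcome   To  GeeksforGeeks\n' | tr -s '[:space:]' '\n')

echo "$words"
```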

5. How to delete specified characters using -d option


To delete specific characters use the -d option.This option deletes characters in the first set
specified.

$ echo "Welcome To GeeksforGeeks" | tr -d 'W'

Output:

elcome To GeeksforGeeks

6. To remove all the digits from the string, use

$ echo "my ID is 73535" | tr -d '[:digit:]'

Output:

my ID is

7. How to complement the sets using -c option


You can complement the SET1 using -c option. For example, to remove all characters except
digits, you can use the following.

$ echo "my ID is 73535" | tr -cd '[:digit:]'

Output:

73535
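
One thing to keep in mind with -cd is that the newline is also outside the digit set, so it is deleted as well. A small sketch on two made-up lines shows the difference when the newline is added to the set:

```shell
# Everything except digits is deleted, including the newlines,
# so the two numbers run together.
digits_joined=$(printf 'id 735\nid 35\n' | tr -cd '[:digit:]')

# Adding \n to the set keeps one number per line.
digits_per_line=$(printf 'id 735\nid 35\n' | tr -cd '[:digit:]\n')

echo "$digits_joined"
echo "$digits_per_line"
```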
