Filters
Filters in UNIX
1. cut command
The cut command in UNIX cuts out sections from each line of its input files and writes the result
to standard output. It can select parts of a line by byte position, character position, or field.
In other words, cut slices each line and extracts the requested text. One of the options -b, -c,
or -f must be specified, otherwise cut reports an error. If more than one file name is provided,
the data from each file is not preceded by its file name.
Syntax:
cut OPTION... [FILE]...
Let us consider two files named state.txt and capital.txt, which contain five names of Indian
states and capitals respectively.
$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Without any option specified, cut displays an error.
$ cut state.txt
cut: you must specify a list of bytes, characters, or fields
Try 'cut --help' for more information.
1. -c (column): To cut by character, use the -c option. It selects the characters at the positions
given to it, which can be a list of numbers separated by commas or a range of numbers separated by
a hyphen (-). Tabs and backspaces are each treated as one character. A list of character positions
must be specified with this option, otherwise cut reports an error.
Syntax:
$ cut -c 1- state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
The above command prints each line from the first character to the end. Only the starting
position is specified; the ending position is omitted.
$ cut -c -5 state.txt
Andhr
Aruna
Assam
Bihar
Chhat
The above command prints each line from the first character to the fifth. Here the starting
position is omitted and the ending position is specified.
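The two examples above use open-ended ranges only. As a minimal sketch of the other forms mentioned for -c, a comma-separated list and a closed range, the following uses a shortened copy of state.txt created in a scratch directory (the directory and the variable names are only for illustration):

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A shortened state.txt with three of the names used above.
printf 'Andhra Pradesh\nAssam\nBihar\n' > state.txt

# Comma-separated list: characters 1, 3 and 5 of every line.
picked=$(cut -c 1,3,5 state.txt)
printf '%s\n' "$picked"
# Adr
# Asm
# Bhr

# Closed range: characters 2 through 4 of every line.
ranged=$(cut -c 2-4 state.txt)
printf '%s\n' "$ranged"
# ndh
# ssa
# iha
```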
2. -f (field): The -c option is useful for fixed-length lines, but most UNIX files do not have
fixed-length lines. To extract the useful information you need to cut by fields rather than
columns. The field numbers can be given as a list separated by commas or as a range separated by
a hyphen, as with -c. cut uses tab as the default field delimiter but can work with another
delimiter through the -d option.
Note: A space is not treated as a field delimiter by cut unless it is given with -d.
Syntax:
$ cut -d "delimiter" -f (field number) file.txt
For example, the fields in state.txt are separated by spaces. If the -d option is not used, cut
finds no tab delimiter and prints the whole line:
$ cut -f 1 state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
If -d " " is used, the space is treated as the field separator or delimiter:
$ cut -d " " -f 1 state.txt
Andhra
Arunachal
Assam
Bihar
Chhattisgarh
The following command prints fields one to four of each line of the file; lines with fewer fields
are printed in full.
Command:
$ cut -d " " -f 1-4 state.txt
Output:
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
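The same field selection works with any single-character delimiter. The sketch below assumes a hypothetical colon-separated file, records.txt, that is not part of the original examples. It also shows the -s option, which suppresses lines that contain no delimiter instead of printing them whole:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical records: state, capital and a short code, separated by colons.
printf 'Assam:Dispur:AS\nBihar:Patna:BR\nno delimiter here\n' > records.txt

# Fields 1 and 3; the line without a colon is passed through unchanged.
listed=$(cut -d ':' -f 1,3 records.txt)
printf '%s\n' "$listed"
# Assam:AS
# Bihar:BR
# no delimiter here

# With -s, lines that contain no delimiter are dropped.
strict=$(cut -s -d ':' -f 2 records.txt)
printf '%s\n' "$strict"
# Dispur
# Patna
```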
3. --output-delimiter: By default the output delimiter is the same as the input delimiter
specified with the -d option. To change the output delimiter, use the option
--output-delimiter="delimiter". This long option is provided by GNU cut.
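As a short sketch of changing the output delimiter, assuming GNU cut is installed (the long option is not available in every implementation), the command below splits on spaces and joins the selected fields with '%':

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Andhra Pradesh\nArunachal Pradesh\nAssam\n' > state.txt

# Input fields are separated by a space, output fields by '%'.
# A line with a single field contains no delimiter and is printed as is.
joined=$(cut -d ' ' -f 1,2 --output-delimiter='%' state.txt)
printf '%s\n' "$joined"
# Andhra%Pradesh
# Arunachal%Pradesh
# Assam
```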
2. paste command
The paste command is one of the useful commands in UNIX and Linux operating systems. It joins
files horizontally (parallel merging): each output line consists of the corresponding lines from
each file specified, separated by a tab as delimiter, and is written to standard output. When no
file is specified, or a dash ("-") is given instead of a file name, paste reads from standard
input and echoes it as is until end of input (Ctrl-d) or until it is interrupted with Ctrl-c.
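Syntax:
paste [OPTION]... [FILE]...
The examples below use three files named number, state and capital; number holds the numbers 1
to 5, one per line.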
$ cat state
Arunachal Pradesh
Assam
Andhra Pradesh
Bihar
Chhattisgarh
$ cat capital
Itanagar
Dispur
Hyderabad
Patna
Raipur
Without any option, paste merges the files in parallel: it writes the corresponding lines from
each file to the terminal, with a tab as the delimiter.
$ paste number state capital
1 Arunachal Pradesh Itanagar
2 Assam Dispur
3 Andhra Pradesh Hyderabad
4 Bihar Patna
5 Chhattisgarh Raipur
In the above command, three files are merged by the paste command.
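Because paste joins columns with a tab, its output lines up with the default delimiter of cut. The sketch below rebuilds shortened versions of the three files in a scratch directory (three lines each, for brevity) and then cuts the second column back out of the merged output:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1\n2\n3\n' > number
printf 'Arunachal Pradesh\nAssam\nAndhra Pradesh\n' > state
printf 'Itanagar\nDispur\nHyderabad\n' > capital

# Parallel merge: one output line per input line, columns separated by tabs.
merged=$(paste number state capital)
printf '%s\n' "$merged"

# The tab-separated output can be fed straight into cut -f.
second_column=$(paste number state capital | cut -f 2)
printf '%s\n' "$second_column"
# Arunachal Pradesh
# Assam
# Andhra Pradesh
```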
Options:
1. -d (delimiter): The paste command uses the tab delimiter by default when merging files. The
delimiter can be changed to any other character with the -d option. If more than one character
is specified, paste uses them in a circular fashion to separate the columns of each line.
Only one character is specified:
$ paste -d "|" number state capital
1|Arunachal Pradesh|Itanagar
2|Assam|Dispur
3|Andhra Pradesh|Hyderabad
4|Bihar|Patna
5|Chhattisgarh|Raipur
More than one character is specified: with -d "|,", the first and second files are separated by
'|' and the second and third by ','. After that the list is exhausted and reused from the start.
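A minimal sketch of the multi-character case, using shortened copies of the same three files. Passing number a second time as a fourth column is only a device to show the delimiter list wrapping around:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1\n2\n' > number
printf 'Arunachal Pradesh\nAssam\n' > state
printf 'Itanagar\nDispur\n' > capital

# Two delimiters, three files: '|' after column 1, ',' after column 2.
two_delims=$(paste -d '|,' number state capital)
printf '%s\n' "$two_delims"
# 1|Arunachal Pradesh,Itanagar
# 2|Assam,Dispur

# Four columns: the delimiter list is exhausted and starts again with '|'.
wrapped=$(paste -d '|,' number state capital number)
printf '%s\n' "$wrapped"
# 1|Arunachal Pradesh,Itanagar|1
# 2|Assam,Dispur|2
```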
2. -s (serial): The -s option merges the files sequentially. paste reads all the lines of a
single file and joins them into one line, with the original lines separated by tabs. Each file
produces one such line, and these lines are separated by newlines.
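The serial mode can be sketched as follows, again with shortened sample files. Combining -s with -d is a common way to turn a column of values into a single comma-separated line:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '1\n2\n3\n' > number
printf 'Arunachal Pradesh\nAssam\nAndhra Pradesh\n' > state

# One output line per file; the original lines are joined by tabs.
serial=$(paste -s number state)
printf '%s\n' "$serial"

# -s together with -d: join all lines of one file with commas.
csv=$(paste -s -d ',' number)
printf '%s\n' "$csv"
# 1,2,3
```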
3. head command
The head command is the complement of the tail command. As the name implies, it prints the first
N lines (or bytes) of the given input. By default, it prints the first 10 lines of each specified
file. If more than one file name is provided, the data from each file is preceded by its file name.
Syntax:
head [OPTION]... [FILE]...
Let us consider two files named state.txt and capital.txt, which contain the names of all the
Indian states and capitals respectively.
$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Jharkhand
Karnataka
Kerala
Madhya Pradesh
Maharashtra
Manipur
Meghalaya
Mizoram
Nagaland
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
Without any option, it displays only the first 10 lines of the file specified.
Example:
$ head state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Options
1. -n num: Prints the first ‘num’ lines instead of the first 10 lines. num must be specified,
otherwise head displays an error.
$ head -n 5 state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
2. -c num: Prints the first ‘num’ bytes of the specified file. A newline counts as a single
character, so if head prints a newline, it counts it as one byte. num must be specified,
otherwise head displays an error.
$ head -c 6 state.txt
Andhra
3. -q: Used when more than one file is given. With this option, the data from each file is not
preceded by its file name.
4. -v: By using this option, data from the specified file is always preceded by its file name.
$ head -v state.txt
==> state.txt <==
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
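The options above can be tried together in a small sketch. It assumes a head implementation that supports -q (GNU and BSD head do) and uses shortened sample files; the capital.txt contents are taken from the paste examples earlier:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Andhra Pradesh\nArunachal Pradesh\nAssam\n' > state.txt
printf 'Hyderabad\nItanagar\nDispur\n' > capital.txt

# Two files: each block is preceded by a "==> name <==" header.
with_headers=$(head -n 2 state.txt capital.txt)
printf '%s\n' "$with_headers"

# -q leaves the headers out.
quiet=$(head -q -n 2 state.txt capital.txt)
printf '%s\n' "$quiet"
# Andhra Pradesh
# Arunachal Pradesh
# Hyderabad
# Itanagar

# -c counts the newline as a byte: "Andhra Pradesh" is 14 bytes,
# so 15 bytes are needed to include the end of the first line.
lines_in_14=$(head -c 14 state.txt | wc -l)
lines_in_15=$(head -c 15 state.txt | wc -l)
echo "complete lines in 14 bytes: $lines_in_14, in 15 bytes: $lines_in_15"
```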
4. TAIL
The tail command is the complement of the head command. As the name implies, it prints the last
N lines (or bytes) of the given input. By default it prints the last 10 lines of each specified
file. If more than one file name is provided, the data from each file is preceded by its file name.
Syntax:
tail [OPTION]... [FILE]...
Let us consider two files named state.txt and capital.txt, which contain the names of all the
Indian states and capitals respectively.
$ cat state.txt
Andhra Pradesh
Arunachal Pradesh
Assam
Bihar
Chhattisgarh
Goa
Gujarat
Haryana
Himachal Pradesh
Jammu and Kashmir
Jharkhand
Karnataka
Kerala
Madhya Pradesh
Maharashtra
Manipur
Meghalaya
Mizoram
Nagaland
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
Without any option, it displays only the last 10 lines of the file specified.
Example:
$ tail state.txt
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
Options:
1. -n num: Prints the last ‘num’ lines instead of the last 10 lines. num must be specified,
otherwise tail displays an error. The option can also be written without the ‘n’ character, but
the ‘-’ sign is mandatory.
$ tail -n 3 state.txt
Uttar Pradesh
Uttarakhand
West Bengal
OR
$ tail -3 state.txt
Uttar Pradesh
Uttarakhand
West Bengal
The tail command also accepts a ‘+’ sign before num, which has no counterpart in the head
command. With it, tail prints the data starting from the specified line number of the file
instead of counting from the end. For the command tail -n +num file_name, output starts at line
number ‘num’ and continues to the end of the file. Older implementations also accept the shorter
form tail +num file_name.
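A minimal sketch of the ‘+’ form, written with the -n spelling, which current implementations accept. It also shows how head and tail can be chained to pick out a block of lines from the middle of a file:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Andhra Pradesh\nArunachal Pradesh\nAssam\nBihar\nChhattisgarh\n' > state.txt

# Start printing at line 3 and continue to the end of the file.
from_third=$(tail -n +3 state.txt)
printf '%s\n' "$from_third"
# Assam
# Bihar
# Chhattisgarh

# Lines 2 to 3: take the first three lines, then the last two of those.
middle=$(head -n 3 state.txt | tail -n 2)
printf '%s\n' "$middle"
# Arunachal Pradesh
# Assam
```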
2. -c num: Prints the last ‘num’ bytes of the specified file. A newline counts as a single
character, so if tail prints a newline, it counts it as one byte. With this option, -c must be
followed by num, which may carry a ‘+’ or ‘-’ sign depending on the requirement. With +num, tail
displays the data starting at byte num, that is, after skipping the first num-1 bytes of the
file; with -num, it displays the last num bytes of the file.
Note: Without a sign before num, the command displays the last num bytes of the file.
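The byte counting can be checked on a tiny file. The sketch below assumes a two-line file, two.txt (a name chosen only for this example), whose size is 12 bytes: five letters plus a newline on each line.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# "Assam\n" is bytes 1-6, "Bihar\n" is bytes 7-12.
printf 'Assam\nBihar\n' > two.txt

# Last 6 bytes: the second line including its newline.
last_six=$(tail -c 6 two.txt)
printf '%s\n' "$last_six"
# Bihar

# Start at byte 7, which is the 'B' of the second line.
from_seventh=$(tail -c +7 two.txt)
printf '%s\n' "$from_seventh"
# Bihar

# Starting at byte 1 reproduces the whole file.
whole_size=$(tail -c +1 two.txt | wc -c)
echo "bytes from +1: $whole_size"
```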
3. -q: Used when more than one file is given. With this option, the data from each file is not
preceded by its file name.
4. -f: This option is mainly used by system administrators to monitor the growth of log files
written by UNIX programs while they are running. It shows the last ten lines of a file and keeps
the console updated as new lines are appended. The prompt does not return even after the work is
over, so the interrupt key (Ctrl-c) must be used to stop the command. Applications generally
write error messages to log files, and the -f option lets you see those messages as and when
they appear in the log file.
$ tail -f logfile
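Since tail -f never exits on its own, a scripted demonstration has to run it in the background and stop it explicitly. The following is only a sketch under that assumption: the log contents are made up, the sleep durations are arbitrary, and kill plays the role of the interrupt key.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical log file with one existing line.
echo "service started" > logfile

# Follow the log in the background and capture what tail prints.
tail -f logfile > seen.txt &
tail_pid=$!

# Another process appends a line while tail is watching.
sleep 1
echo "error: disk full" >> logfile
sleep 2

# Stop tail, as Ctrl-c would in an interactive session.
kill "$tail_pid"
wait "$tail_pid" 2>/dev/null || true

# seen.txt now holds the original line and the appended one.
cat seen.txt
```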
5. -v: By using this option, data from the specified file is always preceded by its file name.
$ tail -v state.txt
==> state.txt <==
Odisha
Punjab
Rajasthan
Sikkim
Tamil Nadu
Telangana
Tripura
Uttar Pradesh
Uttarakhand
West Bengal
6. --version: This option displays the version of tail that is currently installed on your
system.
5. SORT command
The sort command sorts a file, arranging its records in a particular order. By default, sort
compares lines as text, assuming the contents are ASCII. With options, it can also sort
numerically. The default ordering follows these rules:
1. Lines starting with a number will appear before lines starting with a letter.
2. Lines starting with a letter that appears earlier in the alphabet will appear before lines
starting with a letter that appears later in the alphabet.
3. Lines starting with a lowercase letter will appear before lines starting with the same letter
in uppercase. This rule depends on the locale: it holds in many language locales, while in the
C locale (LC_ALL=C) all uppercase letters sort before lowercase ones.
Examples
Command :
$ cat > file.txt
abhishek
chitransh
satish
rajan
naveen
divyam
harsh
Sorting a file : Now use the sort command
Syntax :
$ sort filename.txt
Command:
$ sort file.txt
Output :
abhishek
chitransh
divyam
harsh
naveen
rajan
satish
Note: This command does not actually change the input file, i.e. file.txt.
Sorting a mixed-case file : When a file contains both uppercase and lowercase letters, words
are ordered alphabetically regardless of case, and where two words differ only in case the
lowercase one comes first.
Example:
Create a file mix.txt
Command :
$ cat > mix.txt
abc
apple
BALL
Abc
bat
Command :
$ sort mix.txt
Output :
abc
Abc
apple
bat
BALL
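The order shown above is what sort produces in a typical language locale. As a sketch of how the result changes with the locale, the same file can be sorted with LC_ALL=C, which compares raw byte values so that every uppercase letter comes before every lowercase letter:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'abc\napple\nBALL\nAbc\nbat\n' > mix.txt

# Byte-value ordering: 'A' (65) and 'B' (66) come before 'a' (97).
c_order=$(LC_ALL=C sort mix.txt)
printf '%s\n' "$c_order"
# Abc
# BALL
# abc
# apple
# bat
```

Setting LC_ALL=C for a single command is a common way to get the same ordering on every system.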
Options with sort function
1. -o Option : To write the output to a new file, output.txt, either redirect the output of
sort or use the built-in option -o, which lets you specify an output file.
For a new file, using the -o option is functionally the same as redirecting the output.
Note: The -o option has one advantage: the output file may be the same as the input file,
whereas redirecting to the input file empties it before sort reads it.
Example: The input file is the same as mentioned above.
Syntax :
$ sort inputfile.txt > filename.txt
$ sort -o filename.txt inputfile.txt
Command :
$ sort file.txt > output.txt
$ sort -o output.txt file.txt
$ cat output.txt
Output :
abhishek
chitransh
divyam
harsh
naveen
rajan
satish
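One practical use of -o is sorting a file in place. The sketch below uses a shortened file.txt; the redirect variant is left in a comment only, because running it would destroy the input.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'satish\nabhishek\nnaveen\n' > file.txt

# Safe: sort reads all of its input before it opens the -o file for writing.
sort -o file.txt file.txt

# Unsafe, shown only as a comment: the shell truncates file.txt before
# sort starts, so the result would be an empty file.
#   sort file.txt > file.txt

in_place=$(cat file.txt)
printf '%s\n' "$in_place"
# abhishek
# naveen
# satish
```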
2. -r Option: Sorting in reverse order : You can perform a reverse-order sort using the -r
flag. The -r flag sorts the input file in reverse, i.e. descending, order.
Example: The input file is the same as mentioned above.
Syntax :
$ sort -r inputfile.txt
Command :
$ sort -r file.txt
Output :
satish
rajan
naveen
harsh
divyam
chitransh
abhishek
-n Option : To sort a file numerically, use the -n option. Like the options above, -n is built
into sort. It is used to sort a file that contains numeric data.
Example :
Let us consider a file with numbers:
Command :
$ cat > file1.txt
50
39
15
89
200
Syntax :
$ sort -n filename.txt
Command :
$ sort -n file1.txt
Output :
15
39
50
89
200
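To see why -n is needed, compare it with the default text comparison on the same numbers. This is a small sketch; LC_ALL=C is set only to make the text ordering identical on every system.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '50\n39\n15\n89\n200\n' > file1.txt

# Text comparison looks at the first character, so "200" lands before "39".
as_text=$(LC_ALL=C sort file1.txt)
printf '%s\n' "$as_text"
# 15
# 200
# 39
# 50
# 89

# Numeric comparison orders by value.
as_numbers=$(sort -n file1.txt)
printf '%s\n' "$as_numbers"
# 15
# 39
# 50
# 89
# 200
```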
3. -nr option : To sort a file with numeric data in reverse order, combine the two options as
shown below.
Example : The numeric file is the same as above.
Syntax :
$ sort -nr filename.txt
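A minimal sketch of the combined form on the same numeric file; writing the options separately as -n -r gives the same result.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '50\n39\n15\n89\n200\n' > file1.txt

# Numeric sort, largest value first.
descending=$(sort -nr file1.txt)
printf '%s\n' "$descending"
# 200
# 89
# 50
# 39
# 15

# The two options may also be given separately.
separate=$(sort -n -r file1.txt)
```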
4. -k Option : sort can order a table on the basis of any column by using the -k option.
Use the -k option to sort on a certain column. For example, use “-k 2” to sort on the second
column.
Example :
Let us create a table with 2 columns
Syntax :
$ sort -k (column number) filename.txt
Command :
$ sort -k 2n employee.txt
Output :
guard 3000
clerk 4000
peon 4500
manager 5000
employee 6000
director 9000
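The contents of employee.txt are not shown above, so the sketch below assumes an input file holding the same six rows in an arbitrary order. It reproduces the numeric sort on column 2 and adds two variations: a sort on the first column only, and a reverse numeric sort used to find the highest salary.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Assumed input order; only the rows themselves come from the output above.
printf 'manager 5000\nclerk 4000\nemployee 6000\npeon 4500\ndirector 9000\nguard 3000\n' > employee.txt

# Column 2, compared numerically.
by_salary=$(sort -k 2n employee.txt)
printf '%s\n' "$by_salary"

# Column 1 only (the key starts and ends at field 1), compared as text.
by_name=$(LC_ALL=C sort -k 1,1 employee.txt)
printf '%s\n' "$by_name"

# Reverse numeric sort on column 2, then keep the first line.
highest=$(sort -k 2nr employee.txt | head -n 1)
printf '%s\n' "$highest"
# director 9000
```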
5. -c option : This option checks whether the given file is already sorted. Pass the -c option
to sort, and it reports the first line that is out of order as a diagnostic message and exits
with a non-zero status; it does not sort the file.
Example :
Suppose a file exists with a list of cars called cars.txt.
Audi
Cadillac
BMW
Dodge
Syntax :
$ sort -c filename.txt
Command :
$ sort -c cars.txt
Output :
sort: cars.txt:3: disorder: BMW
Note : If there is no output then the file is considered to be already sorted
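In a script, the exit status of sort -c is usually more convenient than its message. A minimal sketch, using the cars.txt contents from the example and a second, sorted copy (sorted.txt is a name chosen here):

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Audi\nCadillac\nBMW\nDodge\n' > cars.txt

# Non-zero exit status: the file is not sorted. The message is discarded.
if sort -c cars.txt 2>/dev/null; then
    unsorted_check="sorted"
else
    unsorted_check="not sorted"
fi
echo "cars.txt is $unsorted_check"

# After sorting into a new file, the check succeeds silently.
sort cars.txt > sorted.txt
if sort -c sorted.txt; then
    sorted_check="sorted"
else
    sorted_check="not sorted"
fi
echo "sorted.txt is $sorted_check"
```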
6. -u option : To sort and remove duplicates, pass the -u option to sort. This writes a sorted
list to standard output with duplicate lines removed.
This option is helpful because removing the duplicates gives us a non-redundant list.
Example : Suppose a file exists with a list of cars called cars.txt.
Audi
BMW
Cadillac
BMW
Dodge
Syntax :
$ sort -u filename.txt
Command :
$ sort -u cars.txt
Output :
Audi
BMW
Cadillac
Dodge
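As a small sketch, the same result can be obtained by piping sort into uniq, and in both cases the input file itself is left unchanged:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'Audi\nBMW\nCadillac\nBMW\nDodge\n' > cars.txt

# Sort and drop duplicate lines in one step.
unique=$(sort -u cars.txt)
printf '%s\n' "$unique"
# Audi
# BMW
# Cadillac
# Dodge

# Equivalent pipeline: uniq removes adjacent duplicates from sorted input.
piped=$(sort cars.txt | uniq)

# cars.txt still has all five lines.
original_lines=$(wc -l < cars.txt)
echo "lines in cars.txt: $original_lines"
```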
6. tr command
The tr command in UNIX is a command line utility for translating or deleting characters. It
supports a range of transformations including uppercase to lowercase, squeezing repeating
characters, deleting specific characters and basic find and replace. It can be used with UNIX
pipes to support more complex translation. tr stands for translate.
Syntax :
$ tr [OPTION] SET1 [SET2]
Options
-c : complements the set of characters in SET1, i.e., operations apply to characters not in the
given set
-d : deletes the characters in SET1 from the output
-s : replaces each run of a repeated character listed in SET1 with a single occurrence
-t : truncates SET1 to the length of SET2
Sample Commands
$ cat greekfile
Output:
WELCOME TO
GeeksforGeeks
$ cat greekfile | tr "[a-z]" "[A-Z]"
Output:
WELCOME TO
GEEKSFORGEEKS
or
Output:
WELCOME TO
GEEKSFORGEEKS
Output:
Welcome To GeeksforGeeks
$ cat greekfile
Output:
{WELCOME TO}
GeeksforGeeks
$ tr '{}' '()' < greekfile > newfile.txt
$ cat newfile.txt
Output:
(WELCOME TO)
GeeksforGeeks
The above command reads each character from "greekfile", translates it if it is a brace, and
writes the output to "newfile.txt".
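The translations above can be collected into one runnable sketch. tr reads only from standard input, so files are supplied with a pipe or with input redirection. The character-class form [:lower:] / [:upper:] is shown next to the range form as an alternative spelling of the same conversion.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'WELCOME TO\nGeeksforGeeks\n' > greekfile

# Lowercase to uppercase with character ranges.
with_ranges=$(tr 'a-z' 'A-Z' < greekfile)

# The same conversion with character classes.
with_classes=$(tr '[:lower:]' '[:upper:]' < greekfile)
printf '%s\n' "$with_classes"
# WELCOME TO
# GEEKSFORGEEKS

# Character-by-character replacement: '{' becomes '(' and '}' becomes ')'.
braces=$(printf '{WELCOME TO}\n' | tr '{}' '()')
printf '%s\n' "$braces"
# (WELCOME TO)
```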
4. How to squeeze repeated characters using -s
To squeeze repeated occurrences of the characters specified in a set, use the -s option. It
replaces each run of a repeated character with a single instance.
For example, it can convert multiple consecutive spaces into a single space.
Output:
Welcome To GeeksforGeeks
Output:
elcome To GeeksforGeeks
Output:
my ID is
Output:
73535
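The commands that produced the outputs above are not shown, so the following is a sketch of how the -s, -d and -c options could produce them. The input strings are assumptions inferred from the outputs, not taken from the original examples.

```shell
# -s: squeeze each run of spaces into a single space.
squeezed=$(printf 'Welcome    To    GeeksforGeeks\n' | tr -s ' ')
printf '%s\n' "$squeezed"
# Welcome To GeeksforGeeks

# -d: delete every 'W' from the input.
deleted=$(printf 'Welcome To GeeksforGeeks\n' | tr -d 'W')
printf '%s\n' "$deleted"
# elcome To GeeksforGeeks

# -d with a character class: delete all digits (the trailing space remains).
no_digits=$(printf 'my ID is 73535\n' | tr -d '[:digit:]')
printf '[%s]\n' "$no_digits"
# [my ID is ]

# -c combined with -d: delete everything that is NOT a digit,
# which also removes the spaces and the final newline.
only_digits=$(printf 'my ID is 73535\n' | tr -cd '[:digit:]')
printf '%s\n' "$only_digits"
# 73535
```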