
L5 - Reg Exp

The document discusses various UNIX commands used for text processing and manipulation including regular expressions, cut, paste, sed, tr, grep, and sort. It provides details on the format and examples of how each command can be used.

Uploaded by gauri Varshney
Copyright © All Rights Reserved

Regular expressions
• Used by several different UNIX commands, including ed, sed, awk, and grep
• A period '.' matches any single character
• .X. matches any X that is surrounded by any two characters
• A caret '^' matches the beginning of the line
• ^Bridgeport matches the characters Bridgeport only if they occur at the beginning of the line
Regular expressions (continued)
• A dollar sign '$' matches the end of the line
• Bridgeport$ matches the characters Bridgeport only if they are the very last characters on the line
• .$ matches any single character at the end of the line
• To match a character that has a special meaning (such as a period) literally, precede it with a backslash '\' to remove that special meaning
• \.$ matches any line that ends with a period
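The anchors and the escaped period can be tried directly with grep. This is a minimal sketch that assumes a POSIX shell and grep. The sample lines are hypothetical, and each one is supplied as a printf argument:

```shell
# Hypothetical sample input, one line per printf argument
sample() {
  printf '%s\n' 'Bridgeport is a city' 'I live in Bridgeport' 'The end.' 'no period here'
}

sample | grep '^Bridgeport'   # Bridgeport at the start of a line: Bridgeport is a city
sample | grep 'Bridgeport$'   # Bridgeport at the end of a line: I live in Bridgeport
sample | grep '\.$'           # a literal period at the end of a line: The end.
sample | grep -c '.$'         # lines ending in any character at all: 4
printf '%s\n' 'aXb' 'Xb' | grep '.X.'   # an X with a character on each side: aXb
```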
Regular expressions (continued)
• ^$ matches any line that contains no characters
• [...] matches any one of the characters enclosed in the brackets
• [tT]he matches a lowercase or uppercase t followed immediately by the characters he
• [A-Z] matches any uppercase letter
• [A-Za-z] matches any uppercase or lowercase letter
• [^A-Z] matches any character except an uppercase letter
• [^A-Za-z] matches any nonalphabetic character
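The following sketch runs a few of these bracket expressions through grep. It assumes a POSIX shell. LC_ALL=C is set so that ranges such as [A-Z] mean plain ASCII letters, and the word list is made up:

```shell
export LC_ALL=C   # predictable meaning for ranges such as [A-Z]

words() { printf '%s\n' 'The' 'the' 'tee' 'Apple' '42'; }

words | grep '^[tT]he$'      # The, the
words | grep '^[A-Z]'        # lines starting with an uppercase letter: The, Apple
words | grep '^[^A-Za-z]*$'  # lines containing no letters at all: 42
printf '%s\n' 'a' '' 'b' | grep -c '^$'   # number of empty lines: 1
```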
Regular expressions (continued)
• An asterisk '*' matches zero or more occurrences of the preceding character (or regular expression)
• X* matches zero, one, two, three, ... capital X's
• XX* matches one or more capital X's
• .* matches zero or more occurrences of any character
• e.*e matches all the characters from the first e on the line to the last one
• [A-Za-z][A-Za-z]* matches any alphabetic character followed by zero or more alphabetic characters
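To see exactly what a starred pattern matches, you can use grep -o, which prints only the matched text. Note that -o is supported by GNU and BSD grep but is not part of POSIX, so this sketch assumes one of those greps:

```shell
export LC_ALL=C
# grep -o prints each matched part of a line on its own output line
printf '%s\n' 'aXXXb' | grep -o 'XX*'          # one or more X's: XXX
printf '%s\n' 'abc' | grep -c 'X*'             # X* also matches zero X's, so the line matches: 1
printf '%s\n' 'here we are' | grep -o 'e.*e'   # from the first e to the last e: ere we are
printf '%s\n' 'id 42 name' | grep -o '[A-Za-z][A-Za-z]*'   # id, then name
```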
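Regular expressions (continued)
• [-0-9] matches a single dash or digit character (ORDER IS IMPORTANT: a dash is taken literally only when it is the first or last character inside the brackets)
• [0-9-] is the same as [-0-9]
• [^-0-9] matches any character except a digit or a dash
• []a-z] matches a right bracket or a lowercase letter (ORDER IS IMPORTANT: a right bracket is taken literally only when it comes first inside the brackets)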
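The ordering rules can be checked with grep -o, which is a GNU/BSD extension and not POSIX. In this sketch, a dash placed first or last is literal, and so is a right bracket placed first:

```shell
export LC_ALL=C
printf '%s\n' 'a-b7' | grep -o '[-0-9]'    # dash first, so it is literal: -, 7
printf '%s\n' 'a-b7' | grep -o '[0-9-]'    # dash last, same result: -, 7
printf '%s\n' 'a-1' | grep -o '[^-0-9]'    # neither a digit nor a dash: a
printf '%s\n' 'X]y' | grep -o '[]a-z]'     # ] first, so it is literal: ], y
```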
Regular expressions (continued)
• \{min,max\} matches a specific number of occurrences of the preceding regular expression: min specifies the minimum number of occurrences to be matched, and max specifies the maximum
• w\{1,10\} matches from 1 to 10 consecutive w's
• [a-zA-Z]\{7\} matches exactly seven alphabetic characters
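Plain grep uses basic regular expressions, so intervals are written with backslashed braces. This sketch assumes a POSIX grep and uses grep -o (a GNU/BSD extension) to show how a run of 12 w's is split by w\{1,10\}:

```shell
export LC_ALL=C
# Basic regular expressions (plain grep, sed) write intervals as \{min,max\}
printf '%s\n' 'abcdefg' 'abcdef' | grep '^[a-zA-Z]\{7\}$'   # exactly seven letters: abcdefg
printf '%s\n' 'wwwwwwwwwwww' | grep -o 'w\{1,10\}'          # 12 w's: a run of 10, then a run of 2
```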
Regular expressions (continued)
• X\{5,\} matches at least five consecutive X's
• \(...\) saves the characters matched by the enclosed regular expression
• ^\(.\) matches the first character on the line and stores it in register 1
• There are nine registers, numbered 1 through 9
• To retrieve what is stored in register n, use \n
• Example: ^\(.\)\1 matches the first two characters on a line if they are the same character
Regular expressions (continued)
• ^\(.\).*\1$ matches all lines in which the first character on the line is the same as the last. Note: .* matches all the characters in between
• ^\(...\)\(...\) stores the first three characters on the line in register 1 and the next three characters in register 2
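Here is a short sketch of saved matches (registers) in grep and sed. The sed substitution that swaps two saved groups goes slightly beyond the slides. It is included only to show that \1 and \2 can also be used in a replacement:

```shell
printf '%s\n' 'aardvark' 'apple' | grep '^\(.\)\1'       # first two characters equal: aardvark
printf '%s\n' 'level' 'lever' | grep '^\(.\).*\1$'       # first character equals last: level
# sed can reuse the registers in a replacement: swap the first two 3-character groups
printf '%s\n' 'abcdefgh' | sed 's/^\(...\)\(...\)/\2\1/' # defabcgh
```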
cut
Used to extract various fields of data from a data file or from the output of a command

$ who
bgeorge pts/16 Oct 5 15:01 (216.87.102.204)
abakshi pts/13 Oct 6 19:48 (216.87.102.220)
tphilip pts/11 Oct 2 14:10 (AC8C6085.ipt.aol.com)
$ who | cut -c1-8,18-
bgeorge Oct 5 15:01 (216.87.102.204)
abakshi Oct 6 19:48 (216.87.102.220)
tphilip Oct 2 14:10 (AC8C6085.ipt.aol.com)
$
Format: cut -cchars file
• chars specifies which characters to extract from each line of file
cut (continued)
• Examples: -c5 (character 5), -c1,3,4 (characters 1, 3, and 4), -c10-15 (characters 10 through 15), -c5- (character 5 to the end of the line)
• The -d and -f options are used with cut when the data is delimited by a particular character
• Format: cut -ddchar -ffields file
• dchar: the character that delimits the fields (default: tab)
• fields: the fields to extract from file
cut (continued)
$ cat phonebook
Edward 336-145
Alice 334-121
Sony 332-336
Robert 326-056

$ cut -f1 phonebook
Edward
Alice
Sony
Robert

$
cut (continued)
$ cat /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
nobody:x:60001:60001:Nobody:/:
noaccess:x:60002:60002:No Access User:/:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
abuzneid:x:103:100:Abdelshakour Abuzneid:/home/abuzneid:/sbin/csh
$
cut (continued)
$ cut -d: -f1 /etc/passwd
root
daemon
bin
sys
adm
lp
uucp
nuucp
listen
nobody
noaccess
oracle
webuser
abuzneid
$
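The cut options can be exercised on fixed sample strings, with no files required. This sketch assumes a POSIX cut. The alphabet string makes each character position easy to read, and the colon line is copied from the /etc/passwd listing above:

```shell
# -c selects character positions
printf '%s\n' 'abcdefghijklmnop' | cut -c5       # e
printf '%s\n' 'abcdefghijklmnop' | cut -c1,3,4   # acd
printf '%s\n' 'abcdefghijklmnop' | cut -c10-15   # jklmno
printf '%s\n' 'abcdefghijklmnop' | cut -c5-      # efghijklmnop

# -f alone splits on tabs (the default delimiter), as in the phonebook example
printf 'Edward\t336-145\n' | cut -f1             # Edward

# -d: selects another delimiter; the selected fields keep the delimiter between them
printf '%s\n' 'root:x:0:1:Super-User:/:/sbin/sh' | cut -d: -f1,7   # root:/sbin/sh
```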
paste
• Format: paste files
• The tab character is the default delimiter
paste (continued)
• Example:
$ cat students
Sue
Vara
Elvis
Luis
Eliza
$ cat sid
578426
452869
354896
455468
335123
$ paste students sid
Sue 578426
Vara 452869
Elvis 354896
Luis 455468
Eliza 335123
$
paste (continued)
• The -s option tells paste to paste together lines from the same file rather than from alternate files
• To change the delimiter, use the -d option
paste (continued)
• Examples:
$ paste -d '+' students sid
Sue+578426
Vara+452869
Elvis+354896
Luis+455468
Eliza+335123

$ paste -s students
Sue Vara Elvis Luis Eliza

$ ls | paste -d ' ' -s -
addr args list mail memo name nsmail phonebook programs roster sid students test tp twice user

$
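The paste examples can be reproduced in a scratch directory. This sketch assumes that mktemp is available, which is common on Linux and BSD/macOS but is not required by POSIX. The file names mirror the slides, and the contents are a shortened copy:

```shell
dir=$(mktemp -d)   # scratch directory; the contents are a shortened copy of the slides' files
printf '%s\n' Sue Vara Elvis > "$dir/students"
printf '%s\n' 578426 452869 354896 > "$dir/sid"

paste "$dir/students" "$dir/sid"          # Sue<TAB>578426, Vara<TAB>452869, ...
paste -d '+' "$dir/students" "$dir/sid"   # Sue+578426, Vara+452869, ...
paste -s "$dir/students"                  # Sue<TAB>Vara<TAB>Elvis
printf '%s\n' a b c | paste -d ' ' -s -   # a b c  ("-" stands for standard input)

rm -r "$dir"
```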
sed
• sed (stream editor) is a program used for editing data
• Unlike ed, sed cannot be used interactively
• Format: sed command file
• command: applied to each line of the specified file
• file: if no file is specified, standard input is assumed
• sed writes its output to standard output
• The command s/Unix/UNIX/ is applied to every line in the file; on each line it replaces the first Unix with UNIX
sed (continued)
• sed makes no changes to the original input file
• The command 's/Unix/UNIX/g' is applied to every line in the file and replaces every Unix with UNIX; "g" means global
• With the -n option, sed prints only the lines selected with the p command
• Example: sed -n '1,2p' file prints the first two lines
• Example: sed -n '/UNIX/p' file prints any line containing UNIX
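sed (continued)
• Example: sed '1,2d' file deletes lines 1 and 2 and prints the rest
• Example: sed -n 'l' text prints all lines from text, showing nonprinting characters as octal escapes (\nn) and tab characters as ">" (the exact notation varies between versions of sed; POSIX sed shows a tab as \t)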
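The sed commands from these slides can be run on a small hypothetical input. The last line shows the l command. Its output format depends on the sed version, so the comment describes GNU sed only:

```shell
text() { printf '%s\n' 'Unix is Unix' 'no match here' 'Unix again'; }

text | sed 's/Unix/UNIX/'    # first Unix per line: UNIX is Unix / no match here / UNIX again
text | sed 's/Unix/UNIX/g'   # every Unix: UNIX is UNIX / no match here / UNIX again
text | sed -n '1,2p'         # only lines 1 and 2
text | sed -n '/again/p'     # only lines containing "again": Unix again
text | sed '1,2d'            # everything except lines 1 and 2: Unix again
printf 'a\tb\n' | sed -n 'l' # GNU sed shows the tab as \t and marks the line end with $
```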
tr
• The tr filter translates characters read from standard input
• Format: tr from-chars to-chars
• The result is written to standard output
• Example: tr e x < file translates every "e" in file to "x" and prints the result to standard output
• The octal representation of a character can be given to tr in the format \nnn
• Example: tr : '\11' translates every : to a tab
tr (continued)

Character          Octal value
Bell               7
Backspace          10
Tab                11
Newline            12
Linefeed           12
Form feed          14
Carriage return    15
Escape             33
tr (continued)
• Example: tr '[a-z]' '[A-Z]' < file translates all lowercase letters in file to their uppercase equivalents. The character ranges [a-z] and [A-Z] are quoted to keep the shell from replacing them with the names of any single-character files named a through z or A through Z
• To "squeeze" out multiple occurrences of a character, use the -s option
tr (continued)
• Example: tr -s ' ' ' ' < file squeezes multiple spaces into one space
• The -d option deletes single characters from a stream of input
• Format: tr -d from-chars
• Example: tr -d ' ' < file deletes all spaces from the input stream
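Each tr example from the slides is applied below to a short made-up string instead of a file. This sketch assumes a POSIX tr. LC_ALL=C keeps the [a-z] and [A-Z] ranges to plain ASCII letters:

```shell
export LC_ALL=C
printf 'free tree\n' | tr e x                    # frxx trxx
printf 'a:b:c\n' | tr : '\11'                     # a<TAB>b<TAB>c (octal 11 is tab)
printf 'Hello World\n' | tr '[a-z]' '[A-Z]'       # HELLO WORLD
printf 'too    many   spaces\n' | tr -s ' ' ' '   # too many spaces
printf 'no spaces here\n' | tr -d ' '             # nospaceshere
```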
grep
• Searches one or more files for a particular character pattern
• Format: grep pattern files
• Example: grep path .cshrc prints every line in the file .cshrc that contains the pattern "path"
• Example: grep bin .cshrc .login .profile prints every line from any of the three files .cshrc, .login, and .profile that contains the pattern "bin"
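grep (continued)
• Example: grep * smarts does not work as intended, because the shell replaces * with the names of all files in the current directory before grep runs; grep then takes the first file name as the pattern
• Example: grep '*' smarts works: the quotes keep the shell from expanding *, so the command line is passed on as the words grep, *, and smarts, and grep searches smarts for the character *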
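The quoting problem can be reproduced in a scratch directory that contains only the file smarts. This sketch assumes mktemp is available (common, though not POSIX). Running echo on the unquoted command shows what the shell actually hands to grep:

```shell
dir=$(mktemp -d)   # scratch directory containing only the file smarts
(
  cd "$dir" || exit 1
  printf '%s\n' 'rate * time' 'no star here' > smarts
  echo grep * smarts   # what the shell makes of the unquoted form: grep smarts smarts
  grep '*' smarts      # quoted: grep gets * as its pattern and prints: rate * time
)
rm -r "$dir"
```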
sort
By default, sort sorts the lines of the specified input file into ascending order
$ cat students
Sue
Vara
Elvis
Luis
Eliza

$ sort students
Eliza
Elvis
Luis
Sue
Vara

$
sort (continued)
• The -u option tells sort to eliminate duplicate lines from the output
sort (continued)
$ echo Ash >> students
$ echo Ash >> students
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
Ash

$ sort students
Ash
Ash
Eliza
Elvis
Luis
Sue
Vara
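With the duplicated names above, plain sort keeps both copies of Ash and sort -u keeps one. This sketch assumes a POSIX sort. LC_ALL=C fixes the collation order:

```shell
export LC_ALL=C
names() { printf '%s\n' Sue Vara Elvis Luis Eliza Ash Ash; }

names | sort      # Ash, Ash, Eliza, Elvis, Luis, Sue, Vara
names | sort -u   # Ash, Eliza, Elvis, Luis, Sue, Vara (duplicate removed)
```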
sort (continued)
• The -r option reverses the order of the sort
• The -o option writes the sorted output to the named file instead of to standard output
• sort students > sorted_students has the same effect as sort students -o sorted_students
• The -o option also lets you sort a file and save the output back to the same file
• Example:
sort students -o students     correct
sort students > students      incorrect: the shell empties students before sort reads it
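The difference between -o and redirection can be shown on a temporary file, which stands in for students here. This sketch assumes mktemp is available. It puts -o before the file name, which is the option order POSIX requires. GNU sort also accepts the order used on the slide:

```shell
export LC_ALL=C
f=$(mktemp)   # scratch file standing in for students
printf '%s\n' Sue Vara Elvis > "$f"

sort -r "$f"         # Vara, Sue, Elvis
sort -o "$f" "$f"    # safe: sort reads all of its input before writing the output
cat "$f"             # Elvis, Sue, Vara
sort "$f" > "$f"     # unsafe: the shell truncates the file before sort starts
wc -c < "$f"         # 0: the data is gone
rm "$f"
```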
sort (continued)
• The -n option tells sort to treat the first field on each line as a number and to sort the data arithmetically
sort (continued)

$ cat data
-10 11
15 2
-9 -3
2 13
20 22
3 1

$ sort data
-10 11
-9 -3
15 2
2 13
20 22
3 1

$
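The listing above is a character-by-character sort, so "15" comes before "2". Adding -n compares the leading numbers arithmetically instead. This sketch assumes a POSIX sort. LC_ALL=C gives the byte ordering that matches the listing:

```shell
export LC_ALL=C   # byte-by-byte comparison, matching the listing above
data() { printf '%s\n' '-10 11' '15 2' '-9 -3' '2 13' '20 22' '3 1'; }

data | sort      # character order: -10 11, -9 -3, 15 2, 2 13, 20 22, 3 1
data | sort -n   # numeric order:   -10 11, -9 -3, 2 13, 3 1, 15 2, 20 22
```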
sort (continued)
• To sort by the second field, use +1n instead of -n; +1 says to skip the first field
• +5n means skip the first five fields on each line and then sort the data numerically
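The +1n form is older syntax, and many current sort implementations expect -k instead. POSIX numbers fields from 1, so +1n corresponds to -k2n. This sketch uses -k2,2n, which ends the key after field 2:

```shell
export LC_ALL=C
data() { printf '%s\n' '-10 11' '15 2' '-9 -3' '2 13' '20 22' '3 1'; }

# -k2,2n: numeric key made of field 2 only (the POSIX counterpart of +1n)
data | sort -k2,2n   # -9 -3, 3 1, 15 2, -10 11, 2 13, 20 22
```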
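sort (continued)
• Example: -t: makes ':' the field separator, and +2n skips the first two fields and sorts numerically on the third field (the user ID)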
$ sort -t: +2n /etc/passwd
root:x:0:1:Super-User:/:/sbin/sh
daemon:x:1:1::/:
bin:x:2:2::/usr/bin:
sys:x:3:3::/:
adm:x:4:4:Admin:/var/adm:
uucp:x:5:5:uucp Admin:/usr/lib/uucp:
nuucp:x:9:9:uucp Admin:/var/spool/uucppublic:/usr/lib/uucp/uucico
listen:x:37:4:Network Admin:/usr/net/nls:
lp:x:71:8:Line Printer Admin:/usr/spool/lp:
oracle:*:101:67:DBA Account:/export/home/oracle:/bin/csh
webuser:*:102:102:Web User:/export/home/webuser:/bin/csh
nobody:x:60001:60001:Nobody:/:
$
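The same password-file sort can be written in the POSIX -k form, shown here on a few lines copied from the /etc/passwd listing. This sketch assumes a POSIX sort and cut. The lines are fed in with printf, so the real /etc/passwd is not read:

```shell
export LC_ALL=C
# A few lines copied from the /etc/passwd listing above
passwd_sample() {
  printf '%s\n' \
    'lp:x:71:8:Line Printer Admin:/usr/spool/lp:' \
    'root:x:0:1:Super-User:/:/sbin/sh' \
    'uucp:x:5:5:uucp Admin:/usr/lib/uucp:'
}

# -t: sets the field separator; -k3,3n sorts numerically on field 3 (the user ID),
# the POSIX counterpart of +2n
passwd_sample | sort -t: -k3,3n | cut -d: -f1   # root, uucp, lp
```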
uniq
• Used to find duplicate lines in a file
• Format: uniq in_file out_file
• uniq copies in_file to out_file, removing any duplicate lines in the process
• uniq treats lines as duplicates only if they occur consecutively and match exactly
uniq (continued)
• The -d option lists only the duplicated lines
• Example:
$ cat students
Sue
Vara
Elvis
Luis
Eliza
Ash
Ash
$ uniq students
Sue
Vara
Elvis
Luis
Eliza
Ash
$
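Because uniq compares only neighbouring lines, duplicates that are not adjacent survive unless the input is sorted first. This sketch assumes POSIX uniq and sort, and the names are made up:

```shell
names() { printf '%s\n' Ash Ash Sue Ash; }

names | uniq          # Ash, Sue, Ash: only adjacent duplicates are collapsed
names | uniq -d       # Ash: one copy of each adjacent duplicated line
names | sort | uniq   # Ash, Sue: sorting first brings all duplicates together
```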