Specification of Tokens Using Regular Expressions
Strings and Languages
– An alphabet is any finite set of symbols.
– Examples of symbols are letters, digits,
and punctuation.
– The set {0,1} is the binary alphabet.
A string over an alphabet is a finite sequence
of symbols drawn from that alphabet.
– "Sentence" and "word" are often used as synonyms
for "string."
– The empty string, denoted ε, is the string of
length zero.
A language is any countable set of strings over
some fixed alphabet.
– {ε}, the set containing only the empty string,
is a language.
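The definitions above can be illustrated with a small sketch. It assumes symbols are single characters and strings are ordinary Python strings; the names BINARY, EPSILON, is_string_over and strings_of_length are made up for this example and do not come from the slides.

```python
from itertools import product

# The binary alphabet {0,1}: a finite set of symbols.
BINARY = {"0", "1"}

# The empty string: the string of length zero.
EPSILON = ""


def is_string_over(s, alphabet):
    """Return True if every symbol of s is drawn from alphabet."""
    return all(symbol in alphabet for symbol in s)


def strings_of_length(alphabet, n):
    """Return all strings of length n over alphabet, in sorted order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=n)]


# Two small languages (sets of strings) over the binary alphabet.
only_empty = {EPSILON}
length_two = set(strings_of_length(BINARY, 2))
```

Here is_string_over("0110", BINARY) holds, while is_string_over("012", BINARY) does not, because 2 is not a symbol of the binary alphabet. The empty string is a string over every alphabet.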
REGULAR EXPRESSION
Regular expressions are an important notation
for specifying lexeme patterns. They are
effective in specifying those types of patterns
that we need for tokens.
ws -> (blank|tab|newline)+
When ws is recognized, we do not return anything
to the parser; instead, lexical analysis restarts
at the character following the white space.
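The white-space rule can be sketched with Python's re module. This is only an illustration under stated assumptions: blank, tab and newline are taken to be the characters ' ', '\t' and '\n', and WORD is a hypothetical stand-in for the real token patterns, which the slides do not define here.

```python
import re

# ws -> (blank|tab|newline)+
WS = re.compile(r"[ \t\n]+")

# Hypothetical placeholder for real token patterns:
# any run of characters that is not white space.
WORD = re.compile(r"[^ \t\n]+")


def tokens(text):
    """Return the lexemes in text, skipping white space."""
    pos = 0
    out = []
    while pos < len(text):
        m = WS.match(text, pos)
        if m:
            # ws recognized: return nothing and restart
            # at the character following the white space.
            pos = m.end()
            continue
        # Not white space, so WORD matches at least one character.
        m = WORD.match(text, pos)
        out.append(m.group())
        pos = m.end()
    return out
```

For example, tokens("  if x\t\n then ") returns ["if", "x", "then"]: every run of white space is consumed without producing a token.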