Notes - IAE-1-CD
A context-free grammar is a formal grammar which is used to generate all possible strings
in a given formal language.
G = (V, T, P, S)
Where,
G describes the grammar
V describes a finite set of non-terminal symbols (variables)
T describes a finite set of terminal symbols
P describes a set of production rules
S describes the start symbol
9. Define Ambiguity.
A grammar that produces more than one parse tree for some sentences is said to be
ambiguous.
10. What is a Parser? List its types.
A parser for grammar G is a program that takes as input a string w and produces as output
either a parse tree for w, if w is a sentence of G, or an error message indicating that w is
not a sentence of G. It obtains a string of tokens from the lexical analyzer and verifies that
the string can be generated by the grammar of the source language.
a) Top down parsing b) Bottom up parsing
11. Discuss the role of lexical analyzer in detail with necessary example.
The main task of lexical analysis is to read the input characters of the code and produce tokens.
The lexical analyzer scans the entire source code of the program and identifies each token one by one.
Scanners are usually implemented to produce tokens only when requested by a parser. Here is how
this works:
1. "Get next token" is a command which is sent from the parser to the lexical analyzer.
2. On receiving this command, the lexical analyzer scans the input until it finds the next
token.
3. It returns the token to the parser.
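A minimal sketch of this on-demand interface in C (the Token struct, the token codes, and getNextToken are illustrative names assumed here, not taken from these notes, and the token set is heavily simplified):

#include <stdio.h>
#include <ctype.h>

typedef struct { int type; char lexeme[64]; } Token;

enum { TOK_ID, TOK_NUM, TOK_PUNCT, TOK_EOF };

/* Called by the parser whenever it needs the next token. */
Token getNextToken(FILE *src) {
    Token t = { TOK_EOF, "" };
    int c = fgetc(src), n = 0;
    while (isspace(c)) c = fgetc(src);            /* skip whitespace */
    if (c == EOF) return t;
    if (isalpha(c)) {                             /* identifier or keyword lexeme */
        t.type = TOK_ID;
        while (isalnum(c) && n < 63) { t.lexeme[n++] = c; c = fgetc(src); }
        ungetc(c, src);                           /* read one character too far */
    } else if (isdigit(c)) {                      /* number */
        t.type = TOK_NUM;
        while (isdigit(c) && n < 63) { t.lexeme[n++] = c; c = fgetc(src); }
        ungetc(c, src);
    } else {                                      /* single-character token */
        t.type = TOK_PUNCT;
        t.lexeme[n++] = c;
    }
    t.lexeme[n] = '\0';
    return t;
}

int main(void) {                                  /* parser side: repeated "get next token" requests */
    Token t;
    while ((t = getNextToken(stdin)).type != TOK_EOF)
        printf("token %d: %s\n", t.type, t.lexeme);
    return 0;
}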
The lexical analyzer skips whitespace and comments while creating these tokens. If any error is
present, the lexical analyzer will correlate that error with the source file and line number.
#include <stdio.h>

int maximum(int x, int y) {
    // This will compare 2 numbers
    if (x > y)
        return x;
    else {
        return y;
    }
}
Examples of Tokens created
Lexeme Token
int Keyword
maximum Identifier
( Operator
int Keyword
x Identifier
, Operator
int Keyword
y Identifier
) Operator
{ Operator
if Keyword
Examples of Nontokens
Type Examples
Macro NUMS
Whitespace \n \b \t
Lexical Errors
A character sequence which is not possible to scan into any valid token is a lexical error.
Important facts about the lexical error:
• Lexical errors are not very common, but they should be managed by a scanner.
• Misspellings of identifiers, operators, and keywords are considered lexical errors.
• Generally, a lexical error is caused by the appearance of some illegal character, mostly at
the beginning of a token.
Error Recovery in Lexical Analyzer
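The notes do not elaborate this heading; one commonly used approach (a hedged sketch, not necessarily what the course intends) is panic-mode style recovery: report the illegal character together with its line number, delete it, and continue scanning from the next character.

#include <stdio.h>
#include <ctype.h>

/* Sketch only: the "valid" characters below stand in for characters that
   could start or continue a real token in an actual lexer.               */
int main(void) {
    int c, line = 1;
    while ((c = getchar()) != EOF) {
        if (c == '\n') { line++; continue; }
        if (isalnum(c) || isspace(c) || c == ';' || c == '=')
            continue;                              /* belongs to some valid token */
        /* Recovery: report the error with its line number, then skip the character. */
        fprintf(stderr, "lexical error: illegal character '%c' at line %d\n", c, line);
    }
    return 0;
}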
COUSINS OF COMPILER
1. Preprocessor 2. Assembler 3. Loader and Link-editor
Preprocessor
A preprocessor is a program that processes its input data to produce output that is used as
input to another program. The output is said to be a preprocessed form of the input data, which is
often used by some subsequent programs like compilers.
They may perform the following functions:
1. Macro processing
2. File Inclusion
3. Rational Preprocessors
4. Language extension
1. Macro processing:
A macro is a rule or pattern that specifies how a certain input sequence should be mapped
to an output sequence according to a defined procedure. The mapping process that instantiates a
macro into a specific output sequence is known as macro expansion.
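For example, the C preprocessor expands the illustrative SQUARE macro below (SQUARE is just an example, not from the notes) before the compiler proper ever sees the code:

#include <stdio.h>

#define SQUARE(x) ((x) * (x))   /* macro definition: the rule/pattern */

int main(void) {
    /* Macro expansion: SQUARE(5) is replaced by ((5) * (5)) by the preprocessor. */
    printf("%d\n", SQUARE(5));
    return 0;
}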
2. File Inclusion:
Preprocessor includes header files into the program text. When the preprocessor finds an
#include directive it replaces it by the entire content of the specified file.
3. Rational Preprocessors:
These processors augment older languages with more modern flow-of-control and data-
structuring facilities.
4. Language extension :
These processors attempt to add capabilities to the language by what amounts to built-in
macros. For example, the language Equel is a database query language embedded in C.
Assembler
• One-pass assemblers go through the source code once and assume that all symbols
will be defined before any instruction that references them.
• Two-pass assemblers create a table with all symbols and their values in the first pass,
and then use the table in a second pass to generate code.
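A toy illustration of the two-pass idea in C, using a made-up miniature "assembly" representation (the instruction format and names are hypothetical, not a real assembler): pass 1 builds the symbol table of label addresses, pass 2 uses it to resolve references, including forward ones.

#include <stdio.h>
#include <string.h>

struct line { const char *label; const char *op; const char *arg; };

struct line prog[] = {
    { "",     "JMP", "end"  },   /* forward reference to "end" */
    { "loop", "ADD", ""     },
    { "",     "JMP", "loop" },
    { "end",  "HLT", ""     },
};
int nlines = 4;

char labels[10][16]; int addrs[10]; int nlabels = 0;

int lookup(const char *name) {
    for (int i = 0; i < nlabels; i++)
        if (strcmp(labels[i], name) == 0) return addrs[i];
    return -1;
}

int main(void) {
    /* Pass 1: record the address (here, the line number) of every label. */
    for (int a = 0; a < nlines; a++)
        if (prog[a].label[0]) { strcpy(labels[nlabels], prog[a].label); addrs[nlabels++] = a; }

    /* Pass 2: emit "code" with every label reference resolved from the table. */
    for (int a = 0; a < nlines; a++) {
        if (prog[a].arg[0]) printf("%d: %s %d\n", a, prog[a].op, lookup(prog[a].arg));
        else                printf("%d: %s\n", a, prog[a].op);
    }
    return 0;
}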
Fig. 1.7 Translation of a statement
Linker and Loader
A linker or link editor is a program that takes one or more object files generated by a compiler
and combines them into a single executable program. Three tasks of the linker are:
1. Searches the program to find library routines used by the program, e.g. printf(), math routines.
2. Determines the memory locations that code from each module will occupy and relocates its
instructions by adjusting absolute references.
3. Resolves references among files.
A loader is the part of an operating system that is responsible for loading programs in
memory, one of the essential stages in the process of starting a program.
12. Build the transition diagram for relational operators, keywords, identifier and constant.
Recognition of Tokens
Our current goal is to perform the lexical analysis needed for the following grammar.
Recall that the terminals are the tokens, the nonterminals produce terminals.
digit → [0-9]
digits → digit+
number → digits (. digits)? (E[+-]? digits)?
letter → [A-Za-z]
id → letter ( letter | digit )*
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
A transition diagram is similar to a flowchart for (a part of) the lexer. We draw one for each
possible token. It shows the decisions that must be made based on the input seen. The two main
components are circles representing states (think of them as decision points of the lexer) and
arrows representing edges (think of them as the decisions made).
1. The double circles represent accepting or final states at which point a lexeme has been
found. There is often an action to be done (e.g., returning the token), which is written to
the right of the double circle.
2. If we have moved one (or more) characters too far in finding the token, one (or more) stars
are drawn.
3. An imaginary start state exists and has an arrow coming from it to indicate where to begin
the process.
It is fairly clear how to write code corresponding to this diagram. You look at the first character;
if it is <, you look at the next character. If that character is =, you return (relop,LE) to the parser.
If instead that character is >, you return (relop,NE). If it is any other character, you return (relop,LT)
and adjust the input buffer so that you will read this character again, since you have not used it for the
current lexeme. If the first character was =, you return (relop,EQ).
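A minimal C sketch of that logic (the token names and the relop_token helper are illustrative assumptions, not code from the notes); the retract cases correspond to the starred states of the diagram:

#include <stdio.h>

enum relop { LT, LE, EQ, NE, GT, GE, NONE };

/* Returns the relop token starting at p and, via *consumed, the length of the
   lexeme, so the caller can re-read any character that was scanned too far.  */
enum relop relop_token(const char *p, int *consumed) {
    if (p[0] == '<') {
        if (p[1] == '=') { *consumed = 2; return LE; }
        if (p[1] == '>') { *consumed = 2; return NE; }
        *consumed = 1; return LT;            /* other character: retract one */
    }
    if (p[0] == '=') { *consumed = 1; return EQ; }
    if (p[0] == '>') {
        if (p[1] == '=') { *consumed = 2; return GE; }
        *consumed = 1; return GT;            /* other character: retract one */
    }
    *consumed = 0; return NONE;              /* not a relational operator */
}

int main(void) {
    const char *tests[] = { "<=", "<>", "<", "=", ">=", ">" };
    for (int i = 0; i < 6; i++) {
        int n;
        printf("%-2s -> token %d (lexeme length %d)\n",
               tests[i], relop_token(tests[i], &n), n);
    }
    return 0;
}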
Recognition of Reserved Words and Identifiers
The transition diagram below corresponds to the regular definition given previously.
1. How do we distinguish between identifiers and keywords such as then, which also match
the pattern in the transition diagram?
2. What is (gettoken(), installID())?
We will continue to assume that the keywords are reserved, i.e., may not be used as identifiers.
(What if this is not the case, as in PL/I, which had no reserved words? Then the lexer does not
distinguish between keywords and identifiers and the parser must.)
We will use the method mentioned last chapter and have the keywords installed into the
identifier table prior to any invocation of the lexer. The table entry will indicate that the entry is a
keyword.
installID() checks if the lexeme is already in the table. If it is not present, the lexeme is installed
as an id token. In either case a pointer to the entry is returned.
gettoken() examines the lexeme and returns the token name, either id or a name corresponding to
a reserved keyword.
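A minimal C sketch of these two routines under stated assumptions (the table layout, the id code 0, and the keyword codes are hypothetical; the notes only describe the behavior):

#include <stdio.h>
#include <string.h>

#define MAXSYM 100

struct entry { char lexeme[32]; int token; };   /* token: 0 = id, >0 = keyword code */
struct entry symtab[MAXSYM];
int nsym = 0;

/* Returns the index of the lexeme's entry, installing it with the given token
   code if it is not already present (keywords are pre-installed, so a later
   install of the same lexeme as an id leaves the keyword entry untouched).   */
int installID(const char *lexeme, int token) {
    for (int i = 0; i < nsym; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0) return i;
    strcpy(symtab[nsym].lexeme, lexeme);
    symtab[nsym].token = token;
    return nsym++;
}

/* Returns the token name recorded for the entry: id or a keyword code. */
int gettoken(int index) { return symtab[index].token; }

int main(void) {
    /* Keywords installed prior to any invocation of the lexer. */
    installID("if", 1); installID("then", 2); installID("else", 3);
    int p = installID("count", 0);   /* an ordinary identifier         */
    int q = installID("then", 0);    /* hits the pre-installed keyword */
    printf("count -> token %d, then -> token %d\n", gettoken(p), gettoken(q));
    return 0;
}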
The text also gives another method to distinguish between identifiers and keywords.
So far we have transition diagrams for identifiers (this diagram also handles keywords) and the
relational operators. What remains are whitespace and numbers, which are respectively the
simplest and most complicated diagrams seen so far.
Recognizing Whitespace
Recognizing Numbers
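The notes do not include these diagrams here; as an illustration, the following C sketch (with a hypothetical match_number helper) follows the regular definition number → digits (. digits)? (E[+-]? digits)? and returns the length of the longest number starting at a given position:

#include <stdio.h>
#include <ctype.h>

int match_number(const char *p) {
    int i = 0;
    if (!isdigit((unsigned char)p[i])) return 0;
    while (isdigit((unsigned char)p[i])) i++;              /* digits            */
    if (p[i] == '.' && isdigit((unsigned char)p[i + 1])) { /* optional fraction */
        i++;
        while (isdigit((unsigned char)p[i])) i++;
    }
    if (p[i] == 'E') {                                     /* optional exponent */
        int j = i + 1;
        if (p[j] == '+' || p[j] == '-') j++;
        if (isdigit((unsigned char)p[j])) {
            while (isdigit((unsigned char)p[j])) j++;
            i = j;
        }
    }
    return i;                                              /* length of the lexeme */
}

int main(void) {
    const char *tests[] = { "123", "3.14", "6E-23", "1.5E+2x", "abc" };
    for (int k = 0; k < 5; k++)
        printf("%-8s -> matched %d chars\n", tests[k], match_number(tests[k]));
    return 0;
}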
14. How do you reduce the number of states of DFA without affecting the language? Explain
with an example. (Available in notes also)
DFA minimization means converting a given DFA into an equivalent DFA with the minimum
number of states.
Minimization of DFA
Suppose there is a DFA D = < Q, Σ, q0, δ, F > which recognizes a language L. Then the minimized
DFA D' = < Q', Σ, q0, δ', F' > can be constructed for language L as follows:
Step 1: We will divide Q (the set of states) into two sets. One set will contain all final states and
the other set will contain the non-final states. This partition is called P0.
Step 2: Initialize k = 1.
Step 3: Find Pk by partitioning the different sets of Pk-1. In each set of Pk-1, we will take all
possible pairs of states. If two states of a set are distinguishable, we will split the set into
different sets in Pk.
Step 4: Stop when Pk = Pk-1 (no change in the partition).
Step 5: All states of one set are merged into one. The number of states in the minimized DFA will be
equal to the number of sets in Pk.
How to find whether two states in partition Pk are distinguishable?
Two states (qi, qj) are distinguishable in partition Pk if for any input symbol a, δ(qi, a) and
δ(qj, a) are in different sets in partition Pk-1.
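A compact C sketch of Steps 1-5 above, run on a small made-up DFA (not the DFA of the figure; the transition table and final states are illustrative only):

#include <stdio.h>

#define NS 6   /* number of states        */
#define NA 2   /* number of input symbols */

int delta[NS][NA] = { {1,2}, {3,4}, {4,3}, {5,5}, {5,5}, {5,5} };
int is_final[NS]  = { 0, 0, 0, 1, 1, 0 };
int part[NS];      /* part[q] = index of q's set in the current partition Pk */

int main(void) {
    /* Step 1: P0 separates final states from non-final states. */
    for (int q = 0; q < NS; q++) part[q] = is_final[q] ? 1 : 0;

    /* Steps 2-4: refine the partition until Pk = Pk-1. */
    int changed = 1;
    while (changed) {
        changed = 0;
        int newpart[NS], groups = 0;
        for (int q = 0; q < NS; q++) newpart[q] = -1;
        for (int q = 0; q < NS; q++) {
            if (newpart[q] != -1) continue;
            newpart[q] = groups;
            for (int r = q + 1; r < NS; r++) {
                if (newpart[r] != -1 || part[r] != part[q]) continue;
                /* r stays with q only if they are indistinguishable: on every
                   input symbol their targets lie in the same set of Pk-1.     */
                int same = 1;
                for (int a = 0; a < NA; a++)
                    if (part[delta[q][a]] != part[delta[r][a]]) { same = 0; break; }
                if (same) newpart[r] = groups;
            }
            groups++;
        }
        for (int q = 0; q < NS; q++) {
            if (newpart[q] != part[q]) changed = 1;
            part[q] = newpart[q];
        }
    }

    /* Step 5: each remaining set becomes one state of the minimized DFA. */
    for (int q = 0; q < NS; q++) printf("q%d -> set %d\n", q, part[q]);
    return 0;
}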
Example
Consider the following DFA shown in figure.
Step 1. P0 will have two sets of states. One set will contain q1, q2, q4, which are the final states of
the DFA, and the other set will contain the remaining states. So P0 = { { q1, q2, q4 }, { q0, q3, q5 } }.
Step 2. To calculate P1, we will check whether the sets of partition P0 can be partitioned or not: