Compiler Construction Past Paper 2022 Solution
Q: NO:01
Tokenizer
A tokenizer, also known as a lexical analyzer, lexer, or scanner, is the component of a
compiler that reads the source code, breaks it down into basic units called tokens (such as
keywords, identifiers, literals, and operators), and removes whitespace and comments. It
serves as the first phase of compilation, transforming a stream of characters into a
structured sequence of tokens for syntax analysis.
Duties:
1. Scanning:
Read the source program as a stream of characters.
2. Tokenization:
Group characters into tokens such as keywords, identifiers, literals, and operators.
3. Removing Whitespace and Comments:
Discard characters that carry no meaning for the later phases.
4. Lexical Error Reporting:
Detect and report invalid characters or malformed tokens.
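As a minimal sketch (in Python, with a hypothetical token set that is not part of the question), the core of a tokenizer can be built around a single regular expression:

import re

# Hypothetical token set for illustration only.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("ID",      r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("SKIP",    r"[ \t\n]+"),     # whitespace is discarded
    ("COMMENT", r"#[^\n]*"),      # comments are discarded
]
KEYWORDS = {"if", "else", "while"}

def tokenize(source):
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    for m in re.finditer(pattern, source):
        kind, text = m.lastgroup, m.group()
        if kind in ("SKIP", "COMMENT"):
            continue                          # strip whitespace and comments
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"                  # keywords are special identifiers
        yield (kind, text)

print(list(tokenize("while x = x + 42  # loop")))
# [('KEYWORD', 'while'), ('ID', 'x'), ('OP', '='), ('ID', 'x'),
#  ('OP', '+'), ('NUMBER', '42')]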
Type checker
A type checker is a component of a compiler that verifies the correctness of types in
a program, ensuring that operations are applied to compatible types and that the type rules
of the programming language are followed. This catches type-related errors early in the
compilation process and enforces type safety. (A minimal sketch follows the duties list below.)
Duties:
1. Type Inference:
Deduce the types of expressions and variables when they are not explicitly stated.
2. Type Compatibility Checks:
Verify that operations and functions are applied to compatible types.
3. Type Declaration Verification:
Check that variables, functions, and other entities are used according to their
declared types.
4. Function (Argument and Return) Type Checking:
Ensure that functions receive and return data according to their declared types.
5. Scope and Binding Verification:
Ensure identifiers (variables, functions) are used within their valid scopes and are
bound to the correct types.
6. Type Conversion and Casting:
Manage implicit and explicit type conversions, ensuring they are valid and safe.
7. Function and Method Type Checks:
Check the consistency of function and method calls with their type signatures.
8. Type Error Reporting:
Detect and report type errors with precise and informative messages.
9. User-Defined Type Verification:
Validate the use of user-defined types (such as classes, structs, and enums)
according to their definitions.
10. Generics and Polymorphism Handling:
Manage and verify the correct use of generics and polymorphic types.
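The duties above can be made concrete with a small sketch. The mini-language, its node shapes, and its type rules below are illustrative assumptions, not part of the question:

class TypeMismatch(Exception):
    pass

def check(node, env):
    """Return the type of an expression node; raise on a type error."""
    kind = node[0]
    if kind == "int":                  # ("int", 3) -- integer literal
        return "int"
    if kind == "bool":                 # ("bool", True) -- boolean literal
        return "bool"
    if kind == "var":                  # ("var", "x") -- look up declared type
        return env[node[1]]
    if kind == "+":                    # ("+", left, right)
        lt, rt = check(node[1], env), check(node[2], env)
        if lt == rt == "int":
            return "int"               # '+' is only defined on integers here
        raise TypeMismatch(f"'+' expects int operands, got {lt} and {rt}")
    raise TypeMismatch(f"unknown node kind: {kind}")

env = {"x": "int", "flag": "bool"}     # declared types (duties 3 and 5)
print(check(("+", ("var", "x"), ("int", 1)), env))   # int
# check(("+", ("var", "flag"), ("int", 1)), env) raises TypeMismatch (duty 8)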
Q: NO:02
Code optimization is the process of modifying code to reduce resource usage (CPU time,
memory, power) without changing its functionality. It's a balancing act between improving
performance and maintaining the code's readability and correctness.
Types:
1. Machine-Independent Optimization
2. Machine-Dependent Optimization
Scope of Optimization:
● Local Optimization
● Global Optimization
Techniques (a sketch of constant folding follows this list):
● Constant Folding
● Constant Propagation
● Dead Code Elimination
● Common Subexpression Elimination
● Copy Propagation
● Peephole Optimization
● Inlining
● Tail Call Optimization
● Register Allocation
● Data Flow Analysis
● Loop Fusion
● Code Motion
● Strength Reduction
● Loop Unrolling
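As a small sketch of the first technique, constant folding can be implemented as a recursive rewrite over an expression tree; the tuple node shape used here is an assumption for illustration:

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Replace operator nodes whose operands are constants with their value."""
    if isinstance(node, tuple):
        op, left, right = node
        left, right = fold(left), fold(right)
        if isinstance(left, int) and isinstance(right, int):
            return OPS[op](left, right)   # evaluated at compile time
        return (op, left, right)
    return node                            # leaf: a constant or a variable name

# x = 2 * 3 + y  is rewritten to  x = 6 + y
print(fold(("+", ("*", 2, 3), "y")))       # ('+', 6, 'y')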
Loops
● Loop Unrolling
● Loop Invariant Code Motion (illustrated in the sketch after this list)
● Loop Fusion
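Loop-invariant code motion hoists a computation that does not change between iterations out of the loop. A before/after sketch (variable names are illustrative):

def before(xs, k):
    out = []
    for x in xs:
        scale = k * k          # invariant: recomputed on every iteration
        out.append(x * scale)
    return out

def after(xs, k):
    scale = k * k              # hoisted: computed once before the loop
    out = []
    for x in xs:
        out.append(x * scale)
    return out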
Functions
● Inlining small functions to reduce the overhead of function calls.
Mathematical Operations
● Replace expensive mathematical operations with simpler ones.
Algorithms and Data structures
● Using efficient algorithms and data structures to make better use of available
resources.
Libraries and APIs
● Replacing slow library functions with more efficient alternatives.
Input/Output Operations
● Buffering I/O operations to reduce the number of system calls.
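For example (the file name and buffer size are illustrative), Python's standard open() makes the difference in system calls easy to see:

# Unbuffered: every write() reaches the OS as a separate system call.
with open("out.txt", "wb", buffering=0) as f:
    for _ in range(10_000):
        f.write(b"line\n")          # ~10,000 system calls

# Buffered: writes accumulate in a 64 KiB buffer and are flushed in chunks.
with open("out.txt", "wb", buffering=64 * 1024) as f:
    for _ in range(10_000):
        f.write(b"line\n")          # only a handful of system calls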
Q: NO:03
Architecture Overview
1. Lexical Analyzer:
● Converts source code into tokens (keywords, identifiers, literals, etc.).
● Handles error detection related to invalid tokens.
2. Syntax Analyzer:
● Constructs a parse tree based on the grammar rules of the source
language.
● Identifies syntactic errors in the source code.
3. Semantic Analyzer:
● Ensures semantic correctness by checking types, scopes, and other
language-specific rules.
● Constructs an annotated parse tree or syntax tree with semantic
information.
4. Intermediate Code Generator:
● Converts the annotated parse tree into an intermediate representation.
● This IR is easier to optimize and translate into machine code.
5. Code Optimizer:
● Improves the intermediate representation by eliminating redundancies
and improving efficiency.
● Techniques include constant folding, dead code elimination, and loop
optimization.
6. Target Code Generator:
● Translates the optimized intermediate representation into target
machine code or assembly code.
● Takes into account the specifics of the target architecture.
7. Symbol Table Manager:
● Keeps track of all symbols (variables, functions, classes) and their
attributes (type, scope, memory location).
● Essential for semantic analysis and code generation.
8. Error Handler:
● Manages errors detected in various phases (lexical, syntactic,
semantic).
● Provides meaningful error messages to help debug the source
program.
Schematic Diagram:
Source Code
    -> Lexical Analyzer (tokens)
    -> Syntax Analyzer (parse tree)
    -> Semantic Analyzer (annotated tree)
    -> Intermediate Code Generator (IR)
    -> Code Optimizer (optimized IR)
    -> Target Code Generator
    -> Target Machine Code
(The Symbol Table Manager and the Error Handler interact with every phase.)
Q: NO:05
Synthesized Attributes:
Definition:
Attributes whose values are computed from the attribute values of a node's child nodes
in the parse tree during a bottom-up traversal.
Synthesized attributes are properties attached to the non-terminal symbols on the left-hand
side (LHS) of a production rule in the grammar.
Example:
Consider a simple arithmetic expression grammar. The synthesized attribute might
store the computed value of an expression.
Production Rule: E → E1 + T
Semantic Rule: E.val = E1.val + T.val
Here, E.val is a synthesized attribute: the value at E is computed from the values of
its children E1 and T.
Computation:
The values of synthesized attributes are calculated starting from the leaves of the
parse tree and moving towards the root. Each node's synthesized attribute is determined
based on its children's attributes.
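A minimal sketch of this bottom-up computation (tuple-shaped parse-tree nodes are an assumption for illustration):

def val(node):
    """Synthesized attribute: a node's value comes from its children."""
    if isinstance(node, int):          # leaf already carries its value
        return node
    op, left, right = node             # interior node, e.g. E -> E1 + T
    l, r = val(left), val(right)       # evaluate children first (bottom-up)
    return l + r if op == "+" else l * r

# (2 + 3) * 4: the root's synthesized value is built up from its children.
print(val(("*", ("+", 2, 3), 4)))      # 20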
Purpose:
● Expression Evaluation:
To compute the values of expressions and propagate them up the parse tree.
● Type Checking:
To determine the types of expressions and ensure type consistency.
● Intermediate Code Generation:
To generate intermediate or target code for expressions and statements.
● Error Reporting:
To propagate error information up the parse tree for reporting semantic errors.
Limitations:
● Unidirectional Information Flow:
They only support bottom-up information flow, which can be limiting; in such
cases, inherited attributes (which flow top-down) are used as well.
● Complexity:
In complex grammars, managing attributes and ensuring consistency can become
challenging.
Inherited Attributes:
Definition:
Attributes whose values are computed from the attribute values of the parent node
and its siblings in a parse tree during the top-down traversal.
Inherited attributes are properties attached to the non-terminal symbols on the right-hand
side (RHS) of a production rule in the grammar.
Example:
Consider a grammar for variable declarations where scope information needs to be
passed to variable nodes.
Production Rule: D → T L
Inherited Attribute: L.scope = D.scope
Here, L.scope is an inherited attribute that gets its value from D.scope, ensuring that
the scope information is passed from the declaration to the list of variables.
Computation:
The values of inherited attributes are calculated starting from the root of the parse
tree and moving towards the leaves. Each node's inherited attribute is determined based on
its parent's attributes and potentially its siblings' attributes.
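A minimal sketch mirroring D → T L with L.scope = D.scope (the node shape and the scope name are illustrative assumptions):

def process_decl(decl, scope, symtab):
    """decl = ("D", type_name, var_names); scope flows top-down."""
    _, type_name, var_names = decl
    for name in var_names:             # each variable inherits scope and type
        symtab[name] = {"type": type_name, "scope": scope}

symtab = {}
process_decl(("D", "int", ["x", "y"]), scope="main", symtab=symtab)
print(symtab)
# {'x': {'type': 'int', 'scope': 'main'}, 'y': {'type': 'int', 'scope': 'main'}}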
Purpose:
● Scope Information:
To pass scope-related information (e.g., symbol tables) down the parse tree.
● Type Checking:
To propagate expected types or constraints from parent nodes to child nodes.
● Contextual Information:
To pass context-specific information such as loop control variables or function return
types.
Limitations:
● Complexity:
Managing inherited attributes can be complex, especially in large grammars with
many rules and dependencies.
● Circular Dependencies:
Care must be taken to avoid circular dependencies between inherited and
synthesized attributes, which can lead to non-terminating computations.
Q: NO:06
Three-Address Code (TAC):
Disadvantages:
Memory Usage:
TAC can be verbose (wordy), leading to high memory consumption for large
programs.
Translation Overhead:
Converting TAC to machine code or other intermediate representations can be
complex and require additional translation steps.
Limited Abstraction:
TAC operates at a relatively low level of abstraction, which might not capture
higher-level constructs as effectively as other intermediate representations.
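For example, a statement such as a = b * c + b * c might be lowered to the following three-address code, where t1, t2, t3 are compiler-generated temporaries; note how verbose the listing is and that the subexpression b * c appears twice:

t1 = b * c
t2 = b * c
t3 = t1 + t2
a = t3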
Directed Acyclic Graph (DAG):
Disadvantages:
Complexity:
Constructing and manipulating DAGs can be more complex compared to linear
representations like TAC.
Scalability:
For very large programs, managing DAGs can become difficult, potentially leading to high
computational overhead.
Limited Control Flow Representation:
DAGs are typically used for expressions and basic blocks but are not well-suited for
representing complex control flow structures.
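Continuing the example above, a DAG for a = b * c + b * c contains a single node for b * c with two parents, so the common subexpression is detected and computed only once, roughly:

t1 = b * c
a = t1 + t1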
Postfix Notation:
Advantages:
Compact representation:
Postfix notation can be more concise than infix notation (standard mathematical notation) for
some expressions.
No parentheses needed:
The position of each operator fully determines the order of evaluation, so parentheses are
never required, simplifying the notation.
Implementation Simplicity:
Suitable for stack-based virtual machines and interpreters, making it easier to implement
interpreters and compilers for certain classes of expressions.
Avoids Ambiguity:
Eliminates ambiguity in expression parsing since the order of operations is explicitly defined
by the notation.
Disadvantages:
Readability:
Postfix notation is less intuitive and harder for humans to read and understand compared to
infix notation.
Complex Expressions:
For complex expressions involving many operators and operands, postfix notation can
become cumbersome and difficult to manage.
Limited Expressiveness:
Postfix notation is primarily useful for arithmetic expressions and is not well-suited for
representing more complex programming constructs or control flow.
Limited use in optimization:
While compact, postfix notation doesn't explicitly show data dependencies, making it less
suitable for complex optimizations.
Potential for conversion overhead:
Depending on the context, converting between postfix and infix notation might require
additional processing.
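The "implementation simplicity" advantage is easy to demonstrate: a postfix expression can be evaluated with a single stack. A minimal Python sketch:

def eval_postfix(tokens):
    stack = []
    for tok in tokens:
        if tok in "+-*/":
            b, a = stack.pop(), stack.pop()      # right operand is on top
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a / b}[tok])
        else:
            stack.append(float(tok))             # operand: push it
    return stack.pop()

# infix (2 + 3) * 4  ==  postfix  2 3 + 4 *
print(eval_postfix("2 3 + 4 *".split()))   # 20.0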
Q: NO:07
To construct an LR parse table for the given context-free grammar (CFG), we will follow
these steps:
1. Augment the grammar by adding a new start symbol S′ with the production S′ -> S.
2. Construct the canonical collection of LR(0) items.
3. Construct the LR parse table using the canonical collection.
Given Grammar:
S -> aBb | A | c
B -> c
A -> e
Numbering the Productions:
(0) S′ -> S (augmented start production)
(1) S -> a B b
(2) S -> A
(3) S -> c
(4) B -> c
(5) A -> e
The rN entries in the table below mean "reduce by production N".
State | a  | b  | c  | e  | $   | S | A | B
  0   | s5 |    | s3 | s4 |     | 1 | 2 |
  1   |    |    |    |    | acc |   |   |
  2   | r2 | r2 | r2 | r2 | r2  |   |   |
  3   | r3 | r3 | r3 | r3 | r3  |   |   |
  4   | r5 | r5 | r5 | r5 | r5  |   |   |
  5   |    |    | s7 |    |     |   |   | 6
  6   |    | s8 |    |    |     |   |   |
  7   | r4 | r4 | r4 | r4 | r4  |   |   |
  8   | r1 | r1 | r1 | r1 | r1  |   |   |
(sN = shift and go to state N, rN = reduce by production N, acc = accept;
the S, A, B columns are the GOTO entries.)
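To show how the table drives a parser, here is a sketch of the standard shift/reduce loop with the ACTION and GOTO entries above transcribed into Python dictionaries (the encoding is an implementation choice, not part of the question):

ACTION = {
    (0, "a"): ("s", 5), (0, "c"): ("s", 3), (0, "e"): ("s", 4),
    (1, "$"): ("acc",),
    **{(2, t): ("r", 2) for t in "abce$"},
    **{(3, t): ("r", 3) for t in "abce$"},
    **{(4, t): ("r", 5) for t in "abce$"},
    (5, "c"): ("s", 7),
    (6, "b"): ("s", 8),
    **{(7, t): ("r", 4) for t in "abce$"},
    **{(8, t): ("r", 1) for t in "abce$"},
}
GOTO = {(0, "S"): 1, (0, "A"): 2, (5, "B"): 6}
PRODS = {1: ("S", 3), 2: ("S", 1), 3: ("S", 1),   # production -> (LHS, |RHS|)
         4: ("B", 1), 5: ("A", 1)}

def lr_parse(tokens):
    stack, i = [0], 0                       # stack of states
    while True:
        act = ACTION.get((stack[-1], tokens[i]))
        if act is None:
            return False                    # no entry: syntax error
        if act[0] == "acc":
            return True
        if act[0] == "s":                   # shift: push state, advance input
            stack.append(act[1]); i += 1
        else:                               # reduce by production act[1]
            lhs, n = PRODS[act[1]]
            del stack[len(stack) - n:]      # pop |RHS| states
            stack.append(GOTO[(stack[-1], lhs)])

print(lr_parse(list("acb") + ["$"]))        # True:  S -> aBb with B -> c
print(lr_parse(list("ab") + ["$"]))         # False: 'b' cannot follow 'a'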
Q: NO:09
Given Grammar:
E -> E + T | E - T | T
T -> T / F | F
F -> [E] | v
Left Recursion:
Left recursion in a context-free grammar occurs when a non-terminal symbol has a
production rule that allows it to appear as the leftmost symbol on the right-hand side
of its own production. This can lead to infinite recursion in a top-down parser.
Since the given grammar is left recursive and predictive parsers do not accept left-recursive
grammars, we first remove the left recursion. The generalized transformation is:
A -> A a | b    becomes    A -> b A'
                           A' -> a A' | ε
For E here:
`A` is `E`
`a` is `+ T` and `- T`
`b` is `T`
The same transformation is applied to T, whose left-recursive alternative is `T / F`.
Transformed productions:
E -> T E'
E' -> + T E' | - T E' | ε
T -> F T'
T' -> / F T' | ε
F -> [E] | v
To construct the predictive parse table for the given grammar, we need to follow these steps:
1. Compute the FIRST sets for all non-terminals.
2. Compute the FOLLOW sets for all non-terminals.
3. Construct the predictive parse table using the FIRST and FOLLOW sets.
Finding First:
● FIRST(E): {[, v}
● FIRST(T): {[, v}
● FIRST(E'): {+, -, ε}
● FIRST(T'): {/, ε}
● FIRST(F): {[, v}
Finding Follow:
● FOLLOW(E): {$, ]}
● FOLLOW(E'): {$, ]}
● FOLLOW(T): {+, -, $, ]}
● FOLLOW(T'): {+, -, $, ]}
● FOLLOW(F): {+, -, /, $, ]}
Non-terminal | [          | v         | +            | -            | /            | ]       | $
E            | E -> T E'  | E -> T E' |              |              |              |         |
E'           |            |           | E' -> + T E' | E' -> - T E' |              | E' -> ε | E' -> ε
T            | T -> F T'  | T -> F T' |              |              |              |         |
T'           |            |           | T' -> ε      | T' -> ε      | T' -> / F T' | T' -> ε | T' -> ε
F            | F -> [ E ] | F -> v    |              |              |              |         |
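Finally, a sketch of the predictive parsing loop itself, with the table above transcribed into a Python dictionary (an empty right-hand side encodes an ε-production):

TABLE = {
    ("E", "["): ["T", "E'"],  ("E", "v"): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", "-"): ["-", "T", "E'"],
    ("E'", "]"): [], ("E'", "$"): [],
    ("T", "["): ["F", "T'"],  ("T", "v"): ["F", "T'"],
    ("T'", "/"): ["/", "F", "T'"],
    ("T'", "+"): [], ("T'", "-"): [], ("T'", "]"): [], ("T'", "$"): [],
    ("F", "["): ["[", "E", "]"], ("F", "v"): ["v"],
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def ll1_parse(tokens):
    stack = ["$", "E"]                    # start symbol on top of $
    i = 0
    while stack:
        top = stack.pop()
        if top in NONTERMINALS:
            rhs = TABLE.get((top, tokens[i]))
            if rhs is None:
                return False              # empty table cell: syntax error
            stack.extend(reversed(rhs))   # push the production right-to-left
        elif top == tokens[i]:
            i += 1                        # terminal matched against input
        else:
            return False
    return i == len(tokens)

print(ll1_parse(list("v+v/v") + ["$"]))   # True
print(ll1_parse(list("[v-v]") + ["$"]))   # True
print(ll1_parse(list("v+") + ["$"]))      # False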