Compiler Construction Past Paper 2022 Solution

Q: NO:01

Tokenizer
A Tokenizer, also known as a lexical analyzer, lexer, or scanner, is a component of a
compiler that reads the source code, breaks it down into basic units called tokens (such as
keywords, identifiers, literals, and operators), and removes whitespace and comments. It
serves as the first step in the compilation process, transforming a stream of characters into a
structured sequence of tokens for syntax analysis.

Duties:

1. Reading Input Stream
2. Recognizing and Classifying Tokens
3. Handling Whitespaces and Comments
4. Error Reporting
5. Maintaining Line and Column Information
6. Symbol Table Management
7. Generating Token Stream
8. Handling Lexical States
9. Efficient Tokenization
10. Providing information to the parser
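
For illustration, a minimal tokenizer can be sketched with regular expressions. The token names and patterns below are made up for the example only (real lexers are usually generated from specifications by tools such as Lex/Flex):

    import re

    # Hypothetical token specification: (name, pattern) pairs, tried in order.
    TOKEN_SPEC = [
        ("NUMBER",  r"\d+"),
        ("ID",      r"[A-Za-z_]\w*"),
        ("OP",      r"[+\-*/=]"),
        ("SKIP",    r"[ \t]+"),          # whitespace is discarded
        ("COMMENT", r"//[^\n]*"),        # comments are discarded
        ("NEWLINE", r"\n"),
    ]
    MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

    def tokenize(source):
        line = 1
        for m in MASTER.finditer(source):
            kind, text = m.lastgroup, m.group()
            if kind == "NEWLINE":
                line += 1                 # maintain line information
            elif kind not in ("SKIP", "COMMENT"):
                yield (kind, text, line)  # token stream handed to the parser
        # Characters matching no pattern are silently skipped in this sketch;
        # a real lexer would report them as lexical errors.

    print(list(tokenize("x = 42 + y  // demo")))
    # [('ID', 'x', 1), ('OP', '=', 1), ('NUMBER', '42', 1), ('OP', '+', 1), ('ID', 'y', 1)]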

Type checker
A Type checker is a component of a compiler that verifies the correctness of types in
a program, ensuring that operations are applied to compatible types and that type rules of
the programming language are followed. This process helps catch type-related errors early
in the compilation process and enforces type safety.

Duties:

1. Type Inference:
Deduce the types of expressions and variables when they are not explicitly stated.
2. Type Compatibility Checks:
Verify that operations and functions are applied to compatible types.
3. Type Declaration Verification:
Check that variables, functions, and other entities are used according to their
declared types.
4. Function (argument and return) type checking:
Ensures the function receives and returns data according to the type declared.
5. Scope and Binding Verification:
Ensure identifiers (variables, functions) are used within their valid scopes and are
bound to the correct types.
6. Type Conversion and Casting:
Manage implicit and explicit type conversions, ensuring they are valid and safe.
7. Function and Method Type Checks:
Check the consistency of function and method calls with their type signatures.
8. Type Error Reporting:
Detect and report type errors with precise and informative messages.
9. User-Defined Type Verification:
Validate the use of user-defined types (such as classes, structs, and enums)
according to their definitions.
10. Generics and Polymorphism Handling:
Manage and verify the correct use of generics and polymorphic types.
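
For illustration, a tiny type checker over an expression tree can be sketched as follows. The node shapes and the two-type system ("int" and "string") are assumptions made only for this example:

    # Hypothetical AST nodes: ("num", 3), ("str", "hi"), ("var", name), ("add", left, right)
    def check(node, env):
        kind = node[0]
        if kind == "num":
            return "int"
        if kind == "str":
            return "string"
        if kind == "var":                          # scope/binding verification
            if node[1] not in env:
                raise TypeError(f"undeclared variable {node[1]}")
            return env[node[1]]
        if kind == "add":                          # type compatibility check
            lt, rt = check(node[1], env), check(node[2], env)
            if lt != rt:
                raise TypeError(f"cannot add {lt} and {rt}")
            return lt
        raise TypeError(f"unknown construct {kind}")

    env = {"x": "int"}                             # declared types, e.g. from the symbol table
    print(check(("add", ("var", "x"), ("num", 1)), env))    # int
    # check(("add", ("var", "x"), ("str", "hi")), env) would raise a type error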

Q: NO:02

Code optimization is the process of modifying code to reduce resource usage (CPU time,
memory, power) without changing its functionality. It's a balancing act between improving
performance and maintaining the code's readability and correctness.

Types:
1. Machine-Independent Optimization
2. Machine-Dependent Optimization

Scope of Optimization:
● Local Optimization
● Global Optimization

Techniques:
● Constant Folding
● Constant Propagation
● Dead Code Elimination
● Common Subexpression Elimination
● Copy Propagation
● Peephole Optimization
● Inlining
● Tail Call Optimization
● Register Allocation
● Data Flow Analysis
● Loop Fusion
● Code Motion
● Strength Reduction
● Loop Unrolling
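
As a small illustration of the first technique in the list above, constant folding can be sketched on a tuple-based expression tree (the node format is an assumption for this example):

    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    # Hypothetical expression nodes: ("const", value) or (op, left, right)
    def fold(node):
        if node[0] == "const":
            return node
        op, left, right = node[0], fold(node[1]), fold(node[2])
        if left[0] == "const" and right[0] == "const":   # both operands known at compile time
            return ("const", OPS[op](left[1], right[1]))
        return (op, left, right)

    print(fold(("+", ("*", ("const", 3), ("const", 4)), ("const", 5))))
    # ('const', 17)   i.e. 3*4+5 is evaluated at compile time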

Sources of optimization in source code:

Loops
● Loop Unrolling
● Loop Invariant Code Motion
● Loop Fusion
Functions
● Inlining small functions to reduce the overhead of function calls.
Mathematical Operations
● Replace expensive mathematical operations with simpler ones.
Algorithms and Data structures
● Using efficient algorithms and data structures to make better use of available
resources.
Libraries and APIs
● Replacing slow library functions with more efficient alternatives.
Input/Output Operations
● Buffering I/O operations to reduce the number of system calls.
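
As a small illustration of the loop-related sources above, the sketch below hoists a loop-invariant computation out of a loop (function and variable names are made up for the example):

    # Before: factor * 2 is recomputed on every iteration.
    def scale_before(data, factor):
        out = []
        for x in data:
            out.append(x * (factor * 2))
        return out

    # After loop-invariant code motion: the invariant factor * 2 is computed once.
    def scale_after(data, factor):
        k = factor * 2
        out = []
        for x in data:
            out.append(x * k)
        return out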

Q: NO:03

A Syntax-Directed Translator is a type of compiler architecture that translates
high-level programming language constructs into an intermediate representation (IR)
or machine code, guided by the syntax and semantics of the source language. This
architecture relies on syntax-directed definitions, which associate semantic rules
with productions in a context-free grammar.

Architecture Overview

1. Lexical Analyzer:
● Converts source code into tokens (keywords, identifiers, literals, etc.).
● Handles error detection related to invalid tokens.
2. Syntax Analyzer:
● Constructs a parse tree based on the grammar rules of the source
language.
● Identifies syntactic errors in the source code.
3. Semantic Analyzer:
● Ensures semantic correctness by checking types, scopes, and other
language-specific rules.
● Constructs an annotated parse tree or syntax tree with semantic
information.
4. Intermediate Code Generator:
● Converts the annotated parse tree into an intermediate representation.
● This IR is easier to optimize and translate into machine code.
5. Code Optimizer:
● Improves the intermediate representation by eliminating redundancies
and improving efficiency.
● Techniques include constant folding, dead code elimination, and loop
optimization.
6. Target Code Generator:
● Translates the optimized intermediate representation into target
machine code or assembly code.
● Takes into account the specifics of the target architecture.
7. Symbol Table Manager:
● Keeps track of all symbols (variables, functions, classes) and their
attributes (type, scope, memory location).
● Essential for semantic analysis and code generation.
8. Error Handler:
● Manages errors detected in various phases (lexical, syntactic,
semantic).
● Provides meaningful error messages to help debug the source
program.

This architecture ensures a systematic approach to translating high-level programming constructs into efficient machine code while maintaining modularity, making it easier to manage and extend.
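
As a toy illustration of the idea (not the full architecture above), the sketch below is a hypothetical translator for sums of identifiers that emits three-address code while parsing, following the grammar E -> E + id | id with a semantic action attached to E -> E + id (the left recursion is handled iteratively here):

    import itertools
    import re

    def translate(expr):
        # Tokenize into identifiers and '+' signs.
        tokens = re.findall(r"[A-Za-z_]\w*|\+", expr)
        temps = (f"t{i}" for i in itertools.count(1))
        code, place = [], tokens[0]             # place of the leftmost operand
        for i in range(1, len(tokens), 2):      # each step consumes '+ id'
            t = next(temps)
            code.append(f"{t} = {place} + {tokens[i + 1]}")   # action for E -> E + id
            place = t                           # place of the enclosing expression
        return code

    print(translate("a + b + c"))
    # ['t1 = a + b', 't2 = t1 + c']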

Schematic Diagram: (source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → target code generator, with the symbol table manager and error handler serving every phase)

Q: NO:05

Synthesized Attributes:
Definition:
Attributes whose values are computed from the attribute values of a node's children in the parse tree during a bottom-up traversal.
Synthesized attributes are properties attached to the non-terminal symbols on the left-hand
side (LHS) of a production rule in the grammar.

Example:
Consider a simple arithmetic expression grammar. The synthesized attribute might
store the computed value of an expression.
Production Rule: E → E1 + T

Synthesized Attribute: E.val = E1.val + T.val


Here, E.val is a synthesized attribute that holds the value of the expression. It's computed
using the values of E1.val and T.val.

Computation:
The values of synthesized attributes are calculated starting from the leaves of the
parse tree and moving towards the root. Each node's synthesized attribute is determined
based on its children's attributes.
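
A minimal sketch of this bottom-up evaluation for the grammar fragment above, using tuple-based parse-tree nodes (an assumption made only for illustration):

    # Nodes: ("num", n) for leaves, ("+", left, right) for E -> E1 + T
    def val(node):
        if node[0] == "num":
            return node[1]                  # a leaf supplies its own value
        left, right = node[1], node[2]
        return val(left) + val(right)       # E.val = E1.val + T.val

    tree = ("+", ("+", ("num", 1), ("num", 2)), ("num", 3))   # (1 + 2) + 3
    print(val(tree))   # 6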

Purpose:
● Expression Evaluation:
To compute the values of expressions and propagate them up the parse tree.
● Type Checking:
To determine the types of expressions and ensure type consistency.
● Intermediate Code Generation:
To generate intermediate or target code for expressions and statements.
● Error Reporting:
To propagate error information up the parse tree for reporting semantic errors.

Limitations:
● Unidirectional Information Flow:
They only support bottom-up information flow, which can be limiting in some contexts. In such cases, inherited attributes (which flow top-down) are also used.
● Complexity:
In complex grammars, managing attributes and ensuring consistency can become challenging.

Inherited Attributes:
Definition:
Attributes whose values are computed from the attribute values of the parent node
and its siblings in a parse tree during the top-down traversal.
Inherited attributes are properties attached to the non-terminal symbols on the right-hand
side (RHS) of a production rule in the grammar.

Example:
Consider a grammar for variable declarations where scope information needs to be
passed to variable nodes.
Production Rule: D → T L
Inherited Attribute: L.scope = D.scope
Here, L.scope is an inherited attribute that gets its value from D.scope, ensuring that
the scope information is passed from the declaration to the list of variables.

Computation:
The values of inherited attributes are calculated starting from the root of the parse
tree and moving towards the leaves. Each node's inherited attribute is determined based on
its parent's attributes and potentially its siblings' attributes.
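
A minimal sketch of this top-down flow for the D → T L example above (the node shapes and the dictionary-based scope are assumptions made only for illustration):

    # D -> T L : the declaration's type/scope information flows down to every name in L.
    def declare(decl, scope):
        typ, names = decl                    # e.g. ("int", ["x", "y", "z"])
        for name in names:                   # each child of L inherits L.scope = D.scope
            scope[name] = typ
        return scope

    print(declare(("int", ["x", "y", "z"]), {}))
    # {'x': 'int', 'y': 'int', 'z': 'int'}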

Purpose:
● Scope Information:
To pass scope-related information (e.g., symbol tables) down the parse tree.
● Type Checking:
To propagate expected types or constraints from parent nodes to child nodes.
● Contextual Information:
To pass context-specific information such as loop control variables or function return
types.

Limitations:
● Complexity:
Managing inherited attributes can be complex, especially in large grammars with
many rules and dependencies.
● Circular Dependencies:
Care must be taken to avoid circular dependencies between inherited and
synthesized attributes, which can lead to non-terminating computations.

Q: NO:06

The three intermediate representations compared below are:
● Three Address Code (simple and easy to generate)
● Directed Acyclic Graph (eliminates redundancy)
● Postfix Notation (efficient to evaluate)

Three Address Code:
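
For example, the statement x = (a + b) * (a + b) might be lowered to the following three-address instructions, where t1, t2, t3 are compiler-generated temporaries:

    t1 = a + b
    t2 = a + b
    t3 = t1 * t2
    x  = t3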


Advantages:
Simplicity:
TAC is straightforward to generate and understand, with each instruction referring to at most three addresses (typically two operands and one result).
Ease of Optimization:
The explicit representation of operations and operands simplifies various
optimizations like constant folding, dead code elimination, and common subexpression
elimination.

Disadvantages:
Memory Usage:
TAC can be verbose, which can lead to high memory consumption for large
programs.
Translation Overhead:
Converting TAC to machine code or other intermediate representations can be
complex and require additional translation steps.
Limited Abstraction:
TAC operates at a relatively low level of abstraction, which might not capture
higher-level constructs as effectively as other intermediate representations.

Directed Acyclic Graph (DAG):


Advantages:
Common Subexpression Elimination:
DAGs are particularly effective at identifying and eliminating common subexpressions, reducing redundant computations.
Optimization Potential:
DAGs can represent complex expressions compactly, allowing for efficient optimizations of
arithmetic expressions and code motion.
Dependency Representation:
Clearly shows dependencies between operations, which helps in scheduling and
parallelization.
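
Continuing the example from the TAC section, a DAG for x = (a + b) * (a + b) contains a single node for a + b that is shared by both operands of the multiplication, so code generated from the DAG computes the addition only once (temporary names are illustrative):

    t1 = a + b        (one shared node for a + b)
    t2 = t1 * t1
    x  = t2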

Disadvantages:
Complexity:
Constructing and manipulating DAGs can be more complex compared to linear
representations like TAC.
Scalability:
For very large programs, managing DAGs can become difficult, potentially leading to high
computational overhead.
Limited Control Flow Representation:
DAGs are typically used for expressions and basic blocks but are not well-suited for
representing complex control flow structures.

Postfix Notation:
Advantages:
Compact representation:
Postfix notation can be more concise than infix notation (standard mathematical notation) for
some expressions.
No parentheses needed:
The order of evaluation is encoded directly in the token order, so parentheses and precedence rules are unnecessary.
Implementation Simplicity:
Suitable for stack-based virtual machines and interpreters, making it easier to implement
interpreters and compilers for certain classes of expressions.
Avoids Ambiguity:
Eliminates ambiguity in expression parsing since the order of operations is explicitly defined
by the notation.
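
For example, the infix expression (a + b) * c becomes a b + c * in postfix. A minimal sketch of stack-based evaluation, assuming the operands' values are supplied in a dictionary:

    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

    def eval_postfix(tokens, env):
        stack = []
        for tok in tokens:
            if tok in OPS:
                b, a = stack.pop(), stack.pop()   # the right operand is popped first
                stack.append(OPS[tok](a, b))
            else:
                stack.append(env[tok])            # look up the operand's value
        return stack[0]

    print(eval_postfix("a b + c *".split(), {"a": 2, "b": 3, "c": 4}))   # (2 + 3) * 4 = 20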

Disadvantages:
Readability:
Postfix notation is less intuitive and harder for humans to read and understand compared to
infix notation.
Complex Expressions:
For complex expressions involving many operators and operands, postfix notation can
become cumbersome and difficult to manage.
Limited Expressiveness:
Postfix notation is primarily useful for arithmetic expressions and is not well-suited for
representing more complex programming constructs or control flow.
Limited use in optimization:
While compact, postfix notation doesn't explicitly show data dependencies, making it less
suitable for complex optimizations.
Potential for conversion overhead:
Depending on the context, converting between postfix and infix notation might require
additional processing.

Q: NO:07

To construct an LR parse table for the given context-free grammar (CFG), we will follow
these steps:
1. Augment the grammar by adding a new start symbol S' and the production S' -> S.
2. Construct the canonical collection of LR(0) items.
3. Construct the LR parse table using the canonical collection.

Given Grammar:
S -> aBb | A | c
B -> c
A -> e

Numbering the Productions:

1. S -> aBb
2. S -> A
3. S -> c
4. B -> c
5. A -> e

Step 1: Augment the Grammar

Add a new start symbol S' with a production that leads to the original start symbol S:
S' -> S
S -> aBb | A | c
B -> c
A -> e

Step 2: Construct the Canonical Collection of LR(0) Items

Starting from the augmented grammar, compute the closure and goto functions. The resulting item sets are:

I0: S' -> .S, S -> .aBb, S -> .A, S -> .c, A -> .e
I1 = goto(I0, S): S' -> S.
I2 = goto(I0, A): S -> A.
I3 = goto(I0, c): S -> c.
I4 = goto(I0, e): A -> e.
I5 = goto(I0, a): S -> a.Bb, B -> .c
I6 = goto(I5, B): S -> aB.b
I7 = goto(I5, c): B -> c.
I8 = goto(I6, b): S -> aBb.

Step 3: Construct the LR parse table (shift and goto entries come from the goto function, reduce entries from item sets containing a completed production, and the accept entry from S' -> S. on $):

State |  a  |  b  |  c  |  e  |   $    |  S  |  A  |  B
  0   | s5  |     | s3  | s4  |        |  1  |  2  |
  1   |     |     |     |     | accept |     |     |
  2   | r2  | r2  | r2  | r2  |   r2   |     |     |
  3   | r3  | r3  | r3  | r3  |   r3   |     |     |
  4   | r5  | r5  | r5  | r5  |   r5   |     |     |
  5   |     |     | s7  |     |        |     |     |  6
  6   |     | s8  |     |     |        |     |     |
  7   | r4  | r4  | r4  | r4  |   r4   |     |     |
  8   | r1  | r1  | r1  | r1  |   r1   |     |     |
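
As a quick check of the table, parsing the input acb (derived by S -> aBb -> acb) proceeds as follows:

    Stack            Input    Action
    0                acb$     shift a, go to state 5
    0 a 5            cb$      shift c, go to state 7
    0 a 5 c 7        b$       reduce by B -> c (4), goto(5, B) = 6
    0 a 5 B 6        b$       shift b, go to state 8
    0 a 5 B 6 b 8    $        reduce by S -> aBb (1), goto(0, S) = 1
    0 S 1            $        accept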

Q: NO:09

Given Grammar:

E -> E + T | E - T | T
T -> T / F | F
F -> [E] | v

Left Recursion:
Left recursion in a context-free grammar occurs when a non-terminal symbol has a
production rule that allows it to appear as the leftmost symbol on the right-hand side
of its own production. This can lead to infinite recursion in a top-down parser.

Removing Left Recursion:

Since the given grammar is left recursive and predictive parsers do not accept left recursive
grammars, we need to remove the left recursion. Here are the generalized production rules
for removing left recursion:

Generalized Production Rules:

With left recursion:
A -> Aa | b   (where a and b are sequences of terminals and/or non-terminals, and b does not begin with A)

Without left recursion:
A -> bA'
A' -> aA' | ε

Applying the Rules:

For Non-terminal `E`


Original productions:
E -> E + T | E - T | T

Here:
`A` is `E`
`a` is `+ T` and `- T`
`b` is `T`
Transformed productions:
E -> T E'
E' -> + T E' | - T E' | ε

For Non-terminal `T`


Original productions:
T -> T / F | F
Here:
`A` is `T`
`a` is `/ F`
`b` is `F`
Transformed productions:
T -> F T'
T' -> / F T' | ε

For Non-terminal `F`


Original productions:
F -> [E] | v
There is no left recursion here, so `F` remains unchanged:
F -> [E] | v

Final Transformed Grammar:


E -> T E'
E' -> + T E' | - T E' | ε
T -> F T'
T' -> / F T' | ε
F -> [E] | v
This final grammar is free of left recursion and can be used by predictive parsers.
Construction of predictive parse table:

To construct the predictive parse table for the given grammar, we need to follow these steps:
1. Compute the FIRST sets for all non-terminals.
2. Compute the FOLLOW sets for all non-terminals.
3. Construct the predictive parse table using the FIRST and FOLLOW sets.

Finding First:
● FIRST(E): {[, v}
● FIRST(T): {[, v}
● FIRST(E'): {+, -, ε}
● FIRST(T'): {/, ε}
● FIRST(F): {[, v}

Finding Follow:
● FOLLOW(E): {$, ]}
● FOLLOW(E'): {$, ]}
● FOLLOW(T): {+, -, $, ]}
● FOLLOW(T'): {+, -, $, ]}
● FOLLOW(F): {+, -, /, $, ]}

Non-terminal |     [      |     v      |      +       |      -       |      /       |    ]    |    $
     E       | E -> T E'  | E -> T E'  |              |              |              |         |
     E'      |            |            | E' -> + T E' | E' -> - T E' |              | E' -> ε | E' -> ε
     T       | T -> F T'  | T -> F T'  |              |              |              |         |
     T'      |            |            | T' -> ε      | T' -> ε      | T' -> / F T' | T' -> ε | T' -> ε
     F       | F -> [ E ] | F -> v     |              |              |              |         |
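
As a quick check of the table, parsing the input v + v proceeds as follows (stack grows to the right):

    Stack        Input      Action
    $ E          v + v $    E -> T E'
    $ E' T       v + v $    T -> F T'
    $ E' T' F    v + v $    F -> v
    $ E' T' v    v + v $    match v
    $ E' T'      + v $      T' -> ε
    $ E'         + v $      E' -> + T E'
    $ E' T +     + v $      match +
    $ E' T       v $        T -> F T'
    $ E' T' F    v $        F -> v
    $ E' T' v    v $        match v
    $ E' T'      $          T' -> ε
    $ E'         $          E' -> ε
    $            $          accept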
