
Compiler Construction (CSC 409)

1. Core Concepts

Front: What is a compiler?

Back: A translator that converts high-level code (e.g., C++) into machine code all at once.

Example: C code → .exe file.

Analogies: Like translating an entire book into another language.

Front: What is an interpreter?

Back: Executes code line-by-line (e.g., Python).

Example: Runs print("Hello") directly without creating an executable.

Key Difference: Interpreter = Tour Guide; Compiler = Book Translator.

2. Phases of Compilation (Mnemonic: Lazy Squirrels Sell Intermediate Coconuts Gracefully)

Front: List the 6 phases of compilation in order.

Back:

1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation

Front: What does lexical analysis do?

Back: Converts source code into tokens (e.g., id, =, +).


Example: a = b + 5 → Tokens: id(a), =, id(b), +, num(5).
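The tokenization step can be sketched with regular expressions. This is a minimal illustration, not a real compiler front end; the token names (id, op, num) and the patterns are assumptions chosen to mirror the example above.

```python
import re

# Illustrative token patterns (order matters: num is tried before id).
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[a-zA-Z_]\w*"),
    ("op", r"[=+\-*/]"),
    ("skip", r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(source):
    """Convert a source string into (token_type, lexeme) pairs."""
    return [(m.lastgroup, m.group())
            for m in PATTERN.finditer(source)
            if m.lastgroup != "skip"]

print(tokenize("a = b + 5"))
# [('id', 'a'), ('op', '='), ('id', 'b'), ('op', '+'), ('num', '5')]
```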

Front: What is the role of syntax analysis?

Back: Validates code structure using grammar rules and builds a parse tree.

Example: Detects a + = b as invalid (adjacent + and =).

Front: What happens in semantic analysis?

Back: Checks meaning (types, scope).

Example: int x = "text"; → Error (type mismatch).
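A semantic check of this kind can be sketched as a small Python function; check_assignment and its toy type names are hypothetical, written only to mirror the int x = "text"; example.

```python
def check_assignment(declared_type, value):
    # Infer a toy type for the value, then compare it with the declaration.
    if isinstance(value, bool):
        inferred = "bool"
    elif isinstance(value, int):
        inferred = "int"
    elif isinstance(value, float):
        inferred = "float"
    else:
        inferred = "string"
    if inferred != declared_type:
        raise TypeError(f"cannot assign {inferred} to {declared_type}")
    return value

check_assignment("int", 5)         # accepted
# check_assignment("int", "text")  # raises TypeError: type mismatch
```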

Front: What is intermediate code generation?

Back: Creates platform-independent code (e.g., three-address code).

Example:

temp1 = b * 60.0
a = temp1 + c
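Generating three-address code can be sketched as a recursive walk over an expression tree; gen_tac and the nested-tuple tree format are illustrative assumptions, not a standard API.

```python
def gen_tac(node, code, counter):
    # Emit three-address code for a nested tuple like ("*", "b", "60.0").
    # Returns the name (variable or temporary) holding the node's value.
    if isinstance(node, str):
        return node
    op, left, right = node
    l = gen_tac(left, code, counter)
    r = gen_tac(right, code, counter)
    counter[0] += 1
    temp = f"t{counter[0]}"
    code.append(f"{temp} = {l} {op} {r}")
    return temp

code = []
gen_tac(("+", ("*", "b", "60.0"), "c"), code, [0])
print(code)
# ['t1 = b * 60.0', 't2 = t1 + c']
```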

Front: What is code optimization?

Back: Improves code efficiency.

Example: Replacing x = x + 0 → x.
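This kind of optimization can be sketched as a peephole pass over three-address instructions; the string-based instruction format is a simplification for illustration.

```python
def peephole(instructions):
    # Minimal peephole sketch: drop algebraic identities like x = x + 0.
    out = []
    for ins in instructions:
        dest, _, expr = ins.partition(" = ")
        if expr == f"{dest} + 0" or expr == f"0 + {dest}":
            continue  # x = x + 0 is a no-op; remove it
        out.append(ins)
    return out

print(peephole(["t1 = b * 60", "x = x + 0", "a = t1 + c"]))
# ['t1 = b * 60', 'a = t1 + c']
```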

Front: What is code generation?

Back: Converts intermediate code to machine-specific instructions.

Example:

MOV b, R1
MUL #60.0, R1
ADD c, R1

3. Lexical Analysis Deep Dive

Front: Define lexeme vs. token.


Back:

• Lexeme: Raw text (e.g., "if").


• Token: Categorized lexeme (e.g., keyword).

Example:

Lexeme "123" → token num.

Front: What are regular expressions used for in lexical analysis?

Back: To define patterns for tokens (e.g., [a-zA-Z]+ for identifiers).

Front: What is lookahead?

Back: Checking the next character to resolve ambiguities.

Example: == (one token) vs. = followed by = (two tokens).
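Lookahead can be sketched for the = vs. == case; scan_ops is a toy scanner (an assumption for illustration) that ignores everything except these two operators.

```python
def scan_ops(source):
    # Peek at the next character to decide whether '=' starts '=='.
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "=" and i + 1 < len(source) and source[i + 1] == "=":
            tokens.append("==")   # maximal munch: prefer the longer token
            i += 2
        elif ch == "=":
            tokens.append("=")
            i += 1
        else:
            i += 1                # skip everything else in this sketch
    return tokens

print(scan_ops("a == b = c"))
# ['==', '=']
```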

Front: Name 2 lexical error recovery techniques.

Back:

1. Panic Mode: Skip input until a valid token.


2. Local Correction: Insert/delete/replace characters.

4. Syntax & Semantic Analysis

Front: What is a parse tree?

Back: A hierarchical representation of code structure.

Example:

          =
         / \
        a   +
           / \
          b   *
             / \
            c   60

Front: What is operator precedence?


Back: Rules defining which operations execute first (e.g., * before +).

Example: a + b * c → a + (b * c).

Front: What is type checking?

Back: Ensuring variables/operations use compatible types.

Example: int x = 5.5; → Error (float assigned to int).

5. Intermediate Code & Optimization

Front: What is three-address code?

Back: Intermediate code with ≤ 3 operands per instruction.

Example:

t1 = b * 60
t2 = a + t1

Front: What is common subexpression elimination?

Back: Reusing repeated calculations.

Example:

Original: t1 = b + c; t2 = b + c;
Optimized: t1 = b + c; t2 = t1;
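The same transformation can be sketched over (destination, expression) pairs; this local version assumes no operand is reassigned between the two computations.

```python
def eliminate_common_subexpressions(code):
    # If an expression was already computed, reuse the earlier temporary
    # instead of recomputing it.
    seen = {}   # expression -> first destination that computed it
    out = []
    for dest, expr in code:
        if expr in seen:
            out.append((dest, seen[expr]))   # e.g. t2 = t1
        else:
            seen[expr] = dest
            out.append((dest, expr))
    return out

print(eliminate_common_subexpressions([("t1", "b + c"), ("t2", "b + c")]))
# [('t1', 'b + c'), ('t2', 't1')]
```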

Front: What is loop optimization?

Back: Moving invariant code outside loops.

Example:

Before: for (i=0; i<n; i++) { x = y*z; sum += x + i; }

After: x = y*z; for (i=0; i<n; i++) { sum += x + i; }
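Loop-invariant code motion can be illustrated with two equivalent Python functions; the running total is an assumption added so the loop still has work to do after the invariant computation is hoisted.

```python
def before(n, y, z):
    total = 0
    for i in range(n):
        x = y * z          # invariant: recomputed on every iteration
        total += x + i
    return total

def after(n, y, z):
    x = y * z              # hoisted: computed once, outside the loop
    total = 0
    for i in range(n):
        total += x + i
    return total

print(before(5, 2, 3), after(5, 2, 3))
# 40 40
```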
6. Symbol Table & Error Handling

Front: What is the symbol table?

Back: A "dictionary" storing identifiers and their attributes (type, scope, memory address).

Example: rate: float, Address: 0x1000.
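A symbol table can be sketched as a Python dictionary; the insert/lookup helpers and the attribute values are illustrative, not from any particular compiler.

```python
# Identifier -> attributes, mirroring the rate: float, 0x1000 example above.
symbol_table = {}

def insert(name, typ, scope, address):
    symbol_table[name] = {"type": typ, "scope": scope, "address": address}

def lookup(name):
    return symbol_table.get(name)   # None if the identifier is undeclared

insert("rate", "float", "global", 0x1000)
print(lookup("rate"))
# {'type': 'float', 'scope': 'global', 'address': 4096}
```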

Front: Name 4 error recovery techniques.

Back:

1. Panic Mode
2. Phrase-Level Recovery
3. Error Productions
4. Global Correction

Front: What is error production?

Back: Adding grammar rules to handle common errors (e.g., missing semicolon).

7. Tools & Examples

Front: Tools for lexical and syntax analysis?

Back:

• Lexical: Lex/Flex
• Syntax: Yacc/Bison

Front: Trace position = initial + rate * 60 through compilation phases.

Back:

1. Lexical: Tokens → id, =, id, +, id, *, num.
2. Syntax: Builds the parse tree; * binds tighter than +.
3. Semantic: Converts 60 to the float 60.0.
4. Intermediate: temp1 = rate * 60.0; position = initial + temp1.
5. Optimization: No change needed in this example.
6. Code Generation: Assembly instructions.
8. Quick Comparisons

Front: Compiler vs. Interpreter

Back:

• Compiler: Faster execution, standalone executable.


• Interpreter: Slower, platform-independent.

Front: Lexical vs. Syntax Analyzer

Back:

• Lexical: Tokens.
• Syntax: Parse tree.

Front: Local vs. Loop Optimization

Back:

• Local: Redundant code removal.


• Loop: Move code outside loops.

9. Exam Hotspots

Front: What is relocatable code?

Back: Machine code with adjustable memory addresses (handled by the loader).

Front: What does the preprocessor do?

Back: Processes macros (#define), file inclusion (#include), and conditional compilation (#ifdef).

Front: What is the loader/linker?

Back: Combines object files into an executable and loads it into memory.
1. Core Concepts

• Compiler: Translates entire HLL code to machine code (e.g., C → .exe).
• Interpreter: Executes code line-by-line (e.g., Python).
• Assembler: Converts assembly code (e.g., MOV AX, 5) to machine code.
• Preprocessor: Handles macros (#define), file inclusion (#include), and conditional compilation.

2. Phases of Compilation

• Lexical Analysis: Characters → Tokens. Example: a = b + 5 → id, =, id, +, num; position → id, 60 → num.
• Syntax Analysis: Tokens → Parse Tree. Example: validates a + b * c as a + (b * c) (operator precedence).
• Semantic Analysis: Parse Tree → Annotated Tree. Example: int x = "text"; → type mismatch error.
• Intermediate Code Generation: Annotated Tree → Three-Address Code. Example: temp1 = rate * 60.0; position = initial + temp1.
• Code Optimization: Intermediate Code → Optimized Code. Example: replaces x = x + 0 → x.
• Code Generation: Optimized Code → Machine Code. Example: MOVF rate, R1; MULF #60.0, R1; ADDF initial, R1.
3. Lexical Analysis

• Lexeme: Raw sequence of characters (e.g., "if", "123"). Example: "return" → lexeme.
• Token: Categorized lexeme (e.g., keyword, identifier). Example: "return" → keyword token.
• Pattern: Rule (regex) defining valid lexemes (e.g., [a-z]+ for identifiers). Example: [0-9]+ → pattern for numbers.
• Lookahead: Checking the next character to resolve ambiguities (e.g., = vs. ==). Example: = followed by = → == token.

4. Error Handling

• Panic Mode: Skip input until a valid token (e.g., ;). Example: skip until ; after a = b +.
• Local Correction: Insert/delete/replace characters to fix errors. Example: replace fi with if.
• Error Productions: Add grammar rules to handle common errors (e.g., missing ;). Example: allow if (x) { ... } without ;.

5. Tools & Components

• Lexical Analyzer: Generates tokens from source code. Tools: Lex, Flex.
• Syntax Analyzer: Builds a parse tree using grammar rules. Tools: Yacc, Bison.
• Symbol Table: Stores identifiers (variables, functions) and attributes (type, address). Implementations: hash tables, tree structures.

6. Compiler vs. Interpreter

• Execution: Compiler translates the entire code at once into a standalone executable; interpreter executes line-by-line at runtime.
• Speed: Compiled code runs faster; interpreted code is slower due to runtime translation.
• Error Detection: Compiler reports errors early (during compilation); interpreter reports them immediately during execution.
• Examples: Compiler: C, C++, Rust. Interpreter: Python, JavaScript.

7. Key Optimization Techniques

• Common Subexpression Elimination: Reuse repeated calculations. Example: t1 = b + c reused in a = t1 + d.
• Loop Invariant Code Motion: Move computations outside loops. Example: x = y * z moved outside the loop.
• Dead Code Elimination: Remove unreachable code. Example: delete if (false) { ... }.
8. Mnemonics & Examples

• "Lazy Squirrels Sell Intermediate Coconuts Gracefully": Phases of compilation (Lexical → Syntax → Semantic → Intermediate → Code Optimization → Code Generation). Example: position = initial + rate * 60.
• Token vs. Lexeme: Lexeme = raw text, Token = categorized lexeme. Example: "123" → lexeme, num → token.

9. Quick Reference Table

• Symbol Table: Stores identifiers (e.g., rate: float, Address: 0x1000).
• Intermediate Code: Platform-independent (e.g., three-address code).
• Error Recovery: Panic mode, local correction, error productions.
• Lexical Errors: Misspelled keywords, invalid characters.

Structured and Corrected Summary: Lexical Analyzer vs. Parser

1. Comparison Table: Lexical Analyzer vs. Parser

• Input: The lexical analyzer scans the input program character stream; the parser performs syntax analysis on the token stream.
• Output: The lexical analyzer identifies tokens (e.g., keywords, identifiers); the parser generates an abstract syntax tree (AST) or parse tree.
• Symbol table: The lexical analyzer inserts tokens into the symbol table (basic entries); the parser updates it with semantic information (e.g., type, scope).
• Errors: The lexical analyzer generates lexical errors (e.g., invalid characters); the parser generates syntax errors (e.g., missing semicolons).

2. Why Separate Lexical and Syntax Analysis?

1. Simplicity of Design
a. Separating tokenization (lexical) from grammar validation (syntax) simplifies compiler
architecture.
2. Efficiency
a. Specialized buffering techniques in the lexical analyzer speed up tokenization.
3. Specialization
a. Lexical analyzers use regular expressions, while parsers use context-free grammars
(CFG).
4. Portability
a. Lexical analyzers handle platform-specific input (e.g., file encoding), isolating these
details from the parser.

3. Advantages of Lexical Analysis

1. Foundation for Parsing


a. Provides tokens for syntax and semantic analysis (e.g., compilers, interpreters).
2. Error Localization
a. Pinpoints lexical errors (e.g., @num → invalid character).
3. Reusability
a. Lexical rules (e.g., regex for identifiers) can be reused across projects.
4. Web Development
a. Used in browsers to parse HTML/CSS/JavaScript into tokens for rendering.
4. Disadvantages of Lexical Analysis

1. Time-Consuming
a. Requires careful design of regex patterns for token recognition.
2. Complex Regular Expressions
a. Some token patterns (e.g., floating-point literals) require intricate regular expressions that are harder to define than PEG or EBNF rules.
3. Debugging Overhead
a. Testing tokenization rules (e.g., edge cases like 0x1F vs. 0x1G) can be tedious.
4. Runtime Overhead
a. Generating token tables (e.g., DFA/NFA) adds initial compilation time.

5. Key Concepts

• Lexeme: Raw text matched to a token (e.g., "if", "123").


• Token: Categorized lexeme (e.g., keyword, number).
• Symbol Table: Stores identifiers with attributes (type, scope, memory address).

6. Example Workflow

Input Code:

int x = 42 + 5.3;

1. Lexical Analyzer Output:
a. Tokens: int (keyword), x (identifier), = (operator), 42 (integer), + (operator), 5.3 (float), ; (punctuation).
2. Parser Output:
a. AST: Declaration
├── Type: int
├── Variable: x
└── Initializer:
└── BinaryExpression (+)
├── 42
└── 5.3

b. Semantic Error: Mixing int and float in an expression (detected during semantic analysis, not by the parser).
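The AST above can be sketched with Python dataclasses; the class and field names are illustrative, not from any particular compiler framework.

```python
from dataclasses import dataclass

@dataclass
class BinaryExpression:
    op: str
    left: object
    right: object

@dataclass
class Declaration:
    type: str
    name: str
    init: object

# int x = 42 + 5.3; as a small tree of nodes.
tree = Declaration("int", "x", BinaryExpression("+", 42, 5.3))
print(tree.init.op, tree.init.left, tree.init.right)
# + 42 5.3
```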
7. Summary Table

• Input: character stream (lexical analyzer) vs. token stream (parser).
• Output: tokens vs. parse tree/AST.
• Tools: Lex, Flex vs. Yacc, Bison, ANTLR.
• Errors: invalid tokens vs. grammar violations.
• Key Role: tokenization vs. structure validation.

8. Mnemonics for Exam Prep

• "Lex Before Parse": Lexical analysis always precedes syntax analysis.


• "Regex for Tokens, CFG for Grammar": Lexical uses regex; parsing uses CFG.
