SP Unit-5
Compiler
A compiler is a system software component that accepts a program written in a high-level language
and produces an object program.
Step 1:
The basic elements, or tokens, are entered into a table.
The table consists of 2 fields:
1. Uniform symbol
2. Pointer
The uniform symbols are of fixed size, and the pointer points to the table entry of the associated basic element.
Here, the uniform symbols are IDN for identifiers, TRM for terminals and LIT for literals.
Step 2:
Interpreting the meaning of the construction:
After performing the above step, the resultant form is the “syntactic form”.
Step 3:
Intermediate form:
“Once the syntactic construction has been determined, the compiler can generate the object code for
each construction; the representation it creates for this purpose is known as the intermediate form”
1 Arithmetic statements: One intermediate form of an arithmetic statement is a parse tree.
The rules for converting an arithmetic statement into a parse tree are:
a) Any variable is a terminal node of the tree.
b) Every operator is a node with 2 branches in a binary tree: the left branch is the tree for
operand 1 and the right branch is the tree for operand 2.
Ex: Cost = Rate * (Start - Finish) + 2 * Rate * (Start - Finish - 100)
Another intermediate form is a linear representation of the parse tree called a matrix.
Matrix number   Operator   Operand1   Operand2
1               -          Start      Finish
2               *          Rate       M1
3               *          2          Rate
4               -          Start      Finish
5               -          M4         100
6               *          M3         M5
7               +          M2         M6
8               =          Cost       M7
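The two rules for building the tree, and the flattening of the tree into a matrix, can be sketched in a few lines. This is only an illustration, not the textbook's procedure: it borrows Python's own `ast` parser to build the tree, and it assumes the source statement is Cost = Rate * (Start - Finish) + 2 * Rate * (Start - Finish - 100), which is reconstructed from the matrix rows above. The names `build_matrix` and `OPS` are hypothetical.

```python
import ast

# Map Python AST operator node types to the operator symbols used in the matrix.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def build_matrix(statement):
    """Return a list of (operator, operand1, operand2) rows for an assignment."""
    assign = ast.parse(statement).body[0]
    rows = []

    def walk(node):
        # Rule a: a variable (or a constant) is a terminal node of the tree.
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        # Rule b: an operator node; left branch is operand 1, right is operand 2.
        left = walk(node.left)
        right = walk(node.right)
        rows.append((OPS[type(node.op)], left, right))
        return "M%d" % len(rows)  # temporary that names this row's result

    result = walk(assign.value)
    rows.append(("=", assign.targets[0].id, result))
    return rows


matrix = build_matrix(
    "Cost = Rate * (Start - Finish) + 2 * Rate * (Start - Finish - 100)"
)
for number, row in enumerate(matrix, start=1):
    print(number, *row)
```

Walking the left branch before the right branch is what makes the row order agree with the matrix shown above.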
2 Non-arithmetic statements: DO, IF and GOTO are examples of non-arithmetic statements.
These statements can all be replaced by a sequential ordering of individual matrix entries.
Ex: Return (Cost)
    End
Matrix
Operator   Operand1   Operand2
Return     Cost
End
Ex:
Declare (Cost, Rate, Start, Finish) Fixed Binary (31) Static;
The table consists of four fields:
1. Variables -> Cost, Rate, Start, Finish
2. Data type -> fixed binary
3. Precision -> 31 bits
4. Storage class -> static
3) Storage allocation:
The proper amount of memory required by the program is reserved at some point in time.
Ex:
Declare (Cost, Rate, Start, Finish) Fixed Binary (31) Static;
Each variable is 32 bits in size; the first bit is reserved as the sign bit. Because the storage
class is static, the storage is allocated at load time.
Each variable is assigned a relative address. These relative addresses are used by the later phases
of the compiler for proper access. Similarly, storage is also assigned for the temporary locations
that will contain the intermediate results of the matrix.
Ex: [M1, M2, M3, ..., M7]
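As a rough sketch of how relative addresses might be handed out, the code below gives each variable and each temporary one 32-bit fullword. The byte-addressed layout, the 4-byte word size, the ordering (variables first, then temporaries) and the name `assign_storage` are all assumptions made for illustration.

```python
WORD_SIZE = 4  # bytes in one 32-bit fullword (31 bits plus the sign bit); assumed


def assign_storage(names, start=0):
    """Assign consecutive relative addresses, one fullword per name."""
    addresses = {}
    offset = start
    for name in names:
        addresses[name] = offset
        offset += WORD_SIZE
    return addresses, offset


variables = ["Cost", "Rate", "Start", "Finish"]
temporaries = ["M%d" % n for n in range(1, 8)]  # M1 .. M7

var_addresses, next_free = assign_storage(variables)
temp_addresses, total = assign_storage(temporaries, start=next_free)

print(var_addresses)
print(temp_addresses)
print("bytes reserved:", total)
```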
4 Code generation: The code generation phase takes the matrix as input and generates
the object code for every entry in the matrix.
The object code associated with each matrix entry is defined by a table called the
production table.
Ex:
Start - Finish
The operator (-) in the matrix is treated as a macro call.
The operands Start and Finish are treated as macro arguments.
Operator operand1 operand2
L 1,&operand1
S 1,&operand2
ST 1,M&N
The following code is generated for the above statement using the code definition of the
minus operator.
L 1,start
S 1,finish
ST 1,M1
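The macro-style expansion above can be imitated with plain string substitution. The sketch below is an assumption-laden illustration: the table `CODE_PRODUCTIONS`, the function `expand`, and the `+` production (using the `A` add instruction) are hypothetical, and the store template is written as `M&N` so that matrix entry 1 yields the temporary M1.

```python
# Hypothetical code production table: one template per matrix operator.
# &OPERAND1, &OPERAND2 and &N are substituted like macro arguments.
CODE_PRODUCTIONS = {
    "-": ["L   1,&OPERAND1", "S   1,&OPERAND2", "ST  1,M&N"],
    "+": ["L   1,&OPERAND1", "A   1,&OPERAND2", "ST  1,M&N"],
}


def expand(number, operator, operand1, operand2):
    """Expand one matrix entry into assembly lines, as a macro call would."""
    lines = []
    for template in CODE_PRODUCTIONS[operator]:
        line = template.replace("&OPERAND1", operand1)
        line = line.replace("&OPERAND2", operand2)
        line = line.replace("&N", str(number))  # &N is the matrix entry number
        lines.append(line)
    return lines


code = expand(1, "-", "Start", "Finish")
print("\n".join(code))
```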
Assembly phase:
The code generation phase produces assembly language; the process of generating the
actual machine code from it is known as the assembly phase.
The assembly phase must perform these operations:
1. Resolve label references
2. Calculate addresses
3. Generate binary machine instructions
4. Generate storage
5. Convert literals
1 Lexical analysis: Recognition of basic elements (tokens) and creation of the uniform symbol table
3 Interpretation: Definition of exact meaning, and creation of the matrix and tables by the
respective routines [action routines]
4 Machine-independent optimization: Creation of a more optimal matrix [removes the duplicate
entries in the matrix table]
5 Storage assignment: Makes entries in the matrix that allow code generation to create code
that allocates dynamic storage, and that allow the assembly phase to reserve the proper amount of
static storage
6 Code generation: A macro processor is used to produce more optimal assembly code
7 Assembly and output: Resolves symbolic addresses and generates the machine language
Phases 1 to 4 are machine independent and language dependent, because these phases help in
determining the syntax and meaning of each statement in the source program. Hence they are
dependent on the language and independent of the machine.
Phases 5 to 7 are machine dependent and language independent, because these phases allocate
memory for literals and also generate the assembly code, which is dependent on the machine and
independent of the language.
2 Uniform symbol table: It consists of the tokens or basic elements as they appear in the
program. It is created by the lexical analysis phase and given as input to the syntax analysis and
interpretation phases.
3 Terminal table: A permanent table that lists all keywords and special symbols of the
language.
4 Identifier table: It contains all variables in the program and temporary storage [Ex: M1, M2,
M3 … M7], along with the information needed to reference or allocate storage for them. This table is
created by lexical analysis.
6 Reductions: A permanent table of decision rules in the form of patterns for matching with
the uniform symbol table to discover syntactic structure.
7 Matrix: The intermediate form of the program, created by the
action routines. It is optimized and then used for code generation.
8 Code productions: A permanent table of definitions. There is one entry defining the code for
each matrix operator.
9 Assembly code: The assembly language version of the program, which is created by the code
generation phase and is input to the assembly phase.
10 Relocatable object code: The final output of the assembly phase, ready to be used as input to
the loader.
Phases of compiler
1 Lexical phase:
The lexical phase performs the following three tasks:
1. Recognize the basic elements, or tokens, present in the source code
2. Build a literal table and an identifier table
3. Build a uniform symbol table
Database:
Lexical phase involves the manipulation of 5 databases
1. Source program
2. Terminal table
3. Literal table
4. Identifier table
5. Uniform symbol table
1 Source program: The original form of the program created by the user
2 Terminal table: It is a permanent database that consists of 3 fields
3 Literal table:
It describes all literal constants used in the source program.
It consists of 6 fields:
Literal | Base | Scale | Precision | Other information | Address
4 Identifier table:
It describes all identifiers used in the source program. It consists of three fields
1. Name
2. Data attribute
3. Address
Algorithm:
Step1: The first task of the lexical analysis algorithm is to parse the input character string into
tokens
Step2: The second step is to make the appropriate entries in the tables.
Implementation:
1 The input string is separated into tokens by break characters. Break characters are denoted by
the contents of a special field in the terminal table
2 Lexical analysis recognizes 3 types of tokens:
1. Terminal symbols [TRM]
2. Identifiers [IDN]
3. Literals [LIT]
If symbol is in the TERMINAL table then
    Create a uniform symbol table entry of type TRM
Else if symbol is an identifier then
    Create a uniform symbol table entry of type IDN
Else
    Create a uniform symbol table entry of type LIT
End if
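The classification above can be sketched as a small tokenizer. Everything specific here is an assumption for illustration: the contents of `TERMINALS`, the regular expression that stands in for break-character handling, and the choice to store a table index as the pointer of each uniform symbol.

```python
import re

# Hypothetical terminal table: keywords and special symbols of a toy language.
TERMINALS = ["DECLARE", "RETURN", "END", "=", "+", "-", "*", "/", "(", ")", ",", ";"]

# A token is a name, an unsigned integer, or any single non-blank character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def lexical_phase(source):
    """Return (uniform_symbols, identifiers, literals) for the source text."""
    identifiers, literals, uniform = [], [], []
    for token in TOKEN.findall(source):
        if token.upper() in TERMINALS:
            uniform.append(("TRM", TERMINALS.index(token.upper())))
        elif token[0].isalpha() or token[0] == "_":
            if token not in identifiers:
                identifiers.append(token)
            # The pointer is the index of the entry in the identifier table.
            uniform.append(("IDN", identifiers.index(token)))
        else:
            # Anything else is treated as a literal in this sketch.
            if token not in literals:
                literals.append(token)
            uniform.append(("LIT", literals.index(token)))
    return uniform, identifiers, literals


uniform, identifiers, literals = lexical_phase("Cost = Rate * (Start - Finish) + 2;")
print(uniform)
print(identifiers)
print(literals)
```

Each uniform symbol is a fixed-size pair of class and pointer, so the later phases never need to look at the variable-length source text again.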
2 Syntax Phase:
The functions of the syntax phase are
1. To recognize the major constructs of the language
2. To call the appropriate action routines that will generate the intermediate form or matrix
for the constructs
Databases:
1 Uniform symbol table: The table created by the lexical phase.
The uniform symbols are the source of input to the stack, which is used by the syntax and
interpretation phases.
Fields: Table class | Index
2 Stack: The stack is a collection of uniform symbols that are currently being worked on. The stack is
organized using the LIFO technique.
3 Reduction table: The syntax rules of the source language are contained in the reduction table.
The general form of a reduction rule is:
Label: old top of stack / action routines / new top of stack / next reduction
Algorithm:
Step1: Reductions are tested consecutively for a match between the old top of stack field and the actual
top of stack until a match is found
Step2: When a match is found, the action routines specified in the action field are executed in
order from left to right
Step3: When control returns to the syntax analyzer, it modifies the top of stack to agree with
the new top of stack field.
Step4: Step 1 is repeated, starting with the reduction specified in the next reduction field
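The four steps can be sketched as a small driver loop over a reduction table. This is a toy under stated assumptions, not the textbook's table: the reductions R1 to R4, the pattern names IDN and OP, the RESULT placeholder, and the extra "read next token" flag are all hypothetical, and the toy grammar evaluates operators strictly left to right with no precedence.

```python
OPERATORS = {"+", "-", "*", "/"}


def symbol_matches(pattern, symbol):
    """Match one pattern item against one stack symbol."""
    if pattern == "IDN":
        return symbol.isidentifier()
    if pattern == "OP":
        return symbol in OPERATORS
    return pattern == symbol


def emit_entry(matched, matrix):
    """Action routine: append (operator, operand1, operand2) to the matrix."""
    operand1, operator, operand2 = matched
    matrix.append((operator, operand1, operand2))
    return "M%d" % len(matrix)


# Each reduction: label, old top of stack, action routines, new top of stack
# (None leaves the stack unchanged), read-next-token flag, next reduction.
REDUCTIONS = [
    ("R1", ["IDN"], [], None, True, "R2"),
    ("R2", ["IDN", "OP", "IDN"], [emit_entry], ["RESULT"], True, "R2"),
    ("R3", ["IDN", "OP"], [], None, True, "R2"),
    ("R4", ["IDN", ";"], [], [], False, None),
]


def syntax_phase(tokens):
    pending = list(tokens)
    stack = [pending.pop(0)]
    matrix = []
    labels = [r[0] for r in REDUCTIONS]
    label = "R1"
    while label is not None:
        # Step 1: test reductions consecutively until the old top of stack matches.
        for _, old, actions, new, read_next, next_label in REDUCTIONS[labels.index(label):]:
            top = stack[len(stack) - len(old):] if len(old) <= len(stack) else []
            if len(top) == len(old) and all(map(symbol_matches, old, top)):
                break
        else:
            raise SyntaxError("no reduction matches stack %r" % stack)
        # Step 2: execute the action routines from left to right.
        result = None
        for action in actions:
            result = action(top, matrix)
        # Step 3: make the top of stack agree with the new top of stack field.
        if new is not None:
            del stack[len(stack) - len(old):]
            stack.extend(result if item == "RESULT" else item for item in new)
        if read_next:
            stack.append(pending.pop(0))
        # Step 4: continue with the reduction named in the next reduction field.
        label = next_label
    return matrix


print(syntax_phase(["A", "+", "B", "-", "C", ";"]))
```

Keeping the rules in a table rather than in code means the driver loop stays the same when the language's syntax rules change.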
3. Interpretation Phase:
1. Uniform symbol table
2. Stack
3. Identifier table
4. Matrix
The above-mentioned databases are described in the textbook, page 210.
5. Optimization Phase:
Optimizations performed by a compiler are of 2 types: machine-independent and
machine-dependent.
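One machine-independent optimization noted in the phase summary is the removal of duplicate entries from the matrix. The sketch below is one possible way to do it, under the assumption that no operand is reassigned between the two identical entries (true within a single statement) and that no user variable is named like a temporary. Renumbering the surviving rows is a design choice of this sketch; `remove_duplicates` is a hypothetical name.

```python
def remove_duplicates(matrix):
    """Drop repeated (operator, operand1, operand2) rows and renumber the rest."""
    seen = {}      # row -> temporary that already holds its value
    renamed = {}   # old temporary name -> surviving temporary name
    optimized = []
    for number, (operator, op1, op2) in enumerate(matrix, start=1):
        # Redirect operands that referred to a removed or renumbered row.
        row = (operator, renamed.get(op1, op1), renamed.get(op2, op2))
        old_name = "M%d" % number
        if operator != "=" and row in seen:
            renamed[old_name] = seen[row]  # reuse the earlier result
            continue
        optimized.append(row)
        new_name = "M%d" % len(optimized)
        seen[row] = new_name
        renamed[old_name] = new_name
    return optimized


matrix = [
    ("-", "Start", "Finish"),
    ("*", "Rate", "M1"),
    ("*", "2", "Rate"),
    ("-", "Start", "Finish"),
    ("-", "M4", "100"),
    ("*", "M3", "M5"),
    ("+", "M2", "M6"),
    ("=", "Cost", "M7"),
]
optimized = remove_duplicates(matrix)
for number, row in enumerate(optimized, start=1):
    print(number, *row)
```

The second Start - Finish entry disappears, and the row that used M4 now refers to M1 instead.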
6. Code generation:
The purpose of code generation is to produce the appropriate code. In this phase the matrix is
the input database.
Data bases:
• Matrix
• Identifier table
• Literal table
• Code productions.
Ex: Code generation with machine-dependent optimization.
A=B+C+D
Refer to the textbook, page 224.
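One kind of machine-dependent optimization for A=B+C+D can be sketched as follows: straightforward generation stores each intermediate result and immediately reloads it, and a cleanup pass removes those store/load pairs because the value is still in the register. This is an illustrative sketch and not necessarily the procedure on the cited page; the use of register 1 throughout, the `A`/`S` opcodes for add and subtract, and both function names are assumptions.

```python
def generate(matrix):
    """Naive code generation: every entry loads, operates and stores."""
    opcode = {"+": "A", "-": "S"}
    code = []
    for number, (operator, op1, op2) in enumerate(matrix, start=1):
        if operator == "=":
            code += [("L", op2), ("ST", op1)]
        else:
            code += [("L", op1), (opcode[operator], op2), ("ST", "M%d" % number)]
    return code


def drop_redundant_store_load(code, temporaries):
    """Remove 'ST 1,Mk' directly followed by 'L 1,Mk' when Mk is never used again."""
    result = []
    i = 0
    while i < len(code):
        current = code[i]
        following = code[i + 1] if i + 1 < len(code) else None
        used_later = any(operand == current[1] for _, operand in code[i + 2:])
        if (current[0] == "ST" and current[1] in temporaries
                and following == ("L", current[1]) and not used_later):
            i += 2  # the value is still in register 1, so skip both instructions
            continue
        result.append(current)
        i += 1
    return result


# Matrix for A = B + C + D
matrix = [("+", "B", "C"), ("+", "M1", "D"), ("=", "A", "M2")]
naive = generate(matrix)
optimized = drop_redundant_store_load(naive, {"M1", "M2"})

for instruction in optimized:
    print("%-3s 1,%s" % instruction)
```

In this sketch the eight naive instructions shrink to four, and the temporaries M1 and M2 no longer need storage.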
• The linker is the software program that binds many object modules into a
single object program
• The types of linking are static linking and dynamic linking.
A cross reference table can be used to accept one input value and produce one output value. The
example below shows a cross reference table lookup within a function. In addition to setting up
the function, you need to map all input elements from the source profile to the input values in the
function. You also need to map all output values from the function to the output elements in the
destination profile.
For example, when referring to the U.S. states:
• System A uses the State Name value
• System B uses the FIPS Alpha Code value
When mapping from System A to System B, you need to translate the State Name value to the
FIPS Alpha Code value. The SQL Select statement for the Output Element would be:
SELECT FIPS_Alpha_Code FROM State_Cross-Reference_Example WHERE State Name = Input_Element
If the State Name=Alabama in System A then the FIPS Alpha Code=AL for
System B. "AL" is the value that will be returned in the output.
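The lookup described above can be tried out with an in-memory SQLite database. This is a minimal sketch: the table and column names are written with underscores (State_Cross_Reference_Example, State_Name) so that they are valid unquoted SQL identifiers, which is an assumption, and the function name `lookup` and the sample rows are illustrative only.

```python
import sqlite3

# In-memory cross-reference table with two sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE State_Cross_Reference_Example ("
    "State_Name TEXT PRIMARY KEY, FIPS_Alpha_Code TEXT)"
)
conn.executemany(
    "INSERT INTO State_Cross_Reference_Example VALUES (?, ?)",
    [("Alabama", "AL"), ("Alaska", "AK")],
)


def lookup(state_name):
    """Accept one input value and return one output value, or None if absent."""
    row = conn.execute(
        "SELECT FIPS_Alpha_Code FROM State_Cross_Reference_Example "
        "WHERE State_Name = ?",
        (state_name,),
    ).fetchone()
    return row[0] if row else None


print(lookup("Alabama"))
```

The `?` placeholder plays the role of Input_Element: the input value is bound at query time rather than pasted into the SQL text.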