
Compiler Construction: BY Ahsan Khan Email: Ahsan@Cuiatd - Edu.Pk

This document provides an overview of a compiler construction course. The course includes 32 lectures on theory and 32 on labs. It will teach students how a high-level language program is translated into machine code. Students will learn to write a simple compiler using LEX and YACC tools. The prerequisites are data structures, assembly language, and theory of automata. Key topics covered include stacks, queues, linked lists, trees, graphs, and hash tables. The document also discusses programming languages, the conversion from high-level to machine code via compilers, and the language processing steps of preprocessing.

Uploaded by

sardar Bityaan

Compiler Construction
By Ahsan Khan
EMAIL: [email protected]
Basic introduction to course

Instructor: Ahsan Khan
Lectures: 32 theory, 32 lab
Text Book: Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman
Why Take This Course?
Understanding Compilers
• Understanding the code structure
• Understanding the language semantics
• Understanding the relation between source code and machine code
Course Objective
• You will learn how a high-level language program is systematically translated into a low-level machine language.
• You will be able to write a simple compiler using the automated tools LEX and YACC.
Tools
• LEX/YACC
Course Prerequisites
• Data Structures
• Assembly language
• Theory of Automata
Data Structures
• A data structure is a group of data elements that provides an efficient way of storing and organizing data in the computer so that it can be used efficiently.
• Data structures are the main part of many computer science algorithms, as they enable programmers to handle data in an efficient way. They play a vital role in enhancing the performance of a program, since a main function of software is to store and retrieve the user's data as fast as possible.
Need of Data Structures
• Processor speed: Handling a very large amount of data requires high-speed processing, but as data grows day by day to billions of files per entity, the processor may fail to deal with that much data.
• Data search: Consider an inventory of 10^6 items in a store. If our application needs to search for a particular item, it may have to traverse up to 10^6 items every time, which slows down the search process.
• Multiple requests: If thousands of users are searching the data simultaneously on a web server, even a very large server may fail during that process.
Non-Primitive Data Structure
• Arrays: An array stores a collection of items at adjoining memory locations. Items of the same type are stored together so that the position of each element can be calculated or retrieved easily. Arrays can be fixed or flexible in length.
• Stacks: A stack stores a collection of items in a linear order determined by the operations applied to it. The order is last in, first out (LIFO).
• Queues: A queue stores a collection of items similar to a stack; however, the operation order can only be first in, first out (FIFO).
• Linked lists: A linked list stores a collection of items in a linear order. Each element, or node, in a linked list contains a data item as well as a reference, or link, to the next item in the list.
• Trees: A tree stores a collection of items in an abstract, hierarchical way. Each node is linked to other nodes and can have multiple sub-values, also known as children.
• Graphs: A graph stores a collection of items in a non-linear fashion. Graphs are made up of a finite set of nodes, also known as vertices, and lines that connect them, also known as edges. These are useful for representing real-life systems such as computer networks.
• Hash tables: A hash table, or hash map, stores a collection of items in an associative array that maps keys to values. A hash table uses a hash function to convert a key into an index into an array of buckets that contain the desired data item.
Stack
• A stack is a linear data structure that follows a particular order in which the operations are performed. The order is LIFO (Last In First Out), which is the same as FILO (First In Last Out).
• Mainly the following four basic operations are performed on the stack:
• Push: Adds an item to the stack. If the stack is full, this is said to be an Overflow condition.
• Pop: Removes an item from the stack. Items are popped in the reverse of the order in which they were pushed. If the stack is empty, this is said to be an Underflow condition.
• Peek or Top: Returns the top element of the stack.
• isEmpty: Returns true if the stack is empty, else false.
// C++ program to demonstrate the working of the STL stack
#include <iostream>
#include <stack>
using namespace std;

// Prints the stack from top to bottom.
// The stack is passed by value, so the caller's stack is left unchanged.
void showstack(stack<int> s)
{
    while (!s.empty())
    {
        cout << '\t' << s.top();
        s.pop();
    }
    cout << '\n';
}

int main()
{
    stack<int> s;
    s.push(10);
    s.push(30);
    s.push(20);
    s.push(5);
    s.push(1);

    cout << "The stack is : ";
    showstack(s);

    cout << "\ns.size() : " << s.size();
    cout << "\ns.top() : " << s.top();

    // Remove the top element, then show what is left
    cout << "\ns.pop() : ";
    s.pop();
    showstack(s);

    return 0;
}
Linked List Data Structure
• A linked list is a linear data structure in which the elements are not stored at contiguous memory locations. The elements in a linked list are linked using pointers.
// A simple C++ program to introduce a linked list
#include <iostream>
using namespace std;

// A linked list node
class Node
{
public:
    int data;
    Node* next;
};

int main()
{
    Node* head = nullptr;
    Node* second = nullptr;
    Node* third = nullptr;

    // allocate 3 nodes in the heap
    head = new Node();
    second = new Node();
    third = new Node();

    /* Three blocks have been allocated dynamically.
    We have pointers to these three blocks as head,
    second and third

      head          second         third
        |             |              |
        |             |              |
    +---+---+     +---+---+     +---+---+
    | # | # |     | # | # |     | # | # |
    +---+---+     +---+---+     +---+---+ */

    head->data = 1;      // assign data in first node
    head->next = second; // link first node with the second node

    /* data has been assigned to the data part of the first
    block (block pointed to by head). And the next
    pointer of the first block points to second.
    So they both are linked.

      head          second         third
        |             |              |
        |             |              |
    +---+---+     +---+---+     +---+---+
    | 1 | o------>| # | # |     | # | # |
    +---+---+     +---+---+     +---+---+ */

    second->data = 2;     // assign data to second node
    second->next = third; // link second node with the third node

    /* data has been assigned to the data part of the second
    block (block pointed to by second). And the next
    pointer of the second block points to the third
    block. So all three blocks are linked.

      head          second         third
        |             |              |
        |             |              |
    +---+---+     +---+---+     +---+---+
    | 1 | o------>| 2 | o------>| # | # |
    +---+---+     +---+---+     +---+---+ */

    third->data = 3; // assign data to third node
    third->next = nullptr;

    /* data has been assigned to the data part of the third
    block (block pointed to by third). And the next pointer
    of the third block is made null to indicate
    that the linked list is terminated here.

    We have the linked list ready.

      head
        |
        |
    +---+---+     +---+---+     +---+------+
    | 1 | o------>| 2 | o------>| 3 | NULL |
    +---+---+     +---+---+     +---+------+

    Note that only the head is sufficient to represent
    the whole list. */

    // Starting from head alone, visit every node by following next
    for (Node* p = head; p != nullptr; p = p->next)
        cout << p->data << ' ';
    cout << '\n';

    // release the three nodes
    delete head;
    delete second;
    delete third;
    return 0;
}
Computer Programming Language
• Programming languages are notations for describing computations to people and to machines.
• Computer programs are written in three types of language:
• High-level languages: closer to the problem domain, so problems are easier to solve
• Assembly language: a mnemonic notation for machine instructions
• Machine language: complex, but more efficient and faster
High level language
• Compared to machine language, the notation used by programming languages is closer to the way humans think about problems.
• The compiler can spot some obvious programming mistakes.
• Programs written in a high-level language tend to be shorter than equivalent programs written in machine language.
• Another advantage of using a high-level language is that the same program can be compiled to many different machine languages and, hence, run on many different machines.
Conversion of high level to machine code
Language processing Steps
• Programs written in a high-level language pass through a series of tools to produce the desired machine code. This is known as language processing.
C++ language processing
1. The C++ preprocessor copies the contents of the included header files into the source code file, expands macros, and replaces symbolic constants defined using #define with their values.
2. The expanded source code file produced by the C++ preprocessor is compiled into the assembly language for the platform.
3. The assembler code generated by the compiler is assembled into the object code for the platform.
4. The machine code generated for different processor architectures, such as x86 (Intel, AMD) and ARM, is different. Object code is machine code for the target platform in which references to functions and data defined elsewhere are not yet resolved, so it cannot be run on its own until it has been linked.
5. The object code file generated by the assembler is combined with functions from the standard library archive files by the linker to produce an executable file. By default, this executable file is named a.out. The -o option can be used to specify another name for the executable file, such as prog1.
Step 01: Preprocessor
• A preprocessor, generally considered a part of the compiler, is a tool that produces input for compilers. It deals with macro-processing, augmentation, file inclusion, language extension, etc.
Step 02: What are Compilers?
• A compiler is software that converts a program written in a high-level language (source language) to a low-level language (object/target/machine language).
• An important role of the compiler is to report any errors in the source program that it detects during the translation process.
• Compilers may be distinguished in two ways:
• By the kind of machine code they generate
• By the format of the target code they generate
[Figure: Java, Android, C# translated to assembly and machine codes]
Examples
Typical Compilers:
• VC, VC++, GCC, JavaC
• FORTRAN, Pascal, VB(?)
Translators:
• Word to PDF
• PDF to Postscript
Step 03: Assembler
• Programmers found it difficult to write or read programs in machine language. They began to use a mnemonic (symbol) for each machine instruction, which they would subsequently translate into machine language. Such a mnemonic machine language is now called an assembly language. Programs known as assemblers were written to automate the translation of assembly language into machine language. The input to an assembler is called the source program; the output is a machine language translation (the object program).
Step 04: Linker
• A linker is a computer program that links and merges various object files together in order to make an executable file. All these files might have been produced by separate assemblers. The major task of a linker is to search for and locate the referenced modules/routines in a program and to determine the memory location where this code will be loaded, so that the program's instructions have absolute references.
Interpreter
• An interpreter is another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user.
Difference between compiler and interpreter

Interpreter | Compiler
Translates the program one statement at a time. | Scans the entire program and translates it as a whole into machine code.
Interpreters usually take less time to analyze the source code. However, the overall execution time is comparatively slower than with compilers. | Compilers usually take a large amount of time to analyze the source code. However, the overall execution time is comparatively faster than with interpreters.
No intermediate object code is generated, hence they are memory efficient. | Generates intermediate object code which further requires linking, hence requires more memory.
Programming languages like JavaScript, Python, Ruby use interpreters. | Programming languages like C, C++, Java use compilers.
• The machine-language target program produced by a compiler is usually much faster than an interpreter at mapping inputs to outputs. An interpreter, however, can usually give better error diagnostics than a compiler, because it executes the source program statement by statement.
• Some languages, such as Java, combine interpretation with compilation.
• A Java source program may first be compiled into an intermediate form called bytecodes. The bytecodes are then interpreted by a virtual machine. A benefit of this arrangement is that bytecodes compiled on one machine can be interpreted on another machine, perhaps across a network.
Structure of compiler
Two main phases of a compiler:
1. Analysis
2. Synthesis
Analysis phase of compiler
• Also known as the front end of the compiler.
• The analysis part breaks up the source program into constituent pieces and imposes a grammatical structure on them.
• It then uses this structure to create an intermediate representation of the source program.
• If the analysis part detects that the source program is either syntactically ill formed or semantically unsound, it must report the syntax and semantic errors with informative messages, so the user can take corrective action.
• The analysis part also collects information about the source program and stores it in a data structure called a symbol table, which is passed along with the intermediate representation to the synthesis part.
Synthesis phase of compiler
• The synthesis part constructs the desired target program from the intermediate representation and the information in the symbol table.
• It takes the output of the analysis phase (intermediate representation and symbol table) as input and produces the target machine-level code.
• This is also called the back end of a compiler.
