The document discusses syntax analysis in compiler design. It defines syntax analysis as the process of analyzing a string of symbols according to the rules of a formal grammar. The syntax is checked against a context-free grammar, which is more powerful than regular expressions and can express nested, balanced constructs. The output of syntax analysis is a parse tree. Lexical analysis and parsing are kept separate for simplicity and efficiency: lexical analysis breaks the source code into tokens, while parsing checks the token stream against the production rules, reports errors, and builds the parse tree.
The document discusses lexical analysis and how it relates to parsing in compilers. It introduces basic terminology like tokens, patterns, lexemes, and attributes. It describes how a lexical analyzer works by scanning input, identifying tokens, and sending tokens to a parser. Regular expressions are used to specify patterns for token recognition. Finite automata like nondeterministic and deterministic finite automata are constructed from regular expressions to recognize tokens.
This document discusses the role of a lexical analyzer in compiling source code. It contains the following key points:
1. A lexical analyzer takes source code as input and breaks it down into tokens by removing whitespace and comments. It checks for valid tokens and passes data to the syntax analyzer.
2. Regular expressions are used to formally specify tokens through patterns. Examples of tokens include keywords, identifiers, numbers, and operators.
3. A finite automaton can recognize tokens by using a transition diagram or table to transition between states based on input characters and accept or reject token patterns.
2. The Role of the Lexical Analyzer
• As the first phase of a compiler, the main task of the lexical analyzer
is to read the input characters of the source program, group them into
lexemes, and produce as output a sequence of tokens for each lexeme
in the source program.
• It is common for the lexical analyzer to interact with the symbol table
as well. When the lexical analyzer discovers a lexeme constituting an
identifier, it needs to enter that lexeme into the symbol table.
3. The role of lexical analyzer
[Diagram: the parser calls getNextToken; the lexical analyzer reads the source program and returns the next token; both components consult the symbol table; the token stream then flows on to semantic analysis.]
4. Lexical Analyzer (continued)
• Since the lexical analyzer is the part of the compiler that reads the source text, it may perform certain other tasks besides identification of lexemes. They are:
• Stripping out comments and whitespace (blank, newline, tab).
• Correlating error messages generated by the compiler with the source program. For instance, the lexical analyzer may keep track of the number of newline characters seen, so it can associate a line number with each error message.
5. Sometimes, lexical analyzers are divided into a cascade of two processes:
a) Scanning consists of the simple processes that do not require tokenization of the input, such as deletion of comments and compaction of consecutive whitespace characters into one (a sketch follows below).
b) Lexical analysis proper is the more complex portion, where the scanner produces the sequence of tokens as output.
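To make the scanning sub-process concrete, here is a minimal C sketch, assuming C-style // and /* */ comments; the overall loop structure is illustrative and not taken from the slides.

#include <stdio.h>
#include <ctype.h>

/* Sketch of a "scanning" pass: copy stdin to stdout, deleting C-style
 * comments and compacting each run of whitespace into a single blank. */
int main(void) {
    int c, prev_blank = 0;
    while ((c = getchar()) != EOF) {
        if (c == '/') {
            int d = getchar();
            if (d == '/') {                         /* line comment: skip to end of line    */
                while ((c = getchar()) != EOF && c != '\n')
                    ;
                if (c == EOF) break;                /* c is now '\n', handled as whitespace */
            } else if (d == '*') {                  /* block comment: skip past its closer  */
                int last = 0;
                while ((c = getchar()) != EOF && !(last == '*' && c == '/'))
                    last = c;
                if (c == EOF) break;
                c = ' ';                            /* whole comment becomes one blank      */
            } else {                                /* a lone '/' is ordinary text          */
                putchar('/');
                prev_blank = 0;
                if (d == EOF) break;
                c = d;
            }
        }
        if (isspace(c)) {
            if (!prev_blank)
                putchar(' ');                       /* compact consecutive whitespace       */
            prev_blank = 1;
        } else {
            putchar(c);
            prev_blank = 0;
        }
    }
    return 0;
}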
6. Why separate lexical analysis and parsing (syntax analysis)?
1. Simplicity of design: the separation of lexical and syntactic analysis often allows us to simplify at least one of these tasks. For example, a parser that had to deal with comments and whitespace as syntactic units would be considerably more complex than one that can assume comments and whitespace have already been removed by the lexical analyzer.
2. Improving compiler efficiency: a separate lexical analyzer allows us to apply specialized techniques that serve only the lexical task, not the job of parsing.
8. Tokens, Patterns and Lexemes
• A token is a pair consisting of a token name and an optional attribute value.
• A pattern is a description of the form that the lexemes of a token may take.
• A lexeme is a sequence of characters in the source program that matches the pattern for a token.
9. Example

Token        Informal description                       Sample lexemes
if           characters i, f                            if
else         characters e, l, s, e                      else
comparison   < or > or <= or >= or == or !=             <=, !=
id           letter followed by letters and digits      pi, score, D2
number       any numeric constant                       3.14159, 0, 6.02e23
literal      anything but ", surrounded by "            "core dumped"

Example statement: printf("total = %d\n", score);
10. Attributes for tokens
• E = M * C ** 2
• <id, pointer to symbol table entry for E>
• <assign-op>
• <id, pointer to symbol table entry for M>
• <mult-op>
• <id, pointer to symbol table entry for C>
• <exp-op>
• <number, integer value 2>
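As an illustration of these <token name, attribute value> pairs, here is a minimal C sketch; the TokenKind enum, the Token struct, and the use of symbol-table indices in place of pointers are assumptions made for the example, not part of the slides.

#include <stdio.h>

/* Illustrative token representation: a token name plus an optional attribute. */
typedef enum { TOK_ID, TOK_ASSIGN, TOK_MULT, TOK_EXP, TOK_NUMBER } TokenKind;

typedef struct {
    TokenKind kind;
    union {
        int  symtab_index;   /* for TOK_ID: index of the lexeme's symbol-table entry */
        long int_value;      /* for TOK_NUMBER: the numeric value, e.g. 2            */
    } attr;
} Token;

int main(void) {
    /* Token sequence for  E = M * C ** 2 , assuming E, M, C occupy
     * symbol-table slots 0, 1, 2. */
    Token toks[] = {
        { TOK_ID,     { .symtab_index = 0 } },   /* <id, entry for E>   */
        { TOK_ASSIGN, { 0 } },                   /* <assign-op>         */
        { TOK_ID,     { .symtab_index = 1 } },   /* <id, entry for M>   */
        { TOK_MULT,   { 0 } },                   /* <mult-op>           */
        { TOK_ID,     { .symtab_index = 2 } },   /* <id, entry for C>   */
        { TOK_EXP,    { 0 } },                   /* <exp-op>            */
        { TOK_NUMBER, { .int_value = 2 } },      /* <number, value 2>   */
    };
    printf("%zu tokens\n", sizeof toks / sizeof toks[0]);
    return 0;
}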
11. Lexical errors
It is hard for a lexical analyzer to tell, without the aid of other components, that there is a source-code error.
• Some errors are beyond the power of the lexical analyzer to recognize:
• fi (a == f(x)) …
A lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared function identifier. Since fi is a valid lexeme for the token id, the lexical analyzer must return the token id to the parser and let some other phase of the compiler, probably the parser in this case, handle the error.
• However, it may be able to recognize errors like:
• d = 2r
• Such errors are recognized when no pattern for tokens matches a character sequence.
12. Error recovery
Suppose a situation arises in which the lexical analyzer is unable to proceed because none of the patterns for tokens matches any prefix of the remaining input.
The simplest recovery strategy is "panic mode" recovery: successive characters are deleted from the remaining input until the analyzer can find a well-formed token (a sketch follows below). Other possible recovery actions are:
• Delete one character from the remaining input
• Insert a missing character into the remaining input
• Replace a character by another character
• Transpose two adjacent characters
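A minimal sketch of panic-mode skipping, assuming a helper that tests whether a character can begin some token of the language; both function names are illustrative, not from the slides.

#include <ctype.h>

/* Hypothetical test: can this character begin some token of the language? */
static int can_begin_token(int c) {
    return isalpha(c) || isdigit(c) || c == '_' ||
           c == '<' || c == '>' || c == '=' || c == '(' || c == ')';
}

/* Panic-mode recovery sketch: starting at p, skip characters until one is
 * found that could begin a well-formed token, and return the new position. */
const char *panic_mode_skip(const char *p) {
    while (*p != '\0' && !can_begin_token((unsigned char)*p))
        p++;
    return p;
}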
13. Input buffering
• Sometimes the lexical analyzer needs to look ahead some symbols to decide which token to return
• In C: we need to look at the character after -, = or < to decide what token to return
• In Fortran: DO 5 I = 1.25
• We need to introduce a two-buffer scheme to handle large lookaheads safely

Buffer contents:  E = M * C * * 2 eof
14. Sentinels
switch (*forward++) {
    case eof:
        if (forward is at end of first buffer) {
            reload second buffer;
            forward = beginning of second buffer;
        }
        else if (forward is at end of second buffer) {
            reload first buffer;
            forward = beginning of first buffer;
        }
        else /* eof within a buffer marks the end of input */
            terminate lexical analysis;
        break;
    cases for the other characters;
}

Buffer pair with eof sentinels:  E = M eof | * C * * 2 eof | ... eof
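The pseudocode above relies on each buffer half ending in a sentinel. Here is a minimal sketch of how the buffer pair might be initialized, assuming input comes from a FILE* and using '\0' as the eof sentinel; the names N, EOF_CHAR, load_half, and init_buffers are illustrative, not from the slides.

#include <stdio.h>

#define N 4096                       /* size of each half of the buffer        */
#define EOF_CHAR '\0'                /* stands in for the "eof" sentinel above */

static char buf[2 * N + 2];          /* two halves, each followed by a sentinel */
static char *lexemeBegin, *forward;

/* Fill one half of the buffer from fp and terminate it with the sentinel:
 * if fewer than N characters were read, the sentinel also marks real eof. */
static void load_half(FILE *fp, char *half) {
    size_t n = fread(half, 1, N, fp);
    half[n] = EOF_CHAR;
}

void init_buffers(FILE *fp) {
    load_half(fp, buf);              /* first half:  buf[0 .. N-1],  sentinel at buf[N]    */
    load_half(fp, buf + N + 1);      /* second half: buf[N+1 .. 2N], sentinel at buf[2N+1] */
    lexemeBegin = forward = buf;
}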
15. Specification of tokens
• In the theory of compilation, regular expressions are used to formalize the specification of tokens
• Regular expressions are a means for specifying regular languages
• Example: letter_ (letter_ | digit)*
• Each regular expression is a pattern specifying the form of strings
16. Regular expressions
• Ɛ is a regular expression, L(Ɛ) = {Ɛ}
• If a is a symbol in ∑, then a is a regular expression, L(a) = {a}
• (r) | (s) is a regular expression denoting the language L(r) ∪ L(s)
• (r)(s) is a regular expression denoting the language L(r)L(s)
• (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)
17. Regular definitions
d1 -> r1
d2 -> r2
…
dn -> rn
• Example:
letter_ -> A | B | … | Z | a | b | … | z | _
digit -> 0 | 1 | … | 9
id -> letter_ (letter_ | digit)*
18. Extensions
• One or more instances: (r)+
• Zero or one instance: r?
• Character classes: [abc]
• Example:
• letter_ -> [A-Za-z_]
• digit -> [0-9]
• id -> letter_ (letter_ | digit)*
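The id definition above corresponds to the character-class pattern [A-Za-z_][A-Za-z0-9_]*. A minimal hand-written recognizer for it in C, as a sketch; the function name is illustrative.

#include <ctype.h>

/* Return 1 if s matches  id -> letter_ (letter_ | digit)* ,
 * i.e. [A-Za-z_][A-Za-z0-9_]* , and 0 otherwise. */
int is_identifier(const char *s) {
    if (!(isalpha((unsigned char)*s) || *s == '_'))      /* must start with letter_ */
        return 0;
    for (const char *p = s + 1; *p != '\0'; p++)
        if (!(isalnum((unsigned char)*p) || *p == '_'))   /* rest: letter_ or digit  */
            return 0;
    return 1;
}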
19. Recognition of tokens
• Starting point is the language grammar to understand the tokens:
stmt -> if expr then stmt
| if expr then stmt else stmt
| Ɛ
expr -> term relop term
| term
term -> id
| number
20. Recognition of tokens (cont.)
• The next step is to formalize the patterns:
digit -> [0-9]
digits -> digit+
number -> digits (. digits)? (E [+-]? digits)?
letter -> [A-Za-z_]
id -> letter (letter | digit)*
if -> if
then -> then
else -> else
relop -> < | > | <= | >= | = | <>
• We also need to handle whitespace:
ws -> (blank | tab | newline)+
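As an example of how the number pattern above can be recognized by hand, here is a minimal C sketch; the helper names digits and is_number are illustrative, and the exponent marker is taken literally as E, as in the pattern.

#include <ctype.h>

/* Helper: consume digit+ starting at *pp; return 1 if at least one digit was seen. */
static int digits(const char **pp) {
    const char *p = *pp;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    *pp = p;
    return 1;
}

/* Return 1 if s matches  number -> digits (. digits)? (E [+-]? digits)?  exactly. */
int is_number(const char *s) {
    if (!digits(&s)) return 0;
    if (*s == '.') {                       /* optional fraction */
        s++;
        if (!digits(&s)) return 0;
    }
    if (*s == 'E') {                       /* optional exponent */
        s++;
        if (*s == '+' || *s == '-') s++;
        if (!digits(&s)) return 0;
    }
    return *s == '\0';                     /* must consume the whole lexeme */
}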
25. Architecture of a transition-diagram-based lexical analyzer
TOKEN getRelop()
{
    TOKEN retToken = new(RELOP);
    while (1) {  /* repeat character processing until a return or failure occurs */
        switch (state) {
        case 0: c = nextchar();
            if (c == '<') state = 1;
            else if (c == '=') state = 5;
            else if (c == '>') state = 6;
            else fail();  /* lexeme is not a relop */
            break;
        case 1: …
        …
        case 8: retract();
            retToken.attribute = GT;
            return retToken;
        }
    }
}
26. Lexical Analyzer Generator - Lex
[Diagram: a Lex source program lex.l is fed to the Lex compiler, which produces lex.yy.c; the C compiler turns lex.yy.c into a.out; running a.out on an input stream yields the sequence of tokens.]
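A minimal sketch of a driver for the a.out produced by this pipeline, assuming the .l file's actions return non-zero integer token codes; yylex() and yytext are the conventional entry points exposed by the generated lex.yy.c (the exact declaration of yytext can differ between lex and flex).

#include <stdio.h>

/* Supplied by the Lex-generated lex.yy.c; the specific token codes would be
 * defined in the .l file or a shared header (often one produced by Yacc). */
extern int   yylex(void);   /* returns the next token code, 0 at end of input */
extern char *yytext;        /* the lexeme that was just matched               */

int main(void) {
    int tok;
    while ((tok = yylex()) != 0)                    /* consume tokens one by one */
        printf("token %d, lexeme \"%s\"\n", tok, yytext);
    return 0;
}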
41. Example 2 of leftmost and rightmost derivation
1. S -> AB | ε
2. A -> aB
3. B -> Sb
Derive "abb" by both leftmost and rightmost derivation.

Leftmost derivation:
S -> AB -> aBB -> aSbB -> abB -> abSb -> abb

Rightmost derivation (production used at each step in parentheses):
S -> AB -> ASb (B -> Sb) -> Ab (S -> ε) -> aBb (A -> aB) -> aSbb (B -> Sb) -> abb (S -> ε)
45. Parse tree or derivation tree
• The parse tree is a pictorial representation of a derivation; therefore, it is also known as a derivation tree. The derivation tree is independent of the order in which the productions are applied.
• A parse tree is an ordered tree in which each interior node is labeled with the left side of a production and its children are labeled with the symbols of that production's right side. A parse tree is also known as a syntax tree, generation tree, or production tree.
• A parse tree for a CFG G = (V, ∑, P, S) is a tree satisfying the following conditions −
46. Conditions
1. The root has the label S, where S is the start symbol.
2. Each vertex of the parse tree has a label which can be a variable (V), a terminal (Σ), or ε.
3. If A → C1 C2 … Cn is a production, then C1, C2, …, Cn are the children of the node labeled A.
4. Leaf nodes are terminals (Σ), and interior nodes are variables (V).
5. The label of an internal vertex is always a variable.
Yield or result − the yield of a derivation tree is the concatenation of the labels of the leaves in left-to-right order.
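A minimal C sketch of a parse-tree node that satisfies these conditions, built here for the tree of E => E+E => id+E => id+id; all names (Node, leaf, node1, node3, print_yield) are illustrative, not from the slides.

#include <stdio.h>
#include <stdlib.h>

/* Illustrative parse-tree node: a label (a variable, a terminal, or ε)
 * and an ordered list of children, mirroring conditions 1-5 above. */
typedef struct Node {
    const char  *label;
    struct Node *child[3];
    int          nchild;
} Node;

static Node *leaf(const char *label) {                 /* node with no children */
    Node *n = calloc(1, sizeof *n);
    n->label = label;
    return n;
}

static Node *node1(const char *label, Node *a) {       /* A -> C1 */
    Node *n = leaf(label);
    n->child[0] = a;
    n->nchild = 1;
    return n;
}

static Node *node3(const char *label, Node *a, Node *b, Node *c) {  /* A -> C1 C2 C3 */
    Node *n = leaf(label);
    n->child[0] = a; n->child[1] = b; n->child[2] = c;
    n->nchild = 3;
    return n;
}

/* Yield: concatenate the labels of the leaves in left-to-right order. */
static void print_yield(const Node *n) {
    if (n->nchild == 0) { printf("%s", n->label); return; }
    for (int i = 0; i < n->nchild; i++)
        print_yield(n->child[i]);
}

int main(void) {
    /* Parse tree for E => E+E => id+E => id+id, using E -> E + E and E -> id. */
    Node *t = node3("E",
                    node1("E", leaf("id")),
                    leaf("+"),
                    node1("E", leaf("id")));
    print_yield(t);                                    /* prints the yield: id+id */
    printf("\n");
    return 0;
}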
49. Consider the grammar given below −
E ⇒ E+E | E∗E | id
Find the leftmost and rightmost derivations for the string id+id+id.
Leftmost:
E ⇒ E+E
  ⇒ E+E+E
  ⇒ id+E+E
  ⇒ id+id+E
  ⇒ id+id+id
51. Example 1 − If a CFG has the productions
S → a A S | a
A → S b A | S S | b a
show that S ⇒* aabbaa and construct a parse tree whose yield is aa bb aa.
Solution
• S ⇒lm aAS
• ⇒ aSbAS
• ⇒ aabAS
• ⇒ aabbaS
• ⇒ aabbaa
∴ S ⇒* aa bb aa
52. S ⇒lm aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa
∴ S ⇒* aa bb aa
54. Example 2
• Let us consider this grammar: E -> E+E | id. We can create two different parse trees from this grammar for the string id+id+id. The following are the two parse trees generated by leftmost derivations: