PPL UNIT 1 NOTES
Preliminary Concepts
• Scientific applications
– In the early 1940s, computers were invented for scientific applications.
– These applications require a large number of floating-point computations.
– Fortran was the first language developed for scientific applications.
– ALGOL 60 was intended for the same use.
• Business applications
– The first successful language for business was COBOL.
– Produce reports, use decimal arithmetic and character data.
– The arrival of PCs started new ways for businesses to use computers.
– Spreadsheets and database systems were developed for business.
• Artificial intelligence
– Symbols rather than numbers are manipulated.
– Symbolic computation is more suitably done with linked lists than arrays.
– LISP was the first widely used AI programming language.
• Systems programming
– The O/S and all of the programming support tools are collectively known as its
system software.
– Need efficiency because of continuous use.
• Scripting languages
– Put a list of commands, called a script, in a file to be executed.
– PHP is a scripting language used on Web server systems. Its code is embedded in
HTML documents. The code is interpreted on the server before the document is sent to a
requesting browser.
• Special-purpose languages
• Readability : the ease with which programs can be read and understood
• Writability : the ease with which a language can be used to create programs
• Reliability : conformance to specifications (i.e., performs to its specifications)
• Cost : the ultimate total cost
Readability
• Overall simplicity
– A manageable set of features and constructs
– Minimal feature multiplicity (multiple means of doing the same operation)
– Minimal operator overloading
• Orthogonality
– A relatively small set of primitive constructs can be combined in a relatively small
number of ways
– Every possible combination is legal
• Control statements
– The presence of well-known control structures (e.g., while statement)
• Data types and structures
– The presence of adequate facilities for defining data structures
• Syntax considerations
– Identifier forms: flexible composition
– Special words and methods of forming compound statements
– Form and meaning: self-descriptive constructs, meaningful keywords
Writability
• Simplicity and Orthogonality
– Few constructs, a small number of primitives, a small set of rules for combining them
• Support for abstraction
– The ability to define and use complex structures or operations in ways that allow details
to be ignored
• Expressivity
– A set of relatively convenient ways of specifying operations
– Example: the inclusion of for statement in many modern languages
Reliability
• Type checking
– Testing for type errors
• Exception handling
– Intercept run-time errors and take corrective measures
• Aliasing
– Presence of two or more distinct referencing methods for the same memory location (see the C fragment after this list)
• Readability and writability
– A language that does not support “natural” ways of expressing an algorithm forces the use of
“unnatural” approaches, which reduces reliability
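A small hypothetical C fragment (not from the notes) illustrating aliasing:
int x = 0;
int *p = &x;   /* p is now an alias for x: two names, one memory cell       */
int *q = p;    /* q is yet another way to reach the same cell               */
*q = 5;        /* changes x even though x is never mentioned on this line   */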
Cost
• Training programmers to use language
• Writing programs (closeness to particular applications)
• Compiling programs
• Executing programs
• Language implementation system: availability of free compilers
• Reliability: poor reliability leads to high costs
• Maintaining programs
Others
• Portability
– The ease with which programs can be moved from one implementation to another
• Generality
– The applicability to a wide range of applications
• Well-defined
– The completeness and precision of the language's official definition
Computer Architecture
• Well-known computer architecture: Von Neumann
• Imperative languages, most dominant, because of von Neumann computers
– Data and programs stored in memory
– Memory is separate from CPU
– Instructions and data are piped from memory to CPU
– Basis for imperative languages
• Variables model memory cells
• Assignment statements model piping
• Iteration is efficient
Programming Methodologies
• 1950s and early 1960s: Simple applications; worry about machine efficiency
• Late 1960s: People efficiency became important; readability, better control structures
– structured programming
– top-down design and step-wise refinement
• Late 1970s: Process-oriented to data-oriented
– data abstraction
• Middle 1980s: Object-oriented programming
– Data abstraction + inheritance + polymorphism
1.5 Language Categories
• Imperative
– Central features are variables, assignment statements, and iteration
– Examples: C, Pascal
• Functional
– Main means of making computations is by applying functions to given parameters
– Examples: LISP, Scheme
• Logic
– Rule-based (rules are specified in no particular order)
– Example: Prolog
• Object-oriented
– Data abstraction, inheritance, late binding
– Examples: Java, C++
• Markup
– New; not a programming language per se, but used to specify the layout of information in Web
documents
– Examples: XHTML, XML
Language Design Trade-Offs
• Reliability vs. cost of execution
– Conflicting criteria
– Example: Java demands all references to array elements be checked for proper indexing
but that leads to increased execution costs
• Readability vs. writability
– Another conflicting criteria
– Example: APL provides many powerful operators (and a large number of new symbols),
allowing complex computations to be written in a compact program but at the cost of poor
readability
• Writability (flexibility) vs. reliability
– Another conflicting criteria
– Example: C++ pointers are powerful and very flexible but not reliably used
A(7) := 5 * B(6), written in an early two-dimensional notation (Zuse's Plankalkül):
  | 5 * B => A
V |     6    7      (array subscripts)
Problems with programming directly in machine code:
• Poor readability
• Poor modifiability
• Expression coding was tedious
• Machine deficiencies--no indexing or floating point
- SHORT CODE was a pseudocode invented in 1949 for the BINAC machine by Mauchly
- Expressions were coded, left to right
- Some operations:
1n => (n+2)nd power
- SPEEDCODING was a later pseudocode, invented in 1954 for the IBM 701 machine by Backus
- Pseudo ops for arithmetic and math functions
- Conditional and unconditional branching
- Autoincrement registers for array access
- Slow!
- Only 700 words left for user program
C. Laning and Zierler System - 1953
D. FORTRAN I - 1957
- Designed for the new IBM 704, which had index registers and floating point hardware
- The environment in which FORTRAN was developed:
1. Computers were small and unreliable
2. Applications were scientific
3. No programming methodology or tools
4. Machine efficiency was most important
- FORTRAN 90 (1990) later added:
- Dynamic arrays
- Pointers
- Recursion
- CASE statement
- Parameter type checking
J. ALGOL 58 - 1958
- Environment of development:
1. FORTRAN had (barely) arrived for IBM 70x
2. Many other languages were being developed, all for specific machines
3. No portable language; all were machine-dependent
4. No universal language for communicating algorithms
K. ALGOL 60 - 1960
- Successes:
- It was the standard way to publish algorithms for over 20 years
- All subsequent imperative languages are based on it
- First machine-independent language
- First language whose syntax was formally defined
- Failure: never widely used, especially in the U.S.
L. COBOL - 1960
- Environment of development:
- UNIVAC was beginning to use FLOW-MATIC
- USAF was beginning to use AIMACO
- IBM was developing COMTRAN
- Based on FLOW-MATIC
- FLOW-MATIC features:
- Names up to 12 characters, with embedded hyphens
- English names for arithmetic operators
- Data and code were completely separate
- Verbs were first word in every statement
First Design Meeting - May 1959
- Design goals:
1. Must look like simple English
2. Must be easy to use, even if that means it will be less powerful
3. Must broaden the base of computer users
4. Must not be biased by current compiler problems
- The design committee members were all from computer manufacturers and DoD branches
- Design problems: arithmetic expressions? subscripts? Fights among manufacturers
- Contributions:
- First macro facility in a high-level language
- Hierarchical data structures (records)
- Nested selection statements
- Long names (up to 30 characters), with hyphens
- Data Division
- Comments:
- First language required by DoD; would have failed without DoD
- Still the most widely used business applications language
M. BASIC - 1964
N. PL/I - 1965
1. Scientific computing
- IBM 1620 and 7090 computers
- FORTRAN
- SHARE user group
2. Business computing
- It looked like many shops would begin to need two kinds of computers, languages, and
support staff--too costly
- PL/I contributions:
1. First unit-level concurrency
2. First exception handling
3. Switch-selectable recursion
4. First pointer data type
5. First array cross sections
- Comments:
- Many new features were poorly designed
- Too large and too complex
- Was (and still is) actually used for both scientific and business applications
O. Early Dynamic Languages
c. SNOBOL (1964)
i. Designed as a string manipulation language
(at Bell Labs by Farber, Griswold, and Polensky)
ii. Powerful operators for string pattern matching
P. SIMULA 67 – 1967
- Designed primarily for system simulation (in Norway by Nygaard and Dahl)
- Based on ALGOL 60 and SIMULA I
- Primary Contribution:
- Coroutines - a kind of subprogram
- Implemented in a structure called a class
- Classes are the basis for data abstraction
- Classes are structures that include both local data and functionality
Q. ALGOL 68 – 1968
- From the continued development of ALGOL 60, but it is not a superset of that language
- Design is based on the concept of orthogonality
- Contributions:
1. User-defined data structures
2. Reference types
3. Dynamic arrays (called flex arrays)
- Comments:
- Had even less usage than ALGOL 60
- Had strong influence on subsequent languages, especially Pascal, C, and Ada
P. Pascal – 1971
- Designed by Wirth, who quit the ALGOL 68 committee (didn't like the direction of that work)
- Designed for teaching structured programming
- Small, simple, nothing really new
- Still the most widely used language for teaching programming in colleges (but use is shrinking)
Q. C - 1972
- Designed for systems programming (at Bell Labs, by Dennis Ritchie); evolved primarily from B
- Related descendants of Pascal: Modula-2 (Pascal plus modules and some low-level features designed for systems programming) and Delphi (Borland)
S. Prolog - 1972
- Non-procedural
U. Smalltalk - 1972-1980
W. Java (1995)
- Developed at Sun in the early 1990s
- Based on C++
- Significantly simplified
- Supports only OOP
- Has references, but not pointers
- Includes support for applets and a form of concurrency
1.7 Syntax and Semantics
Introduction
• Syntax: the form or structure of the expressions, statements, and program units
• Semantics: the meaning of the expressions, statements, and program units
• Syntax and semantics provide a language's definition
– Users of a language definition
– Other language designers
– Implementers
– Programmers (the users of the language)
• Language Recognizers
– A recognition device reads input strings of the language and decides whether the input
strings belong to the language
– Example: syntax analysis part of a compiler
• Language Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular sentence is correct by comparing it to
the structure of the generator
• Context-Free Grammars
• Developed by Noam Chomsky in the mid-1950s
• Language generators, meant to describe the syntax of natural languages
• Define a class of languages called context-free languages
BNF Rules
• A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of
terminal and nonterminal symbols
• A grammar is a finite nonempty set of rules
• An abstraction (or nonterminal symbol) can have more than one RHS
<stmt> → <single_stmt>
| begin <stmt_list> end
Describing Lists
• Syntactic lists are described using recursion
<ident_list> → ident
| ident, <ident_list>
• A derivation is a repeated application of rules, starting with the start symbol and ending
with a sentence (all terminal symbols)
An Example Grammar
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
Parse Tree
A hierarchical representation of a derivation
An example derivation (the corresponding parse tree is Figure 1.2):
<program> => <stmts>
          => <stmt>
          => <var> = <expr>
          => a = <expr>
          => a = <term> + <term>
          => a = <var> + <term>
          => a = b + <term>
          => a = b + const
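Figure 1.2 itself is not reproduced in these notes; a plain-text rendering of the parse tree for a = b + const, following the grammar above, would be:
<program>
 |- <stmts>
     |- <stmt>
         |- <var>  -> a
         |- =
         |- <expr>
             |- <term> -> <var> -> b
             |- +
             |- <term> -> const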
Derivation
• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the
one that is expanded
• A derivation may be neither leftmost nor rightmost
Ambiguity in Grammars
• A grammar is ambiguous iff it generates a sentential form that has two or more distinct
parse trees
An Unambiguous Expression Grammar
If we use the parse tree to indicate precedence levels of the operators, we cannot have
ambiguity
<expr> → <expr> - <term>|<term>
<term> → <term> / const|const
Figure 1.3: An Ambiguous Expression Grammar. Figure 1.4: An Unambiguous Expression Grammar.
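For contrast, the ambiguous grammar that Figure 1.3 refers to is not written out in these notes; the standard textbook example (an assumption here, shown for comparison) is:
<expr> → <expr> <op> <expr> | const
<op> → / | -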
Attribute Grammars
• Context-free grammars (CFGs) cannot describe all of the syntax of programming languages
• Additions to CFGs to carry some semantic info along parse trees
• Primary value of attribute grammars (AGs):
– Static semantics specification
– Compiler design (static semantics checking)
Definition
• An attribute grammar is a context-free grammar G = (S, N, T, P) with the following additions:
– For each grammar symbol x there is a set A(x) of attribute values
– Each rule has a set of functions that define certain attributes of the nonterminals in
the rule
– Each rule has a (possibly empty) set of predicates to check for attribute consistency
– Let X0 → X1 ... Xn be a rule
– Functions of the form S(X0) = f(A(X1), ... , A(Xn)) define synthesized attributes
– Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for 1 <= j <= n, define
inherited attributes
– Initially, there are intrinsic attributes on the leaves
Example
• Syntax
<assign> → <var> = <expr>
<expr> → <var> + <var> | <var>
<var> → A | B | C
• actual_type: synthesized for <var> and <expr>
• expected_type: inherited for <expr>
• Syntax rule : <expr> → <var>[1] + <var>[2]
Semantic rule : <expr>.actual_type ← <var>[1].actual_type
Predicates : <var>[1].actual_type == <var>[2].actual_type
<expr>.expected_type == <expr>.actual_type
• Syntax rule : <var> → id
Semantic rule : <var>.actual_type ← lookup(<var>.string)
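As an illustration only, the checks implied by these rules could be evaluated roughly as in the following C sketch; the Type enum, the lookup function, and the hard-wired declared types are assumptions for the example, not part of the notes.
#include <stdio.h>

typedef enum { INT_TYPE, REAL_TYPE } Type;

/* Intrinsic attribute of a leaf <var>: its declared type from the symbol table */
Type lookup(char name) {
    return (name == 'A') ? INT_TYPE : REAL_TYPE;   /* assumed declarations */
}

int main(void) {
    /* <var>.actual_type <- lookup(<var>.string)      (synthesized, at the leaves) */
    Type var1_actual = lookup('A');
    Type var2_actual = lookup('B');

    /* <expr>.actual_type <- <var>[1].actual_type     (synthesized, passed up)     */
    Type expr_actual = var1_actual;

    /* <expr>.expected_type comes down from the <assign> rule (inherited)          */
    Type expr_expected = INT_TYPE;

    /* Predicates: attribute-consistency checks = static semantics checks          */
    if (var1_actual != var2_actual)
        printf("type error: operands of + have different types\n");
    if (expr_actual != expr_expected)
        printf("type error: expression type does not match expected type\n");
    return 0;
}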
Denotational Semantics
– Based on recursive function theory
– The most abstract semantics description method
– Originally developed by Scott and Strachey (1970)
– The process of building a denotational spec for a language (not necessarily easy):
– Define a mathematical object for each language entity
– Define a function that maps instances of the language entities onto instances of the
corresponding mathematical objects
– The meaning of language constructs is defined by only the values of the
program's variables
– The difference between denotational and operational semantics: In
operational semantics, the state changes are defined by coded algorithms;
in denotational semantics, they are defined by rigorous mathematical
functions
– The state of a program is the values of all its current variables
s = {<i1, v1>, <i2, v2>, …, <in, vn>}
– Let VARMAP be a function that, when given a variable name and a state, returns
the current value of the variable
VARMAP(ij, s) = vj
• Decimal Numbers
– The following denotational semantics description maps decimal numbers as strings
of symbols into numeric values
<dec_num> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
| <dec_num> (0 | 1 | 2 | 3 | 4 |5 | 6 | 7 | 8 | 9)
Mdec('0') = 0, Mdec('1') = 1, …, Mdec('9') = 9
Mdec(<dec_num> '0') = 10 * Mdec(<dec_num>)
Mdec(<dec_num> '1') = 10 * Mdec(<dec_num>) + 1
…
Mdec(<dec_num> '9') = 10 * Mdec(<dec_num>) + 9
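A minimal C sketch of this mapping, treating the decimal number as a character string and computing its value exactly as the rules above do; the function name mdec is illustrative, not from the notes:
#include <stdio.h>

/* Mdec('d')           = d
   Mdec(<dec_num> 'd') = 10 * Mdec(<dec_num>) + d      */
long mdec(const char *digits, int len) {
    if (len == 1)
        return digits[0] - '0';
    return 10 * mdec(digits, len - 1) + (digits[len - 1] - '0');
}

int main(void) {
    printf("%ld\n", mdec("307", 3));   /* prints 307 */
    return 0;
}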
Expressions
• Map expressions onto Z ∪ {error}
• We assume expressions are decimal numbers, variables, or binary expressions
having one arithmetic operator and two operands, each of which can be an
expression
• Assignment Statements
– Maps state sets to state sets
• Logical Pretest Loops
– Maps state sets to state sets
• The meaning of the loop is the value of the program variables after the statements
in the loop have been executed the prescribed number of times, assuming there
have been no errors
• In essence, the loop has been converted from iteration to recursion, where the
recursive control is mathematically defined by other recursive state mapping
functions (a sketch of this mapping follows this list)
• Recursion, when compared to iteration, is easier to describe with mathematical rigor
• Evaluation of denotational semantics
– Can be used to prove the correctness of programs
– Provides a rigorous way to think about programs
– Can be an aid to language design
– Has been used in compiler generation systems
– Because of its complexity, it is of little use to language users
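Referring back to the logical pretest loop, the standard recursive state-mapping definition looks roughly like the following (a sketch in the same style as Mdec above; Mb and Msl are the assumed mappings for Boolean expressions and statement lists):
Ml(while B do L, s) =
    if Mb(B, s) == undef
        then error
    else if Mb(B, s) == false
        then s
    else if Msl(L, s) == error
        then error
    else Ml(while B do L, Msl(L, s))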
1.8 Lexical Analysis
Introduction
A lexical analyzer is essentially a pattern matcher. A pattern matcher attempts to find
a substring of a given string of characters that matches a given character pattern.
Pattern matching is a traditional part of computing. One of the earliest uses of pattern
matching was with text editors, such as the ed line editor, which was introduced in an
early version of UNIX. Since then, pattern matching has found its way into some
programming languages—for example, Perl and JavaScript. It is also available through
the standard class libraries of Java, C++, and C#.
Lexemes and tokens
A lexical analyzer serves as the front end of a syntax analyzer. Technically, lexical
analysis is a part of syntax analysis. A lexical analyzer performs syntax analysis at the
lowest level of program structure. An input program appears to a compiler as a single
string of characters. The lexical analyzer collects characters into logical groupings and
assigns internal codes to the groupings according to their structure. These logical
groupings are named lexemes, and the internal codes for categories of these
groupings are named tokens. Lexemes are recognized by matching the input
character string against character string patterns. Although tokens are usually
represented as integer values, for the sake of readability of lexical and syntax
analyzers, they are often referenced through named constants.
Lexical analyzers extract lexemes from a given input string and produce the
corresponding tokens. Lexical analyzers are subprograms that locate the next lexeme
in the input, determine its associated token code, and return them to the caller, which
is the syntax analyzer. So, each call to the lexical analyzer returns a single lexeme and
its token. The only view of the input program seen by the syntax analyzer is the output
of the lexical analyzer, one token at a time.
The lexical-analysis process includes skipping comments and white space outside
lexemes, as they are not relevant to the meaning of the program. Also, the lexical
analyzer inserts lexemes for user-defined names into the symbol table, which is used
by later phases of the compiler. Finally, lexical analyzers detect syntactic errors in
tokens, such as ill-formed floating-point literals, and report such errors to the user.
There are three approaches to building a lexical analyzer:
1. Write a formal description of the token patterns of the language using a descriptive
language related to regular expressions. These descriptions are used as input to a
software tool that automatically generates a lexical analyzer. There are many such
tools available for this. The oldest of these, named lex, is commonly included as part
of UNIX systems.
These regular expressions are the basis for the pattern-matching facilities now
part of many programming languages, either directly orthrough a class library.
2. Design a state transition diagram that describes the token patterns of the language
and write a program that implements the diagram.
3. Design a state transition diagram that describes the token patterns of the language
and hand-construct a table-driven implementation of the state diagram.
Assume that the variable names consist of strings of uppercase letters, lowercase
letters, and digits but must begin with a letter. Names have no length limitation. The
first thing to observe is that there are 52 different characters (any uppercase or
lowercase letter) that can begin a name, which require 52 transitions from the
transition diagram’s initial state.
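A small C sketch of approach 2 for exactly this name pattern (a letter followed by letters or digits); the function name and the hypothetical IDENT code are illustrative, not part of the notes:
#include <ctype.h>

#define IDENT 11          /* assumed token code for a name */

/* Starting at *p, follow the diagram: one letter leaves the initial state,
   then letters and digits loop back to the "in a name" state. On success the
   pointer is advanced past the lexeme and IDENT is returned; otherwise -1. */
int get_name(const char **p) {
    const char *s = *p;
    if (!isalpha((unsigned char)*s))      /* one of the 52 possible starting letters */
        return -1;
    s++;
    while (isalnum((unsigned char)*s))    /* letter/digit transitions back to the same state */
        s++;
    *p = s;
    return IDENT;
}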
The part of the process of analyzing syntax that is referred to as syntax analysis is often called
parsing. We will use these two terms interchangeably.
This section discusses the general parsing problem and introduces the two main categories
of parsing algorithms, top-down and bottom-up, as well as the complexity of the parsing
process.
Introduction to Parsing
Parsers for programming languages construct parse trees for given programs. In some
cases, the parse tree is only implicitly constructed, meaning that perhaps only a traversal of the
tree is generated. But in all cases, the information required to build the parse tree is created
during the parse. Both parse trees and derivations include all of the syntactic information needed
by a language processor.
There are two distinct goals of syntax analysis: First, the syntax analyzer must check the input
program to determine whether it is syntactically correct. When an error is found, the analyzer
must produce a diagnostic message and recover. In this case, recovery means it must get back
to a normal state and continue its analysis of the input program. This step is required so that
the compiler finds as many errors as possible during a single analysis of the input program. If it
is not done well, error recovery may create more errors, or at least more error messages. The
second goal of syntax analysis is to produce a complete parse tree, or at least trace the structure
of the complete parse tree, for syntactically correct input. The parse tree (or its trace) is used
as the basis for translation.
Parsers are categorized according to the direction in which they build parse trees. The
two broad classes of parsers are top-down, in which the tree is built from the root downward
to the leaves, and bottom-up, in which the parse tree is built from the leaves upward to the
root.
For programming languages, terminal symbols are the small-scale syntactic constructs of the
language, what we have referred to as lexemes. The nonterminal symbols of programming
languages are usually connotative names or abbreviations, surrounded by angle brackets—for
example,<while_statement>, <expr>, and <function_def>. The sentences of a language
(programs, in the case of a programming language) are strings of terminals. Mixed strings
describe right-hand sides (RHSs) of grammar rules and are used in parsing algorithms.
Top-Down Parsers
A top-down parser traces or builds a parse tree in preorder. A preorder traversal of a parse tree
begins with the root. Each node is visited before its branches are followed. Branches from a
particular node are followed in left-to-right order. This corresponds to a leftmost derivation.
Given a sentential form that is part of a leftmost derivation, the parser’s task is to find the
next sentential form in that leftmost derivation. The general form of a left sentential form is xAα,
where, by our notational conventions, x is a string of terminal symbols, A is a nonterminal, and α is a
mixed string. Because x contains only terminals, A is the leftmost nonterminal in the sentential
form, so it is the one that must be expanded to get the next sentential form in a leftmost derivation.
Determining the next sentential form is a matter of choosing the correct grammar rule that has A
as its LHS. For example, if the current sentential form is xAα and the A-rules are A→bB, A→cBb,
and A→a, a top-down parser must choose among these three rules to get the next sentential form,
which could be xbBα, xcBbα, or xaα. This is the parsing decision problem for top-down parsers.
The most common top-down parsing algorithms are closely related. A recursive-descent
parser is a coded version of a syntax analyzer based directly on the BNF description of the
syntax of language. The most common alternative to recursive descent is to use a parsing table,
rather than code, to implement the BNF rules. Both of these, which are called LL algorithms,
are equally powerful, meaning they work on the same subset of all context-free grammars. The
first L in LL specifies a left-to-right scan of the input; the second L specifies that a leftmost
derivation is generated. The recursive-descent approach to implementing an LL parser is
introduced.
Bottom-Up Parsers
A bottom-up parser constructs a parse tree by beginning at the leaves and progressing toward
the root. This parse order corresponds to the reverse of a rightmost derivation. That is, the
sentential forms of the derivation are produced in order of last to first. In terms of the derivation,
a bottom-up parser can be described as follows: Given a right sentential form α, the parser must
determine what substring of α is the RHS of the rule in the grammar that must be reduced to its
LHS to produce the previous sentential form in the rightmost derivation. For example, the first
step for a bottom-up parser is to determine which substring of the initial given sentence is the
RHS to be reduced to its corresponding LHS to get the second last sentential form in the
derivation. The process of finding the correct RHS to reduce is complicated by the fact that a
given right sentential form may include more than one RHS from the grammar of the language
being parsed. The correct RHS is called the handle. A right sentential form is a sentential form
that appears in a rightmost derivation.
Consider the following grammar and derivation:
S → aAc
A→aA | b
S => aAc => aaAc => aabc
A bottom-up parser of this sentence, aabc, starts with the sentence and must find the handle in
it. In this example, this is an easy task, for the string contains only one RHS, b. When the parser
replaces b with its LHS, A, it gets the second to last sentential form in the derivation, aaAc. In the
general case, as stated previously, finding the handle is much more difficult, because a sentential
form may include several different RHSs.
A bottom-up parser finds the handle of a given right sentential form by examining the symbols on
one or both sides of a possible handle. Symbols to the right of the possible handle are usually
tokens in the input that have not yet been analysed. The most common bottom-up parsing
algorithms are in the LR family, where the L specifies a left-to-right scan of the input and the R
specifies that a rightmost derivation is generated.
Recursive-Descent Parsing
An EBNF grammar for arithmetic expressions, such as this one, does not force any associativity rule.
Therefore, when using such a grammar as the basis for a compiler, one must take care to ensure that the
code generation process, which is normally driven by syntax analysis, produces code that adheres to the
associativity rules of the language. This can be done easily when recursive-descent parsing is used. In the
following recursive-descent function, expr, the lexical analyser is the function lex. It gets
the next lexeme and puts its token code in the global variable nextToken. The token codes are defined as
named constants.
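These notes do not list the named constants or the lex function itself; a minimal sketch of the declarations the parsing functions below assume might look like this (the numeric values are illustrative):
/* Token codes (values are illustrative) */
#define INT_LIT      10
#define IDENT        11
#define ADD_OP       21
#define SUB_OP       22
#define MULT_OP      23
#define DIV_OP       24
#define LEFT_PAREN   25
#define RIGHT_PAREN  26

int nextToken;        /* token code of the current lexeme           */
void lex(void);       /* gets the next lexeme and sets nextToken    */
void error(void);     /* reports a syntax error                     */
void expr(void), term(void), factor(void);   /* the parsing functions */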
A recursive-descent subprogram for a rule with a single RHS is relatively simple. For each terminal
symbol in the RHS, that terminal symbol is compared with nextToken. If they do not match, it is a syntax
error. If they match, the lexical analyser is called to get the next input token. For each nonterminal, the
parsing subprogram for that nonterminal is called.
The recursive-descent subprogram for the first rule in the previous example grammar, written in C, is
/* expr
Parses strings in the language generated by the rule:
<expr> -> <term> {(+ | -) <term>}
*/
void expr() {
printf("Enter <expr>\n");
/* Parse the first term */
term();
/* As long as the next token is + or -, get the next token and parse the next term */
while (nextToken == ADD_OP || nextToken == SUB_OP) {
lex();
term();
}
printf("Exit <expr>\n");
} /* End of function expr */
Notice that the expr function includes tracing output statements, which are included to produce the example
output shown later.
Recursive-descent parsing subprograms are written with the convention that each one leaves the next
token of input in nextToken. So, whenever a parsing function begins, it assumes that nextToken has the
code for the leftmost token of the input that has not yet been used in the parsing process.
The part of the language that the expr function parses consists of one or more terms, separated by either
plus or minus operators. This is the language generated by the nonterminal <expr>. Therefore, first it calls
the function that parses terms (term). Then it continues to call that function as long as it finds ADD_OP or
SUB_OP tokens (which it passes over by calling lex). This recursive-descent function is simpler than most,
because its associated rule has only one RHS. Furthermore, it does not include any code for syntax error
detection or recovery, because there are no detectable errors associated with the grammar rule.
A recursive-descent parsing subprogram for a nonterminal whose rule has more than one RHS begins with
code to determine which RHS is to be parsed. Each RHS is examined (at compiler construction time) to
determine the set of terminal symbols that can appear at the beginning of sentences it can generate. By
matching these sets against the next token of input, the parser can choose the correct RHS.
/* term
Parses strings in the language generated by the rule:
<term> -> <factor> {(* | /) <factor>}
*/
void term() {
printf("Enter <term>\n");
/* Parse the first factor */
factor();
/* As long as the next token is * or /, get the next token and
parse the next factor */
while (nextToken == MULT_OP || nextToken == DIV_OP) {
lex();
factor();
}
printf("Exit <term>\n");
} /* End of function term */
The function for the <factor> nonterminal of our arithmetic expression grammar must choose between its
two RHSs. It also includes error detection. In the function for <factor>, the reaction to detecting a syntax
error is simply to call the error function. In a real parser, a diagnostic message must be produced when an
error is detected. Furthermore, parsers must recover from the error so that the parsing process can
continue.
/* factor
Parses strings in the language generated by the rule:
<factor> -> id | int_constant | ( <expr> )
*/
void factor() {
printf("Enter <factor>\n");
/* Determine which RHS */
if (nextToken == IDENT || nextToken == INT_LIT)
/* Get the next token */
lex();
/* If the RHS is ( <expr> ), call lex to pass over the left
parenthesis, call expr, and check for the right parenthesis */
else {
if (nextToken == LEFT_PAREN) {
lex();
expr();
if (nextToken == RIGHT_PAREN)
lex();
else
error();
} /* End of if (nextToken == ... */
/* It was not an id, an integer literal, or a left parenthesis */
else
error();
} /* End of else */
printf("Exit <factor>\n");
} /* End of function factor */
Following is the trace of the parse of the example expression (sum + 47) / total, using the parsing functions
expr, term, and factor, and the function lex. Note that the parse begins by calling lex and the start symbol
routine, in this case, expr.
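The full trace listing is not reproduced in these notes. Omitting the token/lexeme lines printed by lex (whose exact format is not shown here), the Enter/Exit portion produced by the three functions above for (sum + 47) / total would be:
Enter <expr>
Enter <term>
Enter <factor>
Enter <expr>
Enter <term>
Enter <factor>
Exit <factor>
Exit <term>
Enter <term>
Enter <factor>
Exit <factor>
Exit <term>
Exit <expr>
Exit <factor>
Enter <factor>
Exit <factor>
Exit <term>
Exit <expr>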
A parsing function for a control statement, such as an if statement, would similarly use parsing functions for
statements and Boolean expressions that are not described here.