Chapter 2

The document discusses syntax and semantics in programming languages. It defines syntax as the form or structure of expressions, statements, and program units, while semantics refers to their meaning. Syntax can be defined using rules, but semantics is more complex. An example is given of the syntax and semantics of an if statement. Backus-Naur Form (BNF) is introduced as a method to formally describe a language's syntax using production rules and terminals. BNF grammars include non-terminals, terminals, a start symbol, and rules to generate sentences.

Uploaded by

Sølø Ëd

Principles of Programming Languages

Chapter 2
Syntax and Semantics

Zebiba N 1
Description of a Language
• Syntax: the form or structure of the
expressions, statements, and program units

• Semantics: the meaning of the expressions,
statements, and program units
– What programs do, their behavior and meaning

Describing Syntax and Semantics
• Syntax is defined using some kind of rules
– Specifying how statements, declarations, and other
language constructs are written
• Semantics is more complex and involved. It is
harder to define formally and is often given
informally, e.g., in natural-language documentation
• Example: if statement
– Syntax: if (<expr>) <statement>
– Semantics: if <expr> is true, execute <statement>
• Detecting syntax errors is easier; detecting
semantic errors is much harder
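
The difference between the two kinds of error can be sketched in Python, used here only for illustration since the slides use C-like syntax. The helper name is_syntactically_valid and both sample snippets are assumptions made up for this sketch.

```python
def is_syntactically_valid(source: str) -> bool:
    """Return True if `source` is well-formed Python, False on a syntax error."""
    try:
        compile(source, "<example>", "exec")  # compiled only; nothing is executed
        return True
    except SyntaxError:
        return False


# Missing closing parenthesis: rejected before the program ever runs.
bad_syntax = "if (x > 0:\n    y = 1\n"

# Well-formed, but the meaning is wrong: an average of two values divides by 2.
bad_semantics = "def average(a, b):\n    return (a + b) / 3\n"

print(is_syntactically_valid(bad_syntax))     # False
print(is_syntactically_valid(bad_semantics))  # True
```

The syntax check catches the first snippet mechanically, while nothing in the grammar can flag the second one.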

What is a Language?
• In programming language terminology, a
language is a set of sentences
• A sentence is a string of characters over some
alphabet
– The meaning of a “sentence” is very general. In
English, it may be an English sentence, a paragraph,
or all the text in a book, or hundreds of books, …
• Every C program, if it can be compiled properly, is a
sentence of the C language
– No matter whether it is “hello world” or a program
with several million lines of code

Definition of a Language
• The syntax of a language can be defined by a
set of syntax rules
• The syntax rules of a language specify which
sentences are in the language, i.e., which
sentences are legal sentences of the language

Syntax Rules
A hierarchical structure of language
• A more concise representation:
<sentence> → <noun> <verb> <preposition> <noun>
<noun> → place
<verb> → “is” | “belongs”
<preposition> → “in” | “to”
• With these rules, we can generate the following sentences:
A is in B
B is in A
B belongs to A
• They are all in language X
– Its alphabet includes “is”, “belongs”, “in”, “to”, place
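
As a rough sketch, the whole of language X can be enumerated by trying every combination the rules allow. It assumes the token place has only the two lexemes A and B; the names PLACES, VERBS, PREPOSITIONS, and sentences_of_x are hypothetical.

```python
from itertools import product

PLACES = ["A", "B"]          # assumption: the token `place` has lexemes A and B
VERBS = ["is", "belongs"]
PREPOSITIONS = ["in", "to"]


def sentences_of_x():
    """Yield every sentence of <sentence> -> <noun> <verb> <preposition> <noun>."""
    for noun1, verb, prep, noun2 in product(PLACES, VERBS, PREPOSITIONS, PLACES):
        yield f"{noun1} {verb} {prep} {noun2}"


language_x = set(sentences_of_x())
print(len(language_x))                  # 16
print("B belongs to A" in language_x)   # True
```

Under that assumption the language is finite: 2 x 2 x 2 x 2 = 16 sentences.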

Checking Syntax of a Sentence
• How to check if the following sentence is in
the language X?
A belongs in B
• Idea: check if you can generate that sentence
 This is called parsing
• How?
Try to match the input sentence with the
structure of the language

Matching the Language Structure
<sentence>
    <noun>: A
    <verb>: belongs
    <preposition>: in
    <noun>: B
So, the sentence A belongs in B is in the language X!
The above structure is called a parse tree
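
The matching step can be sketched as a small recognizer. This is a minimal illustration, assuming the token place has the lexemes A and B; in_language_x is a hypothetical name.

```python
PLACES = {"A", "B"}            # assumption: lexemes of the token `place`
VERBS = {"is", "belongs"}
PREPOSITIONS = {"in", "to"}


def in_language_x(sentence: str) -> bool:
    """Match the words against <noun> <verb> <preposition> <noun>."""
    words = sentence.split()
    if len(words) != 4:
        return False
    noun1, verb, prep, noun2 = words
    return (noun1 in PLACES and verb in VERBS
            and prep in PREPOSITIONS and noun2 in PLACES)


print(in_language_x("A belongs in B"))  # True
print(in_language_x("A in B"))          # False
```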

Formal Description of Syntax
Most widely known methods for describing
syntax:
• Context-Free Grammars
– Developed by Noam Chomsky
– Define a class of languages: context-free
languages
• Backus-Naur Form
– Invented by John Backus to describe ALGOL
– Equivalent to context-free grammars

Backus-Naur Form
• Backus-Naur Form (BNF)
– Add recursion to regular expressions
• Nested constructions
– Equivalent to CFGs in power
– CFG
expression → identifier | number | - expression
    | ( expression )
    | expression operator expression
operator → + | - | * | /
– BNF
expression → identifier | number | - expression
    | ( expression )
    | expression operator expression
operator → + | - | * | /
BNF
• BNF stands for either Backus-Naur Form or
Backus Normal Form
• BNF is a metalanguage used to describe the
grammar of a programming language
• BNF is formal and precise
– BNF is a notation for context-free grammars
• BNF is essential in compiler construction
• There are many dialects of BNF in use, but…
• …the differences are almost always minor
BNF Terminologies
• A lexeme is the lowest level syntactic unit of a
language (e.g., A, B, is, in)
• A token is a category of lexemes (e.g., place)
• A BNF grammar consists of four parts:
– The set of tokens and lexemes (terminals)
– The set of non-terminals, e.g., <sentence>, <verb>
– The start symbol, e.g., <sentence>
– The set of production rules, e.g.,
<sentence> → <noun> <verb> <preposition> <noun>
<noun> → place
<verb> → “is” | “belongs”
<preposition> → “in” | “to”

BNF Terminologies
• Tokens and lexemes are smallest units of syntax
– Lexemes appear literally in program text
• Non-terminals stand for larger pieces of syntax
– Do NOT occur literally in program text
– The grammar says how they can be expanded into
strings of tokens or lexemes
• The start symbol is the particular non-terminal
that forms the starting point of generating a
sentence of the language

BNF Rules
• A rule has a left-hand side (LHS) and a right-hand
side (RHS)
– LHS is a single non-terminal, which is what makes the grammar context-free
– RHS contains one or more terminals or non-terminals
– A rule tells how LHS can be replaced by RHS, or how
RHS is grouped together to form a larger syntactic unit
(LHS), i.e., traversing the parse tree down and up
– A nonterminal can have more than one RHS
– A syntactic list can be described using recursion
<ident_list> → ident | ident, <ident_list>

BNF
• < > indicate a nonterminal that needs to be
further expanded, e.g. <variable>
• Symbols not enclosed in < > are terminals;
they represent themselves, e.g. if, while, (
• The symbol ::= means is defined as
• The symbol | means or; it separates
alternatives, e.g. <addop> ::= + | -
• This is all there is to “plain” BNF; but we will
discuss extended BNF (EBNF) later in this
lecture
BNF uses recursion
• <integer> ::= <digit> | <integer> <digit>
or
<integer> ::= <digit> | <digit> <integer>
• Recursion is all that is needed (at least, in a
formal sense)
• "Extended BNF" allows repetition as well as
recursion
• Repetition is usually better when using BNF to
construct a compiler
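
A minimal sketch of the two styles, written as recognizers for unsigned integers. The first follows the recursive rule directly; the second assumes the EBNF-style form <integer> ::= <digit> { <digit> }, which maps onto a loop. Both function names are hypothetical.

```python
DIGITS = "0123456789"


def is_integer_recursive(text: str) -> bool:
    """<integer> ::= <digit> | <digit> <integer>"""
    if len(text) == 0:
        return False
    if len(text) == 1:
        return text in DIGITS
    return text[0] in DIGITS and is_integer_recursive(text[1:])


def is_integer_repetition(text: str) -> bool:
    """EBNF style (assumed form): <integer> ::= <digit> { <digit> }"""
    if not text or text[0] not in DIGITS:
        return False
    for ch in text[1:]:          # the { <digit> } part: zero or more repetitions
        if ch not in DIGITS:
            return False
    return True


print(is_integer_recursive("2024"), is_integer_repetition("2024"))  # True True
print(is_integer_recursive("20a4"), is_integer_repetition("20a4"))  # False False
```

Both accept the same strings; the loop version avoids one function call per digit, which is one reason repetition is convenient in a hand-written parser.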

BNF Examples I
• <digit> ::=
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

• <if statement> ::=
if ( <condition> ) <statement>
| if ( <condition> ) <statement>
else <statement>

BNF Examples II
• <unsigned integer> ::=
<digit> | <unsigned integer> <digit>

• <integer> ::=
<unsigned integer>
| + <unsigned integer>
| - <unsigned integer>

BNF Examples III
• <identifier> ::=
<letter>
| <identifier> <letter>
| <identifier> <digit>
• <block> ::= { <statement list> }
• <statement list> ::=
<statement>
| <statement list> <statement>

BNF Examples IV
• <statement> ::=
<block>
| <assignment statement>
| <break statement>
| <continue statement>
| <do statement>
| <for loop>
| <goto statement>
| <if statement>
| ...
Limitations of BNF
• No easy way to impose length limitations, such
as maximum length of variable names
• No easy way to describe ranges, such as 1 to 31
• No way at all to impose distributed
requirements, such as, a variable must be
declared before it is used
• Describes only syntax, not semantics
• Nothing clearly better has been devised

Grammar and Derivation
A grammar is a generative device for defining a language.
- The sentences of the language are generated through a sequence of applications of
the rules, beginning with a special nonterminal of the grammar called the start
symbol.
<program> → <stmts>
<stmts> → <stmt> | <stmt> ; <stmts>
<stmt> → <var> = <expr>
<var> → a | b | c | d
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<program> is the start symbol (a non-terminal).
a, b, c, d, const, +, -, ;, = are the terminals
• A derivation is a repeated application of rules,
starting with the start symbol and ending with a
sentence (all terminal symbols),
• e.g. a=b+const.
<program> => <stmts>
=> <stmt>
=> <var> = <expr>
=> a = <expr>
=> a = <term> + <term>
=> a = <var> + <term>
=> a = b + <term>
=> a = b + const
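
The derivation above can be replayed mechanically. The sketch below always rewrites the leftmost non-terminal with the next chosen right-hand side; it does not check that each choice is a legal rule, and leftmost_derivation and choices are hypothetical names.

```python
def leftmost_derivation(start, choices):
    """Apply each chosen right-hand side to the leftmost non-terminal in turn."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in choices:
        # The leftmost symbol written in angle brackets is the next non-terminal.
        index = next(i for i, sym in enumerate(form) if sym.startswith("<"))
        form[index:index + 1] = rhs
        steps.append(" ".join(form))
    return steps


choices = [
    ["<stmts>"],                  # <program> -> <stmts>
    ["<stmt>"],                   # <stmts>   -> <stmt>
    ["<var>", "=", "<expr>"],     # <stmt>    -> <var> = <expr>
    ["a"],                        # <var>     -> a
    ["<term>", "+", "<term>"],    # <expr>    -> <term> + <term>
    ["<var>"],                    # <term>    -> <var>
    ["b"],                        # <var>     -> b
    ["const"],                    # <term>    -> const
]

steps = leftmost_derivation("<program>", choices)
print("\n=> ".join(steps))
```

The last line printed is the sentence a = b + const, reached after eight rule applications.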

Parse Tree
• Shows the hierarchical structure of a sentence of the language
• A hierarchical representation of a derivation
Parse tree for a = b + const:
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
Grammar and Parse Tree
• The grammar can be viewed as a set of rules
that say how to build a parse tree
• You put <S> at the root of the tree
• Add children to every non-terminal, following
any one of the rules for that non-terminal
• Done when all the leaves are tokens
• Read off leaves from left to right—that is the
string derived by the tree
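
Reading the leaves from left to right can be sketched with a parse tree stored as nested tuples. The representation (label first, then children, with plain strings as tokens) is an assumption chosen for brevity.

```python
# A node is (label, child, child, ...); a leaf is a plain string (a token).
tree = ("<program>",
        ("<stmts>",
         ("<stmt>",
          ("<var>", "a"),
          "=",
          ("<expr>",
           ("<term>", ("<var>", "b")),
           "+",
           ("<term>", "const")))))


def leaves(node):
    """Return the tokens at the leaves, read from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node[1:]:      # node[0] is the non-terminal's label
        result.extend(leaves(child))
    return result


print(" ".join(leaves(tree)))   # a = b + const
```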

Ambiguity in Grammars
• If a sentential form can be generated by two or
more distinct parse trees, the grammar is said to
be ambiguous, because it has two or more
different meanings
• Problem with ambiguity:
– Consider the following grammar and the sentence
a+b*c

<exp> → <exp> + <exp> | <exp> * <exp>
| (<exp>) | a | b | c

An Ambiguous Grammar
• Two different parse trees for a+b*c

First tree, means (a+b)*c:
<exp>
  <exp>
    <exp>
      a
    +
    <exp>
      b
  *
  <exp>
    c

Second tree, means a+(b*c):
<exp>
  <exp>
    a
  +
  <exp>
    <exp>
      b
    *
    <exp>
      c

Consequences
• The compiler will generate different code,
depending on which parse tree it builds
– According to convention, we would like to use the
parse tree that means a+(b*c)
• Cause of the problem:
The grammar does not encode the semantics of operator precedence
– Precedence applies when the order of evaluation is not
completely decided by parentheses
– Each operator has a precedence level, and those with
higher precedence are performed before those with
lower precedence, as if parenthesized
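
To see why the choice of parse tree matters, the two trees for a+b*c can be evaluated directly. This is a sketch with arbitrarily chosen sample values; the tuple encoding and the name evaluate_tree are assumptions.

```python
def evaluate_tree(tree, env):
    """Evaluate a tree given as a variable name or (left, operator, right)."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    lhs, rhs = evaluate_tree(left, env), evaluate_tree(right, env)
    return lhs + rhs if op == "+" else lhs * rhs


env = {"a": 1, "b": 2, "c": 3}             # sample values, chosen arbitrarily
first_tree = (("a", "+", "b"), "*", "c")   # means (a+b)*c
second_tree = ("a", "+", ("b", "*", "c"))  # means a+(b*c)

print(evaluate_tree(first_tree, env))    # 9
print(evaluate_tree(second_tree, env))   # 7
```

The same sentence yields two different results, so code generated from the two trees would behave differently.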

Putting Semantics into Grammar
<exp> → <exp> + <exp> | <exp> * <exp>
| (<exp>) | a | b | c
• To fix the precedence problem, we modify the
grammar so that it is forced to put * below +
in the parse tree
<exp> → <exp> + <exp> | <mulexp>
<mulexp> → <mulexp> * <mulexp>
| (<exp>) | a | b | c

Note the hierarchical structure of the production rules
Correct Precedence
G5 parse tree:
<exp>
  <exp>
    <mulexp>
      a
  +
  <exp>
    <mulexp>
      <mulexp>
        b
      *
      <mulexp>
        c
Our new grammar generates the same language as before, but no longer generates parse
trees with incorrect precedence.

Semantics of Associativity
• Grammar can also handle the semantics of operator
associativity.
When an expression includes two operators that have the
same precedence (as * and / usually have)
—for example, A / B * C—a semantic rule is required
to specify which should have precedence.

Two parse trees for a+b+c:

First tree, groups as a+(b+c):
<exp>
  <exp>
    <mulexp>
      a
  +
  <exp>
    <exp>
      <mulexp>
        b
    +
    <exp>
      <mulexp>
        c

Second tree, groups as (a+b)+c:
<exp>
  <exp>
    <exp>
      <mulexp>
        a
    +
    <exp>
      <mulexp>
        b
  +
  <exp>
    <mulexp>
      c

Operator Associativity
• Applies when the order of evaluation is not
decided by parentheses or by precedence
• Left-associative operators group operands left
to right: a+b+c+d = ((a+b)+c)+d
• Right-associative operators group operands
right to left: a+b+c+d = a+(b+(c+d))
• Most operators in most languages are left-associative,
but there are exceptions, e.g., in C, a=b=0 is
right-associative (assignment)
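
Grouping only changes the result for operators that are not mathematically associative, so the sketch below uses subtraction rather than addition. The helper names group_left and group_right are hypothetical; the last two lines rely on Python's own rules, where - is left-associative and ** is right-associative.

```python
from functools import reduce


def group_left(values):
    """((v0 - v1) - v2) - ... : left-associative grouping."""
    return reduce(lambda acc, v: acc - v, values)


def group_right(values):
    """v0 - (v1 - (v2 - ...)) : right-associative grouping."""
    return reduce(lambda acc, v: v - acc, reversed(values))


values = [10, 4, 3]
print(group_left(values))    # 3, i.e. (10 - 4) - 3
print(group_right(values))   # 9, i.e. 10 - (4 - 3)
print(10 - 4 - 3)            # 3, Python's - is left-associative
print(2 ** 3 ** 2)           # 512, Python's ** groups as 2 ** (3 ** 2)
```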

Dangling Else in Grammars
<stmt> → <if-stmt> | s1 | s2
<if-stmt> → if <expr> then <stmt> else <stmt>
| if <expr> then <stmt>
<expr> → e1 | e2

• This grammar has a classic “dangling-else
ambiguity.” Consider the statement
if e1 then if e2 then s1 else s2

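Different Parse Trees

First parse tree (else paired with the outer if):
<if-stmt>
  if <exp> then <stmt> else <stmt>
    <exp>: e1
    first <stmt>: <if-stmt>
      if <exp> then <stmt>
        <exp>: e2
        <stmt>: s1
    second <stmt>: s2

Second parse tree (else paired with the inner if):
<if-stmt>
  if <exp> then <stmt>
    <exp>: e1
    <stmt>: <if-stmt>
      if <exp> then <stmt> else <stmt>
        <exp>: e2
        first <stmt>: s1
        second <stmt>: s2

Most languages that have this problem choose the second parse tree:
else goes with nearest unmatched then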
Eliminating the Ambiguity
<stmt> → <if-stmt> | s1 | s2
<if-stmt> → if <expr> then <stmt> else <stmt>
| if <expr> then <stmt>
<expr> → e1 | e2

If the <stmt> between then and else expands into an if, that if must already have its own else.
First, we make a new non-terminal <full-stmt> that generates
everything <stmt> generates, except that it cannot generate
if statements with no else:

<full-stmt> → <full-if> | s1 | s2
<full-if> → if <expr> then <full-stmt> else <full-stmt>

Eliminating the Ambiguity

<stmt> → <if-stmt> | s1 | s2
<if-stmt> → if <expr> then <full-stmt> else <stmt>
| if <expr> then <stmt>
<expr> → e1 | e2

Then we use the new non-terminal in the then part of the if-then-else rule.

The effect is that the new grammar can match an else part
with an if part only if all the nearer if parts are already
matched.
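
A hand-written recursive-descent parser usually reaches the same pairing by letting the innermost open if take the else. The sketch below is not a transcription of the <full-stmt> grammar, only an illustration of the "nearest unmatched then" rule on the token names used in the slides; parse_stmt and the tuple output format are assumptions, and malformed input is not handled beyond a missing then.

```python
def parse_stmt(tokens, pos=0):
    """Parse a <stmt>; return (tree, next position)."""
    if tokens[pos] != "if":
        return tokens[pos], pos + 1            # s1 or s2
    cond = tokens[pos + 1]                     # e1 or e2
    if tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then'")
    then_part, pos = parse_stmt(tokens, pos + 3)
    # Greedy choice: an else, if present, is taken by the innermost open if.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_part, else_part), pos
    return ("if", cond, then_part), pos


tree, end = parse_stmt("if e1 then if e2 then s1 else s2".split())
print(tree)   # ('if', 'e1', ('if', 'e2', 's1', 's2'))
```

The else s2 ends up inside the inner if, and the outer if has no else part.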

Languages That Don’t Dangle
• Some languages define if-then-else in a way that forces the
programmer to be more clear

• ALGOL does not allow the then part to be another if statement,
though it can be a block containing an if statement

• Ada requires each if statement to be terminated with an end if

Extended BNF
• The following are pretty standard:
– [ ] enclose an optional part of the rule
<if_stmt> → if (<expression>) <statement> [else <statement>]
Without the use of the brackets, the syntactic description of this statement
would require the following two rules:
<if_stmt> → if (<expression>) <statement>
| if (<expression>) <statement> else <statement>
– { } mean the enclosed part can be repeated any number of times (including zero)
– ( ) group a list of choices

• Example: <parameter list> ::= ( )
| ( { <parameter> , } <parameter> )

– Multiple-choice options: when a single element must be chosen from a group,
the options are placed in parentheses and separated by the OR operator, |.

<term> → <term> (* | / | %) <factor>

In BNF, a description of this <term> would require the following three rules:
<term> → <term> * <factor>
| <term> / <factor>
| <term> % <factor>

BNF
<expr> → <expr> + <term>
| <expr> - <term>
| <term>
<term> → <term> * <factor>
| <term> / <factor>
| <factor>
EBNF
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
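
The EBNF form translates almost line for line into a recursive-descent evaluator: each { ... } becomes a while loop. This is a minimal sketch; the slides do not define <factor>, so the rule <factor> → number | ( <expr> ) is an assumption, as are all function names. Input is assumed to be well formed.

```python
import re


def tokenize(text):
    """Split into numbers, operators, and parentheses (whitespace is skipped)."""
    return re.findall(r"\d+|[-+*/()]", text)


def parse_expr(tokens, pos=0):
    """<expr> -> <term> {(+ | -) <term>}"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        value = value + right if op == "+" else value - right
    return value, pos


def parse_term(tokens, pos):
    """<term> -> <factor> {(* | /) <factor>}"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        value = value * right if op == "*" else value / right
    return value, pos


def parse_factor(tokens, pos):
    """Assumed rule: <factor> -> number | ( <expr> )"""
    if tokens[pos] == "(":
        value, pos = parse_expr(tokens, pos + 1)
        return value, pos + 1          # skip the closing ")"
    return int(tokens[pos]), pos + 1


def evaluate_expression(text):
    value, _ = parse_expr(tokenize(text))
    return value


print(evaluate_expression("2+3*4"))    # 14
print(evaluate_expression("8-3-2"))    # 3
print(evaluate_expression("(2+3)*4"))  # 20
```

Because <term> is parsed below <expr>, * and / bind tighter than + and -, and because each loop folds results from the left, operators of equal precedence group left to right.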

EBNF Descriptions and Rules
• Each Description is a list of Rules
• Rule Form: LHS ⇐ RHS (read ⇐ as “is defined as”)
• Rule Names (LHS) are italicized, hyphenated words
• Control Forms in RHS
– Sequence: Items appear left to right; order is important
– Choice: Alternatives separated by | (stroke); exactly
one item is chosen from the alternatives
– Option: Optional item enclosed between [ and ]; it
can be included or discarded
– Repetition: Repeatable item enclosed between { and };
it can be repeated 0 or more times

An EBNF Description of Integers
• A symbol (sequence of characters) is classified legal by an EBNF
rule if we can process all the characters in the symbol when we
reach the end of the right hand side of the EBNF rule.
digit ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [+|-]digit{digit}
digit is defined as any of the alternatives 0 through 9
integer is defined as a sequence of three items:
(1) an optional sign (if it is included, it must be the alternative + or -),
followed by
(2) any digit, followed by
(3) a repetition of zero or more digits.
The integer RHS combines and illustrates all EBNF
control forms: sequence, option, alternative, repetition.
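
The integer rule corresponds closely to a regular expression, which gives a quick way to test symbols against it. A minimal sketch, with INTEGER and is_legal_integer as hypothetical names:

```python
import re

# integer: optional sign, one digit, then zero or more digits.
INTEGER = re.compile(r"[+-]?[0-9][0-9]*")


def is_legal_integer(symbol: str) -> bool:
    """True if every character of `symbol` is consumed by the rule."""
    return INTEGER.fullmatch(symbol) is not None


print(is_legal_integer("+42"))   # True
print(is_legal_integer("-"))     # False, a sign alone has no digit
print(is_legal_integer("4-2"))   # False, the sign may only come first
```

Using fullmatch mirrors the requirement that all characters are processed by the time the end of the right-hand side is reached.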

Semantics
Semantics describes the meaning of a program, a statement, or a
group of statements.
There is no single widely accepted notation or formalism for
describing semantics.
A methodology and notation for semantics serves several needs.

These needs call for more formal methods of defining semantics, so we turn
to:
Operational Semantics
how the statement will be executed
Axiomatic Semantics
what results to expect from the statement
Denotational Semantics
functional way of mapping the effects of a statement

Operational Semantics
• This can be thought of as “tracing” through a
program to see what effects an instruction will
have
• Implemented as an interpreter or compiler or
assembler
– that is, how will the computer execute this
instruction?
• This is simply a mechanistic description of the
statement and does not necessarily help us
understand the statement

Example: C for-loop
for(expr1; expr2; expr3)
    stmt;
Becomes:
expr1;
loop: if expr2 = 0 goto out
stmt;
expr3;
goto loop
out: …
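
The lowered form of the for-loop can be simulated step by step. Python has no goto, so this sketch uses a while loop with break in place of the two jumps; run_for, the state dictionary, and the sample C loop in the comment are all assumptions for illustration.

```python
def run_for(init, cond, step, body, state):
    """Execute for(expr1; expr2; expr3) stmt; in its lowered form."""
    init(state)                 # expr1;
    while True:                 # loop:
        if not cond(state):     #   if expr2 = 0 goto out
            break
        body(state)             #   stmt;
        step(state)             #   expr3;
                                #   goto loop
    return state                # out: ...


# Hypothetical C loop: for (i = 0; i < 5; i++) total += i;
state = run_for(
    init=lambda s: s.update(i=0),
    cond=lambda s: s["i"] < 5,
    step=lambda s: s.update(i=s["i"] + 1),
    body=lambda s: s.update(total=s["total"] + s["i"]),
    state={"total": 0},
)
print(state)   # {'total': 10, 'i': 5}
```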

Axiomatic Semantics
• Used mainly to prove correctness of code
– Each statement in the language has associated assertions – what we
expect to be true before and after the statement executes
– We list these assertions as pre- and post-conditions that specify how
the machine changes (changes to variables)
– Given the state of the machine prior to executing a statement, we
can then determine what must be true afterward
• The basic form of an axiomatic semantic is {P} S {Q}
• This is interpreted as:
– if P is true before S, then Q is true after S
– We must now define how to determine Q given P and S

Pre and Post-condition
• We will start with a given post-condition and derive the
weakest pre-condition
– We work backwards mainly because we will start with an
overall goal in mind for the given statement or program
– We want to derive the weakest pre-condition for a given post-
condition because this is the least restrictive pre-condition that
will guarantee validity
• Weakest means most general – what is the greatest range of values for a
given variable such that the result will be true?
• For example, consider the assignment statement
– sum = 2*x+1;
• with post-condition {sum > 1}
• Possible pre-conditions are {x > 10}, {x > 50} and {x > 1000}
• But the weakest pre-condition is {x > 0}
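
The claim can be checked by brute force over a sample of inputs. This is only a sanity check, not a proof: it assumes integer values of x and a small range chosen arbitrarily, and post_holds is a hypothetical name.

```python
def post_holds(x):
    """Run sum = 2*x + 1 and test the post-condition {sum > 1}."""
    total = 2 * x + 1
    return total > 1


sample = range(-50, 51)          # assumption: a small integer range stands in for all x
satisfying = [x for x in sample if post_holds(x)]

# Every x that satisfies the post-condition is > 0, and every x > 0 satisfies it.
print(min(satisfying))                                  # 1
print(all(post_holds(x) == (x > 0) for x in sample))    # True
```

The stronger conditions such as {x > 10} also guarantee the post-condition, but they exclude values like x = 5 that would work just as well.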

Assignment Statement Rule
• We will use the following notation for an assignment
statement axiomatic rule:
– {Q[x→E]} x = E {Q}
• This is read as follows:
– If Q is to be true after the assignment, then Q[x→E] must be true prior
• The notation Q[x→E] means to replace all instances of x in Q with E
– Examples:
• a = b / 2 - 1; {a < 10}
– We replace a in {a < 10} with b / 2 - 1 and solve for b, thus {Q[x→E]} is
{b / 2 - 1 < 10} or {b < 22}
– So we have: {b < 22} a = b / 2 - 1; {a < 10}, that is, if b < 22 prior to the
assignment statement, then a will be less than 10 afterward
• x = 2 * y - 3; {x > 25}
– pre-condition is {2 * y - 3 > 25} or {y > 14}
• c = d * e - 4; {c > 0}
– pre-condition is {d * e - 4 > 0} or {d * e > 4}, we might want to list this as {d
> 4 / e} or {e > 4 / d}, or even {d > 4 / e & d != 0 & e != 0}; note that dividing
by e or d preserves the inequality only when that variable is positive
Sequences
• In general, a series of statements S1, S2, S3, ..., Sn can
be expressed as:
– {P} S1 {Q1}; {Q1} S2 {Q2}; {Q2} S3 {Q3};
... {Qn-1} Sn {Q}
– This can be simplified to {P} S1, S2, S3, ..., Sn {Q}
– Therefore, we can combine rules to show the axiomatic
semantics of a block of code
• Example:
– y = 3 * x + 1;
– x = y + 3;
• If our post-condition is {x < 10} then our pre-condition between the
two statements is {y+3 < 10} or {y < 7} and our pre-condition before
the first statement is {3 * x + 1 < 7} or {x < 2}
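
Written out as a derivation, the example works backwards from the post-condition, one statement at a time. The notation wp(S, Q) for "weakest pre-condition of statement S with respect to post-condition Q" is an assumption introduced here; the slides do not use it.

```latex
\begin{align*}
\mathrm{wp}(x = y + 3,\; x < 10) &:\quad y + 3 < 10 \iff y < 7 \\
\mathrm{wp}(y = 3x + 1,\; y < 7) &:\quad 3x + 1 < 7 \iff x < 2
\end{align*}
```

So the full triple is {x < 2} y = 3 * x + 1; x = y + 3; {x < 10}, with {y < 7} as the intermediate assertion.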

Selection Axiomatic Semantic
• Given a statement: if (B) S1; else S2;
• The semantic rule is: {B & P} S1 {Q}, {(!B) & P} S2 {Q}
– if Q is our post-condition, then we have two pre-conditions, if the if
statement’s condition is true (B) then B & P, and if the if statement’s
condition is false (Not B) then !B & P, so we must derive P that will
allow the same post-condition no matter if B or !B is true
• Example:
– if (x > 0) y--;
else y++;
• Suppose the post-condition is {y > 0}
– the pre-condition for the if-clause is {y > 1}
– the pre-condition for the else-clause is {y > -1}
– the condition {y > 1} is subsumed by the condition {y > -1} (that is, if {y
> 1} is true, then {y > -1} must also be true)
• So, we select {y > 1} as our weakest pre-condition
– we cannot use {y > -1} because, if x > 0 and y = 0, our post-condition is
not true
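
The choice of pre-condition can be checked by brute force. This is a sanity check over a small integer range chosen arbitrarily, not a proof; run and guarantees_post are hypothetical names.

```python
def run(x, y):
    """if (x > 0) y--; else y++;  Returns the final y."""
    return y - 1 if x > 0 else y + 1


values = range(-20, 21)   # assumption: a small integer range for both x and y


def guarantees_post(pre):
    """True if every (x, y) satisfying `pre` ends with y > 0."""
    return all(run(x, y) > 0 for x in values for y in values if pre(y))


print(guarantees_post(lambda y: y > 1))    # True
print(guarantees_post(lambda y: y > -1))   # False, e.g. x = 1, y = 0 ends with y = -1
```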
Thank you
