Ebnf

This document provides an overview of Extended Backus-Naur Form (EBNF), a notation used to formally describe the syntax of programming languages. It discusses how EBNF describes syntax through rules with control forms like sequence, choice, option, and repetition. These control forms have similarities to programming constructs. The document also provides examples of using EBNF rules to describe the syntax of integers and explains how to determine if symbols match EBNF rules.

Uploaded by

Ian Chuah
Copyright
© Attribution Non-Commercial (BY-NC)

It goes against the grain of modern education to teach students to
program. What fun is there to making plans, acquiring discipline,
organizing thoughts, devoting attention to detail, and learning to be
self critical?
A. PERLIS

EBNF is a notation for formally describing syntax: how to write entities
in a language. We use EBNF throughout this book to describe the
syntax of Ada. But there is a more compelling reason to begin our study of
programming with EBNF: it is a microcosm of programming. First, there is
a strong similarity between the control forms in EBNF rules and the control
structures in Ada: sequence, decision, repetition, recursion, and the ability to
name descriptions. There is also a strong similarity between the process of
writing descriptions in EBNF and writing programs in Ada: we must synthesize
a candidate solution and then analyze it to determine whether it is correct and
easy to understand. Finally, studying EBNF introduces a level of formality that
will continue throughout our study of programming and Ada.
CHAPTER 2 EBNF: A Notation to Describe Syntax

OBJECTIVES
- Learn the control forms in EBNF: sequence, choice, option, and repetition
- Learn how to read and write syntactic descriptions with EBNF rules
- Explore the difference between syntax and semantics
- Learn the correspondence between EBNF rules and Syntax Charts

2.1 Languages and Syntax


In the middle 1950s, computer scientists began to design high-level
programming languages and build their compilers. The first two major successes
were FORTRAN (FORmula TRANslator), developed by the IBM corporation
in the United States, and ALGOL (ALGOrithmic Language), sponsored by a
consortium of North American and European countries. John Backus led the
effort to develop FORTRAN. He then became a member of the ALGOL design
committee, where he studied the problem of describing the syntax of these
programming languages simply and precisely.
Backus invented a notation (based on the work of logician Emil Post) that was
simple, precise, and powerful enough to describe the syntax of any programming
language. Using this notation, a programmer or compiler can determine whether
a program is syntactically correct - whether it adheres to the grammar and
punctuation rules of the programming language. Peter Naur, as editor of the
ALGOL report, popularized this notation by using it to describe the complete
syntax of ALGOL. In their honor, this notation is called Backus-Naur Form
(BNF). This book uses Extended Backus-Naur Form (EBNF) to describe Ada's
syntax, because it results in more compact descriptions.
In a parallel development, the linguist Noam Chomsky began work on the
harder problem of describing the syntactic structure of natural languages, such
as English. He developed four different notations that describe languages of
increasing complexity; they are numbered type 3 up through 0 in the Chomsky
hierarchy. The power of Chomsky's type 2 notation is equivalent to BNF and
EBNF. The languages in Chomsky's hierarchy, along with the machines that
recognize them, are studied in computer science, mathematics, and linguistics
under the topics of formal language and automata theory.

2.2 EBNF Descriptions and Rules


An EBNF description is an unordered list of EBNF rules. Each EBNF
rule has three parts: a left-hand side (LHS), a right-hand side (RHS), and the
⇐ character separating the two sides; read this symbol as "is defined as". The
LHS contains one (possibly hyphenated) word written in lower-case; it names
the EBNF rule. The RHS supplies the definition associated with this name. It
can include combinations of the four control forms explained in Table 2.1.

Table 2.1 Control Forms of Right-Hand Sides in EBNF Rules

Sequence    items appear left-to-right; their order is important
Choice      alternative items are separated by a | (stroke); one item is chosen from
            this list of alternatives; their order is unimportant
Option      an optional item is enclosed between [ and ] (square brackets); the item
            can either be included or discarded
Repetition  a repeatable item is enclosed between { and } (curly braces); the item
            can be repeated zero or more times

EBNF rules can include six characters with special meanings: ⇐, |, [, ], {,
and }. Except for these characters and the names of EBNF rules, anything that
appears in a RHS stands for itself: letters, digits, punctuation, parentheses, and
any other printable characters.

An EBNF Description of Integers


The following EBNF rules describe how to write simple integers.¹ Their
right-hand sides illustrate every control form available in EBNF.
digit   ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [+|-]digit{digit}
- A digit is defined as any one of the ten alternative characters 0 through 9.
- An integer is defined as a sequence of three items: an optional sign (if it is
  included, it must be the character + or -), followed by any digit, followed by
  a repetition of zero or more digits - each repeated digit is independently
  chosen from the list of alternatives in the digit rule. The RHS of integer
  combines all the control forms: sequence, option, choice, and repetition.
We can abstract the structure of an integer - where each digit appears -
independently from the definition of digit - which defines only the possible
choice of characters. For example, an integer written in base 2 has this same
structure, even though the digit rule would be restricted to the choices 0 and 1.
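
The two rules above can be read almost directly as a small program. The following Python sketch is only an illustration (the book itself uses Ada, and the names DIGITS and is_integer are hypothetical); it walks through a symbol one character at a time, applying the option, the mandatory digit, and the repetition in sequence.

```python
DIGITS = "0123456789"   # digit <= 0|1|2|3|4|5|6|7|8|9 (a choice of ten characters)

def is_integer(symbol: str) -> bool:
    """Return True if symbol matches the rule: integer <= [+|-]digit{digit}."""
    pos = 0
    # Option [+|-]: include the sign if one is present, otherwise discard it.
    if pos < len(symbol) and symbol[pos] in "+-":
        pos += 1
    # Sequence: one mandatory digit must come next.
    if pos >= len(symbol) or symbol[pos] not in DIGITS:
        return False
    pos += 1
    # Repetition {digit}: zero or more further digits.
    while pos < len(symbol) and symbol[pos] in DIGITS:
        pos += 1
    # The symbol is legal only if every one of its characters was matched.
    return pos == len(symbol)

for symbol in ["7", "+142", "1,024", "A5", "15-"]:
    print(repr(symbol), "legal" if is_integer(symbol) else "illegal")
```

Restricting DIGITS to "01" would describe base 2 integers without touching is_integer, which mirrors the abstraction described above.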

¹ The EBNF descriptions in this chapter are for illustration purposes only: they do not describe any
of Ada's actual language features. Subsequent chapters use EBNF extensively to describe Ada.

To make EBNF descriptions easy to read, we align their rule names, the ⇐,
and rule definitions. Sometimes we put extra spaces in a RHS, to make it easier
to read; such spaces do not change the meaning of the rule. Although the order
in which EBNF rules appear is irrelevant, it is useful to write the rules in order of
increasing complexity: with the right-hand sides in later EBNF rules referring
to the names in the left-hand sides of earlier ones. Using this convention, the
last EBNF rule names the main syntactic entity being described.
Given an EBNF description, we must learn to interpret it as a language
lawyer: determining whether a symbol - any sequence of characters - is legal
or illegal according to the EBNF rules in that description. Computers perform
expertly as language lawyers, even on the most complicated descriptions.

Proving Symbols Match EBNF Rules


To prove that a symbol is an integer we must match its characters with the
characters in the integer rule, according to that rule's control forms. If there is
an exact match, we recognize the symbol as a legal integer; otherwise we classify
the symbol as illegal, according to the integer description.

Proofs in English: To prove that the symbol 7 is an integer, we must start in
the integer EBNF rule with the optional sign; in this case we discard the option.
Next in the sequence, the symbol must contain a character that we can recognize
as a digit; in this case we choose the 7 alternative from the RHS of the digit
rule. Finally, we must repeat digit zero or more times; in this case we use zero
repetitions. Every character in the symbol 7 has been matched against every
character in the integer EBNF rule, according to its control forms. Therefore,
we recognize 7 as a legal integer according to this EBNF description.
We use a similar argument to prove that the symbol +142 is an integer. Again
we must start with the optional sign; in this case we include the option and choose
the + alternative. Next in the sequence, the symbol must contain a character that
we can recognize as a digit; in this case we choose the 1 alternative from the RHS
of the digit rule. Finally, we must repeat digit zero or more times; in this case
we use two repetitions. For the first digit we choose the 4 alternative, and for the
second digit the 2 alternative: recall that each time we encounter a digit, we are
free to choose any of its alternatives. Again, every character in the symbol +142
has been matched against every character in the integer EBNF rule, according
to its control forms. Therefore, +142 is also a legal integer.
We can easily prove that 1,024 is an illegal integer by observing that the
comma appearing in this symbol does not appear in either EBNF rule; therefore,
the match is guaranteed to fail. Likewise for the letter A in the symbol A5.
Finally, we can prove that 15- is an illegal integer - not because it contains an
illegal character, but because its structure is incorrect: in this symbol the minus
follows the last digit, but the sequence in the integer rule requires that the sign
precede the first digit. So according to our EBNF rules, none of these symbols
is recognized as a legal integer.² When viewing symbols as a language lawyer,
we cannot appeal to intuition; we must rely solely on the EBNF description we
are matching.

Tabular Proofs: A tabular proof is a more formal demonstration that a symbol
matches an EBNF description. The first line in a tabular proof is always the
name of the syntactic entity we are trying to match (in this example, integer);
the last line must be the symbol we are matching. Each line is derived from the
previous one according to the following rules.
1. Replace a name (LHS) by its definition (RHS)
2. Choose an alternative
3. Determine whether to include or discard an option
4. Determine the number of times to repeat
Combining rules 1 and 2 simplifies our proofs by allowing us to replace a left-hand
side by one of the alternatives in its right-hand side in a single step. Figure 2.1
contains a tabular proof showing +142 is an integer.

Derivation Trees: We can illustrate a tabular proof more graphically by writing
it as a derivation tree. The downward branches in such a tree correspond to
the same rules that allow us to go from one line to the next in a tabular proof.
Although a derivation tree displays the same information as a tabular proof, it
omits certain irrelevant details: the ordering of some decisions in the proof (e.g.,
which digit is replaced first). The original symbol appears at the bottom of a
derivation tree, when its characters are read left to right. Figure 2.1 contains a
derivation tree showing +142 is an integer.

Status               Reason (rule #)
integer              Given
[+|-]digit{digit}    Replace integer by its RHS (1)
[+]digit{digit}      Choose alternative (2)
+digit{digit}        Include option (3)
+1{digit}            Replace digit by 1 alternative (1&2)
+1digit digit        Use two repetitions (4)
+14digit             Replace digit by 4 alternative (1&2)
+142                 Replace digit by 2 alternative (1&2)

[Derivation tree: integer at the root, branching to [+|-] digit {digit}, with the leaves + 1 4 2 at the bottom]

Figure 2.1 A Tabular Proof and its Derivation Tree showing +142 is an integer
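
The four proof rules are mechanical enough that a program can write the tabular proof for us. The sketch below is a hypothetical Python helper (tabular_proof is not part of the book) that assumes its argument is already known to be a legal integer and produces the lines of the proof in the same order as Figure 2.1.

```python
def tabular_proof(symbol):
    """Return (status, reason) lines proving that symbol is an integer.

    Assumption: symbol is a legal integer under integer <= [+|-]digit{digit}.
    """
    lines = [("integer", "Given"),
             ("[+|-]digit{digit}", "Replace integer by its RHS (1)")]
    sign = symbol[0] if symbol[0] in "+-" else ""
    first, rest = symbol[len(sign)], symbol[len(sign) + 1:]
    if sign:
        lines.append(("[" + sign + "]digit{digit}", "Choose alternative (2)"))
        lines.append((sign + "digit{digit}", "Include option (3)"))
    else:
        lines.append(("digit{digit}", "Discard option (3)"))
    # Replace the mandatory digit by the alternative that matches.
    done = sign + first
    lines.append((done + "{digit}", f"Replace digit by {first} alternative (1&2)"))
    # Decide how many repetitions to use, then replace each digit left to right.
    lines.append((done + " ".join(["digit"] * len(rest)),
                  f"Use {len(rest)} repetitions (4)"))
    for i, ch in enumerate(rest):
        done += ch
        pending = " ".join(["digit"] * (len(rest) - i - 1))
        lines.append((done + pending, f"Replace digit by {ch} alternative (1&2)"))
    return lines

for status, reason in tabular_proof("+142"):
    print(f"{status:<20}{reason}")
```

The helper always replaces digits from left to right; as the discussion of derivation trees notes, that ordering is one of several equally valid choices.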

² All three symbols are legal integers under some interpretation: the first uses a comma to separate
the thousands digit from the hundreds, the second is a valid number written in base 16, and the third
is a negative number - sometimes written this way by accountants.

REVIEW QUESTIONS
1. Classify each of the following symbols as a legal or illegal integer. Note that part (o)
   specifies a symbol containing no characters.
   a. +42     e. -1492    i. 28       m. 0            q. +-7
   b.         f. 187      j. 187.0    n. forty two    r. 1 543
   c. -0      g. drei     k. $15      o.              s. 1+1
   d. VII     h. 251      l. 1000     p. 555-1212     t. 000193
   Answer: Only a, c, e, f, l, m, and t are legal.
2. a. Write a tabular proof that -1024 is a legal integer. b. Draw a derivation tree
   showing 03 is a legal integer.
   Answer: Note how the omitted option ([+|-]) is drawn in the derivation tree.

   Status                 Reason (rule #)
   integer                Given
   [+|-]digit{digit}      Replace integer by its RHS (1)
   [-]digit{digit}        Choose alternative (2)
   -digit{digit}          Include option (3)
   -1{digit}              Replace digit by 1 alternative (1&2)
   -1digit digit digit    Use three repetitions (4)
   -10digit digit         Replace digit by 0 alternative (1&2)
   -102digit              Replace digit by 2 alternative (1&2)
   -1024                  Replace digit by 4 alternative (1&2)

   [Derivation tree for 03: integer at the root, branching to [+|-] digit {digit}, with the leaves 0 and 3 at the bottom]

2.3 More Examples of EBNF

The following EBNF description is equivalent³ to the one presented in
the previous section. Two EBNF descriptions are equivalent if they recognize
exactly the same legal and illegal symbols: for any possible symbol, both will
recognize it as legal or both will classify it as illegal - they never classify symbols
differently.
sign    ⇐ +|-
digit   ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [sign]digit{digit}

This EBNF description is not identical to the first, because it defines an extra
sign rule that is then used in the integer rule. But these two EBNF descriptions
are equivalent, because providing a named rule for +|- does not change which
symbols are legal. In fact, even if the names of all the rules are changed uniformly,
exactly the same symbols are recognized as legal.

³ The words "identical" and "equivalent" have distinct meanings. Identical means "are exactly the
same". Equivalent means "are the same within some context". Any two dollar bills have identical
buying power. A dollar bill has equivalent buying power to four quarters in most contexts; but in a
vending machine that requires exact change, it does not.

x ⇐ +|-
y ⇐ 0|1|2|3|4|5|6|7|8|9
z ⇐ [x]y{y}

Any symbol recognized as an integer by the previous EBNF descriptions is
recognized as a z in this description, and vice versa. Just exchange the names x,
y, and z, for sign, digit, and integer in any tabular proof or derivation tree.
Complicated EBNF descriptions are easier to read and understand if they
are named well: each name intuitively communicating the meaning of its rule's
definition. But to a language lawyer or compiler, names - good or bad - cannot
change the meaning of a rule or the classification of a symbol.

Incorrect EBNF Descriptions for integer


This section examines two EBNF descriptions that contain interesting errors.
To start, we try to simplify the integer rule by removing the digit that precedes
the repetition. The best description is the simplest one; so if these rules are
equivalent to the previous ones, we have improved the description of integer.
sign    ⇐ +|-
digit   ⇐ 0|1|2|3|4|5|6|7|8|9
integer ⇐ [sign]{digit}
Every symbol that is a legal integer in the previous EBNF descriptions is
also legal in this one. For example, we can use this new EBNF description to
prove that +142 is an integer: include the sign option, choosing the plus; repeat
digit three times, choosing 1, 4, and 2.
But there are two symbols that this description recognizes as legal, while the
previous descriptions classify them as illegal: + and - (signs without following
digits). The previous integer rules all require one digit followed by zero or more
others; but this integer rule contains just the repetition, which may be taken
zero times. To prove + is a legal integer: include the sign option, choosing the
plus; then repeat digit zero times. The proof for - is similar. Even the "empty
symbol", which contains no characters, is recognized by this EBNF description
as a legal integer: discard the sign option, then repeat digit zero times. Because
of these three differences, this EBNF description of integer is not equivalent to
the previous ones.
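
One way to hunt for such differences is to try every short symbol against both descriptions. The Python sketch below does this under two assumptions that are not from the text: each rule is translated by hand into a regular expression (an option becomes ?, a repetition becomes *), and only symbols up to three characters drawn from a small alphabet are tried. The names ORIGINAL, SIMPLIFIED, and differences are hypothetical.

```python
import re
from itertools import product

# Hand translations of the two integer rules into regular expressions.
ORIGINAL   = re.compile(r"[+-]?[0-9][0-9]*")   # integer <= [sign]digit{digit}
SIMPLIFIED = re.compile(r"[+-]?[0-9]*")        # integer <= [sign]{digit}

def differences(rule_a, rule_b, alphabet="+-07", max_len=3):
    """List the symbols, up to max_len characters, that the rules classify differently."""
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            symbol = "".join(chars)
            legal_a = rule_a.fullmatch(symbol) is not None
            legal_b = rule_b.fullmatch(symbol) is not None
            if legal_a != legal_b:
                found.append(symbol)
    return found

print(differences(ORIGINAL, SIMPLIFIED))   # the empty symbol, then + and -
```

A bounded search like this can only uncover counterexamples; finding none would not by itself prove that two descriptions are equivalent.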
Next we address the problem of describing how to write numbers with
embedded commas: 1,024. We can easily extend the digit rule to allow a comma
as one of its alternatives.

sign          ⇐ +|-
comma-digit   ⇐ 0|1|2|3|4|5|6|7|8|9|,
comma-integer ⇐ [sign]comma-digit{comma-digit}
Using this description, we can prove that 1,024 is a legal comma-integer:
discard the sign option; choose 1 for the first comma-digit; then repeat
comma-digit four times, choosing a comma, 0, 2, and 4.
But we can also prove that 1,,3,4 is a legal comma-integer, by matching six
comma-digit items and choosing a comma for the second, third and fifth ones. By
treating a comma as if it were a digit, numbers with well-placed commas are
recognized as legal, but symbols with randomly placed commas also become
legal. See Exercise 8 for a correct solution to this problem.

REVIEW QUESTIONS
1. Are the following EBNF descriptions equivalent to the standard ones for integer?
   Justify your answers.
   sign    ⇐ [+|-]                       sign    ⇐ [+|-]
   digit   ⇐ 0|1|2|3|4|5|6|7|8|9         digit   ⇐ 0|1|2|3|4|5|6|7|8|9
   integer ⇐ sign digit{digit}           integer ⇐ sign {digit}digit
   Answer: Each is equivalent. Left: it is irrelevant whether the option brackets appear
   around the sign in integer, or around +|- in the sign rule; in either case there is a
   way to include or discard the sign. Right: it is irrelevant whether the mandatory
   digit comes before or after the repeated ones; in either case one digit is mandated
   and there is a way to recognize one or more digits.
2. Write an EBNF description for even-integer that recognizes only even integers: -6
   and 34 are legal but 3 or -23 are not.
   Answer:
   sign         ⇐ +|-
   even-digit   ⇐ 0|2|4|6|8
   digit        ⇐ even-digit|1|3|5|7|9
   even-integer ⇐ [sign]{digit}even-digit
3. Normalized integers have no extraneous leading zeros, and zero is unsigned. Write
   an EBNF description for normalized-integer: 0, -1, and 193 are legal, but -01, 000193,
   +0, and -0 are not.
   Answer:
   sign               ⇐ +|-
   non-0-digit        ⇐ 1|2|3|4|5|6|7|8|9
   digit              ⇐ 0|non-0-digit
   normalized-integer ⇐ 0|[sign]non-0-digit{digit}

2.4 Syntax versus Semantics

EBNF descriptions specify only syntax: the form in which some entity
is written. They do not specify semantics: the meaning of what is written.
The sentence, "Colorless green ideas sleep furiously." illustrates the difference
between syntax and semantics: It is syntactically correct, because the grammar
and punctuation are proper. But what does this sentence mean? How can ideas
sleep? If ideas can sleep, what does it mean for them to sleep furiously? Can ideas
have colors? Can ideas be both colorless and green? These questions all relate
to the semantics, or meaning, of the sentence. As another example the sentence,
"The Earth is the fourth planet out from the Sun" has an obvious meaning, but
its meaning is contradicted by known astronomical facts.
Two semantic issues are important in programming languages:
- Can two different symbols have the same meaning?
- Can a symbol have two different meanings?
The first issue is easy to illustrate. Everyone has a nickname; so two names
can refer to the same person. The second issue is a bit more subtle; here the
symbol we analyze is a sentence. Suppose you take a course that meets on
Mondays, Wednesdays and Fridays. If your instructor says on Monday, "The
next class is canceled" you know not to come to class on Wednesday. Now
suppose you take another course that meets every weekday. If your instructor for
this course says on Monday, "The next class is canceled" you know not to come
to class on Tuesday. Finally, if it were Friday, "The next class is canceled" has
the same meaning in both courses: there is no class on Monday. So the meaning
of a sentence may depend on its context.

Semantics and EBNF Descriptions


Now we examine these semantic issues in relation to the EBNF description for
integer. In a mathematical context, the meaning of a number is its value. In
common usage, the symbols 1 and +1 both have the same value: an omitted sign
is considered equivalent to a plus sign. As an even more special case, the symbols
0 and +0 and -0 all have the same value: the sign of zero is irrelevant. Finally, the
symbols 000193 and 193 both have the same meaning: leading zeros do not affect
a number's value.
But there are contexts where 000193 and 193 have different meanings. I once
worked on a computer where each user was assigned a six-digit account number;
mine was 000193. When I logged in, the computer expected me to identify myself
with a six-digit account number; it accepted 000193 but rejected 193.
Another example concerns how measurements are written: although 9.0 and
9.00 represent the same value, the former may indicate the quantity was measured
to only two significant digits; the latter to three. Finally, when saying the name
"Rich" and the adjective "rich", the upper-case letter is pronounced the same as
the lower-case one.
Structured integers contain embedded underscores that separate groups of
digits, indicating some important structure. We can use them to encode
real-world information in an easy to read form: dates 2_10_1954, phone numbers
1_800_555_1212, and credit card numbers 314_159_265_358. Underscores can appear
only between digits - not as the first or last character, and they cannot be
adjacent. Here is an EBNF description that captures exactly these requirements.
digit              ⇐ 0|1|2|3|4|5|6|7|8|9
structured-integer ⇐ digit{[_]digit}
Semantically, underscores do not affect the value of a structured-integer:
1_800_555_1212 has the same meaning as 18005551212; when dialing either number,
we press keys only for the characters representing a digit.
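
The split between the syntax and the semantics of a structured-integer can be made concrete with two small functions. This Python sketch is illustrative only: the regular expression is a hand translation of the rule digit{[_]digit}, and the names is_structured_integer and value are hypothetical.

```python
import re

# structured-integer <= digit{[_]digit}, translated by hand:
# one digit, then zero or more groups of (optional underscore, digit).
STRUCTURED_INTEGER = re.compile(r"[0-9](_?[0-9])*")

def is_structured_integer(symbol: str) -> bool:
    """Syntax: is the symbol written in a legal form?"""
    return STRUCTURED_INTEGER.fullmatch(symbol) is not None

def value(symbol: str) -> int:
    """Semantics: underscores do not affect the value."""
    if not is_structured_integer(symbol):
        raise ValueError(f"not a structured-integer: {symbol!r}")
    return int(symbol.replace("_", ""))

print(is_structured_integer("1_800_555_1212"), value("1_800_555_1212"))
print(is_structured_integer("1__2"), is_structured_integer("_12"))
```

Two different legal symbols can therefore share one value, which is exactly the first semantic issue raised earlier in this section.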
In summary, EBNF descriptions specify syntax not semantics. When we
describe the syntax of an Ada feature in EBNF, we will describe its semantics
using a mixture of English definitions and illustrations. Computer scientists are
just beginning to develop notations that describe the semantics of programming
languages in simple and precise ways. In general, meaning is much harder to
quantify than form.

REVIEW QUESTIONS
1. a. Find two dates that have the same meaning, when written naturally as structured
   integers. b. Propose a new format for writing dates that alleviates this problem.
   Answer: a. The date December 5, 1987 is written as 12_5_1987; the date January 25,
   1987 is written as 1_25_1987. Both symbols have the same meaning: the value 1251987.
   b. To alleviate this problem, always use two digits to specify a day, adding a leading
   zero if necessary. Write the first date as 12_05_1987; write the second as 1_25_1987.
   These structured integers have different values.

2.5 Syntax Charts


A Syntax Chart (SC) is a graphical notation for writing a syntax
description. Figure 2.2 illustrates how to translate each EBNF control form into
its equivalent SC. In each case we must follow the arrows from the beginning of
the picture on the left, to its end on the right. In a sequence, we must go through
each item. In a choice, we must go through one rung in a ladder of alternatives.
In an option we must go through either the top rung containing the item, or the
bottom that does not. Repetition is like option, but we can loop through the
item in the top rung; this picture is the only one with a right-to-left arrow.

Figure 2.2 Translating EBNF rules into Syntax Charts



We can combine these control forms to translate the RHS of any EBNF rule
into its equivalent syntax chart. We can also compose related syntax charts into
one big SC that contains no named EBNF rules, by replacing each named LHS
with the SC for its RHS. Figure 2.3 shows the SC equivalents of the digit and
original integer EBNF rules, and one large composed SC for integer.

Figure 2.3 Syntax Charts for digit and integer, and a composed integer

The syntax charts in Figure 2.4 illustrate the RHS of three interesting EBNF
rules. The first shows a repetition of two alternatives, where any number of
intermixed As and Bs are legal: AAA, BB, or BABBA. A different choice can be made
for each repetition. The second shows two alternatives where a repetition of As
or a repetition of Bs are legal: AAA or BB, but not AB. Once the choice is made, only
one of the characters can be repeated. Any symbol legal in this rule is legal in
the first, but not vice versa.

{A|B}    {A}|{B}    AB|C
Figure 2.4 Syntax Charts Disambiguating Interesting EBNF Rules

The third illustration shows how the sequence and choice control forms
interact: the stroke separates the first alternative (the sequence AB) from the
second (just C). To describe the sequence of A followed by either B or C we must
write both alternatives fully or use a second rule to "factor out" the alternatives.
simple ⇐ AB|AC

tail   ⇐ B|C
simple ⇐ A tail
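
The three right-hand sides of Figure 2.4, and the two ways of writing simple, can be checked against sample symbols. The Python sketch below assumes a hand translation of each RHS into a regular expression; note that regular expressions use parentheses for grouping, which plays the role of the extra tail rule here. All of the names are hypothetical.

```python
import re

REPEAT_CHOICE  = re.compile(r"(A|B)*")      # {A|B}: a new choice on every repetition
CHOICE_REPEATS = re.compile(r"(A)*|(B)*")   # {A}|{B}: choose once, then repeat
SEQ_OR_C       = re.compile(r"AB|C")        # AB|C: the stroke separates AB from C
UNFACTORED     = re.compile(r"AB|AC")       # simple <= AB|AC
FACTORED       = re.compile(r"A(B|C)")      # tail <= B|C and simple <= A tail

def legal(rule, symbol):
    """Return True if the whole symbol matches the rule."""
    return rule.fullmatch(symbol) is not None

for symbol in ["AAA", "BB", "BABBA", "AB"]:
    print(symbol, legal(REPEAT_CHOICE, symbol), legal(CHOICE_REPEATS, symbol))
for symbol in ["AB", "AC", "C"]:
    print(symbol, legal(SEQ_OR_C, symbol), legal(UNFACTORED, symbol), legal(FACTORED, symbol))
```

The second loop shows the point of factoring: AB|C accepts C but rejects AC, while both forms of simple accept AB and AC and nothing else.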

EBNF is a compact text-based notation; syntax charts present the same
information, but in a graphical form. Which is better? For beginners, SCs are
easier to use when classifying symbols; for advanced students EBNF descriptions,
which are smaller, are easier to read and understand. Because beginning students
become advanced ones, this book uses EBNF rules to describe Ada's syntax.

REVIEW QUESTIONS
1. a. Translate each of the following right-hand sides into a syntax chart.
      A{[_]A}    {A[B|C]}    {A|B[C]}
   b. Which symbols does the first RHS recognize as legal: A_A, AA_AAA, _A, A_, A__A?
      Which symbols do the second and third RHS recognize as legal: ABAAC, ABC, BA, AA,
      ABBA?
   Answer: a. [Syntax charts for A{[_]A}, {A[B|C]}, and {A|B[C]}]
   b1. A_A and AA_AAA. b2. ABAAC and AA. b3. ABC, BA, AA and ABBA.

2.6 EBNF Descriptions of Sets


This section explores the syntax and semantics for writing sets of integers.
Such sets start and end with parentheses, and contain zero or more integers
separated by commas. The empty set (), a singleton set (3), and a set containing
the elements (5,-2,11) are all legal. Sets are illegal if they omit the parentheses,
contain consecutive commas (1,,3) or extra commas (,2) or (1,2,3,).
Given a description of integer, the following EBNF rules describe such sets.
Note that the parentheses characters in integer-set stand for themselves; they
are not used for grouping or any other special EBNF purpose.
integer-list ⇐ integer{,integer}
integer-set  ⇐ ([integer-list])

We can easily prove that the empty set is a legal integer-set: discard the
integer-list option between the parentheses. For a singleton set we include this
option but take zero repetitions after the first integer in integer-list. Figure 2.5
proves that (5,-2,11) is a legal integer-set. The tabular proof and its derivation
tree are shortened by recognizing in one step that 5, -2, and 11 are each an integer.
Like lemmas in mathematics, we use this information without proving it.
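
Just as the proof reuses the fact that 5, -2, and 11 are integers, a recognizer for integer-set can be assembled from one for integer. The Python sketch below assumes hand translations of the rules into regular expressions, built up one named rule at a time; the names INTEGER, INTEGER_LIST, INTEGER_SET, and is_integer_set are hypothetical.

```python
import re

INTEGER      = r"[+-]?[0-9][0-9]*"            # integer      <= [+|-]digit{digit}
INTEGER_LIST = rf"{INTEGER}(,{INTEGER})*"     # integer-list <= integer{,integer}
# integer-set <= ([integer-list]); the escaped \( and \) stand for themselves,
# while the unescaped pair only groups the optional list for the regex engine.
INTEGER_SET  = re.compile(rf"\(({INTEGER_LIST})?\)")

def is_integer_set(symbol: str) -> bool:
    """Return True if the whole symbol is a legal integer-set."""
    return INTEGER_SET.fullmatch(symbol) is not None

for symbol in ["()", "(3)", "(5,-2,11)", "(1,,3)", "(,2)", "(1,2,3,)", "1,2"]:
    print(symbol, "legal" if is_integer_set(symbol) else "illegal")
```

Each later pattern refers to the earlier ones by name, following the same convention of writing rules in order of increasing complexity.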

Status                  Reason
integer-set             Given
([integer-list])        Replace integer-set by its RHS
(integer-list)          Include option
(integer{,integer})     Replace integer-list by its RHS
(5{,integer})           Lemma: 5 is an integer
(5,integer,integer)     Use two repetitions
(5,-2,integer)          Lemma: -2 is an integer
(5,-2,11)               Lemma: 11 is an integer

[Derivation tree: integer-set at the root, branching to ( [integer-list] ), then integer-list, then integer {,integer}, with the leaves 5, -2, and 11 at the bottom]

Figure 2.5 A Tabular Proof and its Derivation Tree showing (5,-2,11) is an integer-set

Now we switch our focus to semantics and examine when two sets are
equivalent. The rules involve duplicate elements and the order of elements.
- Duplicate elements are irrelevant and can be removed: (1,3,5,1,3,3,5) is
  equivalent to (1,3,5).
- The order of the elements is irrelevant and can be rearranged: (1,5,3) is
  equivalent to (1,3,5) and (3,1,5) and all other permutations of these values.
By convention, we write sets in an ordered form, starting with the smallest
element and ending with the largest; each element is written once. Such a form
is called canonical. It is impossible for our EBNF description to enforce these
properties, which is why these rules are considered to be semantic, not syntactic.
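
Although EBNF cannot enforce canonical form, a program that works with the values of the elements can compute it. This is a minimal Python sketch, assuming the elements have already been converted to a list of integer values; the function names canonical and equivalent are hypothetical.

```python
def canonical(elements):
    """Return the canonical form: duplicates removed, smallest to largest."""
    return sorted(set(elements))

def equivalent(a, b):
    """Two sets are equivalent if their canonical forms are identical."""
    return canonical(a) == canonical(b)

print(canonical([1, 3, 5, 1, 3, 3, 5]))
print(equivalent([1, 5, 3], [3, 1, 5]))
```

Comparing canonical forms turns a question about equivalence into a question about identity.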
The following EBNF rules are an equivalent description for writing sets.
Here, the option brackets are in the integer-list rule, not the integer-set rule.
integer-list ⇐ [integer{,integer}]
integer-set  ⇐ (integer-list)
There are two stylistic reasons to prefer the original description. First, it
better balances the complexity between the EBNF rules: the repetition control
form is in one rule, and the option is in the other. Second, the new description
allows integer-list to match the empty symbol, which contains no characters;
this is a bit awkward and can lead to problems if this EBNF rule is used in others.

Sets containing Integer Ranges


A range is a compact way to write a sequence of integers. Using the symbol ..
to represent "up through", the range 2..5 specifies 2, 3, 4, and 5: the values 2 up
through 5 inclusive. Using such a notation, we can write sets more compactly:
(2..5,8,10..13,17..19,21) instead of (2,3,4,5,8,10,11,12,13,17,18,19,21). The
following EBNF rules extend our description of integer-set to include ranges.
integer-range ⇐ integer[..integer]
integer-list  ⇐ integer-range{,integer-range}
integer-set   ⇐ ([integer-list])
Figure 2.6 proves that (1,3..7,15) is a legal integer-set.
Now we switch our focus to semantics and examine the exact meaning of
ranges. For every pair of integers X and Y:
X ≤ Y: the range X..Y is equivalent to all integers between X and Y inclusive.
By this rule, every integer X is equivalent to the range X..X.
X > Y: the range X..Y is the "null range" and contains no values.
By convention, we do not use ranges to write single values nor ranges of two
values (1,2 is more compact than 1..2) unless there are special reasons to do so.
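The semantic rules above can be sketched in code. The following is an illustration only: it assumes its input is already a syntactically legal integer-set, and the function names expand_range, parse_set, and canonical are hypothetical.

```python
def expand_range(x, y):
    """Return the integers denoted by x..y; the null range (no values) when x > y."""
    return set(range(x, y + 1))  # range() yields nothing when x > y


def parse_set(symbol):
    """Return the set of integers denoted by a legal symbol such as (2..5,8)."""
    body = symbol.strip()[1:-1]  # drop the enclosing parentheses
    values = set()
    if body:
        for item in body.split(","):
            low, _, high = item.partition("..")
            # A single integer X is treated as the range X..X.
            values |= expand_range(int(low), int(high) if high else int(low))
    return values


def canonical(symbol):
    """Rewrite a set in canonical form: ascending, no duplicates, ranges for 3+ values."""
    values = sorted(parse_set(symbol))
    parts = []
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1  # extend the run of consecutive integers
        if j - i >= 2:  # three or more consecutive values: write a range
            parts.append(f"{values[i]}..{values[j]}")
        else:           # one or two values: write them individually
            parts.extend(str(v) for v in values[i:j + 1])
        i = j + 1
    return "(" + ",".join(parts) + ")"
```

For example, canonical("(4..1,12,2,7..10,6)") returns "(2,6..10,12)": the null range 4..1 contributes nothing, and 6 joins 7..10 to form 6..10.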

REVIEW QUESTIONS
1. Translate the RHS of the integer-range, integer-list, and integer-set EBNF rules
into syntax charts.
Answer: [Syntax charts for integer[..integer] (integer-range),
integer-range{,integer-range} (integer-list), and ([integer-list]) (integer-set).]
2. Convert the following sets into canonical form; use ranges when appropriate.
a. (1,5,9,3,7,11,9)    c. (8,1,2,3,4,5,12,13,14,10)    e. (1..3,8,2..5,12,4)
b. (1..3,8,5..9,4)     d. (2..5,7..10,1)               f. (4..1,12,2,7..10,6)
Answer:
a. (1,3,5,7,9,11)      c. (1..5,8,10,12..14)           e. (1..5,8,12)
b. (1..9)              d. (1..5,7..10)                 f. (2,6..10,12)
3. The following EBNF description for integer-set is more compact than the original,
but they are not equivalent. This one recognizes all the sets recognized by the
original description, but it recognizes others as well; find one of these sets.
integer-list ⇐ integer{,integer | ..integer}
integer-set ⇐ ([integer-list])
Answer: This description allows "ranges" with more than one .. in them: (1..3..5).

Status                                   Reason
integer-set                              Given
([integer-list])                         Replace integer-set by its RHS
(integer-list)                           Include option
(integer-range{,integer-range})          Replace integer-list by its RHS
(integer[..integer]{,integer-range})     Replace integer-range by its RHS
(integer{,integer-range})                Discard option
(1{,integer-range})                      Lemma: 1 is an integer
(1,integer-range,integer-range)          Use two repetitions
(1,integer[..integer],integer-range)     Replace integer-range by its RHS
(1,3[..integer],integer-range)           Lemma: 3 is an integer
(1,3..integer,integer-range)             Include option
(1,3..7,integer-range)                   Lemma: 7 is an integer
(1,3..7,integer[..integer])              Replace integer-range by its RHS
(1,3..7,15[..integer])                   Lemma: 15 is an integer
(1,3..7,15)                              Discard option

[Derivation tree: integer-set at the root expands to ([integer-list]), then to
integer-range{,integer-range}, whose three integer-range nodes yield 1, 3..7, and 15.]

Figure 2.6 A Tabular Proof and its Derivation Tree showing (1,3..7,15) is an integer-set

2.7 Advanced EBNF (optional)


This section examines two advanced concepts in EBNF: recursive EBNF
rules and using recursive EBNF rules to describe EBNF. In programming,
recursion is a useful technique for specifying complex data structures and the
subprograms that manipulate them.

Recursive EBNF Descriptions


Recursive EBNF descriptions can contain rules that are directly recursive or
mutually recursive: such rules use their names in a special way. A directly
recursive EBNF rule uses its own name in its definition. The following directly
recursive rule recognizes symbols containing any number of As, which we can
describe mathematically as Aⁿ, n ≥ 0.
r1 ⇐ □ | A r1
The first alternative in this rule contains the empty symbol, which is
recognized as a legal r1. Directly recursive rules must include at least one
alternative that is not recursive; otherwise they describe only infinite-length
symbols.
The second alternative means that an A preceding anything that is recognized
as an r1 is also recognized as an r1: so A is recognized as an r1 because it has
an A preceding the empty symbol; likewise AA is also recognized as an r1, as is
AAA, etc. Figure 2.7 is a tabular proof and its derivation tree, showing how AAA
is recognized as a legal r1. If we require at least one A, this rule can be written
more understandably as r1 ⇐ A | A r1, with A as the non-recursive alternative.
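A directly recursive rule translates naturally into a directly recursive function. The sketch below is an illustration, not the book's code: is_r1 mirrors the rule with an empty first alternative and an A followed by an r1 as the second, and is_r1_nonempty mirrors the variant that requires at least one A. Both function names are hypothetical.

```python
def is_r1(symbol):
    """Recognize r1 <= (empty) | A r1, that is, any number of As."""
    if symbol == "":
        return True  # first alternative: the empty symbol
    # second alternative: an A preceding anything recognized as an r1
    return symbol[0] == "A" and is_r1(symbol[1:])


def is_r1_nonempty(symbol):
    """Recognize r1 <= A | A r1, that is, at least one A."""
    if symbol == "A":
        return True  # non-recursive alternative: a single A
    return symbol[:1] == "A" and is_r1_nonempty(symbol[1:])
```

The non-recursive alternative is what stops the recursion: each call either succeeds on it or strips one A and recurs on a shorter symbol.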

Status     Reason
r1         Given
A r1       Replace r1 by the second alternative in its RHS
AA r1      Replace r1 by the second alternative in its RHS
AAA r1     Replace r1 by the second alternative in its RHS
AAA        Replace r1 by the first (empty) alternative in its RHS

[Derivation tree: r1 at the root expands to A r1 three times; the innermost r1
expands to the empty symbol.]

Figure 2.7 A Tabular Proof and its Derivation Tree showing AAA is an r1



The recursive r1 rule is equivalent to r1 ⇐ {A}, which uses repetition instead.
Recursion can always replace repetition, but the converse is not true,¹ because
recursion is more powerful than repetition. For example, examine the following
directly recursive EBNF description. It recognizes all symbols having some
number of As followed by the same number of Bs: AⁿBⁿ, n ≥ 0. The description
of these symbols cannot be written without recursion.
r2 ⇐ □ | A r2 B
The rule r2 ⇐ {A}{B} does not require the same number of As as Bs.
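The contrast between the recursive rule and the repetition-only rule can be sketched as follows. This is an illustration under stated assumptions: is_r2 mirrors a rule whose alternatives are the empty symbol and an A, an r2, and a B in sequence, while LOOSE is a regular expression standing in for {A}{B}. Both names are hypothetical.

```python
import re


def is_r2(symbol):
    """Recognize r2 <= (empty) | A r2 B: some As followed by the same number of Bs."""
    if symbol == "":
        return True  # first alternative: the empty symbol
    # second alternative: an A, then an r2, then a B
    return (len(symbol) >= 2
            and symbol[0] == "A"
            and symbol[-1] == "B"
            and is_r2(symbol[1:-1]))


# {A}{B}: any number of As followed by any number of Bs, with no counting.
LOOSE = re.compile(r"A*B*")
```

The symbol AAB shows the difference: LOOSE.fullmatch("AAB") succeeds, but is_r2("AAB") is false, because each A must be paired with its own B.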
We just learned that repetition can always be replaced by recursion in an
EBNF rule. We can also replace any option control form by an equivalent
choice control form that contains an empty symbol. Using both techniques,
we can rewrite our original integer description, or any other one, using only
recursion, the choice control form, and the empty symbol. EBNF without the
repetition or option control form extensions is called just BNF. The structure of
each BNF rule is simpler, but descriptions written using them are often longer.
sign ⇐ □ | + | -
digit ⇐ 0|1|2|3|4|5|6|7|8|9
digits ⇐ □ | digit digits
integer ⇐ sign digit digits
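One way to see how the four BNF rules work together is to write one small function per rule. The sketch below is illustrative only, with hypothetical function names. Each match_ function consumes a prefix of its argument and returns the unmatched remainder, and None signals that a required part is missing.

```python
DIGITS = "0123456789"


def match_sign(s):
    """sign <= (empty) | + | - ; return the remainder after an optional sign."""
    return s[1:] if s[:1] in ("+", "-") else s


def match_digit(s):
    """digit <= 0|1|...|9 ; return the remainder, or None when no digit is present."""
    return s[1:] if s and s[0] in DIGITS else None


def match_digits(s):
    """digits <= (empty) | digit digits ; recursion plays the role of repetition."""
    rest = match_digit(s)
    if rest is None:
        return s               # first alternative: the empty symbol
    return match_digits(rest)  # second alternative: digit digits


def is_integer(s):
    """integer <= sign digit digits ; the whole symbol must be consumed."""
    rest = match_digit(match_sign(s))
    return rest is not None and match_digits(rest) == ""
```

Here is_integer("+42") is true and is_integer("-") is false, since the rule requires at least one digit after the optional sign.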
Even recursive EBNF rules are not powerful enough to describe all simple
languages. For example, they cannot describe symbols having some number
of As, followed by the same number of Bs, followed by the same number of Cs:
AⁿBⁿCⁿ, n ≥ 0. To specify such a description, we need a more powerful notation:
type 1 or type 0 in the Chomsky Hierarchy.
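Although no EBNF description recognizes these symbols, a program that can count recognizes them easily. The following minimal sketch (the function name is hypothetical) is not an EBNF technique; it only shows that the limit lies in the notation, not in the problem.

```python
def is_anbncn(symbol):
    """Recognize n As, then n Bs, then n Cs, for any n >= 0, by explicit counting."""
    n, remainder = divmod(len(symbol), 3)
    return remainder == 0 and symbol == "A" * n + "B" * n + "C" * n
```

The function first derives the only possible n from the length of the symbol, then compares the symbol with the one string of that shape.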

Describing EBNF using EBNF rules


EBNF descriptions are powerful enough to describe their own syntax. Although
such an idea may seem odd at first, recall that dictionaries use English to
describe English. The EBNF rules describing EBNF illustrate mutual recursion:
although no rule is directly recursive, the RHS of rhs is defined in terms of
sequence, whose RHS is defined in terms of option and repetition, whose RHSs
are defined in terms of rhs. Thus, all these rules are mutually described in terms
of each other.
For easier reading, these rules are grouped into three categories: character
set related, LHS/RHS related (mutually recursive), and EBNF related. When a
boxed character appears in a rule, it stands for itself, not its special meaning in
EBNF: a boxed | denotes the stroke character; it does not separate alternatives in the
rule in which it appears. The empty symbol appears as an empty box.

¹The EBNF rule r1 is tail recursive: the recursive reference occurs at the end of an alternative. All
tail recursive EBNF rules can be replaced by equivalent EBNF rules that use repetition.

In the rules below, «x» marks a boxed character x, and □ is the empty box.

lower-case ⇐ a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
upper-case ⇐ A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
digit ⇐ 0|1|2|3|4|5|6|7|8|9
other-character ⇐ -|_|"|«|»|&|'|(|)|*|+|,|.|/|:|;|<|=|>
character ⇐ lower-case | upper-case | digit | other-character
empty-symbol ⇐ □

lhs ⇐ lower-case{[-]lower-case | [-]digit}

option ⇐ «[»rhs«]»
repetition ⇐ «{»rhs«}»
sequence ⇐ empty-symbol | {character | lhs | option | repetition}
rhs ⇐ sequence{«|»sequence}

ebnf-rule ⇐ lhs«⇐»rhs
ebnf-description ⇐ {ebnf-rule}

REVIEW QUESTIONS
1. a. Write a tabular proof that shows AAAABBBB is a legal r2. b. Draw a derivation tree
showing AABB is a legal r2.
Answer:
Status           Reason
r2               Given
A r2 B           Replace r2 by the second alternative in its RHS
AA r2 BB         Replace r2 by the second alternative in its RHS
AAA r2 BBB       Replace r2 by the second alternative in its RHS
AAAA r2 BBBB     Replace r2 by the second alternative in its RHS
AAAABBBB         Replace r2 by the first (empty) alternative in its RHS
[Derivation tree for AABB: r2 at the root expands to A r2 B; the inner r2 expands
to A r2 B; the innermost r2 expands to the empty symbol.]
2. Replace the integer-list rule by an equivalent directly recursive one.
Answer: integer-list ⇐ integer | integer-list,integer
3. Rewrite: ebnf ⇐ {A[B|C]} as an equivalent description in BNF.
Answer:
choice ⇐ □ | B | C
bnf ⇐ □ | A choice bnf

SUMMARY
This chapter examined the use of EBNF rules to describe syntax. It started by discussing
EBNF descriptions, named rules, and the control forms used in the right-hand sides of
these rules: sequence, choice, option, and repetition. Proofs showing how a symbol
matches an EBNF description were illustrated in English, and more formally as tabular
proofs and their derivation trees. Throughout the chapter various EBNF descriptions
of integers, ranges, and sets were proposed and analyzed according to their syntax and
semantics. EBNF descriptions must be liberal enough to include all legal symbols,
but restrictive enough to exclude all illegal symbols. Syntax Charts were seen to
present graphically the same information contained in EBNF rules. Finally, this chapter
introduced recursive descriptions (direct and mutually recursive) and the latter's use in an
EBNF description of EBNF.

EXERCISES
1. Write an EBNF description for phone-number, which describes telephone numbers
written according to the following specifications. The description should be
compact, and each rule should be well named.
- Normal: a three digit exchange, followed by a dash, followed by a four digit
number: 555-1212
- Long Distance: A 1, followed by a dash, followed by a three digit area code
enclosed in parentheses, followed by a three digit exchange, followed by a dash,
followed by a four digit number: 1-(800)555-1212
- Interoffice: a 1 followed by either a 3 or a 5, followed by a dash, followed by a
four digit number: 15-1212

2. The control forms in each of the following pairs are not equivalent. Find the
simplest symbol that is classified differently by each control form in the pair.
a1. [A][B]     b1. {A|B}       c1. A|B
a2. [A[B]]     b2. {A}|{B}     c2. [A][B]
3. Simplify each of the following control forms (but preserve equivalence).
For this problem, simpler means shorter or has fewer nested forms.
a. A|B|A       c. [A]{A}            e. [A|B]|[B|A]     g. A|AB
b. A ] ] ]     d. [A]{C}|[B]{C}     f. {[A|B][B|A]}    h. A|AA|AAA|AAAA|AAAAA
4. Write an EBNF description for numbers written in scientific notation, which
scientists and engineers use to write very large and very small numbers compactly.
Avogadro's number is written 6.02252x10↑23 and read as 6.02252 - called the mantissa
- times 10 raised to the 23d power - called the exponent. Likewise, the mass of
an electron is written 9.11x10↑-31 and earth's gravitational constant is written 9.8 -
this number is pure mantissa; it is not multiplied by any power of ten.
Numbers in scientific notation always contain at least one digit in the mantissa;
if that digit is nonzero: (1) it may have a plus or minus sign preceding it, (2) it may be
followed by a decimal point, which may be followed by more digits, and (3) it may be
followed by an exponent that specifies multiplication by ten raised to some non-zero
unsigned or signed integer power. The symbols 0.5, 15.2, +0, 0x10↑5, 5.3x10↑02, and
5.3x10↑2.0 are illegal in scientific notation. Hint: my solution uses a total of five
EBNF rules: non-0-digit, digit, mantissa, exponent, and scientific-notation.
5. Identify one advantage of writing dates as a structured-integer in the form: year,
month, day (1954-02-10) instead of in the normal order (02-10-1954).
6. The following EBNF rules attempt to describe every possible family-relation. Here
spaces are important, so all the characters do not run together.
contemporary ⇐ SISTER | BROTHER | WIFE | HUSBAND
ancestor-descendent ⇐ MOTHER | FATHER | DAUGHTER | SON | NEPHEW | NIECE
side-relation ⇐ AUNT | UNCLE | COUSIN
close-relation ⇐ contemporary | ancestor-descendent | side-relation
far-relation ⇐ {GREAT} side-relation | {GREAT} [GRAND] ancestor-descendent
family-relation ⇐ far-relation | [STEP] close-relation

a. Classify each of the following symbols as a legal or illegal family-relation.
i. SISTER                    iv. GRAND MOTHER    vii. GREAT UNCLE
ii. UNCLE LARRY              v. GRAND UNCLE      viii. STEP STEP BROTHER
iii. GREAT GREAT GRAND SON   vi. GRAND NIECE     ix. GREAT MOTHER
b. Based on your classification in ix, improve the far-relation rule to always require
a GRAND after the last GREAT.
7. We can extend the previous EBNF rules for describing family relationships.
person e ME I MY family-relation I THE family-relation OF MY family-relation
This description specifies symbols such as MY GRAND MOTHER and THE MOTHER
OF MY MOTHER. These symbols may denote the same person, although MY
GRAND MOTHER may also refer to THE MOTHER OF MY FATHER, THE MOTHER OF MY
STEP FATHER, etc. Write a simpler symbol that is equivalent to each of
the following; in some cases, more than one answer may be correct.
a. THE SISTER OF MY MOTHER         e. THE SON OF MY FATHER
b. THE GRAND SON OF MY MOTHER      f. THE DAUGHTER OF MY GREAT GRAND MOTHER
c. THE WIFE OF MY FATHER           g. THE GRAND SON OF MY GREAT GREAT GRAND MOTHER
d. THE GRAND FATHER OF MY SON      h. THE FATHER OF MY AUNT
8. Write an EBNF description for comma-integer, which includes normalized unsigned
or signed integers (no extraneous leading zeros) that have commas in only the correct
places (separating thousands, millions, billions, etc.): 0; 213; -2,048; and 1,000,000. It
should not recognize -0; 062; 0,516; 05,418; 54,32,12; or 5, ,123 as legal.
9. a. Write an EBNF description for structured-integer-set that specifies an integer-set
(Section 2.6) allowing sets and ranges that use structured-integer (Section 2.4).
b. How can a similar comma-integer-set allowing sets and ranges that use comma-integer
(Exercise 8) lead to a semantic problem?
10. Using the following rules, write an EBNF description for train. Let letters stand
for each car in a train: E for Engine, C for Caboose, B for Boxcar, P for Passenger car,
and D for Dining car. The railroad has four rules telling how to form trains.
a. Trains start with one or more Engines and end with one Caboose; neither can
appear anywhere else.
b. Whenever Boxcars are used, they always come in pairs: BB, BBBB, etc.
c. There cannot be more than four Passenger cars in a series.
d. A single dining car must follow each series of passenger cars; it cannot appear
anywhere else.

Train              Analysis
EC                 The smallest train
EEEPPDBBPDBBBBC    A train showing all the cars
EEEPPDBBPDBBBB     Illegal by rule a - no caboose
EBBBC              Illegal by rule b - 3 boxcars in a row
EEPPPPPDBBC        Illegal by rule c - 5 passenger cars in a row
EEPPBBC            Illegal by rule d - no dining car after passenger cars
EEBBDC             Illegal by rule d - dining car after box car
11. The following message was seen on a bumper sticker: Stinks Syntax. What is the
joke?

12. a. Write a directly recursive EBNF rule named mp that describes all symbols that
have matching parentheses: (), ()()(), ()(()()), and ((())())(()(()))(). It should
not recognize (, ())(, or (()() as legal. b. Write a tabular proof and its derivation
tree showing how ()(()()) is recognized as legal.
