2008 A Case Study in Grammar Engineering

A case study in grammar engineering

Tiago L. Alves¹ and Joost Visser²

¹ University of Minho, Portugal, and Software Improvement Group, The Netherlands
  [email protected]
² Software Improvement Group, The Netherlands
  [email protected]

Abstract. This paper describes a case study about how well-established software engineering techniques can be applied to the development of a grammar. The employed development methodology can be described as iterative grammar engineering and includes the application of techniques such as grammar metrics, unit testing, and test coverage analysis. The result is a grammar of industrial strength, in the sense that it is well-tested, it can be used for fast parsing of high volumes of code, and it allows automatic generation of support for syntax tree representation, traversal, and interchange.

1 Introduction

Grammar engineering is an emerging field of software engineering [1] that aims to apply solid software engineering techniques to grammars, just as they are applied to other software artifacts. Such techniques include version control, static analysis, and testing. Through their adoption, the notoriously erratic and unpredictable process of developing and maintaining large grammars can become more efficient and effective, and can lead to results of higher quality. Such timely delivery of high-quality grammars is especially important in the context of grammar-centered language tool development, where grammars are used for much more than single-platform parser generation.
In this paper, we provide details of a grammar engineering project where well-established software engineering techniques were applied to the development of a VDM-SL grammar from its ISO standard language reference. Our approach can be characterised as a tool-based methodology for iterative grammar development, embedded into the larger context of grammar-centered language tool development. We explain the methodology and illustrate its application in a detailed case study. We thus hope to contribute to the body of knowledge about best grammar engineering practices.
The paper is structured as follows. Section 2 describes our tool-based methodology for grammar engineering. We demonstrate key elements such as grammar versioning, metrics, visualization, and testing, and we embed our grammar engineering approach into the larger context of grammar-centered language tool development. Section 3 describes the application of these general techniques in the specific case of our VDM grammar development project. We describe the development process and its intermediate and final deliverables. We discuss related work in Section 4, including comparison to earlier grammar engineering case studies [2, 3]. In Section 5 we summarize our contributions and identify future challenges.

2 Tool-based grammar engineering methodology

Our methodology for grammar engineering is based on well-established software engineering techniques. In this section we explain the basic ingredients of this methodology. We first explain our grammar-centered approach to language tool development and motivate our choice of grammar notation (SDF), in Section 2.1. In Section 2.2 we identify the most important phases in grammar evolution, which define the different kinds of transformation steps that may occur in a grammar engineering project. In Section 2.3 we present a catalogue of grammar metrics. Metrics allow quantification, which is an important instrument in understanding and controlling grammar evolution. We describe tool-based grammar testing in Section 2.4, where we distinguish functional and unit testing. Finally, in Section 2.5, we discuss the use of test coverage as a test quality indicator. A more general discussion of related grammar engineering work is deferred to Section 4.
2.1 Grammar-centered tool development

In traditional approaches to language tool development, the grammar of the language is encoded in a parser specification. Commonly used parser generators include Yacc, Antlr, and JavaCC. The parser specifications consumed by such tools are not general context-free grammars. Rather, they are grammars within a proper subset of the class of context-free grammars, such as LL(1) or LALR. Entangled into the syntax definitions are semantic actions in a particular target programming language, such as C, C++, or Java. As a consequence, the grammar can serve only a single purpose: generating a parser in a single programming language, with a single type of associated semantic functionality (e.g. compilation, tree building, metrics computation). For a more in-depth discussion of the disadvantages of traditional approaches to language tool development see [4].
For the development of language tool support, we advocate a grammar-centered approach [5]. In such an approach, the grammar of a given language takes a central role in the development of a wide variety of tools or tool components for that language. For instance, the grammar can serve as input for generating parsing components to be used in combination with several different programming languages. In addition, the grammar serves as basis for the generation of support for representation of abstract syntax, serialization and deserialization in various formats, customizable pretty-printers, and support for syntax tree traversal. This approach is illustrated by the diagram in Figure 1.
For the description of grammars that play such central roles, it is essential
to employ a grammar description language that meets certain criteria. It must

[Diagram: a central grammar from which a parser, a pretty-printer, abstract syntax support, serialization/deserialization, and traversal support are generated.]

Fig. 1. Grammar-centered approach to language tool development.

be neutral with respect to target implementation language, it must not impose restrictions on the set of context-free languages that can be described, and it should allow specification not of semantics, but of syntax only. Possible candidates are BNF or EBNF, or our grammar description language of choice: SDF [6, 7].
The syntax definition formalism SDF allows description of both lexical and context-free syntax. It adds even more regular expression-style constructs to BNF than EBNF does, such as separated lists. It offers a flexible modularization mechanism that allows modules to be mutually dependent, and distribution of alternatives of the same non-terminal across multiple modules. Various kinds of tool support are available for SDF, such as a well-formedness checker, a GLR parser generator, generators of abstract syntax support for various programming languages, among which Java, Haskell, and Stratego, and customizable pretty-printer generators [8-12].
2.2 Grammar evolution

Grammars for sizeable languages are not created instantaneously, but through a prolonged, resource-consuming process. After an initial version of a grammar has been created, it goes through an evolutionary process, where piecemeal modifications are made at each step. After delivery of the grammar, evolution may continue in the form of corrective and adaptive maintenance.
A basic instrument in making such evolutionary processes tractable is version control. We have chosen the Concurrent Versions System (CVS) as the tool to support such version control [13].
As an alternative, GDK [14] could be used, enabling the documentation of changes as a set of transformations. Although this is an interesting academic approach, we preferred the use of a more practical and standard tool.
In grammar evolution, different kinds of transformation steps occur:
Recovery: An initial version of the grammar may be retrieved by reverse engineering an existing parser, or by converting a language reference manual, available as a Word or PDF document. If only a hardcopy is available then it should be transcribed.

Size and complexity metrics
TERM   Number of terminals
VAR    Number of non-terminals
MCC    McCabe's cyclomatic complexity
AVS-P  Average size of RHS per production
AVS-N  Average size of RHS per non-terminal
Fig. 2. Size and complexity metrics for grammars.
Error correction: Making the grammar complete, fully connected, and correct
by supplying missing production rules, or adapting existing ones.
Extension or restriction: Adding rules to cover the constructs of an extended
language, or removing rules to limit the grammar to some core language.
Refactoring: Changing the shape of the grammar, without changing the language that it generates. Such shape changes may be motivated by different reasons. For instance, changing the shape may make the description more concise or easier to understand, or it may enable subsequent corrections, extensions, or restrictions.
In our case, grammar descriptions will include disambiguation information, so
adding disambiguation information is yet another kind of transformation step
present in our evolution process.
2.3 Grammar metrics

Quantification is an important instrument in understanding and controlling grammar evolution, just as it is for software evolution in general. We have adopted, adapted, and extended the suite of metrics defined for BNF in [15] and implemented a tool, called SdfMetz, to collect grammar metrics for SDF.
Adaptation was necessary because SDF differs from (E)BNF in more than
syntax. For instance, it allows several productions for the same non-terminal.
This forced us to choose between using the number of productions or the number
of non-terminals in some metrics definitions.
Furthermore, SDF grammars contain more than just context-free syntax.
They also contain lexical syntax and disambiguation information. We decided
to apply the metrics originally defined for BNF only to the context-free syntax,
to make comparisons possible with the results of others. For the disambiguation
information these metrics were extended with the definition of a dedicated set of
metrics. Full details about the definition and the implementation of these SDF
metrics are provided in [16].
We will discuss several categories of metrics (size and complexity metrics, structure metrics, Halstead metrics, and disambiguation metrics), providing a brief description of each.

Structure metrics
TIMP Tree impurity (%)
CLEV Normalized count of levels (%)
NSLEV Number of non-singleton levels
DEP Size of largest level
HEI Maximum height
Fig. 3. Structure metrics for grammars.

Size, complexity, and structure metrics Figure 2 lists a number of size and complexity metrics for grammars. These metrics are defined for BNF in [15]. The number of terminals (TERM) and non-terminals (VAR) are simple metrics applicable equally to BNF and SDF grammars. McCabe's cyclomatic complexity (MCC), originally defined for program complexity, was adapted for BNF grammars, based on an analogy between grammar production rules and program procedures. Using the same analogy, MCC can be extended easily to cover the operators that SDF adds to BNF.
The average size of right-hand sides (AVS) needs to be adapted to SDF with
more care. In (E)BNF the definition of AVS is trivial: count the number of
terminals and non-terminals on the right-hand side of each grammar rule, sum
these numbers, and divide them by the number of rules. In SDF, this definition
can be interpreted in two ways, because each non-terminal can have several
productions associated with it. Therefore, we decided to split AVS into two separate
metrics: average size of right-hand sides per production (AVS-P) and average
size of right-hand sides per non-terminal (AVS-N). For grammars where each
non-terminal has a single production rule, as is the case for (E)BNF grammars,
these metrics will present the same value. For SDF grammars, the values can be
different. While the AVS-N metric is more appropriate to compare with other
formalisms (like BNF and EBNF), the AVS-P metric provides more precision.
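The split can be illustrated with a minimal Python sketch. The pair-based grammar representation below is our own simplification for illustration, not the SdfMetz implementation:

```python
def avs_metrics(productions):
    """productions: list of (non-terminal, rhs) pairs, where rhs is the
    list of symbols on the right-hand side of one production."""
    total = sum(len(rhs) for _, rhs in productions)
    nonterminals = {nt for nt, _ in productions}
    avs_p = total / len(productions)       # AVS-P: average per production
    avs_n = total / len(nonterminals)      # AVS-N: average per non-terminal
    return avs_p, avs_n

# Type has two productions, as SDF allows:
g = [("Type", ["BasicType"]),
     ("Type", ["Type", "*", "Type"]),
     ("BasicType", ["bool"])]
avs_p, avs_n = avs_metrics(g)   # avs_p = 5/3, avs_n = 5/2
```

With one production per non-terminal, as in (E)BNF, the two values coincide; here they diverge because Type is defined twice.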

Structure metrics Figure 3 lists a number of structure metrics, also previously defined in [15]. Each of these metrics is based on the representation of a grammar as a graph which has non-terminals as nodes, and which contains edges between two non-terminals whenever one occurs in the right-hand side of the definition of the other. This graph is called the grammar's flow graph.
Only the tree impurity metric (TIMP) is calculated directly from this flow graph; all the other structure metrics are calculated from the corresponding strongly connected components graph. This latter graph is obtained from the flow graph by grouping the non-terminals that are strongly connected (reachable from each other) into nodes (called components or levels of the grammar). An edge is created from one component to another if in the flow graph at least one non-terminal from one component has an edge to a non-terminal from the other component. This graph is called the grammar's level graph or graph of strongly connected components.

Halstead basic metrics
n1  Number of distinct operators
n2  Number of distinct operands
N1  Total number of operators
N2  Total number of operands

Halstead derived metrics
n  Program vocabulary
N  Program length
V  Program volume
D  Program difficulty
E  Program effort (HAL)
L  Program level
T  Program time

Fig. 4. Halstead metrics.

Tree impurity (TIMP) measures how much the flow graph deviates from a tree, expressed as a percentage. A tree impurity of 0% means that the graph is a tree, and a tree impurity of 100% means that it is a fully connected graph.
Normalized count of levels (CLEV) expresses the number of nodes in the level
graph (graph of strongly connected components) as a percentage of the number
of nodes in the flow graph. A normalized count of levels of 100% means that
there are as many levels in the level graph as non-terminals in the flow graph.
In other words, there are no circular connections in the flow graph, and the level
graph only contains singleton components. A normalized count of levels of 50%
means that about half of the non-terminals of the flow graph are involved in
circularities and are grouped into non-singleton components in the level graph.
Number of non-singleton levels (NSLEV) indicates how many of the grammar's strongly connected components (levels) contain more than a single non-terminal.
Size of the largest level (DEP) measures the depth (or width) of the level
graph as the maximum number of non-terminals per level.
Maximum height (HEI) measures the height of the level graph as the longest vertical path through it, i.e. the longest path from a source of the level graph to a sink.
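The construction of the level graph and the metrics derived from it can be sketched in Python as follows. The graph encoding and function names are illustrative only; SdfMetz's actual implementation differs:

```python
from functools import lru_cache

def sccs(graph):
    """Tarjan's algorithm. graph: {node: [successors]}; every node is a key."""
    index, low, on_stack, stack, comps, counter = {}, {}, set(), [], [], [0]
    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            comps.append(frozenset(comp))
    for v in graph:
        if v not in index:
            visit(v)
    return comps

def structure_metrics(graph):
    levels = sccs(graph)
    level_of = {v: c for c in levels for v in c}
    # Level graph: edges between distinct components only.
    dag = {c: {level_of[w] for v in c for w in graph[v] if level_of[w] != c}
           for c in levels}
    @lru_cache(maxsize=None)
    def height(c):                           # longest path starting at level c
        return 1 + max((height(d) for d in dag[c]), default=0)
    clev = 100.0 * len(levels) / len(graph)  # CLEV: levels / non-terminals
    dep = max(len(c) for c in levels)        # DEP: size of largest level
    hei = max(height(c) for c in levels)     # HEI: maximum height
    return clev, dep, hei

# A flow graph with one circularity (Expr and Term are mutually reachable):
flow = {"Expr": ["Term"], "Term": ["Expr", "id"], "id": []}
clev, dep, hei = structure_metrics(flow)     # clev = 66.7%, dep = 2, hei = 2
```

In the example, Expr and Term collapse into one non-singleton level, so the level graph has 2 nodes for 3 non-terminals, giving a CLEV of about 67%.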
Halstead metrics The Halstead Effort metric [17] has also been adapted for (E)BNF grammars in [15]. We compute values not only for Halstead's effort metric but also for some of its ingredient metrics and related metrics. Figure 4 shows a full list. The essential step in adapting Halstead's metrics to grammars is to interpret the notions of operand and operator in the context of grammars. For more details about the adaptation from BNF to SDF we refer again to [16].
The theory of software science behind Halstead's metrics has been widely questioned. In particular, the meaningfulness and validity of the effort and time metrics have been called into question [18]. Below, we will still report HAL, for purposes of comparison to data reported in [15].
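For reference, the derived metrics of Figure 4 follow from the four base counts by the standard Halstead formulas; the grammar-specific identification of operators and operands, per [16], is not reproduced here, and the counts in the example are hypothetical:

```python
import math

def halstead(n1, n2, N1, N2):
    """Derived Halstead metrics from the four base counts (Figure 4)."""
    n = n1 + n2                    # program vocabulary
    N = N1 + N2                    # program length
    V = N * math.log2(n)           # program volume
    D = (n1 / 2) * (N2 / n2)       # program difficulty
    E = D * V                      # program effort (HAL)
    L = 1 / D                      # program level
    T = E / 18                     # program time (Stroud number 18)
    return {"n": n, "N": N, "V": V, "D": D, "E": E, "L": L, "T": T}

# Hypothetical counts for a small grammar:
m = halstead(n1=10, n2=20, N1=50, N2=100)   # n = 30, N = 150, D = 25.0
```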
Disambiguation metrics In SDF, disambiguation constructs are provided in
the same formalism as the syntax description itself. To quantify this part of SDF
grammars, we defined a series of metrics, which are shown in Figure 5. These
metrics are simple counters for each type of disambiguation construct offered by
the SDF notation.

Disambiguation metrics
FRST Number of follow restrictions
ASSOC Number of associativity attributes
REJP Number of reject productions
UPP Number of unique productions in priorities
Fig. 5. Disambiguation metrics for grammars.

2.4 Grammar testing

In software testing, a global distinction can be made between white box testing
and black box testing. In black box testing, also called functional or behavioral
testing, only the external interface of the subject system is available. In white box
testing, also called unit testing, the internal composition of the subject system
is taken into consideration, and the individual units of this composition can be
tested separately.
In grammar testing, we make a similar distinction between functional tests and unit tests. A functional grammar test will use complete files as test data. The grammar is tested by generating a parser from it and running this parser on such files. Test observations are the success or failure of the parser on an input file, and perhaps its time and space consumption. A unit test will use fragments of files as test data. Typically, such fragments are composed by the grammar developer to help him detect and solve specific errors in the grammar, and to protect himself from reintroducing the error in subsequent development iterations. In addition to success and failure observations, unit tests may observe the number of ambiguities that occur during parsing, or the shape of the parse trees that are produced.
For both functional and unit testing we have used the parse-unit utility [19].
Tests are specified in a simple unit test description language with which it is
possible to declare whether a certain input should parse or not, or that a certain
input sentence should produce a specific parse tree (tree shape testing). Taking
such test descriptions as input, the parse-unit utility allows batches of unit
tests to be run automatically and repeatedly.
2.5 Coverage metrics

To determine how well a given grammar has been tested, a commonly used
indicator is the number of non-empty lines in the test suites.
A more reliable instrument to determine grammar test quality is coverage
analysis. We have adopted the rule coverage (RC) metric [20] for this purpose.
The RC metric simply counts the number of production rules used during parsing
of a test suite, and expresses it as a percentage of the total number of production
rules of the grammar.
SDF allows two possible interpretations of RC, due to the fact that a single non-terminal may be defined by multiple productions. (Above, we discussed a similar interpretation problem for the AVS metric.) One possibility is to count each of these alternative productions separately. Another possibility is to count different productions of the same non-terminal as one. For comparison with rule coverage for (E)BNF grammars, the latter is more appropriate. However, the former gives a more accurate indication of how extensively a grammar is covered by the given test suite. Below we report both, under the names of RC (rule coverage) and NC (non-terminal coverage), respectively. These numbers were computed for our functional test suite and unit test suite by a tool developed for this purpose, called SdfCoverage [16].
An even more accurate indication can be obtained with context-dependent
rule coverage [21]. This metric takes into account not just whether a given production is used, but also whether it has been used in each context (use site)
where it can actually occur. However, implementation and computation of this metric are more involved.
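The difference between the two interpretations of coverage can be sketched as follows, again over a simplified pair-based grammar representation of our own (not the SdfCoverage implementation):

```python
def coverage(productions, used):
    """productions: list of (non-terminal, rhs) pairs; used: the subset
    exercised during parsing of the test suite (rhs as tuples, hashable)."""
    rc = 100.0 * len(used) / len(productions)      # RC: per production
    defined = {nt for nt, _ in productions}
    exercised = {nt for nt, _ in used}
    nc = 100.0 * len(exercised) / len(defined)     # NC: per non-terminal
    return rc, nc

productions = [("Type", ("BasicType",)),
               ("Type", ("Type", '"*"', "Type")),
               ("Expr", ("Expr", '"+"', "Expr")),
               ("Expr", ("Type",))]
# The suite exercises one of the two productions of each non-terminal:
used = {("Type", ("BasicType",)), ("Expr", ("Type",))}
rc, nc = coverage(productions, used)   # rc = 50.0, nc = 100.0
```

The example shows why RC is the stricter indicator: every non-terminal is covered (NC = 100%), yet half the productions were never exercised.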

3 Development of the VDM grammar

The Vienna Development Method (VDM) is a collection of techniques for the formal specification and development of computing systems. VDM consists of a specification language, called VDM-SL, and an associated proof theory. Specifications written in VDM-SL describe mathematical models in terms of data structures and data operations that may change the model's state. VDM-SL is quite a rich language in the sense that its syntax provides notation for a wide range of basic types, type constructors, and mathematical operators (the size and complexity of the language will be quantified below). VDM's origins lie in the research on formal semantics of programming languages at IBM's Vienna Laboratory in the 1960s and 70s, in particular of the semantics of PL/I. In 1996, VDM achieved ISO standardization.
We have applied the grammar engineering techniques described above during
the iterative development of an SDF grammar of VDM-SL. This grammar development project is ancillary to a larger effort in which various kinds of VDM tool
support are developed in the context of Formal Methods research. For example,
the grammar has already been used for the development of VooDooM, a tool
for converting VDM-SL specifications into relational models in SQL [22].
In this section we describe the scope, priorities, and planned deliverables of
the project, as well as its execution. We describe the evolution of the grammar
during its development both in qualitative and quantitative terms, using the
metrics described above. The test effort during the project is described in terms
of the test suites used and the evolution of the unit tests and test coverage
metrics during development.
3.1 Scope, priorities, and planned deliverables

Language Though we are interested in eventually developing grammars for various existing VDM dialects, such as IFAD VDM and VDM++, we limited the
scope of the initial project to the VDM-SL language as described in the ISO
VDM-SL standard [23].

Grammar shape Not only should the parser generate the VDM-SL language as defined in the standard, we also want the shape of the grammar, the names of the non-terminals, and the module structure to correspond closely to the standard. We want to take advantage of SDF's advanced regular expression-style constructs wherever this leads to additional conciseness and understandability.
Parsing and parse trees Though the grammar should be suitable for generation of a wide range of tool components and tools, we limited the scope of the initial project to development of a grammar from which a GLR parser can be generated. The generated parser should be well-tested, exhibit acceptable time and space consumption, parse without ambiguities, and build abstract syntax trees that correspond as closely as possible to the abstract syntax as defined in the standard.
Planned deliverables Based on the defined scope and priorities, a release plan
was drawn up with three releases within the scope of the initial project:
Initial grammar Straightforward transcription of the concrete syntax BNF specification of the ISO standard into SDF notation. Introduction of SDF's regular expression-style constructs.
Disambiguated grammar Addition of disambiguation information to the grammar, to obtain a grammar from which a non-ambiguous GLR parser can be
generated.
Refactored grammar Addition of constructor attributes to context-free productions to allow generated parsers to automatically build ASTs with constructor names corresponding to the abstract syntax of the standard. Changes in the grammar shape to better reflect the tree shape as intended by the abstract syntax in the standard.
The following functionalities have explicitly been kept outside the scope of the
initial project, and are planned to be added in follow-up projects:
Haskell front-end³. Including generated support for parsing, pretty-printing, AST representation, AST traversal, marshalling ASTs to XML format and back, marshalling of ASTs to ATerm format and back.
Java front-end. Including similar generated support.
IFAD VDM-SL extensions, including module syntax.
VDM object-orientation extension (VDM++ ).
Section 5.2 discusses some of this future work in more detail.
3.2 Grammar creation and evolution

To accurately keep track of all grammar changes, a new revision was created for
each grammar evolution step. This led to the creation of a total of 48 development
versions. While the first and the latest release versions (initial and refactored)
correspond to development versions 1 and 48 of the grammar, respectively, the
intermediate release version (disambiguated) corresponds to version 32.
³ At the time of writing, a release with this functionality has been completed.

The initial grammar The grammar was transcribed from the hardcopy of the ISO Standard [23]. In that document, context-free syntax, lexical syntax, and disambiguation information are specified in a semi-formal notation. Context-free syntax is specified in EBNF, but the terminals are specified as mathematical symbols. To translate the mathematical symbols to ASCII symbols an interchange table is provided. Lexical syntax is specified in tables by enumerating the possible symbols. Finally, disambiguation information is specified in terms of precedence in tables and equations.
Apart from changing syntax from EBNF to SDF and using the interchange table to substitute mathematical symbols by their parseable representations, the transcription involved the following.
Added SDF constructs Although a direct transcription from the EBNF specification was possible, we preferred to use SDF-specific regular-expression-style constructs. For instance, consider the following excerpt from the ISO VDM-SL EBNF grammar:

product type = type, "*", type, { "*", type} ;

During transcription this was converted to:

{ Type "*" }2+ -> ProductType

Both excerpts define the same language. Apart from the syntactic differences between EBNF and SDF, the difference is that SDF has special constructs that allow definition of the repetition of a non-terminal separated by a terminal. In this case, the non-terminal Type appears at least two times and is always separated by the terminal "*".
Detected top and bottom non-terminals To help the manual process of typing in the grammar, a small tool was developed to detect top and bottom non-terminals. This tool helped to detect typos: more than one top non-terminal, or any bottom non-terminal, indicates that a part of the grammar is not connected. This tool provided valuable help not only in this phase but also during the overall development of the grammar.
Modularized the grammar (E)BNF does not support modularization. The
ISO Standard separates concerns by dividing the EBNF rules over sections.
SDF does support modules, which allowed us to modularize the grammar
following the sectioning of the ISO standard. Also, another small tool was
implemented to discover the dependencies between modules.
Added lexical syntax In SDF, lexical syntax can be defined in the same grammar as context-free syntax, using the same notation. In the ISO standard,
lexical syntax is described in an ad hoc notation resembling BNF, without
clear semantics. We interpreted this lexical syntax description and converted
it into SDF. Obtaining a complete and correct definition required renaming
some lexical non-terminals and providing additional definitions. Detection
of top and bottom non-terminals in this case helped to detect some inconsistencies in the standard.
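The top- and bottom-non-terminal check performed by the small tool mentioned above can be sketched in a few lines of Python, using an illustrative convention (not the tool's actual input format) in which terminals are quoted and non-terminals are bare names:

```python
def tops_and_bottoms(productions):
    """Terminals are quoted strings; non-terminals are bare names.
    A top non-terminal is defined but never used in any right-hand side;
    a bottom non-terminal is used but never defined."""
    defined = {nt for nt, _ in productions}
    used = {s for _, rhs in productions for s in rhs if not s.startswith('"')}
    return defined - used, used - defined

# "Title" is used but never defined: a typo or an unconnected grammar part.
g = [("Document", ("Section",)),
     ("Section", ("Title", '"text"'))]
tops, bottoms = tops_and_bottoms(g)   # tops = {"Document"}, bottoms = {"Title"}
```

A healthy grammar has exactly one top non-terminal (the start symbol) and no bottoms.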

Table 1. Grammar metrics for the three release versions.

Version        term  var  mcc  avs-n  avs-p   hal  timp  clev  nslev  dep  hei
initial         138  161  234    4.4    2.3  55.4    1%  34.9      4   69   16
disambiguated   138  118  232    6.4    2.8  61.1  1.5%  43.9      4   39   16
refactored      138   71  232   10.4    3.3  68.2    3%  52.6      3   27   14

Disambiguation In SDF, disambiguation is specified by means of dedicated disambiguation constructs [24]. These are specified more or less independently from the context-free grammar rules. The constructs are associativity attributes, priorities, reject productions, and lookahead restrictions.
In the ISO standard, disambiguation is described in detail by means of tables and a semi-formal textual notation. We interpreted these descriptions and expressed them with SDF disambiguation constructs. This was not a completely straightforward process, in the sense that it is not possible to simply translate the information of the standard document to SDF notation. In some cases, the grammar must respect specific patterns in order to enable disambiguation. For each disambiguation specified, a unit test was created.
Refactoring As already mentioned, the purpose of this release was to automatically generate ASTs that follow the ISO standard as closely as possible. Two operations were performed. First, constructor attributes were added to the context-free rules to specify AST node labels. Second, injections were removed to make the grammar and the ASTs nicer.
The removal of the injections needs further explanation. We call a production rule an injection when it is the only defining production of its non-terminal, and its right-hand side contains exactly one (different) non-terminal. Such injections, which already existed in the original EBNF grammar, were actively removed, because they needlessly increase the size of the grammar (which can be observed in the measurements) and reduce its readability. Also, the corresponding automatically built ASTs are more compact after injection removal. The names of the inlined productions, however, were preserved in annotations to create the AST, so no actual information loss occurs.
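The injection criterion just defined can be sketched as follows; the pair-based representation is again our own simplification, not the tooling actually used:

```python
from collections import defaultdict

def injections(productions):
    """An injection is the only defining production of its non-terminal,
    whose right-hand side is exactly one (different) non-terminal.
    Terminals are quoted strings; non-terminals are bare names."""
    by_nt = defaultdict(list)
    for nt, rhs in productions:
        by_nt[nt].append(rhs)
    defined = set(by_nt)
    return [(nt, rules[0][0]) for nt, rules in by_nt.items()
            if len(rules) == 1              # sole defining production
            and len(rules[0]) == 1          # RHS is a single symbol
            and rules[0][0] in defined      # ... which is a non-terminal
            and rules[0][0] != nt]          # ... different from the LHS

g = [("TypeName", ("Identifier",)),
     ("Identifier", ('"id"',)),
     ("Type", ("TypeName",)),
     ("Type", ("Type", '"*"', "Type"))]
found = injections(g)   # [("TypeName", "Identifier")]
```

Type is not reported because it has two defining productions, and Identifier is not reported because its right-hand side is a terminal.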
3.3 Grammar metrics

We measured grammar evolution in terms of the size, complexity, structure, and Halstead metrics introduced above. The data is summarized in Table 1. This table shows the values of all metrics for the three released versions. In addition, Figure 6 graphically plots the evolution of a selection of the metrics for all 48 development versions.
Size and complexity metrics A first important observation to make is that
the number of terminals (TERM) is constant throughout grammar development.

[Figure 6: line chart of VAR, HAL (K), and CLEV over the 48 development versions.]

Fig. 6. The evolution of VAR, HAL, and CLEV grammar metrics during development. The x-axis represents the 48 development versions.

This conforms to expectation, since all keywords and symbols of the language are present from the first grammar version onward.
The initial number of 161 non-terminals (VAR) decreases via 118 after disambiguation to 71 after refactoring. These numbers are the consequence of changes
in grammar shape where non-terminals are replaced by their definition. In the
disambiguation phase (43 non-terminals removed), such non-terminal inlining
was performed to make formulation of the disambiguation information possible,
or easier. For instance, after inlining, simple associativity attributes would suffice
to specify disambiguation, while without inlining more elaborate reject productions might have been necessary. In the refactoring phase (47 non-terminals removed), the inlinings performed were mainly removals of injections. These were
performed to make the grammar easier to read, more concise, and suitable for
creation of ASTs closer to the abstract syntax specification in the standard.
The value of the McCabe cyclomatic complexity (MCC) is expected to remain
constant. However, the value decreases by 2 during disambiguation, meaning that
we eliminated two paths in the flow graph of the grammar. This was caused
by refactoring the syntax of product types and union types in similar ways,
as required for disambiguation. In the case of product types, the following
two production rules:
ProductType     -> Type
{ Type "*" }2+  -> ProductType

were replaced by a single one:

Type "*" Type   -> Type

For union types, the same replacement was performed. The language generated
by the grammar remained the same after refactoring, but disambiguation using
priorities became possible. In the refactoring phase, MCC remains constant, as expected.
The average rule size metrics, AVS-N and AVS-P, increase significantly.
These increases are also due to inlining of non-terminals. Naturally, when a
non-terminal whose right-hand side has size greater than 1 is inlined, the number of
non-terminals decreases by 1, and the size of the right-hand sides of the
productions in which the non-terminal was used goes up. The increase of AVS-N
is roughly by a factor of 2.4, while the increase of AVS-P is by a factor of 1.4.
Given the decrease of VAR, the increase in AVS-N is expected. The limited
increase in AVS-P, however, indicates that inlining did not produce overly large
production rules.
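The arithmetic behind these two averages can be made concrete with a small Python sketch; the grammar representation and the toy productions are illustrative and not taken from SdfMetz:

```python
# Illustrative computation of the average rule size metrics:
# AVS-N divides total right-hand-side size by the number of non-terminals,
# AVS-P divides it by the number of productions.

def avs_metrics(productions):
    """productions: list of (lhs, rhs) pairs, where rhs is a list of symbols."""
    nonterminals = {lhs for lhs, _ in productions}
    total_rhs = sum(len(rhs) for _, rhs in productions)
    return total_rhs / len(nonterminals), total_rhs / len(productions)

# Toy grammar before inlining: Type has two injections.
before = [("Type", ["ProductType"]),
          ("Type", ["BasicType"]),
          ("ProductType", ["Type", "*", "Type"]),
          ("BasicType", ["nat"])]
# After inlining ProductType and BasicType into Type.
after = [("Type", ["Type", "*", "Type"]),
         ("Type", ["nat"])]

print(avs_metrics(before))  # (2.0, 1.5)
print(avs_metrics(after))   # (4.0, 2.0)
```

In the toy example, inlining doubles AVS-N but grows AVS-P by a smaller factor, mirroring the 2.4 versus 1.4 growth observed above.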

Halstead metrics

The value of the Halstead Effort metric (HAL) fluctuates
during development. It starts at 228K in the initial grammar, and immediately
rises to 255K. This initial rise is directly related to the removal of 32 non-terminals.
The value then rises more calmly to 265K, but drops again abruptly
towards the end of the disambiguation phase, to the level of 236K. During
refactoring, the value rises again to 255K, drops briefly to 224K, and finally
stabilizes at 256K. Below, these values are compared with those of other grammars.
We use Halstead for comparison purposes only and attach no further conclusions to it.
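As an illustration of how such an effort figure arises, here is a hedged Python sketch. The mapping of terminal occurrences to Halstead operators and non-terminal occurrences to operands is an assumption of this sketch; the exact mapping used by SdfMetz is defined in [16].

```python
import math

# Hedged sketch: treat terminal occurrences as operators and non-terminal
# occurrences as operands, then apply the classic Halstead formulas
# Volume V = N * log2(n), Difficulty D = (n1/2) * (N2/n2), Effort E = D * V.
def halstead_effort(terminal_occs, nonterminal_occs):
    n1, n2 = len(set(terminal_occs)), len(set(nonterminal_occs))
    N1, N2 = len(terminal_occs), len(nonterminal_occs)
    volume = (N1 + N2) * math.log2(n1 + n2)
    difficulty = (n1 / 2) * (N2 / n2)
    return difficulty * volume

# Toy grammar:  Expr -> Expr "+" Term | Term ;  Term -> "x"
effort = halstead_effort(["+", "x"], ["Expr", "Expr", "Term", "Term", "Term"])
print(round(effort))  # 35
```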

Structure metrics

Tree impurity (TIMP) measures how much the grammar's
flow graph deviates from a tree, expressed as a percentage. The low values for
this measure indicate that our grammar is almost a tree, or, in other words,
that complexity due to circularities is low. As the grammar evolves, the tree
impurity increases steadily, from little more than 1% to little over 3%. This
development can be attributed directly to the non-terminal inlining that was
performed. When a non-terminal is inlined, the flow graph becomes smaller, but
the number of cycles remains equal, i.e. the proportion of cycles becomes higher.
Normalized count of levels (CLEV) indicates roughly the percentage of modularizability, if grammar levels (strongly connected components in the flow graph)
are considered as modules. Throughout development, the number of levels goes
down (from 58 to 40; values are not shown), but the potential number of levels, i.e. the number of non-terminals, goes down more drastically (from 161 to
71). As a result, CLEV rises from 34% to 53%, meaning that the percentage of
modularizability increases.
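As a concrete illustration, both structure metrics can be sketched in Python over a toy flow graph. The tree-impurity formula follows Fenton's definition as assumed here, and grammar levels are computed as strongly connected components; this is an illustrative sketch, not the SdfMetz code.

```python
# The flow graph has one node per non-terminal and an edge A -> B whenever
# B occurs in the right-hand side of a production defining A.
# Assumed formulas:
#   TIMP = 2*(e - n + 1) / ((n - 1)*(n - 2)) * 100     (tree impurity, %)
#   CLEV = (#strongly connected components / n) * 100  (normalized level count)

def tarjan_scc(graph):
    """Return the strongly connected components of graph (Tarjan's algorithm)."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []
    counter = [0]
    def visit(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w); low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of an SCC
            comp = []
            while True:
                w = stack.pop(); on_stack.discard(w); comp.append(w)
                if w == v: break
            sccs.append(comp)
    for v in graph:
        if v not in index:
            visit(v)
    return sccs

def timp(graph):
    n = len(graph)
    e = sum(len(succs) for succs in graph.values())
    return 100.0 * 2 * (e - n + 1) / ((n - 1) * (n - 2))

def clev(graph):
    return 100.0 * len(tarjan_scc(graph)) / len(graph)

# Toy flow graph: Expr and Term are mutually recursive (one non-trivial level).
g = {"Expr": ["Term"], "Term": ["Expr", "Factor"], "Factor": []}
print(round(timp(g), 1))   # one cycle among 3 non-terminals
print(round(clev(g), 1))   # 2 levels ({Expr,Term} and {Factor}) / 3 non-terminals
```

Inlining a non-terminal on a cycle removes a node but not the cycle itself, which is exactly why TIMP rises while CLEV improves during development.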
The number of non-singleton levels (NSLEV) of the grammar is 4 throughout
most of its development, except at the end, where it goes down to 3. Inspection
of the grammar shows that these 4 levels roughly correspond to Expressions,
Statement, Type and StateDesignators. The latter becomes a singleton level towards the end of development due to inlining.
The size of the largest grammar level (DEP) starts very high, at 69
non-terminals, but drops immediately to only 39. Towards the end of development, this number drops further to 27 non-terminals in the largest level, which
corresponds to Expressions. The decrease in level sizes is directly attributable
to inlining of grammar rules involved in cycles.
The height of the level graph (HEI) is 16 throughout most of the evolution
of the grammar, but sinks slightly to 14 towards the end of development. Only
inlining of production rules not involved in cycles leads to a reduction of path
length through the level graph. This explains why the decrease of HEI is modest.

Fig. 7. The evolution of the ASSOC and UPP disambiguation metrics compared with
the evolution of the number of productions (PROD). The x-axis represents the 48
development versions.
Disambiguation metrics

In Figure 7 we plot the evolution of two disambiguation
metrics and compare them to the number of productions metric (PROD).
Although we computed more metrics, we chose to show only the number of
associativity attributes (ASSOC) and the number of unique productions in priorities
(UPP), because these are the two types of disambiguation information most used
during development.
In the disambiguation phase, 31 development versions were produced (version
32 corresponds to the disambiguated grammar). Yet, the chart shows that the
ASSOC and UPP metrics stabilize after the 23rd version.
This is because after this version other kinds of disambiguation information were
added (reject productions and lookahead restrictions), which are not covered by
the chosen metrics.
Also, it is interesting to see that in development version 2 no disambiguation
information was added, yet the number of productions drops significantly.
This was due to injection removal, necessary to prepare for disambiguation.
From versions 2 to 9, the UPP and ASSOC metrics grow, in most cases, at
the same rate. In these steps, the binary expressions were disambiguated using
associativity attributes (to remove ambiguity between a binary operator and
itself) and priorities (to remove ambiguity between a binary operator and other
binary operators). Between versions 9 and 10, a large number of unary expressions
were disambiguated, involving priorities (between the unary operator and binary
operators) but not associativity attributes (unary operators are not ambiguous
with themselves).
From versions 9 to 16 both metrics increase fairly gradually. But in version
17, there is a surge in the number of productions in priorities. This was caused
by the simultaneous disambiguation of a group of expressions with somewhat

Table 2. Grammar metrics for VDM and other grammars. The grammars whose
metrics are reproduced from [15] are in BNF; the remaining grammars are in SDF.
Rows are sorted by Halstead effort (HAL), reported in thousands.

Grammar            TERM  VAR   MCC  AVS-N  AVS-P  HAL  TIMP  CLEV  NSLEV  DEP  HEI
Fortran 77           21   16    32    8.8    3.4   26  11.7  95.0      1    2    7
ISO C                86   65   149    5.9    5.9   51  64.1  33.8      3   38   13
Java v1.1           100  149   213    4.1    4.1   95  32.7  59.7      4   33   23
AT&T SDL             83   91   170    5.0    2.6  138   1.7  84.8      2   13   15
ISO C++             116  141   368    6.1    6.1  173  85.8  14.9      1  121    4
ECMA Standard C#    138  145   466    4.7    4.7  228  29.7  64.9      5   44   28
ISO VDM-SL          138   71   232   10.4    3.3  256   3.0  52.6      3   27   14
VS Cobol II         333  493   739    3.2    1.9  306  0.24  94.4      3   20   27
VS Cobol II (alt)   364  185  1158   10.4    8.3  678  1.18  82.6      5   21   15
PL/SQL              440  499   888    4.5    2.1  715   0.3  87.4      2   38   29

similar syntax (let, def, if, foreach, exists, etc. expressions) which do not have
associativity information.
From versions 18 to 24, the number of productions in priorities grows while
the total number of productions decreases. This shows that, once more,
disambiguation was enabled only by injection removal or by inlining.
Although not shown in the chart, from the 24th version until the 32nd,
disambiguation was continued by adding lookahead restrictions and reject productions.
In this phase, lexicals were disambiguated by keyword reservation or by preferring
the longest match in lexical recognition. For that reason, the total number
of productions remains practically unchanged.
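The two metrics themselves are simple counts. The following Python sketch illustrates how ASSOC and UPP can be computed, using an assumed, simplified representation of SDF attributes and priority groups rather than actual SDF syntax:

```python
# Hedged sketch of the disambiguation metrics, abstracting from SDF:
# productions carry attribute sets, and priority declarations are
# groups of productions ordered by decreasing priority.

def assoc(attributed_productions):
    """ASSOC: number of associativity attributes in the grammar."""
    return sum(len(attrs & {"left", "right", "non-assoc", "assoc"})
               for _, attrs in attributed_productions)

def upp(priority_groups):
    """UPP: number of unique productions mentioned in priority declarations."""
    return len({prod for group in priority_groups for prod in group})

prods = [("Expr '+' Expr -> Expr", {"left"}),
         ("Expr '*' Expr -> Expr", {"left"}),
         ("'-' Expr -> Expr", set())]       # unary minus: no associativity
priorities = [["'-' Expr -> Expr", "Expr '*' Expr -> Expr"],
              ["Expr '*' Expr -> Expr", "Expr '+' Expr -> Expr"]]
print(assoc(prods), upp(priorities))  # 2 3
```

Note how UPP counts each production once even when it appears in several priority chains, which is why the metric can stabilize while priority declarations are still being added.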

Grammar comparisons

In this section we compare our grammar, in terms of metrics,
to those developed by others in SDF, and in Yacc-style BNF. The
relevant numbers are listed in Table 2, sorted by the value of the Halstead Effort.
The numbers for the grammars of C, Java, C++, and C# are reproduced
from the same paper from which we adopted the various grammar metrics [15].
These grammars were specified in BNF, or Yacc-like BNF dialects. Note that
for these grammars, the AVS-N and AVS-P metrics are always equal, since the
number of productions and non-terminals is always equal in BNF grammars.
The numbers for the remaining grammars were computed by us. These grammars were all specified in SDF by various authors. Two versions of the VS Cobol
II grammar are listed: alt makes heavy use of nested alternatives, while in the
other one, such nested alternatives have been folded into new non-terminals.
Note that the tree impurity (TIMP) values for the SDF grammars are much
smaller (between 0.2% and 12%) than for the BNF grammars (between 29%
and 86%). This can be attributed to SDF's extended set of regular-expression-style
constructs, which allow more kinds of iteration to be specified without
(mutually) recursive production rules. In general, manually coded iterations are
error-prone and harder to understand; high values of the TIMP metric can
reveal their presence.
In terms of Halstead effort, our VDM-SL grammar ranks quite high, only
behind the grammars of the giant Cobol and PL/SQL languages.
Discussion

As previously observed, different metrics can play different roles, namely
validation, quality improvement, productivity, and comparison. Examples
of validation metrics are MCC and TERM, which are expected to remain constant
under semantics-preserving operations such as disambiguations and refactorings. If
deviations are noticed, inspection is needed to check that no errors were committed.
Examples of quality improvement metrics are VAR, DEP and HEI: lower values mean a
more compact and easier to understand grammar. The increase of TIMP can be
seen as quality loss, but it is explainable against the background of a strongly
decreasing VAR, so no actual quality is lost. Examples of productivity metrics are
the disambiguation metrics such as ASSOC and UPP. Finally, metrics serve comparison,
making it possible to rank different grammars quantitatively and qualitatively.
3.4 Test suites

Integration test suite

The body of VDM-SL code that strictly adheres to the
ISO standard is rather small. Most industrial applications have been developed
with tools that support some superset of or other deviation from the standard, such
as VDM++. We have constructed an integration test suite by collecting
specifications from the internet⁴. A preprocessing step was done to extract VDM-SL
specification code from literate specifications. We manually adapted
specifications that did not adhere to the ISO standard.
Table 3 lists the suite of integration tests that we obtained in this way. The
table also shows the lines of code (excluding blank lines and comments) that each
test specification contains, as well as the rule coverage (RC) and non-terminal
coverage (NC) metrics for each. The coverage metrics shown were obtained with
the final, refactored grammar.
Note that in spite of the small size of the integration test suite in terms of
lines of code, the test coverage it offers for the grammar is satisfactory. Still, since
test coverage is not 100%, a follow-up project specifically aimed at enlarging the
integration test suite would be justified.
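Rule coverage and non-terminal coverage are straightforward to compute once the parse trees of the test suite are available. The following Python sketch assumes a simplified tree representation (production label, defining non-terminal, children) rather than SDF's actual parse-tree format:

```python
# Hedged sketch of the coverage metrics: walk the parse trees produced for
# a test suite and relate the productions and non-terminals they exercise
# to the totals in the grammar.

def covered(tree):
    """Collect productions and non-terminals used in one parse tree."""
    prods, nts = set(), set()
    stack = [tree]
    while stack:
        label, lhs, children = stack.pop()
        prods.add(label)
        nts.add(lhs)
        stack.extend(children)
    return prods, nts

def rc_nc(trees, n_prods, n_nts):
    """Rule coverage (RC) and non-terminal coverage (NC), as percentages."""
    used_p, used_n = set(), set()
    for t in trees:
        p, n = covered(t)
        used_p |= p
        used_n |= n
    return 100.0 * len(used_p) / n_prods, 100.0 * len(used_n) / n_nts

# One tiny tree exercising 3 productions over 2 non-terminals,
# in a toy grammar with 6 productions and 4 non-terminals.
tree = ("Expr.Add", "Expr",
        [("Expr.Term", "Expr", [("Term.Var", "Term", [])]),
         ("Term.Var", "Term", [])])
rc, nc = rc_nc([tree], 6, 4)
print(rc, nc)  # 50.0 50.0
```

Because coverage is the union over all test inputs, a handful of small specifications can already exercise a large share of the grammar, as Table 3 shows.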
Unit tests

During development, unit tests were created incrementally. For every
problem encountered, one or more unit tests were created to isolate the problem.
We measured unit test development during grammar evolution in terms of
lines of unit test code, and coverage by unit tests in terms of rules (RC) and
non-terminals (NC). This development is shown graphically in Figure 8. As the
chart indicates, all unit tests were developed during the disambiguation phase,
i.e. between development versions 1 and 32. There is a small fluctuation in the
⁴ A collection of specifications is available from http://www.csr.ncl.ac.uk/vdm/.

Table 3. Integration test suite. The second column gives the number of code lines.
The third and fourth columns give coverage values for the final grammar.

Origin                                                         LOC   RC   NC
Specification of the MAA standard (Graeme Parkin)              269  19%  30%
Abstract data types (Matthew Suderman and Rick Sutcliffe)     1287  37%  53%
A crosswords assistant (Yves Ledru)                            144  28%  43%
Modelling of Realms in (Peter Gorm Larsen)                     380  26%  38%
Exercises formal methods course Univ. do Minho (Tiago Alves)   500  35%  48%
Total                                                         2580  50%  70%

Fig. 8. The evolution of unit tests during development. The x-axis represents the 48
development versions. The three release versions among these are 1, 32, and 48. The
left y-axis corresponds to lines of unit test code. Rule coverage (RC) and non-terminal
coverage (NC) are shown as well.

beginning of the disambiguation process that is due to unit-test strategy changes.


Also, between versions 23 and 32, when lexical disambiguation was carried out,
unit tests were not added due to limitations of the test utilities.
During the refactoring phase, the previously developed unit tests were used
to prevent introducing errors unwittingly. Small fluctuations of coverage metrics
during this phase are strictly due to variations in the total numbers of production
rules and non-terminals.

4 Related work

Malloy and Power have applied various software engineering techniques during
the development of a LALR parser for C# [2]. Their techniques include versioning,
testing, and the grammar size, complexity, and some of the structure
metrics that we adopted ([15], see Table 2). For Malloy and Power, an important
measure of grammar quality is the number of conflicts. In our setting, ambiguities
play a prominent role, rather than conflicts, due to the use of generalized
LR parser generation technology. We use test coverage as an additional measure
of grammar quality. Also, we develop unit tests and use them in addition to
integration tests. Finally, we monitored a richer set of metrics during grammar
evolution and reported how to use them to validate grammar changes.
Lämmel et al. have advocated derivation of grammars from language reference
documents through a semi-automatic transformational process [3, 25]. In
particular, they have applied their techniques to recover the VS COBOL II grammar
from railroad diagrams in an IBM manual. They use metrics on grammars,
though less extensively than we do. Neither coverage measurement nor tests are reported.
Klint et al. provide a survey of grammar engineering techniques and an
agenda for grammar engineering research [1]. In the area of natural language
processing, the need for an engineering approach and tool support for grammar
development has also been recognized [26, 27].

5 Concluding remarks

With the detailed grammar engineering case study presented in this paper, we
have contributed to the grammar engineering body of knowledge in several ways.
5.1 Contributions

We showed how grammar testing, grammar metrics, and coverage analysis can
be combined in a systematic grammar development process that can be
characterised as a tool-based methodology for iterative grammar development. We
have motivated the use of these grammar engineering techniques from the larger
context of grammar-centered language tool development, and we have conveyed
our experiences in using them in a specific grammar development project. We
have demonstrated that this approach can help to make grammar development
a controlled process, rather than a resource-intensive and high-risk adventure.
The approach that we explained and illustrated combines several techniques,
such as version control, unit testing, metrics, and coverage analysis. We have
demonstrated how the development process can be monitored with various grammar metrics, and how tests can be used to guide the process of grammar disambiguation and refactoring. The presented metric values quantify the size and
complexity of the grammar and reveal the level of continuity during evolution.
We have indicated how this case study extends earlier case studies [2, 3] where
some of these techniques were demonstrated before. Novelties in our approach
include the extensive use of unit tests, combined with monitoring of several
test coverage indicators. These unit tests document individual refactorings and
changes to resolve ambiguities.
As a side-effect of our case study, we have extended the collection of metric
reference data of [15] with values for several grammars of widely used languages,
and we have collected data for additional grammar metrics defined by us in [16].
Although we have used SDF as our grammar formalism, development with other
formalisms, and of related artifacts such as APIs, DSLs, or DSMLs, might benefit
from a similar approach. The methodology can be used to guide the development.
The importance of testing, and of tool support for testing, is well recognized in
software engineering. Metrics can be used for validation, quality control,
productivity, and comparison.

5.2 Future work

Lämmel generalized the notion of rule coverage and advocates the use of coverage
analysis in grammar development [21]. When adopting a transformational
approach to grammar completion and correction, coverage analysis can be used
to improve grammar testing, and test set generation can be used to increase
coverage. SDF tool support for such test set generation and context-dependent
rule coverage analysis has yet to be developed.
We plan to extend the grammar in a modular way to cover other dialects
of the VDM-SL language, such as IFAD VDM and VDM++. We have already
generated Haskell support for VDM processing from the grammar, and are
planning to generate Java support as well. The integration test suite deserves
further extension in order to increase coverage.
Availability

The final version of the ISO VDM-SL grammar in SDF (development
version 48, release version 0.0.3) is available as a browsable hyperdocument
from http://voodoom.sourceforge.net/iso-vdm.html. All intermediate
versions can be obtained from the CVS repository at the project web site at
http://voodoom.sourceforge.net/, under an open source license. The SdfMetz
tool used for metrics computation is available from http://sdfmetz.googlecode.com.

References
1. Klint, P., Lämmel, R., Verhoef, C.: Toward an engineering discipline for grammarware. ACM Transactions on Software Engineering and Methodology 14(3) (2005) 331–380
2. Malloy, B.A., Power, J.F., Waldron, J.T.: Applying software engineering techniques to parser design: the development of a C# parser. In: Proc. of the 2002 Conf. of the South African Institute of Computer Scientists and Information Technologists, ACM (2002) 75–82
3. Lämmel, R., Verhoef, C.: Semi-automatic grammar recovery. Software: Practice & Experience 31(15) (December 2001) 1395–1438
4. Brand, M.v.d., Sellink, A., Verhoef, C.: Current parsing techniques in software renovation considered harmful. In: IWPC '98: Proc. of the 6th International Workshop on Program Comprehension, IEEE Computer Society (1998) 108–117
5. Jonge, M.d., Visser, J.: Grammars as contracts. In: Proc. of the 2nd Int. Conference on Generative and Component-based Software Engineering (GCSE 2000). Volume 2177 of LNCS, Springer (2000) 85–99
6. Heering, J., Hendriks, P.R.H., Klint, P., Rekers, J.: The syntax definition formalism SDF - reference manual. SIGPLAN Notices 24(11) (1989) 43–75
7. Visser, E.: Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam (1997)
8. Brand, M.v.d., Deursen, A.v., Heering, J., Jonge, H.d., Jonge, M.d., Kuipers, T., Klint, P., Moonen, L., Olivier, P., Scheerder, J., Vinju, J., Visser, E., Visser, J.: The ASF+SDF Meta-Environment: a component-based language development environment. In Wilhelm, R., ed.: Compiler Construction (CC 2001). Volume 2027 of LNCS, Springer-Verlag (2001)
9. Visser, E., Benaissa, Z.: A core language for rewriting. In Kirchner, C., Kirchner, H., eds.: Proc. of the Int. Workshop on Rewriting Logic and its Applications (WRLA '98). Volume 15 of ENTCS, Elsevier Science (1998)
10. Lämmel, R., Visser, J.: A Strafunski application letter. In Dahl, V., Wadler, P., eds.: Proc. of Practical Aspects of Declarative Programming (PADL '03). Volume 2562 of LNCS, Springer-Verlag (January 2003) 357–375
11. Kuipers, T., Visser, J.: Object-oriented tree traversal with JJForester. In Brand, M.v.d., Parigot, D., eds.: ENTCS. Volume 44, Elsevier Science (2001). Proc. of the Workshop on Language Descriptions, Tools and Applications (LDTA).
12. Jonge, M.d.: A pretty-printer for every occasion. In Ferguson, I., Gray, J., Scott, L., eds.: Proc. of the 2nd International Symposium on Constructing Software Engineering Tools (CoSET 2000), University of Wollongong, Australia (2000)
13. Fogel, K.: Open Source Development with CVS. Coriolis Group Books (1999)
14. Kort, J., Lämmel, R., Verhoef, C.: The grammar deployment kit. In Brand, M.v.d., Lämmel, R., eds.: ENTCS. Volume 65, Elsevier (2002)
15. Power, J., Malloy, B.: A metrics suite for grammar-based software. Journal of Software Maintenance and Evolution 16 (November 2004) 405–426
16. Alves, T., Visser, J.: Metrication of SDF grammars. Technical Report DI-PURe-05.05.01, Universidade do Minho (May 2005)
17. Halstead, M.: Elements of Software Science. Volume 7 of Operating and Programming Systems Series. Elsevier, New York, NY (1977)
18. Fenton, N., Pfleeger, S.L.: Software Metrics: a Rigorous and Practical Approach. 2nd edition, revised printing. PWS Publishing Co., Boston, MA, USA (1997)
19. Bravenboer, M.: Parse Unit home page. http://www.program-transformation.org/Tools/ParseUnit
20. Purdom, P.: Erratum: A sentence generator for testing parsers [BIT 12(3), 1972, p. 372]. BIT 12(4) (1972) 595
21. Lämmel, R.: Grammar testing. In: Proc. of Fundamental Approaches to Software Engineering (FASE 2001). Volume 2029 of LNCS, Springer-Verlag (2001) 201–216
22. Alves, T., Silva, P., Visser, J., Oliveira, J.: Strategic term rewriting and its application to a VDM-SL to SQL conversion. In: Proc. of the Formal Methods Symposium (FM 2005), Springer (2005) 399–414
23. International Organisation for Standardization: Information technology - Programming languages, their environments and system software interfaces - Vienna Development Method - Specification Language - Part 1: Base language. ISO/IEC 13817-1 (December 1996)
24. Brand, M.v.d., Scheerder, J., Vinju, J., Visser, E.: Disambiguation filters for scannerless generalized LR parsers. In Horspool, N., ed.: Compiler Construction (CC 2002). Volume 2304 of LNCS, Springer-Verlag (April 2002) 143–158
25. Lämmel, R.: The Amsterdam toolkit for language archaeology (extended abstract). In: Proc. of the 2nd International Workshop on Meta-Models, Schemas and Grammars for Reverse Engineering (ATEM 2004) (October 2004)
26. Erbach, G.: Tools for grammar engineering (March 2000)
27. Volk, M.: The role of testing in grammar engineering. In: Proc. of the 3rd Conf. on Applied Natural Language Processing, Assoc. for Computational Linguistics (1992) 257–258
