Syntax and Translation
define syntax:
In programming terms, syntax describes the sequence of symbols that make up valid
programs.
o x=2.45 + 3.67 can mean different things based upon the type of x
define semantics:
Readability
o structured statements
o liberal use of keywords and noise words
o embedded comments
o free-field formats
Bad syntax design can make it very hard for good programmers to write readable
code
o APL
Languages that provide only a few syntactic constructs lead to less readable
programs
o APL
o SNOBOL4
Some errors may alter the meaning of a statement without making it
syntactically incorrect.
Writability
Syntax features that make programs easy to write usually make them hard to read
o Writability is enhanced by concise syntax with few structures
A syntax is redundant if it can say the same thing in more than one way
Ease of verifiability
Program correctness
is hard.
Ease of translation
Lack of ambiguity
Character set
The choice of character set is one of the first decisions made in syntax design
Identifiers
Widely accepted syntax: a string of letters and digits that starts with a letter
o length restrictions
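The identifier rule above can be checked mechanically. The following is a minimal sketch, assuming ASCII letters and digits only. The names is_identifier and max_length are hypothetical; max_length stands in for whatever length restriction a particular language imposes.

```python
import re

# A letter first, then any mix of letters and digits (ASCII only, an assumption).
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text, max_length=None):
    """Return True if text is a letter followed by letters and digits.

    max_length is a hypothetical knob modelling a language's length limit.
    """
    if max_length is not None and len(text) > max_length:
        return False
    return IDENTIFIER.fullmatch(text) is not None
```

With this sketch, is_identifier("x1") holds, while is_identifier("1x") and is_identifier("total", max_length=4) do not.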
Operator symbols
Most languages use + and - to represent the two basic arithmetic operations
define keyword:
o IF
o WHILE
FORTRAN is difficult because a statement beginning with "DO" or "IF" is not
necessarily an iteration or conditional statement
Adding a new reserved word to a language can break existing programs that
already use that word as an identifier
Noise words
Noise words are optional words that are inserted into statements to improve
readability.
o GO required, TO optional
Comments
Blanks (spaces)
define delimiter:
A syntax element used to mark the beginning or end of some syntax unit such as a
statement or expression.
o parentheses
o begin...end pairs
define free-field:
define fixed-field:
Expressions
define expression:
some value
The basic syntactic building block from which statements are built
In imperative languages, expressions form the operations that allow for the
Statements
o regularity
o readability
o writability
statement types
compiled units
o Inheritance requires the compiler to process some of the subprogram
issues
Important for building modular programs during period of ALGOL, FORTRAN, and
Pascal.
Pascal allows compiler to have access to all these definitions to aid in finding
errors
change
define implementation:
A program implementation consists of several subprograms that are intended to
interact together. The components, called modules, are linked together to create an
is needed.
"implementation component"
"divisions"
operating environment.
statements
implementation
o Perl
o Prolog
o Lisp
Translation can be divided into two major parts: analysis of the source and synthesis
by-statement basis.
Translators can be grouped by the number of passes they make over the source:
information
o Second pass rewrites the source program into a more efficient form via
factor.
To a translator, the source appears at first as one long sequence of symbols. The
subprograms, statements, etc, visible to the programmer are not apparent to the
translator yet.
The initial phase of any translation is to group the sequence of characters into its
parts:
o identifiers
o delimiters
o operator symbols
o numbers
o keywords
o etc
This phase is called lexical analysis, and the program units produced are
The lexical analyzer (scanner) reads lines of the input source, breaks down the lines
into lexemes, and passes the lexemes to the later stages of the translator.
o number
o identifier
o delimiter
o etc
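The scanning step described above can be sketched in a few lines. This is only an illustration, not the scanner of any particular translator: the category names, the tiny keyword set, and the function name scan are all assumptions chosen to mirror the lists in these notes.

```python
import re

KEYWORDS = {"if", "while"}  # illustrative subset only

# Each lexeme category is described by a regular expression (all assumed).
TOKEN_SPEC = [
    ("number", r"\d+(?:\.\d+)?"),
    ("identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("operator", r"[+\-*/=<>]"),
    ("delimiter", r"[();,]"),
    ("blank", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(line):
    """Split one source line into (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(line):
        match = MASTER.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "blank":
            continue  # blanks separate lexemes but are not passed on
        if kind == "identifier" and lexeme.lower() in KEYWORDS:
            kind = "keyword"  # keywords look like identifiers until checked
        tokens.append((kind, lexeme))
    return tokens
```

Calling scan("x = 2.45 + 3.67;") yields an identifier, an operator, a number, an operator, a number, and a delimiter, in that order. Note that keywords are recognized by first scanning an identifier and then looking it up, which only works when keywords are reserved.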
o statements
o declarations
o expressions
o etc
o statement
o expression
o declaration
Semantic analysis
o symbol-table maintenance
o expansion of macros
The semantic analyzer is usually split into a set of smaller semantic analyzers.
construct.
o These small analyzers interact with each other, usually through the
The functions of the semantic analyzers depend on the programming language and
Symbol-table maintenance
One of the central data structures in every translator. It contains an entry for each
o referencing environment
Semantic analyzers enter information into the table as they process:
o declarations
o subprogram headers
o program statements
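A symbol table of this kind can be sketched as a stack of dictionaries, one per scope. This is a simplified illustration under assumptions: the class name SymbolTable, its method names, and the attribute keys used below are hypothetical, and real translators store far more per entry.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attributes."""

    def __init__(self):
        self.scopes = [{}]  # start with the outermost (global) scope

    def enter_scope(self):
        """Open a new scope, e.g. when a subprogram header is processed."""
        self.scopes.append({})

    def exit_scope(self):
        """Close the innermost scope when its subprogram ends."""
        self.scopes.pop()

    def declare(self, name, **attributes):
        """Enter a name into the innermost scope, rejecting duplicates."""
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"{name!r} is already declared in this scope")
        scope[name] = attributes

    def lookup(self, name):
        """Search from the innermost scope outward; None if undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None
```

For example, after table.declare("x", type="real") in the outer scope and table.declare("x", type="integer") in an inner scope, table.lookup("x") returns the integer entry until exit_scope() is called.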
Other parts of the translator use this information to make efficient code
The symbol table for compiled languages is usually discarded at the end of
translation.
It may be retained
Information implicit within the source must be made explicit in the lower level
object program.
o Type guessing
first letter
Error detection
o The lexical analyzer may send the syntactic analyzer a token that does
The semantic analyzer must recognize errors, produce an error message, and
o For languages that do, they are generally handled in semantic analysis
A macro is a separately defined piece of program text that is inserted
into the program during translation when the macro call is
encountered
o Like a subprogram, except the macro body is substituted for each call
during translation
3.14
Semantic analyzers must identify the macro call and perform the
substitution
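The substitution step can be illustrated with parameterless macros. This is a minimal sketch under assumptions: the function name expand_macros is hypothetical, macros take no arguments, and expanded bodies are not rescanned for further macro calls.

```python
import re


def expand_macros(source, macros):
    """Replace each whole-word macro name in source with its body.

    macros maps a macro name to its replacement text. Names that are not
    macros are left untouched.
    """
    def substitute(match):
        name = match.group(0)
        return macros.get(name, name)

    # Match whole identifiers so that a macro PI does not fire inside PIE.
    return re.sub(r"[A-Za-z_][A-Za-z0-9_]*", substitute, source)
```

With a hypothetical macro table {"PI": "3.14"}, the text "area = PI * r * r" becomes "area = 3.14 * r * r", while an identifier such as PIE is unchanged.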
can process it directly, inserting the object code and making symbol
table entries.
The final translation stages deal with constructing the executable program from the
are used, linking and loading will be needed to produce the executable.
Optimization
Code generators produce object code from the intermediate code produced by
the semantic analyzer.
The semantic analyzer usually does not worry about surrounding code
Temp1 = B+C
Temp2 = Temp1 + D
A = Temp2
Let the semantic analyzers produce poor code, and clean it up during optimization.
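The kind of cleanup this refers to can be sketched as a small peephole pass over the three-instruction sequence above. This is an illustration only: the tuple layout, the function name fold_copies, and the assumption that names starting with "Temp" are compiler temporaries never read again after being copied are all hypothetical.

```python
def fold_copies(code):
    """Peephole pass: merge 'T = x op y; A = T' into 'A = x op y'.

    Each instruction is (dest, left, op, right); a plain copy has op=None.
    Assumes names starting with 'Temp' are temporaries that are dead once
    they have been copied into a program variable.
    """
    result = []
    for dest, left, op, right in code:
        previous = result[-1] if result else None
        is_copy = op is None
        if (is_copy and previous is not None
                and previous[0] == left and left.startswith("Temp")):
            # Retarget the previous instruction at the final destination.
            result[-1] = (dest, previous[1], previous[2], previous[3])
        else:
            result.append((dest, left, op, right))
    return result


poor_code = [
    ("Temp1", "B", "+", "C"),
    ("Temp2", "Temp1", "+", "D"),
    ("A", "Temp2", None, None),
]
better_code = fold_copies(poor_code)
```

Here better_code holds two instructions, Temp1 = B + C and A = Temp1 + D, so the final copy through Temp2 disappears.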
Code generation
After intermediate code has been optimized, it must be formed into one of
o assembly language
o machine code
Code generation involves formatting the output from the information contained in
Output code from separate subprogram translations is placed into the final
executable
The incomplete locations are specified in loader tables created by the translator
The linking loader uses the loader tables to link the separately translated
code together.
Bootstrapping
o A compiler written in the language itself can now be made using this
Diagnostic compilers
define grammar:
o the rules specify the sequence of characters (lexical items) that form
o regular grammar
Section 6.1: BNF Grammars
Example sentences:
There are many other types of sentences. An interrogative (question) sentence may
This notation is called BNF, Backus-Naur form. For our purposes, you can consider
Syntax
programs
For example, The home / ran / girl fits the syntax of a simple declarative
symbols
Examples of languages
o The set of sequences of a's and b's where all a's precede the b's
ab
aab
abb
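Membership in this example language can be tested directly. The sketch below takes one reading of the English description: any string over a and b in which no a follows a b. Under that reading the empty string and strings made only of a's or only of b's are also members, which is an assumption and exactly the kind of detail an informal description leaves open. The function name in_language is hypothetical.

```python
def in_language(s):
    """True if s uses only 'a' and 'b' and no 'a' appears after a 'b'."""
    return set(s) <= {"a", "b"} and "ba" not in s
```

The three example strings ab, aab, and abb all pass, while ba and aba do not.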
Using English to convey language definitions can make things confusing and
unclear.
We solve the problem by using a formal set of rules for determining exactly what
In the simple case, a grammar may list the elements of a finite language
For example,
The symbols making up the strings of the language are called terminal
symbols
Once we've defined a basic set of nonterminals, we can use them to construct more
complex strings. Consider the following rule that defines the language of conditional
statements
Syntactic categories as they are defined may refer to themselves in the rule. Such a
o The second option allows another digit to be added on to the initial digit,
and so on.
called <program>
in the language. For example, the following grammar generates all sequences of
balanced parentheses:
S → SS | (S) | ()
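To see the grammar at work, the following sketch tests whether a string can be derived from S by trying each of the three alternatives on ever smaller substrings. It is a brute-force illustration, not an efficient parser, and the function name derives is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def derives(s):
    """Return True if S can derive s under S -> SS | (S) | ()."""
    # Alternative S -> ()
    if s == "()":
        return True
    if len(s) < 2:
        return False
    # Alternative S -> (S): strip one matching outer pair.
    if s[0] == "(" and s[-1] == ")" and derives(s[1:-1]):
        return True
    # Alternative S -> SS: try every split into two nonempty even-length parts.
    return any(derives(s[:i]) and derives(s[i:]) for i in range(2, len(s) - 1, 2))
```

For example, derives("(()())") holds, while derives("(()") and derives(")(") do not. The empty string is also rejected, since every alternative produces at least one pair of parentheses.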
We can show this in a more concise manner, using the symbol ⇒ to indicate that
BNF grammar
o If there is no way of parsing the string with the rules, it is not in the
language.
BNF grammar rules guarantee that the resulting parse structure is in fact a tree.
Ambiguity
"They are flying planes". This sentence can be represented in a couple of ways:
These statements can mean different things although they come from the same
string.
o NOT of a language
G: S → SS | 0 | 1
ambiguous.
The language of all binary strings is not inherently ambiguous, though. The following
G2: T → 0T | 1T | 0 | 1
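The difference between the two grammars can be made concrete by counting parse trees. The sketch below counts the distinct parse trees each grammar assigns to a string, reading the first alternative of G2 as 0T (digit zero). The function names trees_g and trees_g2 are hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees_g(s):
    """Number of parse trees for s under G: S -> SS | 0 | 1."""
    if s in ("0", "1"):
        return 1
    # S -> SS: every split point gives independent choices for each half.
    return sum(trees_g(s[:i]) * trees_g(s[i:]) for i in range(1, len(s)))


def trees_g2(s):
    """Number of parse trees for s under G2: T -> 0T | 1T | 0 | 1."""
    if s in ("0", "1"):
        return 1
    if len(s) >= 2 and s[0] in "01":
        # Only one rule can produce the first symbol, so there is no choice.
        return trees_g2(s[1:])
    return 0
```

For the string 010, trees_g returns 2 (the split can fall after the first or after the second symbol), which shows that G is ambiguous, whereas trees_g2 returns 1.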
BNF grammars are not ideal for explaining the rules of a language syntax for
programmers.
optional elements
alternative elements
repeated elements
sign
These extensions do not alter the power of BNF in any way; they just make the
o Identifiers
o Digits
o An if statement
Denote the starting state with an arrow that does not come from another state
Input     State   Accepted?
null      A       no
100       B       yes
10010     A       no
100101    B       yes
Characteristics of an FSA
Any string that takes the machine from the initial to the final state is "accepted" by
the machine.
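An FSA is easy to simulate with a transition table. The machine below is an assumption: the notes do not give its transitions, so this sketch uses a two-state machine (start state A, final state B) that flips state on each 1 and stays put on each 0, which is consistent with the four traces listed above. The names TRANSITIONS, START, FINAL, and run are hypothetical.

```python
# Assumed transition table: a 1 toggles the state, a 0 leaves it unchanged.
TRANSITIONS = {
    ("A", "0"): "A",
    ("A", "1"): "B",
    ("B", "0"): "B",
    ("B", "1"): "A",
}
START = "A"
FINAL = {"B"}


def run(string):
    """Feed string to the machine; return (end state, accepted?)."""
    state = START
    for symbol in string:
        state = TRANSITIONS[(state, symbol)]
    return state, state in FINAL
```

Under this assumed table, run("") ends in A (not accepted) and run("100101") ends in B (accepted), matching the traces.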
If we assume that
The variable nextchar always contains the first character of the respective
nonterminal
Then we can rewrite the BNF grammar as the following recursive procedure:
procedure Expression;
begin
  Term;                        { call procedure Term to find the first term }
  while (nextchar = '+') or (nextchar = '-') do
  begin
    nextchar := getchar;       { skip the operator }
    Term                       { find the next term }
  end
end;
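The same recursive-descent idea can be written as a runnable sketch. Several things here are assumptions that go beyond the procedure above: a term is taken to be a single decimal digit (a hypothetical stand-in for the real Term procedure), the parser also computes a value rather than only recognizing the input, and the names Parser and evaluate are invented for illustration.

```python
class Parser:
    """Recursive-descent parser for: expression = term {('+' | '-') term}."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.nextchar = self.getchar()  # always holds the next unread character

    def getchar(self):
        """Return the next input character, or '' at end of input."""
        if self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            return ch
        return ""

    def term(self):
        # Hypothetical stand-in: a term is a single decimal digit.
        if not self.nextchar.isdigit():
            raise SyntaxError(f"expected a digit, found {self.nextchar!r}")
        value = int(self.nextchar)
        self.nextchar = self.getchar()
        return value

    def expression(self):
        value = self.term()  # find the first term
        while self.nextchar in ("+", "-"):
            operator = self.nextchar
            self.nextchar = self.getchar()  # skip the operator
            right = self.term()
            value = value + right if operator == "+" else value - right
        return value


def evaluate(text):
    """Parse the whole of text as an expression and return its value."""
    parser = Parser(text)
    value = parser.expression()
    if parser.nextchar != "":
        raise SyntaxError(f"unexpected {parser.nextchar!r} after expression")
    return value
```

The while loop mirrors the repetition in the grammar rule: each pass consumes one operator and one term, so evaluate("1+2-3") applies the operators left to right.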