PPL Unit-Ii
A machine consists of
– Maximum length?
Length of Names
• Language examples:
– FORTRAN I: maximum 6
– COBOL: maximum 30
Real is a keyword)
– Name
– Address
– Value
– Type
– Lifetime
– Scope
Variable Attributes
• Type - allowed range of values of variables and the set of defined operations
• Value - the contents of the location with which the variable is associated (r-value)
• Load time -- bind a FORTRAN 77 variable to a memory cell (or a C static variable)
• A binding is static if it first occurs before run time and remains unchanged throughout
program execution.
• A binding is dynamic if it first occurs during execution or can change during execution
of the program
Type Binding
– If static, the type may be specified by either an explicit or an implicit declaration
• Type checking is the activity of ensuring that the operands of an operator are of
compatible types
• A compatible type is one that is either legal for the operator, or is allowed under
language rules to be implicitly converted, by compiler- generated code, to a legal
type
• A type error is the application of an operator to an operand of an inappropriate type
• If all type bindings are static, nearly all type checking can be static (done at compile
time)
• If type bindings are dynamic, type checking must be dynamic (done at run time)
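A minimal sketch of dynamic type binding and dynamic type checking, using Python as the example language. The names describe, add_one, and value are hypothetical and chosen only for illustration. The same name is bound first to an integer and then to a string, and the misuse of + is detected only when the operator executes.

```python
def describe(x):
    # The type bound to x is known only at run time.
    return type(x).__name__


def add_one(x):
    # The operands of + are checked when the operator executes.
    try:
        return x + 1
    except TypeError as err:
        return f"type error: {err}"


value = 17                 # value is bound to an int
first = describe(value)
value = "seventeen"        # the same name is rebound to a str
second = describe(value)

result_ok = add_one(17)            # legal operand types
result_bad = add_one("seventeen")  # type error detected at run time
```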
– Advantage: allows the detection of misuse of variables that result in type errors
– C and C++
• Java's strong typing is still far less effective than that of Ada
list = 17.3;
• The lifetime of a variable is the time during which it is bound to a particular memory
cell
– Allocation - getting a cell from some pool of available cells
Variable Scope
• Scope and lifetime are sometimes closely related, but are different concepts
Static Scope
• To connect a name reference to a variable, you (or the compiler) must find the
declaration
• Enclosing static scopes (to a specific scope) are called its static ancestors; the
nearest static ancestor is called a static parent
• Variables can be hidden from a unit by having a "closer" variable with the same name
• C++, Java and Ada allow access to some of these "hidden" variables
– In Ada: unit.name
• Suppose the spec is changed so that D must now access some data in B
• Solutions:
– Put D in B (but then C can no longer call it and D cannot access A's variables)
– Move the data from B that D needs to MAIN (but then all procedures can
access them)
– Current thinking is that we can accomplish the same thing better with
modules (classes)
– Most current languages use static scoping at the block level even if they don't
allow nested functions
Storage Bindings & Lifetime:
The lifetime of a variable is the time during which it is bound to a particular memory cell
– Depending on the language, allocation can be either controlled by the programmer or done
automatically
Variable Scope:
– The nonlocal variables of a program unit are those that are visible but not declared there
– The scope rules of a language determine how references to names are associated with
variables
Static Scope:
– To connect a name reference to a variable, you (or the compiler) must find the declaration
– Search process: search declarations, first locally, then in increasingly larger enclosing
scopes, until one is found for the given name
– Enclosing static scopes (to a specific scope) are called its static ancestors; the nearest
static ancestor is called a static parent
– Variables can be hidden from a unit by having a "closer" variable with the same name
– C++, Java and Ada allow access to some of these "hidden" variables
– In Ada: unit.name
Assume MAIN calls A and B, A calls C and D, and B calls A and E
– Suppose the spec is changed so that D must now access some data in B
– Solutions:
– Put D in B (but then C can no longer call it and D cannot access A's variables)
– Move the data from B that D needs to MAIN (but then all procedures can access
them)
Dynamic Scope:
Based on calling sequences of program units, not their textual layout (temporal
versus spatial)
– Reference to x is to MAIN's x
Dynamic scoping
– Reference to x is to SUB1's x
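The two outcomes above can be sketched in Python, assuming a call chain in which MAIN calls SUB1 and SUB1 calls SUB2 (the call chain is an assumption; the notes do not show it). Python itself is statically scoped, so the dynamic-scoping half is only a simulation that keeps an explicit stack of environments; env_stack, lookup, dyn_sub1, and dyn_sub2 are hypothetical names.

```python
x = "MAIN's x"


def sub2():
    # Static scoping: x is resolved in the textually enclosing scope.
    return x


def sub1():
    x = "SUB1's x"  # local to sub1, not visible inside sub2
    return sub2()


static_result = sub1()

# Simulated dynamic scoping: names are resolved through the call chain.
env_stack = [{"x": "MAIN's x"}]


def lookup(name):
    # Search the most recently activated unit first.
    for env in reversed(env_stack):
        if name in env:
            return env[name]
    raise NameError(name)


def dyn_sub2():
    return lookup("x")


def dyn_sub1():
    env_stack.append({"x": "SUB1's x"})  # sub1 declares its own x
    try:
        return dyn_sub2()
    finally:
        env_stack.pop()


dynamic_result = dyn_sub1()
```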
Evaluation of Dynamic Scoping:
– Advantage: convenience
Referencing Environments:
– The referencing environment of a statement is the collection of all names that are visible
in the statement
– In a static-scoped language, it is the local variables plus all of the visible variables in all
of the enclosing scopes
DATA TYPES
Data type Definition:
Data Types:
– Primitive
– Structured
Integer Types:
Usually based on hardware
May have several ranges
– Python has an integer type and a long integer which can get as big as it needs to.
– Representing Integers:
– Ones complement
– Sign bit
Twos Complement:
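A small sketch of twos complement encoding, assuming an 8-bit width by default. The helper names to_twos_complement and from_twos_complement are hypothetical. A negative value is stored as the bit pattern of value + 2^bits, so the leftmost bit acts as the sign.

```python
def to_twos_complement(value, bits=8):
    # Reject values that do not fit in the given width.
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range for the given width")
    # Masking maps a negative value to value + 2**bits.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    bits = len(pattern)
    raw = int(pattern, 2)
    # A leading 1 means the pattern encodes raw - 2**bits.
    return raw - (1 << bits) if pattern[0] == "1" else raw
```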
– For scientific use support at least two floating-point types (e.g., float and double;
sometimes more)
– The float type is the standard size, usually being stored in four bytes of memory.
– The double type is provided for situations where larger fractional parts and/or a larger
range of exponents is needed
– Floating-point values are represented as fractions and exponents, a form that is borrowed
from scientific notation
– The collection of values that can be represented by a floating-point type is defined in
terms of precision and range
– Precision is the accuracy of the fractional part of a value, measured as the number of bits
– Range is a combination of the range of fractions and, more important, the range of
exponents.
– We can convert the decimal number to base 2 just as we did for integers
– fixed number of bits for the whole and fractional parts severely limits the range
of values we can represent
– Use a fixed number of bits for the exponent which is offset to allow for negative
exponents
– float
– double
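As an illustration of the fraction-and-offset-exponent layout, the sketch below splits a value stored as an IEEE 754 single-precision float into its three fields. IEEE 754 and its exponent offset of 127 are assumptions here; the notes do not name a specific format. The function name float32_fields is hypothetical.

```python
import struct


def float32_fields(x):
    # Pack as a big-endian 32-bit float, then reread the bits as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                 # 1 bit
    exponent = (raw >> 23) & 0xFF    # 8 bits, stored with an offset of 127
    fraction = raw & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction


# -6.5 is -1.625 * 2**2, so the stored exponent is 2 + 127 = 129.
sign, exponent, fraction = float32_fields(-6.5)
```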
– Some scripting languages only have one kind of number which is a floating point type
– Essential to COBOL
– .NET languages have a decimal data type
– Advantage: accuracy
C# decimal Type:
– 128-bit representation
– Range: 1.0 x 10^-28 to 7.9 x 10^28
– no roundoff error
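The accuracy advantage can be seen with Python's decimal module, used here only as a stand-in for a decimal type; it is not the C# decimal type and its range differs. Decimal fractions such as 0.1 have no exact binary floating-point representation, but a decimal type stores them exactly.

```python
from decimal import Decimal

binary_sum = 0.1 + 0.2                          # binary floating point
decimal_sum = Decimal("0.1") + Decimal("0.2")   # decimal arithmetic

binary_exact = binary_sum == 0.3                # False: roundoff error
decimal_exact = decimal_sum == Decimal("0.3")   # True: exact result
```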
– Boolean
– Range of values: two elements, one for "true" and one for "false"
– Could be implemented as bits, but often as bytes
– A Boolean value could be represented by a single bit, but because a single bit of memory cannot
be accessed efficiently on many machines, they are often stored in the smallest efficiently
addressable cell of memory, typically a byte.
– Character
– Rational (Scheme)
Character Strings :
– Character string constants are used to label output, and the input and output of all kinds
of data are often done in terms of strings.
– Operations:
– Catenation
– Substring reference
– Pattern matching
– Design issues:
– C and C++
– Not primitive
– Primitive
– Java
– String class
String Implementation
– integer
– char
– boolean
– enumeration types
– Subrange types
Enumeration Types:
– All possible values, which are named constants, are provided in the definition
C example:
– Design issues
– duplication of names
– coercion rules
Ex:
– toString and valueOf are overridden to make input and output easier
System.out.println (season);
prints WINTER
Example:
– Ada’s design:
type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun);
subtype Weekdays is Days range Mon..Fri;
subtype Index is Integer range 1..100;
Evaluation
Subrange types enhance readability by making it clear to readers that variables of subtypes
can store only certain ranges of values. Reliability is increased with subrange types, because assigning
a value to a subrange variable that is outside the specified range is detected as an error, either by the
compiler (in the case of the assigned value being a literal value) or by the run-time system (in the
case of a variable or expression). It is odd that no contemporary language except Ada has subrange
types.
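Python has no subrange types, so the sketch below only imitates the run-time check described above. The class Subrange and its check method are hypothetical names. The bounds 1..100 mirror the Ada subtype Index shown earlier.

```python
class Subrange:
    """Imitates a subrange type by checking values at run time."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def check(self, value):
        # An out-of-range assignment is reported as an error.
        if not self.low <= value <= self.high:
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


Index = Subrange(1, 100)
i = Index.check(42)   # accepted: within 1..100
```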
– If all of the subscripts in a reference are constants, the selector is static; otherwise, it is
dynamic. The selection operation can be thought of as a mapping from the array name and
the set of subscript values to an element in the aggregate. Indeed, arrays are sometimes
called finite mappings. Symbolically, this mapping can be shown as
array_name(subscript_value_list) → element
– The binding of the subscript type to an array variable is usually static, but the subscript value
ranges are sometimes dynamically bound.
– There are five categories of arrays, based on the binding to subscript ranges, the binding to
storage, and from where the storage is allocated.
– A static array is one in which the subscript ranges are statically bound and storage allocation
is static (done before run time).
The disadvantage is that the storage for the array is fixed for the entire execution
time of the program.
– A fixed stack-dynamic array is one in which the subscript ranges are statically bound, but
the allocation is done at declaration elaboration time during execution.
– A stack-dynamic array is one in which both the subscript ranges and the storage allocation
are dynamically bound at elaboration time. Once the subscript ranges are bound and the
storage is allocated, however, they remain fixed during the lifetime of the variable.
The advantage of stack-dynamic arrays over static and fixed stack-dynamic
arrays is flexibility
– A fixed heap-dynamic array is similar to a fixed stack-dynamic array, in that the subscript
ranges and the storage binding are both fixed after storage is allocated
The advantage of fixed heap-dynamic arrays is flexibility: the array's
size always fits the problem.
The disadvantage is allocation time from the heap, which is longer than
allocation time from the stack.
– A heap-dynamic array is one in which the binding of subscript ranges and storage
allocation is dynamic and can change any number of times during the array's lifetime.
– The disadvantage is that allocation and deallocation take longer and may
happen many times during execution of the program.
Array Initialization
Some languages provide the means to initialize arrays at the time their storage is
allocated. An array aggregate for a single-dimensioned array is a list of literals
delimited by parentheses and slashes. For example, we could have
In the C declaration
Arrays of strings in C and C++ can also be initialized with string literals. In this case,
the array is one of pointers to characters.
For example,
Ada provides two mechanisms for initializing arrays in the declaration statement: by listing
them in the order in which they are to be stored, or by directly assigning them to an index
position using the => operator, which in Ada is called an arrow.
A jagged array is one in which the lengths of the rows need not be the same.
For example, a jagged matrix may consist of three rows, one with 5 elements, one with 7
elements, and one with 12 elements. This also applies to the columns and higher dimensions. So,
if there is a third dimension (layers), each layer can have a different number of elements. Jagged
arrays are made possible when multi dimensioned arrays are actually arrays of arrays. For
example, a matrix would appear as an array of single-dimensioned arrays.
For example,
myArray[3][7]
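A short sketch of a jagged array built as an array of arrays, using Python lists. The row lengths 5, 7, and 12 follow the example in the text; the name jagged is hypothetical.

```python
# Each row is its own list, so rows may have different lengths.
jagged = [[0] * 5, [0] * 7, [0] * 12]

row_lengths = [len(row) for row in jagged]

jagged[2][11] = 99   # valid: the third row has 12 elements

try:
    jagged[0][11] = 99   # invalid: the first row has only 5 elements
    out_of_range = False
except IndexError:
    out_of_range = True
```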
Slices:
A slice of an array is some substructure of that array.
For example, if A is a matrix, then the first row of A is one possible slice, as are the last
row and the first column. It is important to realize that a slice is not a new data type. Rather, it is
a mechanism for referencing part of an array as a unit.
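The slices named above can be sketched with a Python list of lists. This is an illustration only; the matrix A here is a hypothetical 3 x 3 example. Python lists have no built-in column slice, so the column is collected with a comprehension.

```python
A = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

first_row = A[0]                       # refers to the existing row object
last_row = A[-1]
first_column = [row[0] for row in A]   # built element by element
part_of_row = A[1][0:2]                # elements 0 and 1 of the second row
```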
Evaluation
If the element type is statically bound and the array is statically bound to storage, then
the value of the constant part can be computed before run time. However, the addition and
multiplication operations must be done at run time.
The generalization of this access function for an arbitrary lower bound is
address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size)
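The access function can be written directly as a small function. The name element_address and the sample numbers are hypothetical; addresses are treated as plain integers.

```python
def element_address(base_address, k, lower_bound, element_size):
    # base_address is the address of list[lower_bound].
    return base_address + (k - lower_bound) * element_size


# An array starting at address 1000, lower bound 1, 4-byte elements:
addr_of_fifth = element_address(1000, 5, 1, 4)
```

With a lower bound of 1, element 5 lies four elements past the base, so the offset is 4 * 4 = 16 bytes.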
Associative Arrays
In Perl, associative arrays are called hashes, because in the implementation their elements
are stored and retrieved with hash functions. The namespace for Perl hashes is distinct: Every
hash variable name must begin with a percent sign (%). Each hash element consists of two parts:
a key, which is a string, and a value, which is a scalar (number, string, or reference). Hashes can
be set to literal values with the assignment statement, as in
%salaries = ("Gary" => 75000, "Perry" => 57000, "Mary" => 55750, "Cedric" => 47850);
Recall that scalar variable names begin with dollar signs ($). For example,
A new element is added using the same assignment statement form. An element can be
removed from the hash with the delete operator, as in
delete $salaries{"Gary"};
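For comparison, the same operations can be sketched with a Python dictionary, which plays the role of the Perl hash. The updated salary for "Perry" and the new key "Anna" are hypothetical values added for illustration.

```python
salaries = {"Gary": 75000, "Perry": 57000, "Mary": 55750, "Cedric": 47850}

salaries["Perry"] = 58850   # assign to an existing element
salaries["Anna"] = 61000    # the same form adds a new element
del salaries["Gary"]        # remove an element

has_gary = "Gary" in salaries
```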
Record Types
A record is an aggregate of data elements in which the individual elements are identified
by names and accessed through offsets from the beginning of the structure. There is frequently a
need in programs to model a collection of data in which the individual elements are not of the
same type or size. For example, information about a college student might include name, student
number, grade point average, and so forth. A data type for such a collection might use a character
string for the name, an integer for the student number, a floating point for the grade point average,
and so forth. Records are designed for this kind of need.
Definitions of Records
The fundamental difference between a record and an array is that record elements, or fields,
are not referenced by indices. Instead, the fields are named with identifiers, and references to the
fields are made using these identifiers. The COBOL form of a record declaration, which is part of the
data division of a COBOL program, is illustrated in the following example:
– EMPLOYEE-RECORD.
– EMPLOYEE-NAME.
The fields of records are stored in adjacent memory locations. But because the sizes of
the fields are not necessarily the same, the access method used for arrays is not used for records.
Instead, the offset address, relative to the beginning of the record, is associated with each field.
Field accesses are all handled using these offsets. The compile-time descriptor for a record has
the general form shown in Figure 6.7. Run-time descriptors for records are unnecessary.
[Figure 6.7: compile-time descriptor for a record, giving the name, type, and offset of each field (field 1 through field n), followed by the record's address]
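Field offsets can be inspected with Python's ctypes module, which lays out a structure the way a C compiler would. This is a sketch under that assumption; the Student record and its field sizes are hypothetical, loosely following the college-student example above. Exact offsets depend on the platform's alignment rules.

```python
import ctypes


class Student(ctypes.Structure):
    # Fields of different types and sizes, stored in declaration order.
    _fields_ = [
        ("name", ctypes.c_char * 20),
        ("number", ctypes.c_int),
        ("gpa", ctypes.c_double),
    ]


# Each field descriptor records its offset from the start of the record.
offsets = {name: getattr(Student, name).offset for name, _ in Student._fields_}
record_size = ctypes.sizeof(Student)
```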
Union Types
A union is a type whose variables may store different type values at different times during
program execution. As an example of the need for a union type, consider a table of constants for a
compiler, which is used to store the constants found in a program being compiled. One field of each
table entry is for the value of the constant. Suppose that for a particular language being compiled, the
types of constants were integer, floating point, and Boolean. In terms of table management, it would
be convenient if the same location, a table field, could store a value of any of these three types. Then
all constant values could be addressed in the same way. The type of such a location is, in a sense, the
union of the three value types it can store.
Design Issues
The problem of type checking union types leads to one major design issue. The other
fundamental question is how to syntactically represent a union. In some designs, unions are
confined to be parts of record structures, but in others they are not. So, the primary design issues
that are particular to union types are the following:
Should type checking be required? Note that any such type checking must be dynamic.
Should unions be embedded in records?
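One way to picture dynamic type checking of a union is a value stored together with a tag that is checked on every access. The sketch below is an assumption-laden illustration of that idea for the compiler's constant table, not a description of any particular language's unions; ConstantEntry, set, and get are hypothetical names.

```python
class ConstantEntry:
    """One table field that can hold an integer, float, or Boolean."""

    ALLOWED = (bool, int, float)

    def __init__(self, value):
        self.set(value)

    def set(self, value):
        if type(value) not in self.ALLOWED:
            raise TypeError("unsupported constant type")
        self.tag = type(value)   # the tag records the current variant
        self.value = value

    def get(self, expected):
        # Dynamic check: the tag must match the requested variant.
        if self.tag is not expected:
            raise TypeError(f"entry holds {self.tag.__name__}")
        return self.value


entry = ConstantEntry(3)
as_int = entry.get(int)
entry.set(2.5)            # the same location now holds a float
as_float = entry.get(float)
```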
The Ada design for discriminated unions, which is based on that of its predecessor
language, Pascal, allows the user to specify variables of a variant record type that will store only
one of the possible type values in the variant. In this way, the user can tell the system when the
type checking can be static. Such a restricted variable is called a constrained variant variable.
Unions are implemented by simply using the same address for every possible variant.
Sufficient storage for the largest variant is allocated. The tag of a discriminated union is stored
with the variant in a record-like structure. At compile time, the complete description of each
variant must be stored. This can be done by associating a case table with the tag entry in the
descriptor. The case table has an entry for each variant, which points to a descriptor for that
particular variant. To illustrate this arrangement, consider the following Ada example:
case Tag is
when True => Count : Integer;
The descriptor for this type could have the form shown in Figure
[Figure: descriptor for the discriminated union type; recoverable labels: Offset, Address, Sum, Float]
– range of values that consists of memory addresses plus a special value, nil
– A pointer can be used to access a location in the area where storage is dynamically
created (usually called a heap)
Pointer Operations:
– Dereferencing yields the value stored at the location represented by the pointer's value
j = *ptr
– void * can point to any type and can be type checked (cannot be de-referenced)
Reference Types:
– C++ includes a special kind of pointer type called a reference type that is used primarily
for formal parameters
– Java extends C++'s reference variables and allows them to replace pointers entirely
Evaluation of Pointers:
– Pointers or references are necessary for dynamic data structures--so we can't design a
language without them
– An operator can be unary, meaning it has a single operand, binary, meaning it has two
operands, or ternary, meaning it has three operands.
– In most programming languages, binary operators are infix, which means they appear
between their operands.
– One exception is Perl, which has some operators that are prefix, which means they
precede their operands.
– An implementation of such a computation must cause two actions: fetching the operands,
usually from memory, and executing arithmetic operations on those operands.
– operands
– parentheses
– function calls
– operator overloading
– unary -, !
– +,-,*,/,%
A ternary operator has three operands - ?:
Conditional Expressions:
– An example:
– parentheses
– unary operators
– Associativity rules define the order in which adjacent operators with the same
precedence level are evaluated
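A brief illustration of precedence and associativity, using Python's own rules as the example: subtraction associates left to right, exponentiation associates right to left, and multiplication has higher precedence than addition.

```python
left_assoc = 10 - 4 - 3          # evaluated as (10 - 4) - 3
right_assoc = 2 ** 3 ** 2        # evaluated as 2 ** (3 ** 2)
precedence = 2 + 3 * 4           # * binds tighter than +
with_parens = (2 + 3) * 4        # parentheses override precedence
```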
– Constants: sometimes a fetch from memory; sometimes the constant is in the machine
language instruction
a = 10;
Write the language definition to disallow functional side effects
No two-way parameters in functions
No non-local references in functions
Advantage: it works!
Disadvantage: inflexibility of one-way parameters and lack of non-local references
Write the language definition to demand that operand evaluation order be fixed
Disadvantage: limits some compiler optimizations
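A sketch of why operand evaluation order matters when a function has a side effect. Python fixes the order as left to right, so each expression below has one defined result; the function fun and the values 20 and 5 are hypothetical.

```python
a = 10


def fun():
    # Functional side effect: changes a non-local variable.
    global a
    a = 20
    return 5


left_first = a + fun()    # a is fetched (10) before fun runs
a = 10                    # reset for the second experiment
right_first = fun() + a   # fun runs first, so a is already 20
```

In a language that leaves the order unspecified, either result could be produced for the same source expression.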
Overloaded Operators:
– Use of an operator for more than one purpose is called operator overloading
Type Conversions
A narrowing conversion converts an object to a type that does not include all of the
values of the original type
Coercion:
– Disadvantage of coercions:
Casting:
– Examples
– C: (int) angle
Errors in Expressions:
Causes
– Operator symbols used vary somewhat among languages (!=, /=, .NE., <>, #)
Boolean Operators:
– Example operators
No Boolean Type in C:
– it uses int type with 0 for false and nonzero for true
Consequence
– a < b < c is a legal expression
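The consequence can be imitated in Python by spelling out what C does: the left comparison yields 0 or 1, and that integer is then compared with c. The function c_style_chain is a hypothetical helper that mimics the C behavior; the last line shows that Python itself treats a < b < c as a chained comparison instead.

```python
def c_style_chain(a, b, c):
    # C evaluates (a < b) to 0 or 1, then compares that int with c.
    return int(int(a < b) < c)


c_result = c_style_chain(5, 3, 2)   # (5 < 3) is 0, and 0 < 2 is true
python_result = 5 < 3 < 2           # chained: 5 < 3 and 3 < 2
```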
postfix ++, --
*, /, %
binary +, -
==, !=
&&
||
Mixed-Mode Assignment:
Assignment Statements:
The assignment operator
Compound Assignment:
– A shorthand method of specifying a commonly needed form of assignment
– Example: a = a + b is written as a += b
– Examples
sum = ++count (count incremented, then assigned to sum)
sum = count++ (count assigned to sum, then incremented)
count++ (count incremented)
-count++ (count incremented then negated - right-associative)
Assignment as an Expression:
– In C, C++, and Java, the assignment statement produces a result and can be used as
operands
– An example:
ch = getchar() is carried out; the result (assigned to ch) is used in the condition for the while
statement
One important result: It was proven that all algorithms represented by flowcharts can
be coded with only two-way selection and pretest logical loops
Control Structure
– A control structure is a control statement and the statements whose execution it controls
– Design question
Selection Statements
– A selection statement provides the means of choosing between two or more paths of
execution
– Multiple-way selectors
General form:
Design Issues:
– If the then reserved word or some other syntactic marker is not used to introduce the then
clause, the control expression is placed in parentheses
– In C89, C99, Python, and C++, the control expression can be arithmetic
– In languages such as Ada, Java, Ruby, and C#, the control expression must be Boolean
Clause Form
– In many contemporary languages, the then and else clauses can be single statements or
compound statements
Nesting Selectors
– Java example
– Java's static semantics rule: else matches with the nearest if
if (count == 0) result = 0;
}
else result = 1;
end
Python
if sum == 0 :
if count == 0 : result = 0
else : result = 1
Design Issues:
– Any number of segments can be executed in one execution of the construct (there is no
implicit branch at the end of selectable segments)
– default clause is for unrepresented values (if there is no default, the whole statement does
nothing)
– Ada
case expression is
when choice list => stmt_sequence;
[when others => stmt_sequence;]
end case;
– More reliable than C's switch (once a stmt_sequence execution is completed, control is
passed to the first statement after the case statement)
– A list of constants
– Can include:
Subranges
– Multiple Selectors can appear as direct extensions to two-way selectors, using else-if
clauses, for example in Python:
if count < 10 : bag1 = True
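A fuller sketch of a multiple selector built from else-if clauses in Python. Only the first test (count < 10 selecting bag1) appears in the notes; the function classify, the remaining thresholds, and the names bag2 to bag4 are hypothetical.

```python
def classify(count):
    # Each elif is tried only if every earlier condition was false.
    if count < 10:
        return "bag1"
    elif count < 100:
        return "bag2"
    elif count < 1000:
        return "bag3"
    else:
        return "bag4"
```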
Iterative Statements
Counter-Controlled Loops
A counting iterative statement has a loop variable and a means of specifying the initial,
terminal, and stepsize values
– Design Issues:
What are the type and scope of the loop variable?
What is the value of the loop variable at loop termination?
Should it be legal for the loop variable or loop parameters to be changed in the loop
body, and if so, does the change affect loop control?
Should the loop parameters be evaluated only once, or once for every iteration?
FORTRAN 95 syntax
Parameters can be expressions
– Design choices:
Loop variable must be INTEGER
Loop variable always has its last value
The loop variable cannot be changed in the loop, but the parameters can; because
they are evaluated only once, it does not affect loop control
Loop parameters are evaluated only once
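The effect of evaluating loop parameters only once can be imitated in Python, where range(...) is evaluated a single time before the loop starts. This is an analogy, not Fortran; the names limit and visited are hypothetical.

```python
limit = 3
visited = []

# The range object is built once, from the value of limit at this point.
for i in range(1, limit + 1):
    limit = 10          # changing the parameter does not affect loop control
    visited.append(i)
```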
End Do [name]
Ada
for var in [reverse] discrete_range loop ... end loop
Design choices:
– Type of the loop variable is that of the discrete range (A discrete range is a sub-range of
an integer or enumeration type).
– The loop variable cannot be changed in the loop, but the discrete range can; it does not affect
loop control
– C-based languages
– The expressions can be whole statements, or even statement sequences, with the
statements separated by commas
– The value of a multiple-statement expression is the value of the last statement in the
expression
Design choices:
– The first expression is evaluated once, but the other two are evaluated with each iteration
– The initial expression can include variable definitions (scope is from the definition to the
end of the loop body)
– Java and C#
Design issues:
– Pretest or posttest?
– Should the logically controlled loop be a special case of the counting loop
statement or a separate statement?
C and C++ have both pretest and posttest forms, in which the control
expression can be arithmetic:
Java is like C and C++, except the control expression must be Boolean (and the body
can only be entered at the beginning -- Java has no goto)
Java and Perl have unconditional labeled exits (break in Java, last in Perl)
C, C++, and Python have an unlabeled control statement, continue, that skips
the remainder of the current iteration, but does not exit the loop
Java and Perl have labeled versions of continue
Iterative Statements: Iteration Based on Data Structures
C's for can be used to build a user-defined iterator: for (p = root; p != NULL; traverse(p)) {}
C#'s foreach statement iterates on the elements of arrays and other collections:
– Perl has a built-in iterator for arrays and hashes, foreach
Unconditional Branching
Guarded Commands
Designed by Dijkstra
Purpose: to support a new programming methodology that supported
verification (correctness) during development
Basis for two linguistic mechanisms for concurrent programming (in CSP and Ada)
Basic Idea: if the order of evaluation is not important, the program should not specify one
Selection Guarded Command
•Form
...
– If none are true, it is a runtime error
Form
...
If more than one are true, choose one non-deterministically; then start loop again
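A sketch of the loop guarded command in Python, assuming the usual rule that the loop exits when no guard is true. The nondeterministic choice is imitated with random.choice. The names guarded_loop and gcd are hypothetical, and the greatest-common-divisor example assumes positive integer inputs.

```python
import random


def guarded_loop(state, guarded_commands):
    while True:
        # Collect the actions whose guards are currently true.
        enabled = [action for guard, action in guarded_commands if guard(state)]
        if not enabled:
            return state              # no guard is true: exit the loop
        random.choice(enabled)(state)  # pick one enabled action arbitrarily


def gcd(x, y):
    state = {"x": x, "y": y}
    commands = [
        (lambda s: s["x"] > s["y"], lambda s: s.update(x=s["x"] - s["y"])),
        (lambda s: s["y"] > s["x"], lambda s: s.update(y=s["y"] - s["x"])),
    ]
    return guarded_loop(state, commands)["x"]
```

Here the two guards are never true at the same time, so the result is the same on every run even though the selection mechanism is nondeterministic.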