Unit II
Data structuring, in essence, has to do with a system where seemingly random, unstructured data can be
taken as input and a number of operations executed on it linearly or non-linearly.
These operations are meant to analyze the nature of the data and its importance in the larger scheme of
things.
The system then divides the data into broad categories of information as found by the results of the
analysis, and either stores them or sends them on for extra analysis.
This extra analysis can be used to break down the data into further sub-categories or nested category
trees.
During the analysis, some of the data might also be found to be useless and eventually discarded.
The result of this process is structured, meaningful data that can be further analyzed or used directly to
gain business insight.
A basic type is a data type provided by a programming language as a basic building block. Most languages
allow more complicated composite types to be recursively constructed starting from basic types.
A built-in type is a data type for which the programming language provides built-in support.
In most programming languages, all basic data types are built-in. In addition, many languages also provide a set of
composite data types. Opinions vary as to whether a built-in type that is not basic should be considered
"primitive".
Depending on the language and its implementation, primitive data types may or may not have a one-to-one
correspondence with objects in the computer's memory. However, one usually expects operations on basic
primitive data types to be the fastest language constructs there are.
Integer addition, for example, can be performed as a single machine instruction, and some processors offer specific
instructions to process sequences of characters with a single instruction.
In particular, the C standard mentions that "A 'plain' int object has the natural size suggested by the architecture
of the execution environment". This means that int is likely to be 32 bits long on a 32-bit architecture. Basic
primitive types are almost always value types.
Most languages do not allow the behavior or capabilities of primitive (either built-in or basic) data types to be
modified by programs. Exceptions include Smalltalk, which permits all data types to be extended within a program,
adding to the operations that can be performed on them or even redefining the built-in operations.
Classic basic primitive types may include:
Character (character, char);
Integer (integer, int, short, long, byte) with a variety of precisions;
Floating-point number (float, double, real, double precision);
Fixed-point number (fixed) with a variety of precisions and a programmer-selected scale.
Boolean, logical values true and false.
In more detail, the following are advantages of built-in types:
1. Hiding of the underlying representation : This is an advantage provided by the abstractions of higher-level
languages over lower-level (machine-level) languages. The programmer does not have access to the underlying bit
string that represents a value of a certain type.
2. Correct use of variables can be checked at translation time : If the type of each variable is known to the
compiler, illegal operations on a variable may be caught while the program is translated. Although type checking
cannot catch all possible errors, it improves the reliability of programs.
3. Resolution of overloaded operators can be done at translation time : For readability purposes, operators are
often overloaded. For example, + is used for both integer and real addition, * is used for both integer and real
multiplication. In each program context, however, it should be clear which specific hardware operation is to be
invoked, since integer and real arithmetic differ.
4. Accuracy control : The programmer can explicitly associate a specification of the accuracy of the representation
with a type. For example, FORTRAN allows the user to choose between single and double-precision floating-point
numbers. Accuracy specification allows the programmer to direct the compiler to allocate the exact amount of
storage that is needed to represent the data with the desired precision.
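Points 3 and 4 can be illustrated with a minimal C++ sketch. It uses division rather than + or * because the integer and real results are visibly different; the function names are illustrative assumptions, not part of any library.

```cpp
#include <cassert>
#include <limits>

// The symbol '/' is overloaded: the compiler selects integer or
// floating-point division from the static types of the operands.
int int_quotient(int a, int b) { return a / b; }
double real_quotient(double a, double b) { return a / b; }

// Accuracy control: float and double request different precisions.
static_assert(std::numeric_limits<float>::digits <=
                  std::numeric_limits<double>::digits,
              "double is at least as precise as float");

// Usage sketch: the same source-level operator yields different results.
void check_overload_resolution() {
    assert(int_quotient(7, 2) == 3);         // integer division truncates
    assert(real_quotient(7.0, 2.0) == 3.5);  // floating-point division
}
```

Both choices are made at translation time, so no type inspection is needed while the program runs.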
In most cases, built-in types coincide with primitive types, but there are exceptions. For example, in Ada both
Character and String are predefined. Data of type String have constituents of type Character, however.
In fact, String is predefined as an unconstrained array of Character:
type String is array (Positive range <>) of Character;
Data aggregation is any process in which information is gathered and expressed in a summary form, for
purposes such as statistical analysis.
A common aggregation purpose is to get more information about particular groups based on specific
variables such as age, profession, or income. The information about such groups can then be used for
Web site personalization to choose content and advertising likely to appeal to an individual belonging to
one or more groups for which data has been collected.
For example, a site that sells music CDs might advertise certain CDs based on the age of the user and the
data aggregate for their age group.
Online analytic processing (OLAP) is a simple type of data aggregation in which the marketer uses an
online reporting mechanism to process the information.
Programming languages allow the programmer to specify aggregations of elementary data objects and, recursively,
aggregations of aggregates. They do so by providing a number of constructors. The resulting objects are called
compound objects.
A well-known example is the array constructor, which constructs aggregates of homogeneous-type elements. An
aggregate object has a unique name. In some cases, manipulation can be done on a single elementary component
at a time, each component being accessible by a suitable selection operation. In many languages, it is also possible
to manipulate (e.g., assign and compare) entire aggregates.
Type constructors
A type constructor is a feature of a typed formal language that builds new types from old ones. Basic types are
considered to be built using nullary type constructors. Some type constructors take another type as an argument,
e.g., the constructors for product types, function types, power types and list types. New types can be defined by
recursively composing type constructors.
Abstractly, a type constructor is an n-ary type operator taking as arguments zero or more types and returning
another type. Making use of currying, n-ary type operators can be (re)written as a sequence of applications of
unary type operators. Therefore, we can view the type operators as a simply typed lambda calculus, which has only
one basic type, usually denoted * and pronounced "type". It is the type of all types in the underlying language,
which are now called proper types in order to distinguish them from the types of the type operators in their own
calculus, which are called kinds.
A Cartesian product is a composite type that joins together values of two or more other types. For example, the
Cartesian product of Type1 and Type2 is written
Type1 × Type2
A value of this Cartesian product type is one value of Type1 "glued together" with one value of Type2. Values of a
Cartesian product type are often called "tuples".
E.g., in C/C++, struct can be used to define a Cartesian product type. For example:
struct Point {
    double x;
    double y;
};
The type Point is the Cartesian product
double × double
since each value whose type is Point is made up of two double values joined together. Of course, the types that
make up a Cartesian product type don't need to be the same. E.g.:
enum Month {
    JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG, SEP, OCT, NOV, DEC,
};
struct Date {
    Month month;
    int day;
    int year;
};
How many possible Date values are there? The key to answering this question is to realize that for each of the
three component values of a Date value (the month, day, and year) we can freely choose any value that is a member
of that component's type. So, when choosing the month, we have 12 possible values to choose from.
For choosing day or year, we can choose any int value: on most systems, this gives us 2^32 values to choose from.
Because each choice of component value is independent of the other choices, we find the cardinality of the
Cartesian product type by multiplying the cardinalities of each component type. So,
|Date| = |Month| × |int| × |int| = 12 × 2^32 × 2^32 = 12 × 2^64
Obviously, this is a very large number, so there are a large (but finite) number of possible Date values.
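The multiplication rule can be checked on a product type small enough to enumerate. The sketch below is only an illustration: Color and Pixel are hypothetical types chosen so that the full set of values fits in a short list.

```cpp
#include <cassert>
#include <vector>

enum Color { RED, GREEN, BLUE };               // 3 values

struct Pixel {                                 // the product bool × Color
    bool on;
    Color color;
};

// Enumerates every Pixel value by choosing each component independently.
std::vector<Pixel> all_pixels() {
    const bool flags[] = { false, true };
    const Color colors[] = { RED, GREEN, BLUE };
    std::vector<Pixel> result;
    for (bool on : flags)
        for (Color c : colors)
            result.push_back(Pixel{ on, c });
    return result;
}

// Usage sketch: |Pixel| = |bool| × |Color| = 2 × 3.
void check_cardinality() {
    assert(all_pixels().size() == 6);
}
```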
A Cartesian product type that has no components is called the unit type. It is so-called because the type has only
one member. Many languages use this type as a parameter type or return value type to convey "no information";
for example, the void type in C/C++/Java.
A mapping from one set to another takes one member of the first set and maps it to one member of the second
set.
We write
m: S → T
to denote that m is a mapping from set S to set T. For each member of S, the mapping m will yield a
corresponding member of T.
We can consider m to be a value whose type is S → T. What exactly does S → T mean as a type? A type is a set of
values. So, the type S → T is the set of all possible ways of mapping the members of set S to members of set T.
Formally,
S → T = {m : x ∈ S ⇒ m(x) ∈ T}
Read this as "the type S → T is the set of all values m such that if x is a member of S, then m(x) is a member of T".
Note that a mapping is really the same thing as a mathematical function: it maps members of one set, the domain,
to the members of another set, the range.
Array types in programming languages are mappings; they map a set of index values to a set of element values.
Depending on the size (number of elements) in the array, the set S will generally be a subset of some discrete
type. For example, in a C/C++ program:
int a[6];
int b[15];
In the case of the array a, its index set is the set of int values 0..5. For the array b, its index set is the set of int
values 0..14.
Question: do a and b have the same type? In a technical sense, no, because the set of index values is different for
each array.
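The answer to the question above can be confirmed by the compiler itself. This is a small sketch using standard type traits; index_count is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

int a[6];
int b[15];

// The bound is part of the array type, so a and b have different types.
static_assert(!std::is_same<decltype(a), decltype(b)>::value,
              "int[6] and int[15] are distinct types");

// Recovers the size of the index set 0..N-1 from the array type itself.
template <typename T, std::size_t N>
std::size_t index_count(T (&)[N]) { return N; }

// Usage sketch: the index sets are 0..5 and 0..14.
void check_index_sets() {
    assert(index_count(a) == 6);
    assert(index_count(b) == 15);
}
```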
We can think of multidimensional arrays as having index values that are tuples (Cartesian product values). For
example, the type of a two-dimensional array of Float elements might be
(D1 × D2) → Float
where D1 and D2 are the types of the indexes for the first and second dimension of the array type. (Probably they
are defined as a subset of Integer values.)
Functions are mappings as well. For example, consider the following C/C++ function:
bool isEven(int n) {
    return (n % 2) == 0;
}
What is the type of this function? It is effectively a mapping from int values to bool, so its type is
int → bool
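In C++ this mapping type is spelled bool(int). The sketch below checks that spelling and shows that a mapping can itself be passed around as a value; apply is a hypothetical helper, not a standard function.

```cpp
#include <cassert>
#include <type_traits>

bool isEven(int n) {
    return (n % 2) == 0;
}

// The type of the function is bool(int), i.e. the mapping int → bool.
static_assert(std::is_same<decltype(isEven), bool(int)>::value,
              "isEven maps int to bool");

// A mapping is a value: it can be received as a parameter and applied.
bool apply(bool (*m)(int), int x) { return m(x); }

// Usage sketch.
void check_mapping() {
    assert(apply(isEven, 4));
    assert(!apply(isEven, 7));
}
```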
The strategy for binding the index set of an array type to a specific subset of a given type varies according to
the language. Basically, there are three possible choices:
1. Compile-time binding : The subset is fixed when the program is written and is frozen at translation time. This is
the case for the C/C++ arrays a and b declared above, whose bounds are part of their declarations.
2. Object-creation time binding : The subset is fixed at run time, when an instance of the variable is created. When
a variable of the unconstrained array type is declared, the bounds must be stated as expressions to be computed
at run time. Once the binding is established at run time, however, it cannot be changed (i.e., the binding is static).
3. Object-manipulation time binding : This is the most flexible and the most costly choice in terms of run-time
execution. For these so-called flexible arrays, the size of the subset can vary at any time during the object's
lifetime. This is typical of dynamic languages.
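The three choices can be approximated in C++ as follows. This is a sketch under stated assumptions: the names are illustrative, and the use of std::unique_ptr and std::vector is one possible rendering rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// 1. Compile-time binding: the index set 0..5 is fixed in the declaration.
int fixed_table[6];

// 2. Object-creation time binding: the bound n is computed at run time,
//    but the array cannot be resized once it has been created.
std::unique_ptr<int[]> make_table(std::size_t n) {
    return std::unique_ptr<int[]>(new int[n]());   // n zero-initialized elements
}

// 3. Object-manipulation time binding: the index set may grow at any
//    point in the object's lifetime.
std::size_t append(std::vector<int>& flexible, int value) {
    flexible.push_back(value);
    return flexible.size();
}

// Usage sketch.
void check_binding_times() {
    assert(sizeof(fixed_table) / sizeof(fixed_table[0]) == 6);
    std::unique_ptr<int[]> table = make_table(4);
    assert(table[3] == 0);
    std::vector<int> v;
    assert(append(v, 10) == 1);
    assert(append(v, 20) == 2);
}
```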
A union type describes values that belong to one of several alternative types. For example, in C:
union address {
    short int offset;
    long unsigned int absolute;
};
The declaration is very similar to the case of a Cartesian product. The difference is that here fields are mutually
exclusive. Values of type address must be treated differently if they denote offsets or absolute addresses. Given a
variable of type address, however, there is no automatic way of knowing what kind of value is currently associated
with the variable (i.e., whether it is an absolute or a relative address). The burden of remembering which of the
fields of the union is current rests on the programmer. A possible solution is to consider an address to be an
element of the following type:
struct safe_address {
    address location;
    descriptor kind;
};
A safe address is defined as composed of two fields: one holds an address, the other holds a descriptor. The
descriptor field is used to keep track of the current address kind. Such a field must be updated for each assignment
to the corresponding location field.
This implementation corresponds to the abstract concept of a discriminated union. Discriminated unions differ
from unions in that elements of a discriminated union are tagged to indicate which set the value was chosen from.
Given an element e belonging to the discriminated union of two sets S and T, a function tag applied to e gives
either ’S’ or ’T’. Element e can therefore be manipulated according to the value returned by tag.
Type checking must be performed at run time for elements of both unions and discriminated unions. Nothing
prevents programs from being written (and compiled with no error) in which an element is manipulated as a member
of type T while it is in fact a member of type S, or vice versa. Discriminated unions, however, are potentially safer
since they allow the programmer to explicitly take the tag field into consideration before applying an operation to
an element, although they cannot prevent the programmer from breaching safety by assigning the tag field a value
which is inconsistent with the other fields.
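One way to reduce that last risk is to keep the tag private, so that it can only be set together with the matching field. The following C++ sketch illustrates the idea; SafeAddress, Kind, and the member names are assumptions made for this example, and resolve is a hypothetical operation that dispatches on the tag.

```cpp
#include <cassert>

enum class Kind { Offset, Absolute };

class SafeAddress {
public:
    // Each factory sets the tag and the matching field together.
    static SafeAddress from_offset(short int off) {
        SafeAddress s;
        s.kind_ = Kind::Offset;
        s.location_.offset = off;
        return s;
    }
    static SafeAddress from_absolute(unsigned long addr) {
        SafeAddress s;
        s.kind_ = Kind::Absolute;
        s.location_.absolute = addr;
        return s;
    }
    Kind tag() const { return kind_; }
    // Inspects the tag before touching either field of the union.
    unsigned long resolve(unsigned long base) const {
        return kind_ == Kind::Offset ? base + location_.offset
                                     : location_.absolute;
    }
private:
    SafeAddress() : kind_(Kind::Absolute) { location_.absolute = 0; }
    union Address {
        short int offset;
        unsigned long absolute;
    };
    Address location_;
    Kind kind_;
};

// Usage sketch.
void check_safe_address() {
    assert(SafeAddress::from_offset(8).resolve(100) == 108);
    assert(SafeAddress::from_absolute(4096).resolve(100) == 4096);
}
```

Because client code cannot assign the tag directly, the tag and the stored field cannot drift apart.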
There are languages that get close to properly supporting the notion of discriminated union. For example, Pascal
offers variant records to represent discriminated unions. The following Pascal declarations define a safe address:
While Pascal allows the concept of discriminated union to be more naturally represented than in C, it does not
make the implementation safer. In Pascal, the tag and the variant parts may be accessed in the same way as
ordinary components. After the tag field of a safe address representing an offset is changed to absolute, it is
possible to access field abs_addr. In principle, this should result in a run-time error, because the field should be
considered as uninitialized.
Therefore, by changing the tag field, the machine interprets the string of bits stored in this area under the different
views provided by the types of each variant. This is an insecure, although in some cases practical, use of variant
records. Viewing the same storage area under different types may be useful in modeling certain practical
applications. For example, a program unit that reads from an input device might view a sequence of bytes
according to the type of data that is required. In general, however, this is an unsafe programming practice
and should normally be avoided.
2.2.4 Powerset
It is often useful to define variables whose value can be any subset of a set of elements of a given type T. The type
of such variables is powerset (T), the set of all subsets of elements of type T. Type T is called the base type. For
example, suppose that a language processor accepts the following set O of options
Variables of type powerset (T) represent sets. The operations permitted on such variables are set operations, such
as union and intersection. Although sets (and powersets) are common and basic mathematical concepts, only a
few languages (notably Pascal and Modula-2) provide them through built-in constructors and operations. Also, the
set-based language SETL makes sets the very basic data structuring mechanism. For most other languages, set data
structures are provided through libraries. For example, the C++ standard library provides many data structures,
including sets.
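As a rough sketch of the library route, the C++ fragment below builds set values over an enumeration and combines them with the standard set algorithms. The option names and helper functions are hypothetical placeholders, not the options of any real language processor.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

enum Option { LIST_SOURCE, OPTIMIZE, EXECUTE };   // hypothetical base type T
typedef std::set<Option> OptionSet;               // values of powerset(T)

OptionSet union_of(const OptionSet& x, const OptionSet& y) {
    OptionSet out;
    std::set_union(x.begin(), x.end(), y.begin(), y.end(),
                   std::inserter(out, out.begin()));
    return out;
}

OptionSet intersection_of(const OptionSet& x, const OptionSet& y) {
    OptionSet out;
    std::set_intersection(x.begin(), x.end(), y.begin(), y.end(),
                          std::inserter(out, out.begin()));
    return out;
}

// Usage sketch: {LIST_SOURCE, OPTIMIZE} combined with {OPTIMIZE, EXECUTE}.
void check_sets() {
    OptionSet x, y;
    x.insert(LIST_SOURCE); x.insert(OPTIMIZE);
    y.insert(OPTIMIZE);    y.insert(EXECUTE);
    assert(union_of(x, y).size() == 3);
    assert(intersection_of(x, y).count(OPTIMIZE) == 1);
    assert(intersection_of(x, y).size() == 1);
}
```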
2.2.5 Sequencing
A sequence consists of any number of occurrences of elements of a certain component type CT. The important
property of the sequencing constructor is that the number of occurrences of the component is unspecified; it
therefore allows objects of arbitrary size to be represented.
It is rather uncommon for programming languages to provide a constructor for sequencing. In most cases, this is
achieved by invoking operating system primitives which access the file system. It is therefore difficult to imagine a
common abstract characterization of such a constructor.
Perhaps the best example is the file constructor of Pascal, which models the conventional data processing concept
of a sequential file. Elements of the file can be accessed sequentially, one after the other. Modifications can be
accomplished by appending new values at the end of an existing file. Files are provided in Ada through standard
libraries, which support both sequential and direct files.
Arrays and recursive list definitions (defined next) may be used to represent sequences, if they can be stored in
main memory. If the size of the sequence does not change dynamically, arrays provide the best solution. If the size
needs to change while the program is executing, flexible arrays or lists must be used. The C++ standard library
provides a number of sequence implementations, including vector and list.
2.2.6 Recursion
Recursion is a structuring mechanism that can be used to define aggregates whose size can grow arbitrarily and
whose structure can have arbitrary complexity. A recursive data type T is defined as a structure which can contain
components of type T.
Conventional programming languages allow recursive data types to be implemented via pointers. Each component
of the recursive type is represented by a location containing a pointer to the data object, rather than the data
object itself. The list itself would be identified by another location containing the pointer to the first element of the
list.
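A pointer-based recursive type of this kind might look as follows in C++. This is a sketch only: IntList, cons, and contains are names chosen for the example, and std::unique_ptr stands in for a raw pointer so that the nodes are released automatically.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// A recursive type: each node holds a pointer to another IntList.
struct IntList {
    int value;
    std::unique_ptr<IntList> next;   // null marks the end of the list
};

// Builds a new list from a head element and an existing tail.
std::unique_ptr<IntList> cons(int head, std::unique_ptr<IntList> tail) {
    std::unique_ptr<IntList> node(new IntList());
    node->value = head;
    node->next = std::move(tail);
    return node;
}

// Recursive search that mirrors the recursive structure of the type.
bool contains(int el, const IntList* list) {
    if (list == nullptr) return false;
    if (list->value == el) return true;
    return contains(el, list->next.get());
}

// Usage sketch: the list 1, 2, 3.
void check_list() {
    std::unique_ptr<IntList> list = cons(1, cons(2, cons(3, nullptr)));
    assert(contains(2, list.get()));
    assert(!contains(9, list.get()));
}
```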
For example, in ML a list can be denoted as either [] (the empty list) or as x::xs, the list composed of the head
element x and the tail list xs. In order to find an element in a list, we can write the following self-explaining
high-level function:
Pointers are a powerful, but low-level, programming mechanism that can be used to build complex data
structures. In particular, they allow recursive data structures to be defined. Like any low-level
mechanism, however, they often allow obscure and insecure programs to be written. Just as
unrestricted goto statements broaden the context from which any labelled instruction can be
executed, unrestricted pointers broaden the context from which a data object may be accessed.
Let us review a number of cases of insecurities that may arise and possible ways of controlling
them.
This fragment initializes array hello to the array value {'h', 'e', 'l', 'l', 'o', '\0'}, i.e., the string "hello" ('\0' is the null
character denoting the end of the string). Structure a is initialized to the structure value {0.0, 1.1}.
Modern programming languages provide many ways of defining new types, starting from built-in types. The
simplest way, mentioned in Section 3.1, consists of defining new elementary types by enumerating their values.
The constructors reviewed in the previous sections go one step further, since they allow complex data structures
to be composed out of the built-in types of the language. Modern languages also allow aggregates built through
composition of built-in types to be named as new types. Having given a type name to an aggregate data structure,
one can declare as many variables of that type as necessary by simple declarations.
For example, after the C++ declaration, which introduces a new type name complex,
struct complex {
    float real_part, imaginary_part;
};
any number of instance variables may be defined to hold complex values:
complex a, b, c, . . .;
By providing appropriate type names, program readability can be improved. In addition, by factoring the definition
of similar data structures in a type declaration, modifiability is also improved. A change that needs to be applied to
the data structures is applied to the type, not to all variable declarations. Factorization also reduces the chance of
clerical errors and improves consistency.
The ability to define a type name for a user defined data structure is only a first step in the direction of supporting
data abstractions. The two main benefits of introducing types in a language are classification and protection. Types
allow the world of data to be organized as a collection of different categories.
Types also allow data to be protected from undesirable manipulations by specifying exactly which operations are
legal for objects of a given type and by hiding the concrete representation. Of these two properties, only the
former is achieved by defining a user-defined data structure as a type. What is needed is a construct that allows
both a data structure and operations to be specified for user defined types. More precisely, we need a construct to
define abstract data types. An abstract data type is a new type for which we can define the operations to be used
for manipulating instances, while the data structure that implements the type is hidden from its users.
2.2.8.1 Abstract data types in C++
Abstract data types can be defined in C++ through the class construct. A class encloses the definition of a new type
and explicitly provides the operations that can be invoked for correct use of instances of the type.
The class below defines a type for the geometrical concept of a point. A class can be viewed as an extension
of structures, where fields can be both data and routines. The difference is that only some fields are accessible
from outside the class. Non-public fields are hidden from the users of the class.
In the example, the class construct encapsulates both the definition of the data structure defined to represent
points and the operations provided to manipulate points. The data structure which defines a geometrical point
is not directly accessible by users of the class. Rather, points can only be manipulated by the operations defined as
public routines in the following class:
class point {
    int x, y;
public:
    point (int a, int b) { x = a; y = b; }   // initializes the coordinates of a point
    void x_move (int a) { x += a; }          // moves the point horizontally
    void y_move (int b) { y += b; }          // moves the point vertically
    void reset ( ) { x = 0; y = 0; }         // moves the point to the origin
};
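A fragment that uses this class might look like the sketch below. The class is repeated so that the example is self-contained; the accessors get_x and get_y and the function demo_points are hypothetical additions made only so that the effect of the calls can be observed, and the coordinate values are arbitrary.

```cpp
#include <cassert>

class point {
    int x, y;
public:
    point(int a, int b) { x = a; y = b; }
    void x_move(int a) { x += a; }
    void y_move(int b) { y += b; }
    void reset() { x = 0; y = 0; }
    int get_x() const { return x; }   // hypothetical accessor, added for this sketch
    int get_y() const { return y; }   // hypothetical accessor, added for this sketch
};

int demo_points() {
    point p1(1, 3);                  // constructor runs when the declaration is reached
    point p2(55, 0);
    point* p3 = new point(0, 0);     // constructor runs when new is executed
    p1.x_move(3);                    // p1 is now at (4, 3)
    p2.y_move(99);                   // p2 is now at (55, 99)
    p3->x_move(5);                   // the point referenced by p3 is now at (5, 0)
    int sum = p1.get_x() + p2.get_y() + p3->get_x();
    p1.reset();                      // p1 is back at the origin
    delete p3;
    return sum;                      // 4 + 99 + 5
}
```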
The fragment shows how operations are invoked on points by means of the dot notation; that is, by writing
“object_name.public_routine_name”. The only exceptions are the invocations of constructors and destructors. We
discuss constructors below; destructors will be discussed in a later example.
A constructor is an operation that has the same name as the new type being defined (in the example, point). A
constructor is automatically invoked when an object of the class is allocated. In the case of points p1 and p2, this is
done automatically when the scope in which they are declared is entered. In the case of the dynamically allocated
point referenced by p3, this is done when the new instruction is executed. Invocation of the constructor allocates
the data structure defined by the class and initializes its value according to the constructor’s code.
A special type of constructor is a copy constructor. The constructor we have seen for point builds a point out of
two int values. A copy constructor is able to build a point out of an existing point. The signature of the copy
constructor would be:
point (point&)
The copy constructor is a fundamentally different kind of constructor because it allows us to build a new object
from an existing object without knowing the components that constitute the object. Supplying those components
explicitly is what our first constructor requires.
When a parameter is passed by value to a procedure, copy construction is used to build the formal parameter from
the argument. Copy construction is similar to assignment, with the difference that on assignment both
objects already exist, whereas on copy construction a new object must be created first and then a value assigned to it.
It is also possible to define generic abstract data types, i.e., data types that are parametric with respect to the type
of their components. The construct provided to support this feature is the template. For example, a C++ template
can implement an abstract data type stack which is parametric with respect to the type of elements that it can store
and manage according to a last-in first-out policy. Data objects are then defined by instantiating the generic type
with a concrete element type.
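A generic stack of this kind could be sketched as follows. This is not the original figure: the class name Stack, its operations, and the choice of std::vector as the hidden representation are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class Stack {
public:
    void push(const T& item) { items_.push_back(item); }

    // Removes and returns the most recently pushed element.
    // Precondition: the stack is not empty.
    T pop() {
        assert(!items_.empty());
        T top = items_.back();
        items_.pop_back();
        return top;
    }

    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<T> items_;   // representation hidden from users
};

// Usage sketch: two instantiations of the same generic type.
void check_stack() {
    Stack<int> numbers;
    numbers.push(1);
    numbers.push(2);
    assert(numbers.pop() == 2);   // last in, first out
    Stack<char> letters;
    letters.push('a');
    assert(letters.size() == 1);
}
```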
Type/Program checking :
– Purposes include preventing misuse of primitives (e.g., 4/"hi") and avoiding run-time checking
– Dynamically-typed languages (e.g., Python, JavaScript, Scheme) do much less static checking
• Maybe none, but the line is fuzzy and depends on exactly what one means by "type checking".
The big benefit of static type checking is that it allows many type errors to be caught early in the
development cycle. Static typing usually results in compiled code that executes more quickly because
when the compiler knows the exact data types that are in use, it can produce optimized machine code
(i.e. faster and/or using less memory).
Static type checkers evaluate only the type information that can be determined at compile time, but are
able to verify that the checked conditions hold for all possible executions of the program, which
eliminates the need to repeat type checks every time the program is executed.
A static type-checker will quickly detect type errors in rarely used code paths. Without static type
checking, even code coverage tests with 100% coverage may be unable to find such type errors. The
drawback is that a static type-checker rejects any program in which it finds a type error, even when the
offending code block would hardly ever be executed, so the program cannot be run until the error is
fixed.
Dynamic type checking is the process of verifying the type safety of a program at runtime. Common
dynamically-typed languages include Groovy, JavaScript, Lisp, Lua, Objective-C, PHP, Prolog, Python, Ruby,
Smalltalk and Tcl.
Most type-safe languages include some form of dynamic type checking, even if they also have a static type
checker. The reason for this is that many useful features or properties are difficult or impossible to verify
statically. For example, suppose that a program defines two types, A and B, where B is a subtype of A. If
the program tries to convert a value of type A to type B, which is known as downcasting, then the
operation is legal only if the value being converted is actually a value of type B. Therefore, a dynamic
check is needed to verify that the operation is safe. Other language features that dynamic typing enables
include dynamic dispatch, late binding, and reflection.
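The downcasting case can be sketched in C++, a statically typed language that still performs this particular check at run time through dynamic_cast. The types A and B follow the text; payload and payload_or_default are hypothetical names.

```cpp
#include <cassert>

struct A { virtual ~A() {} };                        // supertype
struct B : A { int payload() const { return 42; } }; // subtype of A

// Returns the payload if 'a' actually refers to a B, or -1 otherwise.
// dynamic_cast performs the run-time check and yields null on failure.
int payload_or_default(A* a) {
    B* b = dynamic_cast<B*>(a);
    return b != nullptr ? b->payload() : -1;
}

// Usage sketch: the downcast succeeds only for a genuine B.
void check_downcast() {
    A plain;
    B derived;
    assert(payload_or_default(&derived) == 42);
    assert(payload_or_default(&plain) == -1);
}
```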
In contrast to static type checking, dynamic type checking may cause a program to fail at runtime due to
type errors. In some programming languages, it is possible to anticipate and recover from these failures –
either by error handling or poor type safety. In others, type checking errors are considered fatal. Because
type errors are more difficult to determine in dynamic type checking, it is a common practice to
supplement development in these languages with unit testing.
All in all, dynamic type checking typically results in less optimized code than does static type checking; it
also includes the possibility of runtime type errors and forces runtime checks to occur for every execution
of the program (instead of just at compile-time). However, it opens up the doors for more powerful
language features and makes certain other development practices significantly easier. For
example, metaprogramming – especially when using eval functions – is not impossible in statically-typed
languages, but it is much, much easier to work with in dynamically-typed languages.
Here is a description of the way (or at least one of the ways) these terms are most commonly used.
In a statically typed language, every variable name is bound both to a type (at compile time, by means of a data
declaration) and to an object. The binding to an object is optional: if a name is not bound to an object, the name
is said to be null. Once a variable name has been bound to a type (that is, declared) it can be bound (via an
assignment statement) only to objects of that type; it cannot ever be bound to an object of a different type. An
attempt to bind the name to an object of the wrong type will raise a type exception.
In a dynamically typed language, every variable name is (unless it is null) bound only to an object. Names are
bound to objects at execution time by means of assignment statements, and it is possible to bind a name to
objects of different types during the execution of the program.
A common misconception is to assume that all statically-typed languages are also strongly-typed languages, and
that dynamically-typed languages are also weakly-typed languages. This isn’t true, and here’s why:
(i) Strong typing
A strongly-typed language is one in which variables are bound to specific data types, and which reports a type
error if types do not match up as expected in an expression, regardless of when type checking occurs. A simple way
to think of strongly-typed languages is to consider them to have high degrees of type safety. To give an example,
in the following code block repeated from above, a strongly-typed language would result in an explicit type error
which ends the program's execution, thus forcing the developer to fix the bug:
We often think of statically-typed languages such as Java and C# as strongly-typed (which they are) because data
types are explicitly defined when initializing a variable, such as in the following example in Java:
However, Ruby and Python (both of which are dynamically-typed) are also strongly-typed languages, and
the developer makes no verbose statement of data type when declaring a variable. Below is the same Java
example above, but written in Ruby.
Both of the languages in these examples are strongly-typed, but employ different type checking methods.
Languages such as Ruby, Python, and JavaScript, which do not require manually defining a type when declaring a
variable, make use of type inference: the ability to programmatically infer the type of a variable based on its
value. Some programmers automatically use the term weakly typed to refer to languages that make use of type
inference, often without realizing that the type information is present but implicit. Type inference is a separate
feature of a language that is unrelated to any of its type systems.
(ii) Weak typing
A weakly-typed language, on the other hand, is one in which variables are not bound to a specific data type; they
still have a type, but type safety constraints are lower compared to strongly-typed languages. Take the following
PHP code for example:
Because PHP is weakly-typed, this would not error. Just as not all strongly-typed languages are
statically-typed, not all weakly-typed languages are dynamically-typed; PHP is a dynamically-typed language, but C,
which is also a weakly-typed language, is statically-typed.
Myth officially busted. While they are two separate topics, static/dynamic type systems and strong/weak type
systems are related on the issue of type safety. One way you can compare them is that a language’s static/dynamic
type system tells when type safety is enforced, and its strong/weak type system tells how type safety is enforced.
A strict type system might require operations that expect an operand of a type T to be invoked legally only with a
parameter of type T. Languages, however, often allow more flexibility, by defining when an operand of another
type– say Q–is also acceptable without violating type safety.
In such a case, we say that the language defines whether, in the context of a given operation, type Q is compatible
with type T. Type compatibility is also sometimes called conformance or equivalence. When compatibility is
defined precisely by the type system, a type checking procedure can verify that all operations are always invoked
correctly, i.e., the types of the operands are compatible with the types expected by the operation.
Thus a language defining a notion of type compatibility can still have a strong type system.
The strict conformance rule where a type name is only compatible with itself is called name compatibility.
Structural compatibility is another possible conformance rule that languages may adopt.
Name compatibility is easier to implement than structural compatibility, which requires a recursive traversal of a
data structure. Name compatibility is also much stronger than structural compatibility. Actually, structural
compatibility goes to the extreme where type names are totally ignored in the check. Structural compatibility
makes the classification of data objects implied by types exceedingly coarse.
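The difference between the two rules can be illustrated with a rough C++ sketch. The types Meters and Feet and the helper functions are hypothetical. C++ applies name compatibility to structs, while a function template behaves in a structural way, since it only checks that the member it uses exists.

```cpp
#include <cassert>
#include <type_traits>

// Two record types with identical structure but different names.
struct Meters { double value; };
struct Feet   { double value; };

// Name compatibility: the two structs are unrelated types, so a Feet
// object cannot be used where a Meters object is expected.
static_assert(!std::is_same<Meters, Feet>::value, "distinct names, distinct types");
static_assert(!std::is_convertible<Feet, Meters>::value, "no implicit conversion");

double length_in_meters(Meters m) { return m.value; }

// Structural flavour: the template only requires a member called `value`;
// the type name is ignored in the check.
template <typename T>
double raw_value(const T& x) { return x.value; }

void compatibility_demo() {
    assert(length_in_meters(Meters{2.0}) == 2.0);
    assert(raw_value(Meters{2.0}) == 2.0);
    assert(raw_value(Feet{6.5}) == 6.5);   // accepted: same structure
}
```

The template accepts both lengths without distinguishing them, which is the coarse classification the paragraph above warns about.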
Type conversion or typecasting refers to changing an entity of one datatype into another. There are two types of
conversion: implicit and explicit. The term for implicit type conversion is coercion. Explicit type conversion in some
specific way is known as casting. Explicit type conversion can also be achieved with separately defined conversion
routines such as an overloaded object constructor.
Implicit type conversion, also known as coercion, is an automatic type conversion by the compiler.
Some languages allow, or even require compilers to provide coercion.
In a mixed-type expression, a subtype s will be converted to a supertype t, or some subtypes s1, s2, ... will be
converted to a supertype t (possibly none of the si is of type t), at run time so that the program will run
correctly. For example:
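A minimal C++ sketch of coercion in mixed-type expressions follows. The function names are illustrative, and the conversions shown are the usual arithmetic conversions of C++, which the compiler arranges without any cast in the source.

```cpp
#include <cassert>
#include <type_traits>

// int + double: the int operand is coerced to double.
double add_mixed(int i, double d) {
    static_assert(std::is_same<decltype(i + d), double>::value,
                  "int + double has type double");
    return i + d;
}

// short + char: both operands are converted to int, although
// neither of them has type int.
int add_small(short s, char c) {
    static_assert(std::is_same<decltype(s + c), int>::value,
                  "short + char has type int");
    return s + c;
}

void coercion_demo() {
    assert(add_mixed(3, 1.5) == 4.5);
    assert(add_small(2, 5) == 7);
}
```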
(type)expression
C++-style casting
Several cast syntaxes are used in C++ (although C-style casting is supported as well). The function-call style follows
the form:
type(expression)
This style of casting was adopted to force clarity when casting. For example, the result and intent of
the C-style cast
(type)firstVariable + secondVariable
may not be clear, while the same cast using C++-style casting allows more clarity:
type(firstVariable + secondVariable)
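The difference in grouping can be made visible with a short sketch. Division is used here instead of addition, as an assumption made for illustration, because with integer operands it makes the two results differ.

```cpp
#include <cassert>

// C-style cast: it binds only to the first operand, so a is converted
// to double and the division is carried out in floating point.
double c_style(int a, int b) {
    return (double)a / b;
}

// Function-call style: the integer division happens first, and only
// its result is converted to double.
double function_style(int a, int b) {
    return double(a / b);
}

void cast_demo() {
    assert(c_style(7, 2) == 3.5);
    assert(function_style(7, 2) == 3.0);
}
```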
If a type is defined as a set of values with an associated set of operations, a subtype can be defined to be a subset
of those values. The concept of subtype will have a richer semantics in the context of object-oriented languages.
If ST is a subtype of T, T is also called ST’s supertype (or parent type). We assume that the operations defined for T
are automatically inherited by ST. A language supporting subtypes must define:
Pascal was the first programming language to introduce the concept of a subtype, as a subrange of any discrete
ordinal type (i.e., integers, boolean, character, enumerations, or a subrange thereof). For example, in Pascal one
may define natural numbers and digits as follows:
type natural = 0..maxint;
     digit = 0..9;
     small = -9..9;
where maxint is the maximum integer value representable by an implementation.
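C++ has no built-in subrange types, but the idea can be approximated. The sketch below is only an emulation under stated assumptions: Subrange is a hypothetical class template that checks the bounds at run time and converts back to int, so that the operations of int remain available.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Hypothetical Subrange<Lo, Hi>: an int restricted to Lo..Hi.
template <int Lo, int Hi>
class Subrange {
    static_assert(Lo <= Hi, "empty range");
public:
    Subrange(int v) : value_(check(v)) {}
    operator int() const { return value_; }   // int operations are reused via conversion
private:
    static int check(int v) {
        if (v < Lo || v > Hi) throw std::out_of_range("value outside subrange");
        return v;
    }
    int value_;
};

using Natural = Subrange<0, INT_MAX>;   // natural = 0..maxint
using Digit   = Subrange<0, 9>;         // digit   = 0..9
using Small   = Subrange<-9, 9>;        // small   = -9..9

// Returns true if v is a legal Digit value, false otherwise.
bool accepts_digit(int v) {
    try {
        Digit d(v);
        (void)d;
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Unlike a Pascal compiler, this sketch cannot reject an out-of-range constant at compile time; the violation is only reported when the object is constructed.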
Early examples of generic programming were implemented in Scheme and Ada, although the best known
example is the Standard Template Library (STL), which developed a theory of iterators that is used to decouple
sequence data structures and the algorithms operating on them.
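The decoupling can be sketched as follows. The function sum_range is a hypothetical algorithm written only in terms of iterators, so it never needs to know which container it traverses; std::accumulate is the corresponding algorithm of the standard library.

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// A generic algorithm: it needs only !=, ++ and * on the iterator type.
template <typename Iter>
int sum_range(Iter first, Iter last) {
    int total = 0;
    for (; first != last; ++first) {
        total += *first;
    }
    return total;
}

// The same algorithm applied to a contiguous sequence...
int sum_vector_example() {
    std::vector<int> v{1, 2, 3, 4};
    return sum_range(v.begin(), v.end());
}

// ...and to a linked sequence, here through the library algorithm.
int sum_list_example() {
    std::list<int> l{1, 2, 3, 4};
    return std::accumulate(l.begin(), l.end(), 0);
}
```

A plain array works too, because a pointer supports the same three operations.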
The description provides an overall hierarchical taxonomy of the features provided by each language for data
structuring. For a full understanding of language semantics, such a description must be complemented by a precise
understanding of the rules of the type system (strong typing, type compatibility, type conversion, subtyping,
genericity, and polymorphic features).
Pascal Program Structure
Program name
Uses command
Type declarations
Constant declarations
Variables declarations
Functions declarations
Procedures declarations
Main program block
Statements and Expressions within each block
Comments
Every Pascal program generally has a heading statement, a declaration part, and an execution part, strictly in that
order. The following format shows the basic syntax for a Pascal program:
The first line of the program, program HelloWorld;, indicates the name of the program.
The second line of the program, uses crt;, is a uses clause, which tells the compiler to include
the crt unit when compiling the program.
The next lines, enclosed within begin and end statements, are the main program block. Every block in
Pascal is enclosed within a begin statement and an end statement. However, the end statement
indicating the end of the main program is followed by a full stop (.) instead of a semicolon (;).
The begin statement of the main program block is where program execution begins.
The lines within (*...*) are ignored by the compiler; they are there to add a comment to the
program.
The statement writeln('Hello, World!'); uses the writeln procedure available in Pascal, which causes the
message "Hello, World!" to be displayed on the screen.
The statement readkey; pauses the display until the user presses a key. It is part of the crt unit. A
unit is like a library in Pascal.
The last statement, end., ends the program.
2.4.2 C++
C/C++ arrays allow you to define variables that combine several data items of the same kind, whereas a structure is
another user-defined data type that allows you to combine data items of different kinds.
Structures are used to represent a record. Suppose you want to keep track of your books in a library. You might
want to track the following attributes about each book:
Title
Author
Subject
Book ID
Defining a Structure - To define a structure, you must use the struct statement. The struct statement defines a new
data type, with more than one member, for your program. The format of the struct statement is this:
The structure tag is optional and each member definition is a normal variable definition, such as int i; or float f; or
any other valid variable definition. At the end of the structure's definition, before the final semicolon, you can
specify one or more structure variables but it is optional. Here is the way you would declare the Book structure:
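A structure for the book record could look like the sketch below. The member names, the variable library_copy, and the sample values are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <string>

// General form: struct [structure tag] { member definitions; } [variables];
struct Book {
    std::string title;
    std::string author;
    std::string subject;
    int book_id;
} library_copy;              // optional structure variable

// Fills in a Book with made-up sample values.
Book make_sample_book() {
    Book b;
    b.title = "Example Title";
    b.author = "Example Author";
    b.subject = "Programming";
    b.book_id = 1001;
    return b;
}
```

Members are accessed with the dot operator, for example library_copy.title.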
The type structure of C++ is given in Figure 43. C++ distinguishes between two categories of types: fundamental
types and derived types. Fundamental types are either integral or floating. Integral types comprise char, short int,
int, long int, which can be used for representing integers of different sizes. Floating-point types comprise float,
double, and long double. New integral types may be declared via enumerations.
For example:
enum my_small_set {low = 1, medium = 5, high = 10};
Arrays are declared by providing a constant expression, which defines the number of elements in the array.
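The enumeration above and an array sized by a constant expression can be combined in a short sketch. The names kCount, kLevels, and sum_levels are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum my_small_set { low = 1, medium = 5, high = 10 };

// The number of elements must be a constant expression.
constexpr std::size_t kCount = 3;
const my_small_set kLevels[kCount] = { low, medium, high };

// Enumerators convert implicitly to their integral values.
int sum_levels() {
    int total = 0;
    for (std::size_t i = 0; i < kCount; ++i) {
        total += kLevels[i];
    }
    return total;
}
```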
2.4.3 Ada
The syntax of Ada minimizes choices of ways to perform basic operations, and prefers English keywords (such as
"or else" and "and then") to symbols (such as "||" and "&&"). Ada uses the basic arithmetical operators "+", "-",
"*", and "/", but avoids using other symbols. Code blocks are delimited by words such as "declare", "begin", and
"end", where the "end" (in most cases) is followed by the identifier of the block it closes (e.g., if … end if, loop …
end loop). In the case of conditional blocks this avoids a dangling else that could pair with the wrong nested if-
expression in other languages like C or Java.
Ada is designed for development of very large software systems. Ada packages can be compiled separately. Ada
package specifications (the package interface) can also be compiled separately without the implementation to
check for consistency. This makes it possible to detect problems early during the design phase, before
implementation starts.
A large number of compile-time checks are supported to help avoid bugs that would not be detectable until run-
time in some other languages or would require explicit checks to be added to the source code. For example, the
syntax requires explicitly named closing of blocks to prevent errors due to mismatched end tokens. The adherence
to strong typing allows detection of many common software errors (wrong parameters, range violations, invalid
references, mismatched types, etc.) either during compile-time, or otherwise during run-time.
Unstructured (scalar) types can be both numeric (i.e., integers and reals) and enumerations. All scalar types are
ordered, i.e., relational operators are defined on them. Enumeration types are similar to those provided by Pascal.
Integer types comprise a set of consecutive integer values. An integer type may be either signed or modular. Any
signed integer type is a subrange of System.Min_Int..System.Max_Int, which denote the minimum and maximum
integer representable in a given Ada implementation. A modular integer is an absolute value, greater than or equal
to zero. The Ada language predefines a signed integer type, called Integer.
type Small_Int is range -10 .. 10;  -- range bounds may be any static expressions
type Two_Digit is mod 100;          -- the values range from 0 to 99;
                                    -- in general, the bound must be a static expression
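The wrap-around behavior of a modular type can be imitated in C++. The class TwoDigit below is a hypothetical stand-in for the Ada type Two_Digit, not a feature of either language. As a simplification, its constructor reduces any value modulo 100 instead of rejecting out-of-range values.

```cpp
#include <cassert>

// Hypothetical emulation of: type Two_Digit is mod 100;
class TwoDigit {
public:
    explicit TwoDigit(unsigned v) : value_(v % kModulus) {}
    unsigned value() const { return value_; }

    // Arithmetic wraps around at the modulus.
    TwoDigit operator+(TwoDigit other) const {
        return TwoDigit(value_ + other.value_);
    }
    TwoDigit operator-(TwoDigit other) const {
        return TwoDigit(value_ + kModulus - other.value_);
    }
private:
    static constexpr unsigned kModulus = 100;
    unsigned value_;          // always in 0 .. 99
};
```

With this sketch, 99 + 1 yields 0 and 0 - 1 yields 99, so the values always stay between 0 and 99.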
As we mentioned, Ada allows subtypes to be defined from given types. Subtypes do not define a new type.
Two subtypes of Integer are predefined in Ada: Natural (range 0 .. Integer'Last) and Positive (range 1 .. Integer'Last).
Overloading and coercion are widely available in Ada. The language also provides for inclusion polymorphism in
the case of subtypes and type extensions. Finally, Ada makes extensive use of attributes. Attributes are used to
designate properties of data objects and types. As we saw in many examples so far, the value of an attribute is
retrieved by writing the name of the entity whose attribute is being sought, followed by an apostrophe (') and the name of the
attribute. Ada predefines a large number of attributes; more can be defined by an implementation.
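For comparison, C++ exposes similar properties through library templates instead of attribute syntax. The sketch below gives rough analogues of the Ada attributes Integer'First, Integer'Last, and A'Length; the function names are illustrative only.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>

// Rough analogue of Integer'First.
constexpr int integer_first() { return std::numeric_limits<int>::min(); }

// Rough analogue of Integer'Last.
constexpr int integer_last() { return std::numeric_limits<int>::max(); }

// Rough analogue of A'Length for a built-in array A.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) { return N; }
```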
It is intended to complement the conceptual model of programming language processing provided in Chapter 2, by
showing how data can be represented and manipulated in a machine. It is not intended, however, to provide a
detailed account of efficient techniques for representing data objects within a computer, which can be highly
dependent on the hardware structure. Rather, straightforward solutions will be presented, along with some
comments on alternative, more efficient representations.
Data will be represented by a pair consisting of a descriptor and a data object. Descriptors contain the most
relevant attributes that are needed during the translation process. Additional attributes might be appropriate for
specific purposes. Typically, descriptors are kept in a symbol table during translation, and only a subset of the
attributes stored there needs to be saved at run time. Again, we will pay little attention to the physical layout
of descriptors, which depends on the overall organization of the symbol table.
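The descriptor and symbol table idea can be sketched in C++ under simple assumptions. The fields chosen for Descriptor, the use of std::map as the symbol table, and the function declare are all hypothetical; a real translator would keep more attributes and a different layout.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical descriptor: attributes the translator needs for a variable.
struct Descriptor {
    std::string type_name;   // e.g. "integer"
    std::size_t size;        // storage size of the data object, in bytes
    std::size_t offset;      // position of the data object in the data area
};

// Symbol table kept during translation: variable name -> descriptor.
using SymbolTable = std::map<std::string, Descriptor>;

// Records a declaration and returns the offset for the next data object.
std::size_t declare(SymbolTable& table, const std::string& name,
                    const std::string& type_name, std::size_t size,
                    std::size_t next_offset) {
    table[name] = Descriptor{type_name, size, next_offset};
    return next_offset + size;
}
```

At run time only part of this information, such as the offset, would typically need to survive; the name and the type name matter mainly during translation.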