Types
Aaron Bloomfield
CS 415
Fall 2005
Why have Types?
• Types provide context for operations
– a+b: what kind of addition?
– pointer p = new object: how much space?
• Limit valid set of operations
– int a = “string”: error is known before run time
Computer Hardware & Types
• Remember CS216?
– Bits in data are generally untyped
– Bits interpreted as having a type, such as:
• Instruction
• Address
• Floating-point number
• Character
• High-level language interprets bits
Type assignment
• Language-dependent
• a+b example again:
– Ada: a and b declared with static types
– Perl: $a and $b are dynamically typed
– Scheme: checked at run-time
– Smalltalk: checked at run-time
– Bliss: completely untyped
Static vs. Dynamic Typing
• First, what does strongly typed mean?
– Prevents invalid operations on a type
• Statically typed: strongly typed and type
checks can be performed at compile time
– Ada, Pascal: almost all type checking done at
compile time, with just a few exceptions
– C, C++, Java, OCaml
• Dynamically typed: type checking is
delayed until run time
– Scheme, Smalltalk, Perl
Defining/Declaring a Type
• Fortran: variable’s type is implied by the first letter of its name
– Names starting with I–N: integer type
– Otherwise: real (floating-point) type
• Scheme, Smalltalk: run-time typing
• Most languages (such as C++): user
explicitly declares type
– int a; char b;
Classifying types
• A type is either one of a small group of built-in
types or a user-defined composite
• Built-in types include:
– Integers
– Characters: ASCII, Unicode
– Boolean (not in C before C99; C++ has bool)
– Floating points (double, real, float, etc.)
– Rarer types:
• Complex (Scheme, Fortran)
• Rational (Scheme)
• Fixed point (Ada)
Number precision
• Is the precision specified by the language or
left to the implementation?
– Java, C, C++, Fortran: user chooses precision by choosing the type
• Java: byte, short, int, long
– Most other languages: implementation decides
• Is the precision consistent across platforms?
– Java: Yes
– C/C++: No
Number precision
• If user doesn’t specify precision, code can lose
portability due to differences in precision
– In C/C++, precision is platform-dependent
• Neat example in Haskell (arbitrary-precision integers)
– 2 ^ 200
1606938044258990275541962092341162602522202993782792835301376
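The portability point can be sketched in C, assuming a hosted C99 compiler. The helper names (int_width_bits, int32_width_bits, check_widths) are made up for this illustration. Plain int has whatever width the platform chose, while int32_t from <stdint.h> is 32 bits wherever it exists.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Storage width in bits of the platform's plain int.
   The C standard only guarantees a range that needs at least 16 bits. */
int int_width_bits(void) {
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Storage width in bits of int32_t: exactly 32 wherever the type exists. */
int int32_width_bits(void) {
    return (int)(sizeof(int32_t) * CHAR_BIT);
}

/* Checks only what the standard promises, so it should pass on any platform. */
void check_widths(void) {
    assert(int_width_bits() >= 16);
    assert(int32_width_bits() == 32);
}
```

Code that needs a known range can use the fixed-width type and stay portable; code that uses plain int inherits the platform's choice.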
Type Equivalence
• Determining when the types of two
values are the same
Coercion
• Can automatically force values to another
type for the given context
• a+b revisited:
– Without coercion, if both aren’t the same type
then there’s a compile-time error
– With coercion, if either is floating-point, then
addition is floating-point; otherwise int addition
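A small C sketch of the a+b case, since C is one language that applies this coercion rule. The function names are hypothetical and chosen only for this example.

```c
#include <assert.h>

/* Mixed operands: the int is converted to double before the addition,
   so the addition is floating-point. */
double add_mixed(int a, double b) {
    return a + b;
}

/* Both operands are int: integer addition, no coercion needed. */
int add_ints(int a, int b) {
    return a + b;
}

/* The two cases side by side. */
void check_coercion(void) {
    assert(add_mixed(1, 0.5) == 1.5);  /* floating-point addition */
    assert(add_ints(1, 2) == 3);       /* integer addition */
}
```

In a language without coercion, add_mixed would need an explicit conversion of a before the addition.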
Records
• Also known as ‘structs’ and ‘types’.
– C (bool needs <stdbool.h>, available since C99)
#include <stdbool.h>
struct resident {
    char initials[2];
    int ss_number;
    bool married;
};
Nesting Records
• Most languages allow records to be
nested within each other.
– Pascal
type two_chars = array [1..2] of char;
type married_resident = record
initials: two_chars;
ss_number: integer;
incomes: record
husband_income: integer;
wife_income: integer;
end;
end;
Memory Layout of Records
• Fields are stored adjacently in memory.
• Memory is allocated for records based on
the order the fields are declared.
• Variables are aligned for easy reference.
[Figure: the same record drawn twice in rows 4 bytes (32 bits) wide, once optimized for space and once optimized for memory alignment]
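The effect of alignment on a record's layout can be observed in C with sizeof and offsetof. This is a sketch using the resident record from the Records slide; the helper names are assumptions, and the exact amount of padding is implementation-defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct resident {
    char initials[2];
    int ss_number;
    bool married;
};

/* Bytes the three fields would need if packed with no padding. */
size_t resident_packed_bytes(void) {
    return sizeof(char[2]) + sizeof(int) + sizeof(bool);
}

/* Bytes the compiler added to keep fields aligned. With a 4-byte int
   and 1-byte bool this is typically 5, but it is not guaranteed. */
size_t resident_padding_bytes(void) {
    return sizeof(struct resident) - resident_packed_bytes();
}

/* Properties C guarantees for any struct: the first field is at offset 0
   and later fields follow at increasing offsets, in declaration order. */
void check_resident_layout(void) {
    assert(offsetof(struct resident, initials) == 0);
    assert(offsetof(struct resident, ss_number) >= sizeof(char[2]));
    assert(offsetof(struct resident, married) >
           offsetof(struct resident, ss_number));
    assert(sizeof(struct resident) >= resident_packed_bytes());
}
```

A nonzero result from resident_padding_bytes() corresponds to the holes in the alignment-optimized layout.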
Variant Records
• variant records – provide two or more
alternative fields that share the same storage;
only one is valid at a time.
union foo {
    int i;
    float f;
    char c[4];
};
union DisasmInst {
#ifdef BIG_ENDIAN
struct { unsigned char a, b, c, d; } chars;
#else
struct { unsigned char d, c, b, a; } chars;
#endif
int intv;
unsigned unsv;
struct { unsigned offset:16, rt:5, rs:5, op:6; } itype;
struct { unsigned offset:26, op:6; } jtype;
struct { unsigned function:6, sa:5, rd:5, rt:5, rs:5,
op:6; } rtype;
};
Another union example
#include <cstdlib>
#include <iostream>
using namespace std;
void CheckEndian() {
union {
char charword[4];
unsigned int intword;
} check;
check.charword[0] = 1;
check.charword[1] = 2;
check.charword[2] = 3;
check.charword[3] = 4;
#ifdef BIG_ENDIAN
if (check.intword != 0x01020304) { /* big */
cout << "ERROR: Host machine is not Big Endian.\nExiting.\n";
exit (1);
}
#else
#ifdef LITTLE_ENDIAN
if (check.intword != 0x04030201) { /* little */
cout << "ERROR: Host machine is not Little Endian.\nExiting.\n";
exit (1);
}
#else
cout << "ERROR: Host machine not defined as Big or Little Endian.\n";
cout << "Exiting.\n";
exit (1);
#endif // LITTLE_ENDIAN
#endif // BIG_ENDIAN
}
Arrays
• Group elements of one (homogeneous) type into
indexed memory.
Multidimensional Arrays
• Two ways to make multidimensional arrays
– Both examples from Ada
– Construct specifically as multidimensional.
Array Memory Allocation
• Global lifetime, static shape:
– The array’s shape is known at compile time,
and exists throughout the entire program.
• Array can be allocated in static global memory.
• int global_var[30]; int main(void) { return 0; }
Memory Layout Options
• Ordering of array elements can be
accomplished in two ways:
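– row-major order – elements of a row are
adjacent in memory; rows follow one another (e.g. C).
– column-major order – elements of a column are
adjacent in memory; columns follow one another (e.g. Fortran).
[Figure: the same 2-D array laid out in row-major and in column-major order]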
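The two orderings differ only in how a (row, column) pair maps to a position in linear memory. A minimal sketch with zero-based indices; the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major: a whole row is contiguous, so skipping one row
   means skipping n_cols elements. */
size_t row_major_offset(size_t row, size_t col,
                        size_t n_rows, size_t n_cols) {
    assert(row < n_rows && col < n_cols);
    return row * n_cols + col;
}

/* Column-major: a whole column is contiguous, so skipping one column
   means skipping n_rows elements. */
size_t col_major_offset(size_t row, size_t col,
                        size_t n_rows, size_t n_cols) {
    assert(row < n_rows && col < n_cols);
    return col * n_rows + row;
}
```

For a 3 x 4 array, element (1, 2) is at offset 1*4 + 2 = 6 in row-major order and at offset 2*3 + 1 = 7 in column-major order.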
Row Pointers vs.
Contiguous Allocation
• Row pointers – an array of pointers, each pointing to a
separately stored row. Rows may differ in length.
• Avoids the holes in memory left by padding every row to the same length.
[Figure: day[0] through day[6] each point to the start of one name in a single packed block holding “Sunday” through “Saturday”. Array = 57 bytes, pointers = 28 bytes, total space = 85 bytes]
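The space figures for the day-name example can be reproduced in C. This sketch assumes the names are stored with a terminating NUL and takes the pointer size as a parameter (4 bytes gives the 28 in the figure). A C compiler is not required to pack string literals back to back, so the functions compute sizes and do not inspect actual addresses. All names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N_DAYS 7
#define ROW_BYTES 10   /* "Wednesday" plus its terminating NUL */

/* Row-pointer layout: one pointer per row, rows of different lengths. */
static const char *const day_ptrs[N_DAYS] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

/* Contiguous layout: every row padded to the longest name. */
static const char day_rows[N_DAYS][ROW_BYTES] = {
    "Sunday", "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday", "Saturday"
};

/* Bytes needed for the names alone, each with its NUL. */
size_t packed_string_bytes(void) {
    size_t total = 0;
    for (size_t i = 0; i < N_DAYS; i++)
        total += strlen(day_ptrs[i]) + 1;
    return total;
}

/* Names plus one pointer per row. */
size_t row_pointer_total_bytes(size_t pointer_size) {
    assert(pointer_size > 0);
    return packed_string_bytes() + N_DAYS * pointer_size;
}

/* Seven rows of ROW_BYTES each, including the unused tail of short rows. */
size_t contiguous_bytes(void) {
    return sizeof day_rows;
}
```

With 4-byte pointers the row-pointer layout takes 57 + 28 = 85 bytes; the contiguous layout takes 7 * 10 = 70 bytes, 13 of which are unused padding after the shorter names.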
Row Pointers vs.
Contiguous Allocation
• Contiguous allocation – every row gets the same amount of
space, and rows are stored one after another.
Sets
• Introduced by Pascal, found in most recent
languages as well.
• Common implementation uses a bit vector
to denote “is a member of”.
– Example:
U = {‘a’, ‘b’, …, ‘g’}
A = {‘a’, ‘c’, ‘e’, ‘g’} = 1010101
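A bit-vector set over the universe 'a'..'g' can be sketched in C as follows. The type and function names are assumptions for this example. Bit 0 stands for 'a' and bit 6 for 'g'; for A the pattern 1010101 reads the same from either end.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char charset;   /* 7 of the 8 bits are used */

/* The single bit that represents member c. */
static charset member_bit(char c) {
    assert(c >= 'a' && c <= 'g');
    return (charset)(1u << (c - 'a'));
}

charset set_add(charset s, char c) {
    return (charset)(s | member_bit(c));
}

/* "Is a member of" is one AND and one comparison. */
bool set_contains(charset s, char c) {
    return (s & member_bit(c)) != 0;
}

/* Union and intersection are single bitwise operations. */
charset set_union(charset a, charset b) {
    return (charset)(a | b);
}

charset set_intersection(charset a, charset b) {
    return (charset)(a & b);
}
```

Building A by adding 'a', 'c', 'e' and 'g' to the empty set (0) yields the value 0x55, which is 1010101 in binary.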
Enumerations
• enumeration – set of named elements
– Values are usually ordered, can compare
enum weekday {sun,mon,tue,wed,thu,fri,sat};
if (myVarToday > mon) { . . . }
• Advantages
– More readable code
– Compiler can catch some errors
• Is sun==0 and mon==1?
– C/C++: yes; Pascal: no
• Can also choose the values explicitly in C
enum weekday {mon=0,tue=1,wed=2…}
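A short C sketch of the points above: enumeration values are ordered, can be compared, and start at 0 unless values are given explicitly. The helper functions are hypothetical.

```c
#include <assert.h>

enum weekday { sun, mon, tue, wed, thu, fri, sat };

/* Ordered comparison works because each name is an integer constant. */
int is_after_monday(enum weekday d) {
    return d > mon;
}

/* Successor with wrap-around from sat back to sun. */
enum weekday next_day(enum weekday d) {
    assert((int)d >= sun && (int)d <= sat);
    return (enum weekday)(((int)d + 1) % 7);
}
```

C does not stop a program from storing an out-of-range integer in an enum weekday, which is why next_day checks its argument.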
Lists
• list – the empty list or a pair consisting of an
object (list or atom) and another list
(a . (b . (c . (d . nil))))
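The pair-based definition maps directly onto a C record with two fields. This is a simplified sketch: the first field holds only an atom (a char), not a nested list, NULL stands in for nil, and the names cons, list_first, list_length and list_free are chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* One pair: an object and the rest of the list. */
struct cell {
    char head;
    struct cell *tail;
};

/* Build the pair (head . tail). */
struct cell *cons(char head, struct cell *tail) {
    struct cell *c = malloc(sizeof *c);
    if (c == NULL)
        abort();
    c->head = head;
    c->tail = tail;
    return c;
}

/* First object of a non-empty list. */
char list_first(const struct cell *list) {
    assert(list != NULL);
    return list->head;
}

/* Number of pairs before nil is reached. */
size_t list_length(const struct cell *list) {
    size_t n = 0;
    for (; list != NULL; list = list->tail)
        n++;
    return n;
}

/* Release every pair of the list. */
void list_free(struct cell *list) {
    while (list != NULL) {
        struct cell *rest = list->tail;
        free(list);
        list = rest;
    }
}
```

The list (a . (b . (c . (d . nil)))) is then cons('a', cons('b', cons('c', cons('d', NULL)))).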
Recursive Types
• recursive type - type whose objects may
contain references to other objects of the
same type
– Most are records (consisting of reference
objects and other “data” objects)
– Used for linked data structures: lists, trees
struct Node {
    struct Node *left, *right;
    int data;
};
Recursive Types
• In reference model of variables (e.g. Lisp,
Java), recursive type needs no special
support. Every variable is a reference
anyway.
• In value model of variables (e.g. Pascal,
C), need pointers.
Value vs. Reference
• Functional languages
– almost always reference model
• Imperative languages
– value model (e.g. C)
– reference model (e.g. Smalltalk)
• implementation approach: use actual values for
immutable objects
– combination (e.g. Java)
Pointers
• pointer – a variable whose value is a
reference to some object
– pointer use may be restricted or not
• only allow pointers to point to heap (e.g. Pascal)
• allow “address of” operator (e.g. ampersand in C)
– pointers not equivalent to addresses!
– how to reclaim heap space?
• explicit (programmer’s duty)
• garbage collection (language implementation’s
duty)
Value Model – More on C
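• Pointers and single-dimensional arrays are largely interchangeable in
expressions, though space allocation at declaration differs
int a[10]; int *b;   /* a: space for 10 ints; b: space for one pointer */
• For subroutines, a pointer to the array’s first element is passed, not the full array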
• Pointer arithmetic
– <Pointer, Integer> addition
int a[10];
int n;
n = *(a+3);
– <Pointer, Pointer> subtraction and comparison
int a[10];
int * x = a + 4;
int * y = a + 7;
int closer_to_front = x < y;
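The fragments above can be wrapped into small C functions to make the rules checkable. The function names are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* a[i] is defined as *(a + i): the integer is scaled by sizeof(int). */
int element_at(const int *a, size_t i) {
    assert(a != NULL);
    return *(a + i);
}

/* Pointer subtraction yields a count of elements, not bytes.
   Both pointers must point into the same array. */
ptrdiff_t elements_between(const int *from, const int *to) {
    return to - from;
}

/* An array declaration reserves space for all of its elements... */
size_t array_decl_bytes(void) {
    int a[10];
    return sizeof a;
}

/* ...while a pointer declaration reserves space for one pointer only. */
size_t pointer_decl_bytes(void) {
    int *b;
    return sizeof b;
}
```

Passing an array such as int a[10] to element_at hands over only the address of a[0], which is the subroutine behavior described above.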
Dangling References
• dangling reference – a live pointer that
no longer points to a valid object
– to heap object: in explicit reclamation,
programmer reclaims an object to which a
pointer still refers
– to stack object: subroutine returns while some
pointer in wider scope still refers to local
object of subroutine
• How do we prevent them?
Dangling References
• Prevent pointer from pointing to objects
with shorter lifetimes (e.g. Algol 68, Ada
95). Difficult to enforce
• Tombstones
• Locks and Keys
Tombstones
• Idea
– Introduce another level of indirection: pointers contain the address of the
tombstone; tombstone contains address of object
– When object is reclaimed, mark tombstone (zeroed)
• Time overheads
– Create tombstone
– Check validity of access
– Double indirection
• Space overheads
– tombstones take space; when can they be reclaimed?
• Extra benefits
– easy to compact heap
– works for heap and stack
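A minimal sketch of the tombstone idea in C for heap objects only, under the assumption that tombstones are never reclaimed (the open question noted under space overheads). The names safe_ptr, ts_alloc, ts_free, ts_deref and ts_is_dangling are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* The tombstone holds the object's address; NULL once it is reclaimed. */
struct tombstone {
    void *object;
};

/* Program pointers hold the tombstone's address, not the object's. */
typedef struct tombstone *safe_ptr;

/* Allocate an object together with its tombstone. */
safe_ptr ts_alloc(size_t size) {
    struct tombstone *t = malloc(sizeof *t);
    if (t == NULL)
        return NULL;
    t->object = malloc(size);
    if (t->object == NULL) {
        free(t);
        return NULL;
    }
    return t;
}

/* Reclaim the object and zero the tombstone. The tombstone itself
   stays allocated so that other copies of the pointer can see it. */
void ts_free(safe_ptr p) {
    assert(p != NULL);
    free(p->object);
    p->object = NULL;
}

/* Validity check: has the object been reclaimed? */
int ts_is_dangling(safe_ptr p) {
    return p->object == NULL;
}

/* Double indirection: pointer to tombstone, tombstone to object. */
void *ts_deref(safe_ptr p) {
    return p->object;
}
```

After ts_free(p), every copy of p reports ts_is_dangling, because all copies share the one tombstone. Compacting the heap would only require updating the tombstone's object field.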
Locks and Keys
• Idea
– Every pointer is <address, key> tuple
– Every object starts with same lock as pointer’s key
– When object is reclaimed, object’s lock marked (zeroed)
• Advantages
– No need to keep tombstones around
• Disadvantages
– Objects need special key field (usually implemented only for
heap objects)
– Probabilistic protection
• Time overheads
– Lock-to-key comparison on every access is costly
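A sketch of locks and keys in C. To keep the behavior well defined it uses a small fixed pool of slots in place of a real heap, and it hands out keys from a counter; both choices, and all names, are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SLOTS 4

/* Every object starts with a lock field; 0 means "reclaimed". */
struct locked_obj {
    unsigned lock;
    int payload;
};

/* Every pointer is an <address, key> pair. */
struct keyed_ptr {
    struct locked_obj *addr;
    unsigned key;
};

static struct locked_obj pool[POOL_SLOTS];
static unsigned next_key = 1;   /* a long-running program could wrap around */

/* Take the first free slot and give object and pointer the same key. */
struct keyed_ptr lk_alloc(int value) {
    struct keyed_ptr p = { NULL, 0 };
    for (int i = 0; i < POOL_SLOTS; i++) {
        if (pool[i].lock == 0) {
            p.addr = &pool[i];
            p.key = next_key++;
            pool[i].lock = p.key;
            pool[i].payload = value;
            break;
        }
    }
    return p;
}

/* The check performed on every access: does the key fit the lock? */
int lk_valid(struct keyed_ptr p) {
    return p.addr != NULL && p.key != 0 && p.addr->lock == p.key;
}

/* Reclaim: zero the lock so that stale keys no longer match. */
void lk_free(struct keyed_ptr p) {
    assert(lk_valid(p));
    p.addr->lock = 0;
}

int lk_read(struct keyed_ptr p) {
    assert(lk_valid(p));
    return p.addr->payload;
}
```

A stale pointer stays invalid even after its slot is reused, because the new object gets a fresh key. The protection is probabilistic in a real heap because reused memory could by chance hold the old key value at the lock position.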
Garbage Collection
• Language implementation notices when
objects are no longer useful and reclaims
them automatically
– essential for functional languages
– trend for imperative languages
• When is object no longer useful?
– Reference counts
– Mark and sweep
– “Conservative” collection
Reference Counts
• Idea
– Counter in each object that tracks number of
pointers that refer to object
– When a count drops to zero, reclaim the object and recursively
decrement the counts of the objects it points to
• Must identify every pointer
– in every object (instance of type)
– in every stack frame (instance of method)
– use compiler-generated type descriptors
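A minimal reference-counting sketch in C for a linked node type. Here the one pointer field (next) is known by hand, where a language implementation would find it through type descriptors. The names and the live_nodes counter are assumptions added for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>

struct rc_node {
    int refcount;            /* number of pointers that refer to this node */
    struct rc_node *next;    /* the only pointer field in this type */
    int data;
};

static int live_nodes = 0;   /* demo bookkeeping only */

/* A new node starts with one reference: the pointer returned here. */
struct rc_node *rc_new(int data) {
    struct rc_node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->refcount = 1;
    n->next = NULL;
    n->data = data;
    live_nodes++;
    return n;
}

/* A new pointer to n now exists. */
struct rc_node *rc_retain(struct rc_node *n) {
    if (n != NULL)
        n->refcount++;
    return n;
}

/* A pointer to n went away. At zero, release what n points to, then n. */
void rc_release(struct rc_node *n) {
    if (n == NULL)
        return;
    assert(n->refcount > 0);
    if (--n->refcount == 0) {
        rc_release(n->next);
        free(n);
        live_nodes--;
    }
}

/* Pointer assignment: count the new target, uncount the old one. */
void rc_set_next(struct rc_node *n, struct rc_node *next) {
    rc_retain(next);
    rc_release(n->next);
    n->next = next;
}

int rc_live_nodes(void) {
    return live_nodes;
}
```

Releasing the last pointer to the head of a chain reclaims the whole chain, one recursive decrement at a time.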
Type descriptors example
public class MyProgram {
    public void myFunc () {
        Car c;
    }
}
myFunc type descriptor at 0x104
i   offset   address
0   0        (Car) 0x018

public class Car {
    char a, b, c;
    Engine e;
    Wheel w;
}
Car type descriptor at 0x018
i   offset   address
0   4        (Engine) 0x0A2
1   5        (Wheel) 0x005
Conservative collection
• Idea
– Number of blocks in heap is much smaller than number of
possible addresses (2^32) – a word that could be a pointer into
heap is probably a pointer into heap
– Scan all word-aligned quantities outside the heap; if any looks
like block address, mark block useful and recursively explore
words in block
• Advantages
– No need for type descriptors
– Usually safe, though a program could “hide” pointers
• Disadvantages
– Some garbage is never reclaimed
– Cannot compact (not sure what is a pointer and what isn’t)
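The marking step can be sketched on a toy heap in C. The assumptions here are a heap of four fixed-size blocks, roots passed in as an array of words, and pointers that always refer to the start of a block; a real collector scans the stack and globals and handles variable-size blocks. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_BLOCKS 4
#define WORDS_PER_BLOCK 2

static uintptr_t heap[N_BLOCKS][WORDS_PER_BLOCK];   /* toy heap */
static int marked[N_BLOCKS];

uintptr_t block_address(int block) {
    assert(block >= 0 && block < N_BLOCKS);
    return (uintptr_t)&heap[block][0];
}

void heap_store(int block, int word, uintptr_t value) {
    assert(block >= 0 && block < N_BLOCKS);
    assert(word >= 0 && word < WORDS_PER_BLOCK);
    heap[block][word] = value;
}

/* Index of the block that starts at this address, or -1 if none. */
static int block_index(uintptr_t word) {
    for (int i = 0; i < N_BLOCKS; i++)
        if (word == block_address(i))
            return i;
    return -1;
}

/* If the word looks like a block address, mark the block useful
   and recursively explore the words inside it. */
static void mark_word(uintptr_t word) {
    int i = block_index(word);
    if (i < 0 || marked[i])
        return;
    marked[i] = 1;
    for (int w = 0; w < WORDS_PER_BLOCK; w++)
        mark_word(heap[i][w]);
}

/* Scan the words outside the heap (the roots). */
void conservative_mark(const uintptr_t *roots, size_t n_roots) {
    for (int i = 0; i < N_BLOCKS; i++)
        marked[i] = 0;
    for (size_t r = 0; r < n_roots; r++)
        mark_word(roots[r]);
}

int is_marked(int block) {
    assert(block >= 0 && block < N_BLOCKS);
    return marked[block];
}
```

No type information is consulted: an integer that happens to equal a block address keeps that block alive, which is the source of unreclaimed garbage, and blocks cannot be moved because the collector cannot tell which words to update.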