Ch_02_Revision on Basic Concepts
For our discussion, bits can be either 1 or 0. Since a single bit can represent only two values, we
must group bits together to represent a wider range of numbers. The most common grouping is a byte,
which is 8 bits grouped in sequence. For example, the following is a byte:
When bits are grouped, there are a variety of methods for interpreting them collectively. Each method of
interpretation is called a bit model. We will now examine several bit models for representing various
types and ranges of numbers.
Hawassa University
School of Informatics, Computer Science Department
A bit value of 0 indicates a place value of 0; a bit value of 1 indicates a place value as given in the table.
The total value of the number represented is found by adding up the place values of all the bits. In the
example above, the value represented in the 8 bits (1 byte) is 19:
By common convention, a value of 0 in the sign bit indicates a positive number, while a value of 1 in the
sign bit indicates a negative number. In this example, the decimal value represented using the 8 bits is
−19. Using the sign-magnitude bit model, it is possible with 8 bits to represent whole numbers in the
range −127 to +127.
The sign-magnitude model suffers from two drawbacks. First, notice that there are two possible bit values
for zero: 00000000 can be interpreted as “positive zero,” while 10000000 is interpreted as “negative
zero.” This does not make much sense. Even more important, using this bit model makes binary addition
somewhat complicated.
When a two’s complement number has a 1 in the highest bit, it indicates that the number is negative. To
find the value, we perform the same steps:
After performing these steps, the value provides the magnitude of the negative number. In this example,
we get a magnitude of 7, so the original bit pattern 11111001 is known to be −7.
denote powers of 2 that are negative, and hence fractions. For example, we could use 1 byte (8 bits) as
follows:
This approach is called a fixed point bit model, because the number of digits (in base 10) that can be
represented in the fraction is fixed. In our example, the base 10 value that is represented by
the 8 bits is 18.375. In this example, only three fractional digits of a base 10 number can be represented.
This is generally considered a limitation, and a poor use of available bits.
The following example demonstrates these steps. The value −118.625 will be encoded:
The exponent is biased by adding 127 so that the exponent can be stored as a magnitude-only value, and yet
still represent the range −127 to +128. For our example, the final 32-bit floating point representation is:
Note that the spaces are for our convenience only; they are not represented in any way in the computing
system. The following are some additional examples:
In C, both the char and unsigned char data types use the ASCII bit model. However, because ASCII is a
7-bit code, the eighth bit allows for a second, different use of each data type. If we interpret each of the bit
patterns as magnitude-only whole numbers, then we obtain the base 10 values listed in Table 2.1. The
unsigned char data type can be interpreted in this manner, providing a range of values from 0 to 255. If we
interpret each of the bit patterns using the two’s complement bit model, then we obtain base 10 values in
the range −128 to +127. The char data type can be interpreted in this manner on systems where char is
signed. This dual interpretation of the bit patterns can be seen through the following C code:
While programming, even when it is not necessary to understand how the individual bits are organized in
variables, it is often important to understand how many bytes are used by variables. The sizeof() operator
reports how many bytes a data type, or a variable, is using:
int i;
char c;
double d;
/* sizeof yields a size_t, which is printed with %zu */
printf("%zu %zu %zu %zu\n",sizeof(i),sizeof(c),
       sizeof(d),sizeof(float));
On a system where an int and a float each use 4 bytes and a double uses 8 bytes, the result of executing
this code is:
4 1 8 4
An array is intended to group together a list of values, all of the same type, under one variable name. It is
much easier to code computations on an array than on a list of independently named variables:
for (i=0; i<100; i++)        /* array coding */
    sum=sum+qty[i];
sum=qty1+qty2+qty3+...       /* without an array */
A structure is intended to group together an assortment of values of different types under a single variable
name. Like an array, it makes it easier to code computations. A string is intended to group together a
series of character symbols under one variable name. It is closely related to an array, but because it is
intended only for text data, a number of functions have been crafted to perform text-specific operations on
strings. A pointer is intended to hold the address of another variable, to provide a “gateway” or path of
indirect access to the other variable. It is used most often in passing values between pieces of code
(functions).
Array
An array is a construct used to store a set of values using only one variable name. Each of the values
occupies a cell. Every cell in the array is the same size, meaning it occupies the same amount of memory.
The size of each cell is dictated by the data type (char, int, float, double) given in the variable declaration.
The number of cells is also given in the variable declaration. Here are some examples:
Faulty addressing can lead to another type of crash called a bus error. A bus error occurs when a program
tries to access a memory address that is physically impossible (or nonexistent).
Multidimensional Array
Arrays in C can have more than one dimension. But computer memory is all arranged in one-dimensional
order, as though it were one long street of bytes. How then are multidimensional arrays stored in
computer memory? Does a computing system have some other strange, multidimensional space for use
only by these arrays? Of course not. The answer is that the cells in the multidimensional array are listed
out, one at a time, in one-dimensional order. For example, consider the following code:
int a[3][2];
a[0][1]=7;
a[1][0]=13;
Strings
A string is a specific type of array: it is an array of char, containing a sequence of values where a value of
'\0' signifies the end of the string. Although the array could be of any size, it is assumed that the valid data
in the array starts at the first cell, and ends with the first cell having a value of '\0'. For example:
char d[8];
d[0]='H'; d[1]='e'; d[2]='l'; d[3]='l'; d[4]='o';
d[5]='\0'; /* '\0' indicates the end of string */
Multidimensional strings
One of the common uses for multidimensional arrays is to store a list of strings. Since each string is a
one-dimensional array, a list of strings requires a two-dimensional array. For example:
char n[2][4];
n[0][0]='T'; n[0][1]='o'; n[0][2]='m'; n[0][3]=0;
n[1][0]='S'; n[1][1]='u'; n[1][2]='e'; n[1][3]=0;
There are a handful of calculations that are common to a large number of text processing problems. These
calculations include finding the length of a string and comparing the contents of two strings. Because they
are so common, the C standard library has evolved to include functions to perform these calculations.
There are five functions that cover the most common calculations and operations:
We used the memory map to study how arrays and strings work. Writing out a memory map is often
useful during program design. During debugging, it can be invaluable. In this chapter, we extend these
ideas to pointers and structures. All variables have an address in memory, much the same way that all
houses and businesses have a street address in the real world. However, the organization of buildings and
streets becomes clearer by looking at a map. It tends to be easier to explore an unknown city or to reach a
destination using a map. In much the same way, it tends to be easier to design code for a problem
involving pointers or to debug the code using a memory map. Pointers (and to some degree, structures)
can be the most difficult tool to master in the C programming language. The goal of this chapter is to
increase the proficiency of the reader with these tools through a deeper understanding of how they work.
Pointers
A pointer is a construct used to store an address of a variable. We declare a variable to be a pointer-type
variable by preceding its name with the asterisk symbol (*). For example:
char c,*cp;
int i,*ip;
float f,*fp;
double d,*dp;
The variables c, i, f, and d are normal variables, each holding a different type of value. The variables cp,
ip, fp, and dp are all pointers, each holding an address.
Read more about: How many bytes does a pointer use?
Structure
A structure is a construct used to group a set of variables together under one name. The first step in using
a structure is to declare its organization. For example:
struct person {      /* "person" is name for structure type */
    char first[32];  /* first field is array of char */
    char last[32];   /* second field is array of char */
    int year;        /* third field is int */
    double ppg;      /* fourth field is double */
};                   /* ending ; to end definition */
This code does not create a variable. There are no bytes of storage named yet. Instead, this code creates a
template for a new variable type called struct person. One can think of a struct person as being a similar
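construct to an int or double. It is a name for a data type, not a name for a variable. In this example, the
struct person consists of two arrays of 32 char, an int, and a double, for a total of 76 bytes of field data
(assuming a 4-byte int and an 8-byte double). A compiler may insert padding between or after fields, so
sizeof(struct person) can report a larger value, such as 80 on many 64-bit systems.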
Input/Output
It is assumed that the reader is familiar with basic user I/O, such as reading input from a keyboard and
printing text to a terminal display. The reader is likely also familiar with basic file I/O, such as reading
and writing text to a file. It is common for a student to first learn text-based I/O; unfortunately, this can
bias the student toward thinking of all I/O as text-based. The goal of this chapter is to provide a broader
picture of generic I/O. It is important to remember that text is only one type of data. When
considering generic I/O, we should remember to think of the data as raw bytes. It is up to the sender or
receiver of the data to interpret the bytes.
Streams, Buffers
An I/O transaction occurs when a program receives bytes from a source or sends bytes to a destination.
Example sources that send bytes to a program include a keyboard, mouse, file, and sensor. Example
destinations to which a program sends bytes include a display, file, printer, and actuator. Programs can
also send bytes to other programs, acting as sources or destinations.
Most modern operating systems, including Unix, use the stream model to control I/O. In this model, any
input or output transaction can be viewed as a flow of bytes from a source to a destination. The flow of
bytes is commonly referred to as a stream. The operating system creates and manages streams based upon
the function calls made by the program. The program has some control over how the stream is operated,
but in general it is managed by the operating system.
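#include <stdio.h>
int main(void)
{
    FILE *fpt;
    fpt=fopen("data.txt","w");     /* open the file for writing */
    if (fpt == NULL)               /* the open can fail */
        return 1;
    fprintf(fpt,"Fortytwo 42 bytes of data on the wall...");
    fclose(fpt);
    return 0;
}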
The non-f I/O functions are system calls. Depending on which operating system they are called from, they
may behave differently. The f-versions of the I/O functions are standardized in the C library. A
programmer can expect them to behave similarly regardless of the underlying system. Unless otherwise
motivated, a programmer is encouraged to always use the f-versions of the I/O functions.
A general rule of thumb is that the f-versions of the I/O functions are buffered, while the non-f-versions
are not (buffers will be discussed shortly). This is not always true, as some system implementations of I/O
functions may include buffering. However, in practice a programmer can expect to frequently encounter
this distinction.
The standard streams are most commonly used by calling the scanf() and printf() functions. The scanf()
function is actually a specialized version of the more generic fscanf() function. While the fscanf()
function can receive bytes from any stream, the scanf() function is “hardwired” to the stdin stream. In the
following example, the scanf() and fscanf() function calls perform the exact same operation:
char s[80];
scanf("%79s",s);          /* width limit keeps input inside s */
fscanf(stdin,"%79s",s);
The same is true with regard to printf() and fprintf(). While the fprintf() function can send bytes along
any stream, the printf() function is hardwired to the stdout stream. In the following example, the printf()
and fprintf() function calls perform the exact same operation:
Buffers
A buffer is a temporary storage between the sender and receiver of bytes on a stream. When a stream is
created, one can think of it as having an address from which bytes are sent and an address at which bytes
are received. Each address is at a memory location controlled by the operating system. The buffer is an
additional piece of memory that is used to moderate the flow of bytes from the source to the destination.
A buffer is useful in a variety of situations. For example, what if the sender puts bytes into the stream
faster than the receiver can handle? Or what if a program is in the middle of a calculation and is not
prepared to receive any bytes? The buffer can store up the bytes until the program is able to handle them,
receiving them either at the reduced rate or when it is ready for them.
There are three basic types of buffering: block buffering, line buffering, and unbuffered. They differ in
how the temporary storage is flushed. Flushing is the act of emptying out the temporary storage, sending
all the bytes in the buffer on down the stream to the receiver. Each type of buffering differs as to how it
flushes. In a block buffer, a fixed-size chunk of memory is filled before being passed on to the receiver. A
block can be any size, although byte sizes that are powers of 2 are typical (e.g., 1 KB or 16 KB). In a
line buffer, any bytes inside the buffer are sent to the receiver once a newline character (byte value of 10)
is received. The newline character is also sent to the receiver. Finally, if the stream is unbuffered, then
each byte is sent to the receiver as soon as it is placed in the buffer. The buffer operates as though it is
transparent.
Block buffering is commonly used for large data transfers, such as file I/O. It makes the transfer more
efficient by saving up a large number of bytes before actually transporting them. If a program is doing a
lot of large data transfers, block buffering will speed it up. Line buffering is typically used for text-based
I/O, such as when interacting with a user. It allows bytes to be modified before actually committing them
to the stream transport. For example, a delete or backspace key can be used to modify the bytes in the line
buffer, while the enter key can be used to initiate the flush. Finally, buffering may be completely turned
off when responsiveness is critical. For example, a program may want to take action after any keypress
provided by the user, without having to wait for flushing. In this case, the input stream would likely be
unbuffered.
Pipes, Files
Pipes
The term pipe is used in several contexts in I/O. Sometimes, the word “pipe” is used interchangeably with
“stream” to refer to a flow of bytes between a source and destination. More often, it is used in contexts
where streams are reconnected to alternate sources or destinations. An analogy to plumbing can be made
as follows: imagine a water pipe that is disconnected at its source but left connected at its other end. The
disconnected end is then reconnected to an alternate source. By modeling a stream on this concept, one
can think of the connections at stream ends as pipe fittings. The process of connecting and reconnecting
streams is referred to as piping, or pipelining. The analogy also extends to replacing a single pipe-to-pipe
fitting with a three-way fitting, connecting one source to two destinations. One can imagine reconnecting
the standard output stream so that it simultaneously sends the same bytes to a file and a display. This is
another example of piping.
Pipelining the output of one program to the input of another program can be done repeatedly. This allows
us to write programs that perform single, simple operations, and to link them together into complex
chains in order to accomplish tasks.
Files
Every computer user is familiar with files, but what exactly is a file? A file is a one-dimensional array of
bytes. Regardless of what sort of data is inside the file, it is always stored as a one-dimensional array of
bytes. This may seem counterintuitive for a file that contains an image, or a movie, or a database. An
image, movie, or database can be stored as a one-dimensional array of bytes by writing all the elements
out in a long list. Furthermore, it does not matter what sort of data is inside the file. If the file contains
text, an image, a song, or an executable program, it is still a one-dimensional array of bytes. The
difference is only in which bit model was used to group and encode the bytes. Therefore, when we think
of reading from or writing to a file, we can always consider the process as being similar to accessing a
one-dimensional array of bytes.
One of the jobs of an operating system is to manage file storage. A system typically provides a set of
function calls, for use by programs, to interact with files. These include the operations already examined,
such as the ability to read and write data, plus additional operations. We will explore these additional
operations and related functions as we discuss each associated topic.
File Pointer
A file pointer is a marker used to keep track of the location for reading or writing on a stream. When a file
is opened, the file pointer points to the first byte in the file. Each time a byte is read, the file pointer is
automatically advanced to the next byte. If multiple bytes are read, then the file pointer is advanced
beyond all the bytes that have been read.