0 Introduction and Basics
Open sesame!
0.0 C - An Overview
C is one of the most widely used languages. It is a very powerful language suitable for
system programming tasks like writing operating systems and compilers. For example,
the operating systems UNIX and OS/2 are written in C, and when speaking about
compilers, it is easier to list the compilers that are not written in C! Although it was
applications. It is used in embedded devices with just 64 KB of memory and is also
used in supercomputers and parallel computers that run at unimaginable speeds. C and
its successor C++ cover most programming areas and are the predominant languages
1986],
“C is clearly not the cleanest language ever designed nor the easiest to use, so
It is portable [can be executed on multiple platforms, even though the language has
C is a language for programmers and scientists, not for beginners and learners.
So it is naturally their language of choice most of the time.
C is not a perfectly designed language. For example, a few of the operator precedences
are wrong. But the effect is irreversible, and the same operator precedence continues to be
and readability. This is the secret of its widespread success. Let's see a classic example of
such code:
while(*t++ = *s++) ;
This code is less readable, but it is curt and to the point. It is efficient (compared to
C is thus a language for the programmers by the programmers and that is the basic
and this fact is reflected in its standardization process also. Some of the facets of the
Understanding this design philosophy may help you understand some puzzling details of why
C is an attitude!
There is no such jargon as ‘writability’; here I use it to mean the ability to write programs lucidly
[Figure: language family diagram showing Algol-60, CPL, Algol-68, BCPL and Pascal]
C was neither designed from scratch nor perfectly designed, and it contained
many flaws.
implemented. Later BCPL (Basic CPL) came as the implementation language for CPL by
Martin Richards. It was refined into a language named B by Ken Thompson in 1970 for
the DEC PDP-7. It was written for implementing the UNIX system. Later, Dennis M. Ritchie
added types to the language and made changes to B to create the language we now know
as C.
C derives a lot from both BCPL and B and was designed for use with UNIX on
DEC PDP-11 computers. The array and pointer constructs come from these two
languages. Nearly all of the operators in B are supported in C. Both BCPL and B were
type-less languages. The major modification in C was the addition of types. [Ritchie 1978]
says that the major advance of C over the languages B and BCPL was its typing structure.
“The type-less nature of B and BCPL had seemed to promise a great simplification in the
Although K&R C had a rich set of features, it was the initial version and C had a
lot of growing to do. [Kernighan and Ritchie 1978] was the reference manual for both
programmers and compiler writers for almost a decade. Since it was not meant for compiler
writers, it left a lot of ambiguity in its interpretation, and many of the constructs were not
clear. One such example is the list of library functions. Nothing significant is said about
the header files in [Kernighan and Ritchie 1978], and so each implementation had
its own set of library functions. The compiler vendors had different interpretations and
added more features (language extensions) of their own. This created many
inconsistencies between programs written for various compilers and a lot of portability
features ANSI formed a committee called X3J11. Its primary aim was to make “an
C. The committee carried out its research and submitted a document, and that was the birth of
the ANSI C standard. Soon the ISO committee adopted the same standard with very little
Even experienced C programmers do not know much about the ANSI standard
except what they frequently read or hear about what the standard says. When they get
curious enough to go through the ANSI C document, they stumble a little to understand
meant for compiler writers and vendors ensures accuracy and describes the C language
precisely. So the language used in the document is jocularly called ‘standardese’. For
example, to describe side effects, the standard uses the idea of ‘sequence points’, which may
confuse the reader even more. An l-value is not simply the ‘LHS (of =) value’. It is more
The ANSI standard is not a panacea for all problems. To give an example, ANSI C
widened the difference between C used as a ‘high-level language’ and as a ‘portable
preferred even now by various language compilers that generate C as their target language,
because it is less strictly typed than ANSI C. To give another example, many think that
‘sequence points’ fully describe side effects, and the belief that knowing their mechanism will help to
Although C may be the base for successful object-oriented extensions like C++
and Java, C still continues to be used with the same qualities as ever. C is still a
preferable language for writing short, efficient low-level code that interacts with the
The old C programmers sometimes used assembly language for doing jobs that
languages may do the same. They will write the code in their favorite language and for
low-level routines and efficiency they will code in C using it as an assembly language.
[Figure: program life cycle, with boxes for the start-up routine, the main() function, functions called by main, and exit handlers]
The life of a C program starts when it is called by the OS. Space is allocated
for it and the necessary data initializations are made. The start-up routine, after doing the
initialization work, always calls the main function with the command-line parameters
passed as arguments to it. The main function may in turn call any of the functions
available in the code, and the chain of function calls continues as long as there are such calls.
If nothing abnormal happens, control finally returns to main(). main() returns
to the start-up routine. The start-up routine calls exit() to terminate the program with the return
exit(main()); //or
exit(main(argc,argv));
The exit function calls all the exit handlers (i.e. the functions registered by atexit()). All
files and stdout are flushed and control returns to the OS.
If abort() is called by any of the functions, control returns directly to the
OS. No other functions are called, and activities such as flushing the files do not
take place.
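The registration of exit handlers described above can be sketched as follows. This is only an illustration, not the start-up code of any particular implementation; the names first_handler, second_handler and register_exit_handlers are made up for this sketch. Standard C specifies that functions registered with atexit() are called in the reverse order of their registration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handlers, named only for this illustration. */
static void first_handler(void)
{
    puts("first_handler: registered first, runs last");
}

static void second_handler(void)
{
    puts("second_handler: registered second, runs first");
}

/* Registers both handlers; returns 0 on success, non-zero on failure. */
int register_exit_handlers(void)
{
    if (atexit(first_handler) != 0)
        return 1;
    if (atexit(second_handler) != 0)
        return 1;
    return 0;
}
```

If main() calls register_exit_handlers() and then returns normally (or calls exit()), second_handler runs before first_handler. If the program calls abort() instead, neither handler is run.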
More information about this process and the functions involved is given in
Source files are of two types: interface source files and implementation source
files. The interface source files, normally referred to as header files, normally have the .h
The implementation source files contain information such as function definitions,
other definitions and the information needed to generate the executable file and to allocate and
initialize data.
The standard header files are examples of interface files; the corresponding code is
available as .lib files, which the linker links at link time to generate the .exe file. It
should be noted that only the code for the functions used in the program gets into the .exe
file, even though many more functions are declared in the header files.
done while translating the program, translation phases are available in ANSI C [ANSI C
1998]. The implementation may do this job in a single stretch, or combine the phases, but
the effect is as if the programs are translated according to that sequence. For example, the
implementation can have a preprocessor that does the work of all the phases intended for
representations,
6. mapping from each source character set member and escape sequence in
character set,
In C, main is logically the first function called by the operating system in a program.
But before main is executed, the OS calls another function called the ‘start-up module’ to set up
various environment variables and perform other tasks that have to be done before calling main and
Say you are writing code for an embedded system. In the execution environment,
there is no OS to initialize the data structures used. In such cases, you may have to insert
your code in that ‘start-up module’. Compilers such as Turbo C and Microsoft C provide
facilities to add code in such cases for a particular target machine, e.g. the 8086.
0.8 main()
main is a special function and is logically the entry point for all programs.
Returning a value from main() is equivalent to calling exit with the same value.
main()
int i;
static int j;
The variables i and j declared here hardly differ, because their scope, lifetime
and visibility are all the same. In other words, the local variables inside main() are created
when the program starts execution and are destroyed only when the program
terminates. So it does not make much sense to declare a variable as static inside
main().
The other differences between main() and other ordinary functions are,
• it is the only function that can be declared with either zero or two (or
• main() is the only function declared by the compiler and defined by the user,
• main() is the only function with an implicit return 0; at the end. When
control crosses the ending ‘}’ of main it returns control to the operating system
• the return type of main() is always int (some compilers may accept void
main() or any other return type, but they actually treat it as if it were declared
as int main(); this makes the code non-portable. Always use int main()).
Standard C says that the arguments to main(), the argc and argv can be modified. The
// recursion 2
if(atoi(argv[1])>=0)
// prints
The names of the arguments are customary, and you can use your own names. The
first two arguments need to be supported by the operating system. If numeric data is
passed on the command line, it is available as strings, so you must explicitly convert
it back.
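As a minimal sketch of such a conversion, the helper below turns one command-line string into an int. The name parse_int_arg is hypothetical, and strtol() is used here instead of atoi() because, unlike atoi(), it makes it possible to detect invalid input.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Converts a command-line string to int.
   Returns 1 and stores the value in *out on success, 0 otherwise. */
int parse_int_arg(const char *text, int *out)
{
    char *end;
    long value;

    if (text == NULL || out == NULL)
        return 0;

    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      /* does not fit in an int */

    *out = (int)value;
    return 1;
}
```

A caller would typically check argc first and then pass argv[1] to the helper, treating a return value of 0 as a usage error.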
int i = 0;
printf("%s\n",argv[i++]);
while(*argv)
printf("%s\n",*argv++);
}
The third argument, char *envp[], is widely used to get information about the
while(*envp)
printf("%s\n",*envp++);
TEMP=C:\WINDOWS\TEMP
PROMPT=$p$g
winbootdir=C:\WINDOWS
COMSPEC=C:\WINDOWS\COMMAND.COM
PATH=C:\WINDOWS;C:\WINDOWS\COMMAND;D:\SARAL\\BIN
windir=C:\WINDOWS
BLASTER=A220 I5 D1 T4
CMDLINE=noname00
int i=0;
while(environ[i])
printf("\n%s",environ[i++]);
The recommended way is to use the solution provided by ANSI, the getenv() function, for
maximum portability:
int main()
if(env)
puts(env);
else
$p$g
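Since getenv() returns a null pointer when the variable is not set, the result has to be checked before it is used. A small sketch of that check follows; the helper name env_or_default is an assumption made for this illustration.

```c
#include <stdlib.h>

/* Returns the value of the environment variable `name`,
   or `fallback` if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, env_or_default("PROMPT", "(not set)") yields the prompt string when it exists in the environment and the fallback text otherwise.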
Exercise 0.1:
argv[0] contains the name used to invoke the program. Is there any circumstance
The termination of the program may happen in one of the following ways,
Normal termination,
• by calling exit(),
Abnormal termination,
• by calling abort(),
• by raising signals.
The general way in which C programs are loaded into the memory is in the
following format,
[Figure: memory layout showing stack, free space, heap and program code]
• Data segment,
programmers),
• Code segment,
The data segment contains the global and static data that are explicitly initialized
The other part of the data segment is called the BSS segment (standing for Block
Started by Symbol, a name inherited from old IBM systems, which had that segment initialized to
zero). It is the part of memory that the operating system initializes to zeros. That is
how uninitialized global data and static data get zero as their default value. This area
has a fixed size (i.e. the size cannot be increased dynamically).
The data area is separated into two areas based on explicit initialization because
the variables that are explicitly initialized have to be initialized one by one. However, the variables
that are not initialized need not be set to zero one by one. Instead,
the job of initializing those variables to zero is left to the operating system.
This bulk initialization can greatly reduce the time required to load.
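The effect can be observed from within a program. In the sketch below (all names are made up for illustration), the objects without initializers are guaranteed by the language to start at zero. Whether they are actually placed in a BSS segment is up to the implementation, so the comments describe only the typical arrangement.

```c
#include <stddef.h>

int initialized_counter = 5;     /* explicit initializer: typically initialized data */
int zeroed_counter;              /* no initializer: starts at 0, typically in BSS */
static int zeroed_table[1000];   /* no initializer: every element starts at 0 */

/* Returns the sum of the never-assigned table; the language guarantees 0. */
int sum_of_zeroed_table(void)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < sizeof zeroed_table / sizeof zeroed_table[0]; i++)
        sum += zeroed_table[i];
    return sum;
}
```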
The layout of the data segment is mostly under the control of the underlying operating
system; still, some loaders give partial control to users. This information may be
This area can be addressed and accessed using pointers from the code. Automatic
variables carry an overhead: they are initialized each time they are created, and code
is required to do that initialization. However, variables in the data area have no such
runtime overhead because the initialization is done only once, at load time.
The program code is the code area where the executable code is available for
execution. This area is also of fixed size. It can be accessed only by function pointers
and not by other data pointers. Another important point to note here is that the
system may consider this area as read-only memory, and any attempt to write in this
Constant strings may be placed either in code or data area and that depends on the
implementation.
An attempt to write to the code area leads to undefined behavior. For example, the
following code may result in a runtime error or even crash the system (surprisingly, it
int main()
{
static int i;
strcpy((char *)main,"something");
printf("%s",main);
if(i++==0)
main();
For execution, the program uses two major areas, the stack and the heap. Stack frames
are created on the stack for function calls, and the heap is used for dynamic memory allocation. The stack and
heap are uninitialized areas. Therefore, whatever happens to be there in memory
becomes the initial (garbage) value for the objects created in that space. These areas are
Let's look at a sample program to show which variables get stored where,
int initToZero1;
FILE * initToZero3;
int main()
{
int stringLength;
// stored in BSS
strcpy(dynamic,"something");
stringLength = fp(dynamic);
static int i;
// stored in BSS
"arg2",0};
while(*arguments)
printf("\n %s",*arguments++);
if(!i++)
fp(3,str);
// in my system it printed,
// temp.exe
// thisFileName
// arg1
// arg2
After seeing how a C program is organized in the memory, to cross check the
void crossCheck()
int allocInStack;
void *ptrToHeap;
ptrToHeap = malloc(8);
if(ptrToHeap){
crossCheck();
else
int main(){
crossCheck();
ANSI says that the pointer comparison is valid only when the comparison is
It is only in the common case that the stack and heap grow towards each other and the stack is
in higher memory locations than the heap. C does not guarantee anything of the kind.
This program is not portable. These kinds of problems are discussed throughout the book
and you will be familiar with such ideas when you finish reading this book.
Exercise 0.2:
static int i = 0;
Where will the variable i be allocated space? Is it in the BSS or the initialized data segment?
Exercise 0.3:
The diagram doesn’t show where the variables of storage class ‘extern’ and
‘register’ are stored. Can you tell where they would be stored?
0.13 Errors
Errors can occur anywhere in the compilation process. The possible errors are,
• preprocessor errors,
• linker errors.
Apart from these, runtime errors can also occur. If such run-time errors are not
prevented, they will terminate the program execution, and so avoiding/handling them
In C, if exceptions occur, error flags kept by the system indicate them. A program
may check for exceptions using these flags and perform the corresponding patch-up work.
The program can also raise an exception explicitly using signals that are discussed under
errno defined in <errno.h>. More discussion about these header files is in later chapters.
Run-time errors are different from exceptions. Errors indicate the fatality of the
Exercise 0.4:
runtime error?
int i = 1/0;
1 PROGRAM DESIGN
- Aristophanes
Clear, efficient and portable programs require careful design. The design of programs
involves many aspects, including the programmer’s experience and intuition. Thus it is an
art rather than a science. This chapter explores various issues involved in program design.
Portability is an important issue in the program design and the ANSI committee
has dedicated an appendix to portability issues. ISO defines portability as "A set of
attributes that bear on the ability of the software to be transferred from one environment
• Operating Systems
• Hardware
• Compiler
been successfully implemented on almost all platforms available. However, C still has
some non-portable features. In other words, C has the reputation of being a highly
portable language, but it has some inherently non-portable features. In fact, special care
should be taken for programs that are to be ported, and details about behavioral types,
The way the program acts at runtime is determined by the behavioral type. The
• well-defined behavior,
• implementation-defined behavior,
• unspecified behavior,
• undefined behavior,
Behavioral types are not to be confused with errors. Illegal code causes
types occur in legal code and are defined only for the actions of the code at runtime.
You can write code without knowing anything about the behavioral types. But
knowledge of them is crucial if you want to make your code portable and of
high quality. The problems that arise out of portability are very hard to find and correct.
1.1.1.1 Well-defined behavior
When the language specification clearly specifies how the code behaves
is the most portable code and has no difference in its output across various platforms.
The [Kernighan and Ritchie 1988] and ANSI Standard documents are the closest
behavior.
example, the standard library function malloc(size) returns the starting address of the
allocated memory if ‘size’ bytes are available in the heap, else it returns a NULL pointer.
Both [Kernighan and Ritchie 1988] and ANSI describe how malloc behaves
i++;
if(i==0)
// prints
The code behaves the same way irrespective of the implementation and the same output
is printed.
1.1.1.2 Implementation defined behavior
When the behavior of the code is defined and documented by the implementers or
compiler writers, the code is said to have implementation defined behavior. Therefore,
the same code may produce different output on different compilers even if they reside on
The best example for this could be the size of the data types. The documentation
It is almost impossible to write code without implementation-defined behavior. For our
int i;
then your program has such behavior. A programmer is free to use such code, but he
char ch = -1;
appendix.
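A program can query many implementation-defined properties through the standard header <limits.h> instead of assuming them. The following sketch uses two made-up helper names, plain_char_is_signed and bits_in_int; the macros CHAR_MIN and CHAR_BIT are standard.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Returns the number of bits occupied by an int on this implementation. */
int bits_in_int(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}
```

Where plain_char_is_signed() returns 0, a declaration such as char ch = -1; does not leave a negative value in ch, which is why code that depends on the sign of plain char is not portable.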
1.1.1.3 Unspecified behavior
The designers of the language understood that it is natural for
implementations to vary for various constructs depending on the platform. This makes the
implementation efficient and fit for that particular platform. Some of these details are so
implementation specific that the programmer need not understand them. These are need
unspecified behavior. One such example is the sequence in which the arguments are
someFun( i += a , i + 2);
The arguments of a function call can be evaluated in any order. The expression i +=a
You should not write code that relies upon such behavior.
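One way to avoid the dependence is to move the side effect into a statement of its own, so that the order no longer rests with the compiler. The sketch below assumes the intent was to update i first and then pass i + 2; some_fun is a hypothetical stand-in for someFun, and call_in_fixed_order is likewise a made-up name.

```c
/* Hypothetical stand-in for someFun in the text. */
static int some_fun(int first, int second)
{
    return first * 100 + second;
}

/* Same intent as someFun(i += a, i + 2), with the order made explicit. */
int call_in_fixed_order(int i, int a)
{
    int second;

    i += a;          /* step 1: update i */
    second = i + 2;  /* step 2: computed from the updated i */
    return some_fun(i, second);
}
```

With the statements separated, every conforming compiler evaluates the two values in the same order.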
specifies that the behavior that is implementation specific. The main difference is that the
the user generally accesses directly, whereas with unspecified behavior the compiler vendor
may not document it, and such details are implementation details that are generally not accessed by
users.
The standard committee did not define the constructs of these two behavioral
implementation.
1.1.1.4 Undefined behavior
If neither the language specification nor the implementation specifies the behavior
of erroneous code, then the code is said to have undefined behavior. The behavior
So the code that contains such behavior should not be used and is incorrect
because of erroneous code or data. Undefined behavior may lead to any effect from
Here the variable j is assigned with exploiting the fact that in that environment the
int *i;
*i = 10;
i is a wild pointer, and the behavior of applying the indirection
operator to it is undefined.
These are examples of using undefined behavior. Code with undefined behavior is
always undesirable and should be strictly avoided. In such cases, either use assert to make
sure that you don’t use that accidentally or remove such behavior from the code.
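A minimal sketch of the assert approach is given below; store_ten is a hypothetical helper. Note that assert can only catch a null pointer here, not an uninitialized (wild) one, so the technique works best when pointers are initialized to NULL until they are given a valid address.

```c
#include <assert.h>
#include <stddef.h>

/* Stores 10 through the pointer; the caller must pass a valid address. */
void store_ten(int *target)
{
    assert(target != NULL);   /* stops a debug build on a null pointer */
    *target = 10;
}
```

Calling store_ten(&x) for an int x is well defined; calling it with a null pointer stops the program at the assertion unless NDEBUG is defined.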
platform,
• to make the code generated for a particular platform more efficient (e.g.
near, far and huge pointer types in Microsoft and Borland compilers for the x86
platform).
Let’s see an instance for a requirement of language extension and how that request
is satisfied.
In writing programs like device drivers and graphical libraries, speed is crucial.
Access to the hardware registers and other system resources may sometimes be required.
There are instances where a program must manipulate registers and execute instructions that are
inaccessible through C but accessible through assembly language (C has low-level
features, but not this low-level, since that would cost portability). In C, the assignment of one
array/string to another is not supported. But the assembly language for that hardware may
have instructions that do these operations atomically (block copy), which will require
be implemented in C or in assembly language, recognises the need for such access to the
special cases. Examples for such library functions are getchar(), memcpy() etc.
Thus there is a need that the assembly code be directly written in C. This will help
asm(assembly_instruction);
causes assembly_instruction to be injected directly into the generated assembly code.
Let’s say we have to install a new I/O device. How can the interface to that device
be made? This can now be done using C code, with assembly code wherever it is
required.
This feature is also useful for time-critical applications where an overhead of even
Using assembly code for efficiency has many disadvantages. The programmers
who update the code may not be familiar with the particular assembly language used.
Moreover porting the code to other systems requires the code be rewritten in that
particular assembly language. This feature (and as in the case of all language extensions)
Avoid using language extensions unless you are writing code only for a particular
environment and the efficiency is of top priority. Stay within the mainstream and well-
Portable code does not come about automatically; as far as C is concerned, it takes
conscious effort. The following steps are recommended when writing any serious
C code:
1. Analyze the portability objectives of any program before writing any C code.
2. Write code that conforms to the standard C. This should be done even if your
extensions). Using such features when writing standard C code possibly will
harm the portability of the code. Use standard C library whenever possible.
Avoid using third party libraries when achieving the same functionality
3. When the support for the functionality is not available in the standard library
look for the functionality in the library provided by your compiler vendor. See
4. When the functionality you want is not available even in the library provided
by your compiler vendor, look for any such library in the market preferably in
5. Only after failing to find such functionality in third-party libraries, decide
to develop your own code, and even then keep portability in mind. Try to do it in C
code, and only if that is not possible resort to options like using assembly code in
your programs.
Let's look at an example of how this can be applied systematically to a problem
at hand. XYZ company wants a tool for storing, retrieving and displaying the
photographs of its employees in database form. The company has already acquired
special hardware for scanning the photographs. It is already using software developed in
C for office automation and has the source code for the same.
C suits the problem well because they already have the application
running in C, the source code is also available, and the tool for scanning and storing the
First, examine the scope of the problem. This is a requirement that
many companies may have, and so the tool has a lot of scope for being used outside the
company. The places where it may be required may have to interface with different
hardware (like scanners) and may require running on different platforms. Therefore, the
gains due to portability seem attractive; even if fully portable code is not possible, the
As the next step, see whether the code can be written completely in standard C. The
platform you work on is UNIX, and so low-level files could be used for storing the data. Doing
so would harm portability, so use standard library functions for that instead. For this
problem, interfacing with the hardware is required, and graphics support is needed for
displaying the photos. Even though writing the complete code in standard C is not possible, most
of the code can still be written in standard C. Make sure to keep the non-portable code
For interfacing with external hardware devices, your compiler provides special
header files, and the source code is also available to you. The scanner comes
with software for interfacing it with your code. You observe that the same functionality is
achievable by using the library provided by your vendor, without using the interfacing
software from the scanner. Hence, you resort to using the library, since this can work with
other scanners as well, although you need to write some more code.
Standard C does not have a graphics library. Unfortunately, your compiler
vendor does not provide one either. You have a good assembler,
you are an accomplished assembly language programmer, and your compiler has options
to integrate assembly code into your code. However, you observe that a portable
graphics package is available from a third-party software vendor. You have to spend a little
to purchase it, and the graphics package does not perform as well as your assembly
code. You end up buying the graphics package because it has better portability
options.
Thus you end up writing code that is maximally portable without using
language extensions, platform-dependent code or assembly code. In addition, you make
a lot of money selling the package to other companies with little or no modification. So it
is always preferable to write maximally portable code, if not fully portable code.
Throughout the book I stress the importance of portability and of writing portable
code. This doesn’t mean that you should never write non-portable code. My point is that
writing portable code helps you get maximum benefit by distributing the code to
• isolate/group all the platform-specific code in a few files (if the code is to be
ported to other platforms, it is enough to change only the code in those files)
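As a small sketch of such isolation, a single function can hide a platform difference so that the rest of the program never tests for the platform itself. The function name path_separator and the idea of keeping it in a file of its own are assumptions made for this example; _WIN32 is the macro that compilers targeting Windows predefine.

```c
/* Intended to live in one platform-specific file; the rest of the
   program calls this function instead of testing the platform. */

/* Returns the character that separates directories in a path. */
char path_separator(void)
{
#if defined(_WIN32)
    return '\\';
#else
    return '/';
#endif
}
```

When the program is ported, only the file that holds such functions needs to be revisited.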
The ability to write non-portable and platform-specific code is actually one of
effectively used to write code for a particular platform, you can reap the
maximum benefit from the underlying platform. For example, let's look at
using UNIX system calls to execute one program from within
another.
The system calls used for low-level process creation are execlp() and execvp().
The execlp call overlays the existing program with the new one, runs that and exits. The
execlp(path, file_name,arguments...);
A variant of execlp called execvp is used when the number of arguments is not known in
advance:
execvp(path, argument_array);
System calls are further discussed under the chapter in “Unix and Windows
programming in C”.
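The following sketch shows one common way of using execvp() on a POSIX system; it is not portable standard C. In POSIX, execvp() takes the program name and a NULL-terminated argument array, and because a successful exec never returns to the caller, the call is usually made in a child process created with fork(). The function name run_and_wait is made up for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] with the given NULL-terminated argument array in a child
   process. Returns the child's exit status, or -1 on failure. */
int run_and_wait(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);     /* returns only if the exec failed */
        _exit(127);
    }

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing the array { "ls", "-l", NULL } runs ls -l and returns its exit status to the caller.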
1.2 Language Features to Avoid
Every language has its own strengths and weaknesses. They have strongholds,
traps and pitfalls. So, the fact that a language supports a feature doesn’t mean that the feature should be
used. This is true even for a small language like C with few features. For example, the
Sometimes you have to avoid using some language features, depending on the
environment you program for. For example, while programming for embedded systems,
C is a language in which you can code in different ways to solve the same problem.
So a careful decision should be made to select language features that are harmless,
well understood and less error-prone. For example, take the simple task of finding the
biggest of three numbers. Depending on the requirement and situation, you can opt
for either macros or functions, but in general, it is better to avoid macros and go for functions (I
“preprocessor”).
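The two alternatives for the biggest-of-three task can be sketched side by side. The names MAX3 and max3 are chosen only for this illustration. The macro works for any arithmetic type but may evaluate an argument more than once, so an argument with a side effect, such as i++, can misbehave; the function evaluates each argument exactly once.

```c
/* Macro version: each argument may be evaluated more than once. */
#define MAX3(a, b, c) \
    ((a) > (b) ? ((a) > (c) ? (a) : (c)) : ((b) > (c) ? (b) : (c)))

/* Function version: each argument is evaluated exactly once. */
int max3(int a, int b, int c)
{
    int largest = a;

    if (b > largest)
        largest = b;
    if (c > largest)
        largest = c;
    return largest;
}
```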
slight difference in speed can make a big difference. C was, of course, designed keeping
efficiency in mind, but the problem is that it was based on PDP machines. One such
example is the memory access techniques in C that are based on PDP Machines.
One cannot fully rely on the compiler to optimize and it is always good to hand-
applications, because the programmer knows his intentions clearly and can optimize
better while writing the code than the compiler can by analyzing the code to make it
efficient.
The optimizations that are possible can vary with requirements. In some cases, the
readability of the code needs to be slightly affected for optimizing the code. In addition,
optimizing depends on the platform, the minute hardware details, and many
much-optimized code.
For example, the infinite loop for(;;) may generate faster code than while(1) on
simple, non-optimizing compilers, even though both are intended to do the same. This is
because for(;;) is a special case of the ‘for’ loop that the C language allows to indicate
an infinite loop, so the compiler can generate code directly for the infinite loop, whereas
for ‘while’ a naive compiler generates code that checks the condition on every iteration
before transferring control. Optimizing compilers typically generate identical code for both forms.
Some machines handle unsigned values faster than signed values. Unsigned types also have
desirable properties: their values never overflow (arithmetic wraps around), and they make it
explicit in the code itself that the value can never go negative. So use unsigned rather than signed
whenever possible. Copying a bulk of data from one location to another can be
more efficient if it is done in blocks that are multiples of eight bytes rather than byte by byte. Such an example
Recursion is acceptable for mapping a problem directly to its solution but can be costly
if the function has a lot of auto variables occupying a lot of space. In such cases avoid
In the early days of C, it was used mostly for systems programming only. Initially
since it was widely believed that programming in high-level languages comes at the cost
of efficiency. Soon C compilers became available on multiple platforms, and they were
written so that they generated specialized code to fit the underlying machines.
Importantly, optimizers did a good job and became an important part of almost every C
compiler. Optimizers can do some optimizations (like register optimizations) that are not
Efficiency is not just a design goal but a driving force in C’s design. So writing
efficient code is natural in C (and most of us, the C programmers even do it sometimes
unconsciously).
efficient code. Efficiency is thus the combined quality of both the language and its
implementation.
Although optimizers do a good deal of work in improving the efficiency of the
code, it is not good to write code that depends on the optimizer to do that work. Most
optimizations can be achieved through good programming practices and careful design.
There are numerous techniques for writing optimal code, and it is always better to write
The size of the executable code may be unnecessarily large due to many reasons.
Reusing code is good in the sense that it makes use of already available code
that has normally been tested. It also reduces development time. However, it has a
trade-off too. A large amount of code duplication takes place if code reuse is not done
carefully. It makes the code harder to maintain (as opposed to the popular belief that
reuse makes maintenance easier; of course, that is true if care is taken while reusing
code) because the original code is not tailored to solve the current need.
The trade-off for program size is performance. If the file is too big, the
whole program cannot reside in memory. Therefore, pages have to be swapped
frequently to make room for new pages. The overall effect is performance
degradation.
This applies to paged memory management systems (like DOS), not to every operating system. My point is
that making .exe files unnecessarily big affects performance.
1.3.3 Memory Management
is because the code has to be written to take care of dynamic storage allocation failures
and runtime overhead is involved in calling the memory allocation functions that may
sometimes take more time. Managing the allocation and deallocation of memory
on this sometimes. Examples are the deallocation of memory twice and using the memory
area that has already been deallocated. For these reasons, automatic storage must be
C provides you with different flavors of types1 that can be tailored to suit any
particular need. The language specifies only minimum ranges for the data types, not their exact sizes.
So depending on the hardware, the compilers can implement them efficiently. This means
that an integer can be implemented with the native word size of the processor, which makes
the operations faster. In addition, the library code or the math co-processor, depending on
In C the types may be broadly classified into scalar, aggregate, function and void.
There are further sub-divisions, which can be understood from the diagram. Before
Variables are names given to memory locations, a way to identify and use an
area to store and retrieve values. The names exist for the programmer's benefit, so they do not exist after
the executable code is created, whereas constants live up to the compilation process
1
I want to clarify the difference between ‘type’ and ‘data type’. Data type specifies a set of values and a set
of operations on those values. However, type is a super set of data type, obtained by using existing data
types to come out with a composite set of values and a set of operations on those values (e.g. using
typedefs). Hereafter I use ‘type’ synonymously with ‘data type’.
// and so can take address of it.
int *cp = &10;   // error: a literal constant has no address
That is the same reason why constants cannot be used in the case of passing to functions,
*i = *j;
*j = temp;
intSwap(&i, &j);
// is perfectly acceptable
intSwap(&10,&20);   // error: constants are not addressable
One obvious exception is string constants, which are stored in memory. For
example, you have probably written code like this, which relies on that fact:
int i = strcmp("string1","string2");
printf("%p",(void *)"someString");
// prints the address of the string constant "someString"
In other words, variables are addressable whereas literal constants are
non-addressable, and that is why you can apply the unary & operator only to variables and not to
constants.
Variables can be classified by the way in which the values they store change.
The value of these variables can only be changed through program code (like
assignment statements, which change the value stored in the variable). All the variables used
in C programs are synchronous unless otherwise explicitly specified (by the const or volatile
qualifiers).
// are synchronous
These variables represent the memory locations where the value in that location is
modified by the system and is in the control of the system. For example, the storage
location that contains the current time in that system that is updated by the timer in the
These are initialized variables that can only be read but not modified. The const
// means that the variable rov may be used for reading purposes only
This classification of variables did not exist in the original K&R C because there
were no const or volatile qualifiers then. It is due to ANSI C, which introduced these
two qualifiers (called cv-qualifiers, standing for const and volatile qualifiers).
Constants are names for the internal representation of the bit patterns of objects2. It
means that the internal representation may change, but the meaning of the constant never
2
‘object’ is a region of memory that can hold a fixed or variable value or set of values. This use of
word ‘object’ is different from the meaning used in object-oriented languages. Hereafter the word ‘object’
is used to mean the variable and its associated space.
2.3.1 Prefixes and suffixes
Prefixes and suffixes force the type of the constants. The most common prefixes
are ‘0x’ and ‘0’, used in hexadecimal and octal integers, respectively. Prefix ‘L’ is used
to specify that a character constant is from a runtime wide character set, which is
The suffixes used in integers are L/l and U/u (order immaterial). L denotes long and
U denotes unsigned. In addition to the suffix L/l, floating constants can have the F/f suffix. If
there is no suffix, the floating-point constant is stored as a double; F/f forces it to be
In the absence of any overriding suffixes, the data type of an integer constant is
Escape characters are the combination of the \ and a character from the set of
characters given below or an integer equivalent of the character, which has a special
If we use a character to specify the code then it is called a character escape code.
They are
\a, \b, \f, \n, \r, \t, \v, \?, \\, \', \"
2.3.2.2 Numeric escape code
If we specify the escape character with the \integer form, then it is called numeric
escape code.
Exercise 2.1:
Escape characters (in particular, numeric codes) allow the mapping supported by
If all the values of a data type lie along a linear scale, then the data type is said to
be a scalar data type, i.e. values of the data type can be used as operands to the
relational operators.
Character type is derived from integer and is capable of storing the execution
character set. The size should be at least one byte. If a character from the execution
Types
    Scalar
        Arithmetic
            Integral
                Char
                Integer
                Enum
            Floating
                Float
                Double
        Pointer
    Aggregate
    Void
    Function
characters.
Version 1:
alphabet. It is not portable because the hardware may support some other character set
Version 2:
This may not be the case in every character set, so may fail.
Version 3:
If you want to print the ASCII character set (supposing your system supports it),
char ch;
for(ch=0;ch<=127;ch++)
to your surprise, this code segment may not work! The simple reason is that
char may be signed or unsigned by default. If it is signed, then when ch++ is executed after ch
reaches 127 the value typically wraps around to -128. Thus ch is always less than or equal to 127 and the loop never ends.
Exercise 2.2:
Can we use char as ‘tiny’ integer? Justify your answer. If yes, does the fact that
The constants represented inside the single quotes are referred to as character
ANSI C allows multi-byte constants. Since the support from the implementations
may vary, the use of multi-byte constants makes the program non-portable (multi-byte
int ch = 'xy';
// This is a multibyte-char
Prefix L signifies that the following is a multi-byte character where long type is
wchar_t ch = L'xy';
Exercise 2.3:
But you know that it takes two bytes for a character constant. Then why doesn’t name2
mechanism called multi-byte characters. When used, the runtime environment interprets
implementation defined.
long ch = 'abcd';
// character.
Wide characters may occupy 16 bits or more and are represented as integers and
wchar_t ch = 'C'; // or
Prefix L indicates that the character is of type wide-character and two bytes are allocated
For the wide-character strings, similar format is to be followed. Here the prefix L
is mandatory.
cannot apply the string functions meant for ordinary chars to strings of wide chars.
For this, the standard library provides wide-character equivalents of the plain char string functions.
For example,
wcslen(wideStr)
is equivalent to strlen() for plain chars, wprintf corresponds to printf, and so on.
You can look at it this way: plain chars take 1 byte and wide characters normally take 2
bytes. Both coexist without problems (as int and long coexist) and both have similar
Multi-byte characters are different from wide characters. Multi-byte characters are
made-up of multiple single byte characters and are interpreted as a single character at
Library functions support is available for the wide characters but not for the
not much support is available for wide character manipulation for its full-fledged use.
Portability problems will arise from the byte order or from the encoding scheme supported
(say for Unicode UTF). If you want your software to be international, you may need this
facility, but unfortunately, the facilities provided by wide characters are not adequate.
The run-time library routines for translating between multibyte and wide
n);
this function converts the wide-character string to the multi-byte character string (it
char mbbuf[100];
Similarly,
This function tells number of bytes required to represent the wide-character ‘wc’
ASCII covers only English, taking seven bits to represent each character. Other
European languages use extended ASCII, which takes 8 bits to represent the characters, and
even that comes with a lot of problems. Languages such as Japanese and Chinese use a coding
scheme called the Double Byte Coding Scheme (DBCS), because the character sets for
such languages are quite large and complex, and 8 bits are not sufficient to represent
them. For multilingual computing a lot of coding schemes proliferated, which led to
lots of inconsistencies. To have a universal coding scheme for all the world's languages
(character sets), Unicode was introduced. Unicode originally used 16 bits to uniquely represent
each character; it has since been extended beyond 16 bits.
ANSI C can accommodate Unicode in the form of wide characters. Even though
wide characters were not designed for Unicode, a sufficiently wide wchar_t can hold
Unicode code values.
single bytes. The preceding bytes can modify the meaning of successive bytes and so are
not uniform. They are strictly compiler dependent. Comparatively wide-characters are
uniform and are thus suitable to represent Unicode characters. As I have said, the facilities
available for using wide characters for Unicode are not adequate, but that is the solution
offered by ANSI C.
The execution character set is not necessarily the same as the source character set
used for writing C programs. The execution character set includes all characters in the
source character set as well as the null character, new-line character, backspace,
horizontal tab, vertical tab, carriage return, and escape sequences. The source and
2.4.1.2.5 Trigraphs
Not all characters used in C source code, like the character '{', are
available in all other character sets. The important character set that does
not have these characters is the ISO 646 invariant character set. Some
Trigraph    Replaces
??=         #
??(         [
??/         \
??)         ]
??'         ^
??<         {
??!         |
??>         }
??-         ~
Trigraph Sequences
is the most efficient data type in terms of speed. The size of an integer is usually the word
size of the processor, although the compiler is free to choose the size. However, ANSI C
Octal constants (ANSI C) begin with 0 and should not contain the digits 8 or 9.
F (in either case). A constant that starts with a non-zero digit is a decimal
constant. If the constant is beyond the range of the integer then it is automatically
int i = 12;    // decimal 12
int j = 012;   // octal 12, i.e. decimal 10
It is not only beginners who easily forget that 012 and 12 are different and that
the leading 0 has a special meaning. That octal constants start with 0 is certainly non-intuitive
Exercise 2.4:
Have you ever thought about whether 0 is an octal constant or a decimal constant? Does the
Enumeration types are internally represented as integers. Therefore, they can take part in
expressions as if they were of integral type. If a variable of enumeration type is assigned
a value outside its domain, the compiler may check it and issue a warning
or error.
The use of enums is superior to the use of integer constants or #defines because the
Exercise 2.5:
Is it possible to have the same size for short, int and long on some machine?
These types can represent the numbers with decimal points. Floats are of single
precision and as the name indicates, doubles are of double precision. The usual size of
All the floating-point types are implicitly signed by definition (so ‘unsigned float’
ANSI C does not specify any representation standard for these types. Still it
implementation. The standard header file <float.h> defines macros that provide
All floating-point operations are done in double precision to reduce the loss in
precision during the evaluation of expressions [Kernighan and Ritchie 1978]. However,
ANSI C suggests that it can be done in single precision itself, as the type conversion may
Since C was originally designed for writing UNIX (system programming), the
nature of its application reduced the necessity for floating point operations. Moreover, in
the hardware of the original implementations of C (PDP-11), floating point
arithmetic was done in double precision only. Writing library functions also seemed to be
easier if only one type was handled. For these reasons the library functions involving
mathematics (<math.h>) were written for double types and all the floating point calculations
To some extent it improved efficiency and made the code simple. However, this
most efficient calculations in single precision only. Later C became popular in
engineering applications, which placed great importance on floating point operations. For
these reasons ANSI made a change that for floating point operations implementations
Although the actual representation may vary with implementations, the most common
The floating point arithmetic was one of the weak points in K&R C. As indicated
previously, one of the changes suggested by the ANSI committee is the recommended
This standard uses 32 bits (4 bytes) for representing the floating point. The format
is explained below.
• The next 8 bits are used to store the exponent (e) in unsigned form
S      Exponent    Mantissa
31     30-23       22-0
• The next 11 bits are used to store the exponent (e) in unsigned form
S      Exponent    Mantissa
63     62-52       51-0
2.4.1.5.2.3 Format of Long Double
For long double the IEEE extended double precision standard of 80 bits may be
used.
• The next 15 bits are used to store the exponent (e) in unsigned form
S      Exponent    Mantissa
79     78-64       63-0
There are four limits in specifying the floating-point standard. They are minimum
and maximum values that can be represented, the number of decimal digits of precision
and the delta/epsilon value, which specifies the minimal possible change of value that
Care should be taken in using floating points in equality expressions since
most floating values cannot be represented exactly. However, powers of 2 can be
float f1 = 8.0;
double d1 = 8.0;
if(f1 == d1)
printf("this will certainly be printed");
if(fp1 == fp2)
// do something
As we have seen, this may not work well (since the values cannot be exactly
represented). Can you think of any other way to check the equality of two floating points
// do something
Here FLT_EPSILON is defined in <float.h> and stands for the difference between 1.0 and
the next larger value representable in a float. You check for a small
difference between the two numbers, and if the difference is very small you can ignore it and
consider both to be equal. If this still does not work, try casting to double to check the
equality. Of course, this will not work if you want to compare the values exactly.
The number is a huge number that a float variable cannot contain. So, an overflow
occurs and the behavior is not defined in ANSI C (since it is an undefined behavior, it
may produce an exception or runtime error or even may continue execution silently).
Exercise 2.6:
Is it possible to have the same size for float, double and long double types in some
machine?
scientific notation. In ordinary notation, we will include a decimal point between the
numbers. The scientific notation will be in the form mantissa |E/e| exponent. The
mantissa can optionally contain a decimal point also. A floating-point constant is never
A pointer is capable of holding the address of any memory location. Pointers fall
• pointers to objects.
A function pointer is different from a data pointer. Data pointers just store the plain
address where the variable is located. On the other hand, function pointers have
several components such as the return type of the function and the signature of the function.
constants. Pointer constants are not supported in C because giving the user the ability to
manipulate addresses makes no sense. However, there is one such address that can be
given access to freely: the null pointer constant (NULL). This is the only explicit pointer
constant in C.
In DOS (and Windows) based systems, the memory location 0x417 holds the
information about the status of the keyboard keys like CAPS lock, NUM lock etc. The
sixth bit position holds the status of the NUM lock. If that bit is on (1) it means that the
NUM lock is on in the keyboard and 0 means it is off. The program code (non-portable,
if(*kbdptr&32)
else
Here a pointer constant is required, and that role is taken by the integer
constant; the cast simulates a pointer constant to store the address 0x417 in ‘kbdptr’.
The aggregate types are composite in nature. They contain other aggregate or
scalar types. Here logically related types are organized at physically adjacent locations. They
consist of array, structure and union types, which will be discussed in detail later.
2.6 Void Type
For the close relationship between the variables and functions, functions are also
Arrays and pointers are sometimes referred to as derived data types because they
are not data types of their own but are of some base data types.
2.9 Incomplete Types
A type about which some information is missing, possibly to be given later, is
struct a;
// incomplete type
int i = sizeof(struct a);   // error: the size is not yet known
Here the structure ‘a’ is declared and not yet defined. Therefore, ‘a’ is an incomplete
type. The definition may appear in the later part of the code like this:
struct a{ int i; };
int i = sizeof(struct a);
Consider,
stackType fun1();
are function declarations that make use of this feature: struct stack and stackType
are used before their definitions. This serves as an example of the use of forward
declarations.
printf("%d",sizeof(t));
printf("%d",sizeof(TYPE));
In these two examples, it is evident that some information is missing to the compiler and
so it issues an error. Let's now move to the case of pointers, an example of a logical
incomplete type, where it is not evident that some information is not available.
int *i = (int *)0x400;
*i = 0;
value may not be available for modification. This is an example for ’Incomplete type’ in
Point to Ponder
Type specifiers are used to modify the data type’s meaning. They are unsigned,
can use the unsigned type specifier. The idea of having unsigned and signed types
separately started with the requirement of having a larger range of positive values within
The signed type, on the other hand, operates in another way: it makes the MSB a sign
bit, which allows the storage of negative numbers. As a side effect, this
reduces the range of positive values. If we do not explicitly mention whether an integral
type is signed or not, signed is taken as the default (except for char, where it is determined by the
implementation).
The way signed and unsigned data types are implemented is same. The only
The following example finds out if the default type of character type in your
system is signed or unsigned. In addition, the property of arithmetic and logical fill by
char ch1=128;
ch1 >>= 1;
ch2 >>= 1;
printf("Default char type in your system is %s",
If you are very serious about the portability of the characters, use characters for
the range, which is common for both the unsigned and signed (i.e. the values 0 to 127). If
Unsigned types obey the laws of arithmetic modulo (congruence) 2^n, where n is
the number of bits in the representation. So unsigned integral types can never overflow.
However, this is not the case for floating point types. This is one of the desirable
Exercise 2.7:
main(){
int i= -3,j=i;
i>>=2;
i<<=2;
}
2.10.2 Short and Long
Short, long and int are provided to represent the various sizes of integral
types supported by the hardware. ANSI C says that the sizes of these types are
implementation-defined, but guarantees that the non-decreasing order of char, short, int, long
is preserved (non-decreasing order means that the sizes are char <= short <= int <= long
If we need to add some special properties to the types we can use the type
qualifiers. The available type qualifiers are const and volatile. ANSI C added these
execution of the program, we can use the const qualifier. An expression evaluating to a
const object should not be used as an lvalue to be modified. Objects declared this way are also sometimes called
symbolic constants.
Constness is a compile time concept. It just ensures that the object is not modified
and is documentation about the idea that it is a non-modifiable one. It helps compiler
The default value for uninitialized const variable is 0. Also if declared as a global
extern int i;
// implicitly initialized to 0.
evaluation).
area = 2 * PI * r;
In this code, the compiler may replace PI with 3.14, which helps in creating efficient
code. (Still smarter compilers may treat 2 * 3.14 as a constant expression and evaluate
Note : const is not a command to the compiler, rather it is a guideline that the object
declared as const would not be modified by the user. The compiler is free to impose or
Exercise 2.8:
Can we change the value of the const in the following manner? If yes then what is
*(&constVar) = var?
Exercise 2.9:
What is the difference between the constness as in const int i = 10 and ‘10’?
2.11.2 Volatile Qualifier
a[i] = i++;
Here the optimization part of the compiler may think that the setting of flag to 0 is
repeated 100 times unnecessarily. So it may modify the code such that the effect is as
follows,
a[i] = i;
where both the loops are equivalent. However, the second is the optimized version and
executes faster. While optimizing, the compiler assumes that the value of the object will
not change without the knowledge of the compiler. But in some cases, the object may be
modified without the knowledge (control/detection) of the compiler (read about types of
variables in the beginning of the chapter. ‘without knowledge of the compiler’ means it is
an asynchronous object). In those cases, the required objective may not be reached if
the final time later. The code uses a location 0x500 where the current time is updated and
// asynchronous variable
printf("%d",currTime);
currTime = *timer;
is executed again and again without any necessity and puts it (optimizes the code)
currTime = *timer;
printf("%d",currTime);
In addition, as you can see the problem is that the optimization is made on the
Before seeing another example, let's see what it means to have both const and volatile
Here i is declared as the variable that the program(mer) should not modify but it can be
Let us see another example. Consider that your objective is to access the data
from a serial port. Its port address is stored in a variable and using that you have to read
had optimization been done on the code, the code would look like this.
*portAddress = 255;
readPort() is done immediately (like, if code is available like a = 5; a =10; then the first
Therefore, the optimized code will not work as expected. In these cases use
meaning that the address stored in the portAddress cannot be changed and the value
is done then the object and all its constituents will be left unoptimized.
Other examples for such cases where volatile should be used are:
• the memory location whose value is used to get the current time, accessing the
scan-code from a keyboard buffer using its address and, in general, memory
mapped devices,
• writing code for interrupt handling. There may be some variables that are accessible both
by the interrupt servicing routine (ISR) and the regular code. In such cases the
• writing code where multithreading is done. For example, say two threads
access a memory location. Both threads store the value of this variable in a
register for optimization. Since both threads work independently, if one thread
volatile it will not be stored in a register and only one copy will be maintained
threaded environment. There may be shared memory locations and more than
one processor may access and modify the value leading to inconsistent values.
In all such cases volatile should be used to prevent optimizations from being done on those
variables. Note that volatile by itself does not make accesses atomic or synchronize threads.
2.12 Limits of the Arithmetic Types
Limits are the constraints applied in the representation of the data type.
Translation limits specify the constraints, with how the compiler translates the
E.g. ANSI C defines that the compiler should give the support to at least 509
The range of values, which the data type can represent, is specified by the
numerical limits.
The standard header files <limits.h> and <float.h> define the numerical limits for
However, the values in the <limits.h> define the minimal and maximal limits for
the types. To find out the actual limits in the system that you are working the following
method can be used (although other implementations are possible this implementation
Similarly, the other macro constants can be defined. This is for integer where the
size of integer is implementation defined. But for char the size is already known. So
writing our own versions of CHAR_MIN, CHAR_MAX is direct (But keeping the fact in
the mind that the char implementation could be signed or unsigned by default).
# if (((int)((char)0x80)) <0 )
#else
#endif
Typedefs create new type names. This adds a new name for an existing type
// WORD is char
WORD w1;
// w1 is a char
// WORD is int
WORD w1;
// w1 is an int now
byte b1,b2;
The ability to create new type names using #define and typedef seems to be
similar and of the same power. But this similarity is superficial. Let's start with a very simple
example:
// o.k
Because the * applies only to p1 and not to p2. This problem doesn’t arise in the
int var;
#define ptr1 char *
myPtr1 = NULL;
// o.k.
myPtr2 = NULL;
// error !
because ptr2 is of type char pointer. In addition, it shows typedef is not textual
replacement.
The capability of creating a new type name using typedef is superior to #define.
typedef void(*fType)();
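i.e. fType is a pointer to a function with return type void and an unspecified argument list
(write (void) in the parentheses to state that it takes no arguments).
fType myPtr;
myPtr = clrscr;   // clrscr() comes from the non-standard <conio.h>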
The strength of typedefs lies in the fact that they are handled efficiently by the
parser. Since #define may result in hard-to-find errors, it is advisable to replace such #defines with
typedefs.
2.13.2 Some Interesting Problems
the code is very simple and direct but the compiler flags an error. Guess why?
the type that is defined by the typedef cannot be used with modifiers. The reason is that
the types declared with typedef are treated as special types by the compiler and not as
textual replacement. (If it were textual replacement, this code should be legal). So
applying short to modify the type numTimes to declare times as short int fails.
To achieve the same result numTimes has to be again typedefed to the new type,
typedefs also have some peculiar qualities, for example, a typedef can declare a
is valid!
issues an error stating that “l-value specifies a constant object”. What went wrong?
which states str is a const pointer to a character, and so an attempt to change it using the
assignment,
str = "another";
flags an error.
To force what the programmer intended to do, the code should be modified as
follows,
str = "another";
This is another example to show that typedef is not the same as textual
replacement as in #define.
As I have said, typedef names share the name space with ordinary identifiers.
Therefore, a program can have a typedef name and a local-scope identifier by the same
name.
typedef char T;
int main()
{
    T T=10;
    printf("%d",T);
}
Here the compiler can distinguish between the type name and the variable name.
One more interesting problem arises with typedefs because of this property.
typedef char T;
you want to declare a const int variable named as T in the same scope as type T:
But you know that when the type name is missing in a declaration, the type int is
assumed. So you can write this declaration of const int variable T like this:
const T = 10;
But you know that the name T also stands for char type since that name is typedefined.
const char;
where the compiler thinks that variable name is missing. So it issues an error.
Therefore, the rule is that, when declaring a local-scope identifier by the same
Is there any possibility that sizeof (typeOrObject) operator return value 0 as the
struct _iobuf {
char *_ptr;
int _cnt;
char *_base;
int _flag;
int _file;
int _charbuf;
int _bufsiz;
char *_tmpfname;
The detail behind FILE is abstracted and the user uses FILE freely as if it is a
datatype.
Typedefs may be useful in cases like this and particularly in the complex declarations.
Consider that your requirement is to access the video buffer to manipulate screen.
#else
#endif
#define ROWS 25
#define COLS 80
typedef struct {
}unit;
SCREEN[xPos][yPos].ch = ch;
SCREEN[xPos][yPos].attr = attr;
here typedefs abstract the detail of the type, and allow freely creating and manipulating
Typedef helps in increasing the portability! One of the main reasons for using
typedef is to make it easy for a program to be more portable with no or minimal changes.
The types declared by typedefs can be changed according to the target machine by the
implementation.
notice that strlen returns size_t. In our compiler it was defined in <stdio.h> as,
other examples are clock_t and time_t defined in <time.h>. One such implementation is,
typedef long time_t;
The choice is made based on the target machine and the compiler. Therefore, it
effectively suits the needs, providing portability across platforms while leaving the
source code untouched. In other words, if typedefs are used for data
types that may be machine dependent, only the typedefs need to change when the program
is moved. If the software needs to be upgraded, it is enough that the typedef is
changed.
when it is needed to be used in huge arrays can be modified and used as,
Exercise 2.11:
What is year 2038 problem? (hint: it is related to typedefing time_t discussed here)
Two types are said to be equivalent if they have same type attributes.
int a;
dummy b;
auto c;
}
types are said to be of equivalent types if there is structural compatibility (in C).
Structural compatibility applies for all types except structures, and for structures, name
struct div_t {
int quot;
int rem;
}div1 = {1,2};
struct t_div {
int quot;
int rem;
}div2;
div2 = div1;   // error: struct div_t and struct t_div are different types
As you can see in this example, the two structures only differ by name and so cannot be
particular type. It is not a strongly typed language3. We say it is not strongly typed
because, the variables can be freely assigned with the variables of other types and strict
type checking is not made. For e.g., it is almost universal among the C programmers to
int getchar( );
without explicit cast. Similarly conversions from void * to other pointer types are
Casting should not be done just to escape from the errors and warnings issued by the
compiler. The compiler actually wants to warn you that there is a possibility of loss of
information during conversion and to make sure that you really know what you want to
do. Explicit casting is more powerful (than assignment and method invocation
3
It must be said here that viewpoints differ on whether C is strongly typed or not. The reader is encouraged
to adopt the viewpoint that he/she is most comfortable with. I strongly suggest to treat C as not a strongly
typed language and that is the viewpoint adopted and accepted widely.
conversions) and forces the compiler to change the type. It also improves the readability
Casting sometimes indicates bad design. But in some cases, casting is
casting.
Let us start with the mistakes the novice C programmers make. Consider the
float percentage;
marksObtained = 973;
totalMarks = 1000;
The programmer expects the program to print 97.30%, and is disappointed to see
that it prints 0%. This is because the division operator performs an integer division that
What went wrong? In the above example, marksObtained and totalMarks are both
integers, so integer arithmetic is performed on them, resulting in 0 (of course, the resultant 0
percentage = (float)marksObtained/totalMarks*100;
2.17 Implicit Conversions
Since the compiler automatically does implicit conversions, they may result in hard-to-
char ch;
// skip
If char is unsigned by default, the loop may never terminate, because
comparing EOF (usually -1) for equality with an unsigned value always yields false.
The problem is solved if the variable ‘ch’ is declared as an int, which is
signed.
LONG DOUBLE
DOUBLE
FLOAT
UNSIGNED LONG INT
LONG INT
UNSIGNED INT
INT
They are conversions from short, char, enum and bit-fields to int, and from float to
double. These promotions do not lose any information regarding sign or order of
magnitude (but for unsigned types read about unsigned & value preserving). These
conversions may occur to improve efficiency (e.g. integral types are internally
represented as integers).
unsigned chars and unsigned shorts are involved, integral promotion takes
place to accommodate the resulting value. Based on the resulting type, signed
int or unsigned int, the results may differ considerably. This issue of
preserving.
else
printf("%u", result);
In this case, unsigned and signed constants are mixed, so the signed value is
converted to unsigned, resulting in a very large value. So the else part is always executed.
You can force unsigned arithmetic by giving the U/u suffix to the operands (e.g. 2u-3u).
To solve this problem of signed and unsigned mixing, either cast the signed value to
unsigned if you are sure that it cannot take negative values, or cast the unsigned value to
signed if its value cannot exceed INT_MAX. This will help eliminate nasty bugs due to
this problem.
Exercise 2.12:
int i = -1;
printf("%d",i);
2.17.4 Assignment Conversions
the type of the LHS if they are not the same. Either truncation or widening of values
occurs.
operands to be of specific types. These conversions take place when the given type does
not match with the required type. They follow data type hierarchy.
When the formal parameter and the actual parameter differ in type, the actual
parameter is automatically converted into the formal parameter’s type. The same rule applies
floating-point variable is assigned with an integer constant, it need not exactly represent it
displayed" ;
printf("%s", number);
If you run this code, some of you will be amazed to see the string printed on the
screen.
int when they involve in expressions and sign-preserving conversions are made.
Converting from one pointer type to another and back results in no loss of information.
However, converting from a pointer type to int/long and back is not assured to work (it is also
wrong to assume that the size of int is always enough to hold a pointer). This is what
[Kernighan and Ritchie 1988] puts in words, "...A pointer can be assigned to
any integer value, or a pointer value of another type. This can cause address exceptions
when moved to another machine". [ANSI C 1988] also says the same, but in other
The freedom given by C in many cases like the one given here may be harmful to
portability of your program (one exception for this case is if the integer value is 0 which can
be freely assigned).
char * c =NULL;
somePointer = NULL;
without explicit casting (this is another place where the type are freely mixed
Generic pointer type void* may be used as an intermediate for conversion from
one pointer to another. It may also be used in the cases when the pointer type is not
known.
Exercise 2.13:
int (*fp)();
Any object can be converted to void. This conversion explicitly specifies that
the value is discarded. Functions with return type void specify the discarding value of
2.17.10 No Conversions
types occur.
3 DECLARATIONS AND INITIALIZATIONS
extern int i;
is an external declaration that does not reserve or associate any space with the name. Various types
• Explicit Declarations
These are declarations made explicitly in the program with no extra information
assumed.
• Implicit Declarations
These are default assumptions made by the compiler when type or related
information is missing.
• External Declarations
External declarations specify that the object names are declared in some other
• Duplicate Declarations
Declarations for object names can be repeated without changing the meaning.
Forward references are uses of names before their declarations. In C, they are
their declaration.
So declarations like,
are valid.
goto end;
….
….
end :
over( );
int main()
fun();
}
At the point where the name fun is encountered, the name is not known to the compiler.
Since it resembles a function call, the compiler assumes it to be the name of a function and
generates code for calling that function. In other words, it assumes it to be declared as,
• fun() takes any number of arguments (declaration fun() doesn’t mean that it
Knowing these details of what the compiler assumes for the forward declaration of
functions, you can now reason out why the following code makes the compiler issue an error
int main()
fun();
float fun()
return 1.0;
Since the compiler assumes that the return type of fun() is int, and this assumption is later
contradicted by the definition of fun(), which returns float, it issues an error.
Writing code that depends on the implicit assumptions due to forward references
int main()
int i = fun();
// garbage value in i.
fun(int i)
printf("%d",i);
This is an example of how such assumptions may lead to buggy code and such code
In short, do not depend on the forward declaration of functions; declare them
explicitly so that the intention is clear to both the compiler and the
reader of the program (C99 and later no longer allow implicit function declarations at all).
This will also help the compiler to catch errors due to a wrong number
float fun(int i);    // declaration.
int main()
{
float f = fun(10);
}
float fun(int i)
{
return (float)i;
}
3.3 Definition
address.
definition. In the later parts of a program, if that variable is defined then that tentative
int i;
// contents of source.h
int i;
int i;
main( )
{ }
int i;
// end of source.h
The one definition rule (ODR) states that there may be any number of declarations for a
variable / function but there can be only one definition for that variable / function which
ODR seems to be easy and straightforward but is tough for the compiler to follow in
the case of multi-file programs, because it must keep track of so many declarations and
definitions and check for type equivalence. The problems associated with ODR crop up only
A declaration introduces a name into the program and a definition introduces some
code.
3.3.3 Difference between declarations and definitions
name with its type and space. So the definition performs the function of the declaration
also. The aim of the declaration is to inform the compiler of the type associated with
There can be any number of declarations for a name and all the declarations must
agree on the type of the entity referred to (also refer to ‘type equivalence’), whereas a
Declarations and definitions are distinct in the cases of struct, union, extern and
typedef. They are declared separately and variables are defined later.
struct anything{
int i,j;
};
//definition
extern int i;
// i does not reserve any space but can be used in the code.
Difference between the declaration and definition is still more evident in case of
functions,
return a+b;
Although typedef stands for “type definition”, typedef always is a declaration (of
types).
As you can see, INT only gives information to the compiler and reserves no
space (except that the name is entered into the symbol table). So typedefs are and can
only be declarations.
extern int i=0;
This is a definition (the extern keyword is ignored in this case). Since the initial value
is specified, this information takes precedence; i gets space allocated and is initialized to 0.
3.3.4 Declarators
int i,j;
Here the variables i and j are said to be declarators i.e. the variables that are
the program code and used from the point where the declaration/definition is made
So declarations like,
int i = 0, j = i, k = j;
are valid. These examples are meaningful and work correctly, but not the following:
int a = 10;
int b = b + a;
int main()
// at this point.
void fun()
int main()
// o.k. The compiler now knows the variable name 'global' at this point.
int* i,j;
declares i as a pointer to int and j as an int. In short, the * is applied only to
the immediate declarator. The space between the * and the declarator doesn’t make any
difference,
int *i, j;
int (*i), j;
Note that this is also an example of redundant parentheses, where the parentheses are
used for grouping. This may help and be clearer to a novice programmer.
In short take care to make sure that you declare the declarators correctly when more
are legal. The order may change and makes no difference. So,
..... and all the 32 possible combinations are equivalent. Only legal combinations
are allowed. Legal combinations mean that the semantics of the declaration/definition
declarations like this make no sense and will be flagged as errors. Declarators occur only at the end
Whenever you are declaring a variable, you automatically determine the scope,
lifetime and visibility. These three are important issues associated with any variable.
Understanding the proper meaning and usage of these concepts is essential for good
programming.
3.6 Scope
Scope is defined as the area in which the object is active. There are five scopes in
C. They are,
These are declarations at the topmost level. They are available for the life of the
program. All functions have this scope. This is otherwise known as global scope.
It has scope such that it may be accessed from that point to the end of the file.
void dummy(void) { }
Only labels have this scope. In this scope, they are active up to end of the
function.
void printFun()
print:
int main()
int i=1,j=2;
if(i < j)
goto print;
This code will be flagged as an error by the compiler, saying that the label print is not known,
because labels have only function scope. If you have to jump unconditionally between the
Declarations that are active up to the end of the block (where a block is defined as
statements within { }). All the declarations inside the function have only block scope.
int c;
int d;
As I have said, function scope applies only to labels, so it should not be confused
with block scope. The function arguments are treated as if they were declared at the
beginning of the block with other variables (remember that the function body is also
considered as a block within { }). So the function arguments have block scope (not
function scope).
Local scope is a general term used to refer to a scope that is either function or block
scope.
These names have scope only inside the prototype declaration. This scope is
interesting because the variable names are valid only in the declaration of the prototype and
do not conflict with other variable names. It exists for very little time and is of less use
int temp;
if(a < b)
goto ReturnB;
return a;
ReturnB:
return b;
}
3.6.6 Selecting Minimal Scope
When a name has to be resolved, that name is searched for in the innermost scope, and
if it is not found there, it is searched for at higher levels of scope. So, if a variable has to be
declared, select the smallest scope possible. Limiting the scope
increases the efficiency, readability and maintainability of your program. If you want a
variable which is not useful outside a block, declare it inside the block and not in the
outer ones. Similarly, if you want a variable whose value will be accessed only within the
function but has to retain the value between the function calls, prefer a static variable to a
global one.
Exercise 3.1
What is lexical scope and is it related with the scoping discussed here?
It is the period of time for which the space allocated to the object ‘lives’, i.e. it is the time in which
int h = 10;
int foo()
int i;
// automatic lifetime
// dynamic lifetime
int k = 10;
// automatic lifetime
j = &k;
printf("%d", *j);
global variables.
int foo(int i)
return j+1;
What happens when it is called with foo(1)? It returns the value 2 to the calling function. (It does
The first time the function foo is called with the argument 1, control sees
that there is a static variable j that is to be initialized. For that, it calls foo again with the i value
1. When control enters the function foo for the second time, the initialization of the static variable
has already been started, so it executes the return statement. All static variables are
initialized to 0 by default, so it returns 0+1 (=1) as the value of j. The function then returns
to initialize j to 1. Finally, foo returns to the calling function with the value 2. Note that standard C
requires the initializer of a static variable to be a constant expression, so this trace applies only
to compilers that accept such code.
The object lives (i.e. storage remains) up to termination of the function in which it
The life is created (storage allocated) and exited (storage deleted) by the whims of
provided in standard header files are not considered as part of the language. Even though
dynamic lifetime makes sense, that is why some authors do not consider dynamic lifetime
as a type of lifetime in C.
3.8 Visibility
code. An object association may be ‘hidden’ from other object associations due to name
spaces.
e.g. {
float i;
int i;
Here the name i associated with the float object is hidden inside the block due to
Visibility must not be confused with scope. Although both concepts are related,
float i;
int i;
availability of the variable. In the above example float i is still available inside the inner
int foo()
int * iPtr;
int i = 10;
iPtr = &i;
printf("%d", *iPtr);
// prints
// 10
function, the stack-frame gets erased and so all the local data for that function is
destroyed. However, it has to be remembered that the space is not erased as the block is
exited although the scope of that variable is over (hence block scope).
In this example, ‘i’ has block-scope. Lifetime of i does not end with the exit of
block. In other words, the space for ‘i’ remains allocated even after the ‘}’ of the block is
Note: Although this code happens to work on many implementations, it relies on undefined behaviour, and the following one may not:
int foo()
int * iPtr;
int i = 10;
iPtr = &i;
int k = 20;
printf("%d", *iPtr);
// may print
// 10 or 20
// in my system it printed 20
}
This is because some optimizing compilers may reuse the memory of
variable ‘i’ for ‘k’ once the block is exited. Due to such reallocations while
If variable names are available afresh to use and are independent of other associations,
they are said to be in different name spaces. In other words, a name space is the scope within
cases,
struct name;
union name;
int name;
name :
int name;
foo:
printf("%d",foo);
is legal code that shows how names in different name spaces can coexist and still make sense.
Since the compiler can keep track of the different kinds of names in separate lists,
there is no ambiguity in differentiating between the usages of the names. This is
is valid because name is overloaded in three different name spaces. We say name is
d1.name = 10;
name.name = 10;
In all three cases the usage makes a clear distinction for the compiler of which name is
which, and hence the names are said to lie in different name spaces. In other words, each use should be
clearly resolvable by the compiler from context without ambiguity. If any ambiguity arises,
an error is flagged.
typedef struct dummy {int dummy ;} dummy;
dummy dummy;
An error is flagged here because the type names created by typedef share the same name space as
variable names.
3.10 Linkage
It is creating the corresponding link between the actual declaration and the
3.10.1 No Linkage
No linkage indicates that the name is resolved in its context itself and no linking
needs to be done. The linker has no work to do on such a name, so it is said to
int i;
a[i]=10;
}
It specifies that the name that should be linked is within the current translation
unit (say current file). This simplifies the linker's job because it knows that the definition is
available in that file itself and so does not have to look in other files.
int i;
main()
printf("%d",i);
Here the name i has file scope. When it is declared with static it has internal linkage, so it
is available only in the current translation unit and the linker resolves the name only within that unit.
This indicates that the definition for the corresponding declaration(s)
may be available anywhere outside the current translation unit. As the keyword extern
indicates, the name is external to the current translation unit, and so is
#include "somefile.h"
int main()
{
out();
The linker has to resolve the name ‘out’ that is not defined in the current
Exercise 3.2:
Exercise 3.3:
What are static, global and dynamic linkages and early and late binding in
Variables can be put into some combination of scope, lifetime and visibility by
using some keywords and placing them appropriately, and this is called the storage class of
the variable.
Broadly speaking there are only two storage classes: auto and static. Other
3.11.1 Static
Depending on the context where the static declaration is used it can give two
different meanings.
1. If declared at global scope it means that the variable has file scope.
E.g.1
static myprintf( ); // only file scope
declares that the function myprintf has file scope and is not accessible from other files.
E.g. 2.
As a whole, the static keyword helps in limiting the variables to file scope such
that collision of names at time of linkage is avoided. In other words, static limits
A static variable (with local or block scope) can be initialized with the address of
any external or static item, but not with the address of an auto item, because the
address of an auto item is not a constant: auto items are created on the stack and erased
3.11.2 Extern
• forward declarations,
The first use of extern is to explicitly specify that the variable is a candidate for
external linkage.
link-time.
For declaring a forward declaration for a structure, declare only the name, say,
struct some; and later you can give the body. But this is not possible if you want to
extern int i;
...
int i = 10;
3.11.3 Auto
instructive to compiler but for you to just remember that it is of automatic type.
Practically it serves no purpose (of course you know that the variables declared inside
functions are automatic and telling explicitly that the variable is auto doesn’t help you
in any way).
3.11.4 Register
You can specify a local variable as register if you think that the variable may be
accessed frequently. The compiler will try to put that variable in a microprocessor
register if one is available. Otherwise the keyword is ignored and the variable is treated as if it were declared
auto. Declaring such frequently used variables to be placed in registers may lead to only a
small performance improvement. Modern compilers will easily find out the variables that
will be frequently accessed and will place them accordingly. This is a deprecated feature
It is illegal to apply the ‘&’ operator to a register variable. This is because both the
pointer that stores the address of the variable and the value at that address have to be in
memory for the & and * operators to remain
meaningful and valid. A register variable, however, may not be in addressable memory when the
& operator is applied, and so C prohibits applying this operator to register variables.
register int i;
arrayAccess[i];
is not available for placing it in memory it may at-least place the variable in cache
of two.
Exercise 3.4:
After reading this paragraph, can you reason it out why the register variables
cannot be global/static?
Exercise 3.5:
If no storage class is given for variables of local scope, the auto storage class is assumed. For
variables outside functions, extern is assumed (not static). This applies to undeclared
It is not possible to have more than one storage class specifier in a declaration.
Initially this question may not make any sense at all to you because all of us know
that typedef is concerned with types and it has nothing to do with storage classes. First
Yes. typedef is also considered as a storage class specifier. As the grammar for C
specifies,
storage-class-specifier : one of
auto register static extern typedef
under storage class specifier is purely for syntactic convenience. Here syntactic
convenience means,
• listing typedef with the storage class specifiers makes the grammar simple; or else,
• it makes it easy to check rules like: no two storage class specifiers can occur
in a declaration.
// doesn’t make any sense. how can a type name have a storage
// class specifier?
final authority for answering such questions because it is the one that describes the
language itself. There are a few instances where only going through the grammar of the language
gives you the final answer, and that cannot fail. Another such example is complex
declarations. If you have any argument in finding the meaning of a complex declaration,
C is infamous for its complicated declaration syntax where the beginner will
certainly stumble. Complex declarations are hard to decode and understand. However, it
is an essential area to master because you may require it in real programming
applications.
[Binsock 1987].“Take any declaration, start with the innermost parenthesis (in the
absence of parenthesis, start with the declared item’s name), and work clockwise through
Consider,
void (*x[10]) ( );
As the rule says start from the innermost parenthesis, take only (*x[10]) now. In
that part start with the variable name, here it is x. Work clockwise and read it. x is an
array[10] of pointer. Then continue reading remaining parts. ... to function returning type
void. Reading it fully, it comes out that; x is an array[10] of pointer to function returning
type void.
This rule works for all declarations of [Kernighan and Ritchie 1978]
However, when the qualifiers (const & volatile) are involved there is no
‘shortcut’ for reading the declaration. Even though shortcuts can help to some extent, it's
better to rely on the grammar (which is the formal way and can never fail). The
productions associated with this problem of decoding complex declarations are,
declarator :
pointer(opt) direct-declarator
direct-declarator :
identifier
(declarator)
direct-declarator [ constant-expression(opt) ]
direct-declarator ( parameter-type-list )
direct-declarator ( identifier-list(opt) )
Listed here are two productions for non-terminals declarator and direct-declarator.
Non-terminals mean that they do not make part of the result but they are tools for getting
the result. In the productions, the other non-terminals are constant-expression and
parameter-type-list (which mean that they in turn will have other productions and they
will be in the LHS). (opt) specifies that the particular part is optional.
To parse any complex declaration, just start with the non-terminal declarator and
replace that non-terminal with the RHS of the production. In a production, if there is
more than one alternative to choose from on the RHS (as in the case of direct-declarator), select
the best match that fits your input. Go on replacing the non-terminals until all the
non-terminals are replaced with terminals; the sequence of replacements is the required result
for us.
Initialization is setting the values in the object at the time when the objects are
allocated space.
Global variables are initialized implicitly. They are initialized to zero (since the
float f;
// f = 0.0
int i;
// i=0
// b.a=0, b.b=0.0
double * i;
// i = NULL
int a[3];
again like,
int i=0;
The same rules apply for static variables as for the global variables (for example,
Local variables on the other hand are not initialized automatically since they are
allocated space in stack frames. Therefore, care should be taken to initialize them
this will issue an error if this declaration is in global scope but no problem if this is in
block scope. This is possible because the creation and so initialization of the local
3.13.3 Arrays
arr[i] = 0;
gives the curt way to initialize all elements to zero. This technique for
initialization is possible at the point of declaration only. In other cases memset() comes in
handy.
It sets n bytes of str to character ch. The difference is that the initialization with
{0} is done at compile time and memory set by memset is at runtime. So it is efficient
Arrays can be initialized with values given within curly braces. If the initialization
list contains fewer values than the array has members, the remaining members are
initialized to zero.
It is an error to initialize an array with more than the possible elements. One
exception for this rule is character arrays (because the objective may be to have not a
Interestingly enough, [Kernighan and Ritchie 1978] did not have an
option for array initialization inside functions. The reason was that the stack frames are
allocated at runtime and if the arrays have to be initialized, code has to be generated for
that.
int fun()
The code generated is bulky; that’s why the initial [Kernighan and Ritchie
1978] did not have that feature. Later [ANSI C 1988] added this feature although it
may postpone allocating memory until the variable is referenced once. This gives rise to
the concept of early and late initialization. Practically in C late initializations can be
treated as assignments.
int i = 10, j ;
// early initialization
...
j = 10;
// late initialization.
Depending on the situation, you can follow either early or late initialization.
However, it is easy to declare a variable and forget to initialize it before using it. So it is
initialization. Assignment is storing a value after the memory location is created. This is
the primary difference. Another difference is that, the target of initialization is always an
In case of static and global variables, the difference is that the initialization values
are stored in the areas like initialized data segment (as you have seen in the structure of a
C program in memory). The compiler does this work, whereas code is generated for the
assignment statements. This makes little efficiency difference between initialization and
assignment.
int j;
...
j = 10;
and assignment in most of the cases and can be used interchangeably. An example for
this is the negligible difference between storing the value 10 into i and j we saw now.
Let's see a few cases where only initialization can be done and not assignment.
// and
const int i;
i = 10;
Since the read only variables can only be initialized and the assignment happens
// and
char str[];
str = "string";
3. The case when the structure/union contains a const member. Consider the
code:
struct temp{
const int i;
};
t = s;
t.i = 21;
The compiler can easily find the difference between the function declaration and a
call if the statement contains any of the keywords like int, static etc.
is clearly a function declaration because the function call can only contain the variable
clrscr();
int main()
{
// something here
in this case the clrscr() is the declaration of the function that has implicit assumption of
the int type as return type and that the function can take any number of arguments.
When it is not clear if the function declaration or a function call, the compiler
gives the benefit of doubt to the function call and thus resolves to function call as in:
int main()
clrscr();
In this case, as you can see, 1) this can be a function declaration, 2) it is a function call
- Janet Lane
expression is a set of data objects and operators that can be evaluated. Any expression in C
An object is a region of memory that can be examined and stored into. An lvalue is
an expression that refers to an object in such a way that the object may be altered as well
as examined. Traditionally lvalues are variables that appear on the LHS of the = operator.
enumeration types.
However, not every lvalue may be used for assignment. Such lvalues are non-modifiable
lvalues. A function name is an lvalue; however, you cannot modify it, so it is a
non-modifiable lvalue. Another example is an array name, which is a pointer constant and
cannot be modified.
called the rvalue. It can be used on the RHS of an expression. Sometimes rvalue is the
Try an expression like ++a++, and you will get a compiler error. This is because
it is treated as ++(a++) (since the postfix ++ takes precedence over the prefix ++). a++ is a
Note: This example assumes that ‘a’ is just an lvalue and it is not of any pointer type.
p1=p;
while(*p!='\0')
++*p++;
Here the expression ++*p++ is perfectly valid. (*p++) yields an lvalue and prefix
Exercise 4.1:
Comment on the following code:
struct constStruct{
int i;
const int j;
};
int main()
temp1.i = 10;
temp2.j = 10;
In general, do you agree with the statements ”for structures and unions to be
modifiable l-values, they must not have any members with the const attribute” and
Exercise 4.2:
Both the non-modifiable lvalues and rvalues don’t allow assignment to be done on
have special meanings in C source text and helps the compiler to understand the program.
For example : (colon) is a punctuator (and not an operator) that comes after labels and
case statements in switch, that helps the compiler in understanding the intended meaning
of the program. Certain operators also work as punctuators depending on the context. An
example is () that helps to group the expressions. Here it acts as punctuator. It acts as
expressions and when used in for loops or function declarations work as punctuator
punctuator : one of
[ ] ( ) { } * , : = ; ... #
and & are the operators that can be used both as unary and binary operators. ?: is the
only operator that takes three operands (that's why it is the ‘ternary’ operator).
Exercise 4.3:
What is the ‘arity’ of , (comma) operator?
The value of a name depends on its data type, which is determined by the
declaration of that name. The name itself is considered as an expression and is an lvalue.
A literal is a numeric constant that, when evaluated as an expression, yields that constant as its
value.
(), [], . (dot) , -> operators come at the top of the operator precedence hierarchy.
followed by a closing parenthesis. The value of this is the value of the enclosed
expression, and will be an lvalue if and only if the enclosed expression is an lvalue. The
presence of parenthesis does not affect whether the expression is an lvalue or not.
Component selection operators are used to access fields of the structure and union
struct student{
int rollNo;
char name[10];
}*studentPtr, student;
syntax of using parenthesis every time to dereference the member. To avoid this, ->
• Function Calls
The postfix operators ++ and -- perform increment and decrement respectively; each is a
side-effect producing operation. The operand must be a modifiable lvalue and may be of any scalar type.
The constant 1 is added to/subtracted from the operand, modifying it. The result is the old
value of the operand before it was incremented/decremented. This value is not an lvalue.
• Type Casting
A cast expression causes the operand value to be converted to the type named
within the parentheses. The result is not an lvalue. Some implementations of C ignore
certain casts whose only effect is to make a value narrower than normal.
• Unary Minus/Plus
The unary operator - computes the arithmetic negation of its operand. The operand
may be of any arithmetic type. The usual unary conversions are performed on the
i.e. -x = 0 - x
Unary plus operator is introduced in ANSI C for symmetry with unary minus. It is
the only dummy operator in C (dummy operator means neither it will have effect in the
program nor the compiler will generate any code). This is due to the multiplicative
applied to 10000)?
For finding the answer let's experiment with some sample values.
// in my system sizeof(int) == 2
printf("%d", (int)sizeof(32767));
printf("%d", (int)sizeof(32768));
The similar logic can be applied to the negative values also. We don’t know if negative
printf("%d", (int)sizeof(-32767));
// a constant expression
printf("%d", (int)sizeof(-32768));
// constant expression.
So the result of this experiment is that the - (minus) sign in the constants
If you want to take advantage that the value -32768 be represented in integer itself,
• Negation
The unary operator ! computes the logical negation of its operand. The operand
may be of any scalar type. The result of the operator is of type int.
The unary operator ~ computes the bitwise negation of its operand. The operand
may be of any integral type. Every bit in the binary representation of the result is the inverse
The following example shows the relationship between the – (unary minus) and ~
(bit-wise negation) operators. These relations are due to the nature of two's complement
arithmetic.
int i =10;
~a = -1 ^ a
(the reason being that -1 is represented as all 1’s, and XOR-ing it with the variable
inverts every bit of that variable)
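These relations can be checked mechanically. The sketch below assumes a two's complement machine, as the text does; the function name check_negation_identities is hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if the two's complement identities hold for a, else 0. */
static int check_negation_identities(int a)
{
    /* -a would overflow for INT_MIN, so that value is excluded */
    assert(a != INT_MIN);
    return (~a == (-1 ^ a))   /* XOR with all ones flips every bit */
        && (-a == ~a + 1)     /* negate = complement, then add one */
        && (~a == -a - 1);    /* the same identity rearranged */
}
```

For example, check_negation_identities(10) compares ~10 (which is -11) with -1 ^ 10 and with -10 - 1.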
The unary operator & returns a pointer to its operand, which must be an lvalue.
None of the usual conversions are relevant to the & operator and its result is never an
lvalue.
¾ Indirection
The * operator is the inverse of & and vice versa. The (converted) operand must be a pointer
and the result is an lvalue referring to the object to which the operand points.
int i;
j = *&*&i;
// assigns i to j
Here the & operator creates a reference and * dereferences it. This sequence can be
applied any number of times, and the result can be assigned as long as the types of LHS and RHS agree.
¾ Pre Increment/Decrement
The unary operators ++ and -- perform increment and decrement respectively; each is a
side-effect-producing operation. The operand must be a modifiable lvalue and may be of any
scalar type (that is why expressions such as ++2 are not valid). The constant 1 is added to
or subtracted from the operand and the result is stored back in the lvalue. The result is
the new value of the operand.
i = ! ~ - sizeof(i) ;
The unary operators evaluate from right to left. In this case, sizeof(i) => 2. Then
4.4.1 Sizeof
The sizeof operator is used to obtain the size of a type or data object. Sizeof
In C the size of a data type may differ according to the underlying machine, and this
operator is required to tackle that problem in a portable way.
¾ it also is a keyword
¾ function type
¾ type void
int i = sizeof(int);
// o.k. parenthesized type name
Similarly,
int foo()
{
    return 0;
}
int main()
{
    printf("%d", (int)sizeof(foo));
    // error.
    printf("%d", (int)sizeof(foo()));
    // when function calls are involved, its type is its return type.
}
compile time by examining the type of the objects in the operand, thus the expression
itself is not compiled into executable code. In other words, the expression within the
sizeof operator is not evaluated but only parsed to find its type. Hence, any side effects
that might be produced by its execution will not take place. For example:
k = sizeof( j++);
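A minimal sketch of this behaviour follows. The function name j_after_sizeof is made up for illustration, and the sketch deliberately stays away from C99 variable length arrays, for which sizeof is evaluated at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value of j after it has appeared inside sizeof. */
static int j_after_sizeof(void)
{
    int j = 5;
    size_t k = sizeof(j++);   /* only the type of j++ is inspected */
    assert(k == sizeof(int));
    return j;                 /* still 5: the increment never ran */
}
```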
When array name is applied to sizeof operator then the result is the size of the
When a string constant is given as the operand, it returns the number of characters
in that string, including the terminating null character.
sizeof("abcd");
// 5, not 4
int arr[10];
printf("%d", (int)(sizeof(arr) / sizeof(int)));
// prints 10
sizeof can take part in constant expressions. This is because it is evaluated and replaced at
compile time.
Exercise 4.4:
main(){
int arr[10];
Exercise 4.5:
void foo();
printf("%d", sizeof(foo()));
¾ Multiplicative
The multiplicative operators are *, /, %. The operands are of arithmetic type for *
and / while integral type for %. / computes quotient, whereas % computes remainder.
Condition 1:
a = (a/b) * b + (a%b)
Condition 2:
abs( a % b) < abs(b)
No confusion arises if both the operands are positive. If any of the operands are
a % b == a & (b-1)
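The two conditions, and the mask shortcut for a divisor that is a power of two, can be checked with a small sketch. The function names are hypothetical, and the mask version is written for unsigned operands only, which is an assumption added here to keep the identity safe from sign issues. Note that in real code the mask form needs parentheses, because == binds tighter than &.

```c
#include <assert.h>
#include <stdlib.h>

/* Checks both conditions for one pair of operands; b must be nonzero. */
static int division_conditions_hold(int a, int b)
{
    assert(b != 0);
    return a == (a / b) * b + (a % b)   /* condition 1 */
        && abs(a % b) < abs(b);         /* condition 2 */
}

/* For unsigned a and b a power of two, the remainder equals a bit mask. */
static int mask_matches_remainder(unsigned a, unsigned b)
{
    assert(b != 0 && (b & (b - 1)) == 0);   /* b is a power of two */
    return (a % b) == (a & (b - 1));
}
```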
¾ Additive
one is arithmetic and another is pointer. No other operands are allowed. Both the
operands may be of type pointers or arithmetic type for subtraction. The following is a
famous example for swapping two values without using any temporary variables:
*i = *i + *j;
*j = *i - *j;
*i = *i - *j;
However, such examples of swapping the two variables serve only for aesthetic
#define SIZE 5
int i;
swap(&arr[i], &arr[SIZE-i-1]);
printf("%d ",arr[i]);
// prints 1,2,0,4,5
The problem is not with the array; it is with the swapping function. When the elements to be
swapped happen to refer to the same location, the swapping fails. You can verify this
int i = 1;
swap(&i, &i);
printf("%d", i);
// prints 0 !!
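For comparison, here is a sketch that places the arithmetic swap next to a conventional swap through a temporary. The names swap_arith and swap_temp are chosen for this illustration only; the first mirrors the three assignments shown earlier.

```c
#include <assert.h>

/* The arithmetic swap: breaks when i and j refer to the same object. */
static void swap_arith(int *i, int *j)
{
    assert(i && j);
    *i = *i + *j;
    *j = *i - *j;
    *i = *i - *j;
}

/* A swap through a temporary: safe even if i == j. */
static void swap_temp(int *i, int *j)
{
    int tmp;
    assert(i && j);
    tmp = *i;
    *i = *j;
    *j = tmp;
}
```

The temporary costs one extra variable but removes both the aliasing problem and the risk of signed overflow in the addition.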
Note that this problem would have gone unnoticed with testing done only on arrays
The moral is that proper testing should be done, covering most of the possible cases (if
¾ Shift
The binary operator << indicates a left shift and >> indicates a right shift. They
are left associative. The operands for these must be of integral type. The usual
conversions are performed separately on each operand and the type of the result is that of
The value of exp1 << exp2 is exp1 left-shifted by exp2 bits; in the absence of overflow this
is equivalent to multiplying exp1 by 2 raised to the power exp2. In case of exp1 >> exp2 the value is exp1 right-shifted
negative value. If exp1 is a signed number, the result depends if the implementation
int i = -1;
i >>= 1;
If the bit indicated by ? is filled with 1, the value of the sign bit, we say it is an
arithmetic shift. If the vacated space is filled with 0s instead, we call it a logical
shift.
• Arithmetic Shift
The MSB is filled with the copy of sign bit, preserving the sign of the value.
• Logical Shift
The MSB is filled with zero thus modifying the sign of the value.
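So, it is always safer to use unsigned short or unsigned int for right shift.
In >> the right operand must be non-negative and smaller than the number of bits in the
(promoted) left operand; otherwise the behavior is undefined.
E.g. 1 >> 17 (where int is 16 bits wide)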
char ch1=128;
ch1 >>= 1;
ch2 >>= 1;
if(ch1 == ch2)
unsigned");
else
")
The operators < <= > >= indicate comparison. Either both operands are of
arithmetic type or both are pointers to the same type. The result is either 0 or 1. For
pointer operands, the result depends on the relative location within the address space of the
two objects pointed to; the result is portable only if the objects pointed to lie within the same
array.
sense (the aim is to see if i is smaller than j and is in-turn smaller than k).
This expression is also a valid C expression but yields an unexpected result. The result of (i<j)
is either 1 or 0. This resulting value is then compared with k, which yields the wrong result
¾ Equality
The operators == and != indicate equality comparison. Either both operands are
of arithmetic type, or both are of the same pointer type, or one of the operands
is a pointer and the other a constant integer expression with value 0. The result is
¾ Bitwise
integral type. The result is that of the converted operands. The operators are &, ^ and |.
int x=1;
x &= 0xFFFE;
However, this suffers a portability problem. This setting assumes that your
machine has the integer size of 2 bytes (hence it clears all the bits remaining)
x &= ~0x1;
This does not assume anything about the internal int size and ports well.
¾ Logical
The operators && and || are used for binary logical operations. They accept
operands of any scalar type. There is no constraint between the types of the two operands.
A Boolean expression involving these operators is evaluated only as far as is
required to determine the truth-value of the expression. This derives from the logical fact
that there is no point in continuing to evaluate a Boolean expression
once its truth-value is determined and cannot change after that. So the remaining expression is
int i = 0;
if(++i==1 || i++==1)
printf("i=%d",i);
// prints i=1
i = 0;
if(++i==0 && i++==1)
; // not reached: the condition is false
else
printf("i=%d",i);
// prints i=1
Take the first case: ++i==1 is true, and the following operator is ||. (true || anything) is
true. So the remaining expression will not be evaluated. So i++==1 is not executed, as
is evident from the output of the printf statement. Similarly, in the second case ++i==0 is
false and the following operator is &&. Its truth-value is already determined, since
(false && anything) is false. The remaining expression is not evaluated, and this is evident
from the value of i in the printf statement.
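Short-circuit evaluation can also be observed by counting how often the right operand is actually evaluated. In the sketch below, calls, right_operand, or_evaluations and and_evaluations are hypothetical names introduced only for this purpose.

```c
#include <assert.h>

static int calls;                 /* counts evaluations of the right operand */

static int right_operand(void)
{
    ++calls;
    return 1;
}

/* Returns how many times the right operand ran for (left || right). */
static int or_evaluations(int left)
{
    calls = 0;
    if (left || right_operand()) { /* result not needed here */ }
    return calls;
}

/* Returns how many times the right operand ran for (left && right). */
static int and_evaluations(int left)
{
    calls = 0;
    if (left && right_operand()) { /* result not needed here */ }
    return calls;
}
```

With a true left operand, || never calls right_operand(); with a false left operand, && never calls it.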
Conditional operator has right to left associativity. This can be shown with an
------2------- ------3------
((a>b) ? ((a>c) ? a : c) : ((b>c) ? b : c))
-------------------------1-----------------
The numbering and parentheses show which ‘?’ is linked to which ‘:’.
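((a>b) ? a : b) = 1;
is valid in C++ but not in C, where the result of the conditional operator is never an
lvalue. In C the same effect is obtained with *((a>b) ? &a : &b) = 1;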
The ‘assert’ macro is used to verify certain conditions in the program and report if
the condition fails and so acts as a debugging tool. The advantage of using the assert is
that defining NDEBUG will remove all the assert statements from the source code. Even
after debugging is over the assert statements become a documentation for the conditions
#ifndef NDEBUG
#else
#define assert(cond)
#endif
Using the conditional operator in place of if has the advantage of being very curt,
and it avoids the problem of ‘else matching the nearest if’. To elaborate,
suppose it were implemented using if; then there is a chance that it will cause problems
int i = 0;
if(i==0)
else
// this printed
The bug is in the if condition of the assert macro. The else part, in which the printf
statement appears, becomes the else of the if condition of the assert macro. When the
condition of assert is true, the else part is executed, leading to a bug in the code.
Placing the if statement of assert macro inside a block is the usual method to
#define assert(cond) { \
if(!(cond)) \
(fprintf(stderr, "assertion failed: %s, file %s,\
But the caller writes the assert macro followed by a semicolon. After macro
expansion the combination }; appears, leading to the error “illegal else without
matching if”. So this problem of ‘else matching the nearest if’ cannot be solved by enclosing the if
statement inside a block. The best solution happens to be using the conditional statement
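A minimal sketch of an assert-like macro built on the conditional operator is given here. MY_ASSERT, report_failure and failures are hypothetical names, and this version only counts and reports failures; it is not how any particular library defines assert, which would typically also call abort().

```c
#include <assert.h>
#include <stdio.h>

static int failures;   /* counter used only by this sketch */

/* Reports a failed check on stderr. */
static void report_failure(const char *expr, const char *file, int line)
{
    fprintf(stderr, "assertion failed: %s, file %s, line %d\n", expr, file, line);
    ++failures;
}

/* An expression, not a statement, so it cannot capture a following else. */
#define MY_ASSERT(cond) \
    ((cond) ? (void)0 : report_failure(#cond, __FILE__, __LINE__))

/* Returns 1 when the else branch runs, 0 otherwise. */
static int else_branch_taken(int i)
{
    if (i == 0)
        MY_ASSERT(i == 0);
    else
        return 1;      /* binds to if (i == 0), as intended */
    return 0;
}
```

Because the macro expands to a single expression, the semicolon written by the caller simply terminates an expression statement, and the else still pairs with the caller's own if.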
Exercise 4.6:
The ternary operator( ? : ) is roughly an equivalent for if-then-else. Can you think
of any reason why there is no operator like a single question mark(?) as a binary operator
Exercise 4.7:
int c = (a,b)?(b,a)?a:b:a;
operator. Every assignment requires an lvalue as its left operand and modifies it by
storing new value into it. The result of this expression is never an lvalue. Assignment
¾ Simple
Given by =, which replaces the value of the object referred to by
the lvalue with the value of the expression.
¾ Compound
The compound assignments are of the form var op= exp where op is a valid
binary operator. This is nothing but a shortcut for var = var op exp (except that var is
evaluated only once).
immediate operands. These instructions directly specify the required information
and so help in faster execution. C makes efficient use of this feature by providing compound
assignments, which can be translated directly to the corresponding machine
a=a+10;
MOV AX,_a
ADD 10
MOV _a, AX
INC _a, 10
in some machine.
4.8 Sequential (Comma) Operator
The comma operator has the lowest precedence of all operators. The comma operator is
one of the few operators for which the order of evaluation is specified. Both comma and
conditional operators evaluate from left to right. The values of left expressions are
discarded. The type and value of the entire expression is the type and value of the
rightmost expression.
// value 1 is discarded.
int i = 1,2;
// given as i = 1; 2;
The amazing power of comma operator can be realised by the fact that the
// or
// or even
(a^=b),(b^=a),(a^=b);
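// to show explicitly how the expression is evaluated
a ^= b ^= a ^= b;
// caution: a is modified twice without an intervening sequence point, so the behavior is undefined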
Some symbols can act as both a separator and an operator. Comma operator is one
int i, j;
add(1,2);
int j = (0,1);
for( ; i, j ; j--)
in all these 3 cases comma acts as an operator. (1,2) evaluates to 2 and passed as a first
Reducing the strength of the operations means replacing the operators with equivalent,
iexp << 3
equal to left-shift by 3). Left-shift requires less microprocessor resources than multiplication,
These optimizations are done by the compiler without your knowledge.
However, you can make them explicit if execution efficiency is a top concern for you.
When two or more operators are mixed with each other, precedence determines
how operands are grouped with operators. It means that an operator's precedence is meaningful
only if other operators with higher or lower precedence are present. The precedence of
something = a << 4 + b;
The programmer intended to calculate (a << 4) and add b to the result. But it is
interpreted as:
(a && b == c && d)
There were no separate operators for the bit-wise and logical operations (&,&&,|
and ||) when C was originally developed. The & and | operators served a dual role and had
the precedence level as it is now. It used the traditional notion of finding the meaning
To separate the concepts of bit-wise and logical operations, these two new
operators (&& and || operators) were added. But the problems persisted with precedence
levels. For example the conditions like the following one were still a problem:
“In retrospect it would have been better to go ahead and change the precedence of
& to higher than ==, but it seemed safer just to split & and && without moving & past an
existing operator“[Ritchie1982].
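The practical consequence of & sitting below == can be seen in a flag test. In the sketch below, FLAG and both function names are hypothetical; many compilers warn about the first function, which is exactly the point.

```c
#include <assert.h>

#define FLAG 0x4

/* Parsed as x & (FLAG == 0), i.e. x & 0, which is always 0. */
static int flag_clear_wrong(int x)
{
    return x & FLAG == 0;
}

/* Parentheses give the intended test. */
static int flag_clear_right(int x)
{
    return (x & FLAG) == 0;
}
```

For x = 0x3 the flag is clear, so the intended answer is 1, yet the unparenthesized version returns 0.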
( ) [] -> .
* / %
+ -
>> <<
== !=
&
&&
||
?:
= op=
Note : Here operators are listed in descending order of precedence. Several operators
a > 0 ? a++ : a += 2
Here the programmer intends to increment the value of ‘a’ if it is greater than zero else
treated as :
( (a > 0) ? (a++) : a) += 2
which makes the compiler issue an error. So if you are not clear of the operator
Associativity determines how the operators group if they are at the same
a = b = c = d;
is equivalent to,
a = (b = (c = d))
because the = operator is right associative. All the assignment operators (like *=),
conditional operator and most of the unary operators are right associative. All other
expression given that the operands are of same precedence (remember that order of
int a = b + c + d;
The reason for allowing the compiler to evaluate expressions in its desired
For example, such case of no-associativity arises in case of FORTRAN where the relational operators
cannot be combined together.
int i = i + j + i;
int i = 2 * i + j;
In general, the compiler is free to evaluate expressions in any order. It may even
rearrange the expression as long as the rearrangement does not affect the final result, but
because the function call i() would be done only once instead of two.
Note: Although the operator precedence, associativity and order-of-evaluation are all
As the name indicates, side effects are changes because of evaluation of a main
computation.
only for its side effect and the value returned is ignored.
int timesCalled;
printf("Sum = %d",x+y);
++timesCalled ;
timesCalled will be incremented as a side result. The better way to achieve the same
return ++timesCalled;
In this function call the information that timesCalled is made explicit and that
variable is also contained within the function, so it is better than the previous version, and
function participating in an expression. Side effects hidden within a function show that the
design is not good; such functions make debugging complex and reduce readability.
Side effects are an idea to be understood well, since they can affect the portability of
your code and reduce its quality if used without care.
i = ++i + i++ + ++ i;
that can lead nowhere. The point is that such expressions involve side effects and the
result value of evaluating the expression will not be the same across various systems.
Once my friend asked me about the result of executing the following piece of
code:
signed char ch = 5;
while(ch = ch--)
printf("%d",ch);
His aim was to print the value of ch and decrement it as the loop executes that
would eventually terminate when the value of ch becomes 0. So for this code he expected
the output to be 43210. But the program went to infinite loop, printing 555555….! Such
Actually, if you think about it, you can reason out why this code went into an infinite loop.
However, remember that the same code may run perfectly in another system,
giving the expected results. So beware of side effects. Understanding side effects is
important.
(called as side effect operators). A problem in side effect operators is that it is not always
possible to predict the order in which the side effects occur. For example:
int a, i = 2;
can give different results depending on the implementation. If a side effect operator is
used in an expression then the affected variable should not be used anywhere else in the
expression. On the other hand, the ambiguity can be removed by breaking it into two
int a = i++;
a += ++i;
encountered.
A program should not depend heavily on side effects. It is not a desirable property
a[i] = ++i;
will always give the same result when executed in your system but may yield some other
readability of the code or it smoothens the flow of control and can be handy if used
carefully.
/* skip */ ;
this has better readability than the equivalent code and is short and so I preferred it.
points. "Between the previous and next sequence point an object shall have its stored
value modified at most once by the evaluation of an expression. Furthermore, the prior
value shall be accessed only to determine the value to be stored". If the statement
involves change of value within a sequence point we can say that side effect is involved
¾ comma,
¾ logical-AND,
¾ logical-OR, and
¾ conditional.
The expressions involving these operators mostly do not suffer from the problem of
side effects.
There are some contexts in which the expression appears, but its value is not used:
¾ An expression statement,
2. void ;
3. (void) i * j ;
int i = (10,20);
int i = 0;
// do something
int main( ){
add(1, 2);
}
// no variable to assign the return value
(void)add(1, 2);
without side effects is discarded, the compiler may issue a warning message; this
may also occur if the main operator of a discarded expression has no side effect. Side-effect
¾ All constant expressions may contain integer constants, char constants, sizeof
¾ as initial values for external and static variables (for this unary & operator
can be used),
¾ Assignment,
¾ Comma,
¾ Decrement,
¾ Increment,
It is worth noting that ?: operator can be used in constant expressions. The values
int array[arraySize];
continue and goto statements. The direct way of altering the program flow in C is through
goto and the setjmp / longjmp library routines. The use of goto is considered to harm the
structured design of the program [Dijkstra 1968], although most languages support
goto statements. Dijkstra notes, “The goto statement as it stands is just too primitive; it is
too much an invitation to make a mess of one’s program”. That paper stirred lot of
gotos can be altogether eliminated from use in the programs by using the other control
transfer statements.
One control transfer statement is switch, and in its case labels integral
constants are expected. Floating types cannot be used with ‘switch’
statements because the designers have thought that since checking of exact
One of the sources of nasty bugs in C is that the case statements in switch
device [Duff 1984]. The objective is to get ‘count’ number of data pointed by ‘from’ to
register count;
do
*to = *from++;
while(--count>0);
and this version compiled in VAX C compiler ran very slow for some reason. So
register count;
register n=(count+7)/8;
switch(count%8){
}while(--n>0);
The idea is to compute in ‘n’ the number of times the loop is to be executed and to use
the switch to jump into the loop body, so that the remainder (count%8) is copied in the
first pass. The do-while loop just ignores the case statements as
labels. This technique exploits the fact that case statements do not break automatically. It
is not clear whether this technique is for or against the fall-through property of switch,
but is presented here just to show how such mechanisms can be exploited.
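Since the listing above is incomplete, here is a self-contained sketch of the technique. It is a variation rather than the original: the original wrote every item to the same output location, whereas this hypothetical duff_copy advances the destination as well, so that it copies memory to memory and can be checked easily. It assumes count is positive.

```c
#include <assert.h>

/* Copies count ints from 'from' to 'to'; count must be positive. */
static void duff_copy(int *to, const int *from, int count)
{
    int n = (count + 7) / 8;       /* number of passes through the loop */

    assert(count > 0);
    switch (count % 8) {           /* jump into the middle of the loop */
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

For count = 9, n is 2: the switch enters at case 1 and copies one item, and the second pass copies the remaining eight.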
Is it enough to have only a few control flow statements like if, while, for etc.? Does
The structure theorem [Böhm and Jacopini 1966] shows that any program construct
can be converted into one using only while and if statements! To give one example, the
equivalence between the for and while loops is a well known one:
for(init;check;incr){
// body
}
// is equivalent to,
init;
while(check){
// body
incr;
}
The theorem proves that all programs can be written using only standard
structures (to your astonishment, only ‘while’ is enough and even ‘if’ is not necessary! A
‘Saki’(H.H. Munro)
Pointers are the forte of C, and C employs pointers more explicitly and more often than
any other programming language. They are the most difficult area to master and are error prone.
In C, the design decision was made that the usage of pointers is explicit. C
pointers can point to virtually any object/anywhere. "...its integration of pointers, arrays,
and address arithmetic is one of the major strengths of the language" [Kernighan
void * acts as a generic pointer only because conversions are applied automatically
when other pointer types are assigned to and from void *’s (but note that although the ANSI
standard made void * a generic type for pointers to objects, it exempted function
pointers from this universality). In memory, it is aligned in such a way that it can be used
with pointers to any other types, including compound types.
A pointer must be declared as pointing to some particular type. This holds true
even for void *. void * means it is capable of pointing to anything as opposed to the
far as the standard is considered, but for practical programming more than three levels of
indirection is rarely required. For each level of indirection qualifiers may be introduced,
int arr2D[10][10];
This is because both are not of the compatible type. Since the compiler issues an error,
and both are of different types, you always resort to explicit casting,
char **. But you are intelligent and know that you can always cast any pointer type to
To your frustration, the compiler complains that it cannot convert array type to
void * type and you give up trying and end up deciding to get a new C compiler.
I went through these steps because this is normally the way we attack
problems. When an error message occurs, we try to do something that will shut up the
compiler.
Actually, the underlying problem is different. Had you closely followed the
error message, you would have known that arr2D is not of two dimensional pointer type,
Exercise 5.1:
void * is generic pointer type for any pointer type(like char *,int * etc.) . Find out
what is the generic pointer types for pointer-pointer, pointer-pointer-pointer (like char
**…) etc? Is it void **… or is such generic pointer types necessary at all?
unknown *));
All will print the same value for a particular machine. Similarly, whatever
the dimension of the pointer may be, the size remains the same.
All will print the same value for a particular machine. Consider the following two
statements:
both of the statements are equivalent even though the first is more clear and the purpose
is evident.
The point is that pointer is also a variable, but unlike other variables, these
variables store addresses (of other memory locations). Therefore, the size of any pointer
variable irrespective of the type or dimensions is same. Now, coming to the previously
are valid. This is because the compiler knows the size of any pointer variable. So it has
no problem in determining the size of the structure which contains the forward reference.
b *next;
}b;
that involve typedefs will flag an error. Typedefs cannot be forward referenced because
the type itself that is being used is not known at that point. So the compiler cannot
understand what the identifier ‘b’ means, resulting in issuing an error message.
boolPtr = &boolVar;
*boolPtr = true;
printf("%d",*boolPtr);
// prints
// 1
Pointers for enumeration types can be created and this shows the close
relationship between the integers and enums because enums are internally of integral
types. Although such pointer types are rarely used, it's nonetheless possible.
C assures that the order of the structure members is as given by the
struct someStruct{
int i;
float j;
}aStruct;
if(ps1<ps2)
// it prints
The order of the members is maintained in the memory. This concept is easily understood
in the case of structures. What about unions? Look at the following code:
union someUnion{
int i;
float f;
}aUnion;
if( pu1==pu2)
// it prints
equal, stating that they start at the same location, a shared location.
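Both observations can be expressed as checks. The sketch reuses the type names someStruct and someUnion from the text; the two function names are hypothetical, and offsetof from <stddef.h> is used as an alternative to comparing member addresses.

```c
#include <assert.h>
#include <stddef.h>

struct someStruct { int i; float j; };
union someUnion { int i; float f; };

/* Returns 1 if member i precedes member j in memory. */
static int struct_members_ordered(void)
{
    struct someStruct s;
    return (char *)&s.i < (char *)&s.j
        && offsetof(struct someStruct, i) < offsetof(struct someStruct, j);
}

/* Returns 1 if both union members start at the same address. */
static int union_members_overlap(void)
{
    union someUnion u;
    return (void *)&u.i == (void *)&u.f;
}
```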
Addition operation has restricted usage. Two pointers cannot be added. Any other
mathematical operation is meaningless and not allowed. But even in the three allowed
operations, the arithmetic is assured to be meaningful and defined only if the address to
take into account the size of type (i.e. the number of bytes needed to store a type object).
intPtr++;
doublePtr++;
someStructPtr++;
Subtracting two pointers to elements of the same array object gives an integral value
As I have said, addition operation has restricted usage. Restricted usage in the
sense two pointers cannot be added but other addition operations like increment and the
int i = 1;
iarr += i;
// allowed
iarr += 2;
// allowed
iarr++;
// allowed
iarr + jarr;
// not allowed
int iarr[10];
int *k= i + j;
// can note that the k will point beyond the limits of iarr.
int diff = j - i;
printf("%d",diff);
// of positions.
may arise for such pointer additions as in the following one (this example is in
The requirement is to find the middle element of the array in the binary search
algorithm. ‘low’ and ‘high’ are two pointers pointing to the beginning and the end of the array.
Since subtracting two pointers is perfectly acceptable, this turns out to be a fine solution.
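A sketch of such a midpoint computation follows. The function name middle is made up here, and both pointers are assumed to point into the same array with low not above high.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the middle element of the range [low, high].
   (low + high) / 2 would not compile; low + (high - low) / 2 does. */
static int *middle(int *low, int *high)
{
    ptrdiff_t distance = high - low;   /* pointer - pointer is an integer */
    assert(distance >= 0);
    return low + distance / 2;         /* pointer + integer is a pointer */
}
```

For a ten-element array arr, middle(arr, arr + 9) yields arr + 4.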
NULL is a pointer constant and its definition may vary with implementations. It is
the universal initializer for all pointer types. It is normally defined as follows,
Because the NULL is the only character that is neither printable nor a control
character.
possibilities of accessing null pointers. The following code tells one such way to do that,
if(p){
If the pointer variable is not initialized and if access is made to that variable, the
behavior is undefined. The pointer arithmetic holds only if the range is within the array.
So it is always good to check the validity of the pointer before accessing it. The following
code makes sure that ptr is not NULL before accessing it.
printf("%d", *ptr);
This uses the property that a Boolean expression is evaluated only until its truth-value
is determined. So if the value of ptr is zero, the condition fails and *ptr is never accessed.
Dangling pointers and memory leaks are the nastiest problems that may arise in a
program. Dangling pointers arise when you use the address of an object after its lifetime
int *fun()
int x = 10;
return &x;
….
int *k = fun();
if(a>b){
free(block);
….
*block = 10;
Dangling pointers may harm the execution of your program to any extent. So it
Memory/storage leaks occur when you fail to free the storage when it is no longer
used.
ptr = ptr1;
Most of the systems support a function known as ‘alloca’ that allocates memory
The problem also is the same property itself! We cannot return the dynamically allocated
chunk back from the function and other functions cannot manipulate it. In addition,
The effect of memory leaks is not evident until the system runs out of memory.
Consider the case where multiple programs are executed at a time, which share the same
heap. If the leak is large, the system will run out of memory and the only way to
¾ Dangling references,
¾ Corrupted pointers
A global pointer should never reference a local auto variable, because when that
local variable goes out of scope its memory is released to be reused by the stack for
something else. Now this global pointer becomes a dangling pointer. If the global pointer
To avoid the problems with uninitialized pointers (that is discussed above), they
corrupted pointers. They are also created if made pointed to other word boundaries than
Wild pointers can pose serious threats to the correct execution and validity of
programs and are notoriously hard to debug, so programs should be free from
them.
The checking validity of a pointer object i.e. if the pointer is pointing to a valid
location or not is a very big problem and in some cases, the validation cannot be done at-
It is taken for granted that the address passed is a valid one and processing is
Exercise 5.2:
int *p=0;
int i=0;
while(i++<n)
p = &arr[i];
*p = 0;
}
5.10 Pointers and Const Qualifier
When the pointer declaration involves const qualifier, it can be understood easily
are equivalent. It can be read, as ptr is a pointer to a variable of type integer that is a
constant. It says the value of the variable pointed by ptr may not be modified.
*ptr = 10;
// invalid, but
ptr = &var1;
// is valid
where the const in the arguments guarantee that the strings pointed by the arguments will
not be changed.
// constant pointer
which can be read as ptr is a constant pointer (don’t confuse this with a pointer
constant; pointer constant is used in the case of array names) to variables of type int. Now,
ptr = &var1;
// invalid, but
*ptr = 10;
// is valid
Combining the both,
which can be read as ptr is a constant pointer to a variable of type int which is
constant. So, both the pointer and the variable pointed by it cannot be modified.
ptr = &var1;
// invalid, also
*ptr = 10;
// is invalid
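The three declarations can be placed side by side in one sketch. The names const_demo, p1, p2 and p3 are hypothetical; the assignments that a compiler would reject are left in a comment so that the code still compiles.

```c
#include <assert.h>

/* Returns the sum read back after the only writes the compiler permits. */
static int const_demo(void)
{
    int var = 1, var1 = 2;
    const int *p1 = &var;        /* pointer to constant int */
    int *const p2 = &var;        /* constant pointer to int */
    const int *const p3 = &var;  /* constant pointer to constant int */

    p1 = &var1;                  /* valid: the pointer itself may change */
    *p2 = 10;                    /* valid: the pointed-to int may change */
    /* *p1 = 10;  p2 = &var1;  *p3 = 10;  p3 = &var1;  are all rejected */
    return *p1 + *p3;            /* var1 + var */
}
```

Note that var itself is not const, so its value changes through p2 even though p3 promises not to change it.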
A pointer constant is distinct from a constant pointer, and the two should not be confused with
each other. An array name for example is a pointer constant. Function names are also
pointer constants because the address they point to cannot be changed. NULL as we
have just seen is also a pointer constant. The word ‘constant’ is used here in the same sense as in
character constant, integer constant etc., where the implicit meaning is that the
object itself is a constant and so there is no chance of modifying it. In other words it can never be
But in the case of constant pointers the word const is applied as an adjective
qualifying the pointer as a constant. So it means that it also may not appear as an lvalue
and the address it ‘contains’ may not be modified. Unlike a pointer constant, a constant
pointer can be modified indirectly, and the ‘const’ness is imposed artificially by the
compiler.
int *temp;
ptr = someArray;
temp = &ptr;
*temp = someArray;
temp = &array;
*temp = someArray;
Pointer to a constant means that the value is not changed through that pointer,
A pointer is defined as the type that can hold the addressable range of the system. In
addition, the size of the pointer is normally the size of int (for efficiency). These two
requirements are not necessarily compatible with each other, which leads to problems in a few
implementations.
In 16-bit Intel x86 systems, the size of int is 2 bytes. So pointers may be implemented
to hold two bytes. This can address up to 2^16-1 memory locations (the size of one segment in
these machines). This is enough for small programs that manipulate small amounts of
data. Nevertheless, this is far less than the addressable portion of the memory. To
overcome this difficulty they have an alternative patchwork where they have far pointers.
A pointer variable declared as far is 4 bytes long and can address up to 2^32-1
locations. This too cannot determine a memory location exactly if used in comparisons like ptr1
== ptr2, because it is stored in the format segment : offset. To overcome this, when you
declare a pointer as huge, it also takes 4 bytes of memory, but it stores the addresses
as absolute addresses. Again, to make these kinds of pointers the default type
they have memory models etc., which makes the problem worse.
Such implementations (compiler vendors like Microsoft, Borland etc. support this
one) are specific to some subset only and the reader is recommended to follow the
Arrays are lists of objects of the same type. An array name is an lvalue, but a non-
modifiable lvalue (a pointer constant). Since an array name points to the beginning of a
memory location, if it is modified just like a pointer reference, that memory location will
be lost. So it is called as pointer constant signifying that you may refer its value but may
not modify it. As we said previously, assignment between arrays is not allowed, whereas
all other assignment types are allowed. The reason being that arrays are pointer constants
The C arrays are low-level in nature. They reflect the storage of the array
elements in the physical hardware. To support the argument that C-arrays are sufficiently
• The array name refers to the starting address of where the actual storage of the
array members begin, and this helps in assigning the array to a pointer of the
• No padding is done,
When you declare an array, it is assured that it is allocated contiguously, with no
padding between the array elements. Were padding possible, it would be costly in the case
of large and multi-dimensional arrays. An array does not store any information in itself
about the type of its elements, its size and so on. Consider the following
example:
// ??? is vPtr
foo((void *)iArr);
now, using vPtr there is no way to determine the type of the array or size of the
Only through all these properties, pointer - array relationship is possible and
int i, a[10];
for(i=0;i<=10;i++)
a[i]=0;
This seems to be harmless code. But the for loop accesses a[10], which does not exist. This
is called the 'one past error' (also referred to as an off-by-one or ‘fence-post’ error),
which even experienced programmers commit. In C, array indices begin from 0 and there is no
way to change this default base. Pointer arithmetic should be limited to the available
block. So, references like a[-1] and a[15] are illegal since the reference is outside the
block, and so the behavior is undefined. ANSI C loosens this rule slightly by allowing a
pointer to point one element past the end of the block, provided that it is not
dereferenced. However, a pointer cannot point below the base address.
int array[10];
array[10]=100;
array[11]=100;
// illegal
array[-1]=100;
To reason out this behavior consider the following code in a x86 machine.
int array[10];
ptr --;
Due to such problems as in the previous code, it is illegal to access the elements
below the base address. But what about the ‘one past error’? The same problem occurs if
the array ends at the end of a segment and a pointer tries to move past it. In this case,
an ANSI compiler makes sure that the array bound is at least one below the segment limit.
All this is due to this ubiquitous one past error. So forming the address one past the
end, such as a + 10 for the array in that example, will not lead to undefined behavior
(but it by no means says that a[10] is
Arrays and pointers are closely related. A dereferenced pointer is just an lvalue for the
object it points to. array[index] is exactly the same as *(array + index) (or in turn
*(index + array)). Since most of the time there is direct hardware support, the pointer
relation is direct and explicit. So we can write array[index] as index[array], which means
the same. In an expression a[b], exactly one of (a, b) must be a pointer and the other must
be an integer expression. So, array[1] is equal to 1[array], since array is a pointer and 1
is an integer expression. Even casting may be applied, and the expression is valid as long
as this condition is satisfied. For
example:
a[b]==(char*)a[(int)b].
int a[10][10];
int k = 2[a][2];
int m = 2[2][a];
Showing the relationship between arrays and pointers is very simple as in the
Type arr[10];
&arr[0] == arr
The base address of the array is &arr[0] and it is equal to just saying arr. In other
words the array starts at the 0’th position and applying the & operator to that position
yields nothing but the base address of that array. The same can be expanded as,
Note that the casting ‘arr’ to (char *) is essential to make sure that the addresses are
added in a scale of 1.
So whatever may be the type, for arr[1], the following relation holds true:
This shows the relationship between single dimensional arrays and pointers.
Why did I use a typedef for ‘Type’? That is to extend this relationship to multi-
// or more
and still the relation holds good, for the simple reason that multi-dimensional
used to manipulate the values stored in the array and allows rapid access to a particular
array location.
int arr[10][10];
ptr[11] = 10;
arrays’.
char **list;
list[1] = "India";
printf(" %d ", (int)strlen(list[i]));
// prints 24 5 14
where the strings of variable size are used. A popular method is to allocate memory
dynamically for every dimension. The command line arguments for example
After knowing what is flattening of arrays and ragged arrays it is the time to
int flattened[30][20][10];
int ***ragged;
int i,j,numElements=0,numPointers=1;
numPointers+=30;
numPointers+=20;
ragged[i][j]=(int*)malloc(sizeof(int)*10);
numElements +=10;
}
printf("Number of elements = %d",numElements);
// it prints
As you can see, the ragged array requires 631 pointers, in other words 631 *
sizeof(int *) bytes of extra memory, for pointing to 6000 integers. Whereas, the flattened
array requires only one base pointer: the name of the array is enough to point to the
On the other hand, ragged arrays are flexible. When the exact number of memory locations
required is not known, you do not always have the luxury of allocating memory for the worst
possible case. Again, in some cases the exact amount of memory required is known only at
runtime. In such situations ragged arrays come in handy.
To illustrate, consider the example of a text editor. The size of the text the user is
going to type cannot be predicted. If a worst case of 256 columns and 1024 lines is
assumed and the space is allocated statically, it will require 256 * 1024 bytes, or 256
kilobytes. Even if we allocate such a big chunk of memory, our text editor will have the
limitation that the user can type at most 256 characters per line and at most 1024 lines.
On the other hand, declare a 2D pointer for storing the information that the
user types. Memory can be allocated dynamically to fit the need. If the user types
nothing, no space will be allocated. If he types a lot, with a varied number of characters
in each line, the space can be allocated exactly, with additional space for storing the
pointers for each line. There is no limit on the number of lines, since the pointers to the
lines are also allocated dynamically and can vary. So the only limitation happens to be the
size of the available memory. As you can see, this ragged array approach conserves a lot of
space in
So the selection depends on our requirement, and each approach has its own
Unlike FORTRAN, which follows column-major order, C
which most of the accessing is made in the programming. Let's look at an example for
traversing an N * M 2D matrix,
printf(" %d ", matrix[i][j]);
Each row in the matrix is accessed one by one, by varying the column rapidly.
The C array is arranged in memory in this natural way. Consider the following one,
printf(" %d ", matrix[j][i]);
This changes the row index more frequently than the column index. Is there any
accesses the array in the natural order (row-major order) of C, hence it is faster, whereas
the second one takes more time to jump (If you want to verify the fact that the first one is
faster than the second one, use clock() function in <time.h>, run in your machine and see
the time difference to execute them. In our machine, the second code took twice as much
The difference may be small in the case of small arrays. However, as the number of
dimensions and the size of the elements increase, the performance difference would be
significant.
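In C, arrays are allocated statically, so all the information required for allocating
the memory must be available at compile time. In other words, incomplete information will
not suffice and will lead to a compile-time error. The only exception is the first
dimension, which may be omitted when it can be deduced from an initializer or when the
array is a function parameter.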
be used in place of int a[] in the above declaration, and the two are in no way
equivalent. The confusion between the two usages arises from the fact that they have the same
That is, foo(int *arr) and foo(int arr[]) are equivalent. Similarly, foo(int **arr) and
foo(int *arr[]) are equivalent. But they are not equivalent elsewhere.
Consider,
,sizeof(astr));
// pstr = 4, astr = 7
int *i;
int j[20];
i = j;
int **i;
int j[10][20];
i = j;
These two are not equivalent. i and j are of different types and hence i cannot be assigned
int j[10][20];
foo(j);
This is because only pointers not arrays can be passed to functions and only here the []
printf("%d", iptr[-1]);
// prints 2
Negative indices may even be useful in some cases. If ‘iptr’ is pointing somewhere in
the middle of the integer array, then iptr[-1], iptr[0], iptr[1] will give the previous,
currTok[-1],currTok[0],currTok[1]);
// prints
prevTok = integer
currTok = long
nextTok = float
Using negative indices is not recommended because it may confuse the reader
who reads the program. The array access should be within the limits of the array and
printf("%d", iarr[-1]);
// undefined behavior
ptr++;
// perfectly o.k.
arr++;
// compiler error
The reason for this behavior is that arr is the name of an array, and if expressions
such as arr++ were allowed, the memory area allocated might be left stranded and you
might lose the link to it later. To avoid these kinds of pitfalls, C restricts arithmetic on
array names and calls an array name a pointer constant. This means you can examine the
contents using the name but may not modify it. In case of ptr in the example it is declared
as a pointer so is
Strings are implemented as const char pointers. Since they are pointers the string
<string.h>).
constant. It includes printable characters and escape characters. Two continuous strings
separated by white space characters (that include new-line character) are concatenated by
"zation";
puts(str);
// prints stringization
"hari",
"ranjitha"
"prakash",
0,
};
while(name[i])
puts(name[i++]);
The programmer expected four names to be printed but got only three. What went
wrong? Clue: it printed ravikumar, hari and ranjithaprakash. The programmer missed a
comma while typing, which led to the two adjacent strings being concatenated
('stringization'), causing an unexpected problem.
Constant strings may get placed either in code or data area and that depends on
the implementation. That means, the string constants are available to use even after
char *try1()
return temp;
char *try2()
return temp;
char *try3()
return temp;
}
int main()
puts(try1());
puts(try2());
// undefined behavior!
puts(try3());
// undefined behavior!
In try1(), ‘temp’ is initialized to point to a string constant. Since string constants
are stored in the code/data area and not allocated on the stack, this doesn’t lead to
dangling pointers. But this is not the case for try2() and try3(). These two functions
suffer from the problem of dangling pointers. In try2(), temp is a character array, so the
space for it is allocated on the stack and is initialized with the character string
“character string”. This is
the function so the string data is not available in the calling function main(). So puts
reads some unknown position, leading to undefined behavior. The function try3() also
suffers from the same problem, and the problem is more easily identifiable because the
puts(arr);
You know that constants are also allocated space while compilation of the
program. In this example, “string” is stored somewhere in the memory and its address is
stored in ptr. Compiler may store both the strings in a single location and assign the same
string to both arr and ptr. Any modifications through ptr may also affect arr. This concept
It is to be noted that the sharing does not have to involve full strings, part
// beware that the part "string" may be common in both the strings
if(ptr == arr)
if("string" == "string")
These checks work on the common-sense assumption that no two different string constants
This sharing of string optimization by the compiler is to save space. This may
constants are used in the source program. For example, you may have “some string” as
the string used 100 times in the text. If shared strings are used it will avoid storing “some
string” 100 times in the code generated and store only one string instead.
Since shared strings can economize space, how will you force sharing of strings to
be enforced by the compiler? One obvious way is to switch on the shared string flag
puts(something);
char * t;
t = s;
t[0] = 'm';
This is called a shallow copy. By default, C does a shallow copy for pointer
assignment.
If you want the copying to be done into separate space for t, then you should do so
explicitly.
strcpy(t,s);
Since t now points to a different location from the source (even though the
contents of both are the same after the copying), a change through one variable doesn’t
affect the other.
t[0] = 'm';
In other words, a shallow copy just copies the pointer, whereas a deep copy provides
a new object in which a copy of the source content is stored.
Arrays and strings are closely related, as string is just an array of characters.
Consider an example,
char string[10];
strncpy(string, "Tom cruise", 3)[3] = '\0';
Copies "Tom" into string and null-terminates it. Similarly to traverse a
putchar("aeiou"[i]);
Although these examples will not help in real-life programs, they show how the
arrays, pointers and strings are closely associated with each other.
Look at the following code to see that the rules that apply for multidimensional arrays
char arrayInfo[]="hello";
ptrInfo[2]='a';
The first one defines an ordinary character array. The second one points to a
string constant. The compiler is free to store the string constant “hai” anywhere it
wishes. Suppose that it happens to be stored in read-only memory (ROM). If we try
to change this string at runtime, it may produce an unexpected result. So if you want to
modify the string later, it is advisable to store it in an array as in the first one.
An array cannot be passed or returned from the function, whereas structures can
be. Similarly, structures can be assigned to each other whereas arrays cannot be although
both are aggregate data-types. This is a particular place where C suffers from the
is because the array bounds need not be perfectly known. Hence the size of the whole
array may not be predicted exactly and due to the close relationship between the pointers
and arrays (an array name is a pointer constant and doesn’t correspond to the whole
array). But this is not in the case of structures. So it is possible to embed the array in a
Raw data types become very powerful when they are combined according to a logical
relationship. Structures do that job of containership; unions and bit-fields
Structures may be initialized with the expressions of the same type. The structure
members can be initialized with a list of values enclosed within the braces according to
the order of members. If any members are left uninitialized then they are initialized to
zero.
Structures and unions can also be initialized just like arrays as follows,
initializes all members of sStruct to zero including the padding bits (if any). This applies
to unions also.
initializes the space occupied by the union to 0 including the padding bits (if any);
In C structures can be assigned only if they have the same name (refer to “type
✝
Few other languages provide structure assignment that follow structural equivalence.
struct structure1 {int i;} sv1={1}, sv2;
sv2 = sv1;
s2 = s1;
that two different structures that are meant for different purposes can’t be mixed together
accidentally or purposefully. This leads to safe and clean model for structure assignment
[Stroustrup 1994].
// it prints
// 1 1
This idea is for assigning to structure pointers of different names. Similarly, this
idea can be extended to assign one struct variable to another with a different name.
// it prints
// 2 2
Here an explicit cast does the job of overriding the name equivalence. When making an
explicit cast, the programmer is aware that he is explicitly changing the type, so it is
acceptable. When a situation arises where you must override the name equivalence and
require structural equivalence, the simple technique of explicit casting can be used.
ANSI provides a way to determine the byte offset of any non-bitfield structure
size_t offset = offsetof(type, mem);
// syntax
// e.g. of usage.
or simply as,
struct temp{
char *name;
struct dateOfBirth{
}DOB;
}student;
The above example uses a nested structure. Since a new name space is created inside a
structure, the names may be overloaded. The nested structures need to be defined (it is
not possible to just declare structure because a structure member is needed) inside the
structure.
It is possible to create an inner structure variable outside the enclosing structure. For
example:
// legal
You can create on-the-fly structure definitions in the function arguments and
return types. Such structures are available from that point of definition.
defines a structure called some and passes a variable of the same type to the function foo.
This style reduces the readability of the code considerably, so it should be avoided.
to the function. Since passing is done through stack, passing structures by value is
costlier. So if you really want to send a structure by value make sure that it is very small
or alter the program such that it is sent through a pointer to that structure.
6.6 Unions
Unions can be considered as special case of structures. The syntax for both is
mostly the same and only the semantics differ. The memory is allocated such that it can
accommodate the biggest member. There is no in-built mechanism to know which union
member is currently used to store the value. Consider the following code:
union value{
};
printf("%d", v.iVal);
// be cautious. How do you know that foo() has only set
In other words, with the union itself it is not possible to know which member is
currently in use. So there is a greater possibility that you access a wrong member type and
The solution may seem simple by introducing another variable of type integer.
#define INT 0
#define DOUBLE 1
#define FUNC 2
int type;
// the foo() should set the global variable ‘type’ to correct value
switch(type) {
default : (v.fp)();
It's better to use enums than ints because an enum specifies a closed set of values, as in
this case:
concerned. To provide that logical relationship, use a structure. The full code may now
become:
union value{
};
struct dummy{
enum type t;
};
struct dummy v;
v.t = INT;
v.val.iVal = 10;
return v;
int main() {
switch(v.t) {
default : (v.val.fp)();
}
Yes, the code becomes longer, but this is a robust solution for using the unions.
constants available using the enumeration itself. It is a good design to have the count in
the end of the enumeration. Similarly add an enumeration constant specifying the illegal
NUMOFHOLIDAYS = 2};
6.7 Bitfields
The following operators cannot be applied to structure members that are bitfields:
• * (indirection),
• & (addressof),
• [] (subscripting) and
• sizeof operator
Pointers store addresses, which means that only memory locations that are directly
addressable can be pointed to. In the case of bitfields, direct addressing of bits is not
possible. That is why pointer operations (indirection and
sizeof returns the number of bytes an object occupies, not bits. If sizeof were allowed
on bitfields, confusion might arise over whether it tells the number of bits or bytes the
member occupies.
In the case when bit-fields and other data fields come together, the bitfields should
come first.
unsigned int : 3;
char * name ;
};
All the members in a bitfield need not be named ones. Unnamed members cannot
be accessed and used from the code and serve the purpose of padding.
struct bitField {
int day : 3;
int sex : 1;
}sample;
enum
workingDay{monday=2,tuesday,wednesday,thursday,friday };
#define MALE 0
#define FEMALE 1
sample.day = friday;
sample.sex = FEMALE;
sample.sex);
// it printed
The poor programmer expected day = 6, sex = 1 to be printed but what happened?
Before that, let's see something about the range of types. For signed chars, the range is
from -128 to +127 (including 0). In other words it is from -2^7 to 2^7 - 1.
Generalizing this idea, for signed integral quantities of bit size n the range is from
-2^(n-1) to 2^(n-1) - 1 (including 0).
Coming back to the problem, notice that the bit-fields are ints, and this implementation
treats plain int bit-fields as signed (whether a plain int bit-field is signed is left to
the implementation). The member ‘day’ takes 3 bits. Finding the range by using our
formula, it is -4 to 3. ‘friday’ is 6, which cannot be represented in that range and thus
wraps around to -2. The same applies to the ‘sex’ field: +1 cannot be represented with a
single signed bit, so the only bit available is treated as the sign bit and the value
printed is -1.
The cost of using bitfields is losing of portability. How the bit field definition and
declaration will pack the bits in the given space depends mostly on the underlying
hardware.
The following example is a highly unportable way to get the parts of a floating
point variable. It is assumed that the machine is little-endian and that it follows IEEE
floating point arithmetic.
union{
float flt;
struct{
}bitField;
}floatVar={-1};
floatVar.bitField.exponent);
// prints
Since portability is the cost, bitfields should be used only in places where space is
very limited and the functionality demands it. There is a space versus time trade-off
involved in fetching bitfields. Bitfields should not be used merely to save space when
storing ordinary integers.
struct birthdate {
}myDOB;
is not recommended because the space saved (one byte) is not worth the complex shifting
of bits involved (there is a cost in time, so using three integers instead of bitfields is
more efficient and is recommended). Separate integers may be used for representing true
and false values rather than using individual bits for
However bitfields,
• once defined, can be accessed in the same way as we access ordinary variables.
char name[10];
struct nameList *next;
}nameList;
{"prabhakar",0},
{"abilash",list+3},
{"arvind",list},
};
while(temp)
printf("\n%s",temp->name);
temp = temp->next;
// it printed
// abilash
// arvind
// carl clemens
// prabhakar
This program works on the rule we have already seen that ‘a declarator is
available to be referred to in the program code and used from the point where the
declaration/definition is made’. Here the identifier list is available and used in its own
initialization list, and this is based on pointer arithmetic. The compiler knows the
size of the structure, so it can calculate addresses like list + 3 (which, counted in
bytes, is the address of list plus sizeof(nameList) * 3).
Such self-referential structures are very useful in implementing data structures
like linked lists, trees etc. that use pointers to nodes of the same type.
structures. There may be some alignment requirements imposed by the environment. For
example, ints may be required to be aligned at even-numbered addresses and longs at
addresses divisible by 4. This leads to internal and trailing padding bytes in the
structure.
Due to this padding, the value returned by sizeof may not be equal to the
simple sum of the sizes of the structure members, because sizeof, when applied to a
structure, returns the number of bytes required to represent the structure including that
struct someStruct {
char cc;
float ff;
double dd;
void *vp;
};
if(size1 == size2)
printf("No padding is done");
else
// in my system it printed,
dependent. Therefore, writing the structure information to a file in one machine and
reading that file in another machine may not work even if you use the same structure. So,
transferring binary files is not portable. However, these problems are not there in
transferring the text files and so it is better to store and transfer the information as a text
Since you cannot use the == operator to check the equality of two
structures, you have to write a separate function that checks the equality of each
structure member. memcmp should not be used because there may be padding between
the members, and the contents of the padding are not guaranteed to match (whereas memcpy
can be used for copying the structure).
// ok. Because all the fields including the padding part are copied.
// But you will not require this because you can assign as
// struct1 = struct2;
ANSI C specifies that there should be at least one data member in a structure.
Unions do not assure the way in which the fields are arranged and are compiler
dependent. So, the program shouldn’t depend on the internal arrangement of the fields. If
you want to do some type conversion between two types, use casting to do the same.
Don’t use the unions to convert from one type to another as in the following example.
union value {
int intValue;
long longValue;
}val;
val.intValue = 10;
Here there are two types, int and long that are members of union value. To
is the correct way to perform the conversion. The reason is that in the previous case, you
are depending on the way in which the fields of union are aligned.
and its size cannot be expanded as required. For example, to have a structure
struct system{
char type[10];
char manufacturer[20];
int numOfPeripherals;
int peripheralID[MAX_POSSIBLE];
// worst-case assumption of number of peripherals
};
The number of peripherals will vary between systems, and the worst-case size
has to be allocated to accommodate that. Still, if more peripherals than the
variable-size array:
struct system{
char type[10];
char manufacturer[20];
int numOfPeripherals;
// this field has the number of items that are pointed
int peripheralID[1];
};
sizeof(int));
// note how space is allocated for the structure
if(structPtr == NULL)
structPtr->peripheralID = getID();
- Mahatma Gandhi
redundant code and help in reusability. For example, the functions provided in standard
header files.
Declaration of the function means introducing the function with return type and
arguments.
Storage class for functions may be either static or extern. Extern is the default for
functions and for function declarations it is the only storage class allowed. So,
void fun(int);
Static is the only storage class specifier allowed before a function definition. This
Nesting of functions is not allowed in C (but only extern function declarations are
idea of procedures as in other languages (like Pascal). The return type void means that
the function does not return any value.
int fun(...);
The ellipsis specifies that any number of arguments (or none) will be accepted without type checking
ANSI C introduces void, and so, if you want to specify that the function takes no
int fun(void)
int i = 10;
int * foo()
return &i;
int main()
{
*foo() = 100;
printf("%d", i);
// prints 100;
the function name in LHS of = operator is nothing but the variable i in disguise. This
code employs the pointer concept to show that the pointers are addresses and demonstrate
int * foo()
int j;
return &j;
No compiler errors are raised. The behavior of this code is undefined. Just
remember that auto variables are allocated space on the stack and are automatically
removed from memory after the function returns. In this case, &j (taking the
address of ‘j’) gives an address that is no longer allocated for this purpose once foo()
has returned, and an assignment like,
*foo() = 100;
means assigning to that unknown address. This leads to the undefined behavior of the
code.
Now consider the following code segment.
int * foo()
static int j;
return &j;
This code is acceptable. The static data is allocated in the space that exists up to the end
*foo() = 100;
// is actually,
j = 100;
is reasonable. I have seen some expert programmers using the functions as l-values like
this using this idea. Remembering the previously set value is used in the case of strtok()
char *wish()
printf("%s\n",s);
return s;
int main()
{
strcpy(wish(),"Good morning");
wish();
// it prints
// Wake up
// Good morning
On the first-call it issues the message Wake up and after that it always issues the
The term ‘activation record’ is used with the same meaning as ‘stack frame’ here.
Every function that is called is allocated memory on the stack, and that memory is called
its stack frame.
[Figure: contents of an activation record]
Incoming actual parameters, with argument count
Saved state information (like old stack pointer)
Frame (stack) pointer
Local data (variables)
Temporary storage area
Outgoing parameters (become incoming for next frame)
Ritchie1981]. This format is generally followed in many implementations but may differ
between implementations.
Return values are normally passed in a register. The size of all the contents can be
determined by the compiler at compile time. If the function takes variable-length
arguments, then the size may vary only for that part of the frame. So except for the
variable-length arguments, the size of the activation record remains fixed.
It has to be noted that, the incoming parameters are saved by the calling function
rather than in the called function’s activation record. The stack frame of the called
function overlaps the stack frame of the calling function to get the parameters. Similarly
when the called function calls yet another function, it stores the parameter data to be
The saved information includes the control link to the calling function. Only for
the activation records of same function, the size remains the same. For different functions
the size may vary. So control links are required to keep track of how to return to the
remembered by pushing the address onto the stack. A function instance, or stack frame, is
created on each call to a function. Local variables are allocated space in the stack area
itself. The parameters are treated as local variables. Therefore, it is legal to apply the
addressof operator to the parameters. But never return the address of a local variable,
since its space is released for further use as soon as the function returns.
The function parameters are treated as local variables declared in the first level of
scope inside a function (so it forces the compiler to enter a new scope before the { is
encountered ).
int j;
// it is as if: int i, j;
The only legal storage class in function arguments is register storage class.
The C functions support only pass by value. The copy of the data is sent to the
swap(&i, &j);
*b =temp;
Since the changes are made to the memory location pointed to by that address, the
change is not local to that function. Thus, it successfully imitates
pass by reference.
do not agree then type conversion takes place according to the conversion rules. If the
number of arguments do not agree, then last significant arguments are taken into account.
will lead to compile time error because in pass by reference, the function expects actual
If your machine follows ‘little endian’ order it will be stored in the machine as
0010 in the lower byte and 0001 in the higher byte. I.e. the lower order bytes are stored in
the lower addresses and the higher order bytes in the higher addresses. For example, this
ordering method is followed in Intel based machines. In the ‘big endian’ machines, the
higher order bytes are kept in the lower addresses. Examples are systems with the
processors SUN’s SPARC and Motorola PowerPC.
The figures listed show how the bytes are organized in the memory.
union findEndian{
int i;
char c[sizeof(int)];
}myEndian;
myEndian.i = 1;
if(myEndian.c[0] == 1)
scheme");
else
By using pointers also, you can determine the byte order of your machine:
int x = 1;
else
Problems may occur if the information of both the types is used in a mixed
manner. Such problems that occur due to the mix between the two types of endian
The ‘endian problem’ may crop up in your program if you intend to make your
program work between machines (say networks). For example, if you are taking the
The solution is to declare a local variable and use the address of that local variable.
{
int value = data;
Exercise 7.1:
int i = 0;
// sizeof(int) == 2
scanf("%c", &i);
// input 0 here
printf("%d", i);
Since functions by default have external linkage, if the function is not already
defined, the match between the formal and actual parameters goes unchecked. There is a lot
of scope for the programmer to make the mistake of calling the function with a wrong
argument type. This can be costly since the function executing with wrong arguments
To avoid this, functions can be declared by using prototypes. The prototype
feature was added in ANSI C, following the idea from C++. A prototype is just an
indication to the compiler of the name of a function, its arguments and its return type.
This helps the compiler to do strict type checking when function calls are made.
To support prototypes, the compiler requires only some little more effort and no
double foo();
for initializing the pointers to functions before those functions are defined.
The return value is also passed by value. When the function encounters a return
statement, control returns to the point of call in the calling function. In ANSI C the
default return type is int (if the return type is not specified). The value is normally
returned via registers.
// file-1
double foo( )
{ return 1.0; }
// file-2
i.e. return ;
is used in the functions to return the control to the calling function. If the return type for a
function is void, then it is more meaningful to have such construct. For functions with
So it is always good to make sure that the actual parameters and return types are
Normally the arguments are pushed into the stack before the function is called. The effect
is that the arguments are passed from right to left. For example:
return (opt1+10*opt2+20*opt3);
add(10,20,30);
the arguments 10,20,30 are pushed into the stack one by one. Then the arguments are
int *i;
int numOfArguments=3;
printf("%d ",*i);
If you want to see the call stack (the stack in which the return addresses of the
previously called functions are stored), decrement i in the above program. To verify that
they really are the previously called functions, try printing, on your own, the locations
that contain the addresses of functions, together with the registers, pc values etc.
C is one of the few languages that require a runtime stack and a heap. Stacks
normally grow from top to bottom and heaps from bottom to top as shown in the diagram
stack and heap share a common space and are uninitialized, which is why it is
the duty of the programmer to properly initialize them. It is a common
design to have the heap and the stack grow towards each other. This is because
in most cases only one of them is used heavily, so this type of organization
helps to use the memory efficiently. The figure also shows the code area, where the actual
program code resides, and the data area, where all the static data are allocated space.
[Figure: memory layout with the stack at the top, free space in the middle, and the heap below]
void foo()
int i, j, k, l;
Since local variables are allocated in stack frames the program shows that the addresses
ANSI C does not assure anything about the order in which the arguments are
evaluated.
This function call depends on the index value to be scanned first and that scanned
value is used as an index in next argument (technically, this statement involves side
effects). There is no assurance that ‘index’ will be scanned first. In this example, even if
we assume that &index is evaluated first there is no assurance that value of index will be
successfully scanned. To see how it practically applies, consider again the example of
int a = 10;
int b = 20;
b = func(a=b, a);
return y;
The idea is to exploit the order of evaluation of the function arguments. If the
arguments are evaluated and pushed from right to left, func() receives the old value of ‘a’,
which is remembered inside the function, and only then is ‘a’ assigned the value of ‘b’. The func() returns the
remembered value of ‘a’ which is in turn assigned to ‘b’, in effect interchanging the
This solution suffers from the problem that the finding of values depends on the
allows it to be passed through the machine registers rather than the stack,
Since the arguments are register variables, this helps to improve the efficiency also if the
initialization. For example, usual arithmetic conversions are performed in both the cases.
extensions to the language like PASCAL keyword in Borland compilers and as __stdcall
These keywords may appear only before functions. ‘cdecl’ forces the arguments
to be accepted in conventional C style. The Pascal keyword specifies that the arguments are
add(1,2);
here ‘cdecl’ is assumed and the number of operands does not match. Since the values are
popped in the reverse direction, the return value is 50. If it is Pascal type function be
Understanding this difference can help you understand many interesting nuances
arg1
arg2 stack frame model for fun1
arg3
E.g. cdecl fun2(int arg1,arg2,arg3);
arg3
arg2 stack frame model for fun2
arg1
C (cdecl) | PASCAL
Pushes the arguments onto the stack in reverse order (right to left) | Pushes the arguments onto the stack in the order given (left to right)
The first argument in the function is at the top of the stack | The last argument is at the top of the stack
The stack is cleared by the calling function | The stack is cleared by the called function
functions to remove the arguments from the stack before they return to the caller.
In ‘cdecl’, on the other hand, the calling function is responsible for cleaning up the stack.
This is one of the main differences between the Pascal and C calling conventions.
Most languages and APIs (like Visual Basic and Win32) follow the standard (Pascal)
calling convention. This is because it reduces the size of the code generated: the cleanup
code exists once in the called function rather than after every call.
On the other hand, cdecl allows variable argument lists to be implemented.
The differences between the two calling conventions are summarized in the table
given.
7.9 Recursion
Functions can be called recursively in C, either directly or indirectly. For each call a
new stack frame is created. This makes the automatic variables of each call independent
of those of the previous calls.
The following is a simple program to check for well-formed parentheses that
applies recursion and introduces the use of functions like advance() and match() that are
char advance()
{
return (currentToken = *inputString++);
if(currentToken != token)
else
advance();
void parens()
while(currentToken=='(')
advance();
parens();
match(')');
int main()
advance();
parens();
if(currentToken == ')')
printf("\nError : ')' without matching '('");
example is about solving the n-queens problem using recursion (this technique is called
back-tracking algorithm).
#include<math.h>
#include<stdio.h>
#include<stdlib.h>
int x[9];
int num=0;
int xyz=0;
return 0;
return 1;
if(place(k,i))
x[k] = i;
if(k==n)
printf("%6d",num);
printf("%5d",x[j]);
printf("\n");
num++;
else
nQueens(k+1,n);
int main()
nQueens(1,8);
// it prints 92
Exercise 7.2:
Write a recursive function to reverse a string. The solution may seem simple and
direct. However, finding a very good solution to this problem can be challenging (and
interesting!)
Functions, just like data types, are declared and defined. The defined function is
available in the memory. So we can take address of it and store it in a function pointer.
code segment, where that function’s executable code is stored; that is, the address to
int (*fooPtr)( );
Here fooPtr is a function pointer. It can be assigned with any function having no
int foo( );
// or
fooPtr = &foo;
The pointer can be used later to call that corresponding function in two ways.
fooPtr( );
// or
(*fooPtr)( );
For example the * is to indirect the function pointer and the surrounding parenthesis is to
fooPtr( ); // or
(fooPtr)(); // or
(*fooPtr)( ); // or
Similarly, due to this syntax of function pointers, the following is valid:
int fun();
(&fun)();
In other words it can be viewed as if functions are always called via pointers, and
that "real" function names always decay implicitly into pointers (The same reasoning that
applies to the equivalence of a[2], 2[a], *(2+a) and *(a+2). "... a reference to an array is
converted by the compiler to a pointer to the beginning of the array" [Kernighan and
int foo();
sizeof(foo);
// function pointer
The point to remember is that the return type and arguments must be identical for
int (*fp)(float );
fp = foo1;
extern foo2();
fp = foo2;
// Erroneous. There is no way by which the compiler can verify the
// types. If the return type or argument types differ for foo2 then
// undefined behavior.
This is a very crucial point to remember because most of the time compilers
cannot flag an error for the mismatch. This is because many of the assignments are
done at runtime and the type information may not be available to the compiler.
routines. bsearch and qsort (defined in <stdlib.h>) are examples for such functions.
This sorts the array base, having n elements each of the given size, in ascending order. This
provides a sorting function for objects of any type, including structures. The comparison
function for each type should be passed and the appropriate size should be given. The
The following is a real-world example of using function
pointers. Say you want to write a menu program. The aim is to call, at runtime, the
function that corresponds to the item selected in the menu. Therefore, the requirement
is to declare a function pointer with int as the common return type. It may be
declared as:
int (*fnPtr)( );
switch(select) {
case NEW : fnPtr = &new; break;
…..
fnPtr( );
This is an easy example, but it shows how function pointers apply in real programming.
Calling functions through function pointers whose value is determined at runtime is
known as a ‘call back’.
applications, care should be taken and problems may crop up even in unexpected places.
written as,
return (x-y);
This solution may suffer a problem. If x is a big positive integer and y a big
negative one then x-y may lead to overflow and so will give incorrect output (e. g take
else if ( x == y ) return 0;
else return 1;
The return type and arguments of a function are called the interface of the function.
It is called an interface because these are the parts that form the window through which
that function is used. Functions abstract their implementation, as the user is
interested only in using them. Since functions you write may be frequently used, the interface
It is also not good design to modify the values passed to a function without
making it explicitly evident that the change is made. An unusual example of how a function should
Consider the function strtok(). It is used to separate the tokens from the passed
string. If the first argument is not NULL, the pointer value is remembered inside strtok
to be used in the subsequent calls. In the following calls, if the first argument to strtok is
NULL, the tokens from the previously remembered string are returned. strtok() returns
NULL when it reaches the end of the string. Also when strtok() is called it replaces the end
char *aToken;
puts(aToken);
In this example, at first the strtok gets the address of ‘statement’ in the first call.
In the subsequent calls, it is called with first argument as NULL, so it returns tokens from
the ’statement’ separated by any of the ‘separator’. At end the original string looks like
this,
in which the original string is modified as a side effect of the main effect of returning a
token from the string (so that the subsequent calls will return the consecutive tokens). So
users need to know about this effect of strtok and must pass a copy
of the original string rather than the original string itself, which is an undesirable
property.
<stdarg.h>, va_start, va_arg and va_end. va_list is used to declare variable to access the
list. Examples for variable argument functions are the ubiquitous scanf and printf.
A variable of type va_list should be declared to access the arguments.
Then va_start initializes that list and arguments are accessed one by one by
Before getting the next argument, it is necessary to know the argument type (in
arguments passed).
• it can be accessed only sequentially (it means that you cannot access the
#include<stdarg.h>
#include<stdio.h>
va_list printList;
va_start(printList,format);
// variable argument lists depend on the assumption that
if(*p=='%')
switch(*++p){
case 'd': {
printf("%d",iVal);
} break;
case 's': {
printf("%s",sVal);
} break;
else if(*p=='\\') {
// escape sequence
switch(*++p) {
}
else
putchar(*p);
va_end(printList);
int main(){
int i=10;
myPrint("format %d %s",109,"sriram");
becomes,
This means that all the code that is calling the function addStudent needs to be
modified to take single argument (this problem of changing the interface can be easily
The idea is to wrap the new function into the old one such that the interface
remains the same. The old code remains unaffected. Similarly, let us consider that the
new structure
typedef struct {
char *name;
int typeOfEntry;
}student;
In this case also a wrapper function can be used without affecting the legacy code,
wrapperAddStudent(student *stud){
addStudent(stud->name, stud->typeOfEntry);
}
Thus wrapper functions can be valuable for reusing code and during
maintenance.
8 DYNAMIC MEMORY ALLOCATION
file <stdlib.h> (also <alloc.h> in some implementations). If at the compilation time itself
allocate the right amount of memory, since dynamic memory allocation is not part of the
language itself. If it were part of the language there would be some support for allocating the
right amount of memory, but since it is not, allocating the right amount of memory should
operating system or underlying hardware. There is a memory allocator that will manage
the services required by allocation functions, by acquiring a very large chunk of memory
from the operating system. Then it fragments the memory, keeps track of allocated parts
and deallocates as required. If the allocated block of memory is exhausted, the memory
manager requests for another block from the OS and continues functioning. Interested
memory it silently returns NULL. So, always check if memory is properly allocated.
The following simple program may be used to determine your heap size.
int main(){
int kbs=0;
while(malloc(SIZE))
kbs++;
Never make assumptions about the size of a data type when allocating memory.
For example, say you want to allocate memory for 10 integers. It is portable to
use malloc(sizeof(int)*10) rather than malloc(2*10), where you assume that an integer is
of size 2 bytes.
It is a false notion to believe that malloc allocates exact amount of memory you
request. In practice it may allocate more (or even nothing, if no memory is available by
returning NULL). Say our requirement is to have a linked list of individual characters,
struct node{
char data;
struct node * next;
}*node;
and so may think that it takes approximately three bytes (assuming two bytes for pointer).
But the memory allocator in C is implemented such that it allocates memory only in
So it comes out that dynamic allocation is not suitable for applications where very
small chunks of memory are needed. The solution to this problem may be,
(1) Predict the approximate amount of memory that will be required and allocate
(2) Use your own memory allocator tailored to your need (this may work out very
For allocating large blocks, the available allocation mechanism suits very well. It
is also the duty of the programmer to properly initialize the space allocated dynamically.
The calloc() function returns memory initialized to zero, so it may be handy if
you want to make sure that the memory is properly initialized. calloc is internally malloc
p = malloc(m * n);
memset(p, 0, m * n);
Here the memset() function is employed to initialize the allocated block. This
function is very useful and handy to initialize large block of memory without the need to
traverse the whole array. A point to note here is that the second argument (the value to
initialize) is a character. So for initializing floats or doubles the same old technique of
Similarly, the blocks of memory can be copied using memcpy() and memmove()
functions (memset and these two functions are declared in <string.h>). The only
difference between these two functions is that memmove() can be used with overlapping
memory area, whereas memcpy() for non-overlapping memory areas (of course with
K&R C didn’t have the generic pointer(void *) so it had char * as the return type
for malloc. Why a char * ? Why not an int * or float * ? The reason being that characters
require no padding (since chars are assured to be one byte in length nothing is required to
pad and padding is one of the main reasons why casting should be done between pointer
types). So it is as if you are accessing individual bytes and so served the purpose well.
Since pointers of type void can be assigned to any type without casting you can
since ANSI C says that malloc returns void *. But the result of the old function (char *) malloc()
needed to be cast before being assigned to other pointer types. It is a matter of taste
Exercise 8.1 :
You know that in some machines certain types have to meet the alignment
requirements, for example, ints and floats should start at the even addresses. This will be
taken care by the compiler when allocating memory for such types declared in the
programs. How will you take care of the problem of aligning the types to required
boundaries when you allocate memory explicitly by using dynamic memory allocation
(say malloc)?
• if the space can be continuously allocated after ptr, the memory size is
extended and the same ptr is returned. If the space is not available
somewhere else. If so, the data pointed by ptr is copied byte by byte to the
new location and the new address is returned. If size bytes are not available
NULL is returned.
• if size is 0 and ptr is not NULL then it acts like free(ptr) (and always returns
NULL).
char *ptr=NULL;
ptr = realloc(ptr,100);
/* extends the block (of size 100) currently pointed by ptr to size
preserves old block contents upto 100 bytes and returns new pointer */
ptr = realloc(ptr,100);
/* shrink the memory block by 150 bytes keeping only first 100 bytes.
ptr = realloc(ptr,0);
/* ptr != NULL and size == 0. So it acts as if free(ptr) is called. */
Due to its dynamic behavior as free, malloc, and realloc depending on the
it behaves like this depending on the arguments passed. It is meaningful and gives the
power. However, do not do this in your programs. This approach is suitable for library
functions. You cannot expect the users of your functions to be aware of behavior that
changes depending on the arguments passed. If you have to write such a function, provide
four functions, one for each behavior (like shrinkMem, extendMem etc.). This will make
your code more readable and allow your users to select the function depending on the
functionality required.
Consider a problem like a string that needs to grow at runtime by
concatenation. realloc() comes in handy for implementing such growing arrays. You
pass the pointer to the already available string (p1) and the string to be concatenated (p2).
if(p1 == NULL)
return NULL;
return NULL;
strcat(p1, p2);
return p1;
There is a subtle problem in assigning the result of realloc directly to source. If realloc
fails then source will be set to NULL, so the pointer to the previously allocated string
will be lost. So the
char *temp;
strlen(target)+1);
return -1;
source = temp;
strcat(source, target);
…
By introducing a new temporary variable temp, the original array is preserved
Since there is close relationship between pointers and arrays, the memory
for(i=0;i<10;i++)
array[i]=0;
// array access
8.5 free()
Not freeing the dynamically allocated memory after use may lead to serious
free() assumes that the argument given is a pointer to the memory that is to be
freed and performs no check to verify that memory has already been allocated. Freeing
the unallocated memory will lead to undefined behavior. Similarly if free() is called with
invalid argument that may collapse the memory management mechanism. So always
1988],
free (ptr);
Here, in the expression ptr = ptr -> next, ptr is accessed after ptr is released using
function free (This also serves as example for dangling pointers where the pointer is used
even after freeing the block). So the behavior of this code segment is undefined.
For this problem [Kernighan and Ritchie 1988] suggests a simple solution:
introduce a temporary variable to hold the address to be pointed next. The code can now
be written as:
temp=ptr->next;
free (ptr);
Another frequently made mistake is to free the same block twice. This may occur
accidentally if two pointers point to an object and you call free using both the pointers. A
good point to remember is that the pointer is not set to NULL after it is freed, and so it
is our duty to make sure that this problem does not occur. ANSI C specifies that calling
free with a NULL pointer does nothing. So it is a good idea to set
the pointer to NULL after freeing it; if the same pointer is later passed to free again, it
will not cause problems.
int * ptr;
ptr = malloc(10);
….
free(ptr);
// freeing doesn’t set ptr to NULL
ptr = NULL;
free(ptr);
// now no problem.
but that has to be used very carefully. C leaves this dynamic memory management to be
statement begins with a # symbol in a separate line and so is not a free formatted one.
allows preprocessor to just check only the first character of every line and determine if it
Preprocessing is a powerful tool that should be used with care because it can result in
hard to find errors, since the code after preprocessing is not visible to the
user. The code that the user sees is different from the preprocessed code that is sent to
the compiler, so preprocessing logically forms a layer between the user and the actual
code, making it hard to debug.
its close association with low-level programming and assembly language programming.
Its power is most often underutilized because macro-processors are more familiar to
• conditional compilation,
• text replacement,
• file inclusion.
• stringization operation,
9.1 Comments
Comments can be present in ’any’ part of the C code where a white space is
allowed. Comments are stripped from the source code and a white space is inserted in
[Kernighan and Ritchie 1988] specifies that the comments can not be
fails, because the first /* ends at the first */. This leaves,
But it is convenient for programmers to use such comments, and some
compilers have an option to allow nested comments. An interesting problem on nested
comments is discussed in [KOE-89]. The problem is to write C code that runs without
error messages both in compilers that support nested comments and in those that do not,
and finds out which kind of compiler it is being run in. Interesting problem indeed! I
didn’t feel the toughness of the problem till I tried. He gives a hint too: “a comment
symbol /* inside a quoted string is just part of the string; a double quote " " inside a
comment is part of the comment”. The solution [KOE-89] finally gives is complex but the
/*/*0*/**/1
Exercise 9.1 :
Write simple C code to strip the C style comments from the source code.
E.g. aContin\
uousToken
This facility is also useful in defining lengthy #defines because #defines are
This allows the macro replacement text to be typed in the next line and by the line
String substitution can serve number of purposes where it really suits to the
purpose and other alternatives (like functions, consts) are not suitable. It
Even though other better alternatives to declare a constant exist like consts and
It is better to declare constant variables like this instead of #defines because type-
checking can be done by the compiler and type errors can be caught easily.
• #defines,
• enums
For me, selecting the way to declare a constant mostly makes no much difference
There are some predefined macro constants for use in the programs.
__LINE__ : Has the current line number. If you want to print the information of in
__DATE__ : Is replaced by a string holding the date of preprocessing wherever __DATE__ appears.
__TIME__ : Is replaced by a string holding the time at that point of preprocessing
wherever __TIME__ appears.
#if defined(__STDC__)
compiler");
#else
// without parameters
demoFnDcl();
#endif
__FILE__, __LINE__);
__TIME__);
When the # symbol is given on a line without any text, it serves no purpose but to
increase readability.
9.6 #line
The compiler keeps track of the line numbers in the program code for indicating
the compiler errors. For example, in your compiler you may get a compiler error like this:
“myprog.c”, line 100: syntax error - undefined symbol ‘z’
The preprocessor command #line helps to change the line number and the filename
#line lineNumber
the optional "fileName" forces the __FILE__ constant to be changed to the given
value.
#line 99 "newfile.c"
",__FILE__,__LINE__);
may be used in utilities like syntax analyser or a compiler to show the error messages
Let us have another example. Assume that the following code is given in the same
file.
void funOne() 1
{ 2
int i; 3
i = j + 10; 4
} 5
#line 1 "funTwo"
void funTwo() 1
{ 2
} 4
This helps debugging better in the bigger programs. The line command forces the
implementation dependent or machine specific. This is by the #pragma directive. For e.g
#pragma startup
Header files are included by using the #include directive. Purpose of using header
files include,
file programs.
// this is "myheader.h"
#define _MYHEADER_
// this is "actual.c"
#ifndef _MYHEADER_
#include "myheader.h"
#endif
Header files can only be text files. It is better to include declarations of functions
and other data structures in your own header files. It is better to keep coding part separate
#include "somefilename"
characters are not considered and similarly any \ inside "somefilename" need
second header file. That is, in order for a declaration in a header to compile without error,
the compiler must have already included another header file. There are primarily two
ways of satisfying this requirement: shallow and deep nesting of header files.
The ‘shallow nesting’ approach forces the programmer to explicitly #include the
// contents of first.h
struct s1 { ... };
// contents of second.h
// note here that this requires the first.h be for using this
// contents of myprog.c
#include "first.h"
#include "second.h"
struct s2 someS2;
// O.K. No problem
As you can see, the inclusion of the file “second.h” requires that “first.h” is
On the other hand, in “deep nesting” approach, automates much of the work of
// contents of first.h
struct s1 { ... };
// contents of second.h
#include "first.h"
This kind of nesting relieves the programmer from the burden of manual inclusion. This
approach is preferable when an entire header file must be processed to enable the
compilation of a second header. But they have a problem. Consider the code:
// contents of common.h
struct s1 { ... };
// contents of first.h
#include "common.h"
// contents of second.h
#include "common.h"
// contents of myprog.c
#include "first.h"
#include "second.h"
So this may lead to redefinition errors (of course, this problem can be
It is not always possible to follow the same approach and so, depending on the
situation, the nesting approach should be chosen. In general try to use forward
declarations to avoid nesting of headers (but in the example discussed this is not possible)
Another interesting problem arises when you try to include one header file in
another when forward declarations are not possible. Consider the code:
// contents of first.h
#include "second.h"
struct s1 {
struct s2 someS2;
};
// contents of second.h
#include "first.h"
struct s2 {
struct s1 someS1;
};
// recursion”
In most programs, a lot of standard header files and other header files are
included. The compiler parses the contents of the header files and the information is
entered in the symbol tables, occupying space. In projects that are built from
various files, there is a good chance that header files are reincluded and thus processed
again by the compiler, redoing work already done. Since much of the
compiler’s time can be spent like this, care should be taken not to reinclude any header files
(we just saw techniques for avoiding such reinclusion of header files).
Some compilers (for instance Borland compilers) provide an option for handling
Symbol tables corresponding to the parse of header files when they are compiled for the
first time are entered and stored into the disk as files (possibly with .SYM extension).
The next time the header file is included by any other files the information from the
corresponding file already stored is loaded and used. This greatly improves the speed of
the compilation of the re-included header files. Using this compiler facility will not affect
portability in any way. See your compiler documentation for more information about
foo()
{
return 1;
int main()
Preprocessor does not have a separate namespace. It just operates on code before
compiler operates on it. So the preprocessor tokens and the program text share the same
name space and the above example points out the problem due to this fact. This is one of
the reasons why using preprocessor may lead to hard to find errors.
replacement text is stored without expanding it with arguments (the effect is that any
errors in macro expansions are reported only if macros are called in the source text).
#if defined(getchar)
#if defined(getchar(ch))
are equivalent.
int main(){
puts(something);
The use of preprocessor is not only limited to including files and macros.
# if defined (__STDC__)
#else
compiler");
#endif
This also helps in eliminating the redundant declarative code. What will happen if
a header file is included in the program like this? Will the code be included twice?
#include <stdio.h>
#include <stdio.h>
No. It will include the required information only once since conditional inclusion
is made.
// inside <stdio.h>
# if !defined(__STDIO_H)
#define __STDIO_H
...
#endif
When the #include <stdio.h> is encountered for the first time the macro constant
__STDIO_H gets defined. For the second time the preprocessor encounters
again. The same idea can be used for the header files written by the users for guarding
included.
The following example demonstrates how the code can be used at the
#else
testPrint(str1,str2)
#endif
......
#if defined(TEST)
int main( ){
.....
#endif
contain sizeof expressions, enumeration constants, type casts to any type, or floating-type
constants.
naïve one and it cannot recognize (parse) typenames. In addition to that, sizeof is a compile
time operator and sophistication (one such that is available in the semantic analyzer) is
needed to find out the size of the type or the size required for storing the result of the
expression.
So, macro expressions are not ordinary expressions. Always be cautious while using
#undef SOMETHING
#if SOMETHING != 0
#else
#endif
errors. For e.g. if you forget to include <limits.h>, then the following condition becomes
#endif
The ‘assert’ macro (defined in <assert.h>) is a fine example of how you can use
While testing the programs where conditional compilation is done, all the
defined is a special operator in preprocessor which can be used after #if or #elif.
which makes code crisp and to the point and the above code cannot be given in a
single #ifdef statement and will require multiple #ifdefs to achieve the same result.
The main differences between the macros and functions circle around the following
four points:
• Type-checking,
Macro expansion is not just text replacement with arguments; it is C’s version of
pass by name functions. Since the preprocessor does it, macros do not know types and so
Macros also mimic ‘inline functions’, a compile-time operation, and
enable crisp code and faster operation since there is no overhead of creating and
destroying stack frames (i.e. no overhead of function calls is involved). This performance
gain was valuable at the time when C was designed and the machines were slow. Heavy
use of macros in standard header files demonstrates this fact. But this notion is losing
ground today because modern machines are considerably efficient and fast and the extra
Consider a very simple example of writing a macro for swapping two values.
int a =10;
int b = 20;
swap(a,b);
int j = 1;
swap(j,a[j]);
// it prints “j = 2 , a[j] = 1”
However, there is a subtle bug in the program. It seems to work correctly, but
mysteriously the array now contains the values {1,2,1}! What happened in macro
replacement?
j is assigned the expected value. The problem is with the statement a[j] = temp; Here
what happens is, the value of j is modified to 2 due to the assignment j = a[j]. Now this
new value of j is used to access a[j] (i.e. a[2]), which is assigned the
value of temp (which is 1). In other words the assignment a[2] = temp has taken place,
leading to the mysterious change (in the output you print values of the modified j, so output
Then you may ask me to give the correct macro. To your surprise, generalized
Similarly the ubiquitous factorial example used to show the use of recursion
Since many of library ’functions’ in header files are macros, is it true that you
For example:
int (*fp)(void);
fp=getchar;
standard library function. So, there exists both macro and function for certain ‘library
functions’.
When a header file declares both a function and a macro version of a routine, the
macro definition takes precedence, because it always appears after the function
declaration. When you invoke a routine that is implemented as both a function and a
macro, you can force the compiler to use the function version in two ways:
a = toupper(a);
a = (toupper)(a);
#include <ctype.h>
#undef toupper
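As a small sketch of the first technique (the function names below are made up for illustration), parenthesizing the name suppresses function-like macro expansion, and taking the address works only because a real function exists alongside any macro:

```c
#include <ctype.h>

/* Parentheses around the name prevent function-like macro expansion,
   so this always calls the real library function. */
int upper_by_function(int c)
{
    return (toupper)(c);
}

/* A macro has no address; this works because a true function
   named toupper exists in addition to any macro version. */
int upper_by_pointer(int c)
{
    int (*fp)(int) = toupper;
    return fp(c);
}
```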
For comparison, let’s see how the assert ‘macro’ can be implemented using functions.
if (!cond)
abort();
#ifdef NDEBUG
#else
__FILE__, __LINE__)
#endif
So the availability of both the macro and function equivalent makes sure that the
address of that ‘function’ can be taken and passed as function pointer to other functions.
10 STANDARD HEADER FILES AND I/ O
This chapter gives a quick look at the standard header files, in alphabetical
order, to show what they can offer. <stdio.h> is discussed in detail in this chapter because
much of the functionality for standard I/O is available in this standard header, which makes it
important.
There are many header files supported by your compiler. For example,
<graphics.h> and <conio.h> are supported by the Borland and Turbo C compilers; they are
non-standard, platform-specific headers for graphical and console input/output functions
respectively. ANSI C gives a set of standard header files that have to be supported by
every ANSI-conforming compiler; using their functions is highly recommended and is
assured to be portable.
This chapter deals with the details associated with these standard header files.
The header files may contain functions and function like macros. The word
‘function’ in this chapter may refer to both functions and macros defined in it.
It may lead to mystifying bugs if you use any function or variable name that is the
same as a runtime library identifier. Also avoid using variables starting with _
The library function names share the same namespace of the ordinary variables
if(pow)
always evaluates to true. The programmer intended to use a variable named pow,
but since pow is the name of a library function, the ‘if’ condition checks its address,
which is always a non-zero value. In particular library function names should not be
if( p == 0)
malloc() returns 0 if it cannot allocate memory to indicate error. This type of error
indication is better than to give runtime error and then terminate the program abnormally.
Error indication is part of the return values in library functions and it is customary to use
non-zero values for success and 0 for failure (as in the previous case). It is also the duty
of the programmers to check (or to forget/overlook) for the possible error that may have
occurred.
Returning non-zero to indicate success is true only for C library functions. For
UNIX it is the other way round: there 0 indicates success (as in exit(0) for successful/normal
exit). This is because the return value is not for use in the program. Rather it is for
use by the OS (possibly UNIX), so exit() is one such exception. Other examples of indicating
errors by return values are getchar(), getc() etc., which may return EOF.
getc() returns EOF when some error occurs while reading or if it cannot read from
the specified file. It also returns EOF if end-of-file is reached in the file it is reading. It is
left to the programmer to find the cause of the EOF return. For example:
int ch = getc(someStream);
if(ch==EOF)
{
if(ferror(someStream))
clearerr(someStream);
else
Indicating errors by return values poses no problem as long as the values for error
and the valid values that should be returned by the function do not overlap.
the error value (i.e. 0) and valid return value because 0 is not a valid pointer value. But
int i = atoi(str);
if( i == 0)
Can you spot the problem? atoi converts the string value “0” successfully to
integer value 0 and returns the same. The checking condition checks for 0 and mistakes it
to be an error in conversion and issues an error. In this case, the valid
return values and the error condition overlap, resulting in a subtle bug/error. Another such
If you were to design the interface for such functions, an alternative approach can
be used.
separating the return value which can either return true/false (1/0) and the value that is
int i;
if( flag == 0)
this means extra effort to programmer, but works for most of the cases.
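One way to sketch such an interface is shown below. The name parseInt and its exact contract are assumptions made for illustration, and strtol is used for the conversion because, unlike atoi, it reports where parsing stopped. The status comes back as the return value and the converted number through a pointer, so a valid “0” can no longer be mistaken for a failure:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Returns 1 and stores the converted value in *out on success;
   returns 0 (leaving *out untouched) if str is not a valid int. */
int parseInt(const char *str, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(str, &end, 10);
    if (end == str || *end != '\0')   /* no digits, or trailing junk */
        return 0;
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                     /* does not fit in an int */
    *out = (int)value;
    return 1;
}
```

A caller then writes if(parseInt(str, &i)) and uses i only when the call reports success.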
if(myGetChar(&ch))
// no problem
else
On the other hand the convenience of using old shortcuts like printf(“%d”,
This header file contains the assert macro, which is the minimal support available in
the C language for debugging purposes. Enough has been discussed already about its use and
implementations. Each of the implementations of the assert macro has its own merits and
#ifndef NDEBUG
#define assert(cond,fileName,lineNo) \
if(cond) \
{} \
else \
__assertFun__ (cond,fileName,lineNo)
#else
#define assert(cond,fileName,lineNo)
#endif
// The function definition should not be included twice and so should not
#ifndef NDEBUG
fflush(NULL);
fflush(stderr);
abort();
#endif
One point to note while using the assert statements is that there shouldn’t be any side-
effects involved in assert expressions. For example can you see what may go wrong with
this statement?
assert(val++ != 0);
In this case the assert expression involves side-effects. So the behavior of the code
becomes different in case of debug version and the release version thus leading to a
int *foo(){
assert(s != NULL);
return s;
This is wrong because ‘assert’ would be disabled when code is released and so
there would be no way to handle the dynamic memory allocation failure at runtime. So a plain if
statement checking the condition, with the corresponding remedy statement, has to be
given.
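A minimal sketch of that advice follows. The function name allocInts and the choice to report the failure and return NULL to the caller are assumptions; other remedies (retrying, freeing caches, exiting) are equally possible:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocates a zeroed array of count ints (count is expected to be
   non-zero). Returns NULL if the allocation fails; the check stays
   in place in release builds, unlike an assert. */
int *allocInts(size_t count)
{
    int *s = calloc(count, sizeof *s);

    if (s == NULL) {
        fprintf(stderr, "allocInts: out of memory\n");
        return NULL;
    }
    return s;
}
```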
10.6 <ctype.h>
This header file contains functions/macros that are used for testing the
explicitly checking the values of the characters that are particular to a character set (say
10.7 <errno.h>
occurs then the corresponding handler will be called. Another approach by C is by letting
the user to check for any erroneous conditions. ‘errno’ is a global value available to the
Each possible error recognized by the system is assigned a unique value. The
standard library functions may also set errno. The most recently occurred error would be
available in errno. It should be remembered that examining errno does not
reset the error condition, nor do any other library functions reset it to 0. So to detect an error,
first set errno to zero, call the library function and after that verify that the library
implementation.
Return Value:
#include <errno.h>
#include <math.h>
#include <stdio.h>
int main(void)
{
    double result;
    double x = -1;
    errno = 0;                  /* clear any earlier error first */
    result = log(x);
    if(errno == EDOM)
        perror("domain error");
    else if(errno == ERANGE)
        perror("range error");
    else
        printf("%f\n", result);
    return 0;
}
perror() is the standard library function that writes a message describing the current value of errno to stderr.
10.8 <float.h> and <limits.h>
These two header files define constants that specify the various numerical limits.
Since library functions are included in thousands of source files, efficiency of the
library was a driving force behind designing library functions. For example, consider the
pow() library function in <math.h>. The exponentiation operation is not a part of the
language (as in some other languages). The programmer has to explicitly invoke the
function to perform the operation. This is because the language design decision was
made such that the features that may be implemented efficiently using runtime functions
10.11 <setjmp.h>
This header file has the macro setjmp and the function longjmp to support non-local jumps.
setjmp saves the entire state of the computer’s CPU in a buffer declared in the
jmp_buf jbuf; statement and longjmp restores it exactly, with the exception that the register
variables being used at the time of the longjmp call are going to be lost forever. Note that
any pointer to memory allocated from the heap will also be lost, and you will be unable to
access the data stored in the buffer. The solution is to keep a record of the buffers’
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
jmp_buf jbuf;
int callSomeRoutine();
int main(){
int val;
val = setjmp(jbuf);
if(val != 0){
exit(1);
}
callSomeRoutine();
return 0;
}
int callSomeRoutine(){
printf("\nThis is from the subroutine");
longjmp(jbuf,1);
} // it prints
It is well known that, the usage of goto can be completely eliminated by using
This standard header file provides the prototypes for signal and raise functions.
signal() is used for installing the handler for the various signals,
¾ SIG_DFL
¾ SIG_IGN
These are the six standard signals defined by the ANSI standard. In addition to these, the
implementation may support more signals. The user cannot declare his/her own signals,
void myHandler(int);
signal(SIGSEGV, myHandler);
.....
raise(SIGSEGV);
Handlers are meant for recovering or resuming from the exception that occurred. There
are predefined signal handlers available for each signal; to install the default
handler for a signal sigName, call signal() with sigName as the first argument and
SIG_DFL as the second. To ignore the signal sigName, install that sigName with the fName value
SIG_IGN.
signal(SIGFPE, SIG_DFL);
signal(SIGILL, SIG_IGN);
installed handler if it can successfully install the function, and in case of failure, it returns
SIG_ERR.
SIGSEGV");
Under event management, interrupts can be handled using signals. Examples for such
<signal.h> also defines the type sig_atomic_t. This is the type with which the
signal handlers can communicate with other functions. The word ‘atomic’ indicates that any
assignments that are made to the objects of this type are done atomically, without being
interrupted.
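The pieces described above can be put together in a short sketch. The names gotSignal, onSignal and raiseAndCatch are made up for illustration, and SIGINT is chosen only because raising it with a handler installed is harmless. The handler does nothing except set a sig_atomic_t flag, which the rest of the program inspects afterwards:

```c
#include <signal.h>

/* Flag shared between the handler and ordinary code. */
static volatile sig_atomic_t gotSignal = 0;

static void onSignal(int sig)
{
    (void)sig;          /* unused */
    gotSignal = 1;      /* the only thing the handler does */
}

/* Installs the handler, raises SIGINT, restores the default action.
   Returns 1 if the handler ran, 0 if not, -1 if it could not be installed. */
int raiseAndCatch(void)
{
    if (signal(SIGINT, onSignal) == SIG_ERR)
        return -1;
    gotSignal = 0;
    raise(SIGINT);
    signal(SIGINT, SIG_DFL);
    return gotSignal;
}
```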
This header file is for the support of variable argument lists that are described in
the chapter on functions. It has the declaration of type va_list and the macros va_start,
10.14 <stddef.h>
This is the most important header file that almost all projects use because it
contains functions for I/O, file management etc. It is a big header file: nearly one
third of the functions/macros in the standard library are from this header file.
The C language provides no facility for I/O, leaving this job to library routines.
[Ritchie et al.] illustrates one difficulty with this approach. In machines where int is not
the same as long, to indicate the difference the format specifier may be written as “%D”
instead of “%d”.
“Thus, changing the type of x involves changing not only its declaration, but also
other parts of the program. If I/O were built into the language, the association between
the type of an expression and the format in which it is printed could be reconciled by the
compiler.”
In other words, separating the I/O part from the language proper may have been a
good design approach. It keeps the language small and leaves I/O for the libraries
to take care of. But this has a few disadvantages. By not being part of the language, the I/O
int i;
printf("%d", i);
Here the format string is required to specify the type of the argument i.
If I/O were part of the language, this redundant information would not be required.
A mismatch between the number of items in the format string and the arguments cannot
be found out at compile time for the same reason. And also if a change is made in the
type or number of arguments in the I/O routine, the change must be reflected in the
",fVar,sizeof(fVar));
This is an example of the problems that may arise due to the mismatch between the
format string and the arguments. The first argument is a floating point variable and it
takes 4 bytes. But the format string expects the first argument to be an integer (%d) and
so reads two bytes from the stream (it doesn’t do any typecast that you may expect). The
effect is reflected in the argument read next (i.e. the sizeof(fVar)) leading to wrong
output.
The main aim behind designing the standard I/O library is the provision for
efficient, extensive, portable file access facilities and easy formatting. The routines that
invisible to the user, which minimizes both the number of actual file accesses and the
that are read or written pass through FILE’s character buffer. For e.g. suppose you send data character
by character to a file using a FILE *. You are actually writing to that FILE’s character
buffer, and when that buffer gets filled the data is written in bulk (block) to the file
intended to be written. Similarly, while reading from a file or any input device the data is read
through the FILE’s buffer, and when the buffer is exhausted the read primitive/system call is invoked to refill it.
These details are completely hidden from the users. This buffering limits the
use of system calls (only a few calls to the read/write system calls are made). It also avoids unnecessary
manipulation directly by the user (the user need not maintain the buffer explicitly) and
offers a high level of abstraction, so that the routines can be implemented on any supported OS.
¾ file pointer,
¾ buffer pointer,
¾ file cursor.
It is the pointer to the FILE structure. From this pointer only the information
regarding the file can be accessed. In other words, it is the address of the allocated stream
This is the character pointer that points to the character buffer that is internally
maintained. It is only when the buffer pointer reaches the end of the buffer that the buffer is
flushed.
It is the pointer that keeps track of the current access position of the file used.
Whenever you use functions like fread, fseek, fsetpos etc. you are actually updating
the value of the file cursor. Similarly fgetpos, ftell etc. return the current position of the file
cursor.
10.15.4 Streams
destination of data that may be associated with a disk or other peripherals. In usage,
streams can be considered to be file pointers. For e.g., the prototype for fflush is
fflush(stdout);
Predefined streams that are opened automatically when the program starts are
stdin, stdout, stderr (standard input, standard output, standard error; devices 0, 1, 2
respectively). A device (also known as file handle) is a generalized idea of file and can be
treated as if it were a file. For example, say stdin may be the keyboard. C treats the
information coming from the keyboard as if it were from a text file (it even returns EOF
Practically both the stdout and stderr write to the console only:
have the same effect. But the second one is preferable to the first one for two main
reasons.
Consider the case of redirecting the output to other files as input (for example: as
pipes in Unix). In such cases, if the error message is written to the stdout that will go as
junk input to the receiving file. So, prefer using stderr to stdout in issuing errors.
The second reason is the difference in way stdout and stderr works. The stdout
writes to buffer, so it may take some time to send the output to the device, whereas the
write to stderr is an un-buffered one. This means, the information written to stderr is
immediately sent to the device, without being stored and waited in the buffer to be
flushed.
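As a small sketch of this advice (the helper name reportError and the "error: " prefix are assumptions), error messages can be routed through one function that always writes to stderr, so they neither pollute redirected output nor wait in a buffer:

```c
#include <stdio.h>

/* Writes "error: <msg>" and a newline to stderr. Returns the number of
   characters written, or a negative value if the write failed. */
int reportError(const char *msg)
{
    return fprintf(stderr, "error: %s\n", msg);
}
```

With output redirected to a file or a pipe, a message written this way still appears on the console while the normal output goes to its destination.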
So while writing error messages to stdout, use fflush(stdout) to write out the
output fully, to make sure that the message is shown without delay, because the possibly
following abort() function makes the program terminate without properly flushing and
fflush(stdout);
abort();
Or simply use:
abort();
2) blockI/O
When you want to do input or output to a file, you have a choice of two basic
mechanisms for representing the connection between your program and the file: file
descriptors and streams. File descriptors are represented as objects of type int, while
Internally they use the primitive, low-level operations for reading and writing of files.
You can use the functions like open that return the file descriptors rather than the file
pointers and are considerably low-level. That will increase the efficiency, but the
portability will be lost. The other operating systems may not support the same primitives
and facilities.
However, the main advantage is that the streams provide richer, powerful
facilities and formatting features in the form of numerous functions. The streams take
care of lot of details like buffering internally so that the programmer does not have to
bother about them. For example, the Unix OS has lot of system calls that can access and
get the job done from the OS directly. But the C programmers for Unix still prefer to use
The operating system primitives (like Unix system calls), on the other hand, provide only the
basic facilities like block read and write, and the programmer has to take care of managing
and keeping track of the data. The operating system primitives are advantageous in some cases, for
example when operations have to be done that are particular to a device. In such cases,
So, the selection between the OS primitives and standard I/O functions depends
on the need. To have higher portability stick to streams unless you want some direct
access of special functionality. You can also open a file initially as low-level one and
In older versions of C, ‘long float’ was a synonym for double. So a format specifier
for long float and double remains the same (you use %lf for printing the double value in
printf/scanf)
Exercise 9.2 :
There were three records stored in the file pointed by fp, but the following code
while(!feof(fp))
puts(studentRec.name);
Can you guess why there is no distinct format specifier for double in printf/scanf
The functions getchar() and putchar() get/put a character from the stdin/stdout and
getchar() == getc(stdin);
putchar() == putc(int,stdout);
Character input/output from stdin/stdout is frequently used,
and using the getc and putc versions that take a FILE * as one of their arguments is tedious.
and they act to get/put a char from the specified file. They are macros. The equivalent
and the functionality of the macro and the function versions remain the same. This
is one of few explicit places in standard library where both the macro and function
As you can see, the argument and return types are int, which is counter-intuitive. There
¾ to enable the characters of size more than one byte to be handled by the
implementations
// do something
This problem is eliminated if the return type is int and the ch is of type int,
// do something
10.15.7 getchar()
getchar() returns only after a newline character has been entered. This may create
problems in interactive environments if getchar() is used there for user response. Since
getchar() returns only on seeing a newline and returns only one character, the
remaining characters that may have been pressed shall remain in input buffer. This may
aggravate the problem. The problem may be solved by replacing getchar() by
getch()/getche() (but they are not part of standard header file and are declared in
The <stdio.h> functions fseek and ftell use long for file positions. This limits the file size that
can be handled by these functions to LONG_MAX. To avoid this, ANSI has defined two
functions fgetpos() and fsetpos() that use fpos_t (which is a platform dependent file
One of the ways to run one program from another is using ‘system’ function:
where the content of the commandStr depends on the source operating system. It is
normally sent as command string for the shell (command interpreter) of the underlying
operating system.
The major disadvantage of the system function is that the program cannot access
and use the output of the command it runs in the 'shell'. For this there are two routines
provided:
these are non-standard ones that are available in some systems that return the file pointer
for manipulating the output of the result of the execution of the 'commandStr'.
10.16 <stdlib.h>
10.16.1 exit()
exit() takes integer as argument. exit(0) indicates normal program termination and
successfully and want to return back use exit(0) (logically to OS and User). If you
encountered any abnormal condition or runtime error and want to quit prematurely use
exit(1). This return value may be used by the O.S (say UNIX) to see if the program has
successfully executed or not (but this value is normally ignored). exit() calls up all the
abort() is used in case of serious program error and thus for abnormal program
termination. abort() does not return control to the calling process. By default, it
terminates the current process and returns an exit code of 3. abort() doesn’t call any exit
handlers (atexit()) and immediately returns control to O.S. (using signals abort() can be
made to call any cleanup functions). It also doesn’t flush the stream buffers. It may be
void abort(void)
{
raise(SIGABRT);
exit(EXIT_FAILURE);
}
As you can see, there are differences between using exit() and abort() and
exit(int) causes atexit() to be called and other termination actions to take place.
abort() causes the program to terminate abnormally and immediately without any rollup
actions.
10.16.3 atexit()
You may need to perform some cleanup tasks or other tasks before the
program terminates. For this the atexit() function comes in handy. It takes a pointer to a
function as argument. The functions registered with atexit will be called before the program
void rollup()
int main()
if(atexit(rollup) != 0)
exit(1);
This prints “I am being called by exit()” because while exiting from program exit
The functions registered with atexit are also called as exit-handlers. Since exit-
handlers are called for roll-up activities and resource-freeing activities, they are
10.17 <string.h>
All of the functions that are listed in <string.h> rely on null termination of the
strings. If the terminating null character is missing then these functions will continue processing, possibly
corrupting memory as they go. For most of the string family of functions listed in
<string.h> you will see a corresponding ‘n’ family of functions i.e. strcpy and strncpy,
stricmp and strnicmp (both non-standard extensions), strcat and strncat etc. The ‘n’ functions perform the
same tasks as their counterparts with the exception that they take an extra
parameter ‘n’ which specifies how many characters in the string to operate on. It is
strongly recommended that this ‘n’ family of functions be used rather than their
counterparts if you cannot always ensure that the strings you use are properly null
terminated.
Note: However, use of strncpy (a function in this ‘n’ family of functions) requires
special caution. The problem is its inconsistent behavior: it terminates the
destination string with a null character only when the source is shorter than n characters. So don’t depend on the
char t[10],s[10]="something";
strncpy(t,s,4);
t[4] = '\0';
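The same precaution can be wrapped in a small helper so that the terminator is never forgotten. This is only a sketch; the name copyBounded and its signature are assumptions, not a standard library function:

```c
#include <string.h>

/* Copies at most dstsize - 1 characters of src into dst and always
   terminates dst with '\0'. Returns dst. */
char *copyBounded(char *dst, const char *src, size_t dstsize)
{
    if (dstsize == 0)
        return dst;                 /* no room even for the terminator */
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';        /* strncpy may not have added one */
    return dst;
}
```

For example, copying "something" into a 5-character buffer this way yields the string "some".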
The standard library particularly <string.h> consists of many functions that are
weird looking and rarely used. For e.g. the strspn and strcspn functions.
is the prototype for strspn (string span). It returns maximum length of the initial sub-
strspn("oursystem","aeiou");
will return 2 because the initial substring ou is made entirely of the characters in the
strcspn is the complement of the strspn function. It returns the span of the first
strcspn("system","aeiou");
yields 4 because the initial substring syst is made up of the characters that are not in
second string.
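Although they look odd, the two functions combine naturally. The sketch below (the function name firstWord and the choice of separators are assumptions) skips leading white space with strspn and then measures the first word with strcspn:

```c
#include <string.h>

/* Finds the first word in s. Stores a pointer to its first character
   in *start and returns its length (0 if s holds only separators). */
size_t firstWord(const char *s, const char **start)
{
    const char *separators = " \t\n";
    const char *p = s + strspn(s, separators);  /* skip leading separators */

    *start = p;
    return strcspn(p, separators);              /* length up to next separator */
}
```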
Exercise 10.1 :
To get string input from the keyboard which one of the following is better?
1) gets(inputString)
Exercise 10.2:
printf("%s",str);
printf(str);
10.18 <time.h>
It has functions for manipulating date and time. One of the many requirements for
standard library but one can be easily written using the clock() function available in this
header file.
clock_t target;
; /* null statement */
should be done in seconds, then new function need not be written and the existing itself
The constant CLOCKS_PER_SEC is the number of clock ticks
per second. So the call that is given here makes the program wait for 1 second (In some
clock() function returns the processor time elapsed since the program invocation.
So this function can be used to find the efficiency of the code and for testing purposes.
clock_t start,finish;
long duration,i;
start = clock();
for(i=0;i<100000000L;i++) ; // the code to be tested
finish = clock();
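The difference between the two clock() readings is in ticks, so it has to be divided by CLOCKS_PER_SEC to get seconds. A minimal sketch of that step follows; the helper names elapsedSeconds, timeCall and busyLoop are made up for illustration:

```c
#include <time.h>

/* Converts a pair of clock() readings to elapsed processor seconds. */
double elapsedSeconds(clock_t start, clock_t finish)
{
    return (double)(finish - start) / CLOCKS_PER_SEC;
}

/* Sample workload: an empty loop that the compiler may not remove. */
void busyLoop(void)
{
    volatile long i;

    for (i = 0; i < 1000000L; i++)
        ;
}

/* Times one call of work() and returns the processor seconds it used. */
double timeCall(void (*work)(void))
{
    clock_t start = clock();

    work();
    return elapsedSeconds(start, clock());
}
```

Note that clock() measures processor time used by the program, not wall-clock time, so time spent waiting for input is not counted.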
Object-oriented design is a good design methodology that helps in overall design of the
software. This chapter explores the relationship of object-orientation and C and how to
Take an abstract data type (say stack). Object-oriented languages restrict
access to the stack to its member functions (methods). This prevents illegal
use of the data by the programmer, purposefully sometimes and accidentally most of the
times. In this way only certain operations are allowed on it, and only the necessary details are known
to the user (programmer). The same can be enforced in a procedural language like C by
strict standards and careful coding by the programmer (but the same level of design
// Note the static qualifier that limits them to file scope.
elementType pop();
void push(elementType );
boolean isEmpty();
boolean isFull();
At a high level of abstraction each file can be treated as a class. The global
variables become ‘public’ variables and the static variables become ‘private’
variables. The functions declared act on the data available, much like the methods in
object-oriented languages that act on the class/object data. Most of the functionality of
object-orientation can be viewed like this. But the ‘variables’ of type ‘stack.c’, that is the
Object-orientation is strongly based on procedural nature (even though this fact may not
illustration of this idea, let me show how object-orientation can improve readability.
sscanf(char *,...)
sprintf(char *, ...)
fscanf(FILE *,...);
fprintf(FILE *,...);
that take FILE * as arguments. A look through the standard library shows that there are
many clones for the same scanf and printf functions, that are general purpose (in user’s
point of view).
str.scanf();
str.printf();
// and
fp.scanf();
fp.printf();
etc. where the printf and scanf names are used with the same name, but after a
qualification. Invention of new names is not necessary and the readability also increases
because of the usage of printf and scanf in different namespaces. Or else a single version
of overloaded scanf or printf functions can be provided and depending on the arguments
Extending the same idea to the FILE object: you can look at fopen() as a
constructor, fprintf, fscanf, fgetc etc. as member functions and fclose() as the destructor. This
leads to simplicity of organization of ideas and encapsulation and power, and is thus the
done in programming languages such as C. Nothing could be further from the truth. There is even an
existence theorem stating that Objective-C is implemented in C (there are even some
using procedural languages efficient, easy and maintainable?” shall be a better question
to discuss about.
the designs. But such designs are best implemented using object-oriented languages.
with some difficulty. Even if you don’t plan or need object-oriented design for your
language must map object-oriented concepts into the target language, whereas the
Object-oriented languages enforce the constraints externally but the base remains the
same. To illustrate, the early implementations of C++ converted the C++ code to C code
(does it look the same as object-oriented programs written in C?). Eiffel is an
object-oriented language. Eiffel compilers translate source programs into C. A C compiler then
compiles the generated code to execute it later. Another such example is the DSM
language.
Object-orientation is not the panacea for all problems in programming and one
such example is the performance degradation✝. Of course, the main objective in using
object-orientation is not power or efficiency but making the programs more robust,
maintainable and reusable etc., and to make the programmer’s life easier. So it is worth
code reuse. Conventional languages don’t have that mechanism, so for code reuse extra work
In [Martin et al. 1991] the authors have mentioned three basic ways to do this
1. Physically copy code from super-type modules (copy the code and have proper
2. Call the routine in super-type modules (call the copied modules from the extra
code with proper maintenance code. This works as long as all the information
regarding the subtypes are known and clear of what aspects to inherit),
✝ This is not a categorical statement. It’s just a comparison of the performance of procedural vs. object-oriented
languages, to give Fortran and C as examples of high-performance languages.
3. Build an inheritance support system.
easily implemented. This is because of its features like presence of pointers - particularly
function pointers, its loosely typed nature, dynamic memory allocation. Let’s now discuss
this theoretically, with an example after that illustrating how these concepts
materialize.
¾ Representing Classes
between the class data members and the structure members. In other words, the classes
methods of the class with the structure should be put into a separate file unit.
¾ Encapsulation
¾ Representing Methods
¾ Creating objects
Creating objects is just the same as creating structure variables. But the
explicitly. The dynamic allocation facility (malloc and free) can be used for that.
¾ Inheritance
¾ Miscellaneous support
Object-oriented languages support the concept of pointer to the self (like ‘this’
pointer in C++ and ‘self’ in Ada). Passing an extra parameter (as first parameter) to all
oriented principles.
11.4.1.1 Implementing the basic parts
struct stack{
int top;
elementType array[STK_SIZE];
};
as a data-structure in form of a structure that has encapsulation and the methods are
If you want the variable (object) of type stack, it’s just as simple as,
stack aStack;
But this has function pointers that have to be initialized. The function pointers
{
if(stk->top)
return stk->array[stk->top--];
stk->array[++stk->top] = element;
else
Now an initializer function (constructor) has to be called for each variable before
stk->top = 0;
stk->pop = popFun;
stk->push = pushFun;
stk->isEmpty = isEmptyFun;
stk->isFull = isFullFun;
}
The code which uses this stack structure looks like this:
int main()
stack aStack;
init(&aStack);
aStack.push(&aStack,20);
printf("%d",aStack.pop(&aStack));
This implementation just does what is required and can be improved. The space
can be allocated dynamically and freed when the scope exits. For such dynamic
if(stk==0)
object”);
exit(1);
stk->top = 0;
stk->pop = popFun;
stk->push = pushFun;
stk->isEmpty = isEmptyFun;
stk->isFull = isFullFun;
allocation:
stack *someStack;
init(someStack);
In the structure, you can see that for each function (method) supported in the
structure the function pointers occupy space. Since the functions are going to remain the
same for all the objects associated with the class they can be put in a separate structure
called a ‘class descriptor’. This will make the Stack structure contain:
struct stack{
int top;
elementType array[STK_SIZE];
};
struct classDescriptor{
elementType (*pop)(Stack *);
};
Now it is enough for every object of the stack type to contain a pointer to the
classDescriptor. This will greatly minimize the size of the object required to support the
member functions. The price is the extra level of indirection that has to be applied for
The same idea of class descriptor is used for implementing inheritance. Single
inheritance is direct and easy. Add the new data and class descriptors to the new class
method.
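The class-descriptor arrangement described above can be sketched in C as follows. This is only a minimal sketch, not the original listing: the names stackClass, stackInit, pushFun, popFun, isEmptyFun and isFullFun, the cls member, the STK_SIZE value of 8 and the choice of int for elementType are all assumptions made for illustration.

```c
#define STK_SIZE 8                 /* assumed capacity */
typedef int elementType;           /* assumed element type */

struct stack;                      /* forward declaration */

/* One descriptor shared by every stack object. */
struct classDescriptor {
    int         (*push)(struct stack *, elementType);
    elementType (*pop)(struct stack *);
    int         (*isEmpty)(const struct stack *);
    int         (*isFull)(const struct stack *);
};

/* Each object carries only its data and one pointer to the descriptor. */
struct stack {
    const struct classDescriptor *cls;
    int top;                       /* number of elements stored */
    elementType array[STK_SIZE];
};

static int isEmptyFun(const struct stack *stk) { return stk->top == 0; }
static int isFullFun(const struct stack *stk)  { return stk->top == STK_SIZE; }

/* Returns 1 on success, 0 if the stack is full. */
static int pushFun(struct stack *stk, elementType element)
{
    if (isFullFun(stk))
        return 0;
    stk->array[stk->top++] = element;
    return 1;
}

/* Returns the top element, or 0 if the stack is empty. */
static elementType popFun(struct stack *stk)
{
    if (isEmptyFun(stk))
        return 0;
    return stk->array[--stk->top];
}

static const struct classDescriptor stackClass = {
    pushFun, popFun, isEmptyFun, isFullFun
};

/* The 'constructor': attach the shared descriptor and reset the data. */
static void stackInit(struct stack *stk)
{
    stk->cls = &stackClass;
    stk->top = 0;
}
```

A call then goes through the extra level of indirection, for example aStack.cls->push(&aStack, 20). However many methods the descriptor holds, each object pays for only one pointer.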
Similarly, features like polymorphism and exception handling can be handled. These
features are exploited more by the object-oriented languages that generate C as their target
code than by C programmers themselves. The older versions of C++ and Objective-C had similar
in it (after some work). Many of the languages and application systems indeed generate C
code as output.
For users who are interested in both C and object technology, the
solutions are the object extensions to the C language such as C++ and Java (and possibly C#
in the near future). But it will be interesting to see how C itself can be used to emulate
As I have said, it is possible, but it doesn’t mean that we have to use object
mind. C is better used as a system programming language and that’s what it is meant for.
language. But what about the millions of lines of code already written in C? If I want object-oriented
design, should I start everything from scratch and throw away all the hard work
previously done? In such cases the idea of using C for object-oriented design is
tough, but it will serve the purpose. The example we just saw explores the possibilities of
Today C++ and Java are the most famous object-oriented languages.
C++ follows a merged approach with C, as opposed to the orthogonal approach of
Objective-C. This means that C programs can still be written in C++ and C is
C# is a new language from Microsoft. All three are object-oriented and are
based on C. They have the strong base built by C. They improve upon C by adding
object-orientation, removing the problematic and erroneous parts of C and adding upon
Separate chapters are devoted to discussing the languages C++, Java and C# in
comparison with C.
11.6 Objective-C
an additional layer of object orientation and the code is in turn converted to bare C code.
This kind of design has a particular advantage. The syntax need not be the same as that of
- Genesis 11:1
'C with Classes' was Bjarne Stroustrup's answer to the need for an object-oriented
extension to the C language. It was later modified and named C++. C++ is a separate language that is
complex and huge compared to C, and its approach to problem solving itself differs
from that of C. One of the main reasons behind the popularity of C++ is its backward
compatibility with C.
Almost every C program is a valid C++ program. C++ was first implemented as a
translator (Cfront) that converted C++ into C. Now almost all the available C++ compilers
convert C++ programs directly to object code. C++ improves upon C by changing the
in C++, you should remember certain important points that are significantly different
from C.
C is a systems programming language and so has raw power, and C++ enjoys the
same due to its backward compatibility with C. For example the use of pointers in C++ is
The preprocessor is a naive tool for serious programming, whereas the virtual functions of C++
are very powerful and provide runtime polymorphism. Let's look at an example where
significant. 'Message maps' are used for passing specified messages to derived class
member functions. Had MFC used virtual functions for messages, it would have had to allocate
about 11,200 bytes for each control that the application needs. Each control has to inherit from
a hierarchy of nearly 20 window classes derived from CWnd, and CWnd would declare
virtual functions for more than 140 messages. Assume that sizeof(int)==4, so that a
vtbl needs a 4-byte entry for each function. Each control would then need
at least 11,200 bytes (140 * 4 * 20) to support virtual functions. So it is better to go
for macros, which have no such memory overhead in such cases. This is an example
of a situation where the language feature to use is selected based on the requirement
at hand.
My point is that, due to its low-level nature, C has much power. Since C++ is
Even though C++ is a superset of C, there are subtle differences between the two.
maintaining the compatibility and migration from C to C++ (In particular from ANSI C
to ANSI C++).
static i = 10;
const i = 10;
are no longer legal. This implicit assumption of int was a subtle source of
• Implicit char to int is removed. In C the automatic conversion from char to int
is made in the case where char variables involve in expressions. But in C++
int getchar();
char ch = getchar();
The implicit conversion from char to int is not valid in C++ because C++ is
more strongly typed than C.
The reason is same. C++ is more strongly typed than C. This explicit casting
implicit conversion.
prohibited. Thus,
int main()
{
main();
Calling main again makes no sense and C++ corrects this problem by
C. This is primarily to improve the support for two-byte coding schemes like
• C Console and File I/O are different. But in C++ there is not much difference
• consts can be used for specifying size of arrays in C++ (but not in C)
• In C the size of enumeration constant is sizeof(int) but not in C++. In C++ the
• Using prototypes for functions is optional in C. But in C++ functions are not
• Tentative definitions are not allowed in C++. The example that we saw for
int i;
int i=0;
// invalid in C++.
#define NULL 0
(this is for the same reason: C++ is a strongly typed language)
C++ programmers prefer using plain 0 (or 0L for long) to using NULL.
char * cptr = 0;
• In C the global const variables by default have external linkage. All such const
variables that are not initialized will be automatically set to zero. But in C++
global const variables have internal (static) linkage and all the const variables must be
explicitly initialized.
const int i;
Does the following code (in C++) have static or global linkage?
constant pointer. Hence it has global linkage. To force static linkage, modify
string constants.
By mistake the programmer may have forgotten to give space for the NULL
will be encountered somewhere else. Since the access is beyond the limit of
• There exist some very subtle differences between C and C++. One such
yields an lvalue.
int i = 0;
++i++;
(++i)++;
++i = 0;
because int is the most efficient data-type that can be handled. But it wastes
memory too. If the size of an integer is 4 bytes then the character constant also
if(sizeof('a')==sizeof(char))
    printf("this is C++");
else if(sizeof('a')==sizeof(int))
    printf("this is C");
• In C++ you cannot use jumps to bypass declarations with initializations unless
they are enclosed in a separate block. Such jumps can occur
switch(something)
break;
case 'b' : j = 0;
break;
// error.
}
This is because, if the declarations and initializations are skipped and the
goto end;
int j = 0;
end :
if(1 > 2)
int j = 0;
#ifdef __cplusplus
#define NULL 0
#else
#endif
Code that should be compiled only by a C compiler or only by a C++ compiler
can be selected this way, using the predefined macro __cplusplus for
conditional compilation.
• Empty parameter list in C means any number of arguments can be passed to
int fun();
int i = fun(10,20);
Thus in C++,
int fun();
// and
int fun(void)
are equivalent.
The reason why int fun() means that it may take any number of arguments is
int fun()
int a, int d;
int fun();
struct someStruct{
};
because the C++ functions are capable to be overloaded and the information about the
int fun(int);
to tell the compiler that the identifier ‘fun’ is a function name. The linker just checks for
int fun(int);
// and
int fun(float);
// and
that the compiler has to pass not only the function name but also
the information about the arguments. This is done by a technique called 'name
mangling'.
'Name mangling' means that the function name, the argument
types and other related information (such as whether it is const or not) are all encoded to give a
unique function identifier to the linker. The job of the linker becomes easy to resolve the
If 'name mangling' is not done the function has C linkage; otherwise it has C++
linkage.
If C functions are called from C++ programs then it is likely to show linker errors
saying that the function definition is not found. This is because the functions in C++
programs have C++ linkage and the functions compiled in C have C linkage as we have
// in cProg.c
int cfun(){
    // some code
}
//in cppProg.cpp
int cfun();
int main(){
    cfun();
}
To make this C function acceptable in C++ code the declaration for ‘cfun’ should
be changed as follows,
//in cppProg.cpp
extern "C" int cfun();
int main(){
    cfun();
}
Preceding the function declaration by extern "C" instructs the C++ compiler to
follow C linkage for that function i.e. 'name mangling' is not done for that function. As
we have seen the 'name mangling' is necessary for function overloading. So if a function
//in cppProg.cpp
If more than one C function has to be declared with C linkage, then those
extern "C" {
    int cfun1();
}
Otherwise they can be put in a header file and that inclusion can be declared to
have C linkage,
extern "C" {
#include "cfundecl.h"
}
This forces all the functions declared in the header file "cfundecl.h" and used in
this C++ file to have C linkage. If you think wrapping every C header file
in extern "C" is tedious, another tactic can be followed. The declarations may have
to be used by both C and C++ compilers, and C compilers don't recognize the keyword
// in "cfundecl.h"
#ifdef __cplusplus
extern "C" {
#endif
int cfun1(int);
#ifdef __cplusplus
}
#endif
Or this conditional can be still simpler. Just strip the two tokens, extern and “C”
#ifndef __cplusplus
them
// to white-space
#endif
Note: This kind of special inclusion of C header files is necessary for the non-standard
and user-defined header files only. For standard header files, ordinary inclusion is
enough.
#include <cstdio>
This kind of using C functions from C++ code has many advantages. One big
advantage is that the legacy C code can directly be reused in C++ code.
The underlying representation for the C++ classes and plain C structures is almost
the same.
class cppstring{
private:
int size;
int capacity;
char *buff;
public:
cppstring();
// and destructor
};
struct cstring{
int size;
int capacity;
char *buff;
};
The memory layouts of the structure 'cstring' and the class 'cppstring' are almost the same.
In other words, the C++ compiler treats the class 'cppstring' just like the 'cstring' structure. It
means that the member functions are internally treated as global functions and the calls to
the member functions are resolved accordingly. They do not occupy space in the memory
layout of the object. This makes the C++ object model very efficient. This is to show how
The advantage is that the code like the following can be used,
void print(cstring *cs){
cs->capacity);
// the old legacy code for cstring can be used for accessing
int main(){
print((cstring *) cpps);
This equivalence between the struct and the class holds only if the class has no virtual
members, has no virtual base class in its hierarchy, and has no member objects with either
virtual members or virtual base classes. In short, the class should not be associated with
anything 'virtual' in nature. This is because the memory layout will then contain a pointer
to the virtual table, which makes the class and structure representations no longer equivalent.
Another point to note is that, for the class and the struct to be equivalent,
the data of the class should not be separated by access specifiers (private/ public/
protected). This restriction comes from the ANSI standard, because the layout may
differ when access specifiers intervene. But almost all compilers available now
don't make any difference due to this, and so this point can be safely ignored. To put it
together, you can safely access a C++ object's data from a C function if the C++ class
has,
• all its data in the same access-level section (access specifiers private
/protected /public).
Nevertheless this property of the object model of C++ is used in the applications
such as storing the data objects in DBMS, network data transfer of objects etc. This
makes the legacy C code be used in the object-oriented code, backward compatibility
Other than the differences discussed between C and C++ there are other subtle
The main() has to be compiled by a C++ compiler only, because the code for
static initialization of the C++ objects can be inserted only by the C++ compiler.
When mixing C and C++ functions make sure that both the compilers are from the
same vendor. For example the compilers will follow similar function calling mechanisms
Most C code can be called from C++ without many problems. Similarly C++ code
can also be called from C code under certain constraints. Transition from C to C++ will
be smooth if the subtle differences between the two languages are understood well.
The downward compatibility with C is one of the main reasons behind the
widespread success of C++. It is probably the topic that creates the most heated arguments among
C++ programmers, and each has their own views about it. C++ would have been
certainly different (and 'better') if downward compatibility were not one of the main
Java is a commendable addition to C based languages. Bill Joy defines Java as,
Java is based on C and borrows a lot from C and so is closely related to it. Of
course, one major difference is that Java is an object-oriented language. Java also
borrows lot of ideas from C++ but the relationship with C is closer because C is the base
language.
Java cuts lots of features from C, modifies some features and also adds more
features. This part discusses how Java learnt lessons from C and improves upon C
C is a great success. There is no doubt about it. But some cost is involved in that
'writability'4. Integer is 'int' and 'string copy' is 'strcpy' in C. Java also uses the same
keywords as C because they are accepted and widely used by programmers. But in
the case of the C standard library, it is powerful but very small, and C programmers had to
Java removes a lot of features from C that are either unsuitable or problematic
for various reasons such as readability and portability. Pointers are the toughest area to
master; they are error prone and a nightmare for novice programmers. The designers of
Java thought that the preprocessor concept is an antiquated one and so removed it from
Java. So features like conditional compilation are not there and cannot be done in the pure
sense. Java is a pure object-oriented language and use of global variables violates the
programmer has to be very cautious in assuming the size of a data-type. As we have seen
this may help suit the hardware and improve the efficiency. In Java the sizes of the data-
underlying platform. But in Java the byte-order is Big-endian. This resolves problems
that arise due to the difference in byte order between machines particularly when the data
is transferred from one machine to another in networks. This helps Java much because
In C, when the >> operator is applied to a variable, the filling of the leftmost vacated
So the programmer should not assume anything about the filling followed. Java solves
this problem by having separate operator for arithmetic and logical fills.
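In C itself, the two kinds of fill can be obtained portably without relying on how a particular compiler treats >> on negative values. The following is a minimal sketch under stated assumptions: the function names shiftRightLogical and shiftRightArithmetic are hypothetical, and the fixed-width types int32_t and uint32_t from <stdint.h> are assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

/* Logical fill: vacated high-order bits always become 0. */
static uint32_t shiftRightLogical(int32_t value, unsigned count)
{
    assert(count < 32);
    return (uint32_t)value >> count;
}

/* Arithmetic fill: vacated high-order bits copy the sign bit.
   Only non-negative values are ever shifted, so the result does not
   depend on how a particular compiler treats >> on negative numbers. */
static int32_t shiftRightArithmetic(int32_t value, unsigned count)
{
    assert(count < 32);
    if (value < 0)
        return ~(~value >> count);
    return value >> count;
}
```

For example, shifting -8 right by one position gives -4 with the arithmetic version and 0x7FFFFFFC with the logical version.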
4
Although there is no such jargon as ‘writability’, here I use it to mean the ability to write programs easily.
>>> is for logical fill (vacated leftmost bits are filled by 0)
int i = 0;
i = i++ + ++i;
This is not the case in Java. Java says that the change is immediately available to
int i = 0;
i = i++ + ++i;
implementation-dependent. It is a paradox that this is the main reason that the portability
of C programs gets affected (even though C programs have reputation of being very
portable). This seems to be a less-significant problem, but is really a big one because
portability is one of the main reasons for Java’s birth, one of the main design goals and
that makes it the most portable programming language as of today. Java improves upon C
by removing the constructs having various behavioral types in C by having mostly well
The C syntax is flexible and there is normally more than one way to specify the
same thing. Pointers are C's stronghold, but they are also a problematic feature that is tough
for programmers to understand. Pointer arithmetic is the place where even
experienced C programmers stumble. Java doesn't have explicit pointers, but has
references that can be considered a cut-down version of pointers, and arithmetic cannot
be done on them. Dynamic memory management needs the programmer to handle
memory carefully and explicitly, and there is a lot of scope for fatal mistakes. Java has garbage
collection that frees the programmer from worrying about reclaiming the allocated
memory.
Java improves upon C syntactically and this makes tricky programming hard to
write in Java. Still, in any programming language it is left to the programmer not to resort
to tricky programming, and one can always write such code. To give one example in Java
class tricky{
    static{
        System.out.println("Hello world");
    }
}
This program when run prints the message “Hello world” and terminates by
raising an exception stating that arguments to main() are missing. Because in Java the
strings. Since they are treated as special objects, one main drawback is also there. Java
Only one argument is enough to be passed as the command line argument in Java
because ’s’ is a String array so, argv.length == argc. Command line arguments are not
When giving path-names in include files, explicitly giving the path name can
#include "C:\mydir\sys.c"
the program using this line written to be used in Windows requires it to be changed to,
#include "/mydir/sys.c"
The path is for the original system where the file is located and will certainly vary
with the path where the file will be available when it is ported and compiled in some
other machine. Java solves this problem by having the concept of packages and with the
use of the environmental variable CLASSPATH that is used to indicate the compiler
assuming that the files are stored in both the systems with same directory and file names and relative
path
13.2 Java Technology
The basic technology on which Java works is itself different from that of the C
based languages. C like languages have static linkage and work on the basis of
conventional language techniques. Java is different in the sense that it has platform
independence to a greater extent, along with other advantages that its relative languages lack.
• Java compiler,
independent codes that form the basis of the platform independence by Java. They are
targeted at the stack-oriented approach. All the evaluation is done in the stack by
b = a + 10;
code.
The next part is the Java intermediate file format. This is a standard format that is
understood by the Java virtual machine (JVM or Java interpreter), which operates on it and
executes it. The byte-codes and all other related information for execution of the program
are available in an organized way in the intermediate class file. This is similar to .EXE
code, which is organized in a particular format that can be understood by the underlying
operating system. The Java class file format is very compact and plays very important
C like languages have source code portability to some extent. Along with full
source code portability, Java goes to the next level of portability, which may be termed
executable file portability. That means that the Java class files produced by
compiling Java programs with any compiler on any platform will run on any machine,
provided that a JVM is available to run them. This is achieved only through the class file
The last part and the most important one is the Java interpreter, otherwise called
the Java virtual machine. This simulates a virtual machine that may be implemented on any
machine. Thus the uniform behavior of Java programs is assured even across
platforms.
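The stack-oriented evaluation that such an interpreter performs can be sketched as a tiny interpreter written in C. This is an illustrative sketch only: the opcodes OP_LOAD, OP_PUSH, OP_ADD, OP_STORE and OP_HALT, the instruction layout and the names run and assignProgram are hypothetical and are not the real Java byte-codes or class file format.

```c
#include <assert.h>

enum opcode { OP_LOAD, OP_PUSH, OP_ADD, OP_STORE, OP_HALT };

struct instruction {
    enum opcode op;
    int operand;          /* variable slot or constant; unused for ADD and HALT */
};

/* Interprets 'code', using 'vars' as the local variable slots. */
static void run(const struct instruction *code, int *vars)
{
    int stack[16];        /* operand stack; 16 entries is an arbitrary limit */
    int sp = 0;           /* number of entries on the operand stack */

    for (;; ++code) {
        switch (code->op) {
        case OP_LOAD:  assert(sp < 16); stack[sp++] = vars[code->operand]; break;
        case OP_PUSH:  assert(sp < 16); stack[sp++] = code->operand;       break;
        case OP_ADD:   assert(sp >= 2); sp--; stack[sp - 1] += stack[sp];  break;
        case OP_STORE: assert(sp >= 1); vars[code->operand] = stack[--sp]; break;
        case OP_HALT:  return;
        }
    }
}

/* b = a + 10;  with a in slot 0 and b in slot 1 */
static const struct instruction assignProgram[] = {
    { OP_LOAD, 0 }, { OP_PUSH, 10 }, { OP_ADD, 0 }, { OP_STORE, 1 }, { OP_HALT, 0 }
};
```

Because the instructions name only stack operations and variable slots, the same program array can be run unchanged by any implementation of run, whatever registers the host machine has.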
13.3 Java Native Interface
Java code has portability and the native codes, like the one written in C, can
produce code that is efficient. To get the best of both worlds, the portable Java code can
be used and the very frequently used functions like library functions can be written in C
and Java Native Interface (JNI) achieves just that. This part of the chapter explores what
JNI acts as an interface between the C and Java code. With this functionality of C
Calling C code from Java code can be done for the following reasons and has the following advantages,
• To achieve the efficiency of the code and for time-critical applications that is
• Lot of libraries and tested code that are available in the other more mature
Calling Java code from C code can be done for and have following advantages,
not be written again. The C code can just call the Java code but through JNI
through native methods this can be achieved. Runtime type checking is the
feature that is not available in C and for that JNI can be used to do the same.
To explain how Java code can be written to call C code, let's take an
example. The process is a little tedious. The example includes a function written in C
to add and display the two float variables that are passed to the native function. There is
another method that acts as a wrapper function for calling a C standard library function. The
process explained is the generalized one and the exact detail of how the interface between
class callCFromJava{
// be available at run-time
static {
System.loadLibrary("twofloats");
// the name of the DLL where the code for the native
// methods is located.
}
public static void main(String[] args) {
    float i=10.0f,j=20.0f;
    new callCFromJava().addTwoFloats(i,j);
    System.out.println(new callCFromJava().sin(1.1));
}
}
In the Java code, the notable points that enable calling native code are,
keyword 'native' to indicate to the compiler that they are native methods.
• The native methods are treated and called the same way as Java methods.
• Inside the static block (this block is for initialization of static variables and is
called before main() ) the library where the code for C programs is available is
loaded.
javac callCFromJava.java
The next step is to generate a header file that contains the information about the
methods that the C/C++ compilers need for generating DLLs that
are accessible to the JVM. A utility called javah is available for this purpose and
javah callCFromJava
This generates the header file “callCFromJava.h” for the native method. This header file
has to be included in the C code where the code for the C functions is available.
The next one is the important step of writing the C native methods. The code
#include <jni.h>
file.
#include <stdio.h>
#include <math.h>
// note that the header file for native methods is included here
#include "callCFromJava.h"
// extra parameters.
float k = i+j;
printf("%f\n",k);
return;
jdouble value)
return sin(value);
The function names also follow a special naming scheme. All the functions start with
Java_ followed by the class name to which the native method belongs, and that is
followed by the actual function name. Also note that all the native functions have first
The new naming convention and the inclusion of the header files enable the C/C++
compilers to compile the code to a DLL that will be accessed by the Java interpreter.
With this the process of calling C code from Java code ends.
This DLL is used by the Java interpreter at runtime to find and execute the
java callCFromJava
// prints
// 30.000000
// 0.8912073600614354
This is to achieve the functionality of the Java code through the JVM itself from
the C code. For example you may write a browser program, and to support applet
functionality the JVM has to be embedded into the program. In this case JNI can be used
to embed the JVM, and whenever an applet has to be displayed, the Java interpreter can
be invoked from the code to do the same. This is through the 'invocation APIs' that are
- Lewis Carroll
object oriented, and type-safe programming language derived from C and C++”
languages. For the past two decades, C and C++ have been the most widely used and
successful languages for developing software of varied kind. While both languages
flexibility comes at a cost to productivity and Microsoft claims it has come out with a
It remains to be seen how successful C# is going to be. The idea is to combine the
rapid application development features of Visual Basic, the power of Visual C++
and a simplicity similar to that of its competitor Java. This chapter is devoted to the features
Since the language was still being developed when this book was written and the information about the language
was not yet fully available, the information provided in this chapter may not fully comply with the language
that is actually released. Most of the information is based on the preliminary information
available on the Internet and [Hejlsberg and Wilamuth 2001].
14.1 What C# promises?
C# is part of Microsoft Visual Studio 7.0. It has common execution engine that
language can be used to run the programs from other languages such as Visual basic and
supports scripting languages such as Python, VBScript, Jscript and other languages. It is
called as Common Language Subset (CLS). It doesn’t have its own library. The already
available VC++ and VB libraries are used. C# is not as platform independent as Java and
Wilamuth 2001].
Let's see what the simple "Hello, world" program looks like in C#.
using System;
class Hello
Console.WriteLine("Hello, world");
The using directive comes from the Pascal language and serves as a shortcut for:
System.Console.WriteLine("Hello, world");
As you can see, everything is within a class, and so C# is a pure object-oriented
programming language. Main (declared as a static method) is the starting point for
the program execution. The WriteLine function greets you with "Hello, world".
14.3 Data types in C#
• Value types,
• Reference types.
C# doesn't have explicit pointer types and instead has reference types. Internally
references are nothing but pointers, but with a lot of restrictions imposed on their usage.
References have their own merits and demerits. Unlike with pointers, arithmetic
cannot be done on reference types, and the power of the reference type is less than that of
C# has a high-level of view that all types are objects, it is referred to as some sort
The value types are just like simple C data-types that do not have any object-
oriented touch with them. It includes signed types sbyte (8-bits), short (16-bits), int (32-
bits), long (64-bits) and their unsigned equivalents byte, ushort, uint, ulong.
There is one character type in C# (like Java) that can represent a Unicode
character.
The floating-point types are float and double. The types bool (which can take true/false),
object (base type for all types) and String (made up of a sequence of Unicode characters) are
also available. C# also includes types like struct and enum. C# doesn't support unions.
C# implements built-in support for data types like decimal and string (borrowed
from SQL), and lets you implement new primitive types that are as efficient as the
existing ones. In C the type 'int' suffices for most requirements, and when the need for
a type such as decimal arises it is customary to typedef an existing data-type as the new type.
As you know, there is not much support for strings in the C language; it is not a data-type and
14.4 Arrays
Arrays in C# are of reference type and mostly follow Java style. C# has the best
of both worlds by having the regular C array which is referred in C# as rectangular array
int regular2DArray [ , ];
pointer array and allocating memory dynamically for each array. The same idea is
followed here except that a reference type is used instead of a pointer type. This sometimes
makes better use of space, since the sub-arrays may be of varying length. The
compromise is that additional indirections are needed to access the sub-arrays. This
access overhead is not there in a rectangular array since all the sub-arrays are of the same size.
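The same trade-off can be seen in C itself. The following minimal sketch contrasts a rectangular array with an array of separately allocated rows. The names ROWS, COLS, rectangularAt, jaggedAt, makeJagged and freeJagged are hypothetical, and the choice of giving row i exactly i + 1 elements is just an assumption to make the rows differ in length.

```c
#include <stdlib.h>

#define ROWS 3
#define COLS 4

/* Rectangular array: one contiguous block, so an element is found with a
   single address computation. */
static int rectangularAt(int (*a)[COLS], int r, int c) { return a[r][c]; }

/* Jagged array: the row pointer is read first, then the element. */
static int jaggedAt(int **a, int r, int c) { return a[r][c]; }

/* Builds a jagged array in which row i has i + 1 zeroed elements.
   Returns NULL if any allocation fails. */
static int **makeJagged(int rows)
{
    int **jagged = malloc((size_t)rows * sizeof *jagged);
    if (jagged == NULL)
        return NULL;
    for (int i = 0; i < rows; i++) {
        jagged[i] = calloc((size_t)i + 1, sizeof **jagged);
        if (jagged[i] == NULL) {
            while (i-- > 0)           /* release the rows built so far */
                free(jagged[i]);
            free(jagged);
            return NULL;
        }
    }
    return jagged;
}

static void freeJagged(int **jagged, int rows)
{
    for (int i = 0; i < rows; i++)
        free(jagged[i]);
    free(jagged);
}
```

With ROWS set to 3 the jagged version stores 1 + 2 + 3 = 6 elements plus three row pointers, whereas the rectangular version always stores ROWS * COLS = 12 elements.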
When more than one way of representation is supported, at some point of time
the user will need to switch from one representation to another. Here, to convert from
a value type to a reference type and back, techniques called boxing and unboxing are used.
Structs are of value type, whereas classes are of reference type. This
means structs are plain structs as in C and the classes are used for object-orientation as in
C++ or Java. The advantage here is that if an array of struct type is declared,
the structs are fully contained in the array itself. An array of class type, in contrast, allocates
space only for the references, and the allocation of space for the objects has to take place separately.
Delegates are C#'s answer to the function pointers of C. As we have seen, the function
pointer is a very powerful feature in C, but it can be easily misused and is sometimes
unsafe. Delegates closely resemble function pointers and C# promises that delegates are
void aFun()
14.7 Enums
C# enumerations differ from C enums such that the enumerated constants need to
enum workingDay {
monday,tuesday,wednesday,thursday,friday };
workingDay today;
today = workingDay.monday;
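For comparison, a C enum behaves as sketched below: the constants live in the enclosing scope, are written without any qualification, and mix freely with integer arithmetic. The function name daysUntilFriday is hypothetical and is used only for illustration.

```c
enum workingDay { monday, tuesday, wednesday, thursday, friday };

/* In C the constants are plain int values in the enclosing scope, so they
   can be used unqualified and combined with ordinary integer arithmetic. */
static int daysUntilFriday(enum workingDay today)
{
    return friday - today;
}
```

Nothing stops a C program from assigning an arbitrary integer to a variable of type enum workingDay, which is one of the loopholes that the C# form closes.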
One of the major problems in C is that virtually any type can be cast to another
type. This gives power to the programmer when doing low-level programming. C# is
strongly typed, and arithmetic operations and conversions are not allowed if they overflow
the target variable or object. If you are a power programmer C# also has facility to
#define
#undef
#if
#elif
#else
#endif
#define PRE1
#undef PRE2
class SomeClass
#if PRE1
DoThisFunction();
#else
DoThatFunction();
#if PRE2
DoSomeFunction(this.ToString());
#endif
#endif
with #ifs and #endifs looks ugly and it becomes hard to test code with possible
conditional inclusions. C# therefore provides conditional methods, whose calls are included only if
the given preprocessor symbol is defined. First the method has to be declared as conditional
like this:
[Conditional ("PRE1")]
{
System.Console.WriteLine("This method will be executed
#define PRE1
cond.DoThisFunction();
#undef PRE1
cond.DoThisFunction();
This is not a very significant addition and it may affect the class hierarchies.
For example, if a function in the base class is a conditional method, then depending on whether the
preprocessor symbol is defined or not the derived classes may override it or create a new
function. In my view conditional methods will confuse the programmer more than
they help.
C# also supports #error and #line in addition to a new directive #warning that
The real-world applications require to work with old code that is available and
• Including native support for the Component Object Model (COM) and
Windows-based APIs.
• Low-level programming is possible and basic blocks like C like structures are
supported.
implemented and not have to be done by the programmer explicitly. Due to its close
relationship with COM objects, the C# programs can natively use the COM components.
It should be noted that the COM can be written in any language and that difference
At the level where full control over the resources is required, C/C++ like
programming using pointers and explicit memory management can be done inside
marked blocks.
All this means that the tested, legacy C/C++ code need not be discarded and can
machine understandable code. The software that performs this job is known as a
compiler. This machine level code is also known as object code since creating this code is
the objective of the compiler. In this part of the chapter we will have an outlook on the
passes, depending upon the complexity. Sometimes these passes may be organized to do
it logically rather than actually going through one pass after another. The control flow in
preprocessing => lexical analysis => syntax analysis => semantic analysis =>
At this stage we get a relocatable code, which is given to the linker, in turn
produces an executable code. To execute it, we need to load the code in memory. A
way to have symbolic constants and inline functions. One of the main features of the
The preprocessor is NOT a part of the compiler; rather, it is a tool that runs before the
compiler operates on the code. This feature is mainly used in assemblers. Most
languages don't have a preprocessing facility, although one can be added. The
lengthy sentence as a whole, we divide it into words and try to analyze it. Similarly a
compiler divides the entire source program into lexical units, better known as tokens, and
then processes them. A token is a well-defined word of the programming language. It may be a
The lexical analyzer (also called a scanner) accomplishes this job of breaking up the
source. In some implementations the lexical analyzer even enters the identifier names into the
read the word “manner”. It follows the greedy algorithm, i.e. it looks for the longest word.
When it has read the characters ‘m’, ‘a’ and ‘n’, it does not immediately interpret them as a
word (man). It always assumes that the following characters may be a part of this word,
and in this example they are. So ‘ner’ is also read and ‘manner’ is considered as a word.
When the assumption fails, the word already formed is returned, and the received
A C lexical analyzer will always follow this ‘maximal munch’ rule (sometimes
• x++y
• x+++y
• x++++y
• x+++++y
‘identifier’ to the parser. And then it looks for the next token. It moves to scan +. It
doesn’t immediately return ‘plus’. The next character is also ‘+’. As ++ is valid, it returns
‘plus_plus’ and starts scanning for the next token. ‘y’ is read and returned as an identifier.
expression).
So if you want the lexical analyzer to interpret like x++ + ++y you should
that is scanned as i && & j. Here, is & a unary or a binary operator? Similarly consider this:
general, how do you think the resolution between unary and binary operators is made in such
cases?
Exercise 15.2:
int i = 10;
int * ip = &i;
int k = *ip/*jp;
printf("%d",k);
The compiler issued an error stating “Unexpected end of file in comment
started in line 3”. The programmer intended to divide two integers, but by the ‘maximal
munch’ rule, the compiler treats the character sequence / and * as /*, which happens to be
//or
int k = *ip/(*jp);
Just like any natural language we use, every programming language has its
own syntax. The syntax is described by the grammar of the language. The grammar
acts as a tool to recognize the program given as input. The syntax analyzer, as the name
suggests, verifies the structure of the program against the language specification with the
help of the grammar. Every valid program should conform to the corresponding grammar. C
also has a well-defined grammar. A grammar is a very powerful tool. For example, the
precedence and associativity of the operators in expressions can be directly specified
by the grammar.
The rules of precedence are encoded into the production rules for each
operator. Production rules in turn call others, and so the precedence is formed.
additive-expression:
    additive-expression +(or)- multiplicative-expression
First, the production rules that involve the lowest precedence levels are called. They
in turn check for the production rules that involve the operators of higher
are recognized from left-to right because the left-non-terminal comes in the
left associativity.
ConditionalExpr:
ConditionalOrExpr
The left-non-terminal comes in the RHS of the production rule. In this way,
Answers to many questions, such as why the compilers require
sizeof (int);
sizeof int;
// issues error
are related to syntactic issues. A look at the grammar of C will help to understand this,
unary-expression:
postfix-expression
++ unary-expression
-- unary-expression
unary-operator cast-expression
sizeof unary-expression
sizeof (type-name)
The grammar says that a unary-expression may be made up of any one of the six
options. As you can see, the syntax of sizeof says that the type-name has to be
if(i>j)
goto end;
else
return;
end :
The compiler issues a syntax error stating that “; is missing before }”.
labeled-statement :
identifier : statement
statement. The syntax analyzer issued an error because the program didn’t follow the
grammar correctly. Knowing this you can insert any statement or a null statement to
if(i>j)
goto end;
else
return;
end : ;   // a null statement satisfies the syntax
Thus, the duty of the syntax analyzer is to make sure that the program concerned
conforms fully to the grammar and promptly issue messages to the user if violated.
After verifying the correctness of the syntax, the compiler goes on to the next
phase, called semantic analysis. In this phase the compiler interprets the language
constructs, and the appropriate object code is produced, either explicitly or implicitly. The
grammar has the limitation that it can only check the syntactic validity of the program.
Inferring the meaning of a statement involves careful analysis of the source program.
Code is generated only if the statements make sense; otherwise an error is issued.
The parser is the part of the compiler that takes care of syntax and semantic analysis.
const int i = 1;
i++;
If this code is compiled, the compiler will issue a warning message like
“unreachable code i++”. It predicts that the statement guarded by the if condition will never be executed and this
Code generation involves generating the code that is for the target machine where
15.1.5 Optimization
This is an optional, but desirable, part of the compiler. The generated code
may not be the optimal code to do the job. So the compiler will use well-known
algorithms to analyze it and produce smaller and more efficient code that does the same job.
Let’s see a few examples of how compilers optimize the code that we write.
i * j;
It has to be remembered that statements are executed for their side effects. In
this case, i and j are multiplied with no side effects, and so the statement has no effect on
the program execution. So the optimizer can safely omit the code generated for the statement.
Strength reduction involves replacing of constructs that take more resources with the
int i = 3,j;
a[j] = j;
The for loop can be replaced with more efficient and simpler statements as follows,
a[0] = 0;
a[1] = 1;
a[2] = 2;
statements.
involved in both the expressions provided that no variable that is part of the expression
occurred in the LHS of any assignment statement (i.e. didn’t undergo any change).
l = m * k * 20 + k;
Here it is enough to evaluate the expression k * 20 once, and the result can be used in
the next expression, assuming that k’s value is not modified before that replacement
occurs.
int j = k * 20;
l = m * j + k;
evaluation. Substitution of constant variables with their equivalent constants is also one
such optimization.
Semantic analysis can reveal many important details that may be useful in
Here, at compile time, the compiler can identify that the given code contains statements that
will never be executed. An intelligent optimizer will not produce any code for those
unreachable statements. That’s why you sometimes get warning messages like
“unreachable code in function _xyz”, and “the code has no effect in function _pqr”.
Since most of the execution time is spent in executing the statements inside the
loops in almost any program, optimizations on loops have significant effect in the
while(i <= j)
{
    if(i<100){
        i +=10;
        // do something
    }
    else {
        // do something else
        i +=10;
    }
}
while(i <= j)
{
    i +=10;
    if(i<100){
        // do something
    }
    else {
        // do something else
    }
}
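because the statement i += 10; is part of both the if and else statements and the code
becomes compact. Note that in this example the test i<100 now sees the incremented
value of i, so a real optimizer could apply the change only after proving that the
behavior of the program is preserved.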
There are several loop optimization techniques like this available; [Aho and
Ullman 1977] is one good reference that will be useful in applying optimizations even
optimizations. For debugging purposes the optimizations can be switched off. For release
versions of the software and for testing, it is better to have the optimizations applied to the
code.
Exercise 15.3:
a. for(i=0;i<num;i++)
b. for(i=num;i>0;i--)
Which one do you think executes faster? (if necessary assume that no code optimization
The symbol table is the structure that will be accessed by almost all stages of the
compiler, and a lot of other tables have to be managed as well. The table-management
modules take care of managing the various tables involved in the process of compiling.
While compiling, errors can occur at any time, and the error-handling module takes
care of issuing the error messages to the user. It also has the important job
of recovering from the error and continuing the compilation process if possible.
Let’s look at an example of taking a piece of code and following it through the various stages of how
The lexical analyzer is the first part of the compiler to attack it and it returns the tokens to
KEYWORD (int)
IDENTIFIER (i)
EQUAL_TO_OP
IDENTIFIER (four)
MULTI_OP
INT_CONSTANT (8)
ADD_OP
IDENTIFIER (fun)
OPEN_PARENS
CLOSE_PARENS
SEMI_COLON
The syntax analyzer checks the correctness of the program code and
finds it to be acceptable. It sees that the operators have valid operands and the
Next the semantic analyzer operates on it to see what the code means. Here it
enters the variable i into the symbol table. It checks whether the identifiers four and fun are
already declared and looks up their details in the symbol table.
Now the intermediate code generator creates code for this code segment that may
look like:
MOV _temp1, 8
// mov 8 to temp1
The optimizer may optionally operate on this intermediate code to generate a more
The final part is the code generator that converts the platform independent
intermediate code to platform dependent object code. With this the compilation process
ends and the linker takes care to link the object files to generate the final executable code.
so it requires a comparatively small compiler. In fact, Dennis M. Ritchie wrote the first C
compiler for the PDP-11, which could be stored in just 4k of memory. It was basically a two-pass
compiler, and an optional third pass for optimization was also available. It used operator
precedence for expressions and recursive descent parsing for all other constructs.
As I said earlier, most of the compilers in the market have two pass structure. The
first pass is used for lexical and syntax analysis and produces intermediate code. The
15.4 Parsing
Parsing is the general term for the part of the compiler that involve the
generator. This serves as the next component after the lexical analyzer. Parsing techniques
involve analyzing the syntactic and semantic parts of the program from the tokens
returned by the lexical analyzer, and they serve code generation by the intermediate code
generator. There are lots of parsing techniques available, from simple to complex, and
This type of parser is one of the easiest parsers to implement. As the name
indicates, it employs recursion as the basis for parsing the given source. The
production rules of the given grammar can be converted directly to functions in the
parser. In other words, there can be a one-to-one relationship between the grammar
productions and the functions that have to be written in the recursive-descent parser. So it
becomes easy to hand-code the whole parser, and the result is more readable and clearer for
The production rules involve left-recursion, which means that the same non-
terminals that are in the LHS appear in the RHS of the production also. Since the parser
additive-expression:
expression
additiveExpr()
{
additiveExpr();
while( additiveOperators(token) )
advance();
multiplicativeExpr()
The grammar production means something different. It means that all the
The multiplicative expression production will in turn check for the operators with
still higher precedence, to get them recognized first through similar grammar
productions.
additive-expression:
    multiplicative-expression additive-expression-prime
additive-expression-prime:
    +(or)- multiplicative-expression additive-expression-prime
    (or) empty
Now it can be implemented using equivalent functions as,
additiveExpr()
{
    multiplicativeExpr();
    additiveExprPrime();
}
additiveExprPrime()
{
    if( additiveOperators(token) )
    {
        advance();
        multiplicativeExpr();
        additiveExprPrime();
    }
}
As you can see, the first function, additiveExpr(), is called only once and it
in turn calls additiveExprPrime(). This second function is recursive and calls
over again and again till all the additive operators are exhausted.
Both can be combined, and the equivalent function can be written more informally as,
additiveExpr()
{
    multiplicativeExpr();
    while( additiveOperators(token) )
    {
        advance();
        multiplicativeExpr();
    }
}
Look at the original grammar production and the available function implementation. This
productions.
Java inherits most of the operators from C, and so C and Java expressions
work very similarly. ‘tinyExpr’ is a small expression compiler that compiles the
implementing various parts of the compiler: the lexical analyzer, parser etc
and how they interact and work together to convert the source code to the
• as one application to show how powerful recursion is and explain the working
our programs and understanding the working of this compiler may serve to
unravel some ‘secrets’ of how expressions are evaluated and why sometimes
they give some ‘weird’ results.
• to explain how Java bytecodes are produced and
• the basics of how bytecode interpreters/virtual machines (Java) work and
compiler
Yes. The implementation is small, serves its purpose of compiling and executing
expressions, and at the same time fulfills all these purposes in one. The full implementation is
given as an appendix and the concepts are explained in the remainder of the chapter.
bytecodes. These bytecodes are a subset of the actual Java bytecodes. The compiler generates
• automatic variable declaration; the variables are assumed to be of type int.
• constant-expression evaluation
tinyexpr.h - this header file in turn includes the standard header files and contains
the function prototypes
mnemonic.h - the mnemonic codes for the bytecodes are available in this header file
tinyexpr.c - this is the main file that contains the recursive-descent parser and its
code-generator.
symtable.c - this file contains the simple symbol table and the functions operating
on it
codegen.c - this file contains a few code-generation routines used in the main file
‘tinyExpr’ also has a small interpreter to load and execute the generated bytecodes.
Such a full-fledged interpreter is also known as a ‘virtual machine’ because the mnemonic codes
are executed in a simulated manner and the behavior of the bytecodes can be made platform
compiler also.
intrepre.c - this is the main interpreter file that contains the interpreter loop
ostack.c - this program has operand stack (where operands are stored and
a=a + b * c
Or expressions like:
can be written. Since the variables are implicitly declared, any variable name can simply be
used. However, it should be noted that the variable names are case-sensitive. Here the
variables var1 and var2 are initialized and then used to initialize the value of another variable,
var3, which is subsequently printed using the API ‘write’. It is the only API supported in
‘tinyExpr’ as of now.
15.5.1 The Lexical Analyzer
space. For example, the expression a*b+c has to be given explicitly as a * b + c.
standard library function strtok() to separate the tokens that are available in the source
expression. Since this method is used instead of writing a lexical analyzer from scratch, the
lexical analyzer becomes very small, and its structure, how it works and its use become
very clear, which serves the purpose at hand. The function of the lexical analyzer is thus
simplified significantly because the user, while giving the expression, explicitly
The lexical analyzer gives a unique ID, in the form of an enumerator constant, to
every token. The tokens are returned to the parser. It also modifies two global values that
contain the name of the identifier and the integer value of the current token that is being
analyzed.
In this parser the scanner becomes just a function that is called whenever a new token is
required. The parser integrates the functionality of syntax analysis, semantic analysis and
code-generation.
void additiveExpr()
{
    multiplicativeExpr();
    while(token==PLUS||token==MINUS)
    {
        int index=token;
        advance();
        multiplicativeExpr();
        if(index == PLUS)
            fprintf(outFile,"%c",IADD);
        else
            fprintf(outFile,"%c",ISUB);
    }
}
As you can see, it looks very similar to what we have done previously. The extra
information is for the integrated code-generator. As parsing proceeds, the code is
generated in parallel and is stored in an intermediate file for later execution by the
interpreter. In this example, the two mnemonics associated with the additive-expression
are IADD and ISUB, which stand for integer addition and integer subtraction
respectively. When the parser sees the + or – symbol, it generates the code
for the corresponding symbol. ‘token’ stands for the representation of the token that is
iload_0
iload_1
iload_2
imul
iadd
istore_0
This is available in the intermediate code file “a.out”. The bytecodes are loaded
The variables are assigned numbers based on the appearance of the variables in
the expression. "iload_0" refers to "load the integer value of the variable numbered 0".
This code is for a stack-oriented machine. So the variable is loaded into the
Similarly the values of b and c are also loaded into the stack. After that the
"imul" refers to "integer multiplication" and has the effect of popping two integers
on the top of the stack, multiplying them and pushing the result back on the stack. So the total
result is that the top two values on the stack are replaced by their product. Now the
stack contains two integers: the value of ’a’ at the bottom and the multiplied result of b
Now the "iadd" is encountered and it works in a similar way to "imul". It
pops the two values that are on the stack, adds them and pushes the result onto the stack.
Now the stack contains the end result of evaluating the expression a + b * c.
The final mnemonic code istore_0 pops the result from the operand stack and
stores it in the local variable ‘a’ (whose index is 0; that’s what the suffix _0 of the opcode
ways. For example, it doesn’t even support statements and Boolean expressions. Looping
constructs and other features can be added easily if labels are supported. For example the
if(i<j)
++i;
It can be compiled to bytecode as follows,
iload_0
iload_1
if_icmpl T_0
F_0 :
iconst_0
goto E_0
T_0 :
iconst_1
E_0 :
ifeq OUT_OF_IF_1
iinc 0 1
iload_0
OUT_OF_IF_1 :
return
Similarly the functions can be supported and the list of extensions possible is
used for writing OS and compilers. Most of the Windows programming is done today
using MFC and C++. On the other hand, most of the C programmers are well versed in
using it for programming in UNIX. So this chapter provides a short session on how the
two of the famous operating systems that make use of the power of C.
16.1 UNIX and C
FORTRAN, Pascal and Ratfor etc. Such compilers translate the language code to C
intermediate code. The ordinary C compiler then compiles this C intermediate code.
The advantage of using such an approach is that new compilers are particularly easy to write. It
is enough to write a translator from that source language to C. All the issues such as
portability are taken care of by the C compiler that is working at the back end. Another big
advantage is that code written in different source languages can be integrated with
each other without problems and can be interfaced with various applications written in
UNIX also provides utilities such as LEX and YACC (Yet Another Compiler
Compiler). LEX is a lexical analyzer generator and YACC is a parser generator both
There were major benefits associated with writing the whole UNIX operating
system in C. D.M. Ritchie [Ritchie 1978] lists the benefits of using C as the
implementation language,
software tools it provides to the users of the UNIX. Writing such software packages was
possible only because of using high-level languages like C. These software packages,
’that would have been never written at all if their authors had to write assembly code;
many of our most inventive contributors do not know, and do not wish to learn, the
Writing the code for an operating system involves thousands of lines of code that would be
hard to maintain and debug if assembly language were used. ‘The C versions of programs
that were rewritten after C became available are much more easily understood, repaired,
and extended than the assembler versions. This applies especially to the operating system
itself. The original system was very difficult to modify, especially to add new devices,
software that runs on several machines and whose expression in source code is except for
To have a taste of how C can be used directly by providing special header files for
16.1.2 Semaphores in C
C under UNIX has semaphores and the functions are defined in the header files
semctl(newsem,value);
semop(semaphor_send,…);
System calls are used to access the services of the kernel. For the programmer
a system call is just like a library function, and it is invoked in the same way as C functions are called.
If you program for UNIX in C, you will be using the UNIX-specific functions
innumerable times. If the code for each such function were included in the executable code, it
would unnecessarily bloat the code size. To avoid this overhead, UNIX has
system calls. You just declare them and use them in your code, but the code will not be
included in the executable file. Instead, the system calls are invoked at runtime.
To put it in other words, the essential difference between a C function and a
system call is that when a program calls a subroutine, the code executed is always part of
the final object program (even if it is a library function). With a system call, the code
resides in the kernel and not in the program (similar to the APIs in Windows
programming).
System calls constitute primitive operations on the objects like file or processes
Although standard I/O is very efficient, it ultimately uses system calls only
Any process that interacts with its environment (however small way it is) must
Windows is one of the most famous operating systems, and we programmers can program it
directly to make optimal use of it. One important point to note is that the Windows APIs
are written in C itself and the Windows SDK uses C throughout for Windows programming. Using
C and the native APIs is not the only way to write programs for Windows. However, this
approach offers the best performance, the most power, and the greatest versatility in
exploiting the features of Windows. Executables are relatively small and don’t require
external libraries to run (except Windows DLLs), as opposed to other approaches such as
using MFC to achieve the same functionality. In this chapter we are going to see how
#include <windows.h>
("FirstProgram"), 0) ;
return 0 ;
devices such as stdout and stderr and get input using stdin (if you want such facilities like
using plain C functions like gets, scanf, printf etc. you should do Windows Console
applications). These devices can be taken for granted for being present, already opened
and used in OS such as DOS and UNIX but not in Windows. File handling like the calls
same way you use C library functions such as strlen. The primary difference is that the
code for C library functions is linked into your program code, whereas the code for
As you may have noticed the arguments of the WinMain have very different
"Hungarian Notation", attributed to Charles Simonyi. It just means that every variable
name is prefixed with letter(s) denoting the data type of the variable. In szCmdLine, sz
Prefix    Data-type
c         char
i         int
s         string
fn        function
p         pointer
writing C programs.
Hungarian Notation helps in identifying the types from their prefixes and so can
help in debugging and in avoiding mistakes. But as you can see, it makes the programs less
readable. The idea behind Hungarian notation is that conveying information about the
variable is important, and for that, readability is sometimes sacrificed. It certainly suits
requirements like Windows programming, where there are a couple of thousand APIs
and such notation comes in handy for understanding the types of the arguments. But for me
it doesn’t make much sense to use Hungarian notation for ordinary programs.
Windows programming uses lots of ‘derived data-types’, beyond plain ints and
chars. They are mostly typedefs and sometimes #defines. They are all written in capital
letters. Just look at the third argument, PSTR, in WinMain. It is a derived data-type that
says it is a pointer to a string (char *). There are a few non-intuitive derived types like
LPARAM and WPARAM. Even though it looks awkward to declare variables like,
HINSTANCE hInstance; or
WNDCLASS wndClass;
they serve a few important purposes and use a very good idea of C. They allow the
programmer to define variables without needing to know how, or as what
type, they are defined. They make usage abstract and encapsulate the structure definitions.
Nearly all Windows programmers use HINSTANCE without knowing what it
is defined as. Such types serve to improve maintainability and portability. For example,
typedef struct {
int x;
int y;
} POINT;
was the definition of the POINT derived type in Windows 3.1, a 16-bit OS.
When Windows 95, which is a 32-bit OS, came, POINT was redefined as,
typedef struct {
long x;
long y;
} POINT;
Again in 32-bit programming in Windows all the pointers are 32-bits, so all the
pointers are capable of pointing anywhere in the memory. near and far pointers are just
The Windows programs written for Windows 3.1 still remained valid and remain
Windows runs not only English but also various languages like Chinese, Italian
#ifdef UNICODE
typedef WNDCLASSW WNDCLASS;
#else
typedef WNDCLASSA WNDCLASS;
#endif
where W stands for ‘wide’) or the old one supporting ASCII (WNDCLASSA, where A
stands for ‘ASCII’). And the programs still remain unchanged even-if whole internal
In C the linker links all functions used in the source program with the
corresponding code, which becomes part of the generated EXE file. The EXE files are thus
large in size but they are very fast to execute (because the code for the functions is
contained within). This is called static linking. In Windows programming, on the other hand,
only the calls to the APIs are provided. The linker just attaches the name of the DLL and the
calling information to the EXE. This is called dynamic linking. This makes the generated EXE
code very compact. The code for the APIs is present in the DLLs that are
available in every system installed with Windows. They are loaded whenever the program
is loaded into memory and those functions need to be executed. So the code is
not redundantly stored in every file that is present in the system. When an update is
needed, it is enough to update the DLLs, and all the applications that are using the APIs are
An interesting question to ask is, “if Windows use DLLs that will be available
only at runtime, can their names be used for assigning to function pointers?”
The answer is Yes. Calling of such functions using function pointers whose value
in the newer version of Windows any change in that API should affect that application.
i.e. it might fail if it encountered newer or older version of API than it might expect. As
the functions are modified to contain the improved functionality, the interfaces need to be
changed sometimes. Such a change in interfaces means that the applications using the
old interfaces have to be recompiled to support the functions with modified interfaces or
The problems due to versioning occur because versioning is not part of the language
itself. This is a general problem encountered in maintaining any application. Let’s look at
MoveTo was a graphics function in 16-bit versions of Windows to set the current
First argument is the device context handle (that is used as a control for drawing
position and is packed as two 16-bit values (unsigned long takes 32-bits in 16-bit
version).
In the 32-bit versions of Windows, the coordinates are 32-bit values. So the
problem occurred because the 32-bit versions of C did not provide a 64-bit integral
type. This meant that the previous current coordinate could not be returned as a 64-bit value (two
32-bit values packed together) and the function interface needed to be changed. The
The last argument is a pointer to a POINT structure that contains previous current
and both the old and new MoveTo functions will coexist. So the interface changes and
such change will require all the applications that use MoveTo to be recompiled to use the
improved version of MoveTo. The alternate solution is to change to the new name
MoveToEx with this modified interface and retain the old MoveTo function as it was.
That is exactly what introducing the MoveToEx function, having this declaration, does in this
case,
Thus the problem of introducing the new functionality was solved without
affecting the old ones. The new MoveToEx need not be written again and can work as a
wrapper function (as already discussed). There are many other such examples in
Windows programming; one is discussed here to explain the versioning concept.
Most of the versioning problems can be avoided if the interfaces are designed
properly. Interfaces should be designed keeping future revisions in mind. Let’s look
at a similar situation in the C standard library and how that problem is tackled:
div_t div (int num, int denom);
typedef struct {
int quot;
int rem;
} div_t;
The standard library function div (declared in <stdlib.h>) returns object of type div_t to
contain quotient and remainder and not of type long which may contain two integers in
Applying the same idea to MoveTo function, a careful design would have looked
like this,
Thus a problem of changing the interface would have been avoided in later stages.
The naming solution in the case of Windows is not the best one either. Let’s say that
some more modifications are required to the function MoveToEx at a later date. How shall
MoveTo, MoveTo2, MoveTo3 etc. And this kind of naming convention is of course a
matter of taste.
17 THE NEW C9X STANDARD
The new standard, C9X, has been defined and was approved very recently. This is a
compilers conforming to this new standard. It formalizes many practices and improves upon
the previous ANSI standard, which is referred to as C89 because this first standard for C was
approved in 1989.
One of the basic goals of the C9X committee was to make sure that the old code
remains legal and unaffected in the new standard. It adds long-awaited basic
facilities like single-line comments and mixing of declarations and statements.
This chapter is not an exhaustive coverage of the changes and discusses only the most
important of the changes that are introduced in C9X. The base documents for this
chapter are [ANSI C 1998] and [Rationale ANSI C 1999] and for more information the
language, as opposed to the popular belief. Its main aim is to promote portability and give
the overall goal of specifying the standard of the language. Most important of
these principles listed here are based upon the ideas given in [Rationale
ANSI C 1999]:
considered is the change of code. The existing code should not be affected, or should
be affected as little as possible. But if the same change can be achieved by a
change in the existing implementations (say compiler implementations), that should
preferably be done.
and can get the maximum utility from any target system. The standard
to give clear diagnostic messages to the users. The change in the behavior of
The changes can be done to the language but it should not violate the
The new types added in C9X are: _Bool, long long, unsigned long long, float
_Imaginary, long double _Complex. Note that many of these new types use already
available keywords.
The new type long long that has at-least 64-bit precision is introduced
2) 64-bit microprocessors are becoming common that the new type will
A new type long long is useful on platforms where sizeof(int) == 2 and
sizeof(long) == 4. Even on machines with a greater word size, integral values
occupying more than sizeof(long) bytes may be required. In such cases there is a
requirement to represent integral values occupying 8 bytes, and the new type satisfies that.
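As a small sketch of this point, the function below computes a value that does not fit in a 4-byte signed long but does fit in long long. The name seconds_in_years is hypothetical and chosen only for illustration.

```c
/* Hypothetical helper: number of seconds in the given number of
   365-day years. For 100 years the result (3153600000) is larger than
   2^31 - 1, so it cannot be held in a 4-byte signed long. */
long long seconds_in_years(int years)
{
    /* The LL suffix makes the first constant a long long, so the whole
       product is evaluated in long long arithmetic. */
    return 365LL * 24 * 60 * 60 * years;
}
```

Calling seconds_in_years(100) yields 3153600000, which can be printed with the "%lld" conversion of printf.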
in the mathematical and scientific areas and there complex types are often
required. For that, C9X introduced the complex type as a basic data type.
Inside blocks, mixing of declarations and statements is allowed. Again, this is borrowed
from C++.
int i;
printf("%d",i);
makes. Since in most cases, the value to be initialized is available only in the later part of
the code.
It is very natural to forget to initialize variables. The simple rule now,
with the C9X standard, is to declare the variable and initialize it just before its use. With this
simple rule it becomes very convenient for the programmer to declare and use variables
to goto statements. What should happen if the declarations are bypassed by a goto
statement?
goto out;
int i=0;
in:
int i=0,j;
j=20;
goto in;
If the declaration has the initialization value, then it is initialized twice. If the
The ‘init’ part of the ‘for’ loop can have declarations. The scope of the variables
int i;
for(i=0; i<10; i++)
printf("%d", a[i]);
Here the variable i is required only within the block. This is the case for most of
the ‘for’ loops we write in C. Declaring the variable outside the ‘for’ loop extends the
scope of the variable to the enclosing block. Allowing the variables to be declared inside the
printf("%d", a[i]);
is convenient.
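A minimal sketch of a loop counter declared in the ‘init’ part of the ‘for’ loop is given below. The function name sum_array is an assumption made for this example.

```c
/* Sums the first n elements of a. The loop counter i is declared in
   the 'for' statement itself, so it is visible only inside the loop. */
int sum_array(const int a[], int n)
{
    int total = 0;              /* declared and initialized together */
    for (int i = 0; i < n; i++)
        total += a[i];
    /* i is out of scope here; referring to it would not compile. */
    return total;
}
```

Because i disappears at the end of the loop, a second loop in the same function can declare its own i without any clash.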
int j = biggest(tempArr);
In such cases it would be helpful if you can pass the array directly without creating a
For this, C9X introduces the idea of compound literals. Compound literals
be needed only once. For example, an array can be created in the argument itself and
passed directly to the function.
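The idea can be sketched as follows. The function biggest here is only a stand-in for the one mentioned in the text; its element count parameter and the helper biggest_of_sample are assumptions of this sketch.

```c
/* Hypothetical stand-in for the 'biggest' function; the parameter n
   (number of elements) is an assumption of this sketch. */
int biggest(const int *arr, int n)
{
    int max = arr[0];
    for (int i = 1; i < n; i++)
        if (arr[i] > max)
            max = arr[i];
    return max;
}

int biggest_of_sample(void)
{
    /* (int[]){...} is a compound literal: an unnamed array object that
       is created right in the argument list, so no temporary array
       variable has to be declared. */
    return biggest((int[]){10, 40, 20, 30}, 4);
}
```

The unnamed array lives until the end of the enclosing block, so it is safe to pass it to the called function.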
C9X adds a new array type called a variable length array type. The
inability to declare arrays whose size is known only at execution time was
necessary for having one’s own implementation for growing/variable length arrays.
void sizingFun(int n)
....
array type is a runtime expression. Previously the length of the arrays was
It should be noted that since the space for variable length array is
sizeof operator can be applied to such arrays to find the size of the array,
int sizeOfVarArr = sizeof( int [n] );
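The following sketch shows sizeof applied to a variable length array. The function name vla_size_in_bytes is hypothetical. Note also that later revisions of the standard (C11 onwards) make variable length arrays an optional feature, so this assumes a compiler that supports them.

```c
#include <stddef.h>

/* Returns the number of bytes occupied by a local array of n ints.
   n must be greater than zero. */
size_t vla_size_in_bytes(int n)
{
    int arr[n];          /* variable length array: n is known only at run time */
    return sizeof arr;   /* evaluated at run time, gives n * sizeof(int) */
}
```

For an ordinary array sizeof is a compile-time constant; for a variable length array the operand has to be evaluated during execution.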
[static 10]);
This declaration assures that the variable length arrays passed have
at least 10 elements, and the static indicates just that.
So this specifies that the arrays passed are of length at least 10 (so it
indirectly specifies that the address passed cannot be NULL) and that they are
non-overlapping.
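A sketch of a declaration that makes both promises is given below. The function copy_first_ten is a hypothetical example, not the declaration from the text.

```c
/* The caller promises that both arguments point to the first element
   of arrays with at least 10 elements (static 10) and that the two
   arrays do not overlap (restrict). */
void copy_first_ten(int dest[restrict static 10],
                    const int src[restrict static 10])
{
    for (int i = 0; i < 10; i++)
        dest[i] = src[i];
}
```

The compiler is not required to check these promises; passing a shorter array or a null pointer simply makes the behavior undefined.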
And as usual the const can be used to specify that the pointer always
n+=5;
iVarArr another;
So the type remains the same and such side-effects do not affect typedefs.
One of the main design considerations of C9X for macros is that there
should be some way to do with macros whatever is possible to do with
functions. Functions have a variable
length argument passing mechanism and C9X introduces the same way of
#define varLenMacro(filePtr,...) \
varLenMacro(stdout, " %d ", someThing);
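A self-contained sketch of a variable length argument macro is shown below. FORMAT_INTO and format_pair are hypothetical names used only for this illustration; inside the replacement list, __VA_ARGS__ stands for whatever was passed in place of the ellipsis.

```c
#include <stdio.h>

/* Hypothetical macro: formats its variable arguments into buf.
   __VA_ARGS__ is replaced by all the arguments matching the '...'. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Formats two integers as "a+b" into out and returns the number of
   characters that snprintf reports. */
int format_pair(char *out, size_t size, int a, int b)
{
    return FORMAT_INTO(out, size, "%d+%d", a, b);
}
```

Here the format string itself is part of the variable arguments, so the ellipsis always receives at least one argument.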
An additional rule for the variable length macro argument list is: there must be at least
Why is this new qualifier required? For that, remember what
we discussed about the difference between the following two library functions:
“The only difference between these two functions is that memmove() can be used
with overlapping memory area, whereas memcpy() for non-overlapping memory areas
In other words, the problem memmove() suffers from is called the problem of ‘aliasing’.
The implementation becomes slow because it cannot assume that the locations pointed to by
‘s1’ and ‘s2’ are disjoint and so cannot optimize the copy. Consider the case of
memcpy(). It explicitly specifies that ‘s1’ and ‘s2’ point to disjoint arrays and so efficient
that two different pointers are being used to reference different objects, then
values. If it does so without considering ‘aliasing’, that may give rise to wrong results. In
other words, the compiler cannot do much optimization because that optimization may
{
int i;
a1[i] += a2[i];
}
Here there is a chance that the pointers a1 and a2 refer to the same location.
So optimization cannot be done on this. For such cases, if the restrict qualifier
is specified then it means that the pointers a2 and a1 are the pointers that
primarily point to the memory and aliases point only ‘through’ them. Since
both are restrict, that means they should be the primary means of the
restrict a2)
the ‘restrict’ qualifier provides a standard means with which to make, in the
previously be made only for library functions. The ‘restrict’ keyword allows
ANSI C 1999].
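The effect of the qualifier on a function like the one discussed above can be sketched as follows. The function name add_arrays and the parameter order are assumptions of this example.

```c
/* Adds a2[i] to a1[i] for every i in [0, n). The restrict qualifiers
   are a promise made by the caller that the two arrays do not overlap,
   which leaves the compiler free to reorder or vectorize the loop. */
void add_arrays(int n, int *restrict a1, const int *restrict a2)
{
    for (int i = 0; i < n; i++)
        a1[i] += a2[i];
}
```

The compiler does not verify the promise. If the caller passes overlapping arrays, the behavior is undefined, which is exactly the contract that memcpy() has.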
Now let us look at how ordinary declarations are made with the restrict qualifier. To
specify that an integer pointer is restricted, you should declare it as:
‘const-volatile qualifier’) now becomes ‘cvr qualifier’. Most of the semantics of ‘cv
Point to Ponder:
‘restrict’ is only for additional optimization, so it can safely be deleted from the
program.
The ‘struct hack’ technique we saw in the chapter on ‘structure and unions’ is a
useful one. Since this is a popular and widely used technique, C9X has
standardized the method of such usage. So ‘struct hack’ is now valid code but
The last member of structures now may have an incomplete array type.
struct system{
char type[10];
char manufacturer[20];
int numOfPeripherals;
// this field has the number of items that are pointed by
};
Here the size of the member ‘peripheralID’ can differ according to the
requirement. But (as the standard specifies) only the last member can be such a member.
the array but counts any padding before it. This makes the malloc call as
int num = 5;
sizeof(int));
Note the change here: it is num * sizeof(int) and not (num-1) * sizeof(int),
because the sizeof operator doesn’t include the ‘flexible array member’
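A complete sketch of allocating a structure with a flexible array member is given below. The structure device_list and the function device_list_create are hypothetical names, simplified from the example in the text.

```c
#include <stdlib.h>

struct device_list {
    int count;    /* number of entries stored in ids */
    int ids[];    /* flexible array member: must be the last member */
};

/* Allocates a list with room for num ids and fills them with 0..num-1.
   Returns NULL if the allocation fails. */
struct device_list *device_list_create(int num)
{
    /* sizeof(struct device_list) does not count ids, so the space for
       all num elements is added explicitly. */
    struct device_list *list =
        malloc(sizeof(struct device_list) + num * sizeof(int));
    if (list == NULL)
        return NULL;
    list->count = num;
    for (int i = 0; i < num; i++)
        list->ids[i] = i;
    return list;
}
```

The caller releases the whole object, including the array part, with a single call to free().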
Inline functions are now possible with the help of the keyword ‘inline’ (adapted
from C++). It should however be noted that the ‘inline’ keyword is a function-
This makes it possible to inline small functions so that the overhead of function calls
is avoided. This is an alternative to using macros, and by using such inline functions the
preprocessor).
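A minimal sketch of an inline function used in place of a macro follows. The names square and sum_of_squares are hypothetical, and the use of ‘static inline’ is a choice made for this sketch so that the definition stays local to one source file.

```c
/* 'static inline' keeps the definition local to this translation unit.
   Unlike a macro such as SQUARE(x), the argument is evaluated exactly
   once and is type-checked as an int. */
static inline int square(int x)
{
    return x * x;
}

int sum_of_squares(int a, int b)
{
    /* The compiler may expand both calls in place. */
    return square(a) + square(b);
}
```

Whether the calls are really expanded in place is left to the compiler; the keyword is only a request.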
17.2.10 Designated initializers
Previously, unions could only be initialized with their first member. But
now, with the help of designated initializers, unions can be initialized via any
member.
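The syntax can be sketched as follows. The union number and the two functions are hypothetical examples written for this illustration.

```c
union number {
    int    i;
    double d;
};

/* The designator .d selects the second member; before C9X only the
   first member (i) could be given an initial value. */
double half_as_double(void)
{
    union number n = { .d = 0.5 };
    return n.d;
}

/* Designators also work for array elements: the elements that are not
   named (1, 2 and 3 here) are implicitly set to zero. */
int sparse_sum(void)
{
    int a[5] = { [0] = 1, [4] = 5 };
    int total = 0;
    for (int i = 0; i < 5; i++)
        total += a[i];
    return total;
}
```

The same .member notation can be used for structures, where it lets the initializers be written in any order.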
This is one of the most basic and expected facilities that C programmers have longed
for. For writing short comments, single line comments are very convenient. It is a nasty
bug (that is frequently made) to forget to give the closing */ for a starting /*.
The new // comment that is borrowed from C++ will help programmers to easily
e. g. \Uaa00
This formalizes the idea that the return types have to be explicitly specified.
This will help avoid errors due to implicit assumption of return types
identifiers (also note that the predefined macros have file scope. E.g.
time.
void funName()
if(someError)
// prints
disadvantages. Now integers need not do all the work of the boolean types since
The numbers before the answers refer to the question numbers in the text.
0.2 In BSS.
static int i = 0;
static int i;
both are equivalent. The variable ‘i’ being explicitly initialized to 0 in the first
0.4 Compile time error. Constant expressions are evaluated at compile-time itself.
2.2 Yes. The char data type can take part in expressions much like the int data type. If the
char is unsigned then no problem arises. However, in the other case a problem
arises, because the representation of a signed variable in memory (bit pattern)
2.5 Yes.
2.6 Yes. The key phrase to remember is non-decreasing order of size, and this is the
only restriction laid down on the size of these data types. Equal sizes for all these types
shift.
2.8 Although some implementations allow it, it is not assured that the value will be
2.10 No.
2.11 Most of the implementations typedef time_t as int or as long int. If the sizeof long
int or int happens to be 4 bytes, there is a problem, because time_t stores the number
of seconds elapsed since midnight of Jan 1, 1970 and a signed 4-byte type can hold up to 2^31-1
seconds. The dates that can be represented like this end at Jan 19,
2038. So the next second will typically wrap around and be interpreted as a date in December 1901, and this is
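The limit date can be checked with a short sketch. It assumes that time_t counts seconds since Jan 1, 1970 (as on POSIX systems) and that time_t on the machine running it is wide enough to hold 2^31-1; the function name last_32bit_time is hypothetical.

```c
#include <stdint.h>
#include <time.h>

/* Fills *out with the UTC calendar time that corresponds to the largest
   value of a signed 32-bit seconds counter. Returns 0 on success and
   -1 if gmtime cannot convert the value. */
int last_32bit_time(struct tm *out)
{
    time_t last = (time_t)INT32_MAX;   /* 2^31 - 1 = 2147483647 seconds */
    struct tm *utc = gmtime(&last);
    if (utc == NULL)
        return -1;
    *out = *utc;                       /* copy out of gmtime's static buffer */
    return 0;
}
```

The arithmetic behind it: 2147483647 seconds are 24855 whole days plus 11647 seconds, and 24855 days after Jan 1, 1970 is Jan 19, 2038, with 11647 seconds giving 03:14:07.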
2.12 The output depends on whether value or sign preserving is followed. sizeof
value/sign preserving.
3.2 No Linkage
3.4 No, because the local/register variables have no linkage and are allocated in stack
and microprocessor registers respectively. Linker links the name and its
associated space at link time and since local/register variables are available at
4.4 An array can never be passed to a function as such. So in this function a pointer to the
first element of the array is actually passed (it happened here that the size was 2 bytes for both int * and int).
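This adjustment of array parameters can be demonstrated with a small sketch. The function name size_seen_by_callee is hypothetical.

```c
#include <stddef.h>

/* Although the parameter is written as an array, it is adjusted to
   'int *'. Taking its address therefore gives an 'int **'; for a true
   array the address would be a pointer to an array instead. */
size_t size_seen_by_callee(int arr[])
{
    int **address_of_param = &arr;     /* compiles only because arr is an int * */
    return sizeof *address_of_param;   /* same as sizeof(int *) */
}
```

In the caller, sizeof applied to the array itself still gives the size of the whole array, so the two values differ whenever the array is larger than a pointer.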
4.5 Compiler error. sizeof cannot be applied on void. Since the function returns void,
5.2 If the body of the loop never executes, p is assigned no address. So p remains
NULL, and *p = 0 may result in a problem (it may give rise to the runtime error ‘NULL
pointer assignment’).
7.1 If it were a little-endian machine it will print the ASCII equivalent of 0, i.e. 48. If
it were a big-endian machine, it will print the value equivalent of reversing the
bytes.
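Whether a machine is little-endian or big-endian can be found at run time with a sketch like the one below. The function name is_little_endian is hypothetical.

```c
/* Returns 1 on a little-endian machine (least significant byte stored
   at the lowest address) and 0 otherwise. */
int is_little_endian(void)
{
    unsigned int value = 1;
    /* Examining the bytes of an object through an unsigned char
       pointer is always permitted. */
    const unsigned char *first_byte = (const unsigned char *)&value;
    return *first_byte == 1;
}
```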
9.1 while(!feof(fp)){
ch = fgetc(fp);
{
ch=fgetc(fp);
if(ch=='*')
while( ch=fgetc(fp),!feof(fp) ){
if(ch=='*')
if(checkif(fp,'/')){
ch=fgetc(fp);
break;
9.2 fread reads three records successfully. The end-of-file condition is detected only when fread tries to
read another record and fails. So it prints the
last record two times and comes out after seeing EOF.
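One common way to avoid processing the last record twice is to let the return value of fread drive the loop. The sketch below assumes a hypothetical record type and function name; it is not the program from the question.

```c
#include <stdio.h>

struct record {
    int id;
};

/* Counts the records in an open binary stream. The loop is controlled
   by the return value of fread, so the body runs once for every record
   that was actually read and never for a failed read. */
int count_records(FILE *fp)
{
    struct record rec;
    int count = 0;
    while (fread(&rec, sizeof rec, 1, fp) == 1)
        count++;
    return count;
}
```

After the loop ends, feof() and ferror() can be used to find out whether it stopped because of end-of-file or because of a read error.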
10.1 The second one is better because gets(inputString) doesn’t know the size of the
string passed and so, if a very big input (here, more than 100 chars) is given, the characters
will be written past the end of the input string. When fgets is used with stdin it performs the
10.2 If the str contains any formatting characters like %d then it will result in a subtle
associated with some memory and the object should be of size at least 1; so the
printf may print the value 1 or a bigger number as the size of the structure.
APPENDIX - II ‘TINYEXPR’ EXPRESSION COMPILER
/*******************************************************************
example support for error detection and recovery are very poor.
Author: S G Ganesh
Date : 3-4-2001
*********************************************************/
/*Header file for opcode mnemonic constants used both in the ‘tinyExpr’
as in Java */
enum mnemonic{
ICONST_M1 = 2,
ICONST_0 = 3,
ICONST_1 = 4,
ICONST_2 = 5,
ICONST_3 = 6,
ICONST_4 = 7,
ICONST_5 = 8,
BIPUSH = 16,
SIPUSH = 17,
ILOAD = 21,
ILOAD_0 = 26,
ILOAD_1 = 27,
ILOAD_2 = 28,
ILOAD_3 = 29,
ISTORE = 54,
ISTORE_0 = 59,
ISTORE_1 = 60,
ISTORE_2 = 61,
ISTORE_3 = 62,
IADD = 96,
ISUB = 100,
IMUL = 104,
IDIV = 108,
IREM = 112,
INEG = 116,
ISHL = 120,
ISHR = 122,
IAND = 126,
IOR = 128,
IXOR = 130,
IINC = 132,
INVOKEMETHOD =186,
RETURN = 177,
};
#include <string.h>
#include <stdio.h>
#include <conio.h>
#include <stdlib.h>
#include <ctype.h>
#include <limits.h>
lexToken advance(void);
parser
void statementExpression();
void preIncrementExpression();
void preDecrementExpression();
void postIncrementExpression();
void postDecrementExpression();
void assignment();
void expr();
void assignmentExpr();
void assignment();
void inclusiveOrExpr();
void exclusiveOrExpr();
void shiftExpr();
void additiveExpr();
void andExpr();
void multiplicativeExpr();
void unaryExpr();
void preIncrementExpr();
void preDecrementExpr();
void unaryExprNotPlusMinus();
void postfixExpr();
void primary();
void name();
void postIncrementExpr();
void postDecrementExpr();
void methodInvocation();
(codegen.c) */
void emitNumConstCode();
if(errType==WARNING)
return advance();
}
else /* error */
getchar();
exit(0);
/*****************************lex.c***********************************/
#define MAXLEN 80
enum bool{false=0,true};
*/
enum lexVals token; /* this holds the current token’s lexical value
*/
int currInteger;
/* the integer value if the current one is an integer constant */
MINUS_MINUS,
LEFT_SHIFT_EQ,
ILLEGAL_TOKEN = -1,
}lexToken;
" ",
};
enum lexVals i;
if(!strcmp(symbol[i],tokenString))
return ILLEGAL_TOKEN;
if(tokenStr)
if(isalpha(tokenStr[0]))
currIdentifier = tokenStr;
temp = IDENTIFIER;
}
else if(isdigit(tokenStr[0])) /* if it starts with a digit then
it is an integer constant */
temp = INTEGER_CONST;
currInteger = atoi(tokenStr);
*/
temp = searchSymbols(tokenStr);
if(temp == ILLEGAL_TOKEN)
expression");
return temp;
else
return ILLEGAL_TOKEN;
new
char *tokenStr;
expression */
initLex = false;
/* strtok places a null terminator at the end of the token, if
found */
nextToken = getTokenFromString(tokenStr);
token = nextToken;
nextToken = getTokenFromString(tokenStr);
return token;
is
required to be present */
if(token == expected)
advance();
return true;
else
return false;
}
int main()
int i;
clrscr();
while((i=advance())>=0)
if(i==0)
printf("\n %s",currIdentifier);
else if(i==1)
printf("\n %d",currInteger);
else
printf("\n %s",symbol[i]);
#endif
/*************************symtable.c****************************/
/* this program contains code for the symbol table management that is
an array of strings */
char symbolTable[tableSize][tokenSize];
/* sets a new variable provided that name and type are available*/
insert entry");
return -1;
else
strcpy(symbolTable[symbolTop],string);
return symbolTop++;
/* see in the symbol table if the string has already been declared, else set
int i = symbolTop-1;
while(i >= 0)
{
if(strcmp(symbolTable[i],string)==0)
return i;
i--;
return setVariable(string);
fprintf(outFile,"%c%d%d",IINC,offset,incOrDec);
void emitNumConstCode()
switch(currInteger)
fprintf(outFile,"%c%c", BIPUSH,currInteger);
fprintf(outFile,"%c%d", SIPUSH,currInteger);
switch(offset)
else
fprintf(outFile,"%c%d",ILOAD,offset);
}
switch(offset)
else
fprintf(outFile,"%c%d",ISTORE,offset);
/* ******************************tinyexpr.c *************************/
/* this is the main source file that contains the implementation of the
#include "mnemonic.h"
/* this header file in turn includes standard header files and contains
function declarations*/
#include "errhandl.c"
#include "lex.c"
#include "symtable.c"
/* this file contains the simple symbol table and the functions
operating on it */
#include "codegen.c"
char sourceExpr[MAXLEN];
FILE *outFile;
and other are optional so the function are designed in that way */
preIncrementExpression();
preDecrementExpression();
postIncrementExpression();
postDecrementExpression();
methodInvocation();
assignment();
void preIncrementExpression()
if(token==PLUS_PLUS)
int i;
advance();
i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,1);
emitLoadCode(i);
advance();
void preDecrementExpression()
{
if(token==MINUS_MINUS)
int i;
advance();
i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,-1);
emitLoadCode(i);
advance();
void postIncrementExpression()
int i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,1);
advance();
advance();
void postDecrementExpression()
int i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,-1);
advance();
advance();
void expr()
assignmentExpr();
if(token == COMMA)
advance();
expr();
void assignmentExpr()
assignment();
inclusiveOrExpr();
void assignment()
int i,index;
i=lookUpSet(currIdentifier);
if(token != EQUAL)
emitLoadCode(i);
index = token;
advance();
assignmentExpr();
switch(index)
emitStoreCode(i);
} /* end switch */
{
if(index==EQUAL || index==STAR_EQ || index==DIV_EQ ||
index==LEFT_SHIFT_EQ || index==RIGHT_SHIFT_EQ ||
index==OR_EQ)
return 1;
return 0;
void inclusiveOrExpr()
exclusiveOrExpr();
while(token==BIT_OR)
advance();
exclusiveOrExpr();
fprintf(outFile,"%c",IOR);
void exclusiveOrExpr()
andExpr();
while(token==EXOR)
advance();
andExpr();
fprintf(outFile,"%c",IXOR);
void andExpr()
shiftExpr();
while(token==BIT_AND)
advance();
shiftExpr();
fprintf(outFile,"%c",IAND);
void shiftExpr()
additiveExpr();
while(token==LEFT_SHIFT||token==RIGHT_SHIFT)
int index;
index=token;
advance();
additiveExpr();
if(index==LEFT_SHIFT)
fprintf(outFile,"%c",ISHL);
else
fprintf(outFile,"%c",ISHR);
void additiveExpr()
multiplicativeExpr();
while(token==PLUS||token==MINUS)
int index;
index=token;
advance();
multiplicativeExpr();
if(index == PLUS)
fprintf(outFile,"%c",IADD);
else
fprintf(outFile,"%c",ISUB);
void multiplicativeExpr()
unaryExpr();
int index;
index=token;
advance();
unaryExpr();
switch(index)
void unaryExpr()
if(token==PLUS || token==MINUS)
int index=token;
advance();
unaryExpr();
if(index==PLUS)
; /* do nothing so no opcode */
else if(index==MINUS)
fprintf(outFile,"%c",INEG);
if(token==PLUS_PLUS)
preIncrementExpr();
else if(token==MINUS_MINUS)
preDecrementExpr();
advance();
else
unaryExprNotPlusMinus();
void preIncrementExpr()
int i;
advance();
i=lookUpSet(currIdentifier);
if(i < 0)
else
emitPrePostIncDecCode(i,1);
emitLoadCode(i);
void preDecrementExpr()
int i;
advance();
i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,-1);
emitLoadCode(i);
void unaryExprNotPlusMinus()
if(token==BIT_NOT)
advance();
unaryExpr();
fprintf(outFile,"%c",ICONST_M1);
fprintf(outFile,"%c",IXOR);
else
postfixExpr();
void postfixExpr()
primary();
name();
postIncrementExpr();
postDecrementExpr();
void name()
{
if(token==IDENTIFIER)
int i = lookUpSet(currIdentifier);
emitLoadCode(i);
advance();
void primary()
methodInvocation();
if(token==B_PARENS)
advance();
expr();
verify(E_PARENS);
else if(token==INTEGER_CONST)
emitNumConstCode();
advance();
void postIncrementExpr()
if(token==PLUS_PLUS)
{
int i;
i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,1);
advance();
postfixExpr();
void postDecrementExpr()
if(token==MINUS_MINUS)
int i;
i=lookUpSet(currIdentifier);
emitPrePostIncDecCode(i,-1);
advance();
postfixExpr();
/* at present supports only one API namely "write". You can include
interpreter */
int i=0;
if(!strcmp(APIList[i++],s))
return -1;
void argumentList()
expr();
while(token == COMMA)
advance();
argumentList();
void methodInvocation()
fprintf(outFile,"%c",INVOKEMETHOD);
fprintf(outFile,"%c",index);
verify(E_PARENS);
else
void compile()
outFile = fopen("a.out","wb+");
if(outFile==NULL)
initLex = true;
advance();
expr();
fprintf(outFile,"%c",RETURN);
fclose(outFile);
printf("Successfully compiled");
int main()
printf("\n %s ",prompt);
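if(fgets(sourceExpr,MAXLEN,stdin)==NULL) /* gets() cannot limit the input length */
break;
sourceExpr[strcspn(sourceExpr,"\n")]='\0'; /* drop the trailing newline */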
if(strcmp(sourceExpr,"quit")==0)
break;
else if (strcmp(sourceExpr,"exec")==0)
system("interpre.exe");
else
compile();
/****************************interpre.h******************************/
/* the header files inclusion and the prototype declarations for the
compiler go here */
#include<stdio.h>
#include<stdlib.h>
int pop();
void istoreCode();
void iadd();
void isub();
void imul();
void idiv();
void irem();
void ineg();
void ishl();
void ishr();
void iand();
void ior();
void ixor();
void iinc();
void jreturn();
void invokemethod();
void atexitFree();
/************************error.c***********************/
getchar();
exit(0);
/************************ostack.c***********************/
#define MAXSIZE 32
operand[oTop++]=oper;
else
int pop()
if(oTop > 0)
return operand[--oTop];
else
/************************interpre.c***********************/
#include"interpre.h"
#include "mnemonic.h"
mnemonics used */
#include "error.c"
#include "ostack.c"
variable */
stored
execution */
void istoreCode()
localVariable[PC++] = value;
}
void istore(int i)
localVariable[i] = value;
void iadd()
push(value1+value2);
void isub()
push(value1-value2);
void imul()
void idiv()
push(value1/value2);
void irem()
int value2=pop();
int value1=pop();
push(val);
void ineg()
int value=pop();
push(-value);
operators */
void ishl()
int value2=pop();
int value1=pop();
push(value1<<value2);
void ishr()
int value2=pop();
int value1=pop();
push(value1>>value2);
void iand()
void ior()
push(value1 | value2);
}
void ixor()
push(value1 ^ value2);
void iinc()
localVariable[index] += jconst;
void jreturn()
useful */
void invokemethod()
if(byteCode[PC++] == 0)
printf("%d",pop());
switch(functionCode)
PC += 2; break;
case ILOAD : push(localVariable[PC++]); break;
curpos = ftell(stream);
length = ftell(stream);
return length;
void atexitFree()
free(byteCode);
int main()
long length;
int size;
/* open the bytecode file */
if(stream==NULL)
length = fileSize(stream);
if(byteCode==NULL)
atexit(atexitFree);
size = fread(byteCode,1,length,stream);
if(size==0)
\"a.out\"");
callFunction(byteCode[PC++]);
fclose(stream);
}
APPENDIX III - ANSI C Declarations
<errno.h>
clock_t <time.h>
div_t <stdlib.h>
FILE <stdio.h>
fpos_t <stdio.h>
jmp_buf <setjmp.h>
ldiv_t <stdlib.h>
ptrdiff_t <stddef.h>
sig_atomic_t <signal.h>
size_t <stddef.h>
time_t <time.h>
va_list <stdarg.h>
wchar_t <stddef.h>
17.6 Standard C structure and union declarations
17.6.1 <stdlib.h>
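typedef struct {
int quot; /* quotient */
int rem; /* remainder */
} div_t;
typedef struct {
long quot; /* quotient */
long rem; /* remainder */
} ldiv_t;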
17.6.2 <locale.h>
struct lconv {
char *decimal_point;
char *thousands_sep;
char *grouping;
char *int_curr_symbol;
char *currency_symbol;
char *mon_decimal_point;
char *mon_thousands_sep;
char *mon_grouping;
char *positive_sign;
char *negative_sign;
char int_frac_digits;
char frac_digits;
char p_cs_precedes;
char p_sep_by_space;
char n_cs_precedes;
char n_sep_by_space;
char p_sign_posn;
char n_sign_posn;
};
17.6.3 <time.h>
struct tm {
effect) */
};
APPENDIX IV - ANSI C IMPLEMENTATION-SPECIFIC STANDARDS
Certain aspects of the ANSI C standard are not defined exactly by ANSI. Instead,
may take the maximum advantage of the underlying hardware. A well-known example is
the size of integer. The programmer can take advantage of the underlying hardware for
maximum efficiency and to access the resources of underlying hardware. This is at the
3.1.2.5 The representations and sets of values of the various types of floating-point
numbers.
3.1.2.5 The representations and sets of values of the various types of integers.
sequence not represented in the basic/extended execution character set for a wide
character constant.
3.1.3.4 The current locale used to convert multibyte characters into corresponding wide
3.1.3.4 The value of an integer constant that contains more than one character, or a
wide character constant that contains more than one multibyte character.
3.2.1.2 The result of converting an integer to a shorter signed integer, or the result of
cannot be represented.
3.3.2.3 What happens when a member of a union object is accessed using a member of
a different type.
3.3.3.4 The type of integer required to hold the maximum size of an array.
3.5.1 The extent to which objects can actually be placed in registers by using the
3.5.2.1 Whether a plain int bit-field is treated as a signed int or as an unsigned int bit
field.
3.5.2.2 The integer type chosen to represent the values of an enumeration type.
3.5.4 The maximum number of declarators that can modify an arithmetic, structure, or
union type.
expression that controls conditional inclusion matches the value of the same
3.8.2 The support for quoted names for includable source files.
3.8.8 The definitions for __DATE__ and __TIME__ when they are unavailable.
4.1.1 The decimal point character.
4.1.5 The null pointer constant to which the macro NULL expands.
4.2 The diagnostic printed by and the termination behavior of the assert function.
functions.
4.3.1 The sets of characters tested for by isalnum, isalpha, iscntrl, islower, isprint and
isupper functions.
4.5.1 Whether the mathematics functions set the integer expression errno to the value
4.5.6.4 Whether a domain error occurs or zero is returned when the fmod function has a
4.7.1.1 The semantics for each signal recognized by the signal function.
4.7.1.1 The default handling and the handling at program startup for each signal
4.7.1.1 If the equivalent of signal(sig, SIG_DFL); is not executed prior to the call of a
4.7.1.1 Whether the default handling is reset if the SIGILL signal is received by a
4.9.2 Whether the last line of a text stream requires a terminating newline character.
4.9.2 Whether space characters that are written out to a text stream immediately
4.9.2 The number of null characters that may be appended to data written to a binary
stream.
4.9.3 Whether the file position indicator of an append mode stream is initially
4.9.3 Whether a write on a text stream causes the associated file to be truncated
4.9.4.2 The effect if a file with the new name exists prior to a call to rename.
4.9.6.2 The interpretation of a - (hyphen) character that is neither the first nor the last
4.9.9.1 The value to which the macro errno is set by the fgetpos or ftell function on
failure.
4.10.3 The behavior of calloc, malloc, or realloc if the size requested is zero.
4.10.4.1 The behavior of the abort function with regard to open and temporary files.
4.10.4.3 The status returned by exit if the value of the argument is other than zero,
EXIT_SUCCESS, or EXIT_FAILURE.
4.10.4.4 The set of environment names and the method for altering the environment list
used by getenv.
4.10.4.5 The contents and mode of execution of the string by the system function.
4.12.2.1 The era for clock. The formats for date and time.
Note: The section numbers refer to the ANSI Standard as on February 1990.
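A few of the implementation-defined aspects listed above can be observed from a program. The sketch below looks at three of them (3.5.2.1, 3.5.2.2 and 4.10.3); all the names in it are hypothetical, and the results naturally differ from one implementation to another.

```c
#include <stdlib.h>

struct flags {
    int bit : 1;    /* plain int bit-field: signedness is implementation-defined */
};

enum color { RED, GREEN, BLUE };

/* Returns 1 if a plain int bit-field behaves as signed, 0 if unsigned.
   Storing -1 gives -1 in a signed 1-bit field and 1 in an unsigned one. */
int plain_int_bitfield_is_signed(void)
{
    struct flags f;
    f.bit = -1;
    return f.bit < 0;
}

/* Size of the integer type chosen to represent the enumeration. */
size_t size_of_enum_type(void)
{
    return sizeof(enum color);
}

/* Returns 1 if malloc(0) yields a null pointer, 0 if it yields a
   unique non-null pointer. Either result is allowed. */
int malloc_zero_returns_null(void)
{
    void *p = malloc(0);
    int is_null = (p == NULL);
    free(p);    /* free(NULL) is harmless */
    return is_null;
}
```

Portable code should not depend on any particular outcome of these functions.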
Suggested Readings
This is a ‘must read’ for anyone serious about C programming. The first edition
served as the only primary source of C reference for nearly a decade. Many intricate parts
of the language can be understood from this book because this book is from the author of
the language itself. Its appendix has a compact reference manual for C.
Now this C reference manual is in its fourth edition. This is an excellent reference
next to [Kernighan and Ritchie 1988] and it has full coverage of basic
building blocks of C like operators and expressions. Its second part explores the C
[Koenig 1989]
communicate ideas. This is a small book, worth reading because C is the language where
programmers can easily make mistakes. The book explores the intricate parts of the
language to make you aware of the traps and to avoid pitfalls that are in C.
[Allison 1998]
This is a very good book for knowing various issues associated with the language
features with innumerable examples. It has a good coverage of practical usage of various
[Allison 1998]
[ANSI C 1989]
[ANSI C 1998]
[Binsock 1987]
Bohm C. and Jacopini G., “Flow Diagrams, Turing Machines and Languages with
[Dijkstra 1968]
Edsger Dijkstra, “Go To Statement Considered Harmful”, CACM 11:3, March
1968
[Duff 1984]
James Gosling and Bill Joy, “The Java Programming Language”, Addison-
Wesley, 1995
Corporation, 2001
[Holub 1990]
Johnson S.C. and Ritchie D.M., “The C language calling sequence”, Computing
Science Technical Report No. 102, AT&T Bell Laboratories, Murray Hill, N.J, 1981.
[Kernighan and Ritchie 1978]
[Koenig 1989]
[Kruglinski 1995]
Wesley, 1995
[Ritchie 1978]
[Ritchie et al.]
James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William
NJ, 1991
[Stroustrup 1986]
MA, 1986
[Stroustrup 1994]
looking up something,
- Franklin P. Adams