BDS C File I/O Primer
Leor Zolman
BD Software
The file I/O library functions provided with BDS C fall into two categories: "raw"
and "buffered." The raw file functions, typically coded in assembly language for best
performance, are essentially a CP/M-oriented low-level interface where data transfers
always occur in multiples of full CP/M logical sector (128 byte) quantities. The
buffered functions (written in C) provide a byte-oriented, sequential file I/O system
geared especially for "filter"-type applications; buffering allows you to read and
write data in whatever sized quantities are most convenient while invisible mechanisms
worry about things like sector buffering and actual disk I/O. Thus the buffered I/O
functions are usually more convenient to deal with than the raw functions, but they
carry a lot of overhead: they are slower, and they take up quite a bit of memory for code
and buffer space.
Since buffered I/O is composed of raw I/O functions plus some extra code, I'll first
present the raw I/O functions in detail, and then go on to the buffered functions.
The raw functions are characterized by their concern with "file descriptors". A file
descriptor (fd) is a small integer value that becomes associated with a currently
active file. This fd is always obtained by calling either the "open" or "creat"
functions; their usage is:
All other raw functions require an fd to specify the file to be operated on (except
"unlink" and "rename", which take filename pointers). The "read" and "write"
functions are used to transfer data to and from disk. Their typical usage is:
For each file opened under raw I/O, there exists an invisible "r/w pointer" to keep
track of the next sector to be written or read. Immediately after a file is opened,
the r/w pointer always starts at sector 0 (the first sector) of the file; it is bumped
after "read" and "write" calls by the number of successfully transferred sectors, so
that (by default) the next transfer happens sequentially. One nice extension of the
BDS C raw I/O functions over their REALLY-raw CP/M equivalents is the elimination of
the concept of "extents": instead of "extent numbers" and "sector numbers within the
current extent" to be reckoned with for every file, there is only a single 16-bit r/w
pointer to be considered. The value of a file's r/w pointer may be obtained by
calling the "tell" function, and modified by calling "seek".
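To make the sector-granular addressing concrete, here is a minimal sketch in standard C (it is not BDS C library code). The helper names are hypothetical and exist only for illustration; the 128-byte sector size and the 16-bit width of the r/w pointer are taken from the text above.

```c
#include <stdint.h>

#define SECSIZ 128   /* CP/M logical sector size in bytes, as stated in the text */

/* Number of whole sectors needed to hold nbytes (rounds up). */
unsigned long sectors_for_bytes(unsigned long nbytes)
{
    return (nbytes + SECSIZ - 1) / SECSIZ;
}

/* Sector number containing a given byte offset.  The cast mirrors the
   16-bit r/w pointer; offsets past the addressable range would wrap. */
uint16_t sector_of_byte(unsigned long offset)
{
    return (uint16_t)(offset / SECSIZ);
}

/* Byte offset of the first byte of a given sector. */
unsigned long byte_of_sector(uint16_t sector)
{
    return (unsigned long)sector * SECSIZ;
}

/* Bytes addressable when the r/w pointer is 16 bits wide. */
unsigned long max_addressable_bytes(void)
{
    return 65536UL * SECSIZ;
}
```

Under these assumptions a single 16-bit sector pointer spans 65536 sectors of 128 bytes, or 8,388,608 bytes, with no extent bookkeeping required.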
To illustrate the use of raw I/O in a program, let's build a simple utility to make a
copy of a file. The command format for this utility (which we'll call "copy") shall
be:
This will take the file named by 'filename' and create a copy of it named by
'newname'. Since this is to be a classy utility, we want full error diagnostics in
case something goes wrong (such as running out of disk space, not being able to find
the master file, etc.). This includes checking to make sure that the correct number of
arguments was typed on the command line. It is sometimes convenient to summarize a
program in a half-C/half-English pseudo code form to avoid going in blind; here is
such a summary of the copy program:
copy(file1,file2) {
        if (exactly 2 args weren't given) { complain and abort }
        if (can't open file1) { complain and abort }
        if (can't create file2) { complain and abort }
        while (not end of file1) {
                Read a hunk from file1 and write it out to file2;
                if (any error has occurred) { complain and abort }
        }
        close all files;
}
And here is the actual C program that implements the above procedure:
#include "bdscio.h"             /* The standard header file */
#define BUFSECTS 64             /* Buffer up to 64 sectors in memory */

int fd1, fd2;                   /* File descriptors for the two files */
char buffer[BUFSECTS * SECSIZ]; /* The transfer buffer */

main(argc,argv)
int argc;                       /* Arg count */
char **argv;                    /* Arg vector */
{
        int oksects;            /* A temporary variable */

                                /* make sure exactly 2 args were given */
        if (argc != 3)
                perror("Usage: A>copy file1 file2 <cr>\n");
Now let's take a look at the program. First come the declarations: we need a file
descriptor for each file involved in the copying process, and a large array to buffer
up the data as we shuffle chunks of disk files through memory. The size of the buffer
is computed as the sector size (defined in BDSCIO.H) times the number of sectors of
buffering desired (defined at the top of this program as BUFSECTS).
In the "main" function, the first thing to do is make sure the correct number of
arguments were given on the command line. Since the 'argc' parameter is provided free
by the run-time package to every main program, and is always equal to the number of
arguments given PLUS ONE, we test to make sure it is equal to three (i.e., that two
arguments were given). If argc is not equal to three, we call "perror" to print out a
complaint and abort the program. "Perror" interprets its arguments as if they were
the first two arguments to a "printf" call, performs the required "printf" call,
aborts operations on the output file (this has no effect if called before
the file is opened, as is the case when the "argc != 3" test succeeds), and
exits to CP/M.
If we make it past the argc test, it is time to try opening files. The next statement
opens the master file for reading, assigns the file descriptor returned by "open" to
the variable 'fd1', and causes the program to be aborted if "open" returned an error.
This can all be done at one time thanks to the power of the C expression evaluator; if
you aren't used to seeing this much happen in one statement, take a moment to follow
the parenthesization carefully. First the call to "open" is performed, then the
assignment to 'fd1' of the return value from "open", and then the test to see if that
value was ERROR. If the value was NOT equal to ERROR, control will pass on to the next
'if' statement; otherwise, the appropriate call to "perror" diagnoses the problem and
terminates the program. Creating the output file follows exactly the same pattern.
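The call-assign-test pattern described above can be tried in isolation. The following is a minimal sketch in standard C, not the primer's own statement: "fake_open" is a hypothetical stand-in for "open", and the value used for ERROR here is an assumption made for the sketch. The second function shows what goes wrong when the inner parentheses are left out.

```c
#include <string.h>

#define ERROR (-1)   /* assumed error value for this sketch */

/* Hypothetical stand-in for "open": only the name "master" exists. */
int fake_open(const char *name)
{
    return strcmp(name, "master") == 0 ? 3 : ERROR;
}

/* Correct form: call, assign, then compare, all in one condition.
   Returns the descriptor, or ERROR after taking the failure branch. */
int open_checked(const char *name)
{
    int fd;

    if ((fd = fake_open(name)) == ERROR)
        return ERROR;        /* a real program would complain and abort here */
    return fd;
}

/* Pitfall: == binds tighter than =, so without the inner parentheses
   fd receives the result of the comparison (0 or 1), not a descriptor. */
int open_misparenthesized(const char *name)
{
    int fd;

    fd = fake_open(name) == ERROR;
    return fd;
}
```

With the parentheses in place the variable holds the descriptor and the comparison sees the same value; without them the variable only ever holds 0 or 1.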
Having made it through all the preliminaries, it is time to start copying some data
(finally!). Each time through the 'while' loop, we read as much as we can get (up to
BUFSECTS sectors) into memory from the master file. The "read" function returns the
number of sectors successfully read; this may range from 0 (indicating an end-of-file
[EOF] condition) up to the number of sectors requested (in this case, BUFSECTS), with
a value of ERROR being returned on disaster (when the disk drive door pops open or
something). Whatever this value may be, it is assigned to 'oksects' for later
examination. In the special case when it is equal to zero, indicating EOF, the "while"
loop will be exited. Otherwise, we enter the loop and attempt to write back out the
data that we just read in. First, though, we want to make sure no gross error
occurred, so a check is performed to see if ERROR was returned by the "read" call. If
so, it's Abortsville. Having safely circumnavigated Abortsville, we call "write" to
dump the data into the output file. If we don't succeed in writing the number of
sectors we want to write, it's back to Abortsville with an appropriate error message
(most write errors are caused by running out of disk space). If the "write" succeeds,
we go back to the top of the loop and try to read some more data.
The last thing to do, once the "while" loop has been left, is to mop up by closing the
files. Just to be complete, we check to make sure the output file has closed
correctly. And that's it.
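For readers who want to experiment with the same loop shape on a modern system, here is a rough analogue in standard C. It is only a sketch: it uses stdio's fread and fwrite with an item size of one sector in place of the BDS C raw functions, the function name "copy_sectors" is hypothetical, and it assumes (as is always true of CP/M files) that the input length is a whole number of sectors, so any trailing partial sector is ignored.

```c
#include <stdio.h>

#define SECSIZ   128   /* sector size, as in the text */
#define BUFSECTS 64    /* buffer up to 64 sectors in memory */

static char buffer[BUFSECTS * SECSIZ];   /* the transfer buffer */

/* Copy whole sectors from in to out.
   Returns the number of sectors copied, or -1 on a read or write error. */
long copy_sectors(FILE *in, FILE *out)
{
    size_t oksects;    /* sectors obtained by the latest read */
    long total = 0;

    /* fread with item size SECSIZ reports how many full sectors it got;
       0 means end of file (or an error, which is checked after the loop). */
    while ((oksects = fread(buffer, SECSIZ, BUFSECTS, in)) != 0) {
        /* A short write most often means the disk is full. */
        if (fwrite(buffer, SECSIZ, oksects, out) != oksects)
            return -1;
        total += (long)oksects;
    }
    if (ferror(in))
        return -1;
    return total;
}
```

As in the walk-through, each pass reads up to BUFSECTS sectors, writes back exactly as many as were read, and treats a short write as fatal.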
The raw file I/O functions are most useful when large amounts of data, preferably in
even sector-sized chunks, need to be manipulated. The preceding file-copy program is a
typical application. Raw file I/O requires you to always think in terms of
"sectors"--while this poses no particular problem in, say, the file-copy example, it
does add quite a bit of complexity to shuffling bits and pieces of randomly-sized
data. Consider, for example, the unit known as the "text-line": A line's worth of
ASCII data may vary in size anywhere from 1 byte (in the case of a null string,
represented by the terminating null only) up to somewhere around 133 bytes, or maybe
even more if you're dealing with some really fancy printing device. Anyway, some
convenient method to read and write these text-lines to and from disk files would be a
very useful thing for text processing applications. Ideally we'd like to be able to
call a single function, passing to it some kind of file descriptor and a pointer to a
The spotlight in the world of buffered I/O is on a structure called, amazingly, an "I/O
buffer". Within this structure is a large, even-sector sized character array within
which the data being transferred is stored, and several assorted pointers and
descriptors to keep track of "what's happening" in the data array portion of the
buffer. There's a file descriptor to identify the file in raw I/O operations, there's
a pointer into the data array to tell where the next byte shall be read from or
written to, and there's a counter to tell how many bytes of either data or space
(depending on whether you're reading or writing) are left before it becomes necessary
to reload or dump the buffer. (1)
Buffered I/O functions use pointers to I/O buffers just as the raw functions use file
descriptors. There are six functions that perform all actual buffered I/O for single
bytes of data; the other buffered I/O functions (such as "fputs" and "fgets") do their
stuff in terms of the six "backbone" functions.
For reading files we have "fopen", "getc", and "fclose". "Fopen" is called to
associate an existing input file with a user-provided I/O buffer area by initializing
all the variables in that buffer. "Getc" grabs a byte from the buffer, first refilling
the data array from disk whenever the array is found to be empty, and returns a
special value (EOF) when the end of the file is reached. "Fclose" closes the file
associated with an I/O buffer.
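The reading side of such a buffer can be sketched in a few lines of standard C. This is an illustration of the mechanism just described, not the BDS C source: the structure and function names are hypothetical, a block of memory stands in for the disk file so the sketch is self-contained, and the eight-sector buffer size is simply the default the primer mentions.

```c
#include <string.h>

#define SECSIZ    128    /* sector size, as in the text */
#define NSECTS    8      /* sectors of buffering (assumed, after the stated default) */
#define EOF_VALUE (-1)   /* what the sketch's getc returns at end of file */

/* Hypothetical stand-in for a disk file: bytes held in memory. */
struct memfile {
    const unsigned char *data;
    size_t size;
    size_t pos;
};

/* The three bookkeeping items the text describes, plus the data array. */
struct iobuf {
    struct memfile *src;                  /* plays the role of the file descriptor */
    int nleft;                            /* bytes left before a reload is needed */
    unsigned char *nextp;                 /* where the next byte is read from */
    unsigned char data[NSECTS * SECSIZ];  /* the sector-sized data array */
};

/* Associate a source with a buffer by initializing its variables. */
void buf_open(struct iobuf *b, struct memfile *src)
{
    b->src = src;
    b->nleft = 0;
    b->nextp = b->data;
}

/* Reload the data array; returns the number of bytes now available. */
static int refill(struct iobuf *b)
{
    size_t avail = b->src->size - b->src->pos;
    size_t n = avail < sizeof b->data ? avail : sizeof b->data;

    memcpy(b->data, b->src->data + b->src->pos, n);
    b->src->pos += n;
    b->nextp = b->data;
    b->nleft = (int)n;
    return b->nleft;
}

/* Return the next byte (0..255), or EOF_VALUE at end of file. */
int buf_getc(struct iobuf *b)
{
    if (b->nleft == 0 && refill(b) == 0)
        return EOF_VALUE;
    b->nleft--;
    return *b->nextp++;
}
```

The caller never sees sectors at all: it asks for one byte at a time, and the counter decides when another load of the data array is due.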
For writing files there are "fcreat", "putc", "fflush", and "fclose" again ("fclose"
leads a double existence.) "Fcreat" creates a new file and prepares an associated I/O
buffer structure for receiving data. The data is written to the buffer via calls to
"putc", one byte at a time. When all the data has been "putc"-ed, "fflush" is called
to dump out the contents of the not-yet-full I/O buffer to the disk file. Finally,
"fclose" wraps things up by closing the associated file.
The only functions that actually read and write data are "getc" and "putc"; functions
such as "fgets", "fputs", "fprintf", etc. do their reading and writing in terms of
"getc" and "putc".
Let's look at a simple first example. The following program prints a given text file
out on the console, with line numbers generated on the left margin:
1. The devious user may wonder why there is space taken for a byte counter, when the
data pointer could just as well be compared to the last array address to detect a
full/empty buffer. Actually, it ends up being more efficient with the counter,
because the code required to compare two addresses is usually bulkier than the code
required to decrement a counter and test for zero.
/*
        PNUM.C: Program to print out a text file with
                automatic generation of line numbers.
*/

#include "bdscio.h"

main(argc,argv)
char **argv;
{
        char ibuf[BUFSIZ];      /* declare I/O buffer */
        char linbuf[MAXLINE];   /* temporary line buffer */
        int lineno;             /* line number variable */

        if (argc != 2) {        /* make sure file was given */
                printf("Usage: A>pnum filename <cr>\n");
                exit();
        }

        if (fopen(argv[1],ibuf) == ERROR) {
                printf("Can't open %s\n",argv[1]);
                exit();
        }

        fclose(ibuf);
}
The declaration of 'ibuf' provides the I/O buffer area for use with "fopen", "getc"
and "fclose". The symbolic constant "BUFSIZ", defined within the BDSCIO.H header file,
tells how many bytes an I/O buffer must contain; this value will vary with the number
of sectors desired for data buffering. See BDSCIO.H for instructions on how to
customize the buffered I/O mechanism for a different buffer size (the default is eight
sectors).
After checking the argument count and opening the specified file for buffered input,
all the REAL work takes place in one simple "while" statement. First the "fgets"
function reads a line of text from the file and places it into the 'linbuf' array. As
long as the end of file isn't encountered, "fgets" will return a non-zero (true) value
and the body of the "while" statement will be executed. The body consists of a single
call to "printf", in which the current line number is printed out followed by a colon,
space, and the current text line. After the value of 'lineno' is used, it is
incremented (by the ++ operator) in preparation for the next iteration. The cycle of
reading and printing lines continues until "fgets" returns zero; at that point the
"while" loop is abandoned and "fclose" wraps things up.
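The read-and-number cycle just described looks roughly like this in standard C. This is a sketch rather than the PNUM.C source: it uses the standard three-argument fgets and a FILE stream instead of the BDS C buffer, and the function name, the MAXLINE value and the exact output format are assumptions made for illustration.

```c
#include <stdio.h>

#define MAXLINE 150   /* assumed line-buffer size for this sketch */

/* Copy each line of in to out, prefixed by its line number, a colon
   and a space.  Returns the number of lines printed.  A line longer
   than the buffer would be split and counted more than once. */
int number_lines(FILE *in, FILE *out)
{
    char linbuf[MAXLINE];   /* temporary line buffer */
    int lineno = 1;         /* line number variable */

    /* fgets returns NULL (false) once the end of the file is reached */
    while (fgets(linbuf, sizeof linbuf, in))
        fprintf(out, "%d: %s", lineno++, linbuf);
    return lineno - 1;
}
```

Note that 'lineno' is used and then incremented in the same expression, so the first line printed is numbered 1.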
For our final example we have the kind of program known as a "filter". Generally, a
filter reads an input file, performs some kind of transformation on it, and writes the
result out into a new output file. The transformation might be quite complex (like a C
compilation) or it might be as trivial as the conversion of an input text file to
upper case. Since printing costs are pretty high these days, let's skip the C
compiler for the time being and take a look at a To-Upper-Case filter program:
#include "bdscio.h"

main(argc,argv)
char **argv;
{
        char ibuf[BUFSIZ], obuf[BUFSIZ];
        int c;

        if (argc != 3) {
                printf("Usage: A>ucase file newfile <cr>\n");
                exit();
        }

        if (fopen(argv[1],ibuf) == ERROR) {
                printf("Can't open %s\n",argv[1]);
                exit();
        }

        if (fcreat(argv[2],obuf) == ERROR) {
                printf("Can't create %s\n",argv[2]);
                exit();
        }

        putc(CPMEOF,obuf);
        fflush(obuf);
        fclose(obuf);
        fclose(ibuf);
}
This time there are two buffered I/O streams to be dealt with: the input file and the
output file. The first thing to do is check for the correct number of arguments (in
this case, two: the name of an existing input file, and the name of the output file to
be created). Then "fopen" and "fcreat" are called, to open and create the two files
for buffered I/O. If that much succeeds, the main loop is entered and the fun begins.
On each iteration of the loop, a single byte is grabbed from the input file and
compared with the two possible end-of-text-file values: EOF and CPMEOF. Normally, the
last thing in a text file SHOULD be a CPMEOF (control-Z) character. But some text
editors (none that I use) neglect to place the CPMEOF character at the end of a file
if the file happens to end exactly on a sector boundary. In this case, CPMEOF will
never be seen and the physical end-of-file value (EOF) must be detected. The
complication this causes is rather tricky... the EOF value returned by "getc" is -1,
which must be represented as a 16-bit value because "char" variables in BDS C cannot
take on negative values. This is why the variable 'c' is declared as an "int" instead
of a "char" in the above program; if it were declared as a "char", then the
sub-expression
        c = getc(ibuf)
would result in a value having the type "char" and could never possibly equal EOF as
tested for in the program. Should "getc" ever return EOF in such a case, 'c' would end
up being equal to 255 (the "char" interpretation of the low order 8 bits of the value
EOF). Thus, 'c' is declared as an "int" so that the EOF comparison can make sense.
This is awkward because 'c' is used here for holding characters, and it would be nice
to have it declared as a character variable. There's actually a way to do it, at the
price of complete generality: if the EOF in the comparison were changed to 255 (then
'c' would have to be declared as a "char"!), the program would work... EXCEPT for
when an actual hex FF (decimal 255) byte is encountered in the input file! Now, while
it is a pretty safe bet to assume there aren't any hex FF bytes in your average text
file, there may be exceptions. Also, there's no law that says filters can only be
written for text files. Consider a program to take a binary file and "unload" it,
creating an Intel-format HEX file. Would we want it to halt when the first hex FF is
encountered? No, the original method is clearly the most general.
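The arithmetic behind this argument can be checked with a short sketch in standard C. An "unsigned char" is used here to imitate a "char" that cannot take on negative values, which is an assumption about how best to model the BDS C type on a modern compiler; the names GETC_EOF, "as_unsigned_char" and "char_can_equal_eof" are hypothetical.

```c
#define GETC_EOF (-1)   /* the end-of-file value described in the text */

/* What an 8-bit, never-negative char holds after being assigned value. */
int as_unsigned_char(int value)
{
    unsigned char c = (unsigned char)value;   /* keeps only the low 8 bits */

    return c;
}

/* Returns 1 if the stored char could still compare equal to GETC_EOF.
   The char is widened back to int before the comparison, so it is
   always in the range 0..255 and the result is always 0. */
int char_can_equal_eof(int value)
{
    unsigned char c = (unsigned char)value;

    return c == GETC_EOF;
}
```

Both -1 and a genuine hex FF data byte end up as 255 once stored in such a variable, which is exactly why the two cannot be told apart unless 'c' is an "int".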
Once having determined that the end-of-file has not been encountered, the body of the
"while" statement is executed. Here we use "toupper" to convert the character obtained
from "getc" to upper case, and then we use "putc" to write the resulting byte out to
the output file. To be neat, errors are checked for: the program terminates if "putc"
returns ERROR.
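A loop of the kind described can be written in standard C as follows. This is a sketch, not the primer's own loop: it uses stdio's getc and putc on FILE streams rather than the BDS C buffered functions, the function name "upcase_filter" is hypothetical, and CPMEOF is defined here as control-Z (hex 1A) on the strength of the text's description.

```c
#include <ctype.h>
#include <stdio.h>

#define CPMEOF 0x1A   /* control-Z, the CP/M end-of-text marker */

/* Copy in to out, converting to upper case, stopping at the first
   CPMEOF byte or at physical end of file, whichever comes first.
   A closing CPMEOF is then written to out.
   Returns the number of characters converted, or -1 on a write error. */
long upcase_filter(FILE *in, FILE *out)
{
    int c;          /* int, not char, so that EOF stays distinguishable */
    long n = 0;

    while ((c = getc(in)) != EOF && c != CPMEOF) {
        if (putc(toupper(c), out) == EOF)
            return -1;
        n++;
    }
    if (putc(CPMEOF, out) == EOF)
        return -1;
    return n;
}
```

Anything that follows a control-Z in the input is ignored, and an input with no control-Z at all is still handled because the physical EOF ends the loop.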
For more examples of the usage of buffered I/O, see CONVERT.C, CCOT.C, TABIFY.C and
TELNET.C. Also, take some time to inspect the files BDSCIO.H, STDLIB1.C and STDLIB2.C,
which contain the sources of all the buffered I/O functions.