
LEC10 Io

The document discusses Unix I/O and file system concepts. It describes how all devices are represented as files in Unix/Linux, and how basic file operations like opening, reading, writing and closing files work via system calls. It also covers file types, pathnames, metadata and standard I/O.

Uploaded by

鄔浚偉

Lecture 10

System-Level I/O

Acknowledgement: These slides are based on the textbook
(Computer Systems: A Programmer’s Perspective) and its slides.
Outline

 Unix I/O

 Metadata, sharing, and redirection

 Standard I/O

 Closing remarks

Bryant and O’Hallaron, Computer Systems: A Programmer’s Perspective, Third Edition 2


Unix I/O Overview
 A file (in Unix/Linux) is a sequence of m bytes:
 B0 , B1 , .... , Bk , .... , Bm-1

 Interesting fact: All I/O devices are represented as files:


 /dev/sda2 (/usr disk partition)
 /dev/tty2 (terminal)

 The kernel is also represented as a file:


 /boot/vmlinuz-3.13.0-55-generic (kernel image)
 /proc (kernel data structures)



Unix I/O Overview
 Mapping of devices to files allows kernel to export
simple interface called Unix I/O:
 Opening and closing files
 open() and close()
 Reading and writing a file
 read() and write()
 Changing the current file position (seek)
 The current file position indicates the next offset into the file to read or write
 lseek()

B0 B1 ••• Bk-1 Bk Bk+1 • • •

Current file position = k
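As a rough illustration of the current file position, the sketch below uses lseek to move the position to offset k and then reads the single byte Bk. The helper name byte_at is hypothetical (it is not part of the slides or of any library), and error handling is kept minimal.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return byte B_k of the file at path,
 * or -1 on error or if k is at or past the end of the file. */
int byte_at(const char *path, off_t k)
{
    assert(path != NULL && k >= 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int result = -1;
    unsigned char b;

    /* Set the current file position to k, then read one byte. */
    if (lseek(fd, k, SEEK_SET) == k && read(fd, &b, 1) == 1) {
        /* The read advanced the current file position to k + 1. */
        assert(lseek(fd, 0, SEEK_CUR) == k + 1);
        result = b;
    }

    close(fd);
    return result;
}
```

Seeking past the end of a file is allowed; the following read then returns 0, which this sketch reports as -1.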


File Types
 Each file has a type indicating its role in the system
 Regular file: Contains arbitrary data
 Directory: Index for a related group of files
 Socket: For communicating with a process on another machine

 We ignore the other file types (beyond our scope)


 Named pipes (FIFOs)
 Symbolic links
 Character and block devices



Regular Files
 A regular file is used to store data
 Applications often distinguish between text files and binary files
 Text files are regular files with only ASCII or Unicode characters
 Binary files are everything else
 e.g., object files, JPEG images
 Text file is sequence of text lines
 Text line is sequence of chars terminated by newline char (‘\n’)
 Newline is 0xa, same as ASCII line feed character (LF)
 End of line (EOL) indicators
 Linux and Mac OS: ‘\n’ (0xa), line feed (LF)
 Windows & Internet protocols: ‘\r\n’ (0xd 0xa), carriage return (CR) followed by line feed (LF)
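The difference between the two EOL conventions can be made concrete with a small sketch that rewrites CRLF line endings as LF in a memory buffer. The function name crlf_to_lf is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert every CR LF pair (0xd 0xa) in buf[0..n)
 * to a single LF (0xa), in place. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        /* Drop a CR only when it is immediately followed by LF. */
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;
        buf[out++] = buf[i];
    }
    return out;
}
```

A lone CR that is not followed by LF is left untouched, since it is not an EOL indicator in either convention listed above.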



Directories
 Directory consists of an array of links
 Each link maps a filename to a file
 Each directory contains at least two entries
 . (dot) is a link to itself
 .. (dot dot) is a link to the parent directory in the directory hierarchy
(next slide)
 Commands for manipulating directories
 mkdir: create empty directory
 ls: view directory contents
 rmdir: delete empty directory



Directory Hierarchy
 All files are organized as a hierarchy anchored by root directory
named / (slash)
/
  bin/
    bash
  dev/
    tty1
  etc/
    group
    passwd
  home/
    droh/
      hello.c
    bryant/
  usr/
    include/
      stdio.h
      sys/
        unistd.h
    bin/
      vim
 Kernel maintains current working directory (cwd) for each process
 Modified using the cd command
Pathnames
 Locations of files in the hierarchy denoted by pathnames
 Absolute pathname starts with ‘/’ and denotes path from root

/home/droh/hello.c
 Relative pathname denotes path from current working directory
 ../droh/hello.c

[Figure: the directory hierarchy from the Directory Hierarchy slide, with cwd: /home/bryant. From there, ../droh/hello.c names the same file as /home/droh/hello.c]
Opening Files
 Opening a file informs the kernel that you are ready to access that
file
int fd; /* file descriptor */

if ((fd = open("/etc/hosts", O_RDONLY)) < 0) {


perror("open");
exit(1);
}

 Returns an identifying integer file descriptor


 fd == -1 indicates that an error occurred
 Each process created by a Linux shell begins life with three
open files associated with a terminal:
 0: standard input (stdin)
 1: standard output (stdout)
 2: standard error (stderr)
Closing Files
 Closing a file informs the kernel that you have finished accessing that
file
int fd; /* file descriptor */
int retval; /* return value */

if ((retval = close(fd)) < 0) {


perror("close");
exit(1);
}

 It is important to check the error code;
closing a (shared) file is tricky in threaded programs
 See: https://linux.die.net/man/2/close
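One reason closing is delicate: descriptor numbers are recycled. POSIX specifies that open returns the lowest-numbered descriptor that is not currently open, so a descriptor closed by mistake (or closed twice) may already name a different file opened elsewhere in the program. The two helpers below are hypothetical names used only to sketch this in a single-threaded setting.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if reopening right after close yields
 * the same descriptor number, 0 if not, -1 on error. */
int fd_is_reused(const char *path)
{
    assert(path != NULL);
    int first = open(path, O_RDONLY);
    if (first < 0 || close(first) < 0)
        return -1;

    int second = open(path, O_RDONLY);  /* lowest unused number again */
    if (second < 0)
        return -1;

    int same = (second == first);
    close(second);
    return same;
}

/* Hypothetical helper: close the same descriptor twice and return the
 * errno of the second close (0 if it unexpectedly succeeded). */
int second_close_errno(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0 || close(fd) < 0)
        return -1;

    errno = 0;
    if (close(fd) == 0)   /* fd is no longer open in this process */
        return 0;
    return errno;
}
```

In a threaded program another thread could open a file between the two close calls, in which case the second close would silently close that thread's file instead of failing.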



Reading Files
 Reading a file copies bytes from the current file position
to memory, and then updates the file position
char buf[512];
int fd; /* file descriptor */
int nbytes; /* number of bytes read */

/* Open file fd ... */


/* Then read up to 512 bytes from file fd */
if ((nbytes = read(fd, buf, sizeof(buf))) < 0) {
perror("read");
exit(1);
}

 Returns number of bytes read from file fd into buf


 Return type ssize_t is signed integer
 nbytes < 0 indicates that an error occurred
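Because read returns the number of bytes actually transferred, that count can be smaller than the number requested, and a return value of 0 signals end of file. A common way to deal with this is to call read in a loop. The sketch below assumes a hypothetical helper named read_fully; it is one possible approach, not the method used in the slides.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes from fd into buf, retrying
 * until n bytes have arrived or end of file is reached.
 * Returns the number of bytes read (possibly < n at EOF), or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    char *p = buf;
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, p + got, n - got);
        if (r < 0) {
            if (errno == EINTR)   /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        if (r == 0)               /* end of file */
            break;
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```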



Writing Files
 Writing a file copies bytes from memory to the current file
position, and then updates the current file position
char buf[512];
int fd; /* file descriptor */
int nbytes; /* number of bytes written */

/* Open the file fd ... */

/* Then write up to 512 bytes from buf to file fd */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
perror("write");
exit(1);
}

 Returns number of bytes written from buf to file fd


 nbytes < 0 indicates that an error occurred
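As with read, write may transfer fewer bytes than requested (for example on pipes or sockets), so code that must write a whole buffer usually loops. The helper name write_all below is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all n bytes of buf to fd.
 * Returns n on success, or -1 on error. */
ssize_t write_all(int fd, const void *buf, size_t n)
{
    assert(buf != NULL || n == 0);

    const char *p = buf;
    size_t done = 0;
    while (done < n) {
        ssize_t w = write(fd, p + done, n - done);
        if (w < 0) {
            if (errno == EINTR)   /* interrupted by a signal: retry */
                continue;
            return -1;
        }
        done += (size_t)w;        /* continue after the bytes already written */
    }
    return (ssize_t)done;
}
```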



Outline

 Unix I/O

 Metadata, sharing, and redirection

 Standard I/O

 Closing remarks



File Metadata
 Metadata is data about data, in this case file data
 Per-file metadata maintained by kernel
 accessed by users with the stat and fstat functions
/* Metadata returned by the stat and fstat functions */
struct stat {
dev_t st_dev; /* Device */
ino_t st_ino; /* inode */
mode_t st_mode; /* Protection and file type */
nlink_t st_nlink; /* Number of hard links */
uid_t st_uid; /* User ID of owner */
gid_t st_gid; /* Group ID of owner */
dev_t st_rdev; /* Device type (if inode device) */
off_t st_size; /* Total size, in bytes */
unsigned long st_blksize; /* Blocksize for filesystem I/O */
unsigned long st_blocks; /* Number of blocks allocated */
time_t st_atime; /* Time of last access */
time_t st_mtime; /* Time of last modification */
time_t st_ctime; /* Time of last change */
};
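A minimal sketch of reading two of these fields: fstat takes an open descriptor, while stat takes a pathname. The helper names file_size and is_directory are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size in bytes of the file at path, via fstat
 * on an open descriptor. Returns -1 on error. */
off_t file_size(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    off_t size = (fstat(fd, &st) == 0) ? st.st_size : -1;
    close(fd);
    return size;
}

/* Hypothetical helper: 1 if path is a directory, 0 if not, -1 on error. */
int is_directory(const char *path)
{
    assert(path != NULL);
    struct stat st;
    if (stat(path, &st) < 0)
        return -1;
    return S_ISDIR(st.st_mode) ? 1 : 0;   /* file type is encoded in st_mode */
}
```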
Example of Accessing File Metadata
/* statcheck.c */
#include "csapp.h" /* CS:APP header; provides the Stat wrapper */

int main (int argc, char **argv)
{
    struct stat stat;
    char *type, *readok;

    Stat(argv[1], &stat);
    if (S_ISREG(stat.st_mode)) /* Determine file type */
        type = "regular";
    else if (S_ISDIR(stat.st_mode))
        type = "directory";
    else
        type = "other";
    if ((stat.st_mode & S_IRUSR)) /* Check read access */
        readok = "yes";
    else
        readok = "no";

    printf("type: %s, read: %s\n", type, readok);
    exit(0);
}

linux> ./statcheck statcheck.c
type: regular, read: yes
linux> chmod 000 statcheck.c
linux> ./statcheck statcheck.c
type: regular, read: no
linux> ./statcheck ..
type: directory, read: yes
How the Unix Kernel Represents Open Files
 Two descriptors referencing two distinct open files
 Descriptor 1 (stdout) points to terminal
 Descriptor 4 points to open disk file

[Figure: three tables.
Descriptor table (one table per process): stdin fd 0, stdout fd 1, stderr fd 2, fd 3, fd 4.
Open file table (shared by all processes): File A (terminal) and File B (disk), each entry holding a file pos and refcnt=1.
V-node table (shared by all processes): one entry per file holding file access, file size, and file type (the info in the stat struct).
fd 1 points to File A; fd 4 points to File B; each open file table entry points to its own v-node table entry.]



File Sharing
 Two distinct descriptors sharing the same disk file
through two distinct open file table entries
 E.g., Calling open twice with the same filename argument

[Figure: descriptor table (one table per process), open file table (shared by all processes), v-node table (shared by all processes).
Two descriptors point to two distinct open file table entries, File A and File B, each with its own file pos and refcnt=1.
Both open file table entries point to the same v-node table entry (file access, file size, file type) for the disk file.]
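Since each call to open creates its own open file table entry, each descriptor has its own file position. The sketch below (with a hypothetical helper name) reads one byte through the first descriptor and then shows that the second descriptor still starts at offset 0.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open path twice, consume one byte through the
 * first descriptor, then return the byte read through the second.
 * Returns -1 on error. */
int byte_seen_by_second_open(const char *path)
{
    assert(path != NULL);
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);   /* separate open file table entry */
    int result = -1;
    unsigned char c;

    if (fd1 >= 0 && fd2 >= 0 &&
        read(fd1, &c, 1) == 1 &&      /* advances only fd1's position */
        read(fd2, &c, 1) == 1)        /* fd2 still reads from offset 0 */
        result = c;

    if (fd1 >= 0) close(fd1);
    if (fd2 >= 0) close(fd2);
    return result;
}
```

For a file containing "ab", the second descriptor reads 'a' rather than 'b'.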



How Processes Share Files: fork
 A child process inherits its parent’s open files
 Note: situation unchanged by exec functions (use fcntl to change)
 Before fork call:
[Figure: descriptor table (one table per process), open file table (shared by all processes), v-node table (shared by all processes).
Before fork, the parent's descriptors (stdin fd 0, stdout fd 1, stderr fd 2, fd 3, fd 4) refer to the open file table entries File A (terminal) and File B (disk), each with its own file pos and refcnt=1, and each pointing to its own v-node table entry (file access, file size, file type).]



How Processes Share Files: fork
 A child process inherits its parent’s open files
 After fork:
 Child’s table same as parent’s, and +1 to each refcnt
[Figure: descriptor table (one table per process), open file table (shared by all processes), v-node table (shared by all processes).
After fork, the parent and the child each have their own descriptor table (fd 0 to fd 4), and the corresponding entries in both tables point to the same open file table entries: File A (terminal) and File B (disk), each now with refcnt=2.
Because the open file table entries are shared, parent and child share each file pos.]
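The shared file position after fork can be observed directly. In this sketch (helper name hypothetical), the child reads one byte from an inherited descriptor, and the parent's next read through the same descriptor continues where the child stopped.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child reads one byte from the inherited
 * descriptor; afterwards the parent reads one byte and returns it.
 * Returns -1 on error. */
int parent_byte_after_child_read(const char *path)
{
    assert(path != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        char c;
        /* Child: advances the file position shared with the parent. */
        _exit(read(fd, &c, 1) == 1 ? 0 : 1);
    }

    int status;
    unsigned char c;
    int result = -1;
    /* Parent: wait for the child, then read from the shared position. */
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0 &&
        read(fd, &c, 1) == 1)
        result = c;

    close(fd);
    return result;
}
```

For a file containing "ab", the parent reads 'b', in contrast to the two-open case where each descriptor has its own position.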
I/O Redirection
 Question: How does a shell implement I/O redirection?
linux> ls > foo.txt

 Answer: By calling the dup2(oldfd, newfd) function


 Copies (per-process) descriptor table entry oldfd to entry newfd

Descriptor table        Descriptor table
before dup2(4,1)        after dup2(4,1)
fd 0                    fd 0
fd 1  a                 fd 1  b
fd 2                    fd 2
fd 3                    fd 3
fd 4  b                 fd 4  b



I/O Redirection Example
 Step #1: open file to which stdout should be redirected
 Happens in child executing shell code, before exec

[Figure: descriptor table (one table per process), open file table (shared by all processes), v-node table (shared by all processes).
After the open, stdout (fd 1) still refers to open file table entry File A (refcnt=1), and the newly opened descriptor refers to File B (refcnt=1). Each entry has its own file pos and points to its own v-node table entry (file access, file size, file type).]



I/O Redirection Example (cont.)
 Step #2: call dup2(4,1)
 Causes fd=1 (stdout) to refer to the disk file pointed at by fd=4

[Figure: descriptor table (one table per process), open file table (shared by all processes), v-node table (shared by all processes).
After dup2(4,1), fd 1 and fd 4 both refer to open file table entry File B, whose refcnt is now 2. File A is no longer referenced by this descriptor table, and its refcnt drops to 0.]
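The two steps above can be sketched in a single process. A real shell would perform them in the child and then call exec; the hypothetical helper below instead writes a message through descriptor 1 and restores the original stdout afterwards (using dup to save it), so that the effect can be checked.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: redirect descriptor 1 to path, write msg through
 * descriptor 1, then restore the original stdout.
 * Returns 0 on success, -1 on error. */
int write_stdout_to_file(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);

    int saved = dup(STDOUT_FILENO);            /* remember the old stdout */
    if (saved < 0)
        return -1;

    /* Step #1: open the file to which stdout should be redirected. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        close(saved);
        return -1;
    }

    int rc = -1;
    /* Step #2: make descriptor 1 refer to the same open file as fd. */
    if (dup2(fd, STDOUT_FILENO) >= 0) {
        size_t len = strlen(msg);
        if (write(STDOUT_FILENO, msg, len) == (ssize_t)len)
            rc = 0;
        dup2(saved, STDOUT_FILENO);            /* undo the redirection */
    }

    close(fd);
    close(saved);
    return rc;
}
```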



Outline

 Unix I/O

 Metadata, sharing, and redirection

 Standard I/O

 Closing remarks



Standard I/O Functions
 The C standard library (libc.so) contains a
collection of higher-level standard I/O functions
 Documented in Appendix B of “The C Programming Language”
by B. Kernighan and D. Ritchie
 Examples of standard I/O functions:
 Opening and closing files (fopen and fclose)
 Reading and writing bytes (fread and fwrite)
 Reading and writing text lines (fgets and fputs)
 Formatted reading and writing (fscanf and fprintf)
[Figure: the standard I/O functions (fopen, fdopen, fread, fwrite, fscanf, fprintf, sscanf, sprintf, fgets, fputs, fflush, fseek, fclose) are layered on top of the Unix I/O functions (open, read, write, lseek, stat, close), which are accessed via system calls]
Standard I/O Streams
 Standard I/O models open files as streams
 Abstraction for a file descriptor and a buffer in memory

 C programs begin life with three open streams (defined in stdio.h)


 stdin (standard input)
 stdout (standard output)
 stderr (standard error)

#include <stdio.h>
extern FILE *stdin; /* standard input (descriptor 0) */
extern FILE *stdout; /* standard output (descriptor 1) */
extern FILE *stderr; /* standard error (descriptor 2) */

int main() {
fprintf(stdout, "Hello, world\n");
}
Buffered I/O: Motivation
 Applications often read/write one character at a time
 getc, putc, ungetc
 gets, fgets
 Read line of text one character at a time, stopping at newline
 Implementing these as Unix I/O calls is expensive
 read and write require Unix kernel calls
 > 10,000 clock cycles
 Solution: Buffered read
 Use Unix read to grab block of bytes
 User input functions take one byte at a time from buffer
 Refill buffer when empty

[Figure: the buffer, divided into an already-read part and an unread part]
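A minimal sketch of the buffered-read idea follows. The type rbuf_t, the functions rbuf_init and rbuf_getc, and the 4096-byte block size are all assumptions made for illustration; this is not the implementation used by the standard I/O library.

```c
#include <assert.h>
#include <unistd.h>

#define RBUF_SIZE 4096        /* assumed block size for this sketch */

typedef struct {
    int fd;                   /* descriptor to refill from */
    ssize_t cnt;              /* number of unread bytes left in buf */
    char *next;               /* next unread byte in buf */
    char buf[RBUF_SIZE];      /* block most recently read from fd */
} rbuf_t;

/* Associate an empty buffer with descriptor fd. */
void rbuf_init(rbuf_t *rb, int fd)
{
    assert(rb != NULL);
    rb->fd = fd;
    rb->cnt = 0;
    rb->next = rb->buf;
}

/* Return the next byte (0..255), or -1 on end of file or error.
 * The kernel is entered only when the buffer is empty. */
int rbuf_getc(rbuf_t *rb)
{
    assert(rb != NULL);
    if (rb->cnt <= 0) {
        /* Refill: one read call grabs a whole block of bytes. */
        rb->cnt = read(rb->fd, rb->buf, sizeof rb->buf);
        if (rb->cnt <= 0)
            return -1;
        rb->next = rb->buf;
    }
    rb->cnt--;
    return (unsigned char)*rb->next++;
}
```

With this layout, reading a 4096-byte block one character at a time costs one system call instead of 4096.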



Buffering in Standard I/O
 Standard I/O functions use buffered I/O
printf("h");
printf("e");
printf("l");
printf("l");
printf("o");
printf("\n");

buf: h e l l o \n . .

fflush(stdout);

write(1, buf, 6);


 Buffer flushed to output fd on “\n” (when the stream is line buffered, as stdout is
for a terminal), on a call to fflush or exit, or on return from main.
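The effect of buffering can be observed with a pipe. The sketch below wraps the write end in a stream with fdopen, forces full buffering with setvbuf, and checks how many bytes are readable from the pipe before and after fflush. The helper name flush_demo is hypothetical, and the non-blocking read end is only there so that an empty pipe returns immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: stores the result of reading the pipe before and
 * after the flush. Returns 0 on success, -1 on setup failure. */
int flush_demo(ssize_t *before, ssize_t *after)
{
    assert(before != NULL && after != NULL);

    int p[2];
    char sink[64];
    if (pipe(p) < 0)
        return -1;

    /* Non-blocking read end: an empty pipe yields -1 instead of blocking. */
    fcntl(p[0], F_SETFL, O_NONBLOCK);

    FILE *out = fdopen(p[1], "w");
    if (out == NULL) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    setvbuf(out, NULL, _IOFBF, BUFSIZ);   /* fully buffered stream */

    fputs("hello\n", out);                /* stays in the stream buffer */
    *before = read(p[0], sink, sizeof sink);

    fflush(out);                          /* buffer handed to write */
    *after = read(p[0], sink, sizeof sink);

    fclose(out);                          /* also closes p[1] */
    close(p[0]);
    return 0;
}
```

Note that the newline alone does not flush here, because the stream is fully buffered rather than line buffered.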
Outline

 Unix I/O

 Metadata, sharing, and redirection

 Standard I/O

 Closing remarks



Unix I/O vs. Standard I/O

 Standard I/O is implemented using low-level Unix I/O

[Figure: layered diagram. A C application program calls the standard I/O functions (fopen, fdopen, fread, fwrite, fscanf, fprintf, sscanf, sprintf, fgets, fputs, fflush, fseek, fclose), which are built on the Unix I/O functions (open, read, write, lseek, stat, close), accessed via system calls]

 Which one should you use in your program? It depends.



Pros and Cons of Unix I/O
 Pros
 Unix I/O is the most general and lowest overhead form of I/O
 Other I/O packages are implemented using Unix I/O functions
 Unix I/O provides functions for accessing file metadata
 Unix I/O functions are async-signal-safe and can be used
safely in signal handlers

 Cons
 Efficient reading of text lines requires some form of
buffering, which is also tricky and error prone
 These issues are addressed by the standard I/O package



Pros and Cons of Standard I/O
 Pros:
 Buffering increases efficiency by decreasing the number of
read and write system calls

 Cons:
 Provides no function for accessing file metadata
 Standard I/O functions are not async-signal-safe, and not
appropriate for signal handlers
 Standard I/O is not appropriate for input and output on
network sockets



Choosing I/O Functions
 General rule: use the highest-level I/O functions you can
 Many C programmers are able to do all of their work using the
standard I/O functions
 But, be sure to understand the functions you use!

 When to use standard I/O?


 When working with disk or terminal files
 When to use raw Unix I/O?
 Inside signal handlers, because Unix I/O is async-signal-safe
 In rare cases when you need absolute highest performance



Aside: Working with Binary Files

 Functions you should never use on binary files


 Text-oriented I/O such as fgets, scanf
 Interpret EOL characters

 String functions
 strlen, strcpy, strcat
 Interprets byte value 0 (end of string) as special
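To see why string functions are unsuitable for binary data, the sketch below computes how many bytes a C string function would consider part of a buffer: everything up to the first zero byte. The helper name c_string_length is hypothetical; it uses memchr with an explicit length so that it never reads past the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes of data[0..n) that a function
 * such as strlen or strcpy would treat as the string, i.e. the bytes
 * before the first 0 byte (or n if there is none). */
size_t c_string_length(const void *data, size_t n)
{
    assert(data != NULL || n == 0);

    const void *nul = memchr(data, 0, n);
    if (nul == NULL)
        return n;
    return (size_t)((const char *)nul - (const char *)data);
}
```

For an 8-byte buffer whose fifth byte is 0, the result is 4, so the last three bytes would be silently ignored by string functions. Binary data should instead be handled with an explicit byte count (fread, fwrite, memcpy).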



For Further Information

 The Unix bible:


 W. Richard Stevens & Stephen A. Rago, Advanced
Programming in the Unix Environment, 2nd Edition, Addison
Wesley, 2005

 The Linux bible:


 Michael Kerrisk, The Linux Programming Interface, No Starch
Press, 2010

