
I/O

Hakim Weatherspoon
CS 3410
Computer Science
Cornell University
The slides are the product of many rounds of teaching CS 3410
by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.
Announcements
Project5 Cache Race Games night Monday, May 7th, 5pm
• Come, eat, drink, have fun and be merry!
• Location: B11 Kimball Hall

Prelim2: Thursday, May 3rd in the evening
• Time and Location: 7:30pm sharp in Statler Auditorium
• Old prelims are online in CMS

Project6: Malloc
• Design Doc due May 9th; bring design doc to meeting May 7-9
• Project due Tuesday, May 15th at 4:30pm
• Will not be able to use slip days

Lab Sections are Optional this week
• Ask Prelim2 or Project6 questions
Big Picture: Input/Output (I/O)
How does a processor interact with its environment?

Computer System =
Memory + Datapath + Control + Input + Output

[Diagram: the computer system connected to I/O devices such as a Keyboard, Network, Disk, and Display]
I/O Devices Enable Interacting with the Environment

Device               | Behavior     | Partner | Data Rate (b/sec)
Keyboard             | Input        | Human   | 100
Mouse                | Input        | Human   | 3.8K
Sound Input          | Input        | Machine | 3M
Voice Output         | Output       | Human   | 264K
Sound Output         | Output       | Human   | 8M
Laser Printer        | Output       | Human   | 3.2M
Graphics Display     | Output       | Human   | 800M – 8G
Network/LAN          | Input/Output | Machine | 100M – 10G
Network/Wireless LAN | Input/Output | Machine | 11 – 54M
Optical Disk         | Storage      | Machine | 5 – 120M
Flash memory         | Storage      | Machine | 32 – 200M
Magnetic Disk        | Storage      | Machine | 800M – 3G
Round 1: All devices on one interconnect
Replace all devices as the interconnect changes
e.g. keyboard speed == main memory speed ?!

[Diagram: Memory, Display, Disk, Keyboard, and Network all attached directly to one Unified Memory and I/O Interconnect]
Round 2: I/O Controllers
Decouple I/O devices from Interconnect
Enable smarter I/O interfaces

[Diagram: Core0 and Core1 (each with its cache) share a Unified Memory and I/O Interconnect; a Memory Controller and four I/O Controllers hang off it, fronting Memory, the Display, Disk, Keyboard, and Network]
Round 3: I/O Controllers + Bridge
Separate high-performance processor, memory, display
interconnect from lower-performance interconnect

[Diagram: Core0 and Core1 (each with its cache), the Memory Controller, and the Display's I/O Controller sit on a High-Performance Interconnect; a Bridge connects it to a Lower-Performance "Legacy" Interconnect carrying the I/O Controllers for the Disk, Keyboard, and Network]
Bus Parameters
Width = number of wires
Transfer size = data words per bus transaction
Synchronous (with a bus clock)
or asynchronous (no bus clock / “self clocking”)
Bus Types
Processor – Memory (“Front Side Bus”. Also QPI)
• Short, fast, & wide
• Mostly fixed topology, designed as a “chipset”
– CPU + Caches + Interconnect + Memory Controller
I/O and Peripheral busses (PCI, SCSI, USB, LPC, …)
• Longer, slower, & narrower
• Flexible topology, multiple/varied connections
• Interoperability standards for devices
• Connect to processor-memory bus through a bridge
Example Interconnects

Name                | Use      | Devices per channel | Channel Width | Data Rate (B/sec)
Firewire 800        | External | 63                  | 4             | 100M
USB 2.0             | External | 127                 | 2             | 60M
USB 3.0             | External | 127                 | 2             | 625M
Parallel ATA        | Internal | 1                   | 16            | 133M
Serial ATA (SATA)   | Internal | 1                   | 4             | 300M
PCI 66MHz           | Internal | 1                   | 32-64         | 533M
PCI Express v2.x    | Internal | 1                   | 2-64          | 16G/dir
Hypertransport v2.x | Internal | 1                   | 2-64          | 25G/dir
QuickPath (QPI)     | Internal | 1                   | 40            | 12G/dir
Interconnecting Components
Interconnects are (were?) busses
• parallel set of wires for data and control
• shared channel
  – multiple senders/receivers
  – everyone can see all bus transactions
• bus protocol: rules for using the bus wires
  e.g. Intel Xeon

Alternative (and increasingly common):
• dedicated point-to-point channels
  e.g. Intel Nehalem
Round 4: I/O Controllers + Bridge + NUMA
Remove the bridge as a bottleneck with point-to-point interconnects
E.g. Non-Uniform Memory Access (NUMA)
Takeaways
Diverse I/O devices require a hierarchical interconnect, which more recently has been transitioning to point-to-point topologies.
Next Goal
How does the processor interact with I/O devices?
I/O Device Driver Software Interface
Set of methods to write/read data to/from device and control device
Example: Linux Character Devices

#include <fcntl.h>     // open, O_RDWR
#include <unistd.h>    // read, write, close

// Open a toy "echo" character device
int fd = open("/dev/echo", O_RDWR);

// Write to the device
char write_buf[] = "Hello World!";
write(fd, write_buf, sizeof(write_buf));

// Read from the device
char read_buf[32];
read(fd, read_buf, sizeof(read_buf));

// Close the device
close(fd);

// Verify the result (e.g. check that read_buf matches write_buf)
I/O Device API
Typical I/O Device API
• a set of read-only or read/write registers
Command registers
• writing causes device to do something
Status registers
• reading indicates what device is doing, error codes, …
Data registers
• Write: transfer data to a device
• Read: transfer data from a device

Every device uses this API


I/O Device API
Simple (old) example: AT Keyboard Device

8-bit Status: PE TO AUXB LOCK AL2 SYSF IBS OBS
  (IBS = Input Buffer Status, OBS = Output Buffer Status)

8-bit Command:
  0xAA = "self test"
  0xAE = "enable kbd"
  0xED = "set LEDs"

8-bit Data:
  scancode (when reading)
  LED state (when writing) or …
Communication Interface
Q: How does OS code talk to the device?
A: special instructions to talk over special busses

Programmed I/O
• Interact with cmd, status, and data device registers directly
  – inb $a, 0x64   (kbd status register)
  – outb $a, 0x60  (kbd data register)
• Specifies: device, data, direction
• Protection: only allowed in kernel mode
  – Kernel boundary crossing is expensive

*x86: $a implicit; also inw, outw, inl, outl, …
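As a concrete (hedged) illustration of programmed I/O, the sketch below polls the legacy AT keyboard controller from user space on Linux/x86. It is not from the course slides: it uses the inb()/outb()/ioperm() helpers from <sys/io.h>, requires root privileges, and reuses the 0x60/0x64 ports named above; a real driver would do this in kernel mode.

  // Hedged sketch: programmed I/O via port-I/O instructions (Linux/x86, run as root).
  #include <stdio.h>
  #include <sys/io.h>

  #define KBD_STATUS_PORT 0x64   // status register (the slide's inb $a, 0x64)
  #define KBD_DATA_PORT   0x60   // data register   (the slide's outb $a, 0x60)
  #define OBS             0x01   // output-buffer-status bit: a scancode is ready

  int main(void) {
      if (ioperm(KBD_DATA_PORT, 8, 1) != 0) {    // ask the kernel for port access
          perror("ioperm");
          return 1;
      }
      while (!(inb(KBD_STATUS_PORT) & OBS))      // poll until the device has data
          ;
      unsigned char scancode = inb(KBD_DATA_PORT);
      printf("scancode: 0x%02x\n", scancode);
      return 0;
  }

Without the ioperm() grant, user code must go through the kernel (a syscall) for every register access, which is exactly the cost the memory-mapped alternative on the next slide reduces.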


Communication Interface
Q: How does OS code talk to the device?
A: Map device registers into the virtual address space
Memory-mapped I/O (faster: less kernel boundary crossing)
• Accesses to certain addresses redirected to I/O devices
• Data goes over the memory bus
• Protection: via bits in pagetable entries
• OS+MMU+devices configure mappings
Memory-Mapped I/O
[Diagram: regions of the virtual address space (0x0000 0000 – 0xFFFF FFFF) map through the physical address space onto the I/O Controllers for the Display, Disk, Keyboard, and Network, so ordinary loads/stores to those mapped locations become communication with the devices]

Less-favored alternative = Programmed I/O:
• Syscall instructions that communicate with I/O
• Communicate via special device registers
Device Drivers

Programmed I/O:

  char read_kbd()
  {
    char status;
    do {
      sleep();
      status = inb(0x64);    // syscall (inb requires kernel mode)
    } while (!(status & 1));
    return inb(0x60);        // syscall
  }

Memory Mapped I/O:

  struct kbd {
    char status, pad0[3];
    char data,   pad1[3];
  };
  struct kbd *k = mmap(...);

  char read_kbd()
  {
    char status;
    do {
      sleep();
      status = k->status;
    } while (!(status & 1));
    return k->data;
  }

Clicker Question: Which is better?
(A) Programmed I/O
(B) Memory Mapped I/O
(C) Both have syscalls, both are bad
Device Drivers (answer to the clicker question)

The code is the same as on the previous slide. NO syscall is needed for the memory-mapped version's register reads (k->status, k->data), while each inb() in the programmed-I/O version costs a syscall.
Both are polling examples, but memory-mapped I/O is more efficient.
I/O Data Transfer
How to talk to device?
• Programmed I/O or Memory-Mapped I/O
How to get events?
• Polling or Interrupts
How to transfer lots of data? The programmed-I/O loop below is very, very expensive:

  disk->cmd = READ_4K_SECTOR;
  disk->data = 12;                  // argument to the command
  while (!(disk->status & 1)) { }   // poll until the disk is ready
  for (i = 0; i < 4096; i++)
      buf[i] = disk->data;          // CPU copies every byte through a register
Data Transfer
1. Programmed I/O: Device → CPU → RAM
   for (i = 1 .. n)
   • CPU issues read request
   • Device puts data on bus & CPU reads it into registers
   • CPU writes data to memory

2. Direct Memory Access (DMA): Device → RAM
   • CPU sets up the DMA request
   • for (i = 1 .. n)
     Device puts data on bus & RAM accepts it
   • Device interrupts CPU after done

Which one is the winner? Which one is the loser?
DMA Example
DMA example: reading from audio (mic) input
• DMA engine on audio device… or I/O controller … or

  int dma_size = 4*PAGE_SIZE;
  int *buf = alloc_dma(dma_size);     // allocate contiguous, DMA-able memory
  ...
  dev->mic_dma_baseaddr = (int)buf;   // note: this is a *virtual* address (see Issue #1)
  dev->mic_dma_count = dma_size;
  dev->cmd = DEV_MIC_INPUT |
             DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
DMA Issues (1): Addressing
Issue #1: DMA meets Virtual Memory
RAM: physical addresses
Programs: virtual addresses
[Diagram: CPU → MMU → RAM, with the disk DMAing into RAM directly, bypassing the MMU]

Solution: DMA uses physical addresses
• OS uses the physical address when setting up DMA
• OS allocates contiguous physical pages for DMA
• Or: OS splits the transfer into page-sized chunks
  (many devices support DMA "chains" for this reason; a sketch follows below)
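The page-sized-chunk approach can be made concrete with a small, hedged sketch. It reuses the slides' alloc_dma() and virt_to_phys() helpers and the mic device from the example below, but the dma_desc layout and the dev->mic_dma_chain_baseaddr register are invented for illustration; real devices define their own descriptor formats.

  // Hedged sketch: build a DMA "chain" (scatter-gather list) so a buffer that is
  // contiguous in virtual memory can be transferred one physical page at a time.
  struct dma_desc {
      uint32_t phys_addr;   // physical address of this chunk
      uint32_t len;         // number of bytes in this chunk
      uint32_t next_phys;   // physical address of the next descriptor, or 0
  };

  void setup_dma_chain(void *buf, int nbytes) {
      int npages = (nbytes + PAGE_SIZE - 1) / PAGE_SIZE;
      struct dma_desc *chain = alloc_dma(npages * sizeof(*chain));  // pinned + contiguous

      for (int i = 0; i < npages; i++) {
          char *va = (char *)buf + i * PAGE_SIZE;
          chain[i].phys_addr = virt_to_phys(va);               // translate each page
          chain[i].len       = (i == npages - 1) ? nbytes - i * PAGE_SIZE : PAGE_SIZE;
          chain[i].next_phys = (i == npages - 1) ? 0 : virt_to_phys(&chain[i + 1]);
      }
      dev->mic_dma_chain_baseaddr = virt_to_phys(&chain[0]);   // hand the whole chain to the device
      dev->cmd = DEV_MIC_INPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
  }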
DMA Example
DMA example: reading from audio (mic) input
• DMA engine on audio device… or I/O controller … or

  int dma_size = 4*PAGE_SIZE;
  void *buf = alloc_dma(dma_size);
  ...
  dev->mic_dma_baseaddr = virt_to_phys(buf);   // now a physical address
  dev->mic_dma_count = dma_size;
  dev->cmd = DEV_MIC_INPUT |
             DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
DMA Issues (1): Addressing
Issue #1: DMA meets Virtual Memory
RAM: physical addresses
Programs: virtual addresses
[Diagram: CPU → MMU → RAM, with the disk's DMA going through its own small uTLB]

Solution 2: DMA uses virtual addresses
• OS sets up mappings in a mini-TLB for the device
DMA Issues (2): Virtual Mem
Issue #2: DMA meets Paged Virtual Memory
DMA destination page may get swapped out
[Diagram: CPU and RAM, with the disk DMAing into a page that paging may evict]

Solution: Pin the page before initiating DMA

Alternate solution: Bounce Buffer
• DMA into a pinned kernel page, then memcpy the data elsewhere
  (a minimal sketch follows below)
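A minimal sketch of the bounce-buffer alternative, again reusing the slides' alloc_dma()/virt_to_phys()/dev names; wait_for_dma_done() is a placeholder standing in for blocking until the device's completion interrupt.

  // Hedged sketch: DMA always lands in one pinned kernel "bounce" page, then the
  // data is copied to the real (possibly swappable) destination.
  static char *bounce;                       // pinned kernel page, never swapped out

  void dma_read_into(void *dst, int nbytes) {
      if (!bounce)
          bounce = alloc_dma(PAGE_SIZE);     // allocate the pinned page once

      dev->mic_dma_baseaddr = virt_to_phys(bounce);
      dev->mic_dma_count    = nbytes;        // assumes nbytes <= PAGE_SIZE here
      dev->cmd = DEV_MIC_INPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;

      wait_for_dma_done();                   // placeholder: sleep until the completion interrupt
      memcpy(dst, bounce, nbytes);           // the extra copy is the cost of this scheme
  }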
DMA Issues (4): Caches
Issue #4: DMA meets Caching
[Diagram: CPU → L2 cache → RAM, with the disk DMAing directly to/from RAM behind the cache]
DMA-related data could be cached in L1/L2
• DMA to Mem: cache is now stale
• DMA from Mem: device gets stale data

Solution: (software-enforced coherence)
• OS flushes some/all of the cache before DMA begins (sketch below)
• Or: don't touch the pages during DMA
• Or: mark the pages as uncacheable in the page table entries
  – (needed for Memory-Mapped I/O too!)
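A rough sketch of the "flush before DMA" option. The cache-maintenance calls below (flush_dcache_range, invalidate_dcache_range) are placeholders for whatever primitives the architecture actually provides (names and semantics vary); the device registers follow the earlier mic example, with a hypothetical speaker-output twin for the transmit direction.

  // Hedged sketch: software-enforced coherence around DMA.

  void dma_from_device(void *buf, int nbytes) {
      // Device will write RAM directly, so cached copies of buf are about to go stale:
      // invalidate them so later CPU reads fetch fresh data from RAM.
      invalidate_dcache_range(buf, nbytes);
      dev->mic_dma_baseaddr = virt_to_phys(buf);
      dev->mic_dma_count    = nbytes;
      dev->cmd = DEV_MIC_INPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
      // ...and don't touch buf until the completion interrupt arrives.
  }

  void dma_to_device(void *buf, int nbytes) {
      // Device will read RAM directly, so dirty cache lines must be written back first
      // or the device sees stale data.
      flush_dcache_range(buf, nbytes);
      dev->spkr_dma_baseaddr = virt_to_phys(buf);   // hypothetical output-device registers
      dev->spkr_dma_count    = nbytes;
      dev->cmd = DEV_SPKR_OUTPUT | DEV_INTERRUPT_ENABLE | DEV_DMA_ENABLE;
  }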
DMA Issues (4): Caches
Issue #4, continued: DMA meets Caching
(same issue as above: DMA-related data could be cached in L1/L2, so DMA to memory leaves the cache stale and DMA from memory can hand the device stale data)

Solution 2: (hardware coherence, aka snooping)
• cache listens on the bus and conspires with RAM
• DMA to Mem: invalidate/update data seen on the bus
• DMA from Mem: cache services the request if possible, otherwise RAM does
Programmed I/O vs Memory Mapped I/O
Programmed I/O
• Requires special instructions
• Can require dedicated hardware interface to devices
• Protection enforced via kernel mode access to instructions
• Virtualization can be difficult
Memory-Mapped I/O
• Re-uses standard load/store instructions
• Re-uses standard memory hardware interface
• Protection enforced with the normal memory protection scheme
• Virtualization enabled with the normal memory virtualization scheme
Polling vs. Interrupts
How does program learn device is ready/done?
1. Polling: Periodically check I/O status register
• Common in small, cheap, or real-time embedded systems
+ Predictable timing, inexpensive
– Wastes CPU cycles
2. Interrupts: Device sends interrupt to CPU
• Cause register identifies the interrupting device
• Interrupt handler examines device, decides what to do
+ Only interrupt when device ready/done
– Forced to save CPU context (PC, SP, registers, etc.)
– Unpredictable, event arrival depends on other devices’ activity

Clicker Question: Which is better?


(A) Polling (B) Interrupts (C) Both equally good/bad
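To make the trade-off concrete, here is a hedged sketch (not the course's code) of the same "get a byte from the keyboard" operation written both ways, reusing the memory-mapped kbd structure from the Device Drivers slide; the interrupt-side wait/wakeup helpers are illustrative placeholders for the kernel's real blocking mechanism.

  struct kbd { char status, pad0[3]; char data, pad1[3]; };
  struct kbd *k;                        // assume mapped with mmap(...) as before

  // 1. Polling: the CPU repeatedly checks the status register until data is ready.
  char read_kbd_polling(void) {
      while (!(k->status & 1))          // spin (or sleep briefly) on the ready bit
          ;
      return k->data;
  }

  // 2. Interrupts: the CPU does other work; the device interrupts when data arrives.
  static char last_scancode;

  void kbd_interrupt_handler(void) {    // invoked by the kernel's interrupt dispatch
      last_scancode = k->data;          // the device is ready by definition here
      wakeup_reader();                  // placeholder: unblock a waiting read_kbd caller
  }

  char read_kbd_interrupt(void) {
      wait_for_kbd_interrupt();         // placeholder: block until the handler has run
      return last_scancode;
  }

Which style wins depends on the device: polling suits cheap, predictable, or very fast devices, while interrupts pay off when events are rare relative to the cost of spinning.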
I/O Takeaways
Diverse I/O devices require a hierarchical interconnect, which more recently has been transitioning to point-to-point topologies.

Memory-mapped I/O is an elegant technique to read/write device registers with standard loads/stores.

Interrupt-based I/O avoids the wasted work of polling-based I/O and is usually more efficient.

Modern systems combine memory-mapped I/O, interrupt-based I/O, and direct memory access to create sophisticated I/O device subsystems.
