16-io-notes
Hakim Weatherspoon
CS 3410
Computer Science
Cornell University
The slides are the product of many rounds of teaching CS 3410
by Professors Weatherspoon, Bala, Bracy, McKee, and Sirer.
Announcements
Project 5 Cache Race Games night: Monday, May 7th, 5pm
• Come, eat, drink, have fun and be merry!
• Location: B11 Kimball Hall
Project 6: Malloc
• Design doc due May 9th; bring your design doc to your meeting, May 7-9
• Project due Tuesday, May 15th at 4:30pm
• Will not be able to use slip days
Computer System =
Memory + Datapath + Control + Input + Output
(Figure: keyboard, network, disk, and display attached to the system as I/O devices)
I/O Devices Enable Interacting with the Environment
Device               | Behavior     | Partner | Data Rate (b/sec)
Keyboard             | Input        | Human   | 100
Mouse                | Input        | Human   | 3.8k
Sound Input          | Input        | Machine | 3M
Voice Output         | Output       | Human   | 264k
Sound Output         | Output       | Human   | 8M
Laser Printer        | Output       | Human   | 3.2M
Graphics Display     | Output       | Human   | 800M – 8G
Network/LAN          | Input/Output | Machine | 100M – 10G
Network/Wireless LAN | Input/Output | Machine | 11 – 54M
Optical Disk         | Storage      | Machine | 5 – 120M
Flash memory         | Storage      | Machine | 32 – 200M
Magnetic Disk        | Storage      | Machine | 800M – 3G
Round 1: All devices on one interconnect
Replace all devices as the interconnect changes
e.g. keyboard speed == main memory speed ?!
(Figure: Memory, Display, Disk, Keyboard, and Network all attached to one shared interconnect)
Round 2: I/O Controllers
Decouple I/O devices from Interconnect
Enable smarter I/O interfaces
(Figure: Core0/Core1 with caches and Memory on the interconnect; Display, Disk, Keyboard, and Network now attach through I/O controllers)
Round 3: I/O Controllers + Bridge
Separate high-performance processor, memory, display
interconnect from lower-performance interconnect
(Figure: Core0/Core1, caches, and Memory on a fast interconnect; a bridge links it to a slower I/O interconnect for the Display, Disk, Keyboard, and Network controllers)
Bus Parameters
Width = number of wires
Transfer size = data words per bus transaction
Synchronous (with a bus clock)
or asynchronous (no bus clock / “self clocking”)
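A hedged back-of-the-envelope example of how these parameters combine (the 64-bit width, 100 MHz clock, and 4-word transfer size below are made-up values, not a specific bus):

    /* Illustrative sketch only: peak rate of a synchronous bus is roughly
     * width-in-bytes times clock rate; the numbers are invented. */
    #include <stdio.h>

    int main(void) {
        double width_bytes   = 8;       /* bus width: 64 data wires = 8 bytes      */
        double clock_hz      = 100e6;   /* synchronous bus clock: 100 MHz          */
        int    words_per_txn = 4;       /* transfer size: 4 data words per transaction */

        double peak_bytes_per_sec = width_bytes * clock_hz;   /* ignores protocol overhead */
        printf("peak ~%.0f MB/s, %d bytes moved per transaction\n",
               peak_bytes_per_sec / 1e6, (int)(width_bytes * words_per_txn));
        return 0;
    }

Real buses fall short of this peak because arbitration, addressing, and turnaround cycles also consume bus time.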
Bus Types
Processor – Memory (“Front Side Bus”. Also QPI)
• Short, fast, & wide
• Mostly fixed topology, designed as a “chipset”
– CPU + Caches + Interconnect + Memory Controller
I/O and Peripheral busses (PCI, SCSI, USB, LPC, …)
• Longer, slower, & narrower
• Flexible topology, multiple/varied connections
• Interoperability standards for devices
• Connect to processor-memory bus through a bridge
Example Interconnects
Name                | Use      | Devices per channel | Channel Width | Data Rate (B/sec)
Firewire 800        | External | 63                  | 4             | 100M
USB 2.0             | External | 127                 | 2             | 60M
USB 3.0             | External | 127                 | 2             | 625M
Parallel ATA        | Internal | 1                   | 16            | 133M
Serial ATA (SATA)   | Internal | 1                   | 4             | 300M
PCI 66MHz           | Internal | 1                   | 32-64         | 533M
PCI Express v2.x    | Internal | 1                   | 2-64          | 16G/dir
Hypertransport v2.x | Internal | 1                   | 2-64          | 25G/dir
QuickPath (QPI)     | Internal | 1                   | 40            | 12G/dir
Interconnecting Components
Interconnects are (were?) busses
• parallel set of wires for data and control
• shared channel
  – multiple senders/receivers
  – everyone can see all bus transactions
• bus protocol: rules for using the bus wires
(e.g. Intel Xeon)
Alternative (and increasingly common):
• dedicated point-to-point channels
(e.g. Intel Nehalem)
Round 4: I/O Controllers + Bridge + NUMA
Remove the bridge as a bottleneck with point-to-point interconnects
e.g. Non-Uniform Memory Access (NUMA)
Takeaways
Diverse I/O devices require a hierarchical interconnect,
which has more recently been transitioning to
point-to-point topologies.
Next Goal
How does the processor interact with I/O devices?
I/O Device Driver Software Interface
Set of methods to write/read data to/from the device and to control the device
Example: Linux Character Devices
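As a rough, hedged sketch of what that set of methods looks like for a Linux character device (the name "mydev", the empty read/write bodies, and the use of the older register_chrdev() interface are illustrative assumptions, not the course's code):

    #include <linux/fs.h>
    #include <linux/init.h>
    #include <linux/module.h>

    static int major;   /* major device number assigned by the kernel */

    /* would copy data read from the device into the user buffer */
    static ssize_t mydev_read(struct file *f, char __user *buf,
                              size_t len, loff_t *off)
    {
        return 0;
    }

    /* would push user data out to the device */
    static ssize_t mydev_write(struct file *f, const char __user *buf,
                               size_t len, loff_t *off)
    {
        return len;
    }

    /* the driver's "software interface": the methods the kernel calls */
    static const struct file_operations mydev_fops = {
        .owner = THIS_MODULE,
        .read  = mydev_read,
        .write = mydev_write,
    };

    static int __init mydev_init(void)
    {
        major = register_chrdev(0, "mydev", &mydev_fops);  /* 0 = pick a major for us */
        return (major < 0) ? major : 0;
    }

    static void __exit mydev_exit(void)
    {
        unregister_chrdev(major, "mydev");
    }

    module_init(mydev_init);
    module_exit(mydev_exit);
    MODULE_LICENSE("GPL");

Once such a module is loaded, user programs would typically reach these methods through a device node using ordinary open/read/write system calls.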
Memory-Mapped I/O: the device's registers (e.g. the network I/O controller)
are mapped into the physical address space (figure: address spaces starting
at 0x0000 0000), so ordinary loads and stores access the device.
Less-favored alternative = Programmed I/O:
• Syscall instructions that communicate with I/O
• Communicate via special device registers
Device Drivers
Programmed I/O:
    char read_kbd()
    {
        char status;
        do {
            sleep();
            status = inb(0x64);        /* syscall */
        } while (!(status & 1));
        return inb(0x60);              /* syscall */
    }

Memory Mapped I/O:
    struct kbd {
        char status, pad1[3];
        char data,   pad2[3];
    };
    struct kbd *k = mmap(...);

    char read_kbd()
    {
        char status;
        do {
            sleep();
            status = k->status;
        } while (!(status & 1));
        return k->data;
    }

Clicker Question: Which is better?
(A) Programmed I/O
(B) Memory Mapped I/O
(C) Both have syscalls, both are bad
Device Drivers (answer)
Answer: (B) Memory Mapped I/O.
Both versions are polling examples, but programmed I/O needs a syscall
(inb) for every device-register access, while memory-mapped I/O reads the
registers directly through the mmap'ed addresses, so it is more efficient.
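One detail the slide code leaves out: real memory-mapped register accesses normally go through volatile pointers, otherwise the compiler may hoist the status read out of the polling loop. A minimal sketch, with a hypothetical keyboard register layout:

    #include <stdint.h>

    struct kbd_regs {
        uint8_t status, pad1[3];
        uint8_t data,   pad2[3];
    };

    /* hypothetical: set up by mmap()'ing the device's register page */
    static volatile struct kbd_regs *k;

    char read_kbd(void)
    {
        while (!(k->status & 1))   /* volatile: status is re-read on every iteration */
            ;                      /* (a real driver would sleep/yield here, as in the slide) */
        return (char)k->data;
    }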
I/O Data Transfer
How to talk to device?
• Programmed I/O or Memory-Mapped I/O
How to get events?
• Polling or Interrupts
How to transfer lots of data?
Very, very expensive this way (the CPU touches every word):
    disk->cmd = READ_4K_SECTOR;
    disk->data = 12;
    while (!(disk->status & 1)) { }
    for (i = 0; i < 4096; i++)
        buf[i] = disk->data;
Data Transfer
1. Programmed I/O
(Figure: for (i = 1 .. n), each word moves Disk -> CPU -> RAM)
• CPU issues read request
• Device puts data on bus & CPU reads into registers
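A minimal sketch of the loop those bullets describe (the register layout, names, and command value are hypothetical, not from the slides): every word passes through a CPU register on its way to RAM.

    #include <stdint.h>
    #include <stddef.h>

    struct disk_regs { uint32_t cmd, status, data; };   /* hypothetical layout */

    void pio_read(volatile struct disk_regs *dev, uint32_t *ram_buf, size_t n)
    {
        dev->cmd = 1;                     /* CPU issues the read request (1 = made-up "read" code) */
        for (size_t i = 0; i < n; i++) {
            while (!(dev->status & 1))    /* wait until the device puts data on the bus */
                ;
            ram_buf[i] = dev->data;       /* CPU reads the word into a register, then stores it to RAM */
        }
    }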