Chapter4 Buses Memory STM32
Applications
Chapter 4 (Sections 4.1-4.4)
Platform components
CPUs.
Interconnect buses.
Memory.
Input/output devices.
Implementations:
System-on-Chip (SoC) vs. Multi-Chip
Microcontroller vs. microprocessor
Commercial off-the-shelf (COTS) vs. custom
FPGA & Platform FPGA
CPU Buses
Mechanism for communication with memories and I/O
devices
Bus components:
signal wires with designated functions
protocol for data transfers
electrical parameters (voltage, current, capacitance, etc.)
physical design (connectors, cables, etc.)
Bus Types
Synchronous vs. Asynchronous
Sync: all op’s synchronized to a clock
Async: devices signal each other to indicate start/stop of
operations
May combine sync/async (80x86 “Ready” signal)
Data transfer types:
Processor to/from memory
Processor to/from I/O device
I/O device to/from memory (DMA)
Data bus types
Parallel (data bits transferred in parallel)
Serial (data bits transferred serially)
Typical bus data rates
[Figure: bus data rates for a USB controller (USB, USB device) and a disk controller (IDE/SCSI)]
ARM Advanced Microcontroller Bus
Architecture (AMBA)
On-chip interconnect specification for SoC
Promote re-use by defining a common backbone for SoC
modules using standard bus architectures
AHB – Advanced High-performance Bus (system backbone)
High-performance, high clock freq. modules
Processors to on-chip memory, off-chip memory interfaces
APB – Advanced Peripheral Bus
Lower performance requirements
Low-power peripherals
Reduced interface complexity
Others:
ASB – Advanced System Bus (high performance alternate to AHB)
AXI – Advanced eXtensible Interface
ACE – AXI Coherency Extensions
ATB – Advanced Trace Bus
Example AMBA System
[Figure: example AMBA system; the orange blocks are “CoreLink” interconnect and memory controller IP for Cortex/Mali]
[Figure: STM32L476G block diagram showing microcontroller memory, external memory, Quad SPI, and the AHB1, AHB2, APB1, and APB2 buses]
Microprocessor buses
Clock provides synchronization.
R/W’ is true when reading, false when writing.
May replace CLK and R/W’ with RD and WR strobes.
Address is an a-bit bundle of address lines.
Data is an n-bit bundle of data lines.
Data ready signals when n-bit data is ready.
Bus protocols
Bus protocol determines how devices communicate.
Devices on the bus go through sequences of states.
Protocols are specified by state machines:
One state machine per actor in the protocol.
May contain synchronous and/or asynchronous logic behavior.
Bus protocol often defined by timing diagrams
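The "one state machine per actor" idea can be sketched as a small simulation. This is a minimal sketch, assuming a four-phase request/acknowledge handshake; the signal names (req, ack, bus) and state names are hypothetical and not taken from any particular bus standard.

```python
def run_handshake(device_delay=2, value=0x5A, max_steps=50):
    """Step a CPU state machine and a device state machine in lockstep.

    Both machines compute their next state from the signal values of the
    current step, as in a synchronous design.
    """
    req, ack, bus = False, False, None   # shared signals (assumed names)
    cpu, dev = "START", "WAIT"
    remaining = device_delay             # steps the device needs to fetch data
    got = None
    trace = []
    for _ in range(max_steps):
        trace.append((cpu, dev))
        if cpu == "DONE" and dev == "WAIT":
            break
        n_cpu, n_dev, n_req, n_ack, n_bus = cpu, dev, req, ack, bus
        # CPU state machine
        if cpu == "START":
            n_req, n_cpu = True, "WAIT_ACK"      # request a read
        elif cpu == "WAIT_ACK" and ack:
            got = bus                            # get data
            n_req, n_cpu = False, "DONE"
        # Device state machine
        if dev == "WAIT" and req:
            if remaining > 0:
                remaining -= 1                   # still fetching
            else:
                n_bus, n_ack, n_dev = value, True, "SEND"
        elif dev == "SEND" and not req:
            n_bus, n_ack, n_dev = None, False, "WAIT"   # release ack
        cpu, dev, req, ack, bus = n_cpu, n_dev, n_req, n_ack, n_bus
    return got, trace

data, trace = run_handshake(device_delay=2)
print(hex(data), trace[-1])
```

A slower device (larger device_delay) simply keeps the CPU in its wait state for more steps; neither machine needs to know the other's internal timing.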
Timing diagrams
Typical bus read and write timing
Arm AHB: Basic Read Transfer
Simple read transfer with no wait states:
The address phase: The master drives the address and control
signals onto the bus after the rising edge of HCLK.
The data phase: The slave samples the address and control
information and makes data available at HRDATA before driving
the appropriate HREADY response.
[Figure: read timing diagram showing HCLK, HWRITE, and HREADY]
Arm AHB: Basic Write Transfer
Simple write transfer with no wait states:
The address phase: The master drives the address and control
signals onto the bus after the rising edge of HCLK and sets
HWRITE to one.
The data phase: The slave samples the address and control
information and captures the data from HWDATA before
driving the appropriate HREADY response.
[Figure: write timing diagram showing HCLK, CONTROL, HWRITE, and HREADY]
Bus wait state
Extend the read/write cycle if memory is slower than the CPU.
Arm AHB: Read Transfer with Wait State
Address phase (first clock cycle)
Give address and control signals; set HWRITE to zero (read).
Data phase (multiple clock cycles)
The slave holds HREADY at zero if it is not ready to provide its
data; the master delays its next transaction.
When the slave is ready, it drives the data onto HRDATA and, at
the same time, sets HREADY to one. The master then
continues with its next transaction.
[Figure: read-with-wait-state timing diagram showing HCLK, CONTROL, HWRITE, and HREADY]
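The effect of wait states on transfer length can be sketched as a cycle count. This is a minimal sketch, assuming a single isolated transfer (no overlap with neighboring transfers); the function name is hypothetical.

```python
def ahb_read_cycles(wait_states):
    """Model one isolated AHB read.

    Returns the HREADY value seen in each data-phase cycle and the total
    number of HCLK cycles (one address-phase cycle plus the data phase).
    """
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    hready = [0] * wait_states + [1]   # slave holds HREADY low until data is ready
    total_cycles = 1 + len(hready)
    return hready, total_cycles

print(ahb_read_cycles(0))   # no wait states
print(ahb_read_cycles(2))   # two wait states
```

Each wait state adds exactly one HCLK cycle to the data phase, during which the master must hold off its next transaction.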
Bus burst read
CPU sends a start address, followed by a burst of data from consecutive addresses.
State diagrams for bus read
[Figure: state diagrams for the CPU (get data, done, wait) and the device (send data, release ack, wait), linked by start and ack signals]
Arm AHB Interface
• Capture address and control signals in registers in one
HCLK cycle.
• Transfer the corresponding data in the next HCLK cycle.
[Figure: AHB interface; a register captures the select signal HSELx and the control signals HWRITE and HTRANS[1:0] as rHSELx, rHWRITE, and rHTRANS[1:0]]
Memory organization
Size: 2^n addressable words.
Address width: n = r + c (r row-address bits, c column-address bits).
Aspect ratio.
Data width d.
[Figure: memory array with row decoder (r bits), column decoder (c bits), and d-bit data bus connection]
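The relation n = r + c can be illustrated by splitting an address into its row and column fields. This is a minimal sketch, assuming the row number occupies the upper r bits and the column number the lower c bits; real devices may assign the bits differently.

```python
def split_address(addr, r, c):
    """Split an n-bit address (n = r + c) into (row, column)."""
    n = r + c
    if not 0 <= addr < (1 << n):
        raise ValueError(f"address does not fit in {n} bits")
    row = addr >> c                   # upper r bits feed the row decoder
    col = addr & ((1 << c) - 1)       # lower c bits feed the column decoder
    return row, col

# 10-bit address with 6 row bits and 4 column bits
print(split_address(0x2A5, r=6, c=4))
```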
Memory address decoding
Select a sub-space of memory addresses
A simple example
Microprocessor with 5 address bits (A4 … A0): 2^5 = 32 bytes addressable
Assume a 4-byte (4 x 8) memory chip: decodes two address bits (A1 A0)
µP can address up to 8 chips: decode address bits (A4 A3 A2) for chip enable
[Figure: upper address bits A4 A3 A2 decoded into chip enables; A1 A0 select one of the bytes within the enabled chip]
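The decoding in this example can be sketched directly: the upper three address bits pick one of eight chips, and the lower two bits pick a byte inside it. The function name is hypothetical.

```python
def decode(addr):
    """Decode a 5-bit address into (chip number, byte offset, chip enables)."""
    if not 0 <= addr < 32:
        raise ValueError("address must fit in 5 bits")
    chip = (addr >> 2) & 0b111        # A4 A3 A2 select the chip
    offset = addr & 0b11              # A1 A0 select the byte within the chip
    enables = [i == chip for i in range(8)]   # exactly one chip enable is active
    return chip, offset, enables

chip, offset, enables = decode(0b10111)
print(chip, offset, enables)
```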
Typical generic SRAM
[Figure: SRAM with CE#, OE#, WE#, Address, and Data signals; decoded A31-24]
ISSI IS61LV51216 SRAM read cycle
Timing parameters: maximum data-valid times following activation of Address, CE, OE
STM32 Flexible Static Memory Controller (FSMC)
STM32L4x6 Tech. Ref. Manual, Chap. 16
FSMC block diagram
“N” = “negative” (active low)
FSMC “Mode 1” memory read
Other modes:
* Provide ADV (address latch/advance)
* Activate OE and WE only in DATAST
* Multiplex A/D bits 15-0
* Allow WAIT to extend DATAST
Programmable parameters
Flash memory devices
Flash memory is programmed at system voltages.
Erasure time is long.
Must be erased in blocks.
Available in NAND or NOR structures
NOR: memory cells in parallel – allows random access
NAND: memory cells in series – sequential access/60% smaller
Program memory
SST39VF1601- 1M x 16 NOR Flash
(on uCdragon board)
Connection is similar to SRAM.
SST39VF1601 characteristics
Organized as 1M x 16
2K word sectors, 32K word blocks
Performance:
Read access time = 70 ns or 90 ns
Word program time = 7 µs
Sector/block erase time = 18 ms
Chip erase time = 40 ms
Check status of write/erase operation via read
DQ7 = complement of written value until write complete
DQ7=0 during erase, DQ7=1 when erase done
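The DQ7 status check can be sketched as a polling loop. This is a minimal sketch of the rule stated above; read_status stands in for whatever routine reads a word from the flash (hypothetical), and the loop bound is an arbitrary assumption.

```python
DQ7 = 0x80   # bit 7 of the data word

def program_done(status, written):
    """Programming is complete once DQ7 matches bit 7 of the written value."""
    return (status & DQ7) == (written & DQ7)

def erase_done(status):
    """DQ7 reads 0 during erase and 1 when the erase is done."""
    return bool(status & DQ7)

def poll(read_status, is_done, max_polls=1000):
    """Read status repeatedly; return the number of reads needed."""
    for n in range(1, max_polls + 1):
        if is_done(read_status()):
            return n
    raise TimeoutError("operation did not complete")

# Simulated device: bit 7 is inverted for two reads, then shows the true data.
reads = iter([0x0025, 0x0025, 0x00A5])
print(poll(lambda: next(reads), lambda s: program_done(s, 0x00A5)))
```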
SST39VF1601 read cycle timing
SST39VF1601 command sequences
Assert Address, Data, WE# and CE# to write a command
SST39VF1601 word program
[Figure: word-program timing with four bus write cycles (1st through 4th), followed by the program time TBP = 10 µs max]
Micron 2Gbit NAND flash organization
System transfers data to/from the “Register”
Internal: page copied to Register
Register: holds 1 page
Page: 2048 + 64 bytes
Block: 64 pages
Chip: 2048 blocks
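The page/block/chip figures can be checked against the 2 Gbit capacity with a short calculation. The variable names are illustrative only.

```python
PAGE_DATA_BYTES = 2048    # main area of one page
PAGE_SPARE_BYTES = 64     # spare area of one page
PAGES_PER_BLOCK = 64
BLOCKS_PER_CHIP = 2048

block_data_bytes = PAGE_DATA_BYTES * PAGES_PER_BLOCK
chip_data_bytes = block_data_bytes * BLOCKS_PER_CHIP
chip_total_bytes = (PAGE_DATA_BYTES + PAGE_SPARE_BYTES) * PAGES_PER_BLOCK * BLOCKS_PER_CHIP

print(f"block: {block_data_bytes // 1024} KiB of data")
print(f"chip:  {chip_data_bytes * 8 // 2**30} Gbit of data")
print(f"chip:  {chip_total_bytes} bytes including spare areas")
```

The 2 Gbit rating counts only the 2048-byte main area of each page; the 64 spare bytes per page are extra.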
NAND flash functional block diagram
Bytes/words sent/received sequentially
Micron Flash Command Set
Micron NAND Flash Page Read Operation
[Figure: page read timing: five address cycles (capacity-dependent), then 25 µs to copy the page to the register]
Micron NAND Flash: Program & Erase Op’s
Program (data sequence written to register): 300-700 µs
Erase selected block: 3 ms
Generic DRAM device
[Figure: DRAM device with CE#, R/W#, RAS#, CAS#, Address, and Data signals (SDRAM adds CLK); timing diagram of R/W’, RAS’, and CAS’ versus time]
Asynchronous DRAM page mode access
[Figure: timing diagram of CE’, R/W’, RAS’, and CAS’ versus time]
SDRAM burst read (burst length 4)
Dynamic RAM refresh
Value decays in approx. 1 ms.
Refresh value by reading it.
Can’t access memory during refresh.
RAS-only refresh
CAS-before-RAS refresh.
Hidden refresh.
Example: 4 Mbyte DRAM
Refreshed every 4 msec (one row at a time)
Organized as 2048 rows x 2048 columns: 2048 refreshes
Assume 1 refresh takes 80 nsec
$\frac{2048 \times 80 \times 10^{-9}}{4 \times 10^{-3}} \approx 0.041$, so 4.1% of time is spent refreshing
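The refresh overhead in this example is a one-line calculation; the sketch below makes the three inputs explicit so other row counts or refresh periods can be tried. The function name is hypothetical.

```python
def refresh_overhead(rows, refresh_time_s, refresh_period_s):
    """Fraction of time spent refreshing when every row is refreshed once per period."""
    return rows * refresh_time_s / refresh_period_s

overhead = refresh_overhead(rows=2048, refresh_time_s=80e-9, refresh_period_s=4e-3)
print(f"{overhead:.3f} ({overhead * 100:.1f}% of time spent refreshing)")
```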
Other DRAM forms
Extended data out (EDO): improved page mode access.
Synchronous DRAM: clocked access for pipelining.
All operations clocked
Row address
Column address - increments on clock for each data transfer
Data transfer – burst transfers (one per clock) after initial latency
Double Data Rate (DDR) – transfer on both edges of clock
Effectively doubles the bandwidth
DDR-2: doubles the clock rate of DDR
DDR-3, DDR-4 support increasingly higher bandwidths
Rambus: highly pipelined DRAM.
DDR2 bank activate
Burst read command = CS# low, CAS# low, RAS# high, WE# high (and CKE high)
Read Latency RL = AL + CL
CL (programmable) = CAS latency (CAS to data ready)
AL (programmable) = “Additive” Latency
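The relation RL = AL + CL can be turned into a latency in nanoseconds once a clock rate is chosen. This is a minimal sketch; the AL, CL, and clock values below are hypothetical examples, not settings from a specific device.

```python
def read_latency(al, cl, clock_mhz):
    """Return read latency RL = AL + CL in clock cycles and in nanoseconds."""
    rl_cycles = al + cl
    rl_ns = rl_cycles * 1000.0 / clock_mhz   # one cycle lasts 1000/clock_mhz ns
    return rl_cycles, rl_ns

# Assumed example: AL = 3, CL = 4, 200 MHz clock
print(read_latency(al=3, cl=4, clock_mhz=200))
```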
Systems with multiple bus masters
Bus master controls operations on the bus.
Direct Memory Access (DMA)
DMA data transfers done without executing CPU
instructions.
CPU sets up transfer.
DMA engine fetches, writes.
DMA controller is a separate unit.
DMA operation
CPU sets DMA registers for start address, length.
DMA status register controls the unit.
Bus request to CPU – Bus grant back from CPU
DMA controller requests bus mastership from CPU
Once DMA is bus master, it transfers automatically.
May run continuously until complete.
May use every nth bus cycle.
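The two transfer styles above (continuous, or every nth bus cycle) can be sketched with a toy model. This is a minimal sketch under simplifying assumptions: one word is copied per granted bus cycle, memory is a Python list, and the class and function names are hypothetical.

```python
class Dma:
    """Toy DMA controller with start-address and length registers."""

    def __init__(self, memory):
        self.memory = memory
        self.src = self.dst = self.remaining = 0

    def setup(self, src, dst, length):
        """CPU programs the transfer before handing over the bus."""
        self.src, self.dst, self.remaining = src, dst, length

    @property
    def busy(self):
        return self.remaining > 0

    def transfer_one(self):
        """Copy one word without any CPU instruction being executed."""
        self.memory[self.dst] = self.memory[self.src]
        self.src += 1
        self.dst += 1
        self.remaining -= 1

def run(dma, n=1):
    """Grant the bus to the DMA on every nth cycle; the CPU keeps the rest.

    Returns (total bus cycles, cycles left to the CPU). n=1 models a
    transfer that runs continuously until complete.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    cycle = cpu_cycles = 0
    while dma.busy:
        cycle += 1
        if cycle % n == 0:
            dma.transfer_one()
        else:
            cpu_cycles += 1
    return cycle, cpu_cycles

memory = list(range(8)) + [0] * 8
dma = Dma(memory)
dma.setup(src=0, dst=8, length=4)
print(run(dma, n=4), memory[8:12])
```

With n=1 the transfer finishes soonest but the CPU gets no bus cycles meanwhile; a larger n stretches the transfer and leaves cycles for the CPU.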
Bus transfer sequence diagram
System-level performance analysis
Performance depends on all the elements of the system:
CPU.
Cache.
Bus.
Main memory.
I/O device.
Bandwidth as performance
Bandwidth applies to several components:
Memory.
Bus.
CPU fetches.
Different parts of the system run at different clock rates.
Components may have different widths (bus, memory).
Bandwidth and data transfers
Video frame: 320 x 240 x 3 = 230,400 bytes.
Need to transfer in 1/30 sec = 0.033 sec
Transfer 1 byte/µsec, 0.23 sec per frame.
Too slow.
To increase bandwidth:
Increase bus width.
Increase bus clock rate.
Minimize overhead (do burst transfers)
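The frame-transfer arithmetic can be checked with a short calculation. This is a minimal sketch that ignores protocol overhead; the "faster" configuration (4 bytes per transfer, 0.5 µs per transfer) is a hypothetical example of a wider bus with a faster clock.

```python
FRAME_BYTES = 320 * 240 * 3    # 230,400 bytes per frame
DEADLINE_S = 1 / 30            # one frame time at 30 frames/sec

def transfer_time(n_bytes, bytes_per_transfer, transfer_period_s):
    """Time to move n_bytes, ignoring overhead."""
    transfers = -(-n_bytes // bytes_per_transfer)   # ceiling division
    return transfers * transfer_period_s

slow = transfer_time(FRAME_BYTES, 1, 1e-6)      # 1 byte per microsecond
fast = transfer_time(FRAME_BYTES, 4, 0.5e-6)    # assumed: 4x wider, 2x faster
print(f"slow: {slow:.4f} s, meets deadline: {slow <= DEADLINE_S}")
print(f"fast: {fast:.4f} s, meets deadline: {fast <= DEADLINE_S}")
```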
Bus bandwidth
T: # bus cycles.
P: bus clock period.
Total time for transfer: t = TP.
D: data payload length.
O = O1 + O2 = overhead (before & after data).
N = total # data payloads.
W = bus width (bits/xfer).
Transfer ND bits: $T_{basic}(N) = (D + O)N / W$
[Figure: one transfer of width W, consisting of overhead O1, data D, and overhead O2]
Bus burst transfer bandwidth
T: # bus cycles.
P: time/bus cycle.
Total time for transfer: t = TP.
D: data payload length.
B: burst size (# transfers of size D).
O = O1 + O2 = overhead.
N = total # data payloads.
$T_{burst}(N) = (BD + O)N / (BW)$
[Figure: one burst of width W, consisting of data transfers 1, 2, …, B followed by overhead O]
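The basic and burst cycle counts can be compared numerically. This is a minimal sketch that evaluates the two formulas, Tbasic(N) = (D+O)N/W and Tburst(N) = (BD+O)N/(BW), with the same symbols; the numeric values are hypothetical.

```python
def t_basic(n, d, o, w):
    """Bus cycles for N payloads, each paying the full overhead O."""
    return (d + o) * n / w

def t_burst(n, d, o, b, w):
    """Bus cycles for N payloads when B payloads share one overhead O."""
    return (b * d + o) * n / (b * w)

# Assumed example: N = 1024 payloads, D = 1, O = 3, W = 1
n, d, o, w = 1024, 1, 3, 1
print(t_basic(n, d, o, w))          # every payload pays the overhead
print(t_burst(n, d, o, b=4, w=w))   # overhead amortized over bursts of 4
```

With B = 1 the burst formula reduces to the basic one; larger bursts spread the fixed overhead over more data.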
Bus performance bottlenecks
Transfer 320 x 240 video frame @ 30 frames/sec = 6,912,000 bytes/sec (230,400 bytes/frame x 30).
Is the performance bottleneck the bus or the memory?
Memory aspect ratios
[Figure: the same capacity in different aspect ratios: 64 M x 1, 16 M x 4, 8 M x 8]
Parallelism
Speed things up by running several units at once.
DMA provides parallelism if CPU doesn’t need the bus:
DMA + bus.
CPU.
Electrical bus design
Bus signals are usually tri-stated.
Address and data lines may be multiplexed.
Every device on the bus must be able to drive the
maximum bus load:
Bus wires.
Other bus devices.
Resistive and capacitive loads.
Bus specification may limit loads
Bus may include clock signal.
Timing is relative to clock.
Tristate operation
Two drivers D1 and D2 share the bus, with enables E1 and E2:
        E2=0     E2=1
E1=0    float    D2
E1=1    D1       conflict
Must prevent E1=E2=1.
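The tristate truth table can be written as a small function. This is a minimal sketch for two drivers; None stands for a floating (high-impedance) bus, and the function name is hypothetical.

```python
def bus_value(e1, d1, e2, d2):
    """Resolve a bus shared by two tristate drivers."""
    if e1 and e2:
        raise ValueError("bus conflict: E1 and E2 both enabled")
    if e1:
        return d1
    if e2:
        return d2
    return None   # neither driver enabled: the bus floats

print(bus_value(0, 1, 0, 0))   # floating
print(bus_value(1, 1, 0, 0))   # driven by D1
print(bus_value(0, 1, 1, 0))   # driven by D2
```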