Chapter4 Buses Memory STM32

Chapter 4 discusses memory systems for embedded applications, focusing on platform components, CPU buses, and bus types. It details the ARM Advanced Microcontroller Bus Architecture (AMBA) and various memory types including ROM and RAM, highlighting their characteristics and organization. The chapter also covers bus protocols, timing diagrams, and the functionality of different memory devices such as SRAM, DRAM, and flash memory.

Uploaded by

lakshmidc2016

Memory Systems for Embedded Applications
Chapter 4 (Sections 4.1-4.4)

1
Platform components
 CPUs.
 Interconnect buses.
 Memory.
 Input/output devices.

 Implementations:
 System-on-Chip (SoC) vs. Multi-Chip
 Microcontroller vs. microprocessor
 Commercial off-the-shelf (COTS) vs. custom
 FPGA & Platform FPGA

2
CPU Buses
 Mechanism for communication with memories and I/O
devices
 Bus components:
 signal wires with designated functions
 protocol for data transfers
 electrical parameters (voltage, current, capacitance, etc.)
 physical design (connectors, cables, etc.)

3
Bus Types
• Synchronous vs. asynchronous
  - Synchronous: all operations are synchronized to a clock
  - Asynchronous: devices signal each other to indicate the start/stop of operations
  - May combine both (e.g., the 80x86 "Ready" signal)
• Data transfer types:
  - Processor to/from memory
  - Processor to/from I/O device
  - I/O device to/from memory (DMA)
• Data bus types:
  - Parallel (data bits transferred in parallel)
  - Serial (data bits transferred serially)
4
Typical bus data rates

Source: Peter Cheung “Computer Architecture & Systems Course Notes”


5
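Hierarchical Bus Architecture
[Figure: hierarchical bus architecture. A local bus connects the CPU, cache, main memory controller, and main memory; a system bus carries the LAN, video, and mouse/keyboard controllers; an expansion bus carries the USB controller (with USB devices) and the IDE/SCSI disk controller. Bridges join the bus levels.]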

6
ARM Advanced Microcontroller Bus Architecture (AMBA)
• On-chip interconnect specification for SoC
• Promotes re-use by defining a common backbone for SoC modules using standard bus architectures
• AHB: Advanced High-performance Bus (system backbone)
  - High-performance, high clock frequency modules
  - Processors to on-chip memory, off-chip memory interfaces
• APB: Advanced Peripheral Bus
  - Lower performance requirements
  - Low-power peripherals
  - Reduced interface complexity
• Others:
  - ASB: Advanced System Bus (high-performance alternative to AHB)
  - AXI: Advanced eXtensible Interface
  - ACE: AXI Coherency Extensions
  - ATB: Advanced Trace Bus
7
Example AMBA System

Source: Joe Bungo (ARM), "CPU Design Concept to SoC"
8


Communication Architecture Standards
Why do we need communication standards?
 Modular design approach
 Allows design reuse
 Facilitates IP integration into an SoC design

Picture source: http://www.ecs.soton.ac.uk/ (SoC Advance design Technique)


ARM CoreLink peripherals for AMBA

[Figure: AMBA system with "CoreLink" components shown as orange blocks: interconnect and memory controller IP for Cortex/Mali.]

10
STM32L476G Microcontroller
[Figure: STM32L476G block diagram showing the AHB1, AHB2, APB1, and APB2 buses, with external memory and Quad SPI memory interfaces.]

11
Microprocessor buses
• Clock provides synchronization.
• R/W' is true when reading, false when writing.
  - CLK and R/W' may be replaced by RD and WR strobes.
• Address is an a-bit bundle of address lines.
• Data is an n-bit bundle of data lines.
• Data ready signals when the n-bit data is ready.

12
Bus protocols
 Bus protocol determines how devices communicate.
 Devices on the bus go through sequences of states.
 Protocols are specified by state machines,
 One state machine per actor in the protocol.
 May contain synchronous and/or asynchronous logic
behavior.
 Bus protocol often defined by timing diagrams

13
Timing diagrams

14
Typical bus read and write timing

15
Arm AHB: Basic Read Transfer
Simple read transfer with no wait states:
• The address phase: The master drives the address and control signals onto the bus after the rising edge of HCLK.
• The data phase: The slave samples the address and control information and makes data available at HRDATA before driving the appropriate HREADY response.

[Timing diagram: HCLK, CONTROL (Control 0 to Control 3), HADDR[31:0] (Address 0 to Address 3), HRDATA[31:0] (Read Data 0 to Read Data 2), HWRITE, HREADY.]
Arm AHB: Basic Write Transfer
Simple write transfer with no wait states:
• The address phase: The master drives the address and control signals onto the bus after the rising edge of HCLK and sets HWRITE to one.
• The data phase: The slave samples the address and control information and captures the data from HWDATA before driving the appropriate HREADY response.

[Timing diagram: HCLK, CONTROL (Control 0), HADDR[31:0] (Address 0), HWDATA[31:0] (Write Data 0), HWRITE, HREADY.]
Bus wait state

Extend the read/write cycle if memory is slower than the CPU.

18
Arm AHB: Read Transfer with Wait State
Address phase (first clock cycle)
• Give address and control signals; set HWRITE to zero (read).
Data phase (multiple clock cycles)
• The slave holds HREADY at zero if it is not ready to provide its data; the master delays its next transaction.
• When the slave is ready, it places the data on HRDATA and sets HREADY to one at the same time. The master then continues with its next transaction.

[Timing diagram: HCLK, CONTROL (Control 0), HADDR[31:0] (Address 0), HRDATA[31:0] (Data 0), HWRITE, HREADY.]
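
The wait-state handshake above can be sketched as a small cycle-level model. This is an illustrative Python sketch, not an AHB implementation: `ahb_read_trace`, `ahb_read_cycles`, and `wait_states` are hypothetical names, only one isolated read is modeled, and HREADY is assumed high during the address phase (an idle bus beforehand).

```python
def ahb_read_trace(wait_states):
    """Return (phase, HREADY) for each HCLK cycle of one isolated read.

    Assumption: the bus is idle before the transfer, so HREADY is
    high during the single address-phase cycle.
    """
    if wait_states < 0:
        raise ValueError("wait_states must be non-negative")
    trace = [("address", 1)]                 # master drives HADDR, HWRITE = 0
    trace += [("data", 0)] * wait_states     # slave not ready: HREADY held low
    trace.append(("data", 1))                # HRDATA valid, HREADY high
    return trace


def ahb_read_cycles(wait_states):
    """Total HCLK cycles used by the read, address phase included."""
    return len(ahb_read_trace(wait_states))
```

With no wait states the model takes two cycles (one address phase, one data phase); each wait state adds one data-phase cycle during which the master must hold off its next transaction.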
Bus burst read

The CPU sends a start address, followed by a burst of data from consecutive addresses.

20
State diagrams for bus read
[Figure: two state machines, one for the CPU and one for the device. Labels in the figure: start, Wait, Adrs, Adrs + CE, Ready?, Ack?, Get data, Send data, Ack, Release ack, Done.]

21
Arm AHB Interface
• Capture address and control signals in registers in one
HCLK cycle.
• Transfer the corresponding data in the next HCLK cycle.

[Figure: AHB interface to a peripheral module. A register stage captures the select signal HSELx, the control signals HWRITE, HTRANS[1:0], and HSIZE[1:0], and the address HADDR[31:0], producing rHSELx, rHWRITE, rHTRANS[1:0], rHSIZE[1:0], and rHADDR[31:0]. Global signals: HRESETn and HCLK. Data: HWDATA[31:0] and HRDATA[31:0]. Transfer response: HREADYOUT.]
Read-only memory types
 Mask-programmed ROM
 Programmed at factory (high NRE cost)
 PROM (Programmable ROM)
 Programmable once by users (low NRE cost)
 Electric pulses selectively applied to “fuses” or “antifuses”
 EPROM (Erasable PROM)
 Repeatedly programmable/reprogrammable
 Electric pulses for programming (seconds)
 Ultraviolet light for erasing (15-20 minutes)
 EEPROM (Electrically Erasable PROM)
 Electrically programmable and erasable at the single-byte level (msec)
 Flash EPROM
 Electrically programmable (µsec) and
 Electrically erasable (block-by-block: msec to sec)
 Structures: NOR (random access); NAND (sequential access)
 Most common program memory in embedded applications
 Widely used in digital cameras, multimedia players, smart phones, etc.
Read-write memory types
 Static RAM (SRAM)
 Each cell is a flip-flop, storing 1-bit information which is
retained as long as power is on
 Faster than DRAM
 Requires a larger area per cell than DRAM
 Dynamic RAM (DRAM)
 Each cell is a capacitor, which needs to be refreshed
periodically to retain the 1-bit information
 A refresh consists of reading followed by writing back
 Refresh overhead
ROM/RAM device organization
Memory "organization" = 2^n x d (from the system designer's perspective)
• Size.
  - 2^n addressable words
  - Address width n = r + c
• Aspect ratio.
• Data width d.

[Figure: the n-bit address splits into an r-bit row # feeding the row decoder and a c-bit column # feeding the column decoder of the memory array; the data bus connection is d bits wide.]
25
Memory address decoding
• Select a sub-space of memory addresses
• A simple example
  - Microprocessor with 5 address bits (A4 … A0) → 2^5 = 32 bytes addressable
  - Assume a 4-byte (4 x 8) memory chip → decodes two address bits (A1 A0)
  - µP can address up to 8 chips (decode address bits A4 A3 A2 for chip enable)

[Figure: an off-chip decoder decodes the upper address bits A4-A2 to drive the chip Enable; the on-chip decoder decodes A1-A0 to select Byte 0 to Byte 3 of the memory array.]
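
The 5-bit example above can be expressed as a short sketch. The function name `decode_address` is hypothetical; the bit split (A4-A2 for chip enable, A1-A0 for the byte within the chip) follows the slide.

```python
def decode_address(addr):
    """Split a 5-bit address into (chip_select, byte_offset)."""
    if not 0 <= addr < 32:
        raise ValueError("address must fit in 5 bits (A4..A0)")
    chip = addr >> 2         # A4 A3 A2: off-chip decoder, one of 8 chip enables
    offset = addr & 0b11     # A1 A0: on-chip decoder, one of 4 bytes
    return chip, offset
```

For example, address 0b10110 selects chip 5 (binary 101) and byte 2 (binary 10) within that chip.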
Typical generic SRAM

[Figure: SRAM block with inputs CE#, OE#, WE#, and Address, and a Data port.]

CE# = chip enable: initiate memory access when active
OE# = output enable: drive Data lines when active
WE# = write enable: update SRAM contents with Data
(May have one R/W# signal instead of OE# and WE#)
Multi-byte data bus devices have a byte-enable signal for each byte.
27
IS61LV51216-12T: 512K x 16 SRAM
(on uCdragon board)

Byte Lane Select


- Upper byte D15-8
- Lower byte D7-0

Decoded A31-24

28
ISSI IS61LV51216 SRAM read cycle

Timing Parameters:
Max data valid times
following activation of
Address, CE, OE

29
STM32 Flexible Static Memory Controller (FSMC)
STM32L4x6 Tech. Ref. Manual, Chap. 16

• Controls external memory on the AHB bus in four 256 MB banks
• Upper address bits are decoded by the FSMC
• Bank 1 holds 1 to 4 static memories:
  * SRAM
  * Pseudo-Static RAM
  * NOR flash
• Bank 1 addresses:
  A[31:28] = 0110
  A[27:26] = 64 MB chip select
  A[25:0] = 64 MB chip offset

[Figure: FSMC memory map; the remaining label in the figure is "NAND flash devices".]

30
FSMC block diagram
“N” = “negative” (active low)

NE[4:1] = NOR/PSRAM enable


NE[1]: A[27:26]=00
NE[2]: A[27:26]=01
NE[3]: A[27:26]=10
NE[4]: A[27:26]=11
NL = address latch/advance
NBL = byte lane
CLK for sync. Burst

A[25:0] = Address bus


D[15:0] = Data bus**
NOE = output enable
NWE = write enable
NWAIT = wait request

** Data bus = 8 or 16 bits

31
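
The bank 1 address split and the NE[4:1] mapping above can be sketched as follows. This is only an illustration of the bit fields listed on the slides; `fsmc_bank1_decode` is a hypothetical helper, not part of any STM32 library, and the offset is taken directly from address bits [25:0] without modeling data-bus-width effects.

```python
BANK1_TAG = 0x6   # A[31:28] = 0110 selects FSMC bank 1


def fsmc_bank1_decode(addr):
    """Return (NE index 1..4, offset within the 64 MB region)."""
    if (addr >> 28) != BANK1_TAG:
        raise ValueError("address is outside FSMC bank 1")
    ne = ((addr >> 26) & 0x3) + 1    # A[27:26] = 00..11 selects NE[1]..NE[4]
    offset = addr & 0x03FF_FFFF      # A[25:0]
    return ne, offset
```

For example, 0x68000010 has A[27:26] = 10, so it falls in the NE[3] region at offset 0x10.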
FSMC “Mode 1” memory read

Other modes:

* Provide ADV
(address latch/
advance)

* Activate
OE and WE
only in DATAST

* Multiplex A/D
bits 15-0

* Allow WAIT to
extend DATAST

ADDSET/DATAST programmed in chip-select timing register (HCLK = AHB clock)


32
FSMC “Mode 1” memory write

Programmable parameters
33
Flash memory devices
 Flash memory is programmed at system voltages.
 Erasure time is long.
 Must be erased in blocks.
 Available in NAND or NOR structures
 NOR: memory cells in parallel – allows random access
 NAND: memory cells in series – sequential access/60% smaller

Serial access Serial access

Program memory

SLC = Single-Level Cell, MLC = Multi-Level Cell


34
NAND and NOR flash comparison

• NAND flash is similar to a hard disk drive (sequential access to the bits of a sector).
• NOR flash is similar to a random-access memory (ROM/RAM).

35
SST39VF1601- 1M x 16 NOR Flash
(on uCdragon board)

Similar to
SRAM
connection

36
SST39VF1601 characteristics

 Organized as 1M x 16
 2K word sectors, 32K word blocks
 Performance:
 Read access time = 70ns or 90ns
 Word program time = 7us
 Sector/block erase time = 18ms
 Chip erase time = 40ms
 Check status of write/erase operation via read
 DQ7 = complement of written value until write complete
 DQ7=0 during erase, DQ7=1 when erase done

37
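
The DQ7 status check for a word program can be sketched as a polling loop. This is a minimal sketch under stated assumptions: `read_word` is a hypothetical callable standing in for a bus read of the programmed address, and `max_polls` is an arbitrary illustrative bound, not a datasheet value.

```python
def wait_program_done(read_word, written, max_polls=1000):
    """Poll until DQ7 equals bit 7 of the value that was written.

    While the program operation is in progress the device returns the
    complement of the written DQ7; once it finishes, the true bit
    appears. Returns the number of reads that were needed.
    """
    expected_dq7 = (written >> 7) & 1
    for polls in range(1, max_polls + 1):
        dq7 = (read_word() >> 7) & 1
        if dq7 == expected_dq7:
            return polls
    raise TimeoutError("program operation did not complete")
```

An erase can be checked the same way by treating the expected DQ7 value as 1, since DQ7 reads 0 during erase and 1 when the erase is done.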
SST39VF1601 read cycle timing

38
SST39VF1601 command sequences
Assert Address, Data, WE# and CE# to write a command

39
SST39VF1601 word program

[Timing diagram: four bus write cycles (1st to 4th), followed by the internal program time TBP = 10 μs max.]

40
Micron 2Gbit NAND flash organization
System transfers data to/from the “Register”
Internal: page copied to Register

Register:
Holds 1 page

Page:
2048 + 64 bytes

Block:
64 pages

Chip:
2048 blocks

41
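
The page/block/chip figures above can be checked with a short calculation. The sketch below uses only the numbers on the slide; the function names and the linear byte-address convention (data area only, spare bytes excluded) are assumptions made for illustration.

```python
PAGE_DATA_BYTES = 2048    # main data area of one page
PAGE_SPARE_BYTES = 64     # spare area of one page (not counted in capacity)
PAGES_PER_BLOCK = 64
BLOCKS_PER_CHIP = 2048


def nand_capacity_bits():
    """Data capacity of the chip in bits, excluding spare bytes."""
    return PAGE_DATA_BYTES * PAGES_PER_BLOCK * BLOCKS_PER_CHIP * 8


def nand_locate(byte_addr):
    """Map a linear data-byte address to (block, page, column)."""
    if byte_addr < 0:
        raise ValueError("address must be non-negative")
    page_index, column = divmod(byte_addr, PAGE_DATA_BYTES)
    block, page = divmod(page_index, PAGES_PER_BLOCK)
    if block >= BLOCKS_PER_CHIP:
        raise ValueError("address beyond end of chip")
    return block, page, column
```

The product 2048 x 64 x 2048 bytes is 2^28 bytes, or 2^31 bits, which matches the 2 Gbit rating.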
NAND flash functional block diagram

Bytes/words
sent/received
sequentially

Micron: 2/4/8 Gbit, x8/x16 multiplexed NAND flash


42
Micron Flash Mode Selection
CLE = command latch enable; ALE = address latch enable

43
Micron Flash Command Set

44
Micron NAND Flash Page Read Operation

25 µs

Page to
register

Five
address
cycles
Capacity-dependent
45
Micron NAND Flash: Program & Erase Op’s
Program (data written to register)
300-700 µs

Data sequence
Erase selected block

3 ms

46
Generic DRAM device

[Figure: DRAM block with inputs CE#, R/W#, RAS#, CAS#, and Address, a Data port, and (for SDRAM) CLK.]

RAS# = Row Address Strobe: row# on Address inputs
CAS# = Column Address Strobe: column# on Address inputs
47
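
Because the row and column numbers share the same Address pins, a memory controller has to split each address into two parts. The sketch below assumes the upper bits form the row and the lower bits the column, with 11 bits each as a default; both the function name and that bit assignment are illustrative assumptions.

```python
def split_dram_address(addr, row_bits=11, col_bits=11):
    """Return (row, column) for a multiplexed DRAM address."""
    if not 0 <= addr < (1 << (row_bits + col_bits)):
        raise ValueError("address out of range")
    row = addr >> col_bits                  # driven while RAS# is asserted
    col = addr & ((1 << col_bits) - 1)      # driven while CAS# is asserted
    return row, col
```

Two addresses that yield the same row differ only in the column part, which is the case page mode access exploits.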
Asynchronous DRAM timing
[Timing diagram: CE', R/W', RAS', CAS', Adrs (row adrs, then col adrs), and Data (data) versus time.]

48
Asynchronous DRAM page mode access
[Timing diagram: CE', R/W', RAS', CAS', Adrs (one row adrs followed by three col adrs), and Data (three data values) versus time.]
49
SDRAM burst read (burst length 4)

Trcd = RAS-to-CAS delay


CL = CAS latency (CAS to data ready)
Tac = access time

50
Dynamic RAM refresh
 Value decays in approx. 1 ms.
 Refresh value by reading it.
 Can’t access memory during refresh.
 RAS-only refresh
 CAS-before-RAS refresh.
 Hidden refresh.
Example: 4 Mbyte DRAM
• Refreshed every 4 msec (one row at a time)
• Organized as 2048 rows x 2048 columns → 2048 refreshes
• Assume 1 refresh takes 80 nsec

\[
\frac{2048 \times 80 \times 10^{-9}}{4 \times 10^{-3}} \approx 0.041
\]

→ about 4.1% of time spent refreshing

51
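
The refresh overhead in the example can be recomputed for other devices with a one-line formula. The function name `refresh_overhead` is hypothetical; the inputs are the slide's values.

```python
def refresh_overhead(rows, refresh_time_s, refresh_period_s):
    """Fraction of time spent refreshing when every row is refreshed
    once per refresh period."""
    return rows * refresh_time_s / refresh_period_s


# Slide example: 2048 rows, 80 ns per refresh, 4 ms refresh period.
overhead = refresh_overhead(2048, 80e-9, 4e-3)
```

Halving the refresh time or doubling the refresh period halves the overhead, while doubling the number of rows doubles it.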
Other DRAM forms
 Extended data out (EDO): improved page mode access.
 Synchronous DRAM: clocked access for pipelining.
 All operations clocked
 Row address
 Column address - increments on clock for each data transfer
 Data transfer – burst transfers (one per clock) after initial latency
 Double Data Rate (DDR) – transfer on both edges of clock
 Effectively doubles the bandwidth
 DDR-2: doubles the clock rate of DDR
 DDR-3, DDR-4 support increasingly higher bandwidths
 Rambus: highly pipelined DRAM.

52
DDR2 bank activate

Memory partitioned into 8 separate arrays called “banks”


Bank Activate command = CS# low, RAS# low, CAS# high, WE# high (and CKE high)
- Bank address BA2-BA0 selects bank
- Row address A15-A0 selects a row in the bank
Follow with read/write command in next clock cycle
Concurrent Bank Activate commands permitted (up to 8)
53
DDR2 burst read (burst length 4)

Burst read command = CS# low, CAS# low, RAS# high, WE# high (and CKE high)
Read Latency RL = AL + CL
CL (programmable) = CAS latency (CAS to data ready)
AL (programmable) = “Additive” Latency

54
Systems with multiple bus masters
 Bus master controls operations on the bus.

 CPU is default bus master.


 Other devices may request bus mastership.
 Request mastership via separate handshaking lines.
 Main CPU can’t use bus when it is not master.
 Situations for multiple bus masters:
 DMA data transfers
 Multiple CPUs/Cores with shared memory
 Separate graphics/network processor

55
Direct Memory Access (DMA)
 DMA data transfers done without executing CPU
instructions.
 CPU sets up transfer.
 DMA engine fetches, writes.
 DMA controller is a separate unit.

Data Ready

56
DMA operation
 CPU sets DMA registers for start address, length.
 DMA status register controls the unit.
 Bus request to CPU – Bus grant back from CPU
 DMA controller requests bus mastership from CPU
 Once DMA is bus master, it transfers automatically.
 May run continuously until complete.
 May use every nth bus cycle.

57
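
The "every nth bus cycle" option can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a model of any real DMA controller: `run_dma` is a hypothetical function, memory is a Python list, and each DMA-owned bus cycle is assumed to move one whole word (a real controller would typically need separate read and write cycles).

```python
def run_dma(memory, src, dst, length, n=1):
    """Copy `length` words from src to dst, with the DMA controller
    owning every n-th bus cycle. Returns the bus cycles elapsed; the
    remaining cycles stay available to the CPU.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    cycles = 0
    moved = 0
    while moved < length:
        cycles += 1
        if cycles % n == 0:                      # DMA is bus master this cycle
            memory[dst + moved] = memory[src + moved]
            moved += 1
    return cycles
```

With n = 1 the transfer runs continuously until complete; larger n stretches the transfer but leaves n - 1 of every n cycles to the CPU.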
Bus transfer sequence diagram

58
System-level performance analysis
• Performance depends on all the elements of the system:
  - CPU.
  - Cache.
  - Bus.
  - Main memory.
  - I/O device.

[Figure: CPU with cache, connected to memory.]

59
Bandwidth as performance
 Bandwidth applies to several components:
 Memory.
 Bus.
 CPU fetches.
 Different parts of the system run at different clock rates.
 Components may have different widths (bus, memory).

60
Bandwidth and data transfers
 Video frame: 320 x 240 x 3 = 230,400 bytes.
 Need to transfer in 1/30 sec = 0.033 sec
 Transfer 1 byte/µsec, 0.23 sec per frame.
 Too slow.
 To increase bandwidth:
 Increase bus width.
 Increase bus clock rate.
 Minimize overhead (do burst transfers)

61
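
The frame-transfer arithmetic above can be reproduced in a few lines. The helper names are hypothetical; the numbers are the slide's.

```python
def frame_bytes(width, height, bytes_per_pixel):
    """Size of one uncompressed video frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_time_s(n_bytes, bytes_per_second):
    """Time to move n_bytes at a sustained transfer rate."""
    return n_bytes / bytes_per_second


size = frame_bytes(320, 240, 3)              # bytes per frame
deadline = 1 / 30                            # seconds available per frame
actual = transfer_time_s(size, 1_000_000)    # at 1 byte per microsecond
fast_enough = actual <= deadline
```

At 1 byte/µs a frame takes about seven times longer than the 1/30 sec budget, which is why the slide lists wider buses, faster clocks, and burst transfers as remedies.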
Bus bandwidth
• T: # bus cycles.
• P: bus clock period.
• Total time for transfer: t = TP.
• D: data payload length.
• O = O1 + O2 = overhead (before and after the data).
• N = total # data payloads.
• W = bus width (bits/transfer).
• To transfer ND bits:

\[
T_{basic}(N) = \frac{(D + O)\,N}{W}
\]

[Figure: one bus transfer of width W, consisting of overhead O1, data D, then overhead O2.]

62
Bus burst transfer bandwidth
• T: # bus cycles.
• P: time/bus cycle.
• Total time for transfer: t = TP.
• D: data payload length.
• B: burst size (# transfers of size D).
• O1 + O2 = overhead O.
• N = total # data payloads.

\[
T_{burst}(N) = \frac{(BD + O)\,N}{BW}
\]

[Figure: a burst of B data transfers (1, 2, …, B) of width W, followed by overhead O.]

63
Bus performance bottlenecks
• Transfer 320 x 240 video frame @ 30 frames/sec = 612,000 bytes/sec.
• Is the performance bottleneck the bus or the memory?
• Bus: assume 1 MHz bus, W=2, D=1, O=3:
  - Tbasic = (1+3) × 612,000/2 = 1,224,000 cycles = 1.224 sec.
• Memory: try burst mode B=4, width W=0.5, D=1, O=4:
  - Tmem = (4×1+4) × 612,000/(4×0.5) = 2,448,000 cycles = 0.2448 sec.

[Figure: memory connected to the CPU by a bus.]

64
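
The two cycle-count formulas and the bottleneck example can be checked numerically. In this sketch the function and variable names are hypothetical, N = 612,000 is taken from the slide as given, and the 100 ns memory cycle is an inference from the slide's 0.2448 sec result (the slide does not state the memory clock).

```python
def t_basic(n, d, o, w):
    """Bus cycles for n payloads of length d, overhead o, bus width w."""
    return (d + o) * n / w


def t_burst(n, d, o, w, b):
    """Bus cycles when payloads are grouped into bursts of b transfers."""
    return (b * d + o) * n / (b * w)


N = 612_000                                     # payloads per second (from the slide)
bus_cycles = t_basic(N, d=1, o=3, w=2)
bus_time = bus_cycles * 1e-6                    # 1 MHz bus: 1 µs per cycle
mem_cycles = t_burst(N, d=1, o=4, w=0.5, b=4)
mem_time = mem_cycles * 100e-9                  # assumed 100 ns memory cycle
bottleneck = "bus" if bus_time > mem_time else "memory"
```

Under these assumptions the bus needs more than one second to move one second's worth of video, so the bus is the limiting component.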
Memory aspect ratios

[Figure: memory chip formats of equal capacity with different aspect ratios: 64 M x 1, 16 M x 4, 8 M x 8.]

65
Parallelism
• Speed things up by running several units at once.
• DMA provides parallelism if the CPU doesn't need the bus:
  - DMA + bus.
  - CPU.

66
Electrical bus design
 Bus signals are usually tri-stated.
 Address and data lines may be multiplexed.
 Every device on the bus must be able to drive the
maximum bus load:
 Bus wires.
 Other bus devices.
 Resistive and capacitive loads.
 Bus specification may limit loads
 Bus may include clock signal.
 Timing is relative to clock.

67
Tristate operation

[Figure: two devices share one bus line; each drives its Data (D1, D2) through a tri-state buffer controlled by its Enable (E1, E2).]

Bus line value:
        E2=0    E2=1
E1=0    float   D2
E1=1    D1      conflict

Must prevent E1=E2=1.

68
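
The tri-state truth table can be written as a small resolver function. This is an illustrative sketch: `bus_value` is a hypothetical name, and a floating line is represented by `None`.

```python
def bus_value(e1, d1, e2, d2):
    """Resolve a bus line shared by two tri-state drivers."""
    if e1 and e2:
        raise ValueError("bus conflict: both drivers enabled")
    if e1:
        return d1      # only device 1 drives the line
    if e2:
        return d2      # only device 2 drives the line
    return None        # neither enabled: the line floats
```

Raising an error for E1 = E2 = 1 mirrors the rule that the system design must prevent both drivers from being enabled at once.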
