Advanced Buses

The document discusses the AMBA Multi-layer AHB interconnect which enables parallel access between masters and slaves. It is fully compatible with AHB wrappers and is a topology rather than protocol evolution. The multi-layer AHB uses a flexible matrix to connect multiple AHB layers, with arbitration stages to handle requests between layers. It can implement hierarchical systems by making slaves local to layers or connecting multiple slaves and masters per layer.


AMBA Multi-layer AHB

„ Enables parallel access paths between multiple masters and slaves
„ Fully compatible with AHB wrappers
„ It is a topology (not a protocol) evolution
„ Pure combinational matrix (scales poorly)

[Figure: Master1 and Master2 connected through the AHB interconnect matrix to several slaves]
Multi-Layer AHB implementation
„ The matrix is completely flexible and can be adapted
„ MUXes are the point arbitration stages
„ An AHB layer can be AHB-Lite: single master, no req/grant, no split/retry
Multi-layer AHB
„ A layer losing arbitration is made to wait by means of HREADY
„ While a layer is waited, the input stage samples its pipelined address and control signals
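As a rough illustration (not part of the AMBA specification), the Python sketch below models one output port of the matrix: the port arbitrates among the layers requesting its slave, stalls the losing layers by deasserting HREADY, and holds their sampled address/control in an input stage until they win. The layer names, the round-robin policy and the InputStage class are assumptions made for the example.

```python
# Minimal model of one slave port of a multi-layer AHB matrix (illustrative only).

class InputStage:
    """Holds the pipelined address/control of a layer that is being waited."""
    def __init__(self):
        self.held = None          # (haddr, hwrite) captured while stalled

    def capture(self, haddr, hwrite):
        if self.held is None:     # sample once, when the layer is first waited
            self.held = (haddr, hwrite)

    def release(self):
        held, self.held = self.held, None
        return held


class SlavePort:
    def __init__(self, layers):
        self.layers = layers                       # layer names, e.g. ["CPU", "DMA"]
        self.stages = {l: InputStage() for l in layers}
        self.last_winner = -1                      # round-robin pointer (assumed policy)

    def cycle(self, requests):
        """requests: {layer: (haddr, hwrite)} for layers addressing this slave.
        Returns (winner, granted_transfer, hready_per_layer)."""
        hready = {l: True for l in self.layers}
        if not requests:
            return None, None, hready
        # Simple round-robin arbitration among requesting layers (assumption).
        order = self.layers[self.last_winner + 1:] + self.layers[:self.last_winner + 1]
        winner = next(l for l in order if l in requests)
        self.last_winner = self.layers.index(winner)
        for l, (haddr, hwrite) in requests.items():
            if l != winner:
                hready[l] = False                  # losing layer is waited via HREADY
                self.stages[l].capture(haddr, hwrite)
        granted = self.stages[winner].release() or requests[winner]
        return winner, granted, hready


port = SlavePort(["CPU", "DMA"])
print(port.cycle({"CPU": (0x1000, False), "DMA": (0x2000, True)}))
print(port.cycle({"DMA": (0x2000, True)}))         # DMA retries and now wins
```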
Hierarchical systems

• Slaves accessed only by masters on a given layer can be made local to that layer
Multiple slaves
Multiple slaves can appear as a single slave to the matrix:
• combine low-bandwidth slaves
• group slaves accessed only by one master (e.g. a DMA controller)

Alternatively, a slave can be an AHB-to-APB bridge, allowing connection to multiple low-bandwidth slaves.
Multiple masters per layer

Combine masters that have low bandwidth requirements on a single layer.
Putting it all together…
The interconnect matrix and Slave4 are used for across-layer communication.
Dual port slaves

Common for off-chip SDRAM controllers:
• Master1: bandwidth-limited, high-priority traffic with low-latency requirements
• Master2: default traffic
Traffic mismatches
ƒ Independent tasks (matrix multiply): more than 2x
ƒ With & without semaphore synchronization
ƒ 8 processors (small cache)

[Chart: execution time for Shared, Bridging and MultiLayer interconnects, with and without semaphore synchronization; one case is annotated "Lower speedup!"]

Traffic mismatches degrade the benefits of topology evolution.


Crossbars

„ Application-level speedup at the cost of increased complexity in the crossbar logic
„ Scales poorly
„ area and delay scale with N²
„ Impractical beyond 10x10!
STBus
„ On-chip interconnect solution by STMicroelectronics
„ Multiple outstanding transactions with out-of-order completion
„ Type 1-3: increasing complexity (and performance)
„ Supports packets (request and response)
„ Support for protection, caches, locking
„ Deployed in a number of large-scale SoCs at STM
Transaction mapping
• Transaction level: the transaction is split into a request/response packet pair
• Packet level: request packet and response packet
• Cell level: each packet is broken down into a number of cells (tokens) depending on the bus width
• Signal level: physical encoding (e.g., req/gnt handshaking to transfer a cell)

Example: a LD8 transaction on a 32-bit STBus maps to 1 request packet and 1 response packet, i.e. 1 request cell and 2 response cells.
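A back-of-the-envelope sketch of the cell-level mapping, assuming (as in the example above) that the request packet of a load fits in a single cell and the response packet carries only the read data; the helper below is illustrative and not part of the STBus specification.

```python
import math

def stbus_cells(request_bytes, data_bytes, bus_width_bits):
    """Rough cell count for a load: the request carries address/opcode only,
    the response carries the read data. Each cell is one bus-width token."""
    cell_bytes = bus_width_bits // 8
    request_cells = max(1, math.ceil(request_bytes / cell_bytes))
    response_cells = max(1, math.ceil(data_bytes / cell_bytes))
    return request_cells, response_cells

# LD8 (8-byte load) on a 32-bit STBus: 1 request cell, 2 response cells.
print(stbus_cells(request_bytes=4, data_bytes=8, bus_width_bits=32))  # -> (1, 2)
```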
Type 1-2-3

[Table: feature comparison of STBus Type 1, Type 2 and Type 3; Type 2 is marked as equivalent to AHB functionality]
Topology – Shared Bus

Low performance, low cost


Topology – Full Crossbar

High performance, high wiring complexity and cost


Read on STBus

[Waveform: read transaction on the STBus]

Analysis: Protocol differences

[Waveforms comparing AMBA and STBus transfer timing]
Protocol matching
Building blocks: STBus node, upsize converter, downsize converter, frequency converter.

STBus at work

[Block diagram: IP traffic generators (IPTG) and IP blocks, including a VLIW LX core, connected through a Type 2 node (128 bit, 166 MHz) and Type 3 nodes (64 bit, 250 MHz and 166 MHz), with an LMI and an off-chip memory controller]
Critical overview

„ The protocol is not fully transaction-centric
„ An initiator cannot be connected directly to a target
„ Packets are atomic on the interconnect
„ An initiator cannot issue or receive multiple packets at the same time
„ Large data transfers may starve other initiators
„ Complex bridge engineering
„ Bridges are protocol-specific
AMBA 3.0 (AMBA AXI)
• High-bandwidth, low-latency designs
• High-frequency operation
• Flexibility in the implementation
• Backward compatible with AHB and APB
• Burst-based transactions with only the first address issued
• Address information can be issued ahead of the actual data transfer
• Multiple outstanding addresses
• Out-of-order transaction completion
• Easy addition of register stages for timing closure
Topology – Partial Crossbar

Design paradigm change

[Figure: masters and slaves attach to the communication architecture through point-to-point AXI initiator/target interfaces]

• Point-to-point interface specification
• Independent of the details of the communication architecture
• The communication architecture can freely evolve
• Transaction-based specification of the interface
• Open Core Protocol (OCP) is another example of this paradigm
Internal data lanes
[Figure: AXI masters and slaves connected through internal data lanes, shown both as a crossbar and as a shared bus]

Most systems use one of three interconnect approaches:
- shared address and data buses
- shared address buses and multiple data buses
- multi-layer, with multiple address and data buses
Channel-based Architecture
„ Five groups of signals
„ Read Address: "AR" signal name prefix
„ Read Data: "R" signal name prefix
„ Write Address: "AW" signal name prefix
„ Write Data: "W" signal name prefix
„ Write Response: "B" signal name prefix

[Figure: the five channels - read address, write address, write data, read data, write response]

Channels are independent and asynchronous with respect to each other.


Read transaction

[Waveform: read transaction; a single address is issued for the burst transfer]

Write transaction
Channels - One way flow

Write Address channel (AW): AWVALID, AWADDR, AWLEN, AWSIZE, AWBURST, AWLOCK, AWCACHE, AWPROT, AWID, AWREADY
Write Data channel (W): WVALID, WLAST, WDATA, WSTRB, WID, WREADY
Read Data channel (R): RVALID, RLAST, RDATA, RRESP, RID, RREADY
Write Response channel (B): BVALID, BRESP, BID, BREADY

„ Channel: a set of unidirectional information signals
„ Valid/Ready handshake mechanism
„ READY is the only return signal
„ Valid: the source interface has valid data/control signals
„ Ready: the destination interface is ready to accept data
„ Last: indicates the last word of a burst transaction
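A minimal cycle-level sketch of the valid/ready handshake in Python: a word is transferred only in a cycle where both VALID and READY are high, and the source keeps VALID asserted with stable data until the destination accepts it. The source/sink behaviour chosen below is an assumption for the example.

```python
def handshake(words, ready_pattern):
    """Simulate one channel: 'words' is the stream the source wants to send,
    'ready_pattern' gives the destination's READY value in each cycle.
    A transfer happens only when VALID and READY are both high."""
    sent, idx, trace = [], 0, []
    for cycle, ready in enumerate(ready_pattern):
        valid = idx < len(words)          # source asserts VALID while it has data
        data = words[idx] if valid else None
        if valid and ready:               # handshake: both high in the same cycle
            sent.append(data)
            idx += 1                      # only now may the source change its data
        trace.append((cycle, valid, ready, data))
    return sent, trace

sent, trace = handshake(["D0", "D1", "D2"], [0, 1, 1, 0, 1, 1])
print(sent)   # ['D0', 'D1', 'D2'] -- D0 is held stable until READY rises in cycle 1
```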
Burst support
• Variable-length bursts, from 1 to 16 data transfers per burst
• Bursts with a transfer size of 8-1024 bits
• Wrapping, incrementing and non-incrementing bursts
• Atomic operations, using locked accesses
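The sketch below derives the per-beat addresses of a burst from the single issued address, following the usual AXI rules for FIXED, INCR and WRAP bursts (wrap boundary = number of beats × bytes per beat); the helper itself is an illustrative assumption, not code from the specification.

```python
def burst_addresses(start, size_bytes, length, burst="INCR"):
    """Beat addresses of an AXI-style burst issued with a single start address.
    size_bytes: bytes per beat; length: number of beats (1..16);
    burst: 'FIXED' (non-incrementing), 'INCR' or 'WRAP'."""
    if burst == "FIXED":
        return [start] * length
    if burst == "WRAP":
        # Wrap boundary is aligned to (beats * bytes-per-beat).
        total = size_bytes * length
        lower = (start // total) * total
        addrs, addr = [], start
        for _ in range(length):
            addrs.append(addr)
            addr += size_bytes
            if addr >= lower + total:
                addr = lower              # wrap back to the boundary
        return addrs
    # INCR: simply increment by the beat size.
    return [start + i * size_bytes for i in range(length)]

print([hex(a) for a in burst_addresses(0x1008, 4, 4, "INCR")])  # 0x1008..0x1014
print([hex(a) for a in burst_addresses(0x1008, 4, 4, "WRAP")])  # 0x1008, 0x100c, 0x1000, 0x1004
```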
AMBA 2.0 AHB Burst
ADDRESS: A11 A12 A13 A14 A21 A22 A23 A31
DATA:    D11 D12 D13 D14 D21 D22 D23 D31

„ AHB Burst
„ Address and Data are locked together
„ Two pipeline stages
„ HREADY controls pipeline operation
AXI - One Address for Burst

ADDRESS: A11 A21 A31
DATA:    D11 D12 D13 D14 D21 D22 D23 D31

„ AXI Burst
„ One Address for entire burst
AXI - Outstanding Transactions

ADDRESS: A11 A21 A31
DATA:    D11 D12 D13 D14 D21 D22 D23 D31

„ AXI Burst
„ One Address for entire burst
„ Allows multiple outstanding addresses
Problem: Slow slave

ADDRESS: A11 A21 A31
DATA:    D11 D12

„ If one slave is very slow, all data is held up
Out-of-Order Completion
ADDRESS: A11 A21 A31
DATA:    D21 D22 D23 D31 D11 D12 D13 D14

„ Out-of-order completion is allowed
„ Fast slaves may return data ahead of slow slaves
„ Each transaction has an ID attached (given by the master interface)
„ Channels have ID signals (AWID, RID, etc.)
„ Transactions with the same ID must be ordered
„ The interconnect in a multi-master system must append another tag to the ID to make each master's ID unique
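A small sketch of this ID-extension idea: the interconnect prefixes each master's transaction ID with the master's port number, so IDs from different masters can never collide, while transactions issued by the same master keep their original (ordered) ID. The bit widths below are assumptions for the example.

```python
MASTER_ID_BITS = 4   # width of the IDs generated by each master IF (assumed)

def tag_id(master_port, master_id):
    """Interconnect-side ID: master port number prepended to the master's ID."""
    assert 0 <= master_id < (1 << MASTER_ID_BITS)
    return (master_port << MASTER_ID_BITS) | master_id

def untag_id(tagged):
    """Recover (master_port, master_id) so the response is routed back correctly."""
    return tagged >> MASTER_ID_BITS, tagged & ((1 << MASTER_ID_BITS) - 1)

# Two masters happen to use the same local ID 0x3: the tagged IDs differ.
print(hex(tag_id(0, 0x3)), hex(tag_id(1, 0x3)))   # 0x3 0x13
print(untag_id(tag_id(1, 0x3)))                   # (1, 3)
```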


AXI - Data Interleaving

ADDRESS: A11 A21 A31
DATA:    D21 D22 D11 D23 D12 D31 D13 D14

„ Returned data can even be interleaved
„ Gives maximum use of the data bus
„ Note - data within a burst is always in order
Burst read

VALID stays high until READY goes high; the valid/ready handshake regulates the data transfer.

Overlapping burst read

The address of the second burst is issued before the first burst completes.

Burst write
Register slices for max frequency

[Figure: a register slice inserted on the write data channel (WID, WDATA, WSTRB, WLAST, WVALID, WREADY)]

„ Channels are asynchronous
„ Register slices can be applied across any channel
„ Allows maximum frequency of operation by turning delay into latency
„ Allows the system topology to be matched to performance requirements
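A behavioural sketch of a register slice on a valid/ready channel: a small two-entry buffer that registers both the forward and the backward paths, adding one cycle of latency in exchange for shorter combinational paths. This is a common way to build such a slice, shown here in Python for illustration; it is not ARM's implementation.

```python
from collections import deque

class RegisterSlice:
    """Two-entry buffer between an upstream and a downstream valid/ready interface."""
    def __init__(self):
        self.buf = deque()

    def cycle(self, up_valid, up_data, down_ready):
        """One clock cycle. Returns (down_valid, down_data, up_ready)."""
        up_ready = len(self.buf) < 2          # registered backward path: accept while not full
        down_valid = len(self.buf) > 0
        down_data = self.buf[0] if down_valid else None
        # End-of-cycle updates: pop what the downstream accepted,
        # push what the upstream handed over.
        if down_valid and down_ready:
            self.buf.popleft()
        if up_valid and up_ready:
            self.buf.append(up_data)
        return down_valid, down_data, up_ready

slice_ = RegisterSlice()
for cyc, (v, d, r) in enumerate([(1, "D0", 0), (1, "D1", 1), (0, None, 1), (0, None, 1)]):
    print(cyc, slice_.cycle(v, d, r))   # D0 appears downstream one cycle after it is accepted
```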
Comparison
Memories configured with 2 wait states.

AHB: the arbitration latency and the slave response latency cannot be hidden.

STBus, low buffering: a new request is started while the previous response is still being processed.

STBus, high buffering: several requests are started while the responses are being processed.

AXI: the complex arbitration interleaves the transactions.
Scalability
„ Highly parallel benchmark (no slave bottlenecks)
[Charts: relative execution time of AHB, AXI, STBus and STBus (B) for 2, 4, 6 and 8 cores; left: 1 kB caches (low bus traffic), right: 256 B caches (high bus traffic)]
Scalability
[Charts: interconnect busy fraction and interconnect usage efficiency for AHB, AXI, STBus and STBus (B) with 2, 4, 6 and 8 cores]

„ With increasing contention, AXI and STBus show 80%+ efficiency, while AHB stays below 50%
„ Shared bus architectures saturate
Networks-on-Chip (NoCs)
Same paradigm as wide area networks and as large-scale multiprocessors.

[Figure: IP cores attach through network interfaces (NI), as masters or slaves, to a network of switches; a packet consists of a header, a payload and a tail and is transferred as a sequence of flits]

Clean separation at the session layer: cores issue end-to-end transactions, while the network deals with lower-level issues.
Modularity at the HW level: only two building blocks, the network interface and the switch.
Physical-design awareness: path segmentation and regular routing.
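An illustrative sketch of the initiator-side network interface: a transaction is packetized into a header flit (routing and command information), payload flits and a tail flit, sized to the link width. The field layout and the flit width are assumptions for the example.

```python
FLIT_BYTES = 4   # link width of the example network (assumed)

def packetize(dest, command, payload):
    """Split one transaction into HEADER / PAYLOAD / TAIL flits."""
    flits = [("HEADER", {"dest": dest, "cmd": command, "len": len(payload)})]
    for i in range(0, len(payload), FLIT_BYTES):
        flits.append(("PAYLOAD", payload[i:i + FLIT_BYTES]))
    flits.append(("TAIL", None))   # tail closes the packet (e.g. could carry a checksum)
    return flits

# A write of 10 bytes becomes 1 header flit, 3 payload flits and 1 tail flit.
for flit in packetize(dest=3, command="WRITE", payload=bytes(range(10))):
    print(flit)
```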
Shared buses vs NoCs
NoC pros:

- Each integrated IP core adds bus load capacitance
+ Only point-to-point, one-way links are used

- Bus timing problems in deep sub-micron designs
+ Better suited to the GALS paradigm

- Arbiter delay grows with the number of masters; the arbiter is instance-specific
+ Routing decisions are distributed; switches can be re-instantiated

- Bus bandwidth is shared among all masters
+ Aggregate bandwidth scales with the network dimension
Shared buses vs NoCs
NoC cons:

+ Once the bus is granted, bus access latency is zero
- Latency is unpredictable due to network congestion

+ Very low silicon cost
- High area cost

+ Simple bus-IP core interface
- The network-IP core interface can be very complex (e.g. packetization)

+ Design guidelines are well known
- New design paradigm