Macallan SW Arch Overview
Architecture Overview
November 2013
© 2013 Cisco and/or its affiliates. All rights reserved. Cisco Confidential 1
Macallan Highlights
• Macallan is the next-generation modular enterprise Ethernet switch (next-generation Cat4500)
– Centralized forwarding with Doppler ASICs (Doppler D, G & E)
– A series of new chassis, supervisors and line cards
– Converged Operating System (Polaris)
– Converged access (wired and wireless)
– Instant access and VSS
Contents
• Architecture overview
– System architecture
– Control plane
– Data plane
• Major platform modules
System Architecture
Macallan System Block Diagram
Chassis
• New chassis: 4 slot, 7 slot and 10 slot
• High availability design in data plane and control plane
• High speed data link
– 320G per slot on 4 & 7 slot
– 240G per slot on 10 slot
• Standard control link
– High speed PCIe link
– I2C
[Diagram: chassis with power supply, fan tray, supervisor and line card]
Passport Supervisor
[Diagram: control block (CPU & FPGA), forwarding engines, power block, uplinks]
Supervisor
• Since Macallan is a centralized forwarding system, the
supervisor contains all the control plane features and data
plane features
– In-Service Software Upgrade (ISSU): the linecards continue to
pass traffic when the system is being upgraded
• The system can expand (both the capacity and features)
when we build new supervisors
– The linecards will not need to change
• We planned to build two supervisors
– Passport: low end, Doppler D based
– Imperial: high end, Doppler E based
Line Card Architecture Example (Bell 48x1GE UPOE)
[Diagram callouts: new 1588 support; 5 PCIe controllers]
[Diagram: CPU architecture with Passport. Labels: Ethernet SW, Standby, M3, RP, Ardbeg, PCIe, Ethernet, and two line cards each showing Doppler G with A15/A7 and M3]
CPU Architecture with Passport (Cont.)
Board Chip CPU (Cores) OS Ethernet Application
[Table and diagram content not recoverable. Surviving labels: eMMC, ROM, M3, A15/A7, DopplerD/G 0 … DopplerD/G]
• All M3 and A15/A7 start running
Embedded CPUs Shutdown Handling
[Diagram: supervisor to line card links over the backplane. Doppler D: (100-10-1)x2G, 24. Backplane: 32 for 4 & 7 slot, 24 for 10 slot; 6 to active, 6 to standby. Doppler G: (100-20)G, 8 NIF]
Line Card Bandwidth
• The line card bandwidth varies in different chassis and grows with the supervisor
• For example, the bandwidth of the 24x10 GE line card:
                 4 slot                        7 slot                        10 slot
Passport (D&G)   3:2 oversubscription          3:1 oversubscription          Not supported
                 (line rate on DG, 3:2 on D)   (2:1 on DG, 3:2 on D)
Imperial (E&G)   Line rate                     2:1 oversubscription          3:1 oversubscription
                                               (3:2 on EG, 4:3 on E)         (2:1 on EG, 3:2 on E)
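The overall figures in the table are consistent with multiplying the per-stage ratios shown in parentheses (for example, 2:1 on DG times 3:2 on D gives 3:1). A minimal Python sketch of that arithmetic follows; the function name `compound_oversub` is illustrative only and is not part of any Macallan software.

```python
from fractions import Fraction

def compound_oversub(*stages):
    """Multiply per-stage oversubscription ratios, each given as (offered, carried)."""
    total = Fraction(1)
    for offered, carried in stages:
        total *= Fraction(offered, carried)
    return total

# Per-stage ratios copied from the table for the 24x10 GE line card.
passport_4 = compound_oversub((1, 1), (3, 2))   # line rate on DG, 3:2 on D
passport_7 = compound_oversub((2, 1), (3, 2))   # 2:1 on DG, 3:2 on D
imperial_7 = compound_oversub((3, 2), (4, 3))   # 3:2 on EG, 4:3 on E
imperial_10 = compound_oversub((2, 1), (3, 2))  # 2:1 on EG, 3:2 on E

for name, ratio in [("Passport 4 slot", passport_4), ("Passport 7 slot", passport_7),
                    ("Imperial 7 slot", imperial_7), ("Imperial 10 slot", imperial_10)]:
    print(f"{name}: {ratio.numerator}:{ratio.denominator}")
```

The results match the headline values in the table (3:2, 3:1, 2:1 and 3:1).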
Passport: Centralized BW with DopplerD/G
LC Type   # of G   # Doppler D Serdes Used          Bandwidth/slot (Oversub)
                   4 slot    7 slot    10 slot      4 slot      7 slot      10 slot
48x1G     1        6x10G     6x10G     6x10G        Line rate   Line rate   Line rate
[Diagram: LC 1 with G1 and G2, links numbered 1 to 12]
40G with Single Connection
• 40G front panel must have 40G SLI, cannot have 2x10G or 3x10G bundle
• Works the 10G mode
[Diagram: D to G connections, 40G versus 2x10G]
Macallan: Centralized Active Standby
DopplerD 3-way Switch Point to Cross Link for Path#2
Path#1 Path#2
40G 40G
Congestion Management (1)
• Internal Flow Control: within the ASIC, use a stall or credit scheme at congestion points to limit the buffering and achieve fairness
• External Flow Control:
– Port Based Flow Control (PFC)
– Class Based Flow Control (CBFC)
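To illustrate the credit scheme mentioned under internal flow control, here is a generic sketch of credit-based flow control in Python. It is not the Doppler implementation; the class `CreditLink` and its one-credit-per-buffer-slot policy are assumptions made for illustration.

```python
from collections import deque

class CreditLink:
    """Generic credit scheme: the sender holds one credit per free buffer slot."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots   # credits currently held by the sender
        self.buffer = deque()         # receiver-side buffer at the congestion point

    def send(self, pkt):
        """Send if a credit is available; otherwise the sender stalls (returns False)."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(pkt)
        return True

    def drain(self):
        """Receiver forwards one packet and returns its credit to the sender."""
        if not self.buffer:
            return None
        pkt = self.buffer.popleft()
        self.credits += 1
        return pkt

link = CreditLink(buffer_slots=2)
results = [link.send(p) for p in ("a", "b", "c")]  # the third send stalls
drained = link.drain()                             # frees one slot and one credit
retry = link.send("c")                             # now succeeds
print(results, drained, retry)
```

Because a packet is only accepted when a credit is held, the buffer occupancy never exceeds the number of slots, which is how such a scheme limits buffering.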
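[Figure: congestion management path through the line card NIFs and Doppler D (SIF, BCN generation, AQM, IQS, ESM, OCI), with numbered steps 1 to 16 that the following slides refer to. A pause frame from the network enters at the ingress NIF; a pause frame to the network leaves at the egress NIF]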
Congestion Management (2)
• Port Based Flow Control:
– LC NIF receives pause frame (1)
– Ingress NIF passes the PFC to egress NIF (2)
– Egress NIF arbiter stops requesting data from the egress port FIFO (EPF) for the duration defined in the pause message (3)
• Class Based Flow Control:
– LC NIF receives pause frame (1)
– Supervisor Doppler D receives OCI data on ingress SLI (7) and passes it to the ESM (egress scheduler manager)
– ESM stops scheduling to the corresponding SLI port/queue (8, 9)
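For the port-based case, the "duration defined in the pause message" can be made concrete. In a standard IEEE 802.3 PAUSE frame the pause time is a 16-bit count of quanta, and one quantum is 512 bit times. The sketch below converts that to seconds; the function name is hypothetical, and whether the NIF arbiter uses exactly this conversion is an assumption.

```python
PAUSE_QUANTUM_BITS = 512  # one pause quantum = 512 bit times (IEEE 802.3 PAUSE)

def pause_duration_s(pause_quanta, link_bps):
    """Time the egress side should stop sending, for a given pause_time field."""
    if not 0 <= pause_quanta <= 0xFFFF:
        raise ValueError("pause_time is a 16-bit field")
    return pause_quanta * PAUSE_QUANTUM_BITS / link_bps

# Longest possible pause on a 10G and on a 1G port.
max_10g = pause_duration_s(0xFFFF, 10e9)
max_1g = pause_duration_s(0xFFFF, 1e9)
print(f"10G: {max_10g * 1e3:.2f} ms, 1G: {max_1g * 1e3:.2f} ms")
```

A pause time of zero cancels an earlier pause, so the function returns 0.0 for that input.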
Congestion Management (4)
• Backward Congestion Notification (BCN)
– Doppler D AQM compares the arrival rate per SLI port/queue against a limit (10); when the limit is exceeded (11):
– AQM sends a flow control message to the local stack interface (SIF) (12)
– SIF generates BCN to all ingress IQS
– IQS in Doppler D receives BCN, combines it with the ingress flow control status (13), and sends per-port flow control info to the egress SLI (14)
– Egress SLI generates an OCI message and sends it to the LC (15)
– LC Doppler G receives OCI and generates flow control to IQS
– IQS flow controls back to the ingress NIF (16)
– Ingress NIF passes the flow control info to the egress NIF, and a pause frame is sent to the network port
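The first steps of the BCN sequence above (AQM rate check, then SIF fan-out to every ingress IQS) can be sketched as follows. This is a simplified model, not Doppler firmware: the function names, the (port, queue) keys and the rate values are all made up for illustration.

```python
def aqm_check(arrival_rates, limits):
    """Return the (port, queue) keys whose arrival rate exceeds the configured limit."""
    return sorted(key for key, rate in arrival_rates.items() if rate > limits[key])

def bcn_fanout(congested, ingress_iqs):
    """Model the SIF generating one BCN per congested (port, queue) to every ingress IQS."""
    return [(iqs, key) for key in congested for iqs in ingress_iqs]

# Hypothetical measurements in Gb/s, keyed by (SLI port, queue).
rates = {(0, 1): 9.5, (0, 2): 4.0, (3, 0): 12.0}
limits = {(0, 1): 10.0, (0, 2): 5.0, (3, 0): 10.0}

congested = aqm_check(rates, limits)
messages = bcn_fanout(congested, ["IQS0", "IQS1"])
print(congested, messages)
```

Only the queue that exceeds its limit produces notifications, and each notification is replicated to all ingress IQS instances, matching the "BCN to all ingress IQS" step.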
Major Platform Modules
Polaris for Macallan (Centralized)
[Diagram: active and standby supervisor CPUs, each with an I/O Complex, an RP Complex and an FP Complex (FED). Processes shown include IOSd, CMAN-CC, CMAN-FP, FMAN-FP and apps. Both supervisors connect through the backplane / system HW to the LC module HW]
Macallan 4K: modular with central forwarding architecture
Architecture Baseline
• Macallan software is based on Polaris
– The kernel and tool chain are based on MontaVista CGE 7
• Linux kernel for ARMv8 CPU (version 3.10, 64 bit, big endian)
• 32/64 bit big endian applications
– We will run all applications in the 32 bit mode
– We will migrate to 64 bit mode when Polaris migrates to 64 bit
» Macallan team will migrate the platform specific IOMD and FED
• The platform modules are based on Rudy
– FED 2.0 (including Fed-lite library)
– VSS and Fex infra
Infrastructure Plane
[Diagram: IOSD, chasfs, CMAN RP, CMAN FP, CMAN CC and IOMD, with PD portions marked]
Control Plane
[Diagram: IOSd, WCM and SANET. Recoverable labels: Fast Path, Int Thread, License Mgmt, Interface (IDB) Mgmt, Drivers, Punt Path, Platform SHIM, FMAN RP SHIM]
Forwarding Plane
[Diagram: FMAN RP, FMAN FP and Doppler (D/E/G). Legend: PI; PD hook provided; PD]
Management Plane
[Diagram: CRIMSON, CMAN-RP, CMAN-CC, FMAN-FP, FMAN-RP, chasfs, FED, IOSD]
[Diagram: IOMD with OIR shim and PM shim; SW PM; OIR. Flows: L2/L3 config, status and stats; interface config, status and stats; fast link notification]
IOMD Architecture (Cont.)
• Feature definitions
– Interface (Layer 1) features, including port config,
port status (including interrupt), and port stats
FED 2.0 Architecture
FED in Macallan
[Diagram: FED components across the RP/FP Complex, CC Complex and Sup CC Complex. Components: CMAN-FP, VSS Manager, FED, CMAN-CC, FED Lite-D, chasfs, IOMD, Doppler D, Doppler G. Callouts: 1. Doppler D library: SLI, SIF; 1. Doppler D library: NIF, SLI; 2. Doppler G library: NIF, SLI; 3. Fast link notification; 6. VSL link mgmt]
FED in Macallan (Cont.)
Interface Description Document
[Diagram: VSS centralized. The active chassis ICA and the standby chassis ICA each show IOSd, VSS Mgr, FMAN-RP, FMAN-FP, CMAN-RP, CMAN-CC, chasfs, RMC, FED, IOMD, a VSS Mgr client, Doppler HW and an LC-complex. Other labels: IOS RF/CF, HW HA infra, RM]
[Diagram: Macallan BFD architecture with DPU. Components: IOSD (BFD), FMAN, FED (BFD config), Punt-n-Inject driver, Doppler DPU (BFD, Punt-n-Inject driver), Doppler core, other Dopplers. Flows: config, stats, intr (Doppler interrupt), BFD packets. Legend: TDL message; IPC message (over EOBC SW); register access (over PCIe)]
Macallan BFD Arch with DPU (Cont.)
• Only the DPU in one Doppler D is used
– The other two Doppler D’s will forward the packet to this Doppler D’s DPU
• No FED knowledge is needed on DPU
• HA
– The BFD in IOSd will sync to standby
– BFD HWO needs to sync?
• Indirect: DPU-IOSd-IOSd-DPU
• Direct: DPU-DPU
• Since BFD is not supported on Cat3k, BFD is a new feature to the FED
• BFD HWO can send a message to IOSd to report the neighbor status
change. It can also trigger an interrupt to accelerate the report. If that
is the case, an ISR is needed to collect the info and notify the BFD in
IOSd quickly
Firmware Consideration
• One alternative is to run firmware on A15/A7
– Each core runs one application in a tight loop without an OS
• Just like M3
– May work if all the applications are similar to the BFD
• Disadvantage
– Complexity: The cores share memory, interrupt, MAC and all the IO
devices. To manage them correctly among all the cores results in a
mini OS
• It actually requires more resources to develop this solution once we have
more than one application
– Scalability: super fast for one core, but does not scale on multiple cores
• What if an application can be multithreaded?
FNF
• DPU assistance to FNF
– Collect NF data
– Export them to the collector
– Manage the expiration of the flow
• DPU/RP interaction
– RP programs the NF TCAM
– The DPU in each Doppler D will be active
• Need more investigation
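As a rough illustration of the three DPU tasks listed above (collect, export, manage expiration), here is a minimal flow cache sketch in Python. Everything in it is an assumption: the class name, the inactive-timeout policy and the flow key layout are hypothetical, and the slide itself notes that the real design still needs investigation.

```python
class FlowCache:
    """Toy flow cache: count packets and bytes per flow, expire idle flows for export."""

    def __init__(self, inactive_timeout):
        self.inactive_timeout = inactive_timeout
        self.flows = {}  # flow key -> [packets, bytes, last_seen]

    def update(self, key, nbytes, now):
        """Collect: account one packet of nbytes bytes against the flow."""
        rec = self.flows.setdefault(key, [0, 0, now])
        rec[0] += 1
        rec[1] += nbytes
        rec[2] = now

    def expire(self, now):
        """Remove and return the flows idle for at least inactive_timeout (ready to export)."""
        expired = {k: tuple(v) for k, v in self.flows.items()
                   if now - v[2] >= self.inactive_timeout}
        for k in expired:
            del self.flows[k]
        return expired

tcp_flow = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)   # made-up 5-tuple
udp_flow = ("10.0.0.3", "10.0.0.4", 17, 53, 53)    # made-up 5-tuple

cache = FlowCache(inactive_timeout=15)
cache.update(tcp_flow, 100, now=0)
cache.update(tcp_flow, 200, now=5)
cache.update(udp_flow, 64, now=18)
exported = cache.expire(now=20)   # only the TCP flow has been idle long enough
print(exported)
```

In a real exporter the returned records would be formatted and sent to the collector; that step is left out here.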
Macallan AVC Arch with DPU
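[Diagram: FED sends config by TDL message to the DPU in a Doppler. The DPU holds the flow table and the NBAR data plane, exchanges packets with the Doppler core through the Punt-n-Inject driver, and syncs the flow table to the other DPUs. The other two Dopplers have the same architecture]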
AVC with DPU
• This architecture is based on the Polaris NG3K
AVC architecture for wired ports
– DPU is used to assist in the data plane processing
– Since DPU will do the packet inspection, we will
not tightly integrate AVC with FNF (like the NG3K)
DPU Requirements Summary
Requirements BFD AVC
OS Linux Linux
(Polaris?)
Multiple DPU No Yes
Distributed FED (FED components on DPU, including Doppler programming) No No
DPU Case Study References
• EDCS-845615: IOS BFD Offload Software Design Specification
• EDCS-902869: Polaris NG3K AVC for Wired Ports High Level Software Architecture
System Flow on Critical Events
Linecard Insert/Remove
[Diagram: RP Complex, FP Complex and CC Complex. Components: IOSD, PMAN, FMAN-RP, FMAN-FP, FED, chasfs, IOMD, CMAN-RP, CMAN-CC, kernel, HW. Legend: OIR notification; spawn/kill; DPIDB (interface); chasfs update/notification; polling]
Linecard Online
[Diagram: RP Complex, FP Complex and CC Complex. Components: IOSD, IOMD, chasfs, CMAN-RP, CMAN-CC, FMAN-RP, FMAN-FP, FED, kernel, HW]
[Diagram: CPU architecture with Imperial. Labels: two Doppler E, each with A7 and M3; DPP; M3; RP; System FPGA; PCIe; Ethernet; two line cards, each showing Doppler G with A15/A7 and M3]
CPU Architecture with Imperial (Option 2)
[Diagram labels: Passport; Standby; Ethernet SW; two Doppler E, each with A7 and M3; M3; RP/DPP; System FPGA; PCIe; Ethernet; two line cards, each showing Doppler G with A15/A7 and M3]
CPU Architecture with Imperial
Board Chip CPU OS BIPC Application
Cost
• With a second CPU: more expensive (needs memory and USB flash for the second CPU). If we put the DPP and its memory and USB flash on a DB (make it a FRU), the customer then has a choice to pay for more data processing capability
• Without a second CPU: lowest cost
Software Complexity
• With a second CPU: more complicated because there are two OSes running. However, it is still simpler than the Passport model, where there are 3 DPP instances
• Without a second CPU: simplest
Ease of use
• With a second CPU: more complicated because the customer will inevitably be exposed to the fact that there is a second CPU in the system (for example, debugging high CPU utilization in the DPP, collecting core files from the DPP). There are also more images to handle for ISSU and SR
• Without a second CPU: easiest
Imperial: Centralized BW with DopplerE/G
LC Type   # of G   # Doppler E Serdes Used          Bandwidth/slot
                   4 slot    7 slot    10 slot      4 slot      7 slot      10 slot
48x1G     1        6x10G     6x10G     6x10G        Line rate   Line rate   Line rate
12x10G    2        12x10G    12x10G    12x10G       Line rate   Line rate   Line rate
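The "Line rate" entries in the table above follow from comparing front-panel bandwidth with the serdes bandwidth toward the supervisor. A small Python sketch of that check is below. It models only this one stage, the helper name `slot_oversub` is hypothetical, and the third example uses a made-up serdes count purely to show an oversubscribed result.

```python
from fractions import Fraction

def slot_oversub(ports, port_gbps, serdes, serdes_gbps):
    """Oversubscription of front-panel bandwidth over serdes bandwidth (1 means line rate)."""
    front = ports * port_gbps
    back = serdes * serdes_gbps
    return Fraction(front, back) if front > back else Fraction(1)

lc_48x1g = slot_oversub(48, 1, 6, 10)      # 48G over 6x10G, from the table
lc_12x10g = slot_oversub(12, 10, 12, 10)   # 120G over 12x10G, from the table
made_up = slot_oversub(24, 10, 16, 10)     # hypothetical serdes count, not from the deck

for name, r in [("48x1G", lc_48x1g), ("12x10G", lc_12x10g), ("hypothetical", made_up)]:
    print(name, "line rate" if r == 1 else f"{r.numerator}:{r.denominator}")
```

Both table rows come out at line rate because the serdes bandwidth is at least the front-panel bandwidth in each case.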
[Diagram: pman.sh, cmand, Elcaro, chasfs (control, process, oir dirs), TDL, CPLD driver, issu_boottime.sh, Epoch]
[Diagram: fan tray, supervisor and line card; Phase 1 and Phase 2 of the Sup control function blocks]
[Timeline: branches Macallan_dev, FED2.0_dev, Macallan and Polaris_dev; milestones DTHO and FCS; dates Nov 13, Feb 14, May 14]