
Anatomy of a Linux bridge

Nuutti Varis
Aalto University School of Electrical Engineering,
Department of Communications and Networking
P.O.Box 13000, 00076 Aalto, Finland
Email: {firstname.lastname}@aalto.fi

ABSTRACT
Ethernet is the prevalent Local Area Networking (LAN) technology, offering a cost efficient way to connect end-hosts to each other. Local area networks are built by networking devices called switches, that forward Ethernet frames between end-hosts in the network. The GNU/Linux operating system can be used to create a software based switch, called a bridge. This paper explores the architecture, design, and implementation of the Linux bridging component, and attempts to chart some of the processing characteristics of the frame forwarding operation, inside the bridge and in the operating system as a whole.

1. INTRODUCTION
Network devices, called switches (or synonymously, bridges) are responsible for connecting several network links to each other, creating a local area network. Conceptually, the major components of a network switch are a set of network ports, a control plane, a forwarding plane, and a MAC learning database. The set of ports are used to forward traffic between other switches and end-hosts in the network. The control plane of a switch is typically used to run the Spanning Tree Protocol (STP) [15], that calculates a minimum spanning tree for the local area network, preventing physical loops from crashing the network. The forwarding plane is responsible for processing input frames from the network ports, and making a forwarding decision on which network ports the input frame is forwarded to.

Finally, the MAC learning database is used to keep track of the host locations in the LAN. It typically contains an entry for each host MAC address that traverses the switch, and the input port where the frame was received. The forwarding decision is based on this information. For each unicast destination MAC address, the switch looks up the output port in the MAC database. If an entry is found, the frame is forwarded through the port further into the network. If an entry is not found, the frame is instead flooded from all other network ports in the switch, except the port where the frame was received. This latter provision is required to guarantee the "plug-and-play" nature of Ethernet.
In addition to Linux, several other operating systems also implement local area network bridging in the network stack. FreeBSD has a bridging implementation similar to the Linux kernel, however the FreeBSD implementation also implements the Rapid Spanning Tree Protocol (RSTP). The FreeBSD bridge implementation also supports more advanced features, such as port MAC address limits, and SNMP monitoring of the bridge state. OpenSolaris also implements a bridging subsystem [12] that supports STP, RSTP, or a next generation bridging protocol called TRILL [14].

There has been relatively little evolution in bridging since the inception of the STP. Switches have evolved in conjunction with other local area network technologies such as Virtual LANs [16], while the STP has been incrementally extended to support these new technologies. Currently, there are two practical next-generation solutions for switching: RBridges (TRILL), and the Shortest Path Bridging (SPB) [1]. Both TRILL and SPB diverge from STP based bridging in several important ways. Some of the key differences are improved loop safety, more efficient unicast forwarding, and improved multicast forwarding. Additionally, the well known scalability issues [2] of local area networks, and the advent of data center networking, have also created a number of academic research papers, such as SPAIN [10], PortLand [11], VL2 [6], DCell [7], and BCube [8].

This paper explores the architecture, design and the implementation of the Linux bridging module. In addition, the paper also analyzes the processing characteristics of the Linux bridging module by profiling the kernel during forwarding, and observing various counters that track the performance of the processors and the memory in the multi-core CPU. The design and implementation of STP in the Linux bridge module is considered out of scope for this paper.

The rest of the paper is structured as follows. Section 2 presents an overview of the central data structures of the Linux bridge, creation of a Linux bridge instance, and the processing flow of an incoming frame. Next, Section 3 describes the forwarding database functionality of the bridge implementation. Section 4 describes the experimentation setup, and analyzes some of the performance related aspects of the bridging module and the operating system. Finally, Section 5 finishes the paper with some general remarks of local area networks and the Linux bridging implementation.

2. OVERVIEW
The architectural overview of the Linux bridging module is divided into three parts. First, the key data structures for the bridging module are described in detail. Next, the configuration interface of the Linux bridging module is discussed by looking at the bridge creation and port addition mechanisms. Finally, the input/output processing flow of the Linux bridging module is discussed in detail.
Figure 1: Primary Linux bridge data structures

Figure 2: Linux bridge configuration; adding a bridge and a bridge port
2.1 Data structures
The Linux bridge module has three key data structures that provide the central functionality for the bridge operation. Figure 1 presents an overview of the most important fields and their associations in the three key data structures. The main data structure for each bridge in the operating system is the net_bridge. It holds all of the bridge-wide configuration information, a doubly-linked list of bridge ports (net_bridge_port objects) in the field port_list, a pointer to the bridge netdevice in the field dev, and the forwarding database in the field hash. The technical details and the functionality of the hash array table are described in 3.1. Finally, the field lock is used by the bridge to synchronize configuration changes, such as port additions, removals, or changing the various bridge-specific parameters.

Each bridge port has a separate data structure net_bridge_port, that contains the bridge port specific parameters. The field br has a back reference to the bridge that the port belongs to. Next, the dev field holds the actual network interface that the bridge port uses to receive and transmit frames. Finally, the position of the data structure object in the net_bridge->port_list linked list is stored in the field list. There are also various configuration parameter fields for the port, as well as the port-specific state and timers for the STP and IGMP [5] snooping features. IGMP snooping will be detailed in Section 3.2.

Finally, the third key data structure for the Linux bridge module is the net_bridge_fdb_entry object that represents a single forwarding table entry. A forwarding table entry consists of a MAC address of the host (in the field addr), and the port where the MAC address was last seen (in the field dst). The data structure also contains a field (hlist) that points back to the position of the object in a hash table array element in net_bridge->hash. In addition, there are two fields, updated and used, that are used for timekeeping. The former specifies the last time when the host was seen by this bridge, and the latter specifies the last time when the object was used in a forwarding decision. The updated field is used to delete entries from the forwarding database when the maximum inactivity timeout value for the bridge is reached, i.e., current time − updated > bridge hold time.
2.2 Configuration subsystem
The Linux bridging module has two separate configuration interfaces exposed to the user-space of the operating system. The first, an ioctl based interface, can be used to create and destroy bridges in the operating system, and to add and remove existing network interfaces to/from a bridge. The second, a sysfs based interface, allows the management of bridge and bridge port specific parameters. Figure 2 presents a high level overview of the kernel ioctl process that creates and initializes the bridge object, and adds network interfaces to it. In the figure, the functions on dark grey areas are in the generic kernel, while the lighter areas are in the bridge module.

The creation of a new bridge begins with the ioctl command SIOCBRADDBR that takes the bridge interface name as a parameter. The ioctl command is handled by the br_ioctl_deviceless_stub function, as there is no bridge device yet to attach the ioctl handler to. The addition of a new bridge calls the function br_add_bridge, that creates the required bridge objects in the kernel, and eventually calls the alloc_netdev function to create a new netdevice for the bridge. The allocated netdevice is then initialized by the br_dev_setup call, including assigning the bridge device specific ioctl handler br_dev_ioctl to the newly allocated netdevice. All subsequent bridge specific ioctl calls are done on the newly created bridge device object in the kernel.

Ports are added to bridges by the ioctl command SIOCBRADDIF. The ioctl command takes the bridge device and the index of the interface to add to the bridge as parameters. The ioctl calls the bridge device ioctl handler (br_dev_ioctl), that in turn calls the br_add_if function. The function is responsible for creating and setting up a new bridge port by allocating a new net_bridge_port object. The object initialization process automatically sets the interface to receive all traffic, adds the network interface address for the bridge port to the forwarding database as a local entry, and attaches the interface as a slave to the bridge device. Finally, the function also calls the netdev_rx_handler_register function that sets the rx_handler of the network interface to br_handle_frame, which enables the interface to start processing incoming frames as a part of the bridge.
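As a concrete illustration of this interface, the following user-space sketch creates a bridge and enslaves an existing interface to it with the two ioctl commands described above, mirroring what the brctl utility does. The interface names br0 and eth0 are placeholder examples, error handling is minimal, and the program needs CAP_NET_ADMIN privileges.

/* User-space sketch: create a bridge with SIOCBRADDBR and add a port
 * with SIOCBRADDIF. "br0" and "eth0" are example names. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>            /* struct ifreq, if_nametoindex, IFNAMSIZ */
#include <linux/sockios.h>     /* SIOCBRADDBR, SIOCBRADDIF */

int main(void)
{
    int fd = socket(AF_LOCAL, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    /* SIOCBRADDBR: create the bridge netdevice; handled in the kernel by
     * br_ioctl_deviceless_stub -> br_add_bridge. */
    if (ioctl(fd, SIOCBRADDBR, "br0") < 0)
        perror("SIOCBRADDBR");

    /* SIOCBRADDIF: add an existing interface to the bridge; handled by
     * br_dev_ioctl -> br_add_if. The bridge is named in ifr_name and the
     * port is identified by its interface index. */
    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "br0", IFNAMSIZ - 1);
    ifr.ifr_ifindex = if_nametoindex("eth0");
    if (ifr.ifr_ifindex == 0 || ioctl(fd, SIOCBRADDIF, &ifr) < 0)
        perror("SIOCBRADDIF");

    close(fd);
    return 0;
}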
2.3 Frame processing
The Linux bridge processing flow begins from the lower layers. As mentioned above, each network interface that acts as a bridge interface will have its rx_handler set to br_handle_frame, which acts as the entry point to the bridge frame processing code. Concretely, the rx_handler is called by the device-independent network interface code, in __netif_receive_skb. Figure 3 presents the processing flow of an incoming frame, as it passes through the Linux bridge module to a destination network interface queue.
Figure 3: Architectural overview of the Linux bridge module I/O

The br_handle_frame function does the initial processing on the incoming frame. This includes doing initial validity checks on the frame, and separating control frames from normal traffic, because typically these frames are not forwarded in local area networks. The bridge considers any frame that has a destination address prefix of 01:80:C2:00:00 to be a control frame that may need specialized processing. The last byte of the destination MAC address defines the behavior of the link local processing. Currently, Ethernet pause frames are automatically dropped, while STP frames are either passed to the upper layers if STP is enabled on the bridge, or forwarded when it is disabled. Finally, if a forwarding decision is made, and the bridge is in either forwarding or learning mode, the frame is passed to br_handle_frame_finish, where the actual forwarding processing begins.
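A minimal sketch of this link-local classification is shown below. It only mirrors the behavior described above (pause frames dropped, STP either consumed locally or forwarded); the group address prefix and the STP/pause suffixes are standard IEEE 802 values, while the verdict enum, the function name and the handling of the remaining reserved addresses are illustrative assumptions, not the kernel's code.

/* Sketch of the link-local (01:80:C2:00:00:0X) classification performed
 * on the bridge input path. Illustrative only. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum bridge_verdict { FRAME_DROP, FRAME_PASS_UP, FRAME_FORWARD };

static const uint8_t link_local_prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

enum bridge_verdict classify_control_frame(const uint8_t dest[6], bool stp_enabled)
{
    if (memcmp(dest, link_local_prefix, 5) != 0)
        return FRAME_FORWARD;          /* not a link-local control frame */

    switch (dest[5]) {
    case 0x00:                         /* STP / bridge group address */
        return stp_enabled ? FRAME_PASS_UP : FRAME_FORWARD;
    case 0x01:                         /* Ethernet (MAC) pause frames */
        return FRAME_DROP;
    default:
        /* Remaining reserved addresses: treatment depends on bridge
         * configuration and is not covered by the text above. */
        return FRAME_DROP;
    }
}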
The br_handle_frame_finish function first updates the forwarding database of the bridge with the source MAC address and the source interface of the frame by calling the br_fdb_update function. The update either inserts a new entry into the forwarding database, or updates an existing entry.

Next, the processing behavior is decided based on the destination MAC address in the Ethernet frame. For unicast frames, the forwarding database is indexed with the destination address by using the __br_fdb_get function to find out the destination net_bridge_port where the frame will be forwarded to. If a net_bridge_fdb_entry object is found, the frame will be directly forwarded through the destination interface by the br_forward function. If no entry is found for the unicast destination Ethernet address, or the destination address is broadcast, the processing will call the br_flood_forward function. Finally, if the frame is a multi-destination frame, the multicast forwarding database is indexed with the complete frame. If selective multicasting is used and a multicast forwarding entry is found in the database, the frame is forwarded to the set of bridge ports for that multicast address group by calling the br_multicast_forward function. If no entry is found or selective multicasting is disabled, the frame will be handled as a broadcast Ethernet frame and forwarded by the br_flood_forward function.

In cases where the destination MAC address of the incoming frame is multicast or broadcast, the bridge device is set to receive all traffic, or the address matches one of the local interfaces, a clone of the frame is also delivered upwards in the local network stack by calling the br_pass_frame_up function. The function updates the bridge device statistics, and passes the incoming frame up the network stack by calling the device independent netif_receive_skb function, ending the bridge specific processing for the frame.

The forwarding logic of the Linux bridge module is implemented in three functions: br_forward, br_multicast_forward, and br_flood_forward, to forward unicast, multicast, and broadcast or unknown unicast destination Ethernet frames, respectively. The simplest of the three, the br_forward function, checks whether the destination bridge interface is in forwarding state, and then either forwards the incoming frame as is, clones the frame and forwards the cloned copy instead by calling the deliver_clone function, or does nothing if the bridge interface is blocked. The br_multicast_forward function performs selective forwarding of the incoming Ethernet frame out of all of the bridge interfaces that have registered multicast members for the destination multicast address in the Ethernet frame, or on interfaces that have multicast routers behind them. The br_flood_forward function iterates over all of the interfaces in the bridge, and delivers a clone of the frame through all of them except the originating interface. Finally, all three types of forwarding functions end up calling the __br_forward function that actually transfers the frame to the lower layers by calling the dev_queue_xmit function of the interface.
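The destination-based dispatch described above can be condensed into the following self-contained sketch. It restates the learn-then-forward decision of br_handle_frame_finish under strong simplifications: the flat-array forwarding table, the helper names and the printed "forward/flood" actions are illustrative stand-ins (the selective multicast path is omitted), not the kernel's implementation.

/* Self-contained sketch of the bridge forwarding decision: learn the
 * source, then forward known unicast on one port or flood everything else. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct fdb_entry { unsigned char addr[6]; int port; };   /* learned address -> port */

static struct fdb_entry fdb[1024];
static int fdb_len;

static bool is_broadcast(const unsigned char *a) {
    static const unsigned char bcast[6] = {0xff,0xff,0xff,0xff,0xff,0xff};
    return memcmp(a, bcast, 6) == 0;
}
static bool is_multicast(const unsigned char *a) { return a[0] & 0x01; }

static int fdb_lookup(const unsigned char *a) {
    for (int i = 0; i < fdb_len; i++)
        if (memcmp(fdb[i].addr, a, 6) == 0) return fdb[i].port;
    return -1;
}
static void fdb_update(const unsigned char *a, int port) {
    if (fdb_lookup(a) >= 0) return;               /* simplification: no refresh or move */
    if (fdb_len < 1024) { memcpy(fdb[fdb_len].addr, a, 6); fdb[fdb_len++].port = port; }
}

static void bridge_dispatch(const unsigned char *src, const unsigned char *dst, int in_port)
{
    fdb_update(src, in_port);                      /* learning step (br_fdb_update) */
    if (is_broadcast(dst) || is_multicast(dst)) {
        /* no multicast database in this sketch: treat as broadcast */
        printf("flood on all ports except %d\n", in_port);
    } else {
        int out = fdb_lookup(dst);                 /* lookup step (__br_fdb_get) */
        if (out >= 0)
            printf("forward on port %d\n", out);   /* known unicast */
        else
            printf("flood on all ports except %d\n", in_port);   /* unknown unicast */
    }
}

int main(void)
{
    unsigned char a[6] = {0xde,0xad,0xbe,0xef,0x00,0x01};
    unsigned char b[6] = {0xde,0xad,0xbe,0xef,0x00,0x02};
    bridge_dispatch(a, b, 0);   /* b is unknown: flooded */
    bridge_dispatch(b, a, 1);   /* a was learned on port 0: forwarded there */
    return 0;
}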
3. TECHNICAL DETAILS
The Linux bridge module has two specific components that are explored in detail in this section. First, the functionality of the forwarding database is described in detail. Secondly, an overview of the IGMP snooping and selective multicasting subsystem of the Linux bridge is given, concentrating on the functional parts of the design.

3.1 Forwarding database
The forwarding database is responsible for storing the location information of hosts in the LAN. Figure 4 shows the indexing mechanism for the forwarding table, and the structure of the forwarding database array. Internally, the forwarding database is an array of 256 elements, where each element is a singly linked list holding the forwarding table entries for the hash value. The hash value for all destination MAC addresses is calculated by the br_hash_mac function.

The hashing process begins by extracting the last four bytes of the MAC address, creating a 32 bit identifier. The last four bytes are chosen because of the address organization in MAC addresses. Each 48 bit address consists of two parts. The first 24 bits specify an Organizationally Unique Identifier (OUI) that is assigned to the organization that issued the address. The last 24 bits specify an identifier that is unique within the OUI. The fragment of the MAC value used by the bridge contains a single byte of the OUI and all three bytes of the OUI specific identifier. This guarantees a sufficiently unique identifier, while still allowing efficient hashing algorithms to be used.
Figure 4: Linux bridge forwarding table indexing

The MAC address fragment, along with a randomly generated fdb_salt value, is passed to a generic single word hashing function in the Linux kernel, called jhash_1word. The resulting 32 bit hash value is then bounded to the maximum index in the hash array (i.e., 255) to avoid overflowing. The forwarding table entry for the destination MAC address is found by iterating over the linked list of the hash array element pointed to by the truncated hash value.
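The sketch below approximates this indexing scheme in user-space C: a 32-bit key is built from the last four bytes of the MAC address, mixed with a random salt, and masked down to the 256-entry table. The mix32 helper is a stand-in for the kernel's jhash_1word, and the entry type is the simplified one from Section 2.1; both are assumptions for illustration, not the kernel code.

/* Sketch of forwarding-table indexing: key = last 4 bytes of the MAC,
 * hashed with a salt and masked to one of 256 buckets. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define BR_HASH_SIZE 256                  /* 256 buckets -> mask with 255 */

struct fdb_entry {
    struct fdb_entry *next;               /* singly linked hash chain */
    uint8_t           addr[6];            /* host MAC address */
    int               port;               /* port where the host was last seen */
};

static uint32_t fdb_salt = 0x9e3779b9u;   /* randomly chosen in the kernel; fixed here */

/* Stand-in mixer for jhash_1word: any reasonable 32-bit mixing function works here. */
static uint32_t mix32(uint32_t key, uint32_t salt)
{
    uint32_t h = key ^ salt;
    h ^= h >> 16; h *= 0x7feb352du;
    h ^= h >> 15; h *= 0x846ca68bu;
    h ^= h >> 16;
    return h;
}

static unsigned int mac_hash(const uint8_t mac[6])
{
    uint32_t key;
    memcpy(&key, mac + 2, 4);             /* last four bytes: 1 OUI byte + 3 NIC bytes */
    return mix32(key, fdb_salt) & (BR_HASH_SIZE - 1);
}

static struct fdb_entry *fdb_find(struct fdb_entry *table[BR_HASH_SIZE], const uint8_t mac[6])
{
    for (struct fdb_entry *e = table[mac_hash(mac)]; e != NULL; e = e->next)
        if (memcmp(e->addr, mac, 6) == 0)
            return e;                     /* linear scan of the bucket's chain */
    return NULL;
}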
Unused entries in the forwarding table are cleaned up periodically by the br_cleanup function, that is invoked by the garbage collection timer. The cleanup operation iterates over all the forwarding database entries and releases expired entries back to the forwarding table entry cache. During iteration, the function also keeps track of the next invocation time of the cleanup operation. This is done by keeping track of the next expiration event after the cleanup invocation, based on the expiration times of the forwarding table entries that are still active during the cleanup operation.

3.2 IGMP Snooping
The IGMP snooping feature of the Linux kernel bridge module allows the bridge to keep track of registered multicast entities in the local area network. The multicast group information is used to selectively forward incoming multicast Ethernet frames on bridge ports, instead of treating multicast traffic the same way as broadcast traffic. While IGMP is a network layer protocol, the IPv4 multicast addresses directly map to Ethernet addresses on the link layer. Concretely, the mapping allows local area networks to forward IPv4 multicast traffic only on links that contain hosts that use it. This can have a significant effect on the traffic characteristics of the local area network if multicast streaming services, such as IPTV, are used by several hosts.
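To illustrate the address mapping mentioned above: an IPv4 multicast group address is mapped to an Ethernet destination address by placing its low-order 23 bits into the fixed 01:00:5E:00:00:00 prefix (the standard RFC 1112 mapping). The helper below is a stand-alone illustration, not part of the bridge module.

/* Standard IPv4 multicast -> Ethernet MAC mapping (RFC 1112): the fixed
 * prefix 01:00:5E plus the low 23 bits of the group address. */
#include <stdint.h>
#include <stdio.h>

static void ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])   /* group in host byte order */
{
    mac[0] = 0x01;
    mac[1] = 0x00;
    mac[2] = 0x5e;
    mac[3] = (group >> 16) & 0x7f;   /* only 23 bits of the group address survive */
    mac[4] = (group >> 8) & 0xff;
    mac[5] = group & 0xff;
}

int main(void)
{
    uint8_t mac[6];
    ipv4_mcast_to_mac(0xe0000001u, mac);   /* 224.0.0.1 -> 01:00:5e:00:00:01 */
    printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
           mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    return 0;
}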
IGMP snooping functionality consists of two parts in the Linux kernel. First, multicast group information is managed by receiving IGMP messages from end hosts and multicast capable routers on bridge ports. Next, based on the multicast group information, the forwarding decision of the bridge module selectively forwards received multicast frames on the ports that have reported a member on the multicast group address in the Ethernet frame destination address field. This paper discusses the latter part of the operation by going over the details of the multicast forwarding database, and the multicast forwarding database lookup.

Figure 5: Linux bridge multicast forwarding database structure

Figure 5 presents an overview of the multicast forwarding database structure, and the relationships between the main data structures. The multicast forwarding database is contained in the net_bridge_mdb_htable data structure. The field mhash points to a hash array of linked list objects, similar to the normal forwarding database. The significant difference between the normal forwarding database and the multicast forwarding database is that the hash table is dynamically resized, based on the number of multicast groups registered by the operating system, either from local or remote sources. To support the efficient resizing of the database, a special field old is included in the data structure. This field holds the previous version of the multicast forwarding database. The previous version is temporarily stored because the rehashing operation of the multicast forwarding database is done in parallel with read access to the previous database. This way, the rehashing operation does not require exclusive access to the multicast forwarding database, and the performance of the multicast forwarding operation does not significantly degrade. After the rehash operation is complete, the old database is removed. Finally, the data structure also contains the field secret, that holds a randomly generated number used by the multicast group address hashing to generate a hash value for the group.

Each multicast group is contained in a net_bridge_mdb_entry data structure. The data structure begins with a two element array hlist. These two elements correspond to the position of the multicast group entry in the two different versions of the multicast forwarding database. The current version of the multicast forwarding table is defined by the net_bridge_mdb_htable->ver field, that will be either 0 or 1. The ports field contains a pointer to a net_bridge_port_group data structure that contains information about a bridge port that is a part of this multicast group. Finally, the addr field contains the address of the multicast group.

The third primary data structure for the multicast forwarding system is the net_bridge_port_group. The data structure holds a pointer to the bridge port, and a pointer to the next net_bridge_port_group object for a given net_bridge_mdb_entry object. The data structure also contains the multicast group address and various timers related to the bookkeeping of the multicast group information.

The multicast forwarding database lookup is similar to the forwarding table lookup. Figure 6 presents an overview of the operation. The hashing function takes two separate values and passes them to a generic hashing function in the Linux kernel (e.g., jhash_1word), similar to the MAC address hashing operation. For IPv4, the full multicast group address and the contents of the net_bridge_mdb_htable->secret field are passed to the hashing function, resulting in a hash value. IPv6 uses a different hashing function that takes the full 128-bit address as an array of 4 32-bit integers. The hash value is then bounded to the maximum index of the multicast forwarding database hash array. As with the normal forwarding table, the correct net_bridge_mdb_entry is found by iterating over all the elements in the linked list pointed to by the bounded hash value.
Figure 6: Linux bridge multicast forwarding database indexing

Figure 7: Experiment environment

4. EXPERIMENTATION
Packet processing on generic hardware is generally seen as memory intensive work [3, 4]. The experimentation in this paper explores the processing distribution between the different components of the system during the forwarding process.

4.1 Evaluation Setup
Figure 7 presents the experiment environment. It consists of a Spirent Testcenter traffic generator, and a Linux server using kernel version 3.5.3, with a bridge acting as the Device Under Test (DUT). The Spirent Testcenter generates a full duplex stream of Ethernet frames that are forwarded by the DUT using two 1Gbps network interface ports. The Linux kernel on the server collects performance statistics during the tests using the built-in profiling framework in the kernel.

The performance framework is controlled from the user space by the perf tool [13]. The tool offers commands to manage the performance event data collection, and to study the results. To collect performance event data, the user defines a list of either pre-defined performance events that are mapped to CPU-specific performance events by the tool, or raw performance events that can typically be found in the reference guide of the CPU or architecture model.

To generate usable performance event data, the Spirent Testcenter was used to run the RFC 2889 [9] forwarding test with 64 byte frames to determine the maximum forwarding rate of the DUT. The forwarding test performs several forwarding runs, and determines the maximum forwarding rate of the DUT by using a binary search like algorithm to narrow the forwarding rate to within a percent of the maximum. Then, five separate tests using the maximum forwarding rate with performance event data collection were run with one and 1024 Ethernet hosts on each port. The reason the performance event data collection was done this way was to eliminate the effects of frame discarding from the results, due to receiving too many frames from the traffic generator.

The kernel was instrumented to collect two different performance events during the testing: used clock cycles, and cache references and misses. The cycles can be used as an estimator of the distribution of CPU processing inside the kernel. Cache references and cache misses can be used to estimate the workload of the memory subsystem in two ways. Each cache reference and miss can be likened to an operation in the CPU that requires the program to access the main memory of the system. A cache reference occurs when the accessed information is found in a cache, avoiding an expensive main memory access. Conversely, a cache miss happens when the information is not available in any of the caches of the CPU, and the operation requires an expensive access to the main memory of the computer.

4.2 Results
Table 1 presents the distribution of work between the different subsystems of the Linux kernel during the forwarding test with 64 byte frames. The results are given as a percent of the total number of event counters collected in the tests. The work is divided into four different subsystems: the bridge module, the network interface card driver and the network device API, the locking mechanism of the kernel for shared data structures, and the memory management.

Table 1: Performance event data distribution for the RFC 2889 forwarding test

  Subsystem      Cycles%          Cache Ref%       Cache Miss%
  (hosts)        2       2048     2       2048     2       2048
  Interface      45.7%   40.5%    55.0%   42.2%    77.5%   77.9%
  Bridge         21.0%   29.2%    11.1%   31.5%    4.2%    3.8%
  Memory/IO      19.6%   17.2%    28.8%   22.0%    5.1%    5.4%
  Locks          13.7%   13.2%    5.2%    4.3%     13.2%   12.9%

Almost 46% of the CPU cycles are spent in the device driver and the network device independent layer of the network stack. Next, the Linux bridging module and the memory management of the kernel are spending roughly 20% of the cycles each. Finally, the locking mechanism of the kernel is taking up the last 15% of cycles. As the number of hosts in the test increases from two to 2048, we can see that the bridge uses a larger portion of the overall cycles. The increase in used cycles is related to the organization of the hash array in the forwarding database.

The network interface and the device driver are also responsible for 55% of the cache references, and 78% of the cache misses, when there are two hosts in the LAN. We can also see a similar trend with the Linux bridging module here as with the cycle use. When the number of hosts increases from two to 2048, the Linux bridging module uses a significantly larger portion of memory operations (and thus, caching operations) to update and query the forwarding database.
Table 2 presents the distribution of work in the Linux bridge module between the four busiest functions during the forwarding test. The results are given as a percent of the total number of event counters collected in the tests. Note that the table only holds four of the 13 different functions that participate in the DUT forwarding operation.

Table 2: Performance event distribution for the RFC 2889 forwarding test in the bridge module

  Function           Cycles%          Cache Ref%       Cache Miss%
  (hosts)            2       2048     2       2048     2       2048
  nf_iterate         19.6%   13.2%    2.3%    3.4%     12.1%   8.8%
  br_fdb_update      18.2%   26.1%    42.0%   39.0%    0.1%    0.3%
  br_handle_frame    13.5%   8.6%     2.7%    1.1%     3.7%    6.9%
  __br_fdb_get       10.0%   23.6%    41.3%   42.9%    0.1%    0.6%

The most interesting piece of information can be seen in these results. During testing, most cycles in the Linux bridging module are not used by a bridge-specific function. The nf_iterate function is used by the netfilter module to iterate over the rules that have been specified in the system. All of the work performed by the nf_iterate function during the frame forwarding tests is essentially wasted, as the system had no netfilter rules defined, nor does the bridging module require netfilter for any operational behavior.

We can also see from the table that most of the memory related operations are performed by the two forwarding database functions br_fdb_update and __br_fdb_get. When the number of hosts during testing is increased to 2048, the two functions also consume most of the cycles during testing. The reason for the increased processor cycle usage with the increased number of hosts is explained by the architecture of the forwarding database. As mentioned in 3.1, the forwarding database consists of an array of 256 elements, where each element is a linked list. The hashing function assigns the forwarding database entry for the MAC address to one of the linked lists. Thus, the more hosts the system sees, the longer the average length of the chain for a single linked list will become. The entries in the linked lists are in arbitrary order, which requires a linear seek through the full list. This significantly increases the number of clock cycles required to find the MAC address from the linked list.
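As a rough illustration of this effect, with 2048 hosts spread over the 256 hash buckets each singly linked list holds on average 2048 / 256 = 8 entries that must be scanned linearly on every lookup, whereas with only two hosts in the test virtually every chain holds a single entry.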
As can be seen from the table, the number of cache references stays roughly the same while the number of hosts is increased. In addition, the forwarding database in both cases fits into the system cache, as the number of misses during the forwarding database functions is insignificant. The majority of cache misses occur in the various netfilter related functions of the overall frame processing.

5. CONCLUSION
Ethernet based LANs are the building block of IP based networks, and the network application ecosystem. Local area networks are built by bridges that connect multiple Ethernet links into a single larger Ethernet cloud. The Linux kernel contains a bridge module that can be used to create local area networks by combining network interface ports of a computer under a single bridge. While Linux bridges are not able to compete with specialized vendor hardware in performance, Linux bridging can be used in environments where performance is not the priority.

The experimentation conducted for this paper explored the performance characteristics of the Linux kernel during the bridge operation. The results show that most of the processing time is consumed by the device driver and the network interface, instead of the bridge. We can also see that on modern hardware most of the packet forwarding occurs inside the caches of the CPU. The evaluation also shows a significant increase in processing requirements in the bridge module when the number of hosts in the LAN is significantly increased.

6. REFERENCES
[1] D. Allan et al. Shortest path bridging: Efficient control of larger Ethernet networks. IEEE Communications Magazine, 48:128–135, Oct. 2010.
[2] G. Chiruvolu, A. Ge, D. Elie-Dit-Cosaque, M. Ali, and J. Rouyer. Issues and approaches on extending Ethernet beyond LANs. IEEE Communications Magazine, 42(3):80–86, March 2004.
[3] N. Egi et al. Towards high performance virtual routers on commodity hardware. In CoNEXT. ACM, 2008.
[4] N. Egi et al. Forwarding path architectures for multicore software routers. In PRESTO. ACM, 2010.
[5] W. Fenner. Internet Group Management Protocol, Version 2. RFC 2236, Internet Engineering Task Force, November 1997.
[6] A. Greenberg et al. VL2: a scalable and flexible data center network. In SIGCOMM. ACM, 2009.
[7] C. Guo et al. DCell: a scalable and fault-tolerant network structure for data centers. In SIGCOMM, pages 75–86. ACM, 2008.
[8] C. Guo et al. BCube: a high performance, server-centric network architecture for modular data centers. In SIGCOMM. ACM, 2009.
[9] R. Mandeville and J. Perser. Benchmarking Methodology for LAN Switches. RFC 2889, Internet Engineering Task Force, August 2000.
[10] J. Mudigonda, P. Yalagandula, M. Al-Fares, and J. Mogul. SPAIN: COTS data-center Ethernet for multipathing over arbitrary topologies. In NSDI. USENIX, 2010.
[11] R. Niranjan Mysore et al. PortLand: a scalable fault-tolerant layer 2 data center network fabric. In SIGCOMM, pages 39–50. ACM, 2009.
[12] OpenSolaris rbridge (IETF TRILL) support. http://hub.opensolaris.org/bin/view/Project+rbridges/.
[13] perf: Linux profiling with performance counters. https://perf.wiki.kernel.org.
[14] R. J. Perlman. Rbridges: Transparent routing. In INFOCOM, pages 1211–1218, 2004.
[15] Media Access Control (MAC) Bridges. Standard 802.1D, IEEE, 2004.
[16] Virtual Bridged Local Area Networks. Standard 802.1Q-2005, IEEE Computer Society, 2005.
