Fast Scalable FPGA-Based Network-on-Chip Simulation Models

Michael K. Papamichael
Computer Science Department
Carnegie Mellon University
Pittsburgh, PA, USA
Email: [email protected]

Abstract: This paper presents a set of two FPGA-based Network-on-Chip
(NoC) simulation engines that composed the winning design of the 2011
MEMOCODE Design Contest in the absolute performance class. Both
simulation engines were developed in Bluespec System Verilog (BSV) and
were implemented on a Xilinx ML605 FPGA development board. For smaller
networks and simpler router configurations a direct-mapped approach was
employed, where the network to be simulated was directly implemented on
the FPGA. For larger networks, where a direct-mapped approach is not
feasible due to FPGA resource limitations, a virtualized time-multiplexed
approach was used. Compared to the provided software reference
implementation, our direct-mapped approach achieves three orders of
magnitude speedup, while our virtualized time-multiplexed approach
achieves one to two orders of magnitude speedup, depending on the network
and router configuration.
Keywords: Network; Network-on-Chip; Simulation; Time-multiplexing;
Virtualization; FPGA

I. INTRODUCTION
The objective of the 2011 MEMOCODE Hardware/Software Codesign Contest
was to build the fastest simulator for a class of simple Networks-on-Chip
(NoCs) that precisely replicates the cycle-by-cycle behavior of a given
software reference simulator. Our FPGA-based submission won the Absolute
Performance category, providing up to three orders of magnitude speedup
over the software reference design on a Xilinx ML605 FPGA development
board.
The contest reference design supported a large number of design
parameters, which led to a very large design space consisting of
different router configurations, network topologies and traffic patterns.
To effectively cover this vast design space and at the same time stay
within the resource limitations of our FPGA development platform, we
implemented two network simulation designs: i) a high-performance
direct-mapped design that laid out the entire simulated target network on
the FPGA and ii) a virtualized time-multiplexed design used to
efficiently simulate larger network configurations that would not fit
using the direct-mapped approach.
This paper describes our contest submission and is organized as follows.
Section II describes the problem in more detail and Section III outlines
the design principles we adhered to when developing our contest
submission. Section IV provides a high-level overview of our NoC
simulator, while Section V discusses the architecture and implementation
of the two developed NoC simulation engines. Finally, we present
implementation results in Section VI and conclude with a discussion on
related and future work in Section VII.
II. PROBLEM DESCRIPTION
This year's MEMOCODE Design Contest called for implementing a fast
parameterized network simulator that models a wide range of simple NoCs.
Each NoC instance consists of a collection of routers that can be
arbitrarily connected through bidirectional links that carry data and
credits for flow control. Data links have a single-cycle latency, while
flow-control links may have either single-cycle or multi-cycle latency.
Routers need to support multiple virtual channels (VCs) and employ a
fixed-priority allocator for packet scheduling that is invoked every
cycle. Packets can be single- or multi-flit and are generated by traffic
sources that are attached to each router. Multi-flit packets are allowed
to lock resources as they traverse the network, which requires extra
book-keeping at each router and complicates the allocator implementation.
The simulator takes two inputs: i) a NoC configuration that specifies the
parameters of the simulated target network and ii) a traffic pattern. The
main parameters specified in the NoC configuration file are:
- Network topology (list of links between routers)
- Number of routers (up to 256)
- Number of input and output ports per router (up to 16)
- Number of virtual channels per router (up to 8)
- Credit delay cycles (up to 16)
Each traffic pattern input specifies the following information for each
router:
- Routing information (output port for each destination)
- Number of packets to send (up to 1024)
- Individual packet information (number of flits and VC)
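The parameter limits listed above can be captured in a small validation
sketch. This is only an illustration of the design space, not the contest
file format: the class name, field names, link tuple layout and the lower
bounds of 1 are our own assumptions, while the upper bounds come from the
lists above.

```python
from dataclasses import dataclass, field

# Upper bounds taken from the contest parameter lists.
MAX_ROUTERS = 256
MAX_PORTS = 16
MAX_VCS = 8
MAX_CREDIT_DELAY = 16


@dataclass
class NocConfig:
    """Hypothetical in-memory form of a NoC configuration (names are ours)."""
    num_routers: int
    ports_per_router: int
    num_vcs: int
    credit_delay: int
    # Each link: (src_router, src_out_port, dst_router, dst_in_port).
    links: list = field(default_factory=list)

    def problems(self):
        """Return a list of limit violations; an empty list means valid."""
        found = []
        # Lower bounds of 1 are an assumption; the text only gives maxima.
        for name, value, limit in [
            ("num_routers", self.num_routers, MAX_ROUTERS),
            ("ports_per_router", self.ports_per_router, MAX_PORTS),
            ("num_vcs", self.num_vcs, MAX_VCS),
            ("credit_delay", self.credit_delay, MAX_CREDIT_DELAY),
        ]:
            if not 1 <= value <= limit:
                found.append(f"{name}={value} outside 1..{limit}")
        for src, out_port, dst, in_port in self.links:
            if not (0 <= src < self.num_routers and 0 <= dst < self.num_routers):
                found.append(f"link {src}->{dst} names an unknown router")
            if not (0 <= out_port < self.ports_per_router
                    and 0 <= in_port < self.ports_per_router):
                found.append(f"link {src}->{dst} names an unknown port")
        return found
```

A configuration within the limits yields an empty list from `problems()`,
while, for example, a 300-router network is reported as exceeding the
256-router maximum.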
The goal of the contest was to build the fastest NoC simulator that
precisely replicates the behavior of a provided reference software
implementation. During validation, contestants were given 24 hours to
produce NoC simulator instances for a set of five NoC configuration
inputs. A set of traffic patterns was afterwards used to measure the
performance and verify the correctness of each design against the
reference simulator. For the full contest description please refer to the
official contest document [1].

Figure 1. High-level block diagram of platform consisting of Host PC
connected to a Xilinx ML605 Development Board. [Blocks: Host PC;
MicroBlaze; NoC Simulator (direct-mapped or virtualized); Commands and
Results paths between the MicroBlaze and the NoC Simulator.]

III. DESIGN PRINCIPLES
Given the strict contest deadline and the short implementation window, we
adopted a set of design principles to spend the available time as
efficiently as possible.
Correctness First. Instead of directly implementing a highly optimized
design, we split our design time into two distinct phases, a correctness
phase and an optimization phase. During the first phase we focused only
on producing a correct design and ignored performance and FPGA resource
utilization issues. Correctness was ensured through the use of Python
scripts that compared the output of an instrumented version of the
reference simulator against simulation output for individual modules, as
well as the entire system. Only once the entire system was fully
validated did we start optimizing each component. During each step of the
optimization phase we diligently reran our validation tests.
Parameterization and Modularity. Instead of incrementally adding support
for the various network and router parameters, which would potentially
require revisiting each module multiple times, we directly implemented
parameterized versions of all the modules in the simulator utilizing
Bluespec's [2] powerful parameterization mechanisms. In addition to the
parameterization, we tried to carefully develop standard interfaces for
each module early in the design phase to promote modularity and allow for
easier localized optimizations within each module.
Harnessing the power of Bluespec. Overall, the use of Bluespec System
Verilog greatly reduced both design and verification time. Bluespec's
static elaboration mechanisms allowed us to quickly design parameterized
modules and define clean module interfaces. Even when substantial
interface changes were required later in the design cycle, we were able
to rely on Bluespec's powerful type checking system to quickly identify
all of the affected code regions that required modification and eliminate
potential bugs.
IV. DESIGN OVERVIEW
Figure 1 shows a high-level block diagram of the platform we used, which
consists of a host PC that has JTAG and serial (RS232) connections to a
Xilinx ML605 development board [3]. The FPGA on the ML605 hosts the NoC
simulation engine and a MicroBlaze processor. Both the direct-mapped and
virtualized implementations of the NoC simulator expose a common
FIFO-based interface for accepting initialization commands from and
streaming out simulation results to the MicroBlaze. Since the MicroBlaze
and NoC simulator might run at different clock frequencies, the FIFOs
between them are asynchronous to allow crossing between the two different
clock domains.
Running Simulations. To set up the FPGA for a given NoC configuration,
the Bluespec compiler is invoked to generate the Verilog code for the set
of parameters specified in the NoC configuration. The produced Verilog
code is then fed to the Xilinx XST synthesis tool and the resulting
netlist is connected to the MicroBlaze processor as a peripheral on the
PLB bus [4]. Once the FPGA is configured, scripts are used to convert
each traffic pattern to MicroBlaze code that will initialize the traffic
tables for each router in the network along with other simulation
parameters. Since the traffic tables are allowed to contain hundreds of
thousands of entries, the initialization data can grow very large and is
thus stored in off-chip DRAM.
Once the MicroBlaze has initialized all of the traffic and routing tables
through the Commands FIFO, a final command is sent that triggers the
traffic sources and starts the simulation. The MicroBlaze then starts
polling the Results FIFO until the NoC simulator detects that the
simulation has terminated (either because the traffic is done or because
the maximum number of cycles has elapsed), at which point the number of
simulated cycles along with other statistics are enqueued in the Results
FIFO and then printed by the MicroBlaze through the serial port.
V. ARCHITECTURE AND IMPLEMENTATION
In order to efficiently cover the vast design space of different possible
NoCs and stay within the resource limits of the ML605 FPGA platform, our
contest submission consists of two separate NoC simulation engines: i) a
high-performance direct-mapped simulation engine that supports up to
moderately sized networks (100 routers) with medium-complexity routers
(e.g. 5 ports w/ 4 VCs) and ii) a highly scalable virtualized simulation
engine that can handle the entire design space, including the largest
network/router configurations, by time-multiplexing all target routers in
the simulated NoC using a single virtualized router.
In the direct-mapped approach, the actual simulated target network is
implemented on the FPGA. In other words, a fully functional prototype of
the target network, including all individual routers, links, traffic
sources, etc., is laid out in its entirety on the FPGA. The benefit of
such an approach is that all routers in the network are simulated in
parallel, thus yielding very high speedup compared to the reference
software simulator. Moreover, in our specific implementation there is a
one-to-one mapping between host and target¹ cycles, i.e. each cycle on
the FPGA corresponds to a cycle simulated in the network. In hindsight,
we discovered that an additional benefit of having a simulator
implementation that closely mimics the actual network is that the
simulator can be easily converted to a general-purpose network with
minimal changes to the source code.

¹ Throughout the rest of this paper the term host will be used to refer
to the system on which the network simulator is executed and the term
target will be used to refer to the network that is being simulated.

Even though a direct-mapped approach can provide very high performance,
it is not able to fit all possible NoC configurations on the ML605. To
fill this gap and support the remaining NoC instances that would not fit
on the direct-mapped engine, we also developed a virtualized
time-multiplexed NoC simulation engine that uses the FPGA resources in a
very efficient manner. In this approach, instead of simulating all
routers at the same time, each router is simulated in successive FPGA
cycles. Even on a moderately-sized FPGA board, like the ML605, such a
design can easily scale to support even the largest allowed network
configurations, consisting of hundreds of high-radix routers with many
VCs. However, compared to a direct-mapped approach, a time-multiplexed
implementation has to deal with additional complications, which are
discussed later in this section.

A. Direct-Mapped Implementation.
In a direct-mapped implementation the network is built as a collection of
router instances that are connected according to each NoC configuration.
Figure 2 shows the architectural block diagram of a single such router.
Each router module receives flits through a set of input ports and sends
flits through a set of output ports. The first input port of each router
is connected to a traffic source that injects packets according to a
traffic pattern table that is populated during initialization. Similarly,
the first output port is connected to a traffic sink that drains packets
once they have reached their destination. The remaining input and output
ports of each router are either used to create links with other routers
in the network or may remain unconnected. For each flit link connecting
two neighboring routers there is a corresponding credit link going in the
opposite direction for flow-control.

Figure 2. Architectural block diagram for direct-mapped router. [Blocks:
Traffic Table and Source feeding In0; In Ports In1 to In4 (flits and
credits); Routing; Flit Buffers (VC 0, VC 1); Arbitration; Switch; Out
Ports Out1 to Out4 (flits and credits); Out0 feeding Sink; Arbitration &
Flow Control State.]

During each clock cycle a router receives and stores new flits from its
input ports and forwards previously received flits through its output
ports. Each incoming flit is first processed by routing logic to
determine the proper output port it needs to be forwarded to and is then
stored in single-entry flit buffers according to the virtual channel that
it belongs to. To determine which flits will be scheduled to depart from
the router, arbitration logic considers flit buffer occupancy and credit
availability to decide which flits will traverse the switch and be
forwarded through the output ports. In addition to scheduling flits, the
arbitration logic is also responsible for respecting VC and port
priorities, as well as preventing flits from different multi-flit packets
from being interleaved on the same virtual channel.
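As a rough software illustration of the arbitration step described above,
the sketch below grants output ports in fixed priority order based on
buffer occupancy and credit availability. It is a simplified model under
our own assumptions, not the contest allocator: lower input and VC
indices are assumed to have higher priority, each input port is assumed
to forward at most one flit per cycle, and the resource locking needed
for multi-flit packets is omitted. The function and argument names are
hypothetical.

```python
def arbitrate(buffers, credits):
    """Fixed-priority arbitration for one router and one cycle.

    buffers[inp][vc] is the output port requested by the flit stored in
    that single-entry buffer, or None if the buffer is empty.
    credits[out][vc] is the number of credits available downstream.
    Returns a list of (inp, vc, out) grants.
    """
    grants = []
    used_outputs = set()
    for inp, vcs in enumerate(buffers):      # assumed: lower input wins
        for vc, out in enumerate(vcs):       # assumed: lower VC wins
            if out is None:
                continue                     # nothing buffered here
            if out in used_outputs:
                continue                     # output already granted
            if credits[out][vc] <= 0:
                continue                     # no downstream buffer space
            grants.append((inp, vc, out))
            used_outputs.add(out)
            break                            # assumed: one flit per input
    return grants
```

Because every request is examined in sequence and each decision depends
on the grants made before it, a hardware version of such a loop unrolls
into one long combinational chain.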
Termination Conditions. Additional logic monitors the system to detect
when the simulation has finished, which can either happen because the
maximum number of allowed cycles has been reached or because all of the
packets and credits in the network have been delivered. The first case
simply requires logic that compares the current simulation cycle against
the maximum number of allowed cycles that was set during initialization.
To deal with the second case, the simulator monitors all flit and credit
links for activity every cycle (including all intermediate links in the
presence of additional credit delay). If all links are found to be idle,
the simulation is terminated and the results are printed.
B. Virtualized Implementation.
Figure 3. Architectural block diagram for time-multiplexed simulation
engine. [Blocks: Router State (Flit Buffers, Credits, Route Tables,
Scheduler State); Virtualized Router (Router Logic); Virtual Sources
(Traffic Table); Virtual Links (Credit Links with Delay, Flit Links,
Flit/Credit Conn. Table).]

To minimize FPGA resource usage, our virtualized implementation of the
NoC simulator performs time-multiplexing using a single virtualized
router that simulates a different target router during each FPGA cycle.
Figure 3 shows a block diagram of the implemented virtualized simulator,
at the heart of which is the Virtualized Router module. In terms of
functionality this module is very similar to the direct-mapped router,
discussed earlier, with the exception that it only consists of
combinational logic; it does not store any state, but is only used to
transform state. Each cycle, it receives the current state of a router in
the system along with the incoming flits and credits destined to this
router. Based on this information it generates the new router state and
sends any outgoing flits and credits. Once all routers in the simulated
system have been processed by the Virtualized Router, the simulation
proceeds to the next target cycle.
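The state-transforming role of the Virtualized Router can be sketched in
software as a pure function applied to one router per host cycle. The
model below is deliberately a toy: `RouterState` holds a single counter
standing in for flit buffers, credits and scheduler state, the router
simply forwards whatever it receives, and all names are our own
assumptions rather than identifiers from the BSV source.

```python
from typing import NamedTuple


class RouterState(NamedTuple):
    flits_seen: int  # toy stand-in for buffers, credits, scheduler state


def router_step(state, incoming):
    """Pure function: (old state, inputs) -> (new state, outputs)."""
    new_state = RouterState(state.flits_seen + len(incoming))
    outgoing = list(incoming)  # toy behaviour: forward every flit
    return new_state, outgoing


def simulate_target_cycle(states, inputs):
    """Visit every router once; each iteration models one host cycle."""
    outputs = []
    host_cycles = 0
    for rid in range(len(states)):
        # Read this router's state, transform it, write it back.
        states[rid], out = router_step(states[rid], inputs[rid])
        outputs.append(out)
        host_cycles += 1
    return outputs, host_cycles
```

In this model one target cycle costs as many host cycles as there are
routers, and the only storage is the `states` list, which plays the part
of the state memory.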
The remaining modules of the virtualized simulator are used to maintain
the state of the various routers in the network and to facilitate flit
and credit transfers. In particular, the Router State module stores the
state for all of the routers in the simulated system, such as flit
buffers, credits, route tables and other scheduling information. Since in
the virtualized implementation only a single router is active at any
given cycle, the Router State only requires using a single-port memory;
in contrast, the direct-mapped implementation requires using multiple
memories, because all routers need to access their state every cycle. In
addition to requiring fewer memory ports, due to the discretization of
on-chip FPGA memory, the virtualized implementation also makes more
efficient use of on-chip memory, because it consolidates and stores all
of the routers' state in a single monolithic memory.
The Virtual Sources and Virtual Links modules are responsible for
injecting packets into the network and moving flits or credits between
routers. Since only a single traffic source is active at any given cycle,
the Virtual Sources module employs the same time-multiplexing techniques
as the Virtualized Router module, thus making more efficient use of FPGA
resources. The Virtual Links module manages a set of buffers that store
flits and credits until it is time to deliver them to one of the
simulated routers. Connection tables that are specific to each NoC
configuration and are initialized at the beginning of each simulation
hold information about the NoC topology and determine the proper set of
buffers to be used by the currently simulated router.
Time-multiplexing and Network Simulation. As has
been shown in previous work [5], [6], time-multiplexing in
the context of network simulation requires special care to retain proper ordering of events and careful state management
to ensure that all routers in the network have a consistent
view of the system. For instance, within a single target cycle,
a router might send traffic to routers that were simulated in
previous host cycles, but also send traffic to routers that will
be simulated in subsequent host cycles. To avoid ordering
violations and properly simulate the concurrent exchange of
flits between all routers, the simulator needs to isolate events
that belong to different target clock cycles.
To deal with such issues our virtualized implementation employs
double-buffering for all flit and credit links in the network. During
each target clock cycle, incoming flits and credits are read from one set
of buffers and outgoing flits and credits are stored in a different set
of buffers. In the next target clock cycle these two sets of buffers are
swapped to allow the network to make progress, but also ensure that
events from different target clock cycles are isolated from each other.
If the simulated NoC specifies extra credit delay, an additional set of
delay buffers is introduced before the existing double buffers that
handle credits.
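The effect of double-buffering can be illustrated with a toy ring network
in which every router forwards each flit it receives to its neighbor.
This is a minimal sketch under our own assumptions (ring topology,
single-cycle links, hypothetical function names); it only shows that,
with separate read and write buffer sets, the result does not depend on
the order in which routers are visited within a target cycle.

```python
def run_cycle(read_buf, write_buf, order):
    """Simulate one target cycle of a toy ring, one router at a time.

    read_buf[r]  : flits delivered to router r in this target cycle
    write_buf[r] : flits router r will see in the next target cycle
    order        : the order in which routers are visited (all of them)
    """
    n = len(read_buf)
    for r in order:
        flits = read_buf[r]
        read_buf[r] = []                      # consume this cycle's inputs
        write_buf[(r + 1) % n].extend(flits)  # visible only next cycle


def simulate(n, cycles, order):
    """Inject one flit at router 0 and return the routers holding flits."""
    read_buf = [[] for _ in range(n)]
    write_buf = [[] for _ in range(n)]
    read_buf[0] = ["flit"]
    for _ in range(cycles):
        run_cycle(read_buf, write_buf, order)
        # Swap the roles of the two buffer sets between target cycles.
        read_buf, write_buf = write_buf, read_buf
    return [r for r in range(n) if read_buf[r]]
```

For a four-router ring, `simulate(4, 3, [0, 1, 2, 3])` and
`simulate(4, 3, [3, 2, 1, 0])` both place the flit at router 3 after
three target cycles: the flit advances exactly one hop per target cycle
regardless of the visiting order.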
Termination Conditions. As was the case with the direct-mapped
implementation, additional logic is required to detect when the
simulation is finished. However, the time-multiplexed design requires
slightly different logic, because only a subset of the flit and credit
links are exposed each cycle. Thus, instead of monitoring all links each
cycle, the virtualized design keeps track of the number of idle target
cycles. If the number of idle target cycles exceeds the maximum link
delay in the network, then the simulation has finished. For networks that
specify additional credit delay, the number of idle cycles after which
the simulation is considered to be finished needs to be adjusted to
ensure that no traffic is still active at any of the intermediate credit
links.
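The idle-cycle rule above can be written down as a small counter. The
sketch below is an assumption-laden illustration: the class and method
names are hypothetical, `max_link_delay` is taken to already include any
extra credit delay, and the exact boundary for the maximum-cycle check
(`>=`) is our own choice rather than something stated in the text.

```python
class TerminationDetector:
    """Tracks consecutive idle target cycles for a virtualized simulator."""

    def __init__(self, max_cycles, max_link_delay):
        self.max_cycles = max_cycles          # limit set at initialization
        self.max_link_delay = max_link_delay  # assumed to include credit delay
        self.cycle = 0
        self.idle_cycles = 0

    def end_of_target_cycle(self, any_activity):
        """Call once per target cycle; returns True when simulation ends."""
        self.cycle += 1
        if any_activity:
            self.idle_cycles = 0              # traffic seen, restart count
        else:
            self.idle_cycles += 1
        return (self.cycle >= self.max_cycles
                or self.idle_cycles > self.max_link_delay)
```

With a maximum link delay of 2, for instance, the detector reports
termination only after three consecutive target cycles without any flit
or credit activity.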
Critical Path. The router implementations in both the direct-mapped and
the virtualized approach are single-cycle designs, both to mimic the
behavior of the reference simulator and to minimize FPGA resource usage.
This leads to a long critical path, which is dominated by the arbitration
logic that takes care of assigning output ports to different inputs and
VCs, while at the same time respecting the scheduling rules that apply to
multi-flit packets.
The fixed-priority arbiter employed in the reference design considers all
VCs, inputs and outputs in succession, which inevitably creates a very
long combinational chain when implemented in hardware that grows
proportionally to the number of inputs, outputs and VCs. Since all the
networks specified in the contest need to support single-flit packets,
the arbitration logic cannot be pipelined, because, in the worst case, a
new arbitration decision needs to be made every cycle. The effect of this
long combinational path is reflected in the synthesis results presented
in the next section.

VI. RESULTS
To first get a sense of how the two presented NoC simulator
implementations scale in terms of FPGA resource usage and clock
frequency, we present FPGA synthesis results for both the direct-mapped
and virtualized simulator implementations. We then show more detailed
results for the five specific networks that were used in the contest
validation.
Direct-mapped Implementation Results. As mentioned earlier, the
direct-mapped implementation of our NoC simulator is a collection of
interconnected router modules. Table I shows FPGA resource usage and
clock frequency synthesis results for different router configurations
targeting a Xilinx Virtex-6 LX760T FPGA. All reported results are for a
single router within a 256-node network, the largest network allowed in
the contest. As expected, increasing the number of router ports and VCs
leads to higher LUT counts and negatively impacts clock frequency.

                   LUTs / Clock Frequency (in MHz)
Router Config.     2 VCs          4 VCs          8 VCs
4 in/out ports     785 / 152      1393 / 101     2848 / 59
8 in/out ports     3243 / 81      6134 / 54      12754 / 33
12 in/out ports    7717 / 62      11596 / 36     19198 / 20
16 in/out ports    11655 / 45     28294 / 30     33689 / 14

Table I
SYNTHESIS RESULTS FOR SINGLE ROUTER IN DIRECT-MAPPED DESIGN.

Virtualized Implementation Results. Table II shows synthesis results for
different router configurations using our virtualized implementation of
the simulator to model a 256-node network. Since the virtualized
implementation relies only on a single instance of a time-multiplexed
router to model the entire network, the presented results are for the
entire network and not for an individual router, as was the case for the
direct-mapped implementation results. To get a better feeling of the
scalability differences between the two designs, note that the cost of
four routers in the direct-mapped implementation is comparable to the
cost of the entire 256-node network in the virtualized design. In fact,
the largest allowed contest router and network configuration only
occupies 13% of a Xilinx LX760T FPGA.

                   LUTs / Clock Frequency (in MHz)
Router Config.     2 VCs          4 VCs          8 VCs
4 in/out ports     3050 / 66      4117 / 56      6346 / 34
8 in/out ports     7912 / 35      11833 / 28     28859 / 17
12 in/out ports    13653 / 30     28461 / 16     48081 / 10
16 in/out ports    30399 / 17     52288 / 12     101500 / 7

Table II
SYNTHESIS RESULTS FOR ENTIRE NETWORK IN VIRTUALIZED DESIGN.
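The four-router comparison above can be checked with a back-of-the-envelope
calculation using the LUT counts reported in Tables I and II for the
smallest router configuration (4 in/out ports, 2 VCs). The sketch assumes
that direct-mapped cost scales linearly with the number of routers, which
ignores any logic shared across the network; the function and constant
names are our own.

```python
# LUT counts copied from Tables I and II (4 in/out ports, 2 VCs).
DIRECT_MAPPED_LUTS_PER_ROUTER = 785
VIRTUALIZED_LUTS_256_NODES = 3050


def direct_mapped_luts(num_routers,
                       luts_per_router=DIRECT_MAPPED_LUTS_PER_ROUTER):
    """Rough estimate: assumes cost grows linearly with router count."""
    return num_routers * luts_per_router


four_routers = direct_mapped_luts(4)      # 3140 LUTs
full_network = direct_mapped_luts(256)    # 200960 LUTs
```

Under this assumption four direct-mapped routers need about 3140 LUTs,
close to the 3050 LUTs of the entire virtualized 256-node network, while
a direct-mapped 256-node network of the same routers would need roughly
200960 LUTs.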
Results For Contest Networks. To validate and compare the performance of
different contest submissions, a set of five network and router
configurations were provided by the contest organizer, which are listed
in Table III. Note that all network configurations lie on the edge of the
design space as they all max-out at least one of the configuration
parameters. Moreover, three of the five given networks are very large,
consisting of more than 250 routers.

Network Name   Routers   Ports/Router   VCs   Credit Delay
butterfly      112       -              -     1
highradix      -         -              -     -
mesh           253       -              -     -
torus          252       -              -     -
hypercube      256       -              -     -

Table III
CONFIGURATION OF CONTEST NETWORKS.


Table IV shows actual implementation results for the five contest network
configurations running on the ML605 board that is built around the Xilinx
Virtex-6 LX240T FPGA. To give a sense of how our simulator would perform
on a larger FPGA, we include synthesis results for a larger Xilinx
Virtex-6 LX760T FPGA. For each network and FPGA we also indicate the
chosen simulation engine, direct-mapped (DM) or virtualized (V), and
report the average speedup compared to the reference software simulator
running on an Intel Xeon X3460 processor at 2.8GHz.
Due to the large size of the contest networks and the limited resources
on the LX240T FPGA, we were only able to map one of the five networks
(butterfly) to our fast direct-mapped engine, in which case we achieved
three orders of magnitude speedup over the software reference design. For
the remaining networks, which were implemented using the virtualized
approach, speedup values range from 5x to 30x, leading to an overall
average speedup of 470x. However, when using the larger LX760T FPGA,
three out of the five contest networks can be implemented using the
high-performance direct-mapped approach, leading to significant speedup
improvements; overall average speedup in that case is 1570x.
Deterministic Performance. A nice property of the two developed NoC
simulation engines is that performance is fully deterministic for any
given network and router configuration. In a system with N routers
running at frequency Freq, the performance of the simulator, measured in
simulated target cycles per second, is equal to Freq for a direct-mapped
implementation and Freq/N in the case of a virtualized implementation.
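As a worked instance of these two expressions, the sketch below evaluates
them for clock frequencies taken from the synthesis tables. The function
name is hypothetical, and the direct-mapped example assumes that a whole
network could run at the single-router frequency reported in Table I,
which is an optimistic simplification.

```python
def target_cycles_per_second(freq_hz, num_routers, virtualized):
    """Simulated target cycles per second for a given host clock."""
    if virtualized:
        return freq_hz / num_routers  # one router per host cycle
    return freq_hz                    # all routers advance every host cycle


# Virtualized, 256 routers, 16 ports / 8 VCs at 7 MHz (Table II).
virt_rate = target_cycles_per_second(7e6, 256, virtualized=True)

# Direct-mapped, 4 ports / 2 VCs at 152 MHz (Table I, single router).
dm_rate = target_cycles_per_second(152e6, 256, virtualized=False)
```

For the largest virtualized configuration this gives 7,000,000 / 256 =
27,343.75 target cycles per second, whereas a direct-mapped design
advances one target cycle on every host clock cycle.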

              Xilinx LX240T                Xilinx LX760T
Network       DM/V   %LUTs   Speedup       DM/V   %LUTs   Speedup
butterfly     DM     86%     1511          DM     27%     2330
highradix     V      63%     -             DM     93%     421
mesh          V      -       -             DM     96%     4281
torus         V      -       786²          V      -       789²
hypercube     V      -       -             V      -       -

Table IV
IMPLEMENTATION RESULTS FOR CONTEST NETWORKS.

² During validation, one of the supplied traffic patterns deadlocked the
torus network. In such cases, even though the network is stuck, the
software reference simulator needlessly continues simulation until it
reaches the maximum number of simulation cycles. Our implementation,
however, can detect a deadlock in the network, in which case it
immediately terminates the simulation and prints the results. Since
deadlock occurred early in the particular simulation, our implementation
was able to achieve a very high speedup for that traffic pattern. This
also explains why the torus network achieves a speedup that is comparable
to the direct-mapped engine, even though it is using the virtualized
engine.

VII. DISCUSSION
Multiple Virtualized Routers. Even though our virtualized simulation
engine can scale to very large network and router configurations, this
scalability comes at the cost of lower performance compared to the
direct-mapped approach. To bridge this gap, one idea for future work is
to use multiple virtualized routers that run concurrently. To maintain
proper event ordering in such a setting, the system needs to ensure that
only independent (i.e. not neighboring) sets of routers are simulated at
the same time. This issue has been studied in previous work [5] and a
straightforward way to resolve it would be through a separate
preprocessing step that identifies independent sets of routers in the
network and then generates a fixed valid simulation schedule.
An Alternative Approach to FPGA-based NoC Simulation. Another interesting
approach to FPGA-friendly NoC simulation is FIST [7], a simulation
technique previously explored by our group that abstractly models each
router as a set of load-delay curves, which are obtained through training
using a software-based cycle-accurate NoC simulator. In addition to high
simulation speed and scalability, an important benefit of such an
approach is reduced implementation complexity. In contrast to the two NoC
simulation approaches presented in this paper, FIST does not require
implementing the actual router in hardware; instead it relies on the
presence of a software-based model that will be used for training
purposes.
Automatic Network Generation. Given that the direct-mapped design is
already fully parameterized and essentially builds a working prototype of
the target network on the FPGA, an interesting extension to this work
would be building a flexible configurable NoC generator. Such a tool
could prove useful to FPGA designers that need an FPGA-friendly NoC that
is custom-built to meet the specific needs of their application. In fact,
a heavily modified version of the direct-mapped NoC code base is
currently used as the interconnect within the CoRAM project [8].

VIII. ACKNOWLEDGMENTS
We thank Prof. James C. Hoe, Eric Chung, Gabe Weisz and the rest of the
members of the Computer Architecture Lab at Carnegie Mellon (CALCM) for
the helpful discussions and comments. We thank Xilinx for their FPGA and
tool donations. We thank Bluespec for their tool donations and support.

REFERENCES
[1] D. Chiou, "MEMOCODE 2011 Hardware/Software CoDesign Contest,"
    https://ramp.ece.utexas.edu/redmine/attachments/25/MEMOCODE2011_DesignContest.pdf
[2] Bluespec Inc, http://www.bluespec.com
[3] Xilinx, "ML605 Hardware User Guide,"
    http://www.xilinx.com/support/documentation/boards_and_kits/ug534.pdf
[4] Xilinx, "LogiCORE IP Processor Local Bus (PLB) v4.6,"
    http://www.xilinx.com/support/documentation/ip_documentation/plb_v46.pdf
[5] M. Pellauer, M. Adler, M. Kinsy, A. Parashar, and J. Emer, "HAsim:
    FPGA-Based High-Detail Multicore Simulation Using Time-Division
    Multiplexing," HPCA, 2011
[6] P. Wolkotte, P. Holzenspies, and G. Smit, "Fast, Accurate and
    Detailed NoC Simulations," NOCS, 2007
[7] M. K. Papamichael, J. C. Hoe, and O. Mutlu, "FIST: A Fast,
    Lightweight, FPGA-Friendly Packet Latency Estimator for NoC Modeling
    in Full-System Simulations," NOCS, 2011
[8] E. Chung, J. C. Hoe, and K. Mai, "CoRAM: An In-Fabric Memory
    Abstraction for FPGA-based Computing," FPGA, 2011

