0% found this document useful (0 votes)

15 views

Project

This document outlines an assignment to develop a real-time data compression system using deduplication and compression techniques. Students are asked to develop a system that can compress data arriving at 400Mb/s onto an Ultra96 FPGA board using content-defined chunking, SHA hashing for deduplication, and LZW compression. Milestones are provided over 5 weeks culminating in a final report and demonstration of the system. Guidelines are given for the final report structure and code submission requirements. Background information and starting points are provided for the various compression algorithms.

Uploaded by

iamanvi

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

15 views

Project

Uploaded by

iamanvi

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

ESE5320 Fall 2023

University of Pennsylvania
Department of Electrical and System Engineering
System-on-a-Chip Architecture

ESE5320, Fall 2023Deduplication and Compression Project Wednesday, October 25

Due: Monday, December 11, 10:15am (demo), 5:00pm (writeup)

1 Goal
Develop a compressor that can receive data in real time at modern ethernet speeds and
compress it into memory using deduplication and compression. Specifically, we’ll look at
Content-Defined Chunking to break the input into chunks, SHA-256 (or SHA3-384) hashes
to screen for duplicate chunks, and LZW compression to compress non-duplicate chunks.
For full points, your goal for implementation is to achieve real-time guaranteed support of
400 Mb/s1 but you may need to consider intermediate goals (e.g. 100Mb/s, 200Mb/s) along
the way. Slower designs will receive partial credit for the performance portion of the project
grade.

found
ethernet Find ethernet
input Chunk SHA Chunk? Send output
not
found LZW

This is a 5-week project assignment; the intent is to allow you to plan and execute a signifi-
cant, open-ended design exploration and mapping. You will not achieve the implementation
goal or the course learning goals by trying to do this in one week. We give you milestones to
help provide some structure, but the milestones are minimal and doing the minimum to hit
the milestone each week will be insufficient to get you where you need to be at the end. We
are giving you flexibility in planning and ordering rather than lock-step specifying exactly
what you need to do each week.

• Project work is done in teams of 3. You select partners during first week.
• Collaboration between teams is limited as specified on the course web page.
• Milestones writeups and Final report are a team writeup.
• The spirit of this exercise is to optimize the SoC mapping of the algorithm. As such,
explorations of alternate solutions that change the algorithms and generally optimize
the solution for hardware and software are out of scope. Explorations that tweak or
tune the algorithms slightly to better exploit the SoC hardware are potentially in scope.

1
800 Mb/s stretch goal for bonus points

1
ESE5320 Fall 2023

2 Final Report
Final report is a team writeup. There will be one turnin per team.

• Describe your single ARM processor mapped design. [1 page]

– Key parameters in the solution
– Performance achieved (also include associated -s parameter to client to demon-
strate)
– Compression achieved
– Characterization and breakdown of time spent in the major components
• Describe your final Ultra96 mapped design. [5 pages]
– Performance achieved and energy required
– Compression achieved
– Key design aspects: task decomposition, parallelism, mapping to Zynq resources,
include diagrams to support
– Be clear where each component of the final design is performed (e.g., ARM, NEON
vector, FPGA logic).
– Model to explain performance.
– Current bottleneck preventing higher performance
• Describe how you validated your implementations, including the real-time guarantee
for the input rate. The real-time validation likely includes both arguments about the
way the code is written and mapped to the Zynq and your testing methodology. [2
pages]
• Describe the key lessons you learned from this design experience. [1 page]
• Describe design space explored and show graphs and models to support design selection.
[any number of pages as needed]
• Describe who did what. [1 page]
• Include academic integrity statement for all team members:

I, your-name-here, certify that I have complied with the

University of Pennsylvania’s Code of Academic Integrity
in completing this final exercise.

You can review the Code of Academic Integrity here: https://catalog.upenn.edu/

pennbook/code-of-academic-integrity/

2
ESE5320 Fall 2023

3 Final Project Code and Bitstream Submission

We intend to run your compression routines. To make sure the process is consistent across
teams, please comply with the following standards.

1. Provide an xclbin, OpenCL host code executable, and decoder executable for your
encoder.

• Turn in a tar file to the designated final implementation assignment on canvas.

• One turnin for team.
• Should be a single tar file, containing seven files:
– encoder.xclbin, BOOT.bin, boot.scr, image.ub for FPGA kernel
– encoder for OpenCL host code executable
– decoder executable configured to work with your encoded file. (Most likely,
this is just a compilation of the Decoder.cpp we supplied; however, if you
chose a different maximum block size, you may need to change CODE LENGTH;
so give us back one with that change made.)
– client.sh shell script to invoke client with suitable -s parameter to demon-
strate your guaranteed data transfer performance.

2. Your compression program (OpenCL host code) should take one argument:

• the file name where the program should store the compressed data.

Your program should assume that encoder.xclbin is in the same directory as the host
executable.

3. Your compression program should start up ready to receive inputs.

We may provide further clarification or revision on this final implementation turnin as we

get closer.

3.1 Demo Day

We will use the final lecture slot on Monday, Dec. 11th for you to demonstrate your encoder
to course staff. Each team will singup for a slot during the lecture period. Signup will be
made available on Ed Discuss between now and Dec. 11th.

3
ESE5320 Fall 2023

4 Milestones
We will provide precise requirements for milestones each week. These may include a few
exercises to help prepare you for questions that may be on the final in addition to the project
specific components. Milestones and feedback feed into the final report. In most cases, the
milestones can serve as a first draft of a component of your report, and the feedback we give
you will help provide guidance on how to refine it for the report.

1. Analysis, parallelism, placeholder encoder, and teaming [11/3]

2. Functional version and design space [11/10]
3. First operator on FPGA [11/17]
4. Intermediate throughput design (e.g., try for portion working at 100–200 Mb/s) [12/1]
5. Final Report [12/11]

5 Components
The components we will use are standard enough that the wikipedia pages are useful, and
there are several other nice tutorial blog posts out there. Here’s a roundup of starting points.

• Content-Defined Chunking (Rabin Fingerprint)

– https://moinakg.wordpress.com/tag/rabin-fingerprint/
– https://restic.net/blog/2015-09-12/restic-foundation1-cdc
– https://en.wikipedia.org/wiki/Rabin_fingerprint
• SHA 256 (SHA3-384) Hashing
– https://tools.ietf.org/html/rfc6234
– https://en.wikipedia.org/wiki/SHA-2
– https://en.wikipedia.org/wiki/SHA-3
– An example possibly useful for testing: https://csrc.nist.gov/CSRC/media/
Projects/Cryptographic-Standards-and-Guidelines/documents/examples/
SHA256.pdf
– More examples (including some for SHA3-384) can be found https://csrc.nist.
gov/projects/cryptographic-standards-and-guidelines/example-values
– You may use the hardwired SHA unit on the ZCU3EG.
∗ We provide you a platform that enables the SHA unit in the course directory
on eniac: /home1/e/ese5320/u96v2 sbc crypto base
· Sample code for using: https://xilinx-wiki.atlassian.net/wiki/
spaces/A/pages/18841654/Linux+SHA+Driver+for+Zynq+Ultrascale+
MPSoC

4
ESE5320 Fall 2023

· N.B. To use this unit, data provided must be padded to a multiple of

4B to get consistent results; providing this padding and extracting full
performance can be tricky.
– You may use a software implementation that exploits NEON intrinsics.
∗ https://github.com/james-ben/mpsoc-crypto
∗ https://github.com/noloader/SHA-Intrinsics
• Lempel-Ziv-Welch Compression
– http://www.geeksforgeeks.org/lzw-lempel-ziv-welch-compression-technique/
∗ This is a good overview of the algorithm. However, you should use this for
the pseudocode to outline an approach. The C++ code they provide uses
datatypes that will not work for your application where you must work on
binary data. You also need to think about how you implement the map.
– https://en.wikipedia.org/wiki/Lempel-Ziv-Welch

6 Some Suggested Parameters

There will be some discretion in picking implementation parameters. From the start we will
suggest considering:

• 4KB average chunk size with a 8KB maximum

• Use on-board DRAM for the chunk dictionary and hash fingerprints
• Full, maximum chunk size as the LZW compression window

You may want to experiment with some of the parameters when tuning your implementation.

7 Examples of Use
1. Yan Zhang, Nirwan Ansari, Mingquan Wu, and Heather Yu. “On Wide Area Network
Optimization.” In IEEE Communications Surveys & Tutorials, vol. 4, issue 4, pp.
1090–113, Oct. 2013. http://ieeexplore.ieee.org/document/6042388/ Sections
III A and B survey the role of compresison and decompression in optimizing WAN
data traffic.
2. Ashok Anand, Chitra Muthukrishnan, Aditya Akella, and Ramachandran Ramjee.
“Redundancy in Network Traffic: Findings and Implications”. In Proceedings of ACM
SIGMETRICS/Performance, 2009. https://www.microsoft.com/en-us/research/
publication/redundancy-in-network-traffic-findings-and-implications/ Char-
acterizes redundancy in network traffic.
3. Athicha Muthitacharoen, Benjie Chen, and David Mazieres. 2001. A low-bandwidth
network file system. In Proceedings of the eighteenth ACM symposium on Operating
systems principles (SOSP ’01). pp. 174-187. https://dl.acm.org/citation.cfm?
id=502052 Use of deduplication for optimizing a file system operating across a low
bandwith link.

5
ESE5320 Fall 2023

8 Other Resources
• Xilinx HLS Tiny Tutorials https://github.com/Xilinx/HLS-Tiny-Tutorials. These
show examples of how to write code for specific things in Vitis HLS.
• Vitis Tutorials https://xilinx.github.io/Vitis-Tutorials/master/docs/README.
html. These are written for data center platforms (AWS F1/Alveo cards), but the de-
composition and core routines should still be useful.
• For examples of other applications that have been converted to run with PL accelerators
in HLS, you can look at:
– the Rosetta Stone Benchmarks https://github.com/cornell-zhang/rosetta
from Cornell. These were written for SDSoC, but the decomposition and core
routines should still be useful.
– Parallel Programming for FPGAs by Kastner et al. http://kastner.ucsd.edu/
wp-content/uploads/2018/03/admin/pp4fpgas.pdf

.
9 Encoded Data Storage
For your timing runs, store the compressed data in DRAM. The largest test case we provide
is under 200MB. Copy your encoded data from DRAM to the SD-Card outside of your test
timing. You will need to plan how you divide the DRAM among buffers, chunk hash storage,
and compressed output. During early testing, before adding external input, you may also
want to start with the uncompressed data in DRAM.

10 Ethernet Packet Block Size

You will transfer data in ethernet packets. Your compression should be invariant to the size
of the ethernet packet. That is, where ethernet packets end and begin should not impact
where you choose for chunks to begin or end. Chunks will typically span across packet
boundaries. Ending chunks at packet boundaries will harm deduplication since the packet
sizes are unrelated to the content that you are trying to identify and deduplicate.

11 Chunk Validation
Using an SHA-256 signature, the probability of having a collision where two chunks share
the same signature is extremely low. For the project, we will consider equality of SHA-256
signatures adequate to determine that a chunk is a duplicate. This means you do not need
to read back the chunk and validate that it is, in fact, identical. If you had terabytes of
data, or if the consequences of error were high, you would want to perform the check. This
only applies to the full 256b signature. If you use smaller hashes for indexing, you will still
need to validate that there is a match on the 256b signature.

6
ESE5320 Fall 2023

12 Compressed Format
• Compressed stream is a sequential concatenation of chunks.

• Each chunk has a 32b header that identifies it as Duplicate Chunk or LZW Chunk.

– A Duplicate Chunk is a 32b value

∗ bit 0 is a 1 to signify a Duplicate Chunk
∗ bits 31–1 is the Chunk Index of previously encoded block to be duplicated.
Only LZW Chunks are indexed. The first LZW chunk has index 0, the next
1, etc.
– An LZW Chunk is
∗ a 32b bit header
· bit 0 is 0 to signify an LZW Chunk
· bits 31–1 is the compressed chunk length in bytes
∗ LZW-compressed contents of the chunk. LZW implementations vary. Our
implementation satisfies the following properties:
· Entries 0–255 of the dictionary are initialized to the 256 literals, e.g. a
byte with value 27 would be encoded as 27.
· The next dictionary entries are used for prefixes: sequences of 2 or more
bytes. The dictionary is expanded on every code word sent (received).
The next definition is the dictionary entry for the decoded string from the
previous code word extended by the next byte following the decoded code
word string in the plain text. This follows the standard LZW encoding
as reviewed in:
http://www.geeksforgeeks.org/lzw-lempel-ziv-welch-compression-technique/.
· No special keywords such as end-of-file are contained in the dictionary.
· The dictionary size is only limited by the chunk size limitation.
· For simplicity, we set all code lengths to be dlog2 M axChunkSizee in
this project. The provided decoder works with the MaxChunkSize = 8192
bytes. You can change the M axChunkSize by changing the defined pa-
rameter CODE LENGTH.
· Code words are output MSB-first. Assuming nothing has been out-
put yet, a 13b code with binary value x12 x11 x10 x9 x8 x7 x6 x5 x4 x3 x2 x1 x0
results in two consecutive bytes with values x12 x11 x10 x9 x8 x7 x6 x5 and
x4 x3 x2 x1 x0 000. The next code word with value y12 y11 y10 y9 y8 y7 y6 y5 y4 y3 y2 y1 y0
changes the second byte to, x4 x3 x2 x1 x0 y12 y11 y10 , the third to y9 y8 y7 y6 y5 y4 y3 y2
and the fourth to y1 y0 000000.
· N.B., You will need to pay attention to endianess in your hardware im-
plementation to make sure it is consistent with the decoder.
∗ Padding so that the entire LZW chunk ends on an 8b boundary; that is,
chunks of either type always begin on 8b boundaries.

7
ESE5320 Fall 2023

Because we are using a uniform length of dlog2 M axChunkSizee > 8 for encoding codewords,
it is possible that an encoded chunk could be longer than the unencoded chunk. To deal
with this, real implementations will often compare the length of the LZW encoded chunk to
the raw chunk length and send the unencoded data if it is smaller. For simplicity, we are not
asking you to perform that optimization (and our encoded chunk format does not support
it).

13 Compression Goal
Changes in parameters (such as average chunk size) will change deduplication and compres-
sion results. Furthermore, you may make tradeoffs in implementation that impact compres-
sion ratios. You may choose to sacrifice some level of compression for throughput. You
should try to maximize deduplication and tradeoff only a modest (e.g. 10%) level of chunk
compression. Show your tradeoffs explored in your detail design-space exploration section
with graphs to support as apporpriate.

8
ESE5320 Fall 2023

14 Supplied Resources
• Laptop code and/or scripts to send data to your Ultra96 at a fixed (tunable) frequency
(Section 15).

• Dummy design to receive packets from sender (Section 15).

• Reference implementation of decoder (see project/Decoder in course code repository).

– Your encoder needs to work with the decoder. Getting the encoder to produce
encodings according to the compressed format (Sec. 12) may take some experi-
mentation. You may want to add a verbose debugging option to the decoder to
have it print out what values it is getting for the various fields to expedite the
debugging of your encoder.

• We provide several datasets that you can use for testing. We encourage you to create
your own simple datasets for unit testing. Note that the tar-files are not meant to be
unpacked. Following are the datasets that we provide.

– The Little Prince unencoded. As an encoding example, we also provide The Little
Prince compressed.
– Simple example. This archive contains three files, two of which are identical.
– Benjamin Franklin’s autobiography. This is a simple text file that you can modify
for your own purposes. The current file probably has few duplicate areas. (390
KB)
– GTK+ source code. This file contains several subsequent versions of the GTK+
source, which provides ample opportunity for deduplication. (177 MB)
– Linux source code. This file contains several subsequent versions of the source.
(191 MB)
– Several Linux kernels. As opposed to the other data sets, this set contains preva-
lently binary data. (66 MB)

Note: you should take these as examples, not a definitive list of test cases. In particular,
you should create many other focused test examples to facilitate your debugging and
validation.

9
ESE5320 Fall 2023

15 Ethernet Setup and Supplied Code

15.1 Measure Ethernet Speed

1. Make sure that you have the following pre-requisites figured out (you should already
have this setup from homework 4):

(a) Communication over ethernet. If you are using Windows, follow this document.
If you are on Mac, use the following instructions:
i. Download and install the AX88179 driver in the Mac (which is for the ethernet
usb):
ii. And then in Mac, you can do screen /dev/tty.usbserial-1234_oj11 115200
to open the serial console for the Ultra96. Assign the ip address to ultra96
like you would do normally (see Homework 6 instructions). You can exit the
serial console by doing CTRL-A CTRL-\ and pressing y.
iii. Once the ethernet driver is installed on your Mac, you can assign it an ip ad-
dress using sudo ifconfig en4 10.10.7.2 netmask 255.0.0.0 where en4
is the interface name you will find from ifconfig.
If you are using a Linux machine in Detkin/Ketterer as your host machine, use the
command: ifconfig_eth1. Note that if that command didn’t work, you might
have to use ifconfig_eth2 or ifconfig_eth3. These commands are equivalent
to the command:
sudo ifconfig ethX 10.10.7.2 netmask 255.0.0.0

This is the only way to assign an IP to the USB-ethernet device in the Linux
machines in Detkin and Ketterer.
(b) Install iperf3 on your computer: https://iperf.fr/iperf-download.php
(c) Find out what kind of USB ports you have in your computer: USB-2.0 or USB-3.0.

2. Open the Ultra96 terminal and issue the command: /usr/bin/iperf3 -s

3. Open a terminal in your computer and issue the command: /usr/bin/iperf3 -c 10.10.7.1
(assuming 10.10.7.1 is the IP address you assigned to the Ultra96).

4. You should see outputs similar to the following if you connected to a USB-3.0 port in
your computer:

Connecting to host 10.10.7.1, port 5201

[ 5] local 10.10.7.2 port 38484 connected to 10.10.7.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 110 MBytes 920 Mbits/sec 0 266 KBytes
[ 5] 1.00-2.00 sec 109 MBytes 917 Mbits/sec 0 266 KBytes
[ 5] 2.00-3.00 sec 109 MBytes 916 Mbits/sec 0 276 KBytes

10
ESE5320 Fall 2023

[ 5] 3.00-4.00 sec 109 MBytes 915 Mbits/sec 0 276 KBytes

[ 5] 4.00-5.00 sec 106 MBytes 893 Mbits/sec 0 287 KBytes
[ 5] 5.00-6.00 sec 104 MBytes 874 Mbits/sec 0 287 KBytes
[ 5] 6.00-7.00 sec 105 MBytes 878 Mbits/sec 0 287 KBytes
[ 5] 7.00-8.00 sec 105 MBytes 877 Mbits/sec 0 287 KBytes
[ 5] 8.00-9.00 sec 105 MBytes 880 Mbits/sec 0 287 KBytes
[ 5] 9.00-10.00 sec 105 MBytes 878 Mbits/sec 0 287 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1.04 GBytes 895 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 1.04 GBytes 893 Mbits/sec receiver
iperf Done.

This tells you that the upper bound on the throughput achieved by a placeholder
receiver is around 895 Mb/s limited by the Ugreen ethernet-to-USB interfaces.

15.2 Obtaining Starter Code and Integrating Ethernet Input

We will now describe how data will be sent through ethernet using the client and server, the
packet layout, and other relevant information for getting started to receiving real-time data.

1. Clone the ese532_code repository using the following command:

git clone https://github.com/icgrp/ese532_code.git

If you already have it cloned, pull in the latest changes using:

cd ese532_code/
git pull origin master

The code you will use for this section is in the project directory. The directory
structure looks like this:

project/
Client/
client.cpp
Decoder/
Decoder.cpp
Server/
encoder.cpp
encoder.h

11
ESE5320 Fall 2023

event_timer.cpp
event_timer.h
server.cpp
server.h
sourceMe.sh
LittlePrince.txt
Makefile

2. If you are compiling on BigLab/Detkin/Ketterer, run sourceMe.sh. If you are building

locally in your own machine, make sure to source Vitis settings, e.g.:

source /opt/Xilinx/Vitis/2020.2/settings64.sh

And make sure the PLATFORM_REPO_PATHS is setup to the platform you downloaded.

3. You can either use make or the Vitis GUI to compile your code. Use make all to
compile all the targets client, encoder, and decoder.

4. Use make clean to clean all the generated files.

5. Our basic model will be communication between two systems—your computer and the
Ultra96—over ethernet. Your computer will send packets at a fixed rate. The Ultra96
will receive the data and compress it. Figure 5.5 from homework 5 shows you the setup
and cabling. Since the first system is sending data at a fixed rate, it is necessary for
the receiver to compress the data at that rate or data will be lost. We provide the
code for the sender (Client/client.cpp). Your project is connected to the receiver
(Server/encoder.cpp). And then you can use the decoder (Decoder/Decoder.cpp)
to verify that you can recover the original, unencoded file from the compressed file.

6. Let’s run the given code with the system we have setup. After compiling the code, copy
over encoder binary, and the LittlePrince.txt as follows (adjust the commands if
you are not using Linux):

scp encoder LittlePrince.txt [email protected]:~/

And then open the Ultra96 terminal and run the encoder with ./encoder. The pro-
gram waits for a packet to arrive. Open a terminal in your computer and issue the
following command:

./client -i 10.10.7.1 -f LittlePrince.txt

You should see the following output in your Ultra96 terminal:

12
ESE5320 Fall 2023

root@ultra96v2-2021-1:~# ./encoder
setting up sever...
server setup complete!
write file with 14247
--------------- Key execution times ---------------
Reading packets and processing : 0.228 ms

You should see the following output in your host terminal:

ip is set to 10.10.7.1
filename is LittlePrince.txt
bytes_read 14247

You can verify the output by doing the following in the Ultra96:

diff output_cpu.bin LittlePrince.txt

Note that our example is just writing the packets to a file. Your project will process
these packets with the encoder pipeline and write a compressed output.

7. We’ll now describe what’s happening in the client and the server:

(a) Packet Layout: We will be sending data via UDP datagrams. Linux supports
the UDP protocol and receiving packets from the client can be done easily using
Linux IP. The code provided will direct you on how to setup your compression
pipeline to listen as well as handle incoming packets. The maximum size of a
packet will be 16K Bytes. The header of the packet will be 2 bytes consisting of a
done bit denoting that all of the data has been transmitted as well as the length
of the data contained inside the packet.

13
ESE5320 Fall 2023

Figure 1: Packet Layout

(b) You will be specifically interested in the files

Server/server.cpp and Server/server.h. These are the files provided and can
be directly copied and pasted into your project to set up your server pipeline
to receive data. The rest of the repository can serve as an example of how one
could implement the design (Server/encoder.cpp). At a high level you need to
call the function setup server() once. Following this you can makes calls to
the function getpacket() to receive your next datagram. Please note that your
buffer passed into this function needs to be able to handle the maximum payload
size plus the two byte header. Please also note that the recvfrom() Linux call
is blocking. This means that your process will be blocked until a datagram has
arrived. Depending on your design this may be ok. If you do not want to block
you may look into the select() system call as an alternative. This will allow you
to check if data has arrived in your socket and take actions accordingly.

You are of course free to write your own application. For more information on
how to receive the data you can refer to the man page. https://linux.die.
net/man/2/recvfrom.

Alternatively, you can create your own client and server using the DPDK library.
Examples of how to write client and server code using DPDK can be found in the
following links:
• https://doc.dpdk.org/guides/index.html
• https://zenhox.github.io/2018/01/25/dpdk-pktSR/

14
ESE5320 Fall 2023

15.3 Design Considerations

1. Configuring Sender: You will likely need to slow down the client’s data transfer to
begin with. If your system cannot keep up with the pace of the sender, packets will be
dropped.
To configure the sender, when you start the client from the Linux shell, it takes argu-
ments as shown:

./client -s 5 -f file -i ip_address_of_ultra96 -b blocksize

Usage example is:

./client -s 5 -f LittlePrince.txt -i 10.10.7.1 -b 2048

-s option specifies the sleep time or delay between packets (in microseconds)
-i option specifies ip address to send to
-f option specifies what file to send
-b option specifies the block size

For the project report and for project milestones where you characterize your through-
put, you should adjust the -s argument until your design fails. Report your maximum
throughput as the throughput associated with the smallest value of -s on which your
design successfully receives and correctly compresses the input. Measure the actual
throughput by measuring the time it takes for the client to send the file. You can use
/usr/bin/time to measure the time.

2. Debugging Sending and Receiving of Packets: If you encounter problems with

sending and receiving packets between the client and the server, you can emulate the
socket programming. You can see an example of that here. Specifically in server.cpp
and server.h, you can see that instead of using sockets, you can read a file and send
it in pieces to the encoder pipeline to emulate socket programming.

Q Tips: Fast, Scalable, and Maintainable Kdb+
From Everand
Q Tips: Fast, Scalable, and Maintainable Kdb+
Nick Psaris
No ratings yet
Practical Go: Building Scalable Network and Non-Network Applications
From Everand
Practical Go: Building Scalable Network and Non-Network Applications
Amit Saha
No ratings yet
Node.js 63 Interview Questions and Answers
From Everand
Node.js 63 Interview Questions and Answers
John Edward Cooper Berg
No ratings yet
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
From Everand
Exploring Hadoop Ecosystem (Volume 2): Stream Processing
Wei Liu
No ratings yet
SRS - How to build a Pen Test and Hacking Platform
From Everand
SRS - How to build a Pen Test and Hacking Platform
alasdair gilchrist
2/5 (1)
C# for Beginners: Learn in 24 Hours
From Everand
C# for Beginners: Learn in 24 Hours
Alex Nordeen
No ratings yet
The Little Book of Sitecore® Tips: Volume 1
From Everand
The Little Book of Sitecore® Tips: Volume 1
Neil P Shack
No ratings yet
SAS Interview Questions You'll Most Likely Be Asked
From Everand
SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
From Everand
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Kubernetes Made Easy
From Everand
Kubernetes Made Easy
Pankaj Joshi
No ratings yet
Confluent Certified Developer for Apache Kafka® Exam kit
From Everand
Confluent Certified Developer for Apache Kafka® Exam kit
PRIYANKA
No ratings yet
Cisco Packet Tracer Implementation: Building and Configuring Networks: 1, #1
From Everand
Cisco Packet Tracer Implementation: Building and Configuring Networks: 1, #1
S. R. Jena
No ratings yet
Node.js, JavaScript, API: Interview Questions and Answers
From Everand
Node.js, JavaScript, API: Interview Questions and Answers
John Edward Cooper Berg
5/5 (1)
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
From Everand
Computer Science: Learn about Algorithms, Cybersecurity, Databases, Operating Systems, and Web Design
Jonathan Rigdon
No ratings yet
Groovy for Domain-Specific Languages, Second Edition: Extend and enhance your Java applications with domain-specific scripting in Groovy
From Everand
Groovy for Domain-Specific Languages, Second Edition: Extend and enhance your Java applications with domain-specific scripting in Groovy
Fergal Dearle
No ratings yet
Build Your Own Distributed Compilation Cluster - A Practical Walkthrough
From Everand
Build Your Own Distributed Compilation Cluster - A Practical Walkthrough
Hunter Davis
No ratings yet
Game and Graphics Programming for iOS and Android with OpenGL ES 2.0
From Everand
Game and Graphics Programming for iOS and Android with OpenGL ES 2.0
Romain Marucchi-Foino
No ratings yet
Protocol Buffers Handbook: Getting deeper into Protobuf internals and its usage
From Everand
Protocol Buffers Handbook: Getting deeper into Protobuf internals and its usage
Clément Jean
No ratings yet
Practical Play Framework: Focus on what is really important
From Everand
Practical Play Framework: Focus on what is really important
Alberto Souza
No ratings yet
ELE00146M - Coursework Assignment(1) (1)
No ratings yet
ELE00146M - Coursework Assignment(1) (1)
3 pages
Visual Basic 2010 Coding Briefs Data Access
From Everand
Visual Basic 2010 Coding Briefs Data Access
Kevin Hough
5/5 (1)
Simple Golang Programming for Beginners
From Everand
Simple Golang Programming for Beginners
Terry T. Diaz
No ratings yet
C++ Basics for New Programmers: A Practical Guide with Examples
From Everand
C++ Basics for New Programmers: A Practical Guide with Examples
William E. Clark
No ratings yet
Microsoft AZ-400: Designing and Implementing Microsoft DevOps Solutions - Certification Exam Prep
From Everand
Microsoft AZ-400: Designing and Implementing Microsoft DevOps Solutions - Certification Exam Prep
Steve Brown
No ratings yet
System Programming Essentials with Go: System calls, networking, efficiency, and security practices with practical projects in Golang
From Everand
System Programming Essentials with Go: System calls, networking, efficiency, and security practices with practical projects in Golang
Alex Rios
No ratings yet
OpenCL Programming by Example
From Everand
OpenCL Programming by Example
Ravishekhar Banger
No ratings yet
SCCharts - Language and Interactive Incremental Compilation
From Everand
SCCharts - Language and Interactive Incremental Compilation
Christian Motika
No ratings yet
PHP Package Mastery: 100 Essential Tools in One Hour - 2024 Edition
From Everand
PHP Package Mastery: 100 Essential Tools in One Hour - 2024 Edition
Kanto
No ratings yet
Living with Linux in the Industrial World
From Everand
Living with Linux in the Industrial World
Elaiya Iswera Lallan
No ratings yet
Spark: Big Data Cluster Computing in Production
From Everand
Spark: Big Data Cluster Computing in Production
Ilya Ganelin
No ratings yet
Kafka Developer Certified: The Essential Guide
From Everand
Kafka Developer Certified: The Essential Guide
SUJAN
No ratings yet
TensorFlow Developer Certificate Exam Practice Tests 2024 Made Easy
From Everand
TensorFlow Developer Certificate Exam Practice Tests 2024 Made Easy
Mr Troy
No ratings yet
Extending Docker
From Everand
Extending Docker
Russ McKendrick
5/5 (1)
Dataflow and Reactive Programming Systems
From Everand
Dataflow and Reactive Programming Systems
Matt Carkci
No ratings yet
Java: Tips and Tricks to Programming Code with Java: Java Computer Programming, #2
From Everand
Java: Tips and Tricks to Programming Code with Java: Java Computer Programming, #2
Charlie Masterson
No ratings yet
Java: Tips and Tricks to Programming Code with Java
From Everand
Java: Tips and Tricks to Programming Code with Java
Charlie Masterson
No ratings yet
The FPGA Programming Handbook: An essential guide to FPGA design for transforming ideas into hardware using SystemVerilog and VHDL
From Everand
The FPGA Programming Handbook: An essential guide to FPGA design for transforming ideas into hardware using SystemVerilog and VHDL
Frank Bruno
No ratings yet
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
From Everand
C# Package Mastery: 100 Essentials in 1 Hour - 2024 Edition
Tenko
No ratings yet
Hyper-V 2016 Best Practices
From Everand
Hyper-V 2016 Best Practices
Benedict Berger
No ratings yet
Java / J2EE Interview Questions You'll Most Likely Be Asked
From Everand
Java / J2EE Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
CSC8415 Assignment 2
No ratings yet
CSC8415 Assignment 2
19 pages
Core Objective-C in 24 Hours
From Everand
Core Objective-C in 24 Hours
Keith Lee
5/5 (1)
OpenStack Essentials - Second Edition
From Everand
OpenStack Essentials - Second Edition
Dan Radez
No ratings yet
Learning Concurrent Programming in Scala
From Everand
Learning Concurrent Programming in Scala
Aleksandar Prokopec
No ratings yet
Six Minute Guide to IPv6
From Everand
Six Minute Guide to IPv6
Daryl Moon
5/5 (1)
What's New in .NET 8? A Complete Guide to the Latest Features
From Everand
What's New in .NET 8? A Complete Guide to the Latest Features
Nitika
No ratings yet
Advanced Backend Code Optimization
From Everand
Advanced Backend Code Optimization
Sid Touati
No ratings yet
Elements of Android Room
From Everand
Elements of Android Room
Mark Murphy
No ratings yet
Cloud Computing with the Windows Azure Platform
From Everand
Cloud Computing with the Windows Azure Platform
Roger Jennings
3.5/5 (3)
Azure Bicep QuickStart Pro: From JSON and ARM Templates to Advanced Deployment Techniques, CI/CD Integration, and Environment Management
From Everand
Azure Bicep QuickStart Pro: From JSON and ARM Templates to Advanced Deployment Techniques, CI/CD Integration, and Environment Management
Selina Threxan
No ratings yet
Azure Bicep QuickStart Pro
From Everand
Azure Bicep QuickStart Pro
Selina Threxan
No ratings yet
DevOps. How to build pipelines with Jenkins, Docker container, AWS ECS, JDK 11, git and maven 3?
From Everand
DevOps. How to build pipelines with Jenkins, Docker container, AWS ECS, JDK 11, git and maven 3?
John Edward Cooper Berg
No ratings yet
Using Yocto Project with BeagleBone Black
From Everand
Using Yocto Project with BeagleBone Black
H M Irfan Sadiq
No ratings yet
Learn C++
From Everand
Learn C++
Aishik Dutta
No ratings yet
Kubernetes: Build and Deploy Modern Applications in a Scalable Infrastructure. The Complete Guide to the Most Modern Scalable Software Infrastructure.: Docker & Kubernetes, #2
From Everand
Kubernetes: Build and Deploy Modern Applications in a Scalable Infrastructure. The Complete Guide to the Most Modern Scalable Software Infrastructure.: Docker & Kubernetes, #2
Jordan Lioy
No ratings yet
Native Docker Clustering with Swarm
From Everand
Native Docker Clustering with Swarm
Fabrizio Soppelsa
No ratings yet
HackerTools Crack With Disassembling
From Everand
HackerTools Crack With Disassembling
Omega Brdarevic
2.5/5 (3)
.NET Mastery: The .NET Interview Questions and Answers
From Everand
.NET Mastery: The .NET Interview Questions and Answers
Chetan Singh
No ratings yet
DevOps for Networking
From Everand
DevOps for Networking
Steven Armstrong
4/5 (2)
Build Your First Home Server
From Everand
Build Your First Home Server
R.R. Arnob
No ratings yet
Install and Configure LVM Redhat Enterprise Linux 6.7
No ratings yet
Install and Configure LVM Redhat Enterprise Linux 6.7
15 pages
Semantic Report
No ratings yet
Semantic Report
24 pages
Software Architecture and Design
No ratings yet
Software Architecture and Design
8 pages
Parity Check
No ratings yet
Parity Check
41 pages
BDX54B
No ratings yet
BDX54B
7 pages
A3 Super PC Suite Instructions: Download List
No ratings yet
A3 Super PC Suite Instructions: Download List
7 pages
PIC Mid C 1
100% (1)
PIC Mid C 1
21 pages
FTXM50-71R Wir1 3D130931 EN
No ratings yet
FTXM50-71R Wir1 3D130931 EN
1 page
Ericsson RANAP Fail
No ratings yet
Ericsson RANAP Fail
6 pages
Inductive Coupling Aware Explicit Cross-Talk and Delay Formula For On-Chip VLSI RLCG Interconnects Using Difference Model Approach
No ratings yet
Inductive Coupling Aware Explicit Cross-Talk and Delay Formula For On-Chip VLSI RLCG Interconnects Using Difference Model Approach
6 pages
IoT Endsem
No ratings yet
IoT Endsem
151 pages
Igbt Infineon
No ratings yet
Igbt Infineon
11 pages
Java ASSIGNMENT 2 Answer Key - Set 2 (Exam Format)
No ratings yet
Java ASSIGNMENT 2 Answer Key - Set 2 (Exam Format)
14 pages
Connect With The Loader Universal Cable According To The Table (Large (1), Large (2), Small (3) )
No ratings yet
Connect With The Loader Universal Cable According To The Table (Large (1), Large (2), Small (3) )
12 pages
Intel Proset For Windows Device Manager Wmi Provider User S Guide
No ratings yet
Intel Proset For Windows Device Manager Wmi Provider User S Guide
47 pages
Flyer - DAS Distributed Acoustic Sensing
No ratings yet
Flyer - DAS Distributed Acoustic Sensing
3 pages
RK3576 Developer Guide Android14 SDK En
No ratings yet
RK3576 Developer Guide Android14 SDK En
57 pages
Circuits and Signals I Exam Question Paper
No ratings yet
Circuits and Signals I Exam Question Paper
8 pages
Dimensions Instal at Ion
No ratings yet
Dimensions Instal at Ion
142 pages
Material Estudo
No ratings yet
Material Estudo
16 pages
SP - CF - N3 v1.0.4
No ratings yet
SP - CF - N3 v1.0.4
7 pages
27.1.5 Lab Convert Data Into A Universal Format
No ratings yet
27.1.5 Lab Convert Data Into A Universal Format
9 pages
Correction Du Rattrapage D'anglais 2022-2023
No ratings yet
Correction Du Rattrapage D'anglais 2022-2023
1 page
66 Easy
No ratings yet
66 Easy
10 pages
User Manual For Digital Logic Trainer Kit
No ratings yet
User Manual For Digital Logic Trainer Kit
6 pages
The Value of Digital Transport in Distributed Antenna Systems
No ratings yet
The Value of Digital Transport in Distributed Antenna Systems
4 pages
Mainboard User's Manual
No ratings yet
Mainboard User's Manual
39 pages
Newton & Einstein: San Remigio, Cebu Teacher-Made Learner's Home Task
No ratings yet
Newton & Einstein: San Remigio, Cebu Teacher-Made Learner's Home Task
8 pages
Important-University Questions Unit-Wise
100% (1)
Important-University Questions Unit-Wise
7 pages

Project

Uploaded by

Project

Uploaded by

ESE5320 Fall 2023

ESE5320, Fall 2023Deduplication and Compression Project Wednesday, October 25

Due: Monday, December 11, 10:15am (demo), 5:00pm (writeup)

• Describe your single ARM processor mapped design. [1 page]

I, your-name-here, certify that I have complied with the

You can review the Code of Academic Integrity here: https://catalog.upenn.edu/

3 Final Project Code and Bitstream Submission

• Turn in a tar file to the designated final implementation assignment on canvas.

3. Your compression program should start up ready to receive inputs.

We may provide further clarification or revision on this final implementation turnin as we

3.1 Demo Day

1. Analysis, parallelism, placeholder encoder, and teaming [11/3]

• Content-Defined Chunking (Rabin Fingerprint)

· N.B. To use this unit, data provided must be padded to a multiple of

6 Some Suggested Parameters

• 4KB average chunk size with a 8KB maximum

10 Ethernet Packet Block Size

– A Duplicate Chunk is a 32b value

• Dummy design to receive packets from sender (Section 15).

• Reference implementation of decoder (see project/Decoder in course code repository).

15 Ethernet Setup and Supplied Code

15.1 Measure Ethernet Speed

2. Open the Ultra96 terminal and issue the command: /usr/bin/iperf3 -s

Connecting to host 10.10.7.1, port 5201

[ 5] 3.00-4.00 sec 109 MBytes 915 Mbits/sec 0 276 KBytes

15.2 Obtaining Starter Code and Integrating Ethernet Input

1. Clone the ese532_code repository using the following command:

git clone https://github.com/icgrp/ese532_code.git

If you already have it cloned, pull in the latest changes using:

2. If you are compiling on BigLab/Detkin/Ketterer, run sourceMe.sh. If you are building

4. Use make clean to clean all the generated files.

scp encoder LittlePrince.txt [email protected]:~/

./client -i 10.10.7.1 -f LittlePrince.txt

You should see the following output in your Ultra96 terminal:

You should see the following output in your host terminal:

diff output_cpu.bin LittlePrince.txt

Figure 1: Packet Layout

(b) You will be specifically interested in the files

15.3 Design Considerations

./client -s 5 -f file -i ip_address_of_ultra96 -b blocksize

Usage example is:

./client -s 5 -f LittlePrince.txt -i 10.10.7.1 -b 2048

2. Debugging Sending and Receiving of Packets: If you encounter problems with

You might also like