Lauri's blog _ AXI Direct Memory Access

The document provides an overview of AXI Direct Memory Access (DMA) and its implementation on Xilinx boards, distinguishing between AXI DMA and AXI VDMA. It details the internal workings of AXI DMA, including channel types and a minimal hardware and software setup for using DMA to transfer data between DDR memory and FPGA. Additionally, it includes example C code for managing DMA operations and highlights the importance of managing clock domains and memory access in the setup process.

AXI Direct Memory Access

12. Dec '14

Introduction

Getting started with direct memory access on Xilinx boards may be initially overwhelming. First of all, Xilinx distinguishes between AXI DMA
and AXI VDMA in the programmable fabric. AXI DMA refers to traditional FPGA direct memory access, which roughly corresponds to
transferring arbitrary streams of bytes from the FPGA to a slice of DDR memory and vice versa. VDMA refers to video DMA, which adds
mechanisms to handle frame synchronization using a ring buffer in DDR, on-the-fly video resolution changes, cropping and zooming.
Video DMA is covered in the next article. In addition to AXI DMA and AXI VDMA there is a DMA engine built into the ARM core, which is
also outside the scope of this article. Both AXI DMA and AXI VDMA have optional scatter-gather support, which means that instead of
writing memory addresses or framebuffer addresses to control registers, the DMA controller fetches them from a linked list in DDR memory.
Scatter-gather features are outside the scope of this article.

Internals

AXI DMA distinguishes two channels: MM2S (memory-mapped to stream) transports data from DDR memory to the FPGA, and S2MM
(stream to memory-mapped) transports an arbitrary data stream from the FPGA to DDR memory.

AXI DMA internals

Minimal working hardware

The simplest way to instantiate AXI DMA on Zynq-7000 based boards is to take the board vendor's base design, strip unnecessary
components, add the AXI Direct Memory Access IP-core and connect its output stream port to its input stream port. This essentially
implements memcpy functionality which can be triggered from the ARM core but is offloaded to the programmable fabric.

[Figure: a userspace test application (C code running on the ARM cores) accesses DDR memory and the AXI4 Direct Memory Access block (VHDL code running on the programmable fabric) via /dev/mem]

AXI Direct Memory Access stream output is looped back to stream input

To be more precise, the corresponding high-level block design follows. The high-speed clock line is highlighted in yellow, as it runs at a
higher frequency of 150MHz while the general-purpose port runs at 100MHz. Clock domain errors can usually be traced back to
conflicting clock lines. This is explained further at the end of this article.

High level block design corresponding to abstract design presented earlier

In the AXI Direct Memory Access IP-core customization dialog the read channel and write channel correspond respectively to the MM2S and
S2MM portions of the DMA block. A memory map data width of 32 bits means that 4 bytes are transferred during one bus cycle. This
also means the tdata port of the stream interface will be 32 bits wide.

Both read/write channels are enabled and scatter-gather engine is disabled

The AXI Direct Memory Access component's control register, status register and transfer address registers are accessible via the AXI Lite
slave port, which is memory mapped to the address range 0x40400000 - 0x4040FFFF. The whole memory range of 0x00000000 -
0x1FFFFFFF is accessible via both the stream to memory-mapped and the memory-mapped to stream channel. The AXI DMA product guide (PG021) lists
the offsets of the registers accessible via the AXI Lite port. In this case the 32-bit MM2S control register is accessible at 0x40400000, the
32-bit MM2S status register at 0x40400004 and so forth.

Note that customizing the AXI Direct Memory Access IP-core parameters causes memory ranges to be reset under Address
Editor!

Minimal working software

When it comes to writing C code I see an alarming tendency to default to vendor-provided components (stand-alone binary compilers,
Linux distributions, board support packages, wrappers) while avoiding learning what actually happens in the hardware and software.

As described in my earlier article, physical memory can be accessed in Linux via the /dev/mem character device. This makes it possible to access
the AXI Lite registers simply by reading from and writing to a memory-mapped range of /dev/mem. To use the DMA component, minimally four steps
have to be taken:

Start the DMA channel (MM2S, S2MM or both) by writing 1 to control register

Write start/destination addresses to corresponding registers

To initiate the transfer(s) write transfer length(s) to corresponding register(s).

Monitor status register for IOC_Irq flag.


In this case we're copying 32 bytes from physical address 0x0E000000 to physical address 0x0F000000. Note that the kernel may
allocate memory for other processes in that range, and that is the primary reason to write a kernel module which would call
request_mem_region so that no other process overlaps with the memory range. Besides reserving memory ranges, a kernel module
provides a sanitized way of accessing the hardware from userspace applications via /dev/blah device nodes.

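/**
 * Proof of concept offloaded memcopy using AXI Direct Memory Access v7.1
 */

#include <stdio.h>
#include <string.h> // memset
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>

// Register offsets relative to the AXI Lite base address
#define MM2S_CONTROL_REGISTER 0x00
#define MM2S_STATUS_REGISTER 0x04
#define MM2S_START_ADDRESS 0x18
#define MM2S_LENGTH 0x28

#define S2MM_CONTROL_REGISTER 0x30
#define S2MM_STATUS_REGISTER 0x34
#define S2MM_DESTINATION_ADDRESS 0x48
#define S2MM_LENGTH 0x58

// The status printers are defined further down but used by the sync functions
void dma_s2mm_status(unsigned int* dma_virtual_address);
void dma_mm2s_status(unsigned int* dma_virtual_address);

// Write a 32-bit value to the register at the given byte offset
void dma_set(unsigned int* dma_virtual_address, int offset, unsigned int value) {
    ((volatile unsigned int*)dma_virtual_address)[offset>>2] = value;
}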

// Read the 32-bit register at the given byte offset
unsigned int dma_get(unsigned int* dma_virtual_address, int offset) {
    return ((volatile unsigned int*)dma_virtual_address)[offset>>2];
}

// Busy-wait until the MM2S channel reports both IOC_Irq (bit 12) and idle (bit 1)
int dma_mm2s_sync(unsigned int* dma_virtual_address) {
    unsigned int mm2s_status = dma_get(dma_virtual_address, MM2S_STATUS_REGISTER);
    while (!(mm2s_status & 1<<12) || !(mm2s_status & 1<<1)) {
        dma_s2mm_status(dma_virtual_address);
        dma_mm2s_status(dma_virtual_address);
        mm2s_status = dma_get(dma_virtual_address, MM2S_STATUS_REGISTER);
    }
    return 0;
}

// Busy-wait until the S2MM channel reports both IOC_Irq (bit 12) and idle (bit 1)
int dma_s2mm_sync(unsigned int* dma_virtual_address) {
    unsigned int s2mm_status = dma_get(dma_virtual_address, S2MM_STATUS_REGISTER);
    while (!(s2mm_status & 1<<12) || !(s2mm_status & 1<<1)) {
        dma_s2mm_status(dma_virtual_address);
        dma_mm2s_status(dma_virtual_address);
        s2mm_status = dma_get(dma_virtual_address, S2MM_STATUS_REGISTER);
    }
    return 0;
}

// Print the S2MM status register as a list of flag names
void dma_s2mm_status(unsigned int* dma_virtual_address) {
    unsigned int status = dma_get(dma_virtual_address, S2MM_STATUS_REGISTER);
    printf("Stream to memory-mapped status (0x%08x@0x%02x):", status, S2MM_STATUS_REGISTER);
    if (status & 0x00000001) printf(" halted"); else printf(" running");
    if (status & 0x00000002) printf(" idle");
    if (status & 0x00000008) printf(" SGIncld");
    if (status & 0x00000010) printf(" DMAIntErr");
    if (status & 0x00000020) printf(" DMASlvErr");
    if (status & 0x00000040) printf(" DMADecErr");
    if (status & 0x00000100) printf(" SGIntErr");
    if (status & 0x00000200) printf(" SGSlvErr");
    if (status & 0x00000400) printf(" SGDecErr");
    if (status & 0x00001000) printf(" IOC_Irq");
    if (status & 0x00002000) printf(" Dly_Irq");
    if (status & 0x00004000) printf(" Err_Irq");
    printf("\n");
}

// Print the MM2S status register as a list of flag names
void dma_mm2s_status(unsigned int* dma_virtual_address) {
    unsigned int status = dma_get(dma_virtual_address, MM2S_STATUS_REGISTER);
    printf("Memory-mapped to stream status (0x%08x@0x%02x):", status, MM2S_STATUS_REGISTER);
    if (status & 0x00000001) printf(" halted"); else printf(" running");
    if (status & 0x00000002) printf(" idle");
    if (status & 0x00000008) printf(" SGIncld");
    if (status & 0x00000010) printf(" DMAIntErr");
    if (status & 0x00000020) printf(" DMASlvErr");
    if (status & 0x00000040) printf(" DMADecErr");
    if (status & 0x00000100) printf(" SGIntErr");
    if (status & 0x00000200) printf(" SGSlvErr");
    if (status & 0x00000400) printf(" SGDecErr");
    if (status & 0x00001000) printf(" IOC_Irq");
    if (status & 0x00002000) printf(" Dly_Irq");
    if (status & 0x00004000) printf(" Err_Irq");
    printf("\n");
}

// Hex dump byte_count bytes, grouped into 4-byte words
void memdump(void* virtual_address, int byte_count) {
    unsigned char *p = virtual_address; // unsigned so bytes >= 0x80 are not sign-extended by %02x
    int offset;
    for (offset = 0; offset < byte_count; offset++) {
        printf("%02x", p[offset]);
        if (offset % 4 == 3) { printf(" "); }
    }
    printf("\n");
}

int main() {
    int dh = open("/dev/mem", O_RDWR | O_SYNC); // Open /dev/mem which represents the whole physical memory
    if (dh < 0) { perror("open /dev/mem"); return 1; }

    // Each mapping covers 64 KiB
    unsigned int* virtual_address = mmap(NULL, 65536, PROT_READ | PROT_WRITE, MAP_SHARED, dh, 0x40400000); // Memory map AXI Lite register block
    unsigned int* virtual_source_address = mmap(NULL, 65536, PROT_READ | PROT_WRITE, MAP_SHARED, dh, 0x0e000000); // Memory map source address
    unsigned int* virtual_destination_address = mmap(NULL, 65536, PROT_READ | PROT_WRITE, MAP_SHARED, dh, 0x0f000000); // Memory map destination address
    if (virtual_address == MAP_FAILED || virtual_source_address == MAP_FAILED || virtual_destination_address == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    virtual_source_address[0]= 0x11223344; // Write random stuff to source block
    memset(virtual_destination_address, 0, 32); // Clear destination block

    printf("Source memory block: "); memdump(virtual_source_address, 32);
    printf("Destination memory block: "); memdump(virtual_destination_address, 32);

    printf("Resetting DMA\n");
    dma_set(virtual_address, S2MM_CONTROL_REGISTER, 4); // Bit 2 is the soft reset bit
    dma_set(virtual_address, MM2S_CONTROL_REGISTER, 4);
    dma_s2mm_status(virtual_address);
    dma_mm2s_status(virtual_address);

    printf("Halting DMA\n");
    dma_set(virtual_address, S2MM_CONTROL_REGISTER, 0);
    dma_set(virtual_address, MM2S_CONTROL_REGISTER, 0);
    dma_s2mm_status(virtual_address);
    dma_mm2s_status(virtual_address);

    printf("Writing destination address\n");
    dma_set(virtual_address, S2MM_DESTINATION_ADDRESS, 0x0f000000); // Write destination address
    dma_s2mm_status(virtual_address);

    printf("Writing source address...\n");
    dma_set(virtual_address, MM2S_START_ADDRESS, 0x0e000000); // Write source address
    dma_mm2s_status(virtual_address);

    // 0xf001: bit 0 is run/stop, bits 12-14 are the IOC, delay and error interrupt enable bits
    printf("Starting S2MM channel with all interrupts masked...\n");
    dma_set(virtual_address, S2MM_CONTROL_REGISTER, 0xf001);
    dma_s2mm_status(virtual_address);

    printf("Starting MM2S channel with all interrupts masked...\n");
    dma_set(virtual_address, MM2S_CONTROL_REGISTER, 0xf001);
    dma_mm2s_status(virtual_address);

    // Writing the length registers is what actually kicks off the transfers
    printf("Writing S2MM transfer length...\n");
    dma_set(virtual_address, S2MM_LENGTH, 32);
    dma_s2mm_status(virtual_address);

    printf("Writing MM2S transfer length...\n");
    dma_set(virtual_address, MM2S_LENGTH, 32);
    dma_mm2s_status(virtual_address);

    printf("Waiting for MM2S synchronization...\n");
    dma_mm2s_sync(virtual_address);

    printf("Waiting for S2MM sychronization...\n");
    dma_s2mm_sync(virtual_address); // If this locks up make sure all memory ranges are assigned under Address Editor!

    dma_s2mm_status(virtual_address);
    dma_mm2s_status(virtual_address);

    printf("Destination memory block: "); memdump(virtual_destination_address, 32);
    return 0;
}

A successful run should look something like this:

Source memory block: 44332211 7dcddfdf 5a7fefa4 36aa3c9b ca2eea6a 5bf64f81 ebf7ffbb b7f710d2
Destination memory block: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Resetting DMA
Stream to memory-mapped status (0x00000001@0x34): halted
Memory-mapped to stream status (0x00000001@0x04): halted
Halting DMA
Stream to memory-mapped status (0x00000001@0x34): halted
Memory-mapped to stream status (0x00000001@0x04): halted
Writing destination address
Stream to memory-mapped status (0x00000001@0x34): halted
Writing source address...
Memory-mapped to stream status (0x00000001@0x04): halted
Starting S2MM channel with all interrupts masked...
Stream to memory-mapped status (0x00000000@0x34): running
Starting MM2S channel with all interrupts masked...
Memory-mapped to stream status (0x00000000@0x04): running
Writing S2MM transfer length...
Stream to memory-mapped status (0x00000000@0x34): running
Writing MM2S transfer length...
Memory-mapped to stream status (0x00000000@0x04): running
Waiting for MM2S synchronization...
Waiting for S2MM sychronization...
Stream to memory-mapped status (0x00001002@0x34): running idle IOC_Irq
Memory-mapped to stream status (0x00001002@0x04): running idle IOC_Irq
Destination memory block: 44332211 7dcddfdf 5a7fefa4 36aa3c9b ca2eea6a 5bf64f81 ebf7ffbb b7f710d2

Note that IOC_Irq signifies that the transfer completion interrupt was triggered.

Clocks

The processing system may generate up to 4 clocks

High-speed slave ports (S_AXI_HP0 .. S_AXI_HP1) and associated ports (M00_AXI, S00_AXI, S01_AXI, M_AXI_MM2S,
M_AXI_S2MM) run at 150MHz as dictated by FCLK_CLK1. Master here means that the bus transfers are initiated by the master,
which in this case is the AXI Direct Memory Access component. The AXI Interconnect acts much like a switch in an Ethernet
network, multiplexing multiple AXI ports (S00_AXI, S01_AXI) to a single M00_AXI.

The general-purpose port (M_AXI_GP0), including all AXI Lite slaves, runs at 100MHz. In this case the Zynq7 Processing System is the transfer
initiator. The AXI Protocol Converter, similarly to the AXI Interconnect, allows access to multiple AXI Lite slaves (S_AXI_LITE in this case) via
a single AXI Lite master port (M_AXI_GP0) on the Zynq7 Processing System.

LogiCORE IP AXI DMA v7.1, Product Guide PG021
