John Hyde - SuperSpeed Device Design by Example (2014)
Design By Example
John Hyde
USB Design By Example
EZ-USB, FX3 and GPIF are trademarks of Cypress Semiconductor.
All other trademarks or registered trademarks referenced herein are
the property of their respective owners.
© 2010 The SuperSpeed USB Trident Logo used on the front cover
is a registered trademark of the USB Implementers Forum (USB-IF).
Disclaimers
The information in this document is subject to change without notice
and should not be construed as a commitment by USB Design By
Example or Cypress Semiconductor. While reasonable precautions
have been taken, the author assumes no responsibility for any errors
that may appear in this document. No part of this document may be
copied or reproduced in any form or by any means without the prior
written consent of the author.
USB DESIGN BY EXAMPLE MAKES NO WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, WITH REGARD TO THIS
MATERIAL, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. USB Design By Example reserves the
right to make changes without further notice to the materials
described herein. USB Design By Example does not assume any
liability arising out of the application or use of any product or circuit
described herein.
ISBN: 1500588059
ISBN-13: 978-1500588052
ACKNOWLEDGMENTS
I have been wanting to write a SuperSpeed USB book for
some time so I must first thank Cypress Semiconductor for giving me
this opportunity. Cypress provided excellent support as I worked
through the many examples in the book and I particularly want to
thank Dhanraj Rajput, Sai Krishna, Kailas Iyer, Karthic
Sivaramakrishnan, Jegannathan Ramanujam, Venkat
Pattabhiraman, Madhura Tapse, Ed Rebbelo, Mathu Mani,
Manaskant Desai, Akshay Singhal, Mudabir Kabir, Nikhil Naik, Anup
Shivakumar, Eddie Zelaya and Gayathri Vasudevan for their
excellent answers and explanations to my never-ending questions.
Their contributions were made possible by the high profile that this
project was given by Cypress management - Badrinarayanan
Kothandaraman, Mark Fu and Veerappan Rajaram.
To get the most value from this book you should have a
Cypress SuperSpeed Explorer Kit and a CPLD Accessory Board.
Install the FX3 Toolset as described in the SuperSpeed Explorer Kit
User Guide and work through the examples in this book. I designed
them to be reusable building blocks and I encourage you to
"copy-and-paste" them to create your prototypes. The SuperSpeed Explorer Kit
User Guide is an essential companion for this book. It is a free
download from www.cypress.com/fx3 and I suggest that you
download it to your Kindle now so that you can refer to it if you are
on the road or reading this book on a plane.
Every USB 3.0 cable has two buses and only one of them is
ever operating at a time; the other is suspended and consuming very
little power. A USB 3.0 device is required to also operate as a USB
2.0 device if it is attached to a USB 2.0 only hub so, in this case, only
the USB 2.0 section would be active. Figure 1.5 is drawn to show
the two separate bus systems. We would prefer the USB 3.0 section
active and the USB 2.0 bus suspended. Let's look at the features of
both buses and determine why we should be using the USB 3.0
wires rather than the USB 2.0 wires. The USB 2.0 portion of USB
3.0 is USB 2.0, pure and simple. It operates the same as it has done
since 2001. We will review some of this operation and discover why
it is not well suited to an energy conserving solution for 2014 and
beyond.
The polling packets, sorry, the interrupt packets and all other
packets are broadcast from the host and are repeated on all
downstream sections as shown in Figure 1.6. The USB 1.0 hub was
a basic repeater and it was assumed that it too was attached to a
power source. USB 2.0 followed the same model but bus traffic
between hubs was always high-speed and the USB 2.0 hub did a
store-and-forward of packets to full and low speed devices.
However, packets were always broadcast on all high-speed
downstream ports. Every device on a USB has a unique address.
Each packet contains a device address and all devices on the bus
check this and absorb, or respond to, the packet if it is addressed to
them.
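The address check described above can be sketched in a few lines of C. This is only an illustration of the idea, not code from any USB stack: the type usb_token_t and the function device_accepts_packet are hypothetical names, and the packet is reduced to the two fields that matter here. The one fact assumed from the USB 2.0 specification is that a device address is 7 bits wide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a broadcast USB 2.0 token packet. */
typedef struct {
    uint8_t device_address;  /* 7-bit address assigned by the host */
    uint8_t endpoint;        /* 4-bit endpoint number */
} usb_token_t;

/* Every device on the bus sees every packet; a device only absorbs,
 * or responds to, a packet that carries its own address. */
bool device_accepts_packet(uint8_t my_address, const usb_token_t *token)
{
    return (token->device_address & 0x7F) == (my_address & 0x7F);
}
```

Every device that is not addressed still has to wake up, receive the packet and run this comparison, which is why a broadcast bus is a poor fit for an energy conserving design.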
Note too that the USB 1.1/2.0 1 msec Start Of Frame (SOF)
indicator is gone. It is no longer needed. Isochronous transfers that
used this SOF information for synchronization now use timestamp
headers within the data packets.
Figure 1.7 shows the multiple levels that have been defined
and implemented by USB 3.0. I have included the diagram so that
you can appreciate the amount of thought and engineering effort that
many, many experts have put into this to define a robust high-
performance 5Gbps bus. I should mention at this stage that a USB
3.0 implementation also defines a new eXtensible Host Controller
Interface (xHCI) Specification; this is designed to be event driven to
match the power conscious USB 3.0 specification. There are more
technical details in both 300+ page specifications than most people
can absorb and, fortunately, you don’t need to read either of them to
be successful designing a SuperSpeed device, unless you are a
silicon vendor or OS supplier. For a review of USB hardware and protocols,
including the many high-level USB 2.0 protocols that also apply to
USB 3.0, I would recommend Jan Axelson’s USB Complete. If
you are a silicon vendor I would recommend that you start with
MindShare’s USB 3.0 Architecture book; it is over 650 pages but
is much easier to read.
I'm not going to explain most of Figure 1.7 since there is little
that will help you use SuperSpeed USB. What I will discuss,
however, is the piece that you do need to know when implementing a
device: the right-hand side of the figure, labeled power
management.
USB 3.0 power management
USB 3.0 power management affects all aspects of a USB
system. The host controller architecture hardware and software
drivers are now interrupt or event driven to eliminate polling and
other “busy” work. This is thoroughly documented in the 500-page
xHCI Specification which is a download from Intel's developer site.
The USB communications protocol has been overhauled to eliminate
polling but in an architecturally compatible way so that USB
application software does not have to be rewritten. And we can do
our part by designing power aware SuperSpeed peripherals and this
is described in detail in the next Chapter and throughout this book
using examples. Power management was a fundamental design
criterion of SuperSpeed USB so let's look at some of the details of
how power is conserved while maintaining responsiveness and
performance.
Resistor Value Device Type
Less than 10Ω, less than 1KΩ, 35KΩ to 39KΩ, 65KΩ to 72KΩ,
102KΩ +/- 2%, 119KΩ to 132KΩ, > 220KΩ and 440KΩ +/- 2%.
The next stop on the tour is power modes; the various
“power planes” of the FX3 are shown in Figure 2.11.
The last stop on our tour is the System RAM and the
Distributed DMA Controller. The data paths to and from RAM have
been designed with maximum throughput as the goal. Multiple
Advanced High-performance Buses (AHB, as defined by the ARM
System Architecture) are used to interconnect the system elements.
I have drawn a scale diagram in Figure 2.12 where the width of the
connection is used to show the data throughput available from the
various blocks of the FX3.
Figure 2.15 All high speed and low speed pins are accessible
Figure 2.16 shows a block diagram of this board. There is an
integrated debugger that includes a serial connection and a JTAG
port for debugging. There is also a user LED, a user button and an
I2C EEPROM so that we can do experiments with the basic board.
I’m sure that many readers will applaud discovering that they
will be writing their application on top of a robust RTOS – you may
skip the next section! For those of you who shuddered when you
read the word RTOS, let me describe why this is a good thing.
Multi-threading RTOS 101
You will have to learn some new words and concepts to be
successful with a multi-threading RTOS. This will take some effort so
let me first explain the benefit of becoming familiar with these new
terms.
You may have heard the terms task or multi-tasking; what then
is multi-threading? The term task is used in operating system
literature in a variety of ways; it sometimes means a separately
loadable program and it sometimes refers to an internal program
segment. To avoid this confusion there are two terms that have,
more or less, replaced the use of task: process and thread. A
process is a completely independent program that has its own
address space (the Windows operating system uses this model),
while a thread is a semi-independent program segment that
executes within a shared address space. Most embedded
applications cannot afford the overhead (both memory and
performance) associated with a full-blown process-oriented
operating system. For these reasons, ThreadX implements a thread
model, which is both extremely efficient and practical for most real-
time embedded applications.
When you divide your program into multiple threads you will
decide that some are more important than others and you can assign
these a higher priority. Figure 3.3 shows a multi-threading RTOS
task state diagram (copied from the ThreadX User Guide). As
threads are Created they are placed on the Suspended list or on
the Ready list. The RTOS determines the highest priority thread on
the Ready list and makes this the Executing thread. Execution of this
thread continues until it is blocked for some reason (waiting for a
resource, such as an event or a timer), at which point it is placed on the
Suspended list. The RTOS then selects the highest priority thread on
the Ready list as the Executing thread, and so the process
continues. There is a system-defined thread, the IdleThread, which
has the lowest priority and is always ready to run; this typically
switches the CPU to a low power state, enables interrupts then halts.
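The selection step in this state diagram can be sketched as follows. This is a teaching model only, not the ThreadX scheduler, which also deals with preemption, time-slicing and per-priority lists. The names thread_t and pick_next_thread are hypothetical; the one ThreadX convention I rely on is that a numerically lower priority value means a more important thread.

```c
#include <stddef.h>

typedef enum { THREAD_SUSPENDED, THREAD_READY, THREAD_EXECUTING } thread_state_t;

/* Hypothetical, minimal thread record for illustration. */
typedef struct {
    const char    *name;
    unsigned       priority;  /* ThreadX convention: 0 is the highest priority */
    thread_state_t state;
} thread_t;

/* Scan the threads, choose the Ready thread with the highest priority
 * and mark it as Executing. Returns NULL if no thread is Ready. */
thread_t *pick_next_thread(thread_t *threads, size_t count)
{
    thread_t *best = NULL;
    for (size_t i = 0; i < count; i++) {
        if (threads[i].state != THREAD_READY)
            continue;
        if (best == NULL || threads[i].priority < best->priority)
            best = &threads[i];
    }
    if (best != NULL)
        best->state = THREAD_EXECUTING;
    return best;
}
```

Because an idle thread with the lowest priority is always Ready, a scan like this always finds something to run.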
while (1); // Get here on a failure, can't recover, just hang here
// Later we shall do something more elegant here
return 0; // Won't get here but compiler wants this!
}
Note in the Main() routine in Figure 3.5 that I told the RTOS that I
will use the UART and an IO pin, GPIO_45, for the button. This is
the only example in which I don’t use the UART and it was simpler to
claim it from the outset. The user LED on the Explorer board is
connected to the UART signal CTS which the UART driver owns at
start up; I am only planning to use a 2 wire connection to a serial port
so I can give up the CTS control signal for LED use. I then configure
the LED as an output and configure the button as an input.
We then run our main loop which waits for Delay msec then
toggles the LED. The changing Delay value will change the blink
rate of the LED.
API Overview
We interact with the FX3 using the FX3 API described by the
EZ-USB® FX3 +FX3S SDK Firmware API Guide. This document is
about 600 pages long with no obvious structure or prioritization of
information. I don’t expect you to read it; instead, we will work
through a series of examples that highlight specific functions and
attributes of the API. Note that we DO NOT directly access registers
within the FX3 ARM CPU or IO blocks; peeking and poking around
these can lead to disaster. We also don’t need to program in
assembler; those days are gone.
In general, the RTOS has control of the CPU and calls our
user code when appropriate. We should do whatever work is
needed then pass control back to the RTOS as soon as possible.
Remember that the RTOS is running many other threads
“underneath” our code. For example, if our application needs to wait
for, say, 100msec then we SHOULD NOT implement a decrementing
loop counter! Instead we should call the RTOS function
CyU3PThreadSleep(100) which will give control back to the RTOS so
that it can get on with other work. The RTOS will return in about
100msec and give us control back. I say ‘about’ since the RTOS
may decide that something more important than the user thread
should be run at this time. We will see later that user threads have
the lowest priority so that the RTOS can guarantee servicing of the
IO block threads.
ThreadX uses this code and some allocated RAM for a stack
when starting a thread. All of the local variables are allocated on the
stack so the code is inherently re-entrant; this will enable you to use
the same code to be started by multiple threads if needed by the
application. Note that thread now has a specific meaning: it
consists of a collection of code bytes that is the program, a collection
of variables that are data bytes on the stack and a data structure,
also on the stack, called the thread context. In this section we will
discuss how ThreadX deals with a resource that must be shared by
several threads and how these threads can communicate with each
other. A shared resource, such as an I2C communications port,
must be protected from being accessed from multiple threads at the
same time. The mechanism that all RTOSes use is called a mutex
(a contraction of MUTually EXclusive, meaning one owner) and
this is illustrated as a key in Figure 3.8.
Figure 3.8 A Mutex is used to protect a shared resource
Using the USB Control Center locate, load and run the
Chapter3Example4.img file and observe the operation via the
messages in the console window. To stop the currently running
example so that you can load new firmware, press the reset button on
the board (it’s the button next to the USB 3.0 connector).
Figure 3.17 shows the four queues defined for this example.
Memory buffers are created at step 1 and the memory address of
each buffer is the message that I pass around. Note that I am not
passing the whole data buffer. I assume that ownership of the buffer
address message is sufficient for a thread to own the buffer. This is
an important concept since it means that I can avoid copying large
amounts of data. One thread will fill the buffer then pass ownership
of the buffer via a send message to another thread that would use
the data. There is nothing, of course, preventing you from writing in
the buffer even if you do not own it, and this is an easy way to create
‘hard-to-find’ bugs. Remember the convention: if you own the buffer
then you can use it; once you send it to a queue you should not
read or write the buffer further.
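The ownership convention can be made visible in code. The sketch below is a plain C model and not the RTOS queue service used by the example firmware: buffer_queue_t, queue_send and queue_receive are hypothetical names, the queue depth is arbitrary, and there is no locking, so it only illustrates the idea that the message is the buffer address and that sending it means letting go of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 4  /* hypothetical depth, chosen for illustration */

/* Each message is just the address of a buffer, never a copy of its data. */
typedef struct {
    uint8_t *slots[QUEUE_DEPTH];
    size_t   head;
    size_t   tail;
    size_t   count;
} buffer_queue_t;

/* Pass ownership of *owned to the queue. On success the sender's pointer
 * is cleared so that it cannot touch the buffer by accident. */
bool queue_send(buffer_queue_t *q, uint8_t **owned)
{
    if (q->count == QUEUE_DEPTH)
        return false;              /* queue full, sender keeps ownership */
    q->slots[q->tail] = *owned;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    *owned = NULL;
    return true;
}

/* Take ownership of the oldest buffer, or return NULL if the queue is empty. */
uint8_t *queue_receive(buffer_queue_t *q)
{
    if (q->count == 0)
        return NULL;
    uint8_t *buffer = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return buffer;
}
```

Clearing the sender's pointer is a design choice of this sketch; it turns a silent use-after-send into an obvious NULL dereference during testing.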
The USB block and PHY on FX3 cannot be put to sleep while
the link is in the U0, U1 or U2 states. This means that there is no real
opportunity to save power in the system while the link goes to U1 or
U2. The only requirement from the firmware is to ensure that the U0
<-> U1 and U0 <-> U2 transitions are handled properly.
As per the USB spec, both the host and the device are
allowed to initiate transitions from U0 to U1/U2 or back. Most USB
3.0 hosts (Intel hosts, for example) will move the USB link to the U1
state when they expect the next data transfer to be device
initiated.
Note that while the host can do a direct U0 -> U2 transition,
this is not commonly used. U2 entry typically happens from the U1
state. Once the link has been in U1 for longer than a host defined
inactivity timeout, the link will transition to U2 on both device and
host sides. This happens without any actual signaling on the USB
bus.
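The reason that no bus signaling is needed is that each side of the link runs the same inactivity timer and reaches the same conclusion on its own. A minimal model of one side is sketched below; the names link_t, link_enter_u1, link_wake and link_tick are hypothetical, and the microsecond units and timeout value are assumptions chosen for illustration rather than values from the USB 3.0 specification.

```c
typedef enum { LINK_U0, LINK_U1, LINK_U2 } link_state_t;

/* Hypothetical model of one side (host or device) of the link. */
typedef struct {
    link_state_t state;
    unsigned     idle_us;        /* time spent in U1 so far */
    unsigned     u2_timeout_us;  /* inactivity timeout defined by the host */
} link_t;

/* U0 -> U1: start counting inactivity from zero. */
void link_enter_u1(link_t *link)
{
    link->state = LINK_U1;
    link->idle_us = 0;
}

/* Any low power state -> U0: traffic has resumed. */
void link_wake(link_t *link)
{
    link->state = LINK_U0;
    link->idle_us = 0;
}

/* Advance time. Once the link has been in U1 for longer than the
 * timeout, this side moves to U2 without exchanging any packets. */
void link_tick(link_t *link, unsigned elapsed_us)
{
    if (link->state != LINK_U1)
        return;
    link->idle_us += elapsed_us;
    if (link->idle_us > link->u2_timeout_us)
        link->state = LINK_U2;
}
```

If the host and the device are both modeled with a link_t holding the same timeout, they arrive in U2 at the same moment, which matches the silent transition described above.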
In the L1/L2 states, the ARM core in the FX3 is placed into the
clock-gated Wait For Interrupt state. The clocks to the other
peripheral blocks on FX3 are also stopped. None of the blocks are
powered off at this stage. If any of the blocks (UART, SPI, I2C etc)
are to be powered off, you have to do this explicitly using the
corresponding de-init API calls.
Chapter Summary
This chapter included a brief look at some of the key features
of the ThreadX RTOS. The interested reader should review Ed
Lamie's book Real-time Embedded Multithreading using ThreadX,
now in its second edition. You will notice that my mutex example is
a variation of the book’s “Speedy and Slow” mutex example.
One goal of this Chapter was to show you that all the RTOS
capability provided with ThreadX is available for your use when
writing your application. You can use as little or as much as you
like. Small applications would use a few threads while large
applications should use many threads. Cypress also supports
advanced users who wish to add their own device drivers.
For those of you who are still not convinced that programming
with an RTOS is a great idea you can, in fact, program the FX3
without using the RTOS. I don’t recommend it but a non-RTOS
example called FX3-Lite (Boot) Firmware Library is described in the
Reference Section.
I assume that you have installed the SDK. If not, do this now
– instructions are in the SuperSpeed Explorer Kit Users Guide which
is included at the end of the Reference Section.
Cypress chose the open source Eclipse environment for the
graphical user interface (GUI) to the firmware development process.
This windowed application manages projects and includes an editor
with syntax highlighting and sophisticated search and reference
capabilities that enable you, for example, to look up where functions
are declared and referenced. Behind this human interface is a set of
GCC tools (compiler, assembler, linker, locator, etc) that create FX3
object code for execution.
The GCC tools are enormously flexible and can create object
code for a large range of microprocessors and microcontrollers. The
downside of this flexibility is that the tools must be configured to
generate the correct object code! You can either configure the tools
yourself (not recommended, but this is explained in the FX3
Programmers Guide) or you can start from a pre-configured, working
project and edit it. I suggest that you create a workspace directory
now and import all of the book example code into it. This will give
you a working copy and should things go terribly wrong you will be
able to re-import the examples to restart any project.
Adding Console_In
Input characters from the UART arrive at an unpredictable
(but slow!) rate so we will set up a DMA channel to catch them and
deliver them to the CPU. This may seem like overkill but we have
way more DMA capability than we will ever use and since it is free
(well, included with the FX3) we may as well use it.
The FX3 has a lot of capability that we haven't used yet (well,
we are only in Chapter 4!) so I decided that allocating one IO pin per
Thread, User Mutex, User Semaphore or User EventGroup was not
being over-indulgent so this is what we shall move forward with for
this example.
Chapter Summary
This Chapter introduced the Firmware Development portion of
the FX3 Toolset; this uses an Eclipse GUI for program entry and
project management and “back-end” GCC tools to create program
image files that can be executed on an FX3.
1) Note that I use the FX3’s I2S pins as GPIO lines to drive the
JTAG interface of the CPLD, and this allows the FX3 to reprogram
the CPLD. This is described in detail in the Reference Section in
the Programming the CPLD Chapter.
2) I2C and SPI are both fed into the CPLD and the SPI EEPROM
Chip select is derived from the CPLD.
! IMPORTANT NOTE !
When using the CPLD board, jumper J5 on the SuperSpeed
Explorer board should be removed. This disconnects the
onboard SRAM from the GPIF lines, which are also connected to
the FX3. J2 should be inserted to set VIO to 3.3V.
It may be interesting to note that the board in Figure 5.1 was
not used in the original draft of this book. I used discrete I2C
expander components from Philips as shown in Figure 5.3. The
CPLD board was designed for Chapter 8 but, by adding switches
and LEDs and then reprogramming the CPLD, I was able to produce
a solution for this Chapter that was both cheaper and easier to use.
Figure 5.6 Setting up for a read (on the left) and write (on the
right)
Figure 5.7 shows the equivalent I2C circuit that is inside the
CPLD. It is an I2C slave with its I2C address set in the Verilog code
(I chose 56), an 8-bit input port where switches are attached and an
8-bit output port where LEDs are attached. This is a simple I2C
slave with no sub-addressing and is provided by Xilinx; their Verilog
code is described in the Reference Section in the Writing Your Own
CPLD Code Chapter.
Status = InitializeDebugConsole(9);
CheckStatus("Debug Console Initialized" , Status);
// Remove Reset from the CPLD
CyU3PThreadSleep(10);
CyU3PGpioSetValue (CPLD_RESET, 0);
if (Status == CY_U3P_SUCCESS )
{
Status = I2C_Init();
CheckStatus("I2C_Init" , Status);
SPI Example
My next example uses SPI and here we HAVE to use I2C for
the console since the FX3 uses the same pins for SPI and the
UART. Well this isn't quite accurate; if we were using just 16-bits of
GPIF then both the UART and SPI ports are available but the UART
is moved to a different pin position which is not supported on the
SuperSpeed Explorer board. But remember that this book is about
building a high-performance, low-power, stand-alone, SuperSpeed
device and this would use a 32-bit GPIF connection where you
must decide between UART and SPI. So for the remainder of this
Chapter the examples shall use I2C for the debug console. You
will not notice the difference but when you design your own board
you will need to know this pin allocation limitation of the FX3 and
design around it as I have done.
The next Chapter will look at USB and you will discover that
the DMA hardware also makes this easy for you and the CPU too.
Chapter 6 SuperSpeed USB communications
The USB block presents the same socket interface to the FX3
API, so transferring data across the SuperSpeed USB interface is
fundamentally the same as transferring data across the UART or SPI
interfaces and there is not a lot more to learn here. There are 32
sockets that match up with the 32 endpoints so we can have several
conversations going on at the same time. To get maximum USB
transfer speed we need to generate or consume data at 400 MBps
and the UART, SPI or even CPU cannot maintain this rate. GPIF II
is designed to sustain this rate and we shall do this in the next
Chapter but I wanted to start easy and grow into the higher speed.
default :
break ;
}
}
CyBool_t LPMRequest_Callback (CyU3PUsbLinkPowerMode link_mode)
{
return CyTrue;
}
// Spin up USB, let the USB driver handle enumeration
CyU3PReturnStatus_t InitializeUSB (void )
{
CyU3PReturnStatus_t Status;
Status = CyU3PUsbStart ();
CheckStatus("Start USB Driver" , Status);
// Setup callbacks to handle the setup requests, USB Events and LPM Requests (for USB 3.0)
CyU3PUsbRegisterSetupCallback(USBSetup_Callback, CyTrue);
CyU3PUsbRegisterEventCallback (USBEvent_Callback);
CyU3PUsbRegisterLPMRequestCallback (LPMRequest_Callback);
// Driver needs all of the descriptors so it can supply them to the host when requested
Status = SetUSBdescriptors();
CheckStatus("Set USB Descriptors" , Status);
/* Connect the USB Pins with super speed operation enabled. */
Status = CyU3PConnectState (CyTrue, CyTrue);
CheckStatus("Connect USB" , Status);
return Status;
}
I start the USB driver using CyU3PUsbStart() and this will
set up all the USB hardware, several DMA buffers and a thread to
handle almost all of the work. We need to register three callback
routines with the driver where we can customize the operation of the
driver for this particular application. The
CyU3PUsbRegisterSetupCallback() call tells the driver how to handle the
setup requests that will arrive on EP0; in general you set the second
parameter to CyTrue, which tells the driver to handle all setup
requests, such as the many received during enumeration, itself. It
will then only call the USBSetup_Callback routine for Class and
Vendor requests that it cannot handle.
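How a setup request is recognized as Standard, Class or Vendor can be shown with a small decoder. Bits 6..5 of the bmRequestType byte carry the request type, as defined in Chapter 9 of the USB 2.0 specification. The function names below are hypothetical, and goes_to_setup_callback is a simplification of the split described above, not a description of what the Cypress driver does internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* Request type field, bits 6..5 of bmRequestType (USB 2.0, Chapter 9). */
typedef enum {
    REQ_STANDARD = 0,
    REQ_CLASS    = 1,
    REQ_VENDOR   = 2,
    REQ_RESERVED = 3
} request_type_t;

/* Extract the request type from the first byte of a setup packet. */
request_type_t request_type(uint8_t bmRequestType)
{
    return (request_type_t)((bmRequestType >> 5) & 0x03);
}

/* Simplified routing rule: Standard requests stay with the driver,
 * Class and Vendor requests are offered to the application callback. */
bool goes_to_setup_callback(uint8_t bmRequestType)
{
    request_type_t type = request_type(bmRequestType);
    return type == REQ_CLASS || type == REQ_VENDOR;
}
```

For example, a GET_DESCRIPTOR request during enumeration arrives with bmRequestType 0x80 and stays with the driver, while a vendor command with bmRequestType 0x40 would reach the application.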
void StopApplication(void )
// USB connection has been lost, time to stop the application running
{
CyU3PEpConfig_t epConfig;
CyU3PReturnStatus_t Status;
glIsApplicationActive = CyFalse;
// Close down and disable the endpoint then close the DMA channel
CyU3PUsbFlushEp (CY_FX_EP_CONSUMER);
CyU3PMemSet ((uint8_t *)&epConfig, 0, sizeof (epConfig));
Status = CyU3PSetEpConfig (CY_FX_EP_CONSUMER, &epConfig);
CheckStatus("Disable Consumer Endpoint", Status);
Status = CyU3PDmaChannelDestroy (&glCPUtoUSB_Handle);
CheckStatus("Close CPUtoUSB DMA Channel", Status);
}
The USB driver does all of the heavy lifting for us and the
DMA driver moves data to where we need it. These two building
blocks make it straightforward to communicate with a host PC using
SuperSpeed USB allowing you to focus on the requirements of your
application and not on the low level details.
CDC Example
My next example builds a tool that we can use in later
examples. You may also find it useful while developing your
applications. This example also uses an OS-supplied class driver
and, from an implementation point of view, it is similar to the
keyboard example. The Communications Device Class (CDC) is
used by modems and terminal emulation programs such as Clear
Terminal and TeraTerm to move serial data between the host and a
serial device. By itself it is not too interesting but when combined
with other USB interfaces, as we shall do with the next example, it
will be a valuable addition to our tool chest. Figure 6.7 shows the
operation of this example.
glIsApplicationActive = CyFalse;
// Close down and disable the endpoints then close the DMA channels
CyU3PUsbFlushEp (CY_FX_EP_CONSUMER);
CyU3PUsbFlushEp (CY_FX_EP_PRODUCER);
CyU3PUsbFlushEp (CY_FX_EP_INTERRUPT);
CyU3PMemSet ((uint8_t *)&epConfig, 0, sizeof (epConfig));
Status = CyU3PSetEpConfig (CY_FX_EP_CONSUMER, &epConfig);
CheckStatus("Disable Consumer Endpoint" , Status);
Status = CyU3PSetEpConfig (CY_FX_EP_PRODUCER, &epConfig);
CheckStatus("Disable Producer Endpoint" , Status);
Status = CyU3PSetEpConfig (CY_FX_EP_INTERRUPT, &epConfig);
CheckStatus("Disable Interrupt Endpoint" , Status);
#if (DirectConnect)
Status = CyU3PDmaChannelDestroy(&glUSBtoUART_Handle);
CheckStatus("Close USBtoUART DMA Channel" , Status);
Status = CyU3PDmaChannelDestroy(&glUARTtoUSB_Handle);
CheckStatus("Close UARTtoUSB DMA Channel" , Status);
#else
Status = CyU3PDmaChannelDestroy (&glUSBtoCPU_Handle);
CheckStatus("Close USBtoCPU DMA Channel" , Status);
Status = CyU3PDmaChannelDestroy (&glCPUtoUSB_Handle);
CheckStatus("Close CPUtoUSB DMA Channel" , Status);
CyU3PMemFree (UserBuffer.buffer );
#endif
}
BulkLoop Firmware
The firmware will decide how big a chunk is. To receive data
from an OUT endpoint we must set up a DMA channel to receive
data: we choose the USB OUT endpoint as a Producer Socket then
select a BufferSize and BufferCount. Chunk is measured in bytes
and is (BufferSize * BufferCount). Once the DMA controller has
accepted chunk bytes from the Producer Socket then further
attempts by the host to send data will be NAKed. What does the
DMA controller do with this data?
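The chunk arithmetic is simple enough to write down. The sketch below uses hypothetical names (dma_channel_shape_t, chunk_bytes, bytes_until_nak) and assumes that no buffer is drained while the host is sending, so it models only the worst case in which the Producer Socket fills every buffer.

```c
#include <stdint.h>

/* Hypothetical description of the buffers behind one DMA channel. */
typedef struct {
    uint32_t buffer_size;   /* bytes per DMA buffer */
    uint32_t buffer_count;  /* number of DMA buffers in the channel */
} dma_channel_shape_t;

/* Chunk = BufferSize * BufferCount, measured in bytes. */
uint32_t chunk_bytes(const dma_channel_shape_t *shape)
{
    return shape->buffer_size * shape->buffer_count;
}

/* Bytes the host can still send before the OUT endpoint starts to NAK,
 * assuming nothing has been consumed from the buffers yet. */
uint32_t bytes_until_nak(const dma_channel_shape_t *shape, uint32_t accepted)
{
    uint32_t chunk = chunk_bytes(shape);
    return (accepted >= chunk) ? 0 : chunk - accepted;
}
```

With four buffers of 1024 bytes the chunk is 4096 bytes, so a host that has already delivered 4096 bytes will see NAKs until the firmware frees a buffer.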
Streamer firmware
There are three examples of Streamer firmware; they are the
same code but use bulk, isochronous or interrupt endpoints. A
Streamer is an infinite source and sink of data. It pre-fills the DMA
buffers with known data and keeps them staged at an IN endpoint
Consumer Socket. It fills buffers with the data that arrives at an OUT
endpoint Producer Socket then discards the data so that it can
recycle the buffer as fast as possible.
The DMA channels are set up as MANUAL but since the CPU
is doing no real work on the data, and is just recycling buffers, then it
can easily keep up with USB 3.0 data rates. It is expected that the
PC is using asynchronous, overlapped transfers to get the maximum
data throughput rates. This firmware is useful in initial testing of PC
application software before the device hardware is ready. Once the
GPIF interface is up and running it can stream data faster than
the PC can keep up with; we will study this in the next chapter.
Other examples
I put USB-to-GPIF in this category and we will cover this in
later chapters. There are also examples that I don't cover in this
SuperSpeed Device book. The FX3 also runs at high speed (480
Mbps) and at this speed also supports USB host mode including
OTG. Cypress has worked examples covering these modes and
application notes that describe the implementation details (see
AN77960 Introduction to EZ-USB® FX3™ High-Speed USB Host
Controller). Note that the FX3 does not include a root hub, which
means that it can only talk to a single device (and that device cannot be a
hub). For point-to-point communications, such as being an OTG
host to an Android phone for example, then this is fine. You can find
several FX3 to Android projects in Unboxing Android USB by
Rajaram Regupathy.
Chapter Summary
In this Chapter we looked at connecting the FX3 to a
SuperSpeed USB bus and we wrote several firmware programs that
used Class Drivers on the PC host; this enabled us to focus on the
FX3’s device role and the firmware we needed to write for successful
communications. The USB and DMA device drivers do most of the
‘grunt’ work and we used a high level API to access these drivers –
this enabled us to focus on the application function and not on low
level USB issues. Cypress supplies a collection of example
programs which will help you with your projects.
Figure 7.2 shows the Visual Studio projects available with the
SDK that we will be exploring in the next two chapters. Cypress-
supplied software and drivers are shown in red and my examples are
shown in blue.
Figure 7.3 shows the code used to look for an FX3 device
(actually any device that the CyUsb3.sys driver recognizes by its
GUID). The important line is number 16; here we will create a new
instance of CCyUSBDevice. The CyAPI library searches all
attached USB devices for those that match the CYUSBDRV_GUID
and populates USBDevice objects with information it extracts from
each device. The device object is extensive and its structure can be
viewed in CyAPI.h; it contains properties and methods that we can
access. The method USBDevice->DeviceCount() retrieves a count
of the matching devices.
Figure 7.4 shows the few lines added to check that the
discovered device is running bulkloop firmware and then uses the
USBDevice->BulkOutEndPt->XferData method to download some
test data to the board. We have discovered, opened, written to and
closed the FX3 device in less than 10 lines of code. I said that this
wouldn’t be difficult!
#include "stdafx.h"
printf("\nSendFile V0.3\n" );
printf("\nUse CR to EXIT\n" );
// The DOS box typically exits so fast that the developer doesn't see any messages
// Hold the box open until the user enters a character, any character
while (!_kbhit()) { }
return 0;
}
CollectData
Our next example is a GUI-based CollectData program,
shown in Figure 7.6, and this will use overlapped USB transfers to
get maximum throughput from the SuperSpeed Explorer board.
Let me first explain what the program does and then we will
study how it does it. CollectData uses the same technique as
SendFile to identify FX3 based devices, however this time we are
looking for a Streamer interface rather than a bulkloop interface.
The program discovers any device that matches the CyUsb3.sys
GUID but it is designed to operate with a streamer interface.
We can choose to receive the data and discard it and this will
give us maximum throughput numbers; I included this as a debug
aid. I intend to save the data from the FX3 device into a file that we
can later examine. Writing the data to a disk file will not be able to
keep up with SuperSpeed data transfer rates and some data will be
dropped; we will study which data is dropped and why in a later
chapter. There is a time limit for file transfers since this
program can quickly fill up your hard drive if left running for a few
minutes – I suggest setting this to 30 seconds. When the Start
button is clicked the program gets ready to receive data then signals
the FX3 to send data. Data is then received and saved, as best it
can, to disk. The program calculates and displays the rate of data
saved (not data received) and this value will be a performance metric
for your hard disk system.
Assuming that the stop button has not been pressed nor the
time limit expired, the thread immediately resubmits the buffer with
another BeginDataXfer(). This keeps the queue of overlapped
transactions full, so all layers of the USB driver stack stay busy with work. We
continue around and around this loop until the stop condition is true.
Once stop is requested we send another vendor command to the
SuperSpeed Explorer board so that it can stop generating data and
then we wait for the DiskWrite thread to finish the backlog of buffers
then give control back to the user.
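The resubmit loop described above can be illustrated with a small host-only simulation. This is a sketch under stated assumptions, not the CollectData source: FakeEndpoint, begin(), finish() and collect() are hypothetical names invented here to stand in for the overlapped transfer calls, and the "device" simply delivers an incrementing counter.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a streaming IN endpoint (not the Cypress API).
// Requests complete in the order they were queued and deliver counter data.
struct FakeEndpoint {
    std::uint32_t nextValue = 0;
    std::deque<std::size_t> inFlight;   // indices of queued buffers, oldest first

    // Queue a buffer for a future transfer.
    void begin(std::size_t bufferIndex) { inFlight.push_back(bufferIndex); }

    // Complete the oldest queued request, fill its buffer, return its index.
    std::size_t finish(std::vector<std::vector<std::uint32_t>>& buffers) {
        std::size_t index = inFlight.front();
        inFlight.pop_front();
        for (std::uint32_t& word : buffers[index]) word = nextValue++;
        return index;
    }
};

// Keep queueDepth requests outstanding. Each completed buffer is consumed
// and then resubmitted at once so the queue never runs dry.
std::vector<std::uint32_t> collect(std::size_t queueDepth,
                                   std::size_t wordsPerBuffer,
                                   std::size_t completions) {
    std::vector<std::uint32_t> received;
    if (queueDepth == 0) return received;   // nothing can be queued

    std::vector<std::vector<std::uint32_t>> buffers(
        queueDepth, std::vector<std::uint32_t>(wordsPerBuffer));
    FakeEndpoint endpoint;

    for (std::size_t i = 0; i < queueDepth; ++i) endpoint.begin(i);  // prime the queue

    for (std::size_t n = 0; n < completions; ++n) {
        std::size_t index = endpoint.finish(buffers);   // oldest transfer is done
        received.insert(received.end(),
                        buffers[index].begin(), buffers[index].end());
        endpoint.begin(index);                          // resubmit immediately
    }
    return received;
}
```

Calling collect(4, 8, 10) keeps four buffers in flight and returns 80 consecutive counter values; in the real program the "consume" step would hand the buffer to the disk-writing thread instead of copying it.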
Cypress PC Utilities
Within the original Cypress SDK installation three PC utilities,
BulkLoop, Streamer and USB Control Center, were installed. The
source code for these utilities is also included for your review; in fact,
both C++ and C# implementations of BulkLoop and Streamer are
available. USB Control Center makes heavy use of forms so is only
supplied in C#. BulkLoop uses synchronous transfers, Streamer
uses asynchronous, overlapped transfers and the USB Control
Center can talk to all devices supported by CyUsb3.sys.
BulkLoop Utility
A BulkLoop device is identified by a VID_PID combination of
0x04B4_0x00F0 and its base structure and human interface are
shown in Figure 7.9.
Streamer Utility
A Streamer device is identified by a VID_PID combination of
0x04B4_0x00F1 and its base structure and human interface are
shown in Figure 7.10.
The USB 3.0 LoopBack plug comes with test software similar
to that described in this chapter but with added features such as
logging as shown in Figure 7.14. Multiple units can be attached to a
PC so that all the USB ports can be tested at the same time.
Passmark have stress test software that supports development and
burn in testing.
Producers
Whereas AbstractProducer defines the interface for
producers, it doesn’t define any actual capability. That is left to the
three concrete producers, shown in Figure 8.9. Throughout the code
and this document, a Producer or source always puts data into a
buffer (writes) and a Consumer or sink always removes data
(reads) from a buffer.
Figure 8.9 Concrete Producers
MemoryProducer is the simplest of the three and the
fastest. Upon construction, it simply fills a buffer of the appropriate
size (determined by the BytesPerWrite variable) and constantly stuffs
that memory block into its circular buffer, as fast as possible. “As
fast as possible” depends on many things, including the specifics of
your system, the number of bytes written per write, and the size of
the buffer. The latter is a consequence of the overhead associated
with putting data into the synchronized circular buffer – locking the
buffer, writing one byte, updating the state, and unlocking the buffer
will generate much worse performance than writing a much larger
chunk. Write sizes in the megabytes-per-operation range perform
significantly better.
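The per-write overhead can be made visible by counting how often the buffer lock is taken. The following is a minimal sketch, not the book's circular buffer class: LockedRing and writeLocksFor() are hypothetical names, and a consumer is assumed to drain the buffer after every write so that each write always fits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical synchronized circular buffer: every call takes the lock once.
class LockedRing {
public:
    explicit LockedRing(std::size_t capacity)
        : data_(capacity == 0 ? 1 : capacity) {}

    // Copy up to n bytes in; returns how many fitted.
    std::size_t write(const std::uint8_t* src, std::size_t n) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++writeLocks_;
        std::size_t todo = std::min(n, data_.size() - used_);
        for (std::size_t i = 0; i < todo; ++i)
            data_[(head_ + i) % data_.size()] = src[i];
        head_ = (head_ + todo) % data_.size();
        used_ += todo;
        return todo;
    }

    // Copy up to n bytes out; returns how many were available.
    std::size_t read(std::uint8_t* dst, std::size_t n) {
        std::lock_guard<std::mutex> guard(mutex_);
        std::size_t todo = std::min(n, used_);
        for (std::size_t i = 0; i < todo; ++i)
            dst[i] = data_[(tail_ + i) % data_.size()];
        tail_ = (tail_ + todo) % data_.size();
        used_ -= todo;
        return todo;
    }

    std::size_t writeLocks() const { return writeLocks_; }

private:
    std::mutex mutex_;
    std::vector<std::uint8_t> data_;
    std::size_t head_ = 0, tail_ = 0, used_ = 0, writeLocks_ = 0;
};

// Number of lock acquisitions a producer needs to push totalBytes through
// the ring when it writes bytesPerWrite bytes at a time.
std::size_t writeLocksFor(std::size_t totalBytes, std::size_t bytesPerWrite) {
    if (bytesPerWrite == 0) return 0;
    LockedRing ring(bytesPerWrite);
    std::vector<std::uint8_t> chunk(bytesPerWrite, 0xA5), sink(bytesPerWrite);
    std::size_t sent = 0;
    while (sent < totalBytes) {
        std::size_t want = std::min(bytesPerWrite, totalBytes - sent);
        sent += ring.write(chunk.data(), want);
        ring.read(sink.data(), sink.size());   // assumed consumer drains the ring
    }
    return ring.writeLocks();
}
```

Moving 4096 bytes one byte at a time takes 4096 lock acquisitions, while 1024-byte writes need only four; the copy work is the same, so the difference is pure synchronization overhead.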
Consumers
Consumers, as shown in Figure 8.10, are completely
symmetric with producers in function. Consumers pull data out of
circular buffers and do something with the data. MemoryConsumers
simply drop the data on the floor. They typically have the capacity to
stay well ahead of most producers. FileConsumers take incoming
data and write it to disk. Due to system buffering on files,
FileConsumers can often sustain very high throughputs for short
durations of a few seconds, but ultimately slow down as buffers fill
faster than disks can write – unless you have a very fast SSD
device, a FileConsumer typically cannot keep up with a USB 3.0
producer.
OverlappedIO
The OverlappedIO class, shown in Figure 8.11, is central to
throughput maximization with the Cypress library. When top speed
is of no concern, I recommend that you use the blocking XferData()
function to read or write to an endpoint as I did in SendFile since this
is MUCH easier to use and is still fast. For many applications, the
throughput and latency involved in using synchronous I/O is
sufficient. However, if your processing needs are such that
XferData() is not good enough, and it certainly isn’t for system
benchmarking, then asynchronous I/O is necessary.
OverlappedIO functions:
BeginTransfer() and BeginTransfer(begin, end) – These
functions initiate a USB transaction. An OverlappedIO object can be
attached to either a bulk IN endpoint or a bulk OUT endpoint – the
library keeps track of which is active through the internal
CCyUSBEndPoint data member with which the OverlappedIO class
was constructed. Calling the version with no arguments initiates an
asynchronous read from USB, while calling the iterator version first
copies the sequential chunk of data from [begin, end) (using the
same half-open range convention as the C++ standard library, where begin
points to the first byte of data and end points to one past the last byte)
and starts a write transfer. Internal variables are set such that the
OverlappedIO instance knows that a transfer has started, hanging on
to references to the Windows overlapped I/O structure and the
Cypress completion token necessary to abort or finish the transfer.
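The two forms of the call can be sketched as follows. This is only an illustration of the calling convention, assuming a single transfer in flight per object: TransferSketch and its members are hypothetical names, the class is not the book's OverlappedIO implementation, and no USB traffic is started.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch; not the book's OverlappedIO class nor the Cypress API.
class TransferSketch {
public:
    enum class Direction { Idle, Read, Write };

    // No-argument form: start an asynchronous read.
    bool beginTransfer() {
        if (direction_ != Direction::Idle) return false;   // one transfer at a time
        staged_.clear();
        direction_ = Direction::Read;
        return true;
    }

    // Iterator form: copy the range from begin up to (not including) end,
    // then start a write.
    template <typename Iterator>
    bool beginTransfer(Iterator begin, Iterator end) {
        if (direction_ != Direction::Idle) return false;
        staged_.assign(begin, end);
        direction_ = Direction::Write;
        return true;
    }

    // Mark the transfer as finished so another one can be started.
    void finishTransfer() { direction_ = Direction::Idle; }

    Direction direction() const { return direction_; }
    const std::vector<std::uint8_t>& staged() const { return staged_; }

private:
    Direction direction_ = Direction::Idle;
    std::vector<std::uint8_t> staged_;   // private copy of the data to send
};
```

Given an array data of four bytes, beginTransfer(data + 1, data + 3) stages exactly the second and third bytes, which is the one-past-the-end convention in action. Copying the data first means the caller may reuse its own buffer while the transfer is pending.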
USB Engine
The USBEngine class is the “traffic-cop” for the application. It
manages the creation of producers and consumers, links them up to
their respective buffers, starts the underlying threads on which each
runs, monitors the throughput, and periodically reports to any
interested party on the progress of the test. The USBEngine has the
most complex class diagram because of these multiple
responsibilities, as shown in Figure 8.12.
Acknowledgement:
The last major FX3 block to learn is the GPIF II block, which I
shall abbreviate to just ‘GPIF’. I have left this until last since it takes
a while to get your head around its basic functionality, let alone its
amazing capabilities. By now you should be comfortable with the
FX3’s DMA engine and the API used to control it. You will learn a
little more about the DMA engine in this Chapter as I expose some
features that it has but that we haven't used until now.
Figure 9.1 shows the CPLD board that was designed for this
Chapter and was repurposed in Chapter 5 to demonstrate low
speed IO capability. The board contains a Xilinx XC2C128 CPLD
connected to all of the FX3 high-speed DQ and DQ control lines. I
have included a CPLD programmer project in the Reference Section
that enables the FX3 to reprogram the CPLD using its JTAG
connection. No other hardware is required.
module Counter1(
input PCLK,
input RESET,
output reg WR_n,
output [31:0] DQ,
output [7:0] LED
);
reg [31:0] Counter;
assign LED = ~Counter[31:24];
assign DQ = Counter;
Setting up GPIF II
GPIF II is a soft-loaded state machine that powers up in the off
state. To get GPIF to do useful work we must program it and this is
done using an external tool called GPIF II Designer. Creating a state
machine for the GPIF is at the opposite end of the scale from writing
equations with Verilog. With GPIF II Designer you create state
machine pictures using a graphical editor where states are drawn in
boxes and transitions are drawn as lines between these boxes.
Actions are assigned to each state. GPIF has a 32-bit address
counter, a 32-bit data counter and a 16-bit control counter, and
matching comparators that you can use. A state action could include
incrementing one of these counters, setting an IO pin, reading or
writing from the GPIF pins, reading or writing from a DMA socket or
interrupting the CPU. A state transition could be triggered by a comparison on
one of these counters, the value of an IO pin, the value of a DMA
flag (we will add this in the next iteration of the example) or a signal
from the CPU. You can define up to 256 states which should be way
more than anyone will use. You can also implement several
independent state machines within the structure providing that there
are no more than 256 total states. One implementation restriction
that we will hit later in this Chapter is that only one or two conditions
can be evaluated for a state transition. You can design around this
using extra or mirror states and the tool will help you construct
these. There is a lot more that I could say but rather than repeat a
lot of text here I refer you to Getting Started with GPIF II Designer.
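To make the idea of states, actions and transitions concrete, here is a toy software model. It is a sketch only: ToyGpif, its three states and its members are invented for illustration and are not output from GPIF II Designer. It models one input pin, the data counter with its comparator, and a CPU interrupt.

```cpp
#include <cstdint>

// Toy model only; names and behaviour are illustrative assumptions.
enum class State : std::uint8_t { WaitStart, Capture, Signal };

struct ToyGpif {
    State state = State::WaitStart;
    std::uint32_t dataCounter = 0;     // models the 32-bit data counter
    std::uint32_t dataLimit = 0;       // value loaded into the comparator
    std::uint32_t wordsCaptured = 0;   // words "read from the GPIF pins"
    bool cpuInterrupt = false;

    // Advance the machine by one clock; startPin is a sampled input pin.
    void clock(bool startPin) {
        switch (state) {
        case State::WaitStart:
            // Transition condition: the value of an IO pin.
            if (startPin) { dataCounter = 0; state = State::Capture; }
            break;
        case State::Capture:
            ++wordsCaptured;           // action: sample the data pins
            ++dataCounter;             // action: increment a counter
            // Transition condition: comparator match on the data counter.
            if (dataCounter == dataLimit) state = State::Signal;
            break;
        case State::Signal:
            cpuInterrupt = true;       // action: interrupt the CPU
            state = State::WaitStart;
            break;
        }
    }
};
```

With dataLimit set to 4, one clock with startPin high followed by four more clocks captures four words and lands in Signal; the next clock raises the interrupt flag. Each transition here depends on a single condition, which stays within the one-or-two-condition restriction mentioned above.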
Design Stage 1
Set the limit to 10 seconds and click START as shown in
Figure 9.6.
Most PCs can keep up with the initial data but then buffers in
the PC get full and start to be over-written (CollectData was
designed this way, it collects data as fast as it can from USB at the
expense of over-writing buffers before they have been written to
disk). You need to run for 20-30 seconds to see the typical
throughput but this generates enormous files that Excel can’t open.
So locate and run a utility called CheckData – this looks through the
CollectData.bin file and reports discontinuities in the data. Click,
drag and drop the data file onto CheckData.exe.
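The kind of check such a utility performs can be sketched in a few lines. This is not the CheckData source: it assumes the captured file has already been loaded as a sequence of 32-bit counter values, and Gap and findGaps() are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Gap {
    std::size_t index;       // position of the first sample after the gap
    std::uint32_t missing;   // counter values skipped (modulo 2^32)
};

// Report every place where a sample is not exactly one more than the
// previous sample. A counter that rolls over from 0xFFFFFFFF to 0 is
// treated as continuous.
std::vector<Gap> findGaps(const std::vector<std::uint32_t>& samples) {
    std::vector<Gap> gaps;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        std::uint32_t expected = samples[i - 1] + 1u;   // wraps at 2^32
        if (samples[i] != expected)
            gaps.push_back({i, samples[i] - expected});
    }
    return gaps;
}
```

For the samples 5, 6, 7, 107, 108 the function reports one gap at index 3 with 99 missing values. A backward jump, as happens when an older buffer is written after a newer one, shows up as a very large "missing" count because the subtraction is modulo 2^32.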
The large gaps are caused by the USB transfer not being able
to keep up with the CPLD’s 400 MBps data rate. The only solution is
to reduce the data rate so that it is less than the average data rate as
reported by the CollectData application. We could put some data
compression at our data source; this is viable in an FPGA design but
there isn't the capacity in our small CPLD so I will take the simpler
approach of reducing PCLK.
At the debug console enter the keyword PCLK and the FX3
will display its current value. You can now enter PCLK+ or PCLK- to
increase and decrease the clock driving the CPLD state machines
which will have the effect of changing the data rate of the
incrementing counter. Figure 9.7 shows the code behind this PCLK
command. You should run this example several times until the data
rate is low enough that your host computer can keep up. Use
different save filenames to collect the data if you would like to
compare results.
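A quick calculation helps when choosing a starting point for PCLK. The sketch below assumes that the clock is derived by dividing a 400 MHz system clock (as the 400/4 = 100MHz comment in the GPIF clock setup code later in the book suggests) and that one 32-bit word is produced per clock. The function names and the divider limits passed in are hypothetical; the dividers actually supported by the hardware are not modelled.

```cpp
#include <cstdint>

// Bytes per second produced when one busBytes-wide word is sent on every
// clock of sysClockHz / divider. The divider must not be zero.
std::uint64_t sourceRate(std::uint64_t sysClockHz, std::uint32_t divider,
                         std::uint32_t busBytes) {
    return sysClockHz * busBytes / divider;
}

// Smallest divider in [minDivider, maxDivider] whose data rate does not
// exceed hostRate (bytes per second). Returns 0 if no divider is slow enough.
std::uint32_t smallestDivider(std::uint64_t sysClockHz, std::uint32_t busBytes,
                              std::uint64_t hostRate,
                              std::uint32_t minDivider, std::uint32_t maxDivider) {
    for (std::uint32_t d = minDivider; d <= maxDivider; ++d) {
        if (d != 0 && sourceRate(sysClockHz, d, busBytes) <= hostRate) return d;
    }
    return 0;
}
```

With a divider of 4 the source produces 400,000,000 bytes per second. If CollectData reports that the host sustains about 150,000,000 bytes per second, smallestDivider(400000000, 4, 150000000, 2, 64) returns 11, which gives roughly 145 MBps.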
Design Stage 2
The small gaps of non-monotonic counter data in
CollectedData.txt are due to the latency when switching DMA buffers
at the GPIF block. We drop between 50 and 150 counts as the
buffers are switched. The FX3 solves this with more hardware which
Cypress unfortunately calls a thread. To distinguish this feature from
the RTOS threads that were described in Chapter 3, I shall refer
to these new threads as hardware threads throughout this Chapter.
dmaMultiConfig.cb = DualGpifToUsbDmaCallback;
Status = CyU3PDmaMultiChannelCreate(&glDualGPIF2USB_Handle,
CY_U3P_DMA_TYPE_AUTO_MANY_TO_ONE, &dmaMultiConfig);
CheckStatus("DmaMultiChannelCreate", Status);
// Start the DMA Channel with transfer size to Infinite and with PING (Offset = 0)
Status = CyU3PDmaMultiChannelSetXfer(&glDualGPIF2USB_Handle, 0, 0);
CheckStatus("DmaMultiChannelStart", Status);
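The effect of a second hardware thread can be estimated with a simple clock-by-clock model. This is a sketch under simplifying assumptions, not a measurement: samplesLost() is a hypothetical helper, one sample arrives on every clock, and a second thread is assumed to prepare its buffer while the first one is filling.

```cpp
#include <cstdint>

// Count the samples dropped over a number of clocks.
//   bufferWords  - samples that fit in one DMA buffer
//   switchClocks - clocks needed to make the next buffer available
//   twoThreads   - true when a second hardware thread overlaps the switch
std::uint64_t samplesLost(std::uint64_t clocks, std::uint32_t bufferWords,
                          std::uint32_t switchClocks, bool twoThreads) {
    if (bufferWords == 0) return clocks;   // nowhere to put any sample

    // With two threads the switch is hidden behind the fill of the other
    // buffer, unless the switch takes longer than a whole buffer fill.
    std::uint32_t stallPerSwitch = switchClocks;
    if (twoThreads) {
        stallPerSwitch = switchClocks > bufferWords ? switchClocks - bufferWords : 0;
    }

    std::uint64_t lost = 0;
    std::uint32_t filled = 0;
    std::uint32_t stall = 0;
    for (std::uint64_t c = 0; c < clocks; ++c) {
        if (stall > 0) { --stall; ++lost; continue; }   // buffer not ready: drop
        if (++filled == bufferWords) { filled = 0; stall = stallPerSwitch; }
    }
    return lost;
}
```

Using an illustrative switch time of 100 clocks, which lies within the 50 to 150 range quoted above, and 4096-word buffers, a single thread loses 100 samples at every buffer boundary while the two-thread case loses none.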
Design Stage 3
You may want more proof that no data is being lost so in this
stage we give the CPLD access to the DMA flags. This simplifies
the GPIF state machine as shown in the top portion of Figure 9.11
but shifts the complexity to the CPLD as shown in the bottom portion
of Figure 9.11. Design of a GPIF interface to external hardware is an
iterative process since these two units cooperate in solving the
problem.
module CPLDinControl(
input ClockIn, // From FX3
input RESET, // 0 = resets counter, 1 = Supply data
input DMA0_Ready, // 0 = can accept data, 1 = busy and samples will be missed
input DMA1_Ready, // 0 = can accept data, 1 = busy and samples will be missed
output [31:0] GPIF, // CPLD drives a counter onto GPIF
output WR_N, // 0 = no data sent, 1 = sample data being sent
output SelectDMA, // CPLD chooses FX3 DMA Buffer (actually, Thread)
output [7:0] LED // Some user feedback
);
end
endmodule
Chapter Summary
The story at both customers was the same – it was close to the
end of the project and the FX3 firmware writer was running short of
RAM so he changed the endpoint burst size from 16 to 8 and this
reduced DMA_BUFFER_SIZE from 16KB to 8KB. But he didn’t tell
the FPGA designer. The system now started to have data errors and
the team looked everywhere. Nobody suspected the FPGA since “that
has been working correctly for months”. Having a software
dependency in the hardware is never a good thing.
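The hidden dependency is easy to state as arithmetic. The sketch below assumes a 1024-byte SuperSpeed bulk packet and a 32-bit data bus, which matches the 16KB and 8KB figures quoted above; the function names are hypothetical and only illustrate the relationship.

```cpp
#include <cstdint>

// DMA buffer size in bytes for a given packet size and endpoint burst length.
constexpr std::uint32_t dmaBufferBytes(std::uint32_t packetBytes,
                                       std::uint32_t burstLength) {
    return packetBytes * burstLength;
}

// Number of bus words that fill one DMA buffer.
constexpr std::uint32_t wordsPerBuffer(std::uint32_t bufferBytes,
                                       std::uint32_t busBytes) {
    return bufferBytes / busBytes;
}

// True when a word count built into the CPLD/FPGA still matches the firmware.
constexpr bool hardwareMatchesFirmware(std::uint32_t hardwareWordCount,
                                       std::uint32_t packetBytes,
                                       std::uint32_t burstLength,
                                       std::uint32_t busBytes) {
    return hardwareWordCount ==
           wordsPerBuffer(dmaBufferBytes(packetBytes, burstLength), busBytes);
}

// A burst of 16 gives 16KB buffers, which is 4096 32-bit words per buffer.
static_assert(wordsPerBuffer(dmaBufferBytes(1024, 16), 4) == 4096, "burst of 16");
// A burst of 8 halves both numbers; a counter still set to 4096 is now wrong.
static_assert(wordsPerBuffer(dmaBufferBytes(1024, 8), 4) == 2048, "burst of 8");
```

A hardware counter fixed at 4096 words agrees with the firmware at a burst of 16 but not at a burst of 8, which is exactly the mismatch that caused the data errors in the story.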
If you have good documentation and a good team process
then using a counter in the CPLD/FPGA is a solid, simple approach.
There is a better way.
The FX3 starts the process by releasing the CPLD from reset
(how is explained in a few paragraphs). The GPIF slave, which is
using socket 0, has already started and it is in the WAIT4WR state
waiting to be told that there is valid data on the DQ data lines that it
should capture. So let's look at how the CPLD master does this.
Figure 11.4 GPIF and CPLD State Machines for a Slave FIFO
Read
We measured the time it takes for a DMA buffer to become
available in Chapter 9; this was about 70 clocks at 100 MHz. We also
learnt in Chapter 9 that using two DMA sockets and hardware
threads could reduce this delay to 0 and this includes the 3 clock
delay on DMA0_Ready. I decided to keep this first example simple
so I am only using one socket. Note that a 70 clock delay with a 4096
clock burst is less than 2% degradation (70/4096 is about 1.7%).
The state machines continue around their main loops until the
user button is pressed again which toggles RUN. The CPLD state
machine moves to the STOP state where it also asserts LastData.
The GPIF state machine sees LastData asserted so then moves to
its SIGNAL state where an interrupt into the FX3 CPU is generated
which results in tidying up the final transfer.
Figure 11.5 Verilog code for the CPLD counter and state
machine
Code in top.v
module top(
inout [12:0] CTRL, [31:0] DQ,
input PCLK,
inout [7:0] User, I2C_SCL, I2C_SDA,
input [7:0] Button, GPIO45_n, SPI_SCK, SPI_SSN, RX_MOSI,
output [7:0] LED, TX_MISO, FlashCS_n, INT, TP_2
);
endmodule
Code in FifoMasterCounter.v
module FifoMasterCounter(
input PCLK, RESET, DMA0_Ready, DMA0_Watermark, PushButton,
output WR, LastData, [31:0] DQ, [7:0] LED, [7:0] User
);
// Generate a RUN signal from the PushButton; PushButton presses toggle RUN
// Note that this creates a different clock domain but this is OK
reg RUN;
always @ (negedge PushButton or posedge RESET) begin
if (RESET) RUN<=0; else RUN <= ~RUN;
end
// Define a State Machine for CPLD as FIFO Master, use one hot encoding
reg [4:0] CurrentState, NextState;
parameter IDLE = 5'b00001;
parameter WAIT4DMA = 5'b00010;
parameter WRITE = 5'b00100;
parameter PAUSE = 5'b01000;
parameter STOP = 5'b10000;
// Display internal variables on User port for debug
assign User = { CurrentState, RUN };
// Output signals are dependent upon the state machine
assign WR = (CurrentState == WRITE);
assign LastData = (CurrentState == STOP);
endmodule
// Start GPIF clocks, they need to be running before we attach a DMA channel to GPIF
pibClock.clkDiv = 4;
pibClock.clkSrc = CY_U3P_SYS_CLK; // 400/4 = 100MHz
pibClock.isHalfDiv = 0;
pibClock.isDllEnable = CyFalse;
Status = CyU3PPibInit(CyTrue, &pibClock);
CheckStatus("Start GPIF Clock", Status);
// Create a MANUAL channel since I need to look for the last packet
CyU3PMemSet((uint8_t *)&dmaConfig, 0, sizeof(dmaConfig));
dmaConfig.size = (EpSize[usbSpeed] * ENDPOINT_BURST_LENGTH);
dmaConfig.count = 4;
dmaConfig.prodSckId = GPIF_PRODUCER_SOCKET;
dmaConfig.consSckId = CONSUMER_ENDPOINT_SOCKET;
dmaConfig.dmaMode = CY_U3P_DMA_MODE_BYTE;
dmaConfig.notification = CY_U3P_DMA_CB_CONS_SUSP |
CY_U3P_DMA_CB_CONS_EVENT |
CY_U3P_DMA_CB_PROD_EVENT;
dmaConfig.cb = GpifToUsbDmaCallback;
Status = CyU3PDmaChannelCreate(&glGPIF2USB_Handle,
CY_U3P_DMA_TYPE_MANUAL, &dmaConfig);
CheckStatus("DmaChannelCreate", Status);
Code in top.v
module top(
inout [12:0] CTRL, [31:0] DQ,
input PCLK,
inout [7:0] User, I2C_SCL, I2C_SDA,
input [7:0] Button, GPIO45_n, SPI_SCK, SPI_SSN, RX_MOSI,
output [7:0] LED, TX_MISO, FlashCS_n, INT, TP_2
);
// Need to assign inputs else they get optimized away
assign TP_2 = RX_MOSI & SPI_SCK & SPI_SSN & I2C_SCL & I2C_SDA;
// Assign fixed outputs not used in this example
assign FlashCS_n = 1'b1;
assign INT = 1'b0;
assign TX_MISO = 1'bZ;
// Include the Counter
FifoMasterCounter Counter (
.PCLK(PCLK),
.RESET(CTRL[10]),
.DMA1_Ready(CTRL[6]),
.DMA1_Watermark(CTRL[6]),
.PushButton(GPIO45_n),
.RD(CTRL[1]),
.LastData(CTRL[3]),
.DQ(DQ),
.LED(LED),
.User(User)
);
endmodule
Code in FifoMasterCounter.v
module FifoMasterCounter(
input PCLK, RESET, DMA1_Ready, DMA1_Watermark, PushButton, LastData, [31:0] DQ,
output RD, [7:0] LED, [7:0] User
);
// Define our counter which will provide data
reg [31:0] Counter;
// Display data movement during transfers
assign LED = ~DQ[31:24];
// Generate a RUN signal from the PushButton; PushButton presses toggle RUN
reg RUN;
always @ (negedge PushButton or posedge RESET) begin
if (RESET) RUN<=0; else RUN <= ~RUN;
end
// Define a State Machine for CPLD as FIFO Master, use one hot encoding
reg [4:0] CurrentState, NextState;
parameter IDLE = 5'b00001;
parameter WAIT4DMA = 5'b00010;
parameter READ = 5'b00100;
parameter PAUSE = 5'b01000;
parameter STOP = 5'b10000;
// Display internal variables on User port for debug
assign User = { CurrentState, RUN };
// Output signals are dependent upon the state machine
assign RD = (CurrentState == READ);
always @ (posedge PCLK or posedge RESET) begin
if (RESET) begin
CurrentState <= IDLE;
Counter <= 0;
end
else begin
CurrentState <= NextState;
end
end
// Calculate next state using combinational logic
always @ (*) begin
// Default is to stay in the current state
NextState = CurrentState;
case (CurrentState)
IDLE: if (RUN) NextState = WAIT4DMA; //else NextState = IDLE;
WAIT4DMA: if (DMA1_Ready) NextState = READ; else
if (~RUN) NextState = IDLE; // else NextState = WAIT4DMA;
READ: if (~RUN) NextState = STOP; else
if (DMA1_Watermark) NextState = PAUSE; // else NextState = READ;
PAUSE: if (~DMA1_Ready) NextState = WAIT4DMA; // else NextState = PAUSE;
STOP: if (!RUN) NextState = IDLE;
default: NextState = IDLE; // Should never get here
endcase
end
endmodule
The last step in this Slave FIFO example set is to make the
data transfer bi-directional which means combining the two
examples; I had to extend the names of a few of the signals. Figure
11.11 shows the combined interface signals and Figure 11.12 shows
the combined state machines. The matching Verilog code is not
shown since it is a concatenation of the two previous examples but
note that the state machine had to be extended to 7 bits to allow for
the additional states. The code is available in the CPLD Code
examples folder for review. There are two sockets needed for the bi-
directional transfer and Figure 11.11 shows that I am using sockets 0
and 1 and the matching hardware threads 0 and 1. The combined
DMA initialization did not include anything special so I decided to
save space and not present a Figure. It too is available in the
examples directory as the GPIF_Example6 project for review.
Figure 11.11 Interface signals for Slave FIFO Read and Write
Figure 11.12 State machines for Slave FIFO Read and Write
The CPLD already includes the bi-directional code so there is
no need to reprogram it at this time. Load and run
GPIF_Example6.img using the USB Control Center. I tried to
demonstrate bi-directional data transfer by interleaving CPLD reads
and writes on alternate 32-bit data samples but the performance was
so poor due to the many additional states needed to turn the bus
around that I was embarrassed to include it as an example. The
typical use of this bi-directional interface is to sometimes move a lot
of data in one direction and then move a lot of data in the other
direction. This is how a hard disk drive or multifunction device, such
as a printer/scanner operates and this performance is exceptional;
we shall see this in a moment.
You can use CollectData to read from the CPLD or the USB
Control Center to write to the CPLD as in the previous two
examples. The FX3 project can run both Slave FIFO Read and
Write cycles, but you need to select which the CPLD is going to do
and I did this with Switch 6.
Figure 11.15 shows the GPIF state machine and CPLD state
machine. I decided for this example, to let the GPIF state machine
count cycles to determine when the DMA buffer is full so that you
could see an alternative solution. Since DMA_BUFFER_SIZE is a
global FX3 project constant, we should not get tripped up as
with the Slave FIFO case.
Figure 11.15 GPIF and CPLD state machines for Master FIFO
Read
The FX3 code is the same as the Slave FIFO Read example
with the only difference being that a different GPIF state machine was
included. So neither the FX3 firmware nor the PC knows that the GPIF
interface is now operating as a master.
Code in top.v
module top(
inout [12:0] CTRL, [31:0] DQ,
input PCLK,
inout [7:0] User, I2C_SCL, I2C_SDA,
input [7:0] Button, GPIO45_n, SPI_SCK, SPI_SSN, RX_MOSI,
output [7:0] LED, TX_MISO, FlashCS_n, INT, TP_2
);
// Need to assign inputs else they get optimized away
assign TP_2 = RX_MOSI & SPI_SCK & SPI_SSN & I2C_SCL & I2C_SDA;
// Assign fixed outputs not used in this example
assign FlashCS_n = 1'b1;
assign INT = 1'b0;
assign TX_MISO = 1'bZ;
// Include the Counter
FifoSlaveCounter Counter (
.PCLK(PCLK),
.RESET(CTRL[10]),
.PushButton(GPIO45_n),
.RD(CTRL[1]),
.LastData(CTRL[3]),
.DQ(DQ),
.LED(LED),
.FIFO_Empty(CTRL[6]),
.User(User)
);
endmodule
Code in FifoSlaveCounter.v
module FifoSlaveCounter(
input PCLK, RESET, PushButton, RD,
output LastData, FIFO_Empty, [31:0] DQ, [7:0] LED, [7:0] User
);
// Define our counter which will provide data
reg [31:0] Counter;
assign DQ = Counter;
// Display data movement during transfers
assign LED = ~DQ[31:24];
// Generate a RUN signal from the PushButton; PushButton presses toggle RUN
reg RUN;
always @ (negedge PushButton or posedge RESET) begin
if (RESET) RUN<=0; else RUN <= ~RUN;
end
// Define a State Machine for CPLD as FIFO Slave, use one hot encoding
reg [4:0] CurrentState, NextState;
parameter IDLE = 5'b00001;
parameter WAIT4RD = 5'b00010;
parameter READ = 5'b01000;
parameter STOP = 5'b10000;
// Display internal variables on User port for debug
assign User = { CurrentState, RUN };
// Output signals are dependent upon the state machine
assign FIFO_Empty = (CurrentState == IDLE);
assign LastData = (CurrentState == STOP);
endmodule
A BIG topic that I did not cover was the use of an FX3 as an
attached processor. You can connect a “main” CPU directly onto the
GPIF interface and use the FX3 as an intelligent sub-system. This is
a LARGE topic and the schedule did not allow for this to be included
in this First Edition; it will be in the Second Edition.
// Need to assign all the inputs else they get optimized away
wire UnusedUser1 = User[0] & User[1] & User[2] & User[3];
wire UnusedUser2 = User[4] & User[5] & User[6] & User[7];
wire UnusedCtrl1 = CTRL[0] & CTRL[1] & CTRL[2] & CTRL[3] & CTRL[4] & CTRL[5];
wire UnusedCtrl2 = CTRL[6] & CTRL[7] & CTRL[8] & CTRL[9] & CTRL[11] & CTRL[12];
wire UnusedOther = RX_MOSI & SPI_SCK & SPI_SSN & GPIO45_n;
assign TP_2 = UnusedCtrl1 & UnusedCtrl2 & UnusedOther & UnusedUser1 &
UnusedUser2;
// Assign fixed outputs not used in this example
assign FlashCS_n = 1'b1;
assign INT = 1'b0;
assign TX_MISO = 1'bZ;
// Using CTRL[10] to RESET the CPLD
assign RESET = CTRL[10];
// Both modules output to the LEDs, use Button[7] to select which module has control
wire [7:0] I2C_LEDs;
wire [7:0] Counter_LEDs;
assign LED = Button[7] ? I2C_LEDs : Counter_LEDs;
i2c_module i2c_slave (
.scl(I2C_SCL),
.i2c_rst(RESET),
.sda_in(sda_in),
.gpio_input_pins(Button),
.ack_out(ack_out),
.out_en(out_en),
.sda_out(sda_out),
.gpio_output_pins(I2C_LEDs));
endmodule
1. Full USB device (peripheral) mode support: USB 2.0 and 3.0
2. GPIO support (simple GPIOs only).
3. I2C, SPI and UART support.
4. GPIF-II and PMMC support for connection to external devices.
5. DMA support: Low level DMA access without any direct DMA
channel support.
A separate naming convention (CyFx3Boot) is used for APIs
in this library to distinguish them from the full RTOS based
firmware solution. As the emphasis is on low memory footprint,
the APIs provided are low level calls that will require the user to do
most of the application implementation. Two application examples
are provided in the Cypress examples installation; they are
prefixed with Fx3BootApp.
As this library does not make use of any RTOS or threads, it
expects that the user will call the relevant APIs from the main
processing loop. The drivers for each module register interrupt
handlers for the corresponding interrupts, and provide callbacks to
notify the application about events of interest. These callbacks are
invoked from the ISR itself, and the user application will need to
defer their processing to the main loop as and when required.
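The deferral pattern described above can be sketched on a PC as follows. This is an illustration only: the event bits and function names are hypothetical and are not part of the Cypress boot library, and std::atomic is used so that the sketch runs on a host; firmware would use whatever interrupt-safe mechanism suits the target.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical event bits; a real application defines its own.
constexpr std::uint32_t kEventSetup  = 1u << 0;
constexpr std::uint32_t kEventUartRx = 1u << 1;

// Events recorded by callbacks and not yet processed by the main loop.
std::atomic<std::uint32_t> pendingEvents{0};

// Called in interrupt context: record the event and return quickly.
void onDriverCallback(std::uint32_t eventBit) {
    pendingEvents.fetch_or(eventBit, std::memory_order_release);
}

// Called from the main loop: take every pending event in one step.
std::uint32_t takePendingEvents() {
    return pendingEvents.exchange(0, std::memory_order_acquire);
}

// One pass of the main loop. The lengthy work happens here, outside the
// interrupt handler. Returns the number of distinct events handled.
int serviceEvents(int& setupCount, int& uartCount) {
    std::uint32_t events = takePendingEvents();
    int handled = 0;
    if (events & kEventSetup)  { ++setupCount; ++handled; }
    if (events & kEventUartRx) { ++uartCount;  ++handled; }
    return handled;
}
```

Because each event is a single bit, two callbacks for the same event that arrive before the main loop runs are merged into one; a design that must count every occurrence would need a counter or a small queue instead.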
USB API
Only the USB device (peripheral) mode of operation is
supported in this firmware library. The USB APIs provide full-
featured USB 2.0 and 3.0 device support.
This library also supports a seamless transition to a full FX3
library based application, without a USB re-connect. This feature
(no re-enumeration) facilitates use cases where the system
requires firmware loading through the USB host without the
overhead of multiple USB connections and driver binding.
GPIO APIs
The library supports selection of any FX3 IO as a simple
GPIO, configuring the IO pin and IO state get/set functionality.
Complex GPIOs and GPIO interrupts are not supported.
UART APIs
The UART APIs in the FX3 lite library support UART
transmit and receive operations in register and DMA modes. A
DebugPrint equivalent function is provided for logging as well.
UART interrupts are not enabled, and the user will need to poll for
events of interest.
I2C APIs
The I2C APIs in the FX3 lite library support I2C functionality
similar to that provided in the full FX3 firmware library, with the
exception of I2C interrupt support.
SPI APIs
The SPI APIs in the FX3 lite library support SPI functionality
similar to that provided in the full FX3 firmware library, with the
exception of SPI interrupt support.
GPIF-II APIs
The GPIF-II APIs in the FX3 lite library support GPIF-II
configuration and access functions similar to that provided in the
full FX3 firmware library. These APIs use the same structure
definitions that are used in the full library, so that the GPIF-II
designer generated configurations can be used directly.
PMMC APIs
The library also supports the PMMC (Pseudo MMC or MMC
Slave) mode of operation of the FX3’s P-Port block. The selection
between GPIF-II and PMMC mode has to be made prior to
initializing the PIB block.
DMA APIs
The FX3 Lite library provides a set of low level DMA
configuration and access functions that can be used to implement
complex DMA use cases. The APIs provided deal with the
configuration and access of DMA building blocks like descriptors
and sockets. No channel level APIs are provided in order to keep
the memory footprint low.
// Now wait for filled buffers to be sent to the Queue and forward them to the I2C Block
while (1)
{
Q_Status = CyU3PQueueReceive(&I2C_DebugQueue, &FilledBuffer, PollDelay);