FPGA Implementation of A Simple 3D Graphics Pipeline
Department of Cybernetics and Biomedical Engineering, Faculty of Electrical Engineering and Computer
Science, VSB–Technical University of Ostrava, 17. listopadu 15, 708 33 Ostrava, Czech Republic
DOI: 10.15598/aeee.v13i1.1125
Abstract. Conventional methods for computing 3D projections are nowadays usually implemented on standard or graphics processors. The performance of these devices is limited especially by the architecture used, which to some extent works in a sequential manner. In this article we describe a project which utilizes parallel computation for simple projection of a wireframe 3D model. The algorithm is optimized for an FPGA-based implementation. The design of the numerical logic is described in VHDL with the use of several basic IP cores used especially for computing trigonometric functions. The implemented algorithms allow smooth rotation of the model in two axes (azimuth and elevation) and a change of the viewing angle. Tests carried out on a Xilinx Spartan-6 FPGA development board have resulted in real-time rendering at over 5000 fps. In the conclusion of the article, we discuss additional possibilities for increasing the computational output in graphics applications via the use of HPC (High Performance Computing).

Keywords

3D projection, FPGA, parallel processing, real time, VGA, VHDL.

1. Introduction

The drawing of graphics scenes in 3D obtained from their representations requires the processing of large volumes of data. Special chips are available for this purpose: GPUs, which rely on massive parallelization. Under usual circumstances, CPUs are not suitable for these tasks (even though there do exist instruction sets supporting multiple computations), since by their design they process instructions serially and hence would require much higher clock frequencies to achieve comparable speeds. FPGAs also support a high degree of parallelization and may be used to achieve high computational throughput.

This project originated as a semester project with an initial goal of drawing 3D projections of simple wireframe models in real time on a single chip, where the intent was to achieve very high frame rates. Since the described problem commonly lies beyond the capabilities of usual microcontrollers/CPUs, the solution has led to the creation of a hardware graphics pipeline for drawing on a screen via the VGA interface [1].

2. Graphics Pipeline

Current GPUs comprise many cores containing unified shaders, which allow the realization of operations previously carried out by vertex units, pixel units, TMUs (texture mapping units) and ROPs (render output units). Drawing of 3D models on the screen is basically the result of several consecutive blocks (simplified) [5]:

• Primitive processing – reading primitives, vertices and their connection.
• Vertex shader – the vertex shader transforms coordinates of vertices by their multiplication with the matrices of the scene. This is where the transformation from 3D → 2D occurs.
• Primitive assembly – vertices are joined into primitives.
• Rasterization – primitives are rasterized into pixels.
• Pixel shader – this is applied to each pixel of the rasterized scene and computes its color. This step also applies textures.
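For a wireframe model, the stages listed above can be illustrated with a minimal software sketch. It is not the hardware design described in this article: the function names, the viewer distance, the screen size and the zoom factor are all hypothetical, and the line drawing uses the standard integer Bresenham algorithm as one possible rasterizer.

```python
def vertex_stage(vertex, distance=4.0):
    """Vertex stage: project a 3D point to 2D with a simple perspective divide.

    `distance` is a hypothetical viewer distance along the z axis.
    """
    x, y, z = vertex
    scale = distance / (distance + z)
    return (x * scale, y * scale)


def primitive_assembly(points, edges):
    """Primitive assembly: join projected vertices into line primitives."""
    return [(points[a], points[b]) for a, b in edges]


def to_pixel(point, width=640, height=480, zoom=100):
    """Map a projected point to integer screen coordinates (origin at screen centre)."""
    x, y = point
    return (width // 2 + round(x * zoom), height // 2 - round(y * zoom))


def rasterize(p0, p1):
    """Rasterization: integer Bresenham line between two pixel positions."""
    x0, y0 = p0
    x1, y1 = p1
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    pixels = []
    while True:
        pixels.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy
    return pixels


def draw_wireframe(vertices, edges):
    """Run all stages and return the set of lit pixels (a single fixed color)."""
    projected = [vertex_stage(v) for v in vertices]
    lit = set()
    for a, b in primitive_assembly(projected, edges):
        lit.update(rasterize(to_pixel(a), to_pixel(b)))
    return lit
```

In a wireframe renderer the pixel stage reduces to writing one color per lit pixel, which is why the sketch returns a plain set of coordinates.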
c 2015 ADVANCES IN ELECTRICAL AND ELECTRONIC ENGINEERING 39
CONTROL ENGINEERING VOLUME: 13 | NUMBER: 1 | 2015 | MARCH
A = P · T · R. (1)
Rotation is typically entered in the form of an azimuth and elevation, where the following relations hold:

Individual blocks of the graphics pipeline consist of separate VHDL modules or optimized IP cores which are a part of the Xilinx ISE development kit.
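As an illustration of the composition in Eq. (1), the following sketch builds a 4×4 matrix A = P · T · R in floating point. Several things here are assumptions and not taken from the article: P, T and R are read as perspective, translation and rotation matrices, the rotation applies the azimuth about the z axis and the elevation about the x axis, and the perspective matrix is a simple w = 1 + z/d form with a hypothetical distance parameter d. The article's own relations may use different conventions.

```python
import math


def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def rotation(az, el):
    """R (assumed convention): azimuth about z, followed by elevation about x."""
    ca, sa = math.cos(az), math.sin(az)
    ce, se = math.cos(el), math.sin(el)
    rz = [[ca, -sa, 0, 0], [sa, ca, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    rx = [[1, 0, 0, 0], [0, ce, -se, 0], [0, se, ce, 0], [0, 0, 0, 1]]
    return matmul(rx, rz)


def translation(tx, ty, tz):
    """T: homogeneous translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def perspective(d):
    """P (assumed form): leaves x, y, z unchanged and sets w = 1 + z/d."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1.0 / d, 1]]


def view_matrix(az, el, d, target=(0.0, 0.0, 0.0)):
    """A = P . T . R, with T moving the target point to the origin."""
    t = translation(-target[0], -target[1], -target[2])
    return matmul(perspective(d), matmul(t, rotation(az, el)))


def apply(m, v):
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]
```

Because the product is associative, the three factors can be combined once per frame and the resulting matrix reused for every vertex.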
3.1. Cubido3D
R = A + (B − C) · D. (10)
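The operation in Eq. (10) can be modelled in fixed-point arithmetic as sketched below. The sketch assumes that the 10Q8 format named for the matrix bus means 8 fractional bits in an 18-bit signed word (288 bits divided by 16 matrix elements); the helper names and the wrap-around behaviour on overflow are hypothetical and only meant to show the scaling involved.

```python
FRAC_BITS = 8    # fractional bits, following the 10Q8 format named in the text
WORD_BITS = 18   # assumed word width: 288-bit bus / 16 matrix elements


def to_fixed(x):
    """Quantize a float to a signed fixed-point integer with FRAC_BITS fractional bits."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << FRAC_BITS)


def wrap(q, bits=WORD_BITS):
    """Wrap a value to a signed two's-complement word of the given width."""
    q &= (1 << bits) - 1
    return q - (1 << bits) if q & (1 << (bits - 1)) else q


def mac_presub(a, b, c, d):
    """Compute R = A + (B - C) * D on fixed-point operands.

    The product carries 2 * FRAC_BITS fractional bits, so it is shifted
    right by FRAC_BITS before being added to A.
    """
    product = (b - c) * d
    return wrap(a + (product >> FRAC_BITS))
```

For example, `to_float(mac_presub(to_fixed(1.5), to_fixed(2.0), to_fixed(0.5), to_fixed(-3.0)))` evaluates 1.5 + (2.0 − 0.5) · (−3.0) without rounding error, because all operands are exactly representable with 8 fractional bits.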
1) Calculation of the Projection Matrix (GrxGenerateProjectionMatrix)

GrxGenerateProjectionMatrix is a unit which computes the perspective projection matrix from the received values of the azimuth, elevation and viewing angle (amount of perspective deformation). The computation of Eq. (1) to Eq. (9) is adapted for processing in an FPGA. The center of the projection (the target point) is fixed to the origin of the coordinate system. Despite best efforts to make the computation as parallel as possible, it is strongly sequential and is carried out by a state machine with 28 states. Trigonometric functions are computed by the CORDIC unit (CordicCore_SINCOS entity), which computes the sine and cosine of the entered angle in parallel. The tangent of the viewing angle is computed in two steps: the sine and cosine of the viewing angle are computed first and then divided. Division is carried out in an adjoined serial divider.

Fig. 7: Computation of the projection matrix by the GrxGenerateProjectionMatrix module.

The resulting projection matrix is serialized into a 4×4×10Q8 bus, and thus has a width of 288 bits (16 elements of 18 bits each). The bus is connected to the computational units via registers and bus multiplexers controlled by the state machine. The serial dividers account for the largest share of the running time, and hence they are started shortly after the computation of the matrix begins and work in parallel with the other computations.

2) Vertex Unit (GrxVertexProjection)

The vertex unit computes the 2D vertices of the image by projecting their 3D template through multiplication with the projection matrix received from GrxGenerateProjectionMatrix. The whole principle is very similar to the computation carried out by the GrxGenerateProjectionMatrix unit. The computation is con-
R = A + B · D. (11)
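A matrix-vector product of the kind performed by the vertex unit can be built from repeated steps of the form in Eq. (11). The sketch below chains such multiply-accumulate steps for one homogeneous vertex. It assumes 8 fractional bits per operand and ignores word-width limits; the function names are hypothetical and the actual scheduling in GrxVertexProjection is not reproduced here.

```python
FRAC_BITS = 8  # assumed number of fractional bits per fixed-point operand


def mac(acc, b, d):
    """One step of R = A + B * D, rescaling the product by FRAC_BITS."""
    return acc + ((b * d) >> FRAC_BITS)


def transform_vertex(matrix, vertex):
    """Multiply a 4x4 fixed-point matrix by a homogeneous vertex using chained MACs."""
    result = []
    for row in matrix:
        acc = 0
        for m, v in zip(row, vertex):
            acc = mac(acc, m, v)
        result.append(acc)
    return result
```

Each output coordinate needs four such steps, so one vertex costs sixteen multiply-accumulate operations; how many of them run in parallel is a hardware design decision.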
The hue is based on the azimuth and elevation and hence changes depending on the "rotation" of the object. The formula used for the video signal v is:

\[
v_{color} = y \cdot
\begin{pmatrix}
-\cos(\theta_{az}) - \cos(\theta_{el}) \\
2 + \dfrac{\cos(\theta_{az}) - \cos(\theta_{el})}{2} \\
\cos(\theta_{el}) - \dfrac{\cos(\theta_{az})}{2}
\end{pmatrix}
+ 2048. \tag{12}
\]

3.5. The Libgenerics Library

This library provides the basic functional blocks and functions:

• GResynchronizer – Resynchronizer of one-bit asynchronous signals into the internal clock domain.
• GResetSynchronizer – Resynchronizer of reset into the internal clock domain.
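A common way to bring a one-bit asynchronous signal into a clock domain is a chain of two flip-flops. Whether GResynchronizer is built this way is an assumption; the behavioral model below only illustrates the resulting latency, and the class name and its parameters are hypothetical.

```python
class TwoFlopSynchronizer:
    """Behavioral model of an N-stage flip-flop chain (two stages by default)."""

    def __init__(self, stages=2, reset_value=0):
        self.regs = [reset_value] * stages

    def clock(self, async_in):
        """Advance one rising clock edge and return the synchronized output."""
        # Every register takes the value of its predecessor; the first one
        # samples the asynchronous input.
        self.regs = [async_in] + self.regs[:-1]
        return self.regs[-1]
```

With two stages, a change on the input becomes visible at the output on the second clock edge after it was sampled. The model does not cover metastability itself, which is the physical effect the extra stage is meant to mitigate.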
Fig. 16: Overview of a Rivyera HPC with FPGA circuits.

Fig. 19: RIVYERA supercomputer with 256 FPGAs [8].
safety, which is a difficult task in the case of sequential tools based on microprocessors. Similar methods have been used in areas such as mobile applications and even home care systems, see [7]. Techniques for design verification form an important part of the design methods for these programmable circuits. These provide us with near-certainty regarding the actual reliability of the programmable logical circuits.

Tab. 1: FPGA device utilization summary.

Resource                       Used    Available    Utilization
Number of FSMs                 12      -            -
Number of Block RAMs           32      32           100 %
Number of Slice LUTs           4237    9112         46 %
Number of bonded IOBs          28      232          12 %
Number of BUFG/BUFCTRLs        3       16           18 %
Number of DSP48A1s             10      32           31 %
Number of PLL_ADVs             1       2            50 %

Acknowledgment

This paper has been elaborated in the framework of the project "Support research and development in the Moravian-Silesian Region 2013 DT 1 - International research teams" (RRC/05/2013). Financed from the budget of the Moravian-Silesian Region. The work and the contributions were supported by the project SP2014/194 "Biomedicinske inzenyrske systemy X".

References

[1] KASIK, V., A. KURECKA and P. POSPECH. 3D Graphics Processing Unit with VGA Output. IEE Proceedings - Circuits, Devices and Systems. 2005, vol. 152, iss. 3, pp. 388–393. ISSN 1474-6670. DOI: 10.3182/20130925-3-CZ-3023.00081.

[2] AVR200. Multiply and Divide Routines. Atmel, 2009. Available at: http://www.atmel.com/Images/doc0936.pdf.

[3] Nexys3. Board Reference Manual. Digilent, 2013. Available at: http://www.digilentinc.com/Data/Products/NEXYS3/Nexys3_rm.pdf.

[4] BENSAALI, F., A. AMIRA and A. BOURIDANE. Accelerating matrix product on reconfigurable hardware for image processing applications. IEE Proceedings - Circuits, Devices and Systems. 2005, vol. 152, iss. 3, pp. 236–246. ISSN 1350-2409. DOI: 10.1049/ip-cds:20040838.

[5] HO AHN, S. OpenGL programming tutorials, examples and notes written with C++ [online]. 2013. Available at: http://www.songho.ca/opengl/index.html.

[6] TRAN, V.-H. and X.-T. TRAN. An efficient architecture design for VGA monitor controller. In: 2011 International Conference on Consumer Electronics, Communications and Networks (CECNet). XianNing: IEEE, 2011, pp. 3917–3921. ISBN 978-1-61284-458-9. DOI: 10.1109/CECNET.2011.5768261.

[7] PENHAKER, M., M. STANKUS, J. KIJONKA and P. GRYGAREK. Design and Application of Mobile Embedded Systems for Home Care Applications. In: 2010 Second International Conference on Computer Engineering and Applications. Bali Island: IEEE, 2010, pp. 412–416. ISBN 978-1-4244-6079-3. DOI: 10.1109/ICCEA.2010.86.

[8] SciEngines GmbH. 2014. Available at: http://www.sciengines.com.

About Authors

Vladimir KASIK was born in Vyskov, Czech Republic, in 1973. He received his M.Sc. in Cybernetics, Automation and Control from the Brno University of Technology, Czech Republic, in 1996 and his Ph.D. in Technical Cybernetics from VSB–Technical University of Ostrava, Czech Republic in 2000. Currently he is an assistant professor at VSB–Technical University of Ostrava, Department of Cybernetics and Biomedical Engineering, Ostrava, Czech Republic, where he teaches and collaborates with industry in the areas of programmable logic, electronics, embedded and control systems. He is the author of several international publications and in earlier years he attended a variety of lecture stays in Universite Joseph Fourier and L'Institut National Polytechnique de Grenoble, France.

Ales KURECKA was born in 1989. He received his M.Sc. degree from VSB–Technical University of Ostrava, Czech Republic in 2013. He is currently a Ph.D. student at the department of Cybernetics and Biomedical Engineering. His research interests include primarily localization techniques and embedded systems.