A Proposed RISC Instruction Set Architecture for the MAC Unit of 32-bit VLIW DSP Processor Core

Khoi-Nguyen LE-HUU, Anh-Vu DINH-DUC
University of Information Technology
Ho Chi Minh City, Vietnam
{khoinguyen, anhvu}@uit.edu.vn

Thanh VU
HCMC University of Technology
Ho Chi Minh City, Vietnam
[email protected]

Quoc-Minh DANG-DO, Vy LUU, Trong-Tu BUI
HCMC University of Science
Ho Chi Minh City, Vietnam
{dqmdang, lxvy, bttu}@fetel.hcmus.edu.vn

Abstract—A multiplier-accumulator (MAC) is a specific hardware unit that performs a common operation: computing the product of two numbers and adding that product to an accumulator. In digital signal processing applications, which consist of a large number of convolution operations, the MAC unit contributes greatly to the high performance of the system. This work presents an implementation of a specific MAC unit based on the proposed RISC instruction set architecture (ISA) of the 32-bit VLIW fixed-point DSP processor core presented in our previous work. The computational unit is designed to be flexible for 32-bit/16-bit/8-bit data computations. The implementation is verified to function correctly not only in Modelsim software but also on an Altera Cyclone II (2C35) FPGA board.

Keywords—Digital Signal Processors, Multiply, Accumulate, VLIW, RISC.

I. INTRODUCTION

Digital signal processing is increasingly important for real-life applications such as communications [1]-[2], medical imaging [3]-[4], radar and sonar [5], high-fidelity music reproduction [6], oil prospecting [7], etc. As applications become more complex, processing digital signals efficiently makes a system more attractive. A digital signal processor (DSP) is a specialized microprocessor with an architecture optimized for the operational needs of digital signal processing. The DSP algorithms and functions determine the appropriate architecture for the processor. Although DSP processors have changed comprehensively over the past few decades, most DSP processors today still share common features. DSP processors need multiple memory banks with independent buses, specialized instruction sets, addressing modes, control and peripherals. Modern DSP architectures can be divided into three or four categories (generations) [8].

In conventional DSP processors, one instruction is issued and executed in one clock cycle. They use complex, multi-operation instructions. These processors typically include a single multiplier (MAC unit) and an ALU, but few additional execution units. Typical processors in this category include Analog Devices' ADSP-21xx family, Texas Instruments' TMS320C2xx family, and Motorola's DSP560xx family. DSP processors like the Motorola DSP563xx and Texas Instruments TMS320C54x operate at higher clock speeds and often include a modest amount of additional hardware (barrel shifter, instruction cache) to improve performance in common DSP algorithms. These processors also tend to have deeper pipelines.

Another DSP generation was built by expanding conventional DSP architectures, for instance by adding parallel execution units, i.e. a second multiplier and adder. The hardware extensions are typically associated with an extended instruction set that allows multiple operations to be encoded in a single instruction and executed in parallel. DSP processors in this category often have wider data buses, allowing them to fetch more data words per clock cycle. They can also use wider instruction words to integrate parallel operations within a single instruction. The downside of these DSP processors is the difficulty of assembly language programming.

Multi-issue processors use very simple instructions that typically encode a single operation. These processors achieve a high level of parallelism by issuing and executing instructions in parallel groups rather than one at a time. Using simple instructions simplifies instruction decoding and execution, allowing multi-issue processors to execute at higher clock rates than conventional or enhanced conventional DSP processors. The two sub-categories of this architecture that execute multiple instructions in parallel are VLIW (Very Long Instruction Word) and superscalar. The biggest difference between them is how instructions are grouped for parallel execution.

Recently, an implementation of a 16-bit RISC-based DSP processor was proposed in [10]. In this design, all logical and arithmetic operations are carried out by a single ALU, which is built from three sub-units: MAC, LOGIC and ARITH. This design is not effective in terms of parallel computation: for example, when the MAC unit is busy, the ARITH unit is idle, and vice versa.

Therefore, in this paper, an implementation of a separate MAC unit is proposed. This unit supports only MAC and related multiplication operations; other arithmetic operations are handled by the ALU. Moreover, the novelty of this design is that the MAC unit supports multiple data widths: at any one time it can perform one 32-bit MAC operation, two 16-bit MAC operations, or four 8-bit MAC operations.

978-1-4799-2903-0/14/$31.00 ©2014 IEEE 170


This is very helpful for many 8-bit and 16-bit DSP applications. The rest of this paper is structured as follows. Section II gives an overview of the 32-bit VLIW fixed-point DSP processor core. The proposed MAC unit is described in Section III. The last two sections present the experiments and the conclusion, respectively.

II. VLIW DSP PROCESSOR OVERVIEW

A. Top-level Architecture

In general, digital signal processing SoCs consist of a DSP core, a peripheral controller, an external memory controller, power management, and acceleration hardware such as an FFT (Fast Fourier Transform) core, a DCT (Discrete Cosine Transform) core, and DMA (Direct Memory Access) units. They are all depicted in Fig. 1.

Fig. 1. General architecture of DSP SoC.

In this work, we focus only on the DSP core. It consists of an Instruction Fetch unit, an Instruction Decode unit, four execution units, 32 32-bit general-purpose registers, control registers, an interrupt controller and an instruction cache, as in Fig. 2. The on-chip memory sub-system is located outside the core and contains the Program memory and two Data memories (DMX and DMY). The execution units comprise four functional units: FALU (Arithmetic Logic Unit for Fixed-point computation), MAC (Multiplication and Accumulation unit), BALU (Arithmetic Logic Unit for Branching computation), and LSU (Loading/Storing Unit). The program fetch and instruction decode units can deliver up to four 32-bit instructions to the functional units every CPU clock cycle. Consequently, this VLIW approach can take full advantage of instruction-level parallelism (ILP).

Fig. 2. Top-level architecture of DSP processor core.

B. Instruction Set Architecture

In [12], we proposed an ISA for the 32-bit fixed-point DSP processor core. Note that this DSP processor core is aimed at audio and imaging applications; thus, the ISA also follows this objective.

1) Register Files and Conditional Execution

Because of the RISC architecture, all operations are performed on register files, and the more registers there are, the fewer load/store instructions are needed. In this ISA, there are up to 32 32-bit general registers named A0–A15 and B0–B15. Most registers can be used as operands in the computational units and as base address/offset in load/store instructions. Moreover, A4–A7 and B4–B7 are also used in both linear and circular addressing modes. In particular, super load and store instructions implicitly use A7 and B7 for their address calculation.

This ISA also supports advanced conditional execution, inherited from ARM processors and other leading commercial DSPs. A field named “cr” in the operation code denotes which register (A0–A2, B0–B2) is examined with a test condition (equal or not equal to zero). If this field is zero, the instruction is always performed. The programmer can specify the conditional register in assembly format as follows:

    [A0] Instr dst, src1, src2

In C language:

    if (A0 != 0) dst = src1 op src2;

2) Addressing Mode

The Addressing Mode Register (AMR) in Fig. 3 specifies the addressing mode for each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing.

Fig. 3. Addressing Mode Register (AMR)

The reserved field of AMR is always 0. The block size fields (BS0, BS1) contain 5-bit values used in calculating block sizes for circular addressing. A 2-bit field for each register selects the address modification mode: linear (the default) or circular. This 2-bit field also selects which BS field to use for the circular buffer.

Linear mode simply shifts the offsetR/cst operand (in load and store instructions) or the src1/cst operand (in add and sub instructions) to the left by 2, 1, or 0 for word, half-word, or byte access, respectively, and then performs the required operation. In circular mode, the offsetR/cst or src1/cst operand is taken modulo the circular buffer size if it is specified greater than the buffer size.

For super load and super store instructions (SLD/SST), the base registers for data memory X/Y are implicitly A7 and B7. These instructions can use either addressing mode, as selected in AMR for A7 and B7.
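The two address modification modes described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' hardware: the function names, the `block_size` parameter (given directly in bytes, since the encoding of the 5-bit BS fields is not stated here), and the assumptions that the buffer size is a power of two and that the buffer is aligned to its own size are all hypothetical.

```python
# Shift amounts taken from the text: word -> 2, half word -> 1, byte -> 0.
ACCESS_SHIFT = {"word": 2, "half": 1, "byte": 0}


def linear_address(base, offset, access):
    """Linear mode: scale the offset by the access size and add it to the base."""
    return (base + (offset << ACCESS_SHIFT[access])) & 0xFFFFFFFF


def circular_address(base, offset, access, block_size):
    """Circular mode: the scaled offset is reduced modulo the buffer size and
    the resulting address wraps around inside the buffer.

    Assumptions (not from the paper): block_size is a power of two given in
    bytes, and the buffer starts at an address aligned to block_size.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    scaled = (offset << ACCESS_SHIFT[access]) % block_size
    block_start = base & ~(block_size - 1)
    position = (base - block_start + scaled) % block_size
    return (block_start + position) & 0xFFFFFFFF
```

With a 16-byte buffer at 0x100, a word access from base 0x10C with offset 2 would step past the end of the buffer in linear mode (0x114), while the circular sketch wraps it back to 0x104.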

3) Parallel Execution

Four instructions can be fetched at a time, forming a fetch packet of 128 bits. The execution of each individual instruction in a packet is determined by bit p of that instruction. Bit p (bit 0) determines whether an instruction is executed in parallel with another instruction. If the p-bit of instruction i is 1, then instruction i+1 is executed in parallel with (in the same cycle as) instruction i. If the p-bit of instruction i is 0, then instruction i+1 is executed in the cycle after instruction i. Therefore, the last p-bit in a fetch packet is always set to 0. In assembly language, parallel execution is denoted by the || symbol before an instruction, specifying that it executes in parallel with the previous one.

    Instr. A
    || Instr. B
    || Instr. C
    || Instr. D

An execute packet consists of all instructions executing in parallel. Each instruction in the execute packet must be executed by a different functional unit. The p-bit pattern of the four instructions in a fetch packet can result in an execution sequence that is fully parallel, fully serial, or partially serial.

With SLD/SST and MAC instructions executing in parallel, some signal processing computations can reach a throughput of one instruction cycle. For example, in a convolution operation, the SLD can load two operands while the MAC performs multiplication and accumulation on the previous ones. The following example shows the advantage of SLD, MAC and parallel execution support in convolution.

Convolution in C language:

    conv = conv + x[i] * h[i].

Convolution in assembly language without super load, MAC and parallel execution requires four sequential instructions:

    LDW A0, A7[i]
    LDW B0, B7[i]
    MPY A0, A0, B0
    ADD A1, A1, A0

However, only two parallel instructions are necessary with MAC, super load and parallel execution support:

    SLDW A0, B0, i
    || MAC A0, B0

4) Instruction to Functional Unit Mapping

TABLE I. MAPPING BETWEEN INSTRUCTIONS AND FUNCTIONAL UNITS

    FALU      MAC       BALU            LSU
    ABS       AVG2      ADD             ADD
    ABS2      AVGU4     ADDK            ADD2
    ADD       BITC4     ADD2            ADDB(H/W)
    ADDU      BITR      ADDKPC          ADDAD
    ADD2      DEAL      AND             AND
    ADD4      ROTL      ANDN            ANDN
    AND       SHFL      B disp          LDB(U)
    ANDN      XPND2     B reg           LDH(U)
    COS       XPND4     BDEC            LDW
    COSC      MAC(U)    BNOP            LDB(U) offset
    MAX       MAC(S)    BPOS            LDH(U) offset
    MAX2      MAC2(U)   CLR             LDW offset
    MAXU4     MAC2(S)   CMPEQ(2/4)      MVK
    MIN       MAC4(U)   CMPGT(U/2/4)    OR
    MIN2      MAC4(S)   CMPLT(U/2/4)    STB(U)
    MINU4     MACN2     EXT(U)          STH(U)
    MVK       MMAC      MVK             STW
    NEG       MPY(U)    OR              STB(U) offset
    NOT       MPY(S)    NEG             STH(U) offset
    OR                  NOT             STW offset
    SIN                 SADD(SU/US/2)   SUB2
    SINC                SADDU4          XOR
    SADD                SHLMB           SLDB(U)
    SAT                 SHR2            SLDH(U)
    SUB(U)              SHRMB           SLDW
    SUB2                SHRU2           STB(U)
    SUB4                SUB2            STH(U)
    SUBABS4             SWAP2           STW
    SWAP                XOR
    SWAP4
    XOR

TABLE I lists all of the proposed instructions by functional unit. The opcode map and the proposed data path design for the MAC unit are presented in the next subsections.

5) Instruction to MAC unit

Fig. 4. Opcode for MAC unit.

Fig. 4 illustrates the overall opcode map for all operations on the MAC unit. The opcode notations are described in TABLE II.

TABLE II. INSTRUCTION OPERATION & EXECUTION NOTATIONS

    Symbol       Description
    cr           conditional registers: instruction executed based on z value
    z            zero or non-zero
    dst          destination operand
    src2         second source operand
    src1         first source operand
    const        constant operand
    rsv          reserved
    func. code   functional code: contain up to 64 instructions
    00000        opcode: identify MAC instructions
    p            parallel execution

The MAC unit is dedicated to multiply and accumulate instructions. It covers specific instruction groups such as the multiplication group (MPY and extensions, etc.) and the multiplication and accumulation group (MAC(2/4), MMAC, etc.). Additionally, this unit includes a functional block for the bit-oriented group (BITC4, BITR, etc.). Moreover, the common addition/subtraction instructions are still supported. The single-cycle data path design for these instruction groups is depicted in Fig. 5.

The MAC block in Fig. 5 can be integrated into the top-level architecture as illustrated in Fig. 6. The interesting point of this architecture is the barrel shifter, which allows multiplication by a power of two (2^x) without using the heavy multiplier.
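The p-bit rule from the Parallel Execution subsection can be made concrete with a short Python sketch that splits a fetch packet into execute packets. The function name and the `(name, p_bit)` tuple representation are illustrative choices, not part of the proposed ISA.

```python
def split_execute_packets(fetch_packet):
    """Group the instructions of a fetch packet into execute packets.

    fetch_packet is a list of (name, p_bit) pairs in program order.
    A p-bit of 1 means the next instruction runs in the same cycle;
    a p-bit of 0 closes the current execute packet.
    """
    packets, current = [], []
    for name, p_bit in fetch_packet:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:
        # The text requires the last p-bit of a fetch packet to be 0.
        raise ValueError("last p-bit in a fetch packet must be 0")
    return packets
```

For example, the p-bit pattern 1, 0, 1, 0 yields the partially serial sequence `[["A", "B"], ["C", "D"]]`, which takes two cycles, while 1, 1, 1, 0 yields a single fully parallel execute packet.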

Fig. 5. Single cycle data path design for MAC unit.

Fig. 6. Top-level architecture of MAC block in DSP core.

As convolution is one of the most common operations in DSP algorithms, the MAC, MAC2, MAC4 and MMAC instructions are designed with particular care. In addition, the MAC unit supports individual packed 8/16-bit multiplication and accumulation (MAC and extensions). Their assembler syntaxes are given in TABLE III.

TABLE III. SPECIAL INSTRUCTIONS IN MAC UNIT

    Instruction  Syntax                   Description
    MAC4         MAC src2, src1/const     MACReg = MACReg
                                                 + src2_b0*src1_b0(const)
                                                 + src2_b1*src1_b1(const)
                                                 + src2_b2*src1_b2(const)
                                                 + src2_b3*src1_b3(const)
    MAC2         MAC src2, src1/const     MACReg = MACReg
                                                 + src2_msb16*src1_msb16(const)
                                                 + src2_lsb16*src1_lsb16(const)
    MAC          MAC src2, src1/const     MACReg = MACReg + src2*src1/const
    MACN2        MACN2 src2, src1/const   MACReg = MACReg
                                                 + src2_msb16*src1_msb16(const)
                                                 - src2_lsb16*src1_lsb16(const)
    MMAC         MMAC dst1, dst2          dst1 = MACRegH
                                          dst2 = MACRegL

Note that MACReg is a 64-bit register located inside the MAC unit that stores the result of convolution operations.

III. PROPOSED MAC UNIT

In general, the MAC unit is controlled by the signals listed in TABLE IV. The Instruction Decoder is in charge of generating those signals.

TABLE IV. CONTROL SIGNAL DESCRIPTION

    Name      Type    Bit width  Description
    iClk      input   1          System clock
    iReset_n  input   1          Reset signal
    iMac      input   1          MAC/Bit-oriented unit selection signal
    iOp       input   5          Operation signal
    iFunc     input   5          Function signal
    iSource1  input   32         First operand
    iSource2  input   32         Second operand
    oResult   output  64         Result

Fig. 7 shows the top-level architecture of the proposed MAC unit. The four multiply-accumulate instructions MAC/MAC2/MAC4/MACN2 feed the MAC Register block, and their results are stored in the 64-bit MACReg. As the proposed DSP processor is based on a 32-bit architecture, the value of MACReg has to be transferred to dst1 and dst2 using the MMAC instruction, according to TABLE III. For all instructions other than MPY and the multiply-accumulate instructions, oResult_High is masked with “0” bits. The 5-bit iOp and iFunc signals are used to design the signal mapping for the MAC unit, as described in the following subsections.

Fig. 7. Top-level architecture of MAC unit.

A. Multiply-Accumulate Unit Implementation

The Multiply-Accumulate unit is responsible for the MPY, MAC, AVG and ROTL instructions. TABLE V presents the design of those instructions.

TABLE V. SIGNAL MAPPING FOR MAC UNIT OPERATIONS

              MPY  MAC  AVG  ROTL
    iOp[3]    1    0    0    0
    iOp[2]    0    1    0    0
    iOp[1]    0    0    1    0
    iOp[0]    0    0    0    1
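The one-hot iOp encoding of TABLE V, together with the sub-operation and Bit-Oriented unit encodings given in TABLES VI to VIII, can be summarized as a small software decoder. The Python sketch below is only an illustration of those tables: the function name `decode` and the boolean `select_mac` argument are hypothetical, and because the polarity of the iMac selection signal is not stated in the text, the caller chooses the unit explicitly.

```python
# One-hot iOp values transcribed from TABLE V (Multiply-Accumulate unit).
MAC_UNIT_OPS = {0b01000: "MPY", 0b00100: "MAC", 0b00010: "AVG", 0b00001: "ROTL"}
# One-hot iOp values transcribed from TABLE VII (Bit-Oriented unit).
BOU_OPS = {0b10000: "BITC4", 0b01000: "BITR", 0b00100: "XPND",
           0b00010: "DEAL", 0b00001: "SHFL"}
# iFunc[2:0] values transcribed from TABLE VI for the MAC group.
MAC_SUB_OPS = {0b000: "MAC", 0b001: "MAC2", 0b010: "MAC4",
               0b011: "MACN2", 0b100: "MMAC"}


def decode(select_mac, i_op, i_func):
    """Return the instruction name selected by the control signals.

    select_mac is True for the Multiply-Accumulate unit and False for the
    Bit-Oriented unit (a stand-in for iMac, whose polarity is an assumption).
    """
    table = MAC_UNIT_OPS if select_mac else BOU_OPS
    group = table.get(i_op & 0b11111)
    if group is None:
        raise ValueError("iOp is not a valid one-hot code for this unit")
    if select_mac and group == "MAC":
        name = MAC_SUB_OPS.get(i_func & 0b111)  # iFunc[4:3] are don't-care
        if name is None:
            raise ValueError("iFunc[2:0] does not select a MAC sub-operation")
        return name
    if select_mac and group == "AVG":
        return "AVGU4" if i_func & 1 else "AVG2"
    if not select_mac and group == "XPND":
        return "XPND4" if i_func & 1 else "XPND2"
    return group
```

For instance, `decode(True, 0b00010, 0b00001)` returns `"AVGU4"`, matching the iOp[1]=1, iFunc[0]=1 case, and the same iOp bit pattern 0b00100 decodes to a MAC-group instruction or to XPND2/XPND4 depending on the selected unit.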

According to this design, the 5-bit iOp signal indicates which instruction is executed. For example, the MPY instruction is executed only if iOp[3] = 1. If an instruction includes sub-operations such as 8-bit/16-bit/32-bit computation, iFunc is used as described in TABLE VI.

TABLE VI. SIGNAL MAPPING FOR MAC UNIT SUB-OPERATIONS

    iOp        Instr.  iFunc[4]  iFunc[3]  iFunc[2]  iFunc[1]  iFunc[0]
    iOp[2]=1   MAC     x         x         0         0         0
               MAC2    x         x         0         0         1
               MAC4    x         x         0         1         0
               MACN2   x         x         0         1         1
               MMAC    x         x         1         0         0
    iOp[1]=1   AVG2    x         x         x         x         0
               AVGU4   x         x         x         x         1

According to TABLE VI, if iOp[1]=1 and iFunc[0]=1, the AVGU4 instruction is executed: it calculates an average value on packed 8-bit data and places the result in dst in packed 8-bit format.

B. Bit-Oriented Unit (BOU) Implementation

The Bit-Oriented unit is responsible for the bit-oriented instructions (BITC4, BITR, XPND2, XPND4), the de-interleave instruction (DEAL) and the shuffle instruction (SHFL). TABLE VII presents the design of those instructions.

TABLE VII. SIGNAL MAPPING FOR BOU OPERATIONS

              BITC4  BITR  XPND  DEAL  SHFL
    iOp[4]    1      0     0     0     0
    iOp[3]    0      1     0     0     0
    iOp[2]    0      0     1     0     0
    iOp[1]    0      0     0     1     0
    iOp[0]    0      0     0     0     1

As with the signal mapping for the Multiply-Accumulate unit operations, the 5-bit iOp signal indicates which instruction is executed in the Bit-Oriented unit. For example, the BITC4 instruction, which counts the number of “1” bits in packed 8-bit data and writes each count to the corresponding position, is executed only if iOp[4] = 1. If an instruction includes sub-operations such as 8-bit/16-bit/32-bit computation, iFunc is used as described in TABLE VIII.

TABLE VIII. SIGNAL MAPPING FOR BOU SUB-OPERATIONS

    iOp        Instr.  iFunc[4]  iFunc[3]  iFunc[2]  iFunc[1]  iFunc[0]
    iOp[2]=1   XPND2   x         x         x         x         0
               XPND4   x         x         x         x         1

According to TABLE VIII, if iOp[2]=1 and iFunc[0]=1, the XPND4 instruction is executed: it reads the four least-significant bits of src2 and expands them into four byte masks written to dst.

IV. EXPERIMENT

In this section, the experimental results are presented and discussed.

A. Test case building

The proposed test cases for the MAC unit are depicted in Fig. 8. There are two columns: the left one shows the Multiply-Accumulate operations and the right one presents the specific test cases for those operations. The notations used in Fig. 8 are explained in TABLE IX.

Fig. 8. Test cases model for MAC unit.

TABLE IX. NOTATIONS OF TEST CASES

    Notation    Description
    (signed)    This test case uses the signed version of the operation
    (unsigned)  This test case uses the unsigned version of the operation
    (-)         The operand used in this test case is negative
    (+)         The operand used in this test case is positive
    (0)         The operand used in this test case is zero

A particular set of input data for the test cases shown in Fig. 8 is given in the left panel of Fig. 9. The input data include control signals (iMac, iOp, iFunc) and data signals (iSource1 and iSource2). The golden output data for those test cases, consisting of the result signal, are provided in the right panel of Fig. 9.

B. Simulation waveforms

The MAC unit is first designed in Verilog HDL and then simulated with Modelsim software, as illustrated in Fig. 10, to verify its functional correctness.

In this simulation, the MAC unit is controlled by three signals: iMac, iOp and iFunc. The data to be processed are placed on the two operands iSource1 and iSource2, and the results of the calculations are given in oResult. All instructions of the MAC unit are verified to function correctly. For example, Fig. 10 depicts the simulation of test cases 15 to 20 in Fig. 8. In the 16th test case, we obtain the unsigned product of two positive numbers.

Fig. 9. Input data and Golden output data.

Fig. 10. Simulation of MAC unit on Modelsim tool.

After the successful simulation in Modelsim, the design is synthesized with Altera Quartus II, targeting the Cyclone II EP2C35 FPGA device. The compilation report is shown in TABLE X. The verification on FPGA for test cases 15 to 20 in Fig. 8 is completed by using the SignalTap II Logic Analyzer to capture the MAC signals, as presented in Fig. 11. In this test, the MAC unit runs at a clock of 40 MHz, and the captured waveform is identical to the simulation result.

TABLE X. COMPILATION REPORT

    Resource   Logic Element   1633
               Register        203
    Fmax                       42.17 MHz

Fig. 11. Verification on SignalTap II Logic Analyzer.

V. CONCLUSION

In this work, we have presented an implementation of the MAC unit according to the ISA proposed in our previous work [12]. To achieve a throughput of one instruction per clock cycle, pipelining will be addressed in our future work. Besides, the three remaining functional blocks (FALU, BALU and LSU) should also be implemented in the future to finalize the design of the 32-bit VLIW DSP processor core. Moreover, the compiler and assembler need careful consideration, as they contribute directly to the effectiveness of the proposed instruction set.

ACKNOWLEDGMENT

This work was granted under Project 39/2013/HĐ-SKHCN by the Department of Science and Technology of HCM City.

REFERENCES

[1] Gatherer A., Stetzler T., McMahan M., and Auslander E., DSP-based Architectures for Mobile Communications: Past, Present, and Future, IEEE Communications Magazine, Vol. 38, Issue 1, pp. 84 – 90, Jan 2000.
[2] Xuan-Thuan NGUYEN, QM-Dang DO, Hoang-Dat TRAN, Huu-Thuan HUYNH, and Cong-Kha PHAM, A PCIe-based FFT Implementation for High-speed Spectrum Analysis, Proc. 3rd IEICE Int. Conf. Integrated Circuits and Devices in Vietnam, pp. 126 – 131, Danang, Vietnam, Aug 13th – 15th, 2013.
[3] Yagi M., Shibata T., An Image Representation Algorithm Compatible with Neural-Associative-Processor-Based Hardware Recognition Systems, IEEE Trans. Neural Networks, Vol. 14, No. 5, pp. 1144 – 1161, Sep. 2003.
[4] Greenberg J.E., Delgutte B., and Gray M.L., Hands-on Learning in Biomedical Signal Processing, IEEE Engineering in Medicine and Biology Magazine, Vol. 22, Issue 4, pp. 71 – 79, Aug 2003.
[5] Titlebaum E.L., “Frequency- and time-hop coded signals for use in radar and sonar systems and multiple access communications systems”, in Conference Record of The Twenty-Seventh Asilomar Conference on Signals, Systems and Computers, 1993.
[6] Olswang B.S., Cvetkovic Z., “Separation of Audio Signals Into Direct and Diffuse Soundfields for Surround Sound”, in Procs. of IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.
[7] Mottl V., Dvoenko S., Levyant V., Muchnik I., “Pattern recognition in spatial data: a new method of seismic explorations for oil and gas in crystalline basement rocks”, in Procs. of 15th International Conference on Pattern Recognition, 2000.
[8] Edwin J. Tan and Wendi B. Heinzelman. DSP architectures: past, present and futures. SIGARCH Comput. Archit. News 31, 3 (June 2003), pp. 6-19.
[9] Donghoon Lee, Chanwon Ryu, Jusung Park, Kyunsoo Kwon and Wontae Choi, Design and implementation of 16-bit fixed point digital signal processor, IEEE International SoC Design Conference (ISOCC), vol.2, pp. II-61 – II-64, 2008.
[10] Xuan-Thuan Nguyen, Trong-Tu Bui, Huu-Thuan Huynh, Cong-Kha Pham, Duc-Hung Le, An Asic Implementation Of 16-Bit Fixed-Point Digital Signal Processor, International Conference on Advanced Computing and Applications (ACOMP), 2013.
[11] Khoi-Nguyen Le-Huu, Thanh T. Vu, Diem N. Ho, Anh-Vu Dinh-Duc, “Towards a VLIW Architecture for the 32-bit Digital Signal Processor Core”, in Procs. of the 5th FTRA Int. Conf. on Computer Science and its Applications (CSA-13), 2013.
[12] Khoi-Nguyen Le-Huu, Thanh T. Vu, Diem N. Ho, Anh-Vu Dinh-Duc, “Towards a RISC Instruction Set Architecture for the 32-bit VLIW DSP Processor Core”, to appear in Procs. of the IEEE Region 10 Technical Symposium (TENSYMP 2014), 2014.
