DSP_presentation_Sumit 1
[Figure: DSP system block diagram. Analog signal in -> antialiasing filter -> sample and hold -> A/D -> DSP -> D/A -> reconstruction filter -> analog signal out]
A perspective of the Digital Signal Processing
problem
Application areas
Medical Radar Speech Seismic Image
•••
Theoretical
problem Basic functions
modelling
Algorithms
Architectures
Processor
Implementation instruction sets
and/or hardware
functions
Component technology
DSP APPLICATION CHARACTERISTICS
Programmable Expensive
Can be configured for Complex control
different applications I/O overheads
1. Flexible
2. Suitable for Internet and
Multimedia application
3. Software Intensive
4. Slow for high speed application
5. Too bulky
6. Power hungry
Why are conventional Processors not
suitable for DSP?
[Figure: conventional processor datapath. Labels: Program Counter, Program/Coefficient Memory, Data Memory, Data Read Address, Data Write Address, multiplier (*), CP, ACC]
Architecture of Digital Signal
Processors
• General-purpose processors are based on the Von
Neumann architecture: a single memory bank, which the
processor accesses through a single set of address
and data lines
2. PIPELINING
3. HARDWARE MULTIPLIERS AND OTHER ARITHMETIC FUNCTIONS
4. ON-CHIP AND CACHE MEMORIES
5. A VARIETY OF ADDRESSING MODES
7. INSTRUCTIONS THAT PACK SEVERAL OPERATIONS
8. ZERO-OVERHEAD LOOPING
9. I/O FEATURES SUCH AS INTERRUPT, SERIAL I/O, DMA
10. OTHER CONTROL FUNCTIONS SUCH AS WAIT STATES
A second order FIR
filter
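[Figure: second order FIR filter. x(n) passes through two delays to give x(n-1) and x(n-2); the weighted taps are summed to give y(n). In memory, address register ar1 points to the samples x(n-1), x(n-2) and ar2 points to the coefficients h(1), h(2); both feed the MAC unit that produces y(n).]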
Organization of signal samples and filter coefficients
for a second order FIR filter implementation
An Nth order FIR filter implementation
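[Figure: coefficient memory holds A[0], A[1], A[2], ..., A[N-1]; data memory holds X[n], X[n-1], X[n-2], ..., X[n-N+1]. Both feed the multiplier (*); the product register P is added (+) into the accumulator ACC, which gives y[n].]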
FIR Filter pseudo-code
Load loop count
Initialize coefficient and data addr regs
Zero ACC and P registers
LOOP: P_new = A[i] * X[n-i]
ACC_new = ACC_old + P_old
Decrement coefficient and data addr regs
X[n-i] -> X[n-i-1] {for next iteration}
Decr loop count
BNZ LOOP
ACC -> Y[n]
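The loop above can be sketched in Python to make the pipelined product register visible. This is only an illustration, not code for any particular DSP: the function name `fir_mac_step`, the list-based delay line and the final extra accumulate (needed because the last product is still sitting in P when the loop ends) are assumptions added here.

```python
def fir_mac_step(coeffs, delay_line, new_sample):
    """Compute one output y[n] in the style of the pseudo-code loop.

    coeffs[i]     plays the role of A[i]
    delay_line[i] plays the role of X[n-i] (index 0 is the newest sample)
    Both names are hypothetical; the slide only gives pseudo-code.
    """
    n_taps = len(coeffs)
    delay_line[0] = new_sample          # X[n] arrives
    acc = 0                             # zero ACC
    p = 0                               # zero P
    # Walk from the oldest sample to the newest, so every data move
    # overwrites a slot whose old value has already been used.
    for i in range(n_taps - 1, -1, -1):
        acc += p                        # ACC_new = ACC_old + P_old
        p = coeffs[i] * delay_line[i]   # P_new = A[i] * X[n-i]
        if i + 1 < n_taps:
            delay_line[i + 1] = delay_line[i]   # X[n-i] -> X[n-i-1]
    acc += p                            # flush the last product from P
    return acc


# Impulse response of a 3-tap filter reproduces its coefficients.
line = [0, 0, 0]
outputs = [fir_mac_step([1, 2, 3], line, x) for x in [1, 0, 0, 0]]
```

Because the accumulate uses the product from the previous iteration, the multiply and the add can overlap in hardware; the cost is one extra accumulate after the loop.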
A Typical DSP Architecture
[Figure: a typical DSP architecture. Program memory (PM: instructions and secondary data) and data memory (DM: data only), each with its own address generator, address bus and data bus; program sequencer with instruction cache; registers feeding the multiplier, ALU and shifter; I/O controller (DMA) connected over the DMA bus to input/output.]
Salient Features
• REPEAT-MAC instruction
- Performs auto-increment of both coefficient and data
pointers
- Frees up program memory bus for fetching
coefficients
• Circular buffer
- to manage data movement at the end of every output
computation
• Handling precision
- Accumulator guard bits
- Saturation mode
- Shifters (both right and left shift)
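As a rough illustration of the circular buffer idea, the sketch below keeps the samples in place and moves only an index, so no data is shifted after each output. The class name `CircularFir` and its fields are hypothetical; a real DSP does the modulo address arithmetic in its address generator rather than in software.

```python
class CircularFir:
    """FIR filter whose delay line is a circular buffer (illustrative only)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.buf = [0] * len(self.coeffs)
        self.newest = 0                 # index of the most recent sample

    def step(self, sample):
        n = len(self.buf)
        # Step the pointer back and overwrite the oldest sample.
        # Nothing else in the buffer moves.
        self.newest = (self.newest - 1) % n
        self.buf[self.newest] = sample
        acc = 0
        for i, a in enumerate(self.coeffs):
            # (newest + i) mod n addresses x[n-i]
            acc += a * self.buf[(self.newest + i) % n]
        return acc


fir = CircularFir([1, 2, 3])
response = [fir.step(x) for x in [1, 0, 0, 0]]
```

The only per-sample bookkeeping is the pointer update, which is the work a hardware circular addressing mode removes from the inner loop.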
Types of multipliers used
• Array multipliers
• Multipliers based on modified Booth’s
algorithm
Product Computation Unit of a
simple multiplier for 4-bit
unsigned numbers X and Y
Multiplying X and Y: summation
unit of the simple multiplier
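The two units named in these captions can be mimicked in software, as a behavioural sketch rather than a gate-level model. The function name `array_multiply_4bit` is an assumption; the product computation unit is modelled as one AND per bit pair and the summation unit as the shifted addition of the resulting rows.

```python
def array_multiply_4bit(x, y):
    """Multiply two 4-bit unsigned numbers from AND-gate partial products."""
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    x_bits = [(x >> i) & 1 for i in range(4)]
    y_bits = [(y >> j) & 1 for j in range(4)]
    # Product computation unit: one AND gate for every pair (x_i, y_j).
    partial = [[x_bits[i] & y_bits[j] for i in range(4)] for j in range(4)]
    # Summation unit: add the rows, each shifted left by its row index.
    total = 0
    for j, row in enumerate(partial):
        row_value = sum(bit << i for i, bit in enumerate(row))
        total += row_value << j
    return total


product = array_multiply_4bit(13, 11)
```

In hardware the summation is an array of adders, so the delay grows with the operand width; the software loop only reproduces the arithmetic.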
Combinational array for Booth’s
algorithm – Basic cell B
Array Multiplier for 4x4-bit
numbers using basic cell B
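For reference, a minimal sketch of Booth recoding for signed operands follows. It uses the basic radix-2 form, in which each multiplier bit pair selects add, subtract or skip; the modified (radix-4) variant examines overlapping groups of three bits instead. The function name `booth_multiply` and the `bits` parameter are assumptions, and the sketch models the arithmetic only, not the cell array in the figure.

```python
def booth_multiply(x, y, bits=4):
    """Multiply two signed `bits`-wide integers with radix-2 Booth recoding."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operands do not fit in the given width")
    y_pattern = y & ((1 << bits) - 1)   # two's complement bit pattern of y
    product = 0
    prev_bit = 0                        # implicit bit y[-1] = 0
    for i in range(bits):
        bit = (y_pattern >> i) & 1
        # Booth digit y[i-1] - y[i]: 0 skips, +1 adds x, -1 subtracts x.
        digit = prev_bit - bit
        product += digit * (x << i)
        prev_bit = bit
    return product


result = booth_multiply(-3, 5)
```

Runs of identical multiplier bits produce zero digits, which is why Booth recoding handles signed numbers directly and can reduce the number of additions.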
Arithmetic
Fixed point vs. floating point
Fixed point
• Array indices, loop counters, etc.
Floating point
• Wider dynamic range
• Frees user from scaling concerns
• Less sensitive to error accumulation
• Unbiased rounding
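The scaling concern on the fixed-point side can be illustrated with 16-bit Q15 arithmetic, a common fixed-point format that the slides do not name and that is assumed here. All helper names (`to_q15`, `q15_mul`, `saturate16`, `from_q15`) are hypothetical, and the multiply uses simple round-half-up rather than unbiased rounding.

```python
Q15_ONE = 1 << 15                       # 1.0 in Q15 (not itself representable)
INT16_MIN, INT16_MAX = -32768, 32767


def saturate16(value):
    """Clamp to the 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, value))


def to_q15(x):
    """Convert a float in roughly [-1, 1) to Q15, saturating at the ends."""
    return saturate16(int(round(x * Q15_ONE)))


def q15_mul(a, b):
    """Multiply two Q15 values; the raw product has 30 fractional bits."""
    product = a * b
    product += 1 << 14                  # round half up before dropping bits
    return saturate16(product >> 15)    # rescale back to 15 fractional bits


def from_q15(a):
    return a / Q15_ONE


half = to_q15(0.5)
quarter = from_q15(q15_mul(half, half))
```

Every product has to be shifted back into range by hand, and values near full scale saturate. A floating-point unit does this rescaling automatically through the exponent.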
Internal Vs External
Pincount limitation
Speed penalty Off-chip bussing
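[Figure: memory organisation block diagrams. Basic arrangement: separate program memory and data memory. Modification #1 and Modification #2: variants built from a combined program/data memory, a data memory and a multi-port memory.]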
MEMORY ORGANISATION - II
[Figure: Modification #3: program/data memory, program cache and data memory. Modification #4 and Modification #5: diagrams only, no text recovered.]
VLIW architecture
• Each instruction specifies several
operations to be done in parallel
• Advantages: simple hardware; compilers can spot ILP
easily
• Disadvantages: little compatibility between
generations; explicit NOPs bloat code
Super scalar architecture
[Figure: superscalar pipeline timing diagram; the last row, F4 D4 O4 W4, is annotated "ends here"]
Example of dependency
• A ← 3 + A; B ← 4 × A
Can’t perform these two in parallel: the second reads the
value of A that the first writes
• Another case: A = B + A; B = A – B; A =
A – B (swapping without temp) ; examine
how you can handle this.
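One way to examine such cases is to list the register dependencies between each pair of consecutive instructions. The sketch below is a simplified check, assuming each instruction is described as a hypothetical `(destination, sources)` pair; the function name `hazards` and the RAW/WAR/WAW labels are the usual textbook terms and do not come from the slides.

```python
def hazards(first, second):
    """Return the dependencies that stop `second` running alongside `first`.

    Each instruction is a (destination, set_of_sources) pair.
    """
    dest1, src1 = first
    dest2, src2 = second
    found = set()
    if dest1 in src2:
        found.add("RAW")    # second reads what first writes
    if dest2 in src1:
        found.add("WAR")    # second overwrites what first still reads
    if dest1 == dest2:
        found.add("WAW")    # both write the same register
    return found


# A <- 3 + A ; B <- 4 x A
simple = hazards(("A", {"A"}), ("B", {"A"}))

# A = B + A ; B = A - B ; A = A - B
swap = [("A", {"A", "B"}), ("B", {"A", "B"}), ("A", {"A", "B"})]
swap_hazards = [hazards(swap[i], swap[i + 1]) for i in range(2)]
```

For the first example the check reports a read-after-write dependency on A. For the swap sequence, every adjacent pair both reads and overwrites the other's operands, so the three statements must execute strictly in order.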
Branch Dependency in Pipelining
A Branch instruction can cause a pipeline stall if the branch
is taken, as the next instruction has to be aborted in that
case. If I1 is an unconditional branch instruction, the next
Fetch cycle (F2) can start after D1. But if I1 is a conditional
branch instruction, F2 has to wait until O1 for the decision as
to whether the branch will be taken or not.
[Figure: pipeline stages F1 D1 O1 W1 of the branch instruction]
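The stall lengths described above can be worked out with a small timing sketch. It assumes a four-stage pipeline (F, D, O, W) with one cycle per stage and reads "wait until O1" as "F2 starts in the cycle after O1 completes"; the function name `next_fetch_cycle` and the `kind` labels are hypothetical.

```python
STAGES = ["F", "D", "O", "W"]           # one cycle per stage (assumed)

# Stage of I1 that must finish before the next fetch may begin.
RESOLVED_AFTER = {
    "none": "F",                        # ordinary instruction: no stall
    "unconditional": "D",               # target known after decode
    "conditional": "O",                 # decision known after operand stage
}


def next_fetch_cycle(kind, fetch_cycle=1):
    """Cycle in which F2 can start if I1 is fetched in `fetch_cycle`."""
    if kind not in RESOLVED_AFTER:
        raise ValueError("unknown instruction kind: " + kind)
    stage_index = STAGES.index(RESOLVED_AFTER[kind])
    return fetch_cycle + stage_index + 1


stall_cycles = {
    kind: next_fetch_cycle(kind) - next_fetch_cycle("none")
    for kind in RESOLVED_AFTER
}
```

Under these assumptions an unconditional branch costs one stall cycle and a conditional branch costs two.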
[Figure: FIR tapped delay line with input Xi, Z^-1 delays and output Yi, annotated with the features below]
• SINGLE-CYCLE MULTIPLY/ACCUMULATE
• MULTIPLY/ACCUMULATE USING EXTERNAL PROGRAM MEMORY
• REPEAT INSTRUCTION
• ADAPTIVE FILTERING INSTRUCTIONS
• BIT-REVERSED ADDRESSING
• 0-16 BIT SCALING SHIFTER (SIGNED OR UNSIGNED)
• AUTOMATIC DATA-MOVE IN MEMORY (Z^-1)
• OVERFLOW MANAGEMENT
- SATURATION MODE
- BRANCH ON OVERFLOW
- PRODUCT RIGHT SHIFT
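Of the features listed, bit-reversed addressing is the easiest to show in a few lines. It produces the index order needed to reorder the input or output of a radix-2 FFT. The sketch below computes that order in software; the function name `bit_reverse` is an assumption, and on the DSP the address generator does this without extra instructions.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)    # move the LSB across
        index >>= 1
    return result


# Access order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]   # [0, 4, 2, 6, 1, 5, 3, 7]
```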
TMS320C25 - HIGHER PERFORMANCE AT LESS CODE
SPACE
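[Figure: FIR filter as a tapped delay line: input x(n), a chain of Z^-1 delays with a multiplier (x) at each tap, output y(n)]
$y(n) = \sum_{k=0}^{N} b_k \, x(n-k)$
TMS320C25 code:
RPTK 49
MACD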
MEMORY ORGANIZATION
• 4K WORDS ON-CHIP MASKED ROM
• 544 WORDS ON-CHIP DATA RAM
• 256 WORDS ON-CHIP RAM RECONFIGURABLE AS DATA/PROGRAM MEMORY
• BLOCK TRANSFERS IN MEMORY
• DIRECT, INDIRECT, AND IMMEDIATE ADDRESSING MODES
BLOCK DIAGRAM OF A TMS320C5X DSP
General-Purpose Microprocessor
circa 1984 : Intel 8088
~29,000 transistors
Clock speed : ~ 5 MHz
Address space : 20 bits
Bus width : 8 bits
100+ instructions
2-35 cycles per instruction
Micro-coded architecture
DSP circa 1984 : TMS32010
Clock 20 MHz
16 bits
8, 12 bits addressing space
~ 50 k transistors
~ 35 instructions
Harvard architecture
Hardware multiplier
Double length accumulator with
saturation
A few special DSP instructions
General Purpose Microprocessor 2000
GHz clock speed
32-bit address or more
32-bit bus, 128-bit instructions
Complex MMU
Super scalar CPU
MMX instructions
On chip cache
Single cycle execution
32-bit floating point ALU on board
Very expensive
10s of watts of power
DSP in 2000
Clock 100 ~ 200 MHz
16-bit fixed point or 32-bit floating
point
16-24 bits address space
Large on-chip and off-chip memories
Single cycle execution of most
instructions
Harvard architecture
Lots of special DSP instructions
50 mW to 2 W power
Future of DSP Microprocessor
Sufficiently unique for an independent
class of applications (HDD, cell phone)
Low power consumption, low cost
High performance within power, cost
constraints (MIPS/mW, MIPS/$)
Fixed point & floating point
Better compilers - but users must be
informed
Hybrid DSP/ GP systems
DSP Architectures
Professor S. Srinivasan
Electrical Engineering Department
I.I.T.-Madras, Chennai –600 036
[email protected]