DSP_presentation_Sumit 3
[Figure: MODIFICATION #3, MODIFICATION #4 and MODIFICATION #5. Labels: program/data memory, data memory, program cache, adders, output y(n)]
[Figure: levels of DSP system design. Labels: theoretical problem modelling; basic functions; algorithms; architectures; processor instruction sets and/or hardware functions; implementation; component technology]
A Digital Signal Processing System
Analog signal in → Antialiasing filter → Sample and hold → A/D → DSP → D/A → Reconstruction filter → Analog signal out
DSP APPLICATION CHARACTERISTICS
Programmable: can be configured for different applications
Expensive; complex control; I/O overheads
1. Flexible
2. Suitable for Internet and multimedia applications
3. Software intensive
4. Slow for high-speed applications
5. Too bulky
6. Power hungry
Why are conventional Processors not suitable for DSP?
[Figure: multiply-accumulate datapath. Labels: program counter, address, data read, data write, program/coefficient memory, data memory, multiplier (*), P, ACC]
Architecture of Digital Signal Processors
• General-purpose processors are based on the Von Neumann architecture: a single memory bank, which the processor accesses through a single set of address and data lines
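A rough way to see why a single set of address and data lines hurts DSP work is to count memory accesses per filter tap. The sketch below is a toy model, not a description of any real processor: it assumes each multiply-accumulate needs one instruction fetch, one coefficient read and one data read, and that each bus carries one access per cycle. The function name `cycles_per_tap` is made up for illustration.

```python
import math


def cycles_per_tap(accesses, buses):
    """Toy model: each bus moves one word per cycle (an assumption)."""
    return math.ceil(accesses / buses)


# Instruction fetch + coefficient read + data read over one shared bus.
von_neumann = cycles_per_tap(accesses=3, buses=1)

# Same three accesses spread over two buses (separate program and data memory).
two_buses = cycles_per_tap(accesses=3, buses=2)

# If the instruction is repeated without being refetched, only the two
# operand reads remain, and two buses can serve them in one cycle.
two_buses_repeat = cycles_per_tap(accesses=2, buses=2)

print(von_neumann, two_buses, two_buses_repeat)  # prints: 3 2 1
```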
2. PIPELINING
3. HARDWARE MULTIPLIERS AND OTHER ARITHMETIC FUNCTIONS
4. ON-CHIP AND CACHE MEMORIES
5. A VARIETY OF ADDRESSING MODES
7. INSTRUCTIONS THAT PACK SEVERAL OPERATIONS
8. ZERO-OVERHEAD LOOPING
9. I/O FEATURES SUCH AS INTERRUPT, SERIAL I/O, DMA
10. OTHER CONTROL FUNCTIONS SUCH AS WAIT STATES
[Figure: Organization of signal samples and filter coefficients for a second order FIR filter implementation. Labels: samples x(n-1), x(n-2) with ar1; coefficients h(1), h(2) with ar2; delay; MAC; output y(n)]
An Nth order FIR filter implementation
[Figure: coefficient memory holding A[0], A[1], A[2], ..., A[N-1] and data memory holding X[n], X[n-1], X[n-2], ..., X[n-N+1]; both feed the multiplier (*), whose product P is added (+) into ACC to give y[n]]
FIR Filter pseudo-code
      Load loop count
      Initialize coefficient and data addr regs
      Zero ACC and P registers
LOOP: Pnew = A[i] * X[n-i]
      ACCnew = ACCold + Pold
      Decrement coefficient and data addr regs
      X[n-i] -> X[n-i-1]   {for next iteration}
      Decr loop count
      BNZ LOOP
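The pseudo-code above can be sketched in Python as follows. This is one reading of the loop, not the author's implementation: it assumes the address registers start at the oldest tap and count down, that the data move copies each sample one slot further along the delay line, and that one extra accumulate is needed after the loop because each pass adds the product of the previous pass. The names `fir_step` and `delay_line` are illustrative.

```python
def fir_step(coeffs, delay_line, new_sample):
    """Compute one FIR output with a pipelined multiply-accumulate loop.

    coeffs[i] plays the role of A[i]; delay_line[i] holds X[n-i].
    delay_line needs len(coeffs) + 1 slots: the last one receives the
    oldest sample when it is shifted out.
    """
    n_taps = len(coeffs)
    delay_line[0] = new_sample              # X[n] enters the delay line
    acc = 0                                 # zero ACC
    p = 0                                   # zero P
    i = n_taps - 1                          # address regs start at the oldest tap
    count = n_taps                          # load loop count
    while count != 0:                       # BNZ LOOP
        acc += p                            # ACCnew = ACCold + Pold
        p = coeffs[i] * delay_line[i]       # Pnew = A[i] * X[n-i]
        delay_line[i + 1] = delay_line[i]   # X[n-i] -> X[n-i-1] for the next output
        i -= 1                              # decrement address regs
        count -= 1                          # decrement loop count
    acc += p                                # add the last product left in P
    return acc


coeffs = [1, 2, 3]
delay_line = [0, 0, 0, 0]
outputs = [fir_step(coeffs, delay_line, x) for x in [1, 0, 0, 0]]
print(outputs)  # prints: [1, 2, 3, 0]
```

Feeding a unit impulse returns the coefficients in order, which is a quick check that the data move and the pointer updates agree.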
A Typical DSP Architecture
[Figure: block diagram. Program memory (PM, instructions and secondary data) and data memory (DM, data only), each with its own address generator, address bus and data bus (PM address, PM data, DM address, DM data); program sequencer; instruction cache; registers; multiplier; ALU; shifter; I/O controller (DMA) with DMA bus; input/output]
Salient Features
• REPEAT-MAC instruction
- Performs auto-increment of both coefficient and data pointers
- Frees up program memory bus for fetching coefficients
• Circular buffer
- Manages data movement at the end of every output computation
• Handling precision
- Accumulator guard bits
- Saturation mode
- Shifters (both right and left shift)
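The circular buffer idea listed above can be illustrated with a small sketch: instead of moving every sample after each output, only a pointer moves and wraps at the end of the buffer. The class and method names (`CircularBuffer`, `push`, `tap`) are assumptions made for this example; a DSP does the wrap in its address generation hardware rather than in software.

```python
class CircularBuffer:
    """Fixed-size sample store addressed relative to the newest sample."""

    def __init__(self, size):
        self.data = [0] * size
        self.head = 0  # index of the newest sample

    def push(self, sample):
        # Move the pointer, not the data; wrap at the buffer boundary.
        self.head = (self.head - 1) % len(self.data)
        self.data[self.head] = sample

    def tap(self, k):
        # Return x(n-k), the sample written k pushes ago.
        return self.data[(self.head + k) % len(self.data)]


def fir_output(coeffs, buf):
    """Sum of coeffs[k] * x(n-k) over all taps."""
    return sum(c * buf.tap(k) for k, c in enumerate(coeffs))


buf = CircularBuffer(3)
coeffs = [1, 2, 3]
outputs = []
for x in [1, 0, 0, 0]:
    buf.push(x)
    outputs.append(fir_output(coeffs, buf))
print(outputs)  # prints: [1, 2, 3, 0]
```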
Product Computation Unit of a simple multiplier for 4-bit unsigned numbers X and Y
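The figure itself is not reproduced here, so the sketch below shows only one common way to form a 4-bit unsigned product: each bit of Y selects X or zero as a partial product, and the partial products are shifted and summed. Whether the slide's unit is organised exactly this way is an assumption; the function name is illustrative.

```python
def multiply_4bit_unsigned(x, y):
    """Multiply two 4-bit unsigned values by summing partial products."""
    if not (0 <= x < 16 and 0 <= y < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    product = 0
    for j in range(4):
        y_bit = (y >> j) & 1
        partial = x if y_bit else 0   # in hardware: AND of each X bit with Y bit j
        product += partial << j       # in hardware: adders sum the shifted rows
    return product                    # fits in 8 bits


print(multiply_4bit_unsigned(13, 11))  # prints: 143
```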
Combinational array for Booth's algorithm: basic cell B
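As a reminder of what the array computes, here is a minimal software sketch of radix-2 Booth recoding for signed operands. It is not a model of the cell B hardware: it only shows the rule that a 0-to-1 transition in the multiplier bits subtracts the shifted multiplicand and a 1-to-0 transition adds it. The function name and the default width of 4 bits are assumptions.

```python
def booth_multiply(x, y, bits=4):
    """Multiply two signed values using radix-2 Booth recoding of y."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= x <= hi and lo <= y <= hi):
        raise ValueError("operands must fit in the given signed width")
    y_pattern = y & ((1 << bits) - 1)   # two's complement bit pattern of y
    product = 0
    prev_bit = 0                        # implied bit to the right of the LSB
    for i in range(bits):
        bit = (y_pattern >> i) & 1
        if (bit, prev_bit) == (0, 1):   # end of a run of ones: add x * 2^i
            product += x << i
        elif (bit, prev_bit) == (1, 0): # start of a run of ones: subtract x * 2^i
            product -= x << i
        prev_bit = bit
    return product


print(booth_multiply(5, -3))  # prints: -15
```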
Arithmetic: Fixed point vs. floating point
Fixed point: array indices, loop counters, etc.
Floating point: wider dynamic range; frees the user from scaling concerns; less sensitive to error accumulation; unbiased rounding
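The scaling concern on the fixed-point side can be made concrete with a small sketch. It assumes the common Q15 convention (16-bit integers representing values in [-1, 1) with 15 fractional bits), which the slides do not name; the helper names are illustrative.

```python
Q15_ONE = 1 << 15  # scale factor: 1.0 corresponds to 32768


def to_q15(value):
    """Convert a float to a Q15 integer, clamping to the 16-bit range."""
    scaled = int(round(value * Q15_ONE))
    return max(-Q15_ONE, min(Q15_ONE - 1, scaled))


def q15_mul(a, b):
    """Multiply two Q15 numbers; the Q30 product is shifted back to Q15."""
    return (a * b) >> 15


def from_q15(q):
    """Convert a Q15 integer back to a float."""
    return q / Q15_ONE


half = to_q15(0.5)
print(half)                           # prints: 16384
print(from_q15(q15_mul(half, half)))  # prints: 0.25
print(to_q15(1.0))                    # prints: 32767 (1.0 is not representable)
```

The programmer has to track where the binary point sits after every operation; floating-point hardware carries that bookkeeping in the exponent.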
[Figure: tapped delay line with input Xi, Z-1 delay elements and output Yi]
• SINGLE-CYCLE MULTIPLY/ACCUMULATE
• MULTIPLY/ACCUMULATE USING EXTERNAL PROGRAM MEMORY
• REPEAT INSTRUCTION
• ADAPTIVE FILTERING INSTRUCTIONS
• BIT-REVERSED ADDRESSING
• AUTOMATIC DATA-MOVE IN MEMORY (Z-1)
• 0-16 BIT SCALING SHIFTER (SIGNED OR UNSIGNED)
• OVERFLOW MANAGEMENT
- SATURATION MODE
- BRANCH ON OVERFLOW
- PRODUCT RIGHT SHIFT
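To show what saturation mode buys, the sketch below compares clamping with plain two's complement wraparound. The 16-bit default width and the function names are assumptions for illustration and do not correspond to a specific accumulator size.

```python
def saturate(value, bits=16):
    """Clamp value to the signed range of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def wrap(value, bits=16):
    """Two's complement wraparound, i.e. the result without saturation."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


total = 30000 + 10000   # exceeds the 16-bit maximum of 32767
print(saturate(total))  # prints: 32767
print(wrap(total))      # prints: -25536
```

With wraparound, a large positive sum turns into a large negative one; with saturation the error is bounded by the clipping level.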
TMS320C25 - HIGHER PERFORMANCE AT LESS CODE SPACE
[Figure: FIR filter as a tapped delay line: input xn, Z-1 delay elements, multipliers, output Yn]
$$Y_n = \sum_{K=0}^{N} b_K \, X(n-K)$$
TMS320C25:
RPTK 49
MACD
• INDIRECT ADDRESSING
- 8 AUXILIARY REGISTERS
- USED OFTEN IN PROGRAM LOOPS WITH AUTO INC/DEC OPTIONS
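Bit-reversed addressing, listed among the TMS320C25 features, is commonly used to reorder FFT data. The sketch below computes the address sequence in software so the pattern is visible; on a DSP the address generator produces it as part of indirect addressing. The function names are illustrative, and the buffer length is assumed to be a power of two.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Return the bit-reversed access order for a buffer of n entries."""
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of two")
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversed_order(8))  # prints: [0, 4, 2, 6, 1, 5, 3, 7]
```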
Addressing Mode (contd.)
MEMORY ORGANIZATION
• 4K WORDS ON-CHIP MASKED ROM
• 544 WORDS ON-CHIP DATA RAM
• 256 WORDS ON-CHIP RAM RECONFIGURABLE AS DATA/PROGRAM MEMORY
• BLOCK TRANSFERS IN MEMORY
• DIRECT, INDIRECT, AND IMMEDIATE ADDRESSING MODES
BLOCK DIAGRAM OF A TMS320C5X DSP
General-Purpose Microprocessor circa 1984: Intel 8088
~100,000 transistors
Clock speed: ~5 MHz
Address space: 20 bits
Bus width: 8 bits
100+ instructions
2-35 cycles per instruction
Micro-coded architecture
DSP: TMS32010, 1984
Clock 20 MHz
16 bits
8- and 12-bit address spaces
~50 k transistors
~35 instructions
Harvard architecture
Hardware multiplier
Double-length accumulator with saturation
A few special DSP instructions
Relatively inexpensive
General Purpose Microprocessor 2000
GHz clock speed
32-bit address or more
32-bit bus, 128-bit instructions
Complex MMU
Super scalar CPU
MMX instructions
On chip cache
Single cycle execution
32-bit floating point ALU on board
Very expensive
10s of watts of power
DSP in 2000
Clock 100-200 MHz
16-bit fixed point or 32-bit floating point
16-24 bits address space
Large on-chip and off-chip memories
Single cycle execution of most instructions
Harvard architecture
Lots of special DSP instructions
50 mW to 2 W power
Future of DSP Microprocessor
Sufficiently unique for an independent class of applications (HDD, cell phone)
Low power consumption, low cost
High performance within power, cost constraints (MIPS/mW, MIPS/$)
Fixed point & floating point
Better compilers - but users must be informed
Hybrid DSP/GP systems
DSP Architectures
Professor S. Srinivasan
Electrical Engineering Department
I.I.T.-Madras, Chennai –600 036