
Algoritham and Architectural Level Methodologies

The document discusses power estimation at various levels of design including algorithm, architectural, and implementation levels. It provides examples of vector quantization and tree search encoding algorithms. Power is estimated by considering the switching capacitance of hardware modules and activity models of data. Spatial and temporal locality in algorithms can be exploited to reduce switching capacitance and power consumption during mapping and implementation.


MANUKUMAR G.C, M.Tech 1st Sem (SP & VLSI)

Introduction: Design flow

Algorithm level: Analysis and Optimization

Architectural level: Estimation and Synthesis

Power has become a critical design parameter in low-power devices.
It has been demonstrated that decisions made at the higher levels of abstraction have a major impact on power consumption. Using known synthesis, optimization, and estimation techniques, we can analyse circuits at different design stages with greater accuracy.

A design environment must include optimization and estimation tools at all levels of the design flow. The most effective decisions are made at the highest levels of abstraction.

Estimates made at the algorithm level are not very accurate, so they are refined at the architectural level, which gives more accurate results.

Vector quantization
It is a data compression method used in voice recognition and video systems.

Vector quantization, Encoding and Decoding

The image is broken into a sequence of 4x4-pixel blocks. Each pixel is represented by an 8-bit word, so each block is a vector of 16 words, each 8 bits long. The vectors are compared against a previously generated codebook containing 256 different code vectors.

After compression, an 8-bit word is generated that gives the address of the code vector approximating the 4x4 image block. This corresponds to a compression ratio of 16:1, since 16 8-bit words are represented by a single 8-bit word.

The power consumption of a CMOS chip consists of dynamic, short-circuit, and leakage components; the dominant dynamic term is

Power = Ceff * V^2 * f

f - frequency of operation
V - supply voltage
Ceff - effective switching capacitance, which combines two factors:
C - the capacitance being charged or discharged
α - the corresponding switching probability
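A minimal numeric check of this relation, with purely illustrative values for the capacitance, activity, voltage, and frequency:

```python
# First-order dynamic power: P = Ceff * V^2 * f, with Ceff = alpha * C.
# All numbers below are made-up examples, not values from the text.
def dynamic_power(c_farads, alpha, v_volts, f_hz):
    # effective switching capacitance = switching probability * capacitance
    c_eff = alpha * c_farads
    return c_eff * v_volts ** 2 * f_hz

# 10 pF switched with probability 0.5 at 5 V and 20 MHz -> 2.5 mW
p = dynamic_power(10e-12, 0.5, 5.0, 20e6)
assert abs(p - 2.5e-3) < 1e-9
```

Halving the supply voltage alone would cut this figure by 4x, which is why voltage scaling dominates the optimizations discussed later.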

At the algorithm level we can compare design decisions but not make absolute claims about power consumption.

Power dissipation is divided into two components:

Algorithm-inherent dissipation: power dissipated by the execution units and memory.

Implementation overhead: power dissipated by control, interconnect, and registers.

Inherent means that this dissipation is necessary for the basic functionality and cannot be avoided, irrespective of the implementation. It serves as a prime factor for comparing different algorithms. It can be modeled as a weighted sum of the operation counts in the algorithm.

The weights must reflect the respective switching capacitance of each type of operation.
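The weighted sum can be sketched as follows. The per-operation capacitance weights here are made-up placeholders; in practice they come from characterizing the modules of the target hardware.

```python
# Algorithm-inherent dissipation as a weighted sum of operation counts.
# Weights (farads per operation) are hypothetical placeholders.
CAP_PER_OP = {"mul": 40e-12, "add": 5e-12, "mem": 10e-12}

def inherent_capacitance(op_counts):
    # each operation type contributes (count * its switching capacitance)
    return sum(CAP_PER_OP[op] * n for op, n in op_counts.items())

# one MSE evaluation: 16 memory accesses, 16 multiplies, 16 additions
c = inherent_capacitance({"mem": 16, "mul": 16, "add": 16})
assert abs(c - 16 * (40e-12 + 5e-12 + 10e-12)) < 1e-18
```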


Switching parameters are strongly dependent on the hardware architecture. The mapping of operations onto hardware resources affects the correlation between signals, and this mapping is not available until the architecture is finalized; the same holds for the degree of hardware sharing, which is likewise unknown at the algorithm level.

The implementation overhead depends on the specific architectural platform, and it must be taken into account whenever it is comparable to the algorithm-inherent dissipation. First-order predictions of the overhead components can be obtained from some properties of the algorithm and the hardware architecture.

In full-search vector quantization (FSVQ), the distortion measure is computed by a full search through the entire codebook using the standard mean squared error (MSE), summing (x_i - c_i)^2 over the 16 pixel words.

C - codebook code vector
X - 4x4 input vector representation
i - index of an individual pixel word

A first-order approximation is obtained by counting the operations required to search the codebook (e.g., multiplications, additions).

Computing the MSE between two vectors requires 16 memory accesses, 16 multiplications, and 16 additions. In FSVQ this is done for each of the 256 vectors in the codebook.
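These counts can be tabulated directly, which is all a first-order algorithm-level estimate needs:

```python
# Operation count for FSVQ: one MSE evaluation costs 16 memory accesses,
# 16 multiplies and 16 additions, repeated for all 256 code vectors.
def fsvq_ops(codebook_size=256, vector_len=16):
    per_vector = {"mem": vector_len, "mul": vector_len, "add": vector_len}
    return {op: n * codebook_size for op, n in per_vector.items()}

ops = fsvq_ops()
assert ops == {"mem": 4096, "mul": 4096, "add": 4096}  # 256 * 16 each
```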

Algorithm-inherent dissipation: the operation count can be used to estimate the switching capacitance of the targeted hardware architecture.

Using a black-box capacitance model of the hardware, a first-order estimate of the switched capacitance can be made.

FSVQ algorithm-inherent dissipation

First-order analysis produces an overview that identifies which functions are candidates for optimization.

A ripple adder dissipates less power than a carry-select adder (CSA), but it fails to meet the required throughput at supply voltages below 5 V.
The CSA continues to meet the required throughput even when the voltage is reduced to 3 V.

Estimation tools must be integrated with design-space exploration and optimization tools to provide an easy-to-use environment for the designer. This gives the designer quick feedback on the effect of design choices.

Area and energy prediction of FIR Filter

Functional pipelining, algebraic transformations, and loop transformations can be used to maintain speed at reduced supply voltages.

These techniques result in a larger silicon area, hence the term "trading area for power."

Avoid wasteful activity


Activity at the algorithm level is determined by the size and complexity of the algorithm (e.g., operation count and word length). The algorithm with the fewest operations is generally preferred.

Transformations include operation reduction and strength reduction.

Operation reduction includes:

Common subexpression elimination
Algebraic transformations
Dead code elimination

Strength reduction

Replacing an energy-consuming operation with a combination of simpler operations (e.g., expanding multiplication by a constant into shift and add operations).
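The shift-and-add expansion mentioned above can be shown for one constant. The constant 10 is an arbitrary example:

```python
# Strength reduction: multiplication by the constant 10 expanded into
# shifts and adds, since x*10 = x*8 + x*2 = (x << 3) + (x << 1).
# Two shifts and one add replace one general multiplication.
def times_ten(x):
    return (x << 3) + (x << 1)

assert all(times_ten(x) == 10 * x for x in range(1000))
```

In hardware, shifts by a constant are essentially free wiring, which is why this transformation saves switching energy.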

Algorithms that possess structural properties such as locality and regularity are preferred. Exploiting these properties translates into reduced bus capacitance, and the chip area is reduced as well.

Tree search encoding

It requires less computation than FSVQ. It performs a binary search of the vector space instead of a full search, so its computational complexity is proportional to log2 N rather than N.

N - number of vectors in the codebook

At each level, the input vector is compared with two codebook entries. The branch closer to the input vector is assigned 0 and the other branch 1; the losing branch is not considered further. Only 2*log2(256) = 16 distortion comparisons have to be made, instead of 256 in FSVQ.

The trick is to rearrange the difference terms between the input vector X and the two code vectors Ca and Cb.

Since only the comparison between the two code vectors matters, the terms can be collected under one summation.

The number of multiplications is reduced from 32 to 16, and likewise for additions and subtractions.
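The rearrangement rests on the identity d(X,Ca) - d(X,Cb) = sum_i [ 2*x_i*(b_i - a_i) + (a_i^2 - b_i^2) ]: the terms depending only on the code vectors can be precomputed per tree node, leaving 16 multiplies per decision. A small check of the equivalence, with arbitrary sample vectors:

```python
# Direct branch decision: two full distortions, 32 multiplies.
def direct_decision(x, ca, cb):
    da = sum((xi - ai) ** 2 for xi, ai in zip(x, ca))
    db = sum((xi - bi) ** 2 for xi, bi in zip(x, cb))
    return 0 if da <= db else 1

# Reduced decision: d(x,Ca) - d(x,Cb) = sum_i [2*x_i*(b_i-a_i) + (a_i^2-b_i^2)].
# The weights w and offset k depend only on the code vectors (precomputable).
def reduced_decision(x, ca, cb):
    w = [2 * (bi - ai) for ai, bi in zip(ca, cb)]
    k = sum(ai ** 2 - bi ** 2 for ai, bi in zip(ca, cb))
    diff = sum(wi * xi for wi, xi in zip(w, x)) + k  # 16 multiplies
    return 0 if diff <= 0 else 1

x, ca, cb = [5] * 16, list(range(16)), list(range(16, 0, -1))
assert direct_decision(x, ca, cb) == reduced_decision(x, ca, cb)
```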

Estimating power at the architectural level is more accurate for two reasons:

More precise information is available about the signal statistics, which yields more accurate models for the hardware modules.

The implementation overhead is now defined with respect to concrete controllers, memories, and buses, which can be estimated accurately.

Power analysis at this level requires two entities:

Capacitance models for the hardware modules.
Activity models for the data and control signals.

The capacitance of an RTL module (adder, multiplier) can be expressed as a function of complexity parameters.
E.g., the switching capacitance of a multiplier is proportional to the square of its input word length.
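Such a complexity-parameter model is a one-line function. The unit capacitance c0 below is a hypothetical characterization constant, not a value from the text:

```python
# Complexity-parameter capacitance model for a multiplier: switched
# capacitance scales with the square of the input word length W,
# roughly one cell per partial-product bit. c0 is hypothetical.
def multiplier_cap(word_length, c0=1.0e-12):
    return c0 * word_length ** 2

# doubling the word length quadruples the estimated capacitance
assert abs(multiplier_cap(16) - 4 * multiplier_cap(8)) < 1e-18
```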

The capacitance model for a logarithmic shifter is given as a function of its parameters:

S and M are the maximum shift values.

L = log2(M+1) represents the number of shift stages.

The average power dissipation of a module is a function of the applied signal. It is difficult to build a capacitance model for all possible input patterns, so power factor approximation is employed to analyse power dissipation.

It uses an experimentally determined weighting factor, called the power factor, to find the average power consumed by a given module. A more accurate model can be built for two's-complement data words, whose bits can be divided into two regions on the basis of their behaviour:
activity in the higher-order bits depends on temporal correlation, while the lower-order bits behave like white-noise data.

Transition activity versus bit position for typical data streams

The overall module is characterized by separate capacitance models for the MSB and LSB regions. The breakpoints between the regions can be determined from the applied signal statistics or from theoretical analysis.
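The two-region behaviour can be observed by measuring per-bit transition activity on a data stream. The slowly varying stream below is synthetic, standing in for typical temporally correlated data:

```python
# Per-bit transition activity of a 16-bit data stream: low-order bits
# toggle like white noise (activity near 0.5), while high-order bits
# of correlated data toggle far less often.
import random

random.seed(2)

def bit_activities(stream, bits=16):
    # fraction of consecutive sample pairs in which each bit toggles
    toggles = [0] * bits
    for prev, cur in zip(stream, stream[1:]):
        changed = prev ^ cur
        for b in range(bits):
            toggles[b] += (changed >> b) & 1
    n = len(stream) - 1
    return [t / n for t in toggles]

# synthetic temporally correlated stream: a clamped random walk
x, stream = 1 << 15, []
for _ in range(5000):
    x = max(0, min((1 << 16) - 1, x + random.randint(-64, 64)))
    stream.append(x)

act = bit_activities(stream)
assert act[0] > 0.4    # LSB region: near-white-noise activity
assert act[15] < 0.1   # MSB region: strongly correlated, rare toggles
```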

The power consumption of the final implementation of an algorithm depends on the quality of its mapping onto the architecture. The mapping process must exploit the relevant properties of the algorithm so that it preserves data correlations.

Spatial locality can be exploited during the binding of operations to hardware units.

Spatial locality in parallel IIR filter

It consists of three distinct clusters.

In the unpartitioned version, the algorithm was mapped onto a single large datapath with resource sharing across units of different clusters.

In the partitioned version, resource sharing was allowed only between operations in the same cluster. Exploiting locality in this way also reduces the size and access capacitance of the register files.

Processing one node of the search tree requires:

17 memory accesses
16 multiply/accumulate operations
A final add operation for the comparison that determines the location of the next node in the tree.

A total of 18 clock cycles is required per node. Since the same number of operations is needed at each of the 8 tree levels, 8 x 18 = 144 clock cycles are required per input vector.

The locality of reference enables partitioning of the memory into smaller memories, each associated with a single level of the tree.

Locality of reference identification

A pipelined structure can be used to optimize system performance.

By using a distributed memory architecture we can reduce the switched capacitance.

Distributed memory

There are 8 controllers and processors, each clocked at 1/8 of the original frequency; the capacitance switched per vector by these elements is unchanged.
Since there is less overhead in reading from a smaller memory, the total switched capacitance is reduced.

Power and area breakdown
