008 Architectural
ARCHITECTURAL-LEVEL
SYNTHESIS
Giovanni De Micheli
Stanford University
Please read the
entire chapter 4
Outline
Motivation.
Compiling language models into
abstract models.
Behavioral-level optimization and
program-level transformations.
Architectural synthesis: an overview.
Synthesis
Transform behavioral into structural view.
Architectural-level synthesis:
Architectural abstraction level.
Determine macroscopic structure.
Example: major building blocks.
Logic-level synthesis:
Logic abstraction level.
Determine microscopic structure.
Example: logic gate interconnection.
Example
diffeq {
read (x, y, u, dx, a);
repeat {
xl = x + dx;
ul = u - (3 * x * u * dx) - (3 * y * dx);
yl = y + u * dx;
c = x < a;
x = xl; u = ul; y = yl;
}
until ( c ) ;
write (y);
}
[Figure: example of structures]
Architectural-level synthesis
Translate HDL models into sequencing
graphs.
Behavioral-level optimization:
Optimize abstract models independently from the
implementation parameters.
Hardware compilation:
Compile HDL model into sequencing graph.
Optimize sequencing graph.
Generate gate-level interconnection for a cell library.
Compilation
Front-end:
Semantic analysis:
Data-flow and control-flow analysis.
Type checking.
Resolve arithmetic and relational operators.
Behavioral-level optimization
Semantic-preserving transformations
aiming at simplifying the model.
Applied to parse-trees or during their
generation.
Taxonomy:
Data-flow based transformations.
Control-flow based transformations.
Tree-height reduction.
Constant and variable propagation.
Common sub-expression elimination.
Dead-code elimination.
Operator-strength reduction.
Code motion.
Tree-height reduction
Applied to arithmetic expressions.
Goal:
Split into two-operand expressions to exploit
hardware parallelism at best.
Techniques:
Balance the expression tree.
Exploit commutativity, associativity and
distributivity.
x = a + b * c + d =>
x = (a + d) + b * c
x = a * (b * c * d + e) =>
x = a * b * c * d + a * e;
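The effect of the first rewrite can be checked with a minimal Python sketch. It is not part of the slides: it assumes every two-operand operation takes one unit of delay, and the tuple encoding and the helper names `height` and `evaluate` are hypothetical.

```python
def height(e):
    """Number of operation levels on the longest path of expression tree e."""
    if isinstance(e, str):          # a leaf (variable name) adds no level
        return 0
    _, left, right = e
    return 1 + max(height(left), height(right))

def evaluate(e, env):
    """Evaluate an expression tree of '+' and '*' nodes over integer inputs."""
    if isinstance(e, str):
        return env[e]
    op, left, right = e
    l, r = evaluate(left, env), evaluate(right, env)
    return l + r if op == "+" else l * r

# x = a + b * c + d, parsed left to right
before = ("+", ("+", "a", ("*", "b", "c")), "d")
# x = (a + d) + b * c, after using commutativity and associativity
after = ("+", ("+", "a", "d"), ("*", "b", "c"))

env = {"a": 2, "b": 3, "c": 4, "d": 5}   # arbitrary test values
print(height(before), height(after))
print(evaluate(before, env) == evaluate(after, env))
```

Under the unit-delay assumption the tree height drops from three levels to two, because `a + d` and `b * c` can be computed in parallel.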
Examples of propagation
Constant propagation:
a = 0; b = a + 1; c = 2 * b; =>
a = 0; b = 1; c = 2;
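A minimal sketch of constant propagation over straight-line code, in Python. This is an illustration rather than the method of any particular compiler: the list-of-pairs representation and the function name `propagate_constants` are assumptions, and Python's `eval` stands in for a real expression evaluator.

```python
def propagate_constants(assignments):
    """assignments: list of (target, expression-string) pairs, in program order.

    Replaces every expression whose operands are all known constants by its value.
    """
    known = {}      # variables currently known to hold a constant
    result = []
    for target, expr in assignments:
        try:
            # Only names in `known` are visible; anything else raises NameError.
            value = eval(expr, {"__builtins__": {}}, dict(known))
        except NameError:
            known.pop(target, None)          # target is no longer a known constant
            result.append((target, expr))
            continue
        known[target] = value
        result.append((target, str(value)))
    return result

program = [("a", "0"), ("b", "a + 1"), ("c", "2 * b")]
print(propagate_constants(program))
```

An assignment that reads an unknown input, such as `("d", "x + c")`, is left unchanged by this sketch; a fuller version would substitute the known value of `c` into it.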
Sub-expression elimination
Logic expressions:
Performed by logic optimization.
Kernel-based methods.
Arithmetic expressions:
Search isomorphic patterns in the parse trees.
Example:
a = x + y; b = a + 1; c = x + y; =>
a = x + y; b = a + 1; c = a;
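The search for repeated arithmetic patterns can be sketched in Python as a table lookup. The sketch assumes each variable is assigned only once (so no entry ever becomes stale) and that `+` and `*` are commutative; the tuple format and the name `eliminate_cse` are hypothetical.

```python
def eliminate_cse(assignments):
    """assignments: list of (target, op, left, right) in program order.

    Returns (target, expression) pairs where a repeated expression is replaced
    by the variable that already holds its value.
    """
    available = {}   # normalized expression -> variable holding it
    result = []
    for target, op, left, right in assignments:
        # Sort operands of commutative operators so x + y matches y + x.
        operands = tuple(sorted((left, right))) if op in ("+", "*") else (left, right)
        key = (op,) + operands
        if key in available:
            result.append((target, available[key]))
        else:
            available[key] = target
            result.append((target, f"{left} {op} {right}"))
    return result

program = [("a", "+", "x", "y"), ("b", "+", "a", "1"), ("c", "+", "x", "y")]
print(eliminate_cse(program))
```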
Operator-strength reduction:
a = x^2; b = 3 * x; =>
a = x * x; t = x << 1; b = x + t;
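The equivalence of the two forms can be checked over a small range of integers with a short Python sketch; the function names `original` and `reduced` are only labels for this illustration.

```python
def original(x):
    a = x ** 2      # exponentiation
    b = 3 * x       # multiplication by a constant
    return a, b

def reduced(x):
    a = x * x       # one multiplication replaces the power
    t = x << 1      # shift left by one bit, i.e. 2 * x
    b = x + t       # shift and add replace the multiplication by 3
    return a, b

print(all(original(x) == reduced(x) for x in range(-8, 9)))
```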
Code motion:
for (i = 1; i <= a * b) { } =>
t = a * b; for (i = 1; i <= t) { }
Model expansion.
Conditional expansion.
Loop expansion.
Block-level transformations.
Model expansion
Expand subroutines: flatten the hierarchy.
Useful to expand scope of other optimization
techniques.
Problematic when routine is called more than
once.
Example:
x = a +b;
y = a * b;
z = foo(x, y);
foo(p, q) {t = q - p; return(t); }
By expanding foo:
x = a +b; y = a * b; z = y - x
Conditional expansion
Transform conditional into parallel execution
with test at the end.
Useful when test depends on late signals.
May preclude hardware sharing.
Always useful for logic expressions.
Example:
y = ab; if (a) {x = b + d;} else {x = bd;}
can be expanded to: x = a(b + d) + a'bd
and simplified as: y = ab; x = y + d(a + b)
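The expansion and its simplification can be verified exhaustively, since there are only three Boolean inputs. The Python sketch below is an illustration; it reads the expanded form as x = a(b + d) + a'bd, where a' denotes the complement of a, and the three function names are hypothetical.

```python
from itertools import product

def conditional(a, b, d):
    # if (a) {x = b + d;} else {x = bd;}
    return (b or d) if a else (b and d)

def expanded(a, b, d):
    # x = a(b + d) + a'bd : both branches evaluated, test applied at the end
    return (a and (b or d)) or ((not a) and b and d)

def simplified(a, b, d):
    # y = ab; x = y + d(a + b)
    y = a and b
    return y or (d and (a or b))

ok = all(
    conditional(a, b, d) == expanded(a, b, d) == simplified(a, b, d)
    for a, b, d in product([False, True], repeat=3)
)
print(ok)
```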
Loop expansion
Applicable to loops with data-independent exit
conditions.
Useful to expand scope of other optimization
techniques.
Problematic when loop has many iterations.
Example:
x = 0;
Implementation parameters:
Area.
Performance:
Cycle-time.
Latency.
Throughput (for pipelined implementations).
Power consumption.
Hardware modeling
Circuit behavior:
Sequencing graphs.
Building blocks:
Resources.
Constraints:
Timing and resource usage.
Resources
Functional resources:
Perform operations on data.
Example: arithmetic and logic blocks.
Memory resources:
Store data.
Example: memory and registers.
Interface resources:
Example: busses and ports.
Functional resources
Standard resources:
Existing macro-cells.
Well characterized (area/delay).
Example: adders, multipliers, ...
Application-specific resources:
Circuits for specific tasks.
Yet to be synthesized.
Example: instruction decoder.
Implementation constraints
Timing constraints:
Cycle-time.
Latency of a set of operations.
Time spacing between operation pairs.
Resource constraints:
Resource usage (or allocation).
Partial binding.
Sharing:
Bind a resource to more than one operation.
Operations must not execute concurrently.
[Figure: binding example. Surviving labels: first, third and fourth multiplier; first and second ALU.]
Solution: four multipliers, two ALUs, four cycles.
Binding specification
Mapping from the vertex set to the set of
resource instances, for each given type.
Partial binding:
Partial mapping, given as design constraint.
Compatible binding:
Binding satisfying the constraints of the partial
binding.
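The definitions of binding, partial binding and sharing can be made concrete with a small Python sketch. It is an illustration under stated assumptions: a binding is a dictionary from operation names to (type, instance) pairs, the schedule gives start steps, and the operation names, delays and helper names are all hypothetical.

```python
def is_compatible(binding, partial_binding):
    """A binding is compatible if it agrees with every pair fixed by the partial binding."""
    return all(binding.get(op) == res for op, res in partial_binding.items())

def sharing_is_valid(binding, schedule, delays):
    """Operations sharing a resource instance must not execute concurrently."""
    by_resource = {}
    for op, res in binding.items():
        by_resource.setdefault(res, []).append(op)
    for ops in by_resource.values():
        # Execution intervals [start, end) of the operations on this instance.
        intervals = sorted((schedule[o], schedule[o] + delays[o]) for o in ops)
        for (_, end1), (start2, _) in zip(intervals, intervals[1:]):
            if start2 < end1:
                return False
    return True

schedule = {"v1": 1, "v2": 1, "v3": 2, "v4": 3}
delays = {"v1": 1, "v2": 1, "v3": 1, "v4": 1}
binding = {"v1": ("mult", 1), "v2": ("mult", 2), "v3": ("mult", 1), "v4": ("alu", 1)}
partial = {"v1": ("mult", 1)}          # design constraint fixing one operation

print(is_compatible(binding, partial), sharing_is_valid(binding, schedule, delays))
```

Here `v1` and `v3` share the first multiplier because they run in different steps; binding `v2` to the same instance would be rejected, since `v1` and `v2` both start in step 1.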
Estimation
Resource-dominated circuits.
Area = sum of the area of the resources bound to the
operations.
Determined by binding.
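For a resource-dominated circuit, the area estimate follows directly from the binding. The Python sketch below assumes the same hypothetical (type, instance) encoding of a binding and uses made-up area values; each resource instance is counted once, however many operations share it.

```python
def estimate_area(binding, area_of_type):
    """Sum the areas of the distinct resource instances used by the binding."""
    instances = set(binding.values())            # shared instances count once
    return sum(area_of_type[rtype] for rtype, _ in instances)

# Hypothetical binding: two multiplier instances and one ALU instance.
binding = {
    "v1": ("mult", 1), "v2": ("mult", 2), "v3": ("mult", 1),
    "v4": ("alu", 1), "v5": ("alu", 1),
}
area_of_type = {"mult": 5, "alu": 1}             # assumed area units

print(estimate_area(binding, area_of_type))
```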
Approaches to architectural
optimization
Multiple-criteria optimization problem:
area, latency, cycle-time.
Area/latency trade-off,
for some values of the cycle-time.
Cycle-time/latency trade-off,
for some binding (area).
Area/cycle-time trade-off,
for some schedules (latency).
[Figure: Pareto points in three dimensions (area, latency, cycle-time).]
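A design point is a Pareto point if no other point is at least as good in every criterion and strictly better in at least one. The Python sketch below filters a set of (area, latency, cycle-time) triples accordingly; the design values are invented for illustration and the function name is hypothetical.

```python
def pareto_points(designs):
    """Return the designs not dominated by any other (all criteria are minimized)."""
    def dominates(p, q):
        # p dominates q: no worse in every dimension, and not the same point.
        return all(x <= y for x, y in zip(p, q)) and p != q
    return [p for p in designs if not any(dominates(q, p) for q in designs)]

# Hypothetical (area, latency, cycle_time) triples.
designs = [(10, 8, 30), (14, 6, 30), (14, 8, 30), (22, 4, 40), (22, 6, 40)]
print(pareto_points(designs))
```

In this made-up set, (14, 8, 30) and (22, 6, 40) are dominated and drop out; the remaining three points form the trade-off curve.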
Area-latency trade-off
Rationale:
Cycle-time dictated by system constraints.
Resource-dominated circuits:
Area is determined by resource usage.
Approaches:
Schedule for minimum latency under resource
constraints
Schedule for minimum resource usage under
latency constraints
for varying constraints.
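The first approach can be illustrated with list scheduling, a common heuristic for scheduling under resource constraints (it does not guarantee minimum latency in general, and the slides do not prescribe it). The Python sketch assumes unit-delay operations; the operation graph is one possible decomposition of the diffeq loop body into multiplications and ALU operations, and the names `preds`, `rtype` and `list_schedule` are hypothetical.

```python
def list_schedule(preds, rtype, limits):
    """Unit-delay list scheduling. Returns {operation: start step}, steps from 1."""
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)

    def priority(v):
        # Length of the longest path from v to the end of the graph.
        return 1 + max((priority(s) for s in succs[v]), default=0)

    start, step = {}, 0
    while len(start) < len(preds):
        step += 1
        used = {}
        # Ready: unscheduled operations whose predecessors finished in earlier steps.
        ready = [v for v in preds if v not in start
                 and all(p in start and start[p] < step for p in preds[v])]
        for v in sorted(ready, key=lambda v: (-priority(v), v)):
            t = rtype[v]
            if used.get(t, 0) < limits[t]:
                used[t] = used.get(t, 0) + 1
                start[v] = step
    return start

# Assumed decomposition of the diffeq loop body (six multiplications, five ALU ops).
preds = {
    "v1": [], "v2": [], "v3": ["v1", "v2"], "v4": ["v3"], "v5": ["v4", "v7"],
    "v6": [], "v7": ["v6"], "v8": [], "v9": ["v8"], "v10": [], "v11": ["v10"],
}
rtype = {v: "mult" for v in ("v1", "v2", "v3", "v6", "v7", "v8")}
rtype.update({v: "alu" for v in ("v4", "v5", "v9", "v10", "v11")})

for mults, alus in [(1, 1), (2, 2), (4, 2)]:
    sched = list_schedule(preds, rtype, {"mult": mults, "alu": alus})
    print(mults, alus, max(sched.values()))
```

Sweeping the resource limits and recording the resulting latency yields points of the area/latency trade-off for a fixed cycle-time.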
Summary
Behavioral optimization:
Create abstract models from HDL models.
Optimize models without considering
implementation parameters.