C to Asm, Asm to c.ppt

The document outlines the processes of converting assembly language code to C and vice versa, detailing the programming model, assembly language structure, and C/C++ features. It covers topics such as data types, intrinsic functions, inline assembly, software pipelining, and examples of assembly instructions and optimizations. Additionally, it includes practical examples and explanations of how to implement these concepts in programming.

Uploaded by

shamimkausa47

How to convert an assembly language code into C and C code into Assembly Language

By: Ali Hassan Soomro


BSCS from UBIT, University of Karachi
DAE in Electronics from SBTE
Facebook: https://web.facebook.com/AliiHassanSoomro
Gmail: [email protected]
Outline:

◼ Programming model
◼ Assembly language
▪ Assembly code structure / Data Types
▪ Assembly instructions
◼ C/C++
▪ Data types
▪ Intrinsic functions
▪ Optimizations
▪ Software Pipelining
▪ Inline Assembly
▪ Calling Assembly functions
Programming model

◼ Two register files: A and B
◼ 16 registers in each register file (A0-A15, B0-B15)
◼ A1, A2, B0, B1, B2 used in conditions
◼ A4-A7, B4-B7 used for circular addressing
Assembly language structure
◼ A TMS320C6x assembly instruction includes up to seven items:
▪ Label
▪ Parallel bars
▪ Conditions
▪ Instruction
▪ Functional unit
▪ Operands
▪ Comment

Format of assembly instruction:

Label: parallel bars [condition] instruction unit operands ;comment


Parallel bars

|| : indicates that the current instruction executes in parallel with the previous instruction; otherwise left blank
Condition
◼ All assembly instructions are conditional
◼ If no condition is specified, the instruction executes always
◼ If a condition is specified, the instruction executes only if the condition is true
◼ Registers used in conditions are A1, A2, B0, B1, and B2
◼ Examples:
[A] ;executes if A ≠ 0
[!A] ;executes if A = 0

[B0] ADD .L1 A1,A2,A3


|| [!B0] ADD .L2 B1,B2,B3
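In C terms, the pair of predicated ADD instructions above behaves like an if/else on B0. The following is a minimal sketch of that reading; the `regs_t` struct and the function name are hypothetical stand-ins for the register values, not part of any TI API.

```c
#include <stdint.h>

/* Hypothetical register snapshot: only the registers used by the example. */
typedef struct {
    int32_t a1, a2, a3;
    int32_t b0, b1, b2, b3;
} regs_t;

/* Models:       [B0]  ADD .L1 A1,A2,A3
 *          ||  [!B0]  ADD .L2 B1,B2,B3
 * Exactly one of the two additions takes effect, depending on B0. */
static void predicated_add(regs_t *r)
{
    if (r->b0 != 0) {
        r->a3 = r->a1 + r->a2;   /* [B0]: executes when B0 != 0 */
    } else {
        r->b3 = r->b1 + r->b2;   /* [!B0]: executes when B0 == 0 */
    }
}
```

Going the other way, a short C if/else whose branches are single operations is a natural candidate for conditional instructions instead of branches.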
Instruction

◼ Either directive or mnemonic


◼ Directives must begin with a period (.)
◼ Mnemonics should be in column 2 or higher
◼ Examples:
◼ .sect "data" ;assembles what follows into a section named data
◼ .word value ;one word of data
Functional units (optional)

◼ L units: 32/40-bit arithmetic/compare and 32-bit logic operations
◼ S units: 32-bit arithmetic operations, 32/40-bit shifts and 32-bit bit-field operations, 32-bit logical operations, branches, constant generation, register transfers to/from the control register file (.S2 only)
◼ M units: 16 x 16 multiply operations
◼ D units: 32-bit add and subtract, linear and circular address calculation, loads and stores with 5-bit constant offset, loads and stores with 15-bit constant offset (.D2 only)
Operands

◼ All instructions require a destination operand.


◼ Most instructions require one or two source
operands.
◼ The destination operand must be in the same
register file as one source operand.
◼ One source operand from each register file per
execute packet can come from the register file
opposite that of the other source operand.
◼ Example:
▪ ADD .L1 A0,A1,A3
▪ ADD .L1x A0,B1,A2 ;x marks the cross path: B1 comes from register file B
Instruction format

◼ Fetch packet
◼ The same functional unit cannot be used twice in the same execute packet (instructions that run in parallel). The following pair is invalid:
▪ ADD .S1 A0, A1, A2 ;.S1 is used for
▪ || SHR .S1 A3, 15, A4 ;...both instructions
Arithmetic instructions
◼Add/subtract/multiply:
ADD .L1 A3,A2,A1 ;A1←A2+A3
SUB .S1 A1,1,A1 ;decrement A1
MPY .M2x A7,B7,B6 ;multiply 16 LSBs (cross path for A7)
|| MPYH .M1x A7,B7,A6 ;multiply 16 MSBs (cross path for B7)
Move and Load/store Instructions- Addressing
Modes

◼ Loading constants:
MVK .S1 val1, A4 ;move low halfword
MVKH .S1 val1, A4 ;move high halfword
◼ Indirect Addressing Mode:
LDH .D2 *B2++, B7 ;load halfword B7←[B2], then increment B2
|| LDH .D1 *A2++, A7 ;load halfword A7←[A2], then increment A2
STW .D1 A1, *+A4[20] ;store word [A4 + 20 words] ← A1,
;preoffset, A4 is not modified
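The constant-loading pair (MVK followed by MVKH) can be modeled in C as two steps that together build a full 32-bit value. This is a sketch under the usual description of the two instructions: MVK writes a sign-extended 16-bit constant, and MVKH replaces only the upper halfword. The function names are hypothetical.

```c
#include <stdint.h>

/* MVK: load the low 16 bits of the constant, sign-extended to 32 bits. */
static uint32_t mvk(uint32_t cst)
{
    uint32_t lo = cst & 0xFFFFu;
    return (lo & 0x8000u) ? (lo | 0xFFFF0000u) : lo;
}

/* MVKH: replace the upper 16 bits of reg with the upper 16 bits of cst,
 * leaving the lower 16 bits of reg untouched. */
static uint32_t mvkh(uint32_t cst, uint32_t reg)
{
    return (cst & 0xFFFF0000u) | (reg & 0x0000FFFFu);
}

/* MVK then MVKH on the same register yields the full 32-bit constant. */
static uint32_t load_const32(uint32_t cst)
{
    uint32_t a4 = mvk(cst);   /* MVK  .S1 val1, A4 */
    a4 = mvkh(cst, a4);       /* MVKH .S1 val1, A4 */
    return a4;
}
```

The order matters: because MVK sign-extends, running it after MVKH would overwrite the upper halfword again.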
Example

◼ Calculate the values of the registers and memory for the following instructions:
A2 = 0x00000010, MEM[0x00000010] = 0x0, MEM[0x00000014] = 0x1, MEM[0x00000018] = 0x2, MEM[0x0000001C] = 0x3

LDH .D1 *++A2, A7 A2= ? A7= ?
LDH .D1 *A2--[2], A7 A2= ? A7= ?
LDH .D1 *-A2, A7 A2= ? A7= ?
LDH .D1 *++A2[2], A7 A2= ? A7= ?
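One way to check answers for the pre-increment and post-decrement cases is a small C model of the address arithmetic. The sketch below rests on several assumptions that the slide does not state: each instruction starts from A2 = 0x10 independently, memory is little-endian, and the bracketed offset of LDH is scaled by the halfword size (2 bytes). The `*-A2` case is left out, and all names are hypothetical.

```c
#include <stdint.h>

#define HALFWORD 2u  /* LDH offsets are scaled by the halfword size in bytes */

/* Hypothetical little-endian memory holding the four words at 0x10..0x1C.
 * Only addresses in that range are valid here. */
static int16_t load_half(uint32_t addr)
{
    static const uint32_t words[4] = {0x0, 0x1, 0x2, 0x3};
    uint32_t word = words[(addr - 0x10u) / 4u];
    uint32_t half = (addr & 2u) ? (word >> 16) : (word & 0xFFFFu);
    return (int16_t)half;
}

/* LDH *++A2[n],A7: pre-increment A2, then load from the updated address. */
static int16_t ldh_preinc(uint32_t *a2, uint32_t n)
{
    *a2 += n * HALFWORD;
    return load_half(*a2);
}

/* LDH *A2--[n],A7: load from the current address, then post-decrement A2. */
static int16_t ldh_postdec(uint32_t *a2, uint32_t n)
{
    int16_t value = load_half(*a2);
    *a2 -= n * HALFWORD;
    return value;
}
```

Under these assumptions, `*++A2` leaves A2 = 0x12, `*A2--[2]` leaves A2 = 0x0C, and `*++A2[2]` leaves A2 = 0x14 with A7 = 1.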
Branch and Loop Instructions
◼ Loop example:
MVK .S1 count, A1 ;loop counter (low halfword)
MVKH .S1 count, A1 ;loop counter (high halfword)
LOOP: MVK .S1 val1, A4 ;loop
MVKH .S1 val1, A4 ;body
SUB .S1 A1,1,A1 ;decrement counter
[A1] B .S2 LOOP ;branch if A1 ≠ 0
NOP 5 ;5 NOPs for branch
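Converted to C, this counted loop is a do/while: the body runs first and the counter is tested at the bottom, just as the conditional branch sits after the SUB. A minimal sketch, with a hypothetical function name and an extra `iterations` output added only so the behavior can be observed:

```c
#include <stdint.h>

/* C equivalent of the counted loop: a1 plays the role of A1, a4 of A4.
 * Like the assembly, the body runs before the test, so count must be >= 1. */
static int32_t counted_loop(int32_t count, int32_t val1, int32_t *iterations)
{
    int32_t a1 = count;      /* MVK/MVKH count, A1 */
    int32_t a4 = 0;
    *iterations = 0;
    do {
        a4 = val1;           /* MVK/MVKH val1, A4 (loop body) */
        a1 = a1 - 1;         /* SUB A1,1,A1 */
        ++*iterations;
    } while (a1 != 0);       /* [A1] B LOOP */
    return a4;
}
```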
Assembler Directives

◼ .short : initializes 16-bit integer
◼ .int (.word .long) : initializes 32-bit integer
◼ .float : 32-bit single-precision floating-point
◼ .double : 64-bit double-precision floating-point
◼ .trip :
◼ .bss
◼ .far
◼ .stack
Programming Using C

◼ Data types
◼ Intrinsic functions
◼ Inline assembly
◼ Linear assembly
◼ Calling assembly functions
◼ Code optimizations
◼ Software pipelining
Data types


◼ char, signed char
▪ 8 bits ASCII
◼ unsigned char
▪ 8 bits ASCII
◼ short
▪ 16 bits 2's complement
◼ unsigned short
▪ 16 bits binary
◼ int, signed int
▪ 32 bits 2's complement
◼ unsigned int
▪ 32 bits binary
◼ long, signed long
▪ 40 bits 2's complement

◼ unsigned long
▪ 40 bits binary
◼ enum
Intrinsic functions
◼ Available C functions used to increase
efficiency
▪ int _mpy(): MPY instruction, multiplies 16 LSBs
▪ int _mpyh(): MPYH instruction, multiplies 16 MSBs
▪ int _mpylh(): MPYLH instruction, multiplies 16 LSBs with 16 MSBs
▪ int _mpyhl(): MPYHL instruction, multiplies 16 MSBs with 16 LSBs
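To see what these four intrinsics compute, they can be emulated in portable C. This is a sketch for checking results on a host machine, assuming signed 16-bit operands and a 32-bit result; the `emu_` names are hypothetical and are not the TI intrinsics themselves.

```c
#include <stdint.h>

/* Sign-extend a 16-bit field held in the low bits of h. */
static int32_t sext16(uint32_t h)
{
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}

/* Signed lower and upper halfwords of a 32-bit value. */
static int32_t lo16(int32_t x) { return sext16((uint32_t)x & 0xFFFFu); }
static int32_t hi16(int32_t x) { return sext16(((uint32_t)x >> 16) & 0xFFFFu); }

/* Host-side stand-ins for the multiply intrinsics. */
static int32_t emu_mpy(int32_t a, int32_t b)   { return lo16(a) * lo16(b); } /* LSBs x LSBs */
static int32_t emu_mpyh(int32_t a, int32_t b)  { return hi16(a) * hi16(b); } /* MSBs x MSBs */
static int32_t emu_mpylh(int32_t a, int32_t b) { return lo16(a) * hi16(b); } /* LSBs x MSBs */
static int32_t emu_mpyhl(int32_t a, int32_t b) { return hi16(a) * lo16(b); } /* MSBs x LSBs */
```

For example, with a = 0x00030002 and b = 0x00050004 the four functions return 8, 15, 10 and 12 respectively.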
Inline Assembly

◼ Assembly instructions and directives can be incorporated within a C program using the asm statement
asm("assembly code");
Calling Assembly Functions

◼ An assembly function can be called from a C program once it has an external declaration
extern int func();
Example

◼ Program that calculates S=n+(n-1)+…+1 by calling an assembly function
#include <stdio.h>
extern short sumfunc(short n); /* implemented in assembly as _sumfunc */
int main(void)
{
    short n=6;
    short result;
    result = sumfunc(n);
    printf("sum = %d\n", result);
    return 0;
}
Example (continued)
◼ Assembly function:

.def _sumfunc
_sumfunc: MV .L1 A4,A1 ;n is loop counter
SUB .S1 A1,1,A1 ;decrement n
LOOP: ADD .L1 A4,A1,A4 ;A4 is accumulator
SUB .S1 A1,1,A1 ;decrement loop counter
[A1] B .S2 LOOP ;branch if A1 ≠ 0
NOP 5 ;branch delay nops
B .S2 B3 ;return to calling routine
NOP 5 ;five NOPs for delay
.end
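A C reference model of what the assembly function is meant to compute is handy for checking a hand-written routine against known results. The sketch below follows the register roles in the comments (A4 as accumulator, A1 as counter) and decrements the counter on every pass; the name `sumfunc_ref` is hypothetical, and n is assumed to be at least 2 because the counter is tested only after the first addition.

```c
/* Reference model of S = n + (n-1) + ... + 1, assuming n >= 2. */
static short sumfunc_ref(short n)
{
    short acc = n;                  /* A4: holds n on entry, accumulates the sum */
    short cnt = (short)(n - 1);     /* A1: loop counter, starts at n - 1 */
    do {
        acc = (short)(acc + cnt);   /* ADD A4,A1,A4 */
        cnt = (short)(cnt - 1);     /* decrement the counter each pass */
    } while (cnt != 0);             /* [A1] B LOOP */
    return acc;                     /* the result is returned in A4 */
}
```

For n = 6 the model accumulates 6, 11, 15, 18, 20, 21 and returns 21, which is the value the C program should print.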
Example

◼ Write a program that calculates the first 6 Fibonacci numbers by calling an assembly function
Linear Assembly
◼ enables writing assembly-like programs without worrying
about register usage, pipelining, delay slots, etc.
◼ The assembly optimizer reads the linear assembly code to figure out the algorithm, and then produces an optimized list of assembly code to perform the operations.
◼ Source file extension is .sa
◼ Linear assembly programming lets you:
▪ use symbolic names
▪ ignore pipeline issues
▪ omit NOPs, parallel bars, functional units, and register names
▪ use CPU resources more efficiently than C
Linear Assembly Example
_sumfunc: .cproc np ;.cproc directive starts a C-callable procedure
.reg y, cnt ;.reg directive declares descriptive names for values that will be stored in registers
MV np,cnt ;cnt is the loop counter
MV np,y ;y is the accumulator, starts at n
loop: .trip 6 ;.trip gives the minimum number of times the loop will iterate
SUB cnt,1,cnt
ADD y,cnt,y
[cnt] B loop
.return y
.endproc ;.endproc ends the C-callable procedure
---------------------Equivalent assembly function------------------------------
.def _sumfunc
_sumfunc: MV .L1 A4,A1 ;n is loop counter
LOOP: SUB .S1 A1,1,A1 ;decrement n
ADD .L1 A4,A1,A4 ;A4 is accumulator
[A1] B .S2 LOOP ;branch if A1 ≠ 0
NOP 5 ;branch delay nops
B .S2 B3 ;return from calling
NOP 5 ;five NOPS for delay
.end
Software Pipelining

◼ A loop optimization technique so that all functional units are utilized within one cycle. Similar to hardware pipelining, but done by the programmer or the compiler, not the processor
◼ Three stages:
▪ Prolog (warm-up): instructions needed to build up the loop
kernel (cycle)
▪ Loop kernel (cycle): all instructions executed in parallel.
Entire kernel executed in one cycle.
▪ Epilog (cool-off): Instructions necessary to complete all
iterations
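The three stages can be illustrated in plain C with a deliberately simplified model of a multiply-accumulate loop. The sketch assumes only three pipeline steps (load, multiply, add), each taking one step, which is far shallower than the real instruction latencies; the function name is hypothetical and n is assumed to be at least 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified software-pipelined dot product: load -> multiply -> add.
 * In the kernel, each statement works on a different loop iteration. */
static int32_t dot_pipelined(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum = 0;
    int16_t x, y;       /* operands loaded in the previous step */
    int32_t prod;       /* product computed in the previous step */

    /* Prolog: fill the pipeline. */
    x = a[0]; y = b[0];             /* load iteration 0 */
    prod = x * y;                   /* multiply iteration 0 */
    x = a[1]; y = b[1];             /* load iteration 1 */

    /* Kernel: all three stages are active. */
    for (size_t i = 2; i < n; i++) {
        sum += prod;                /* add      for iteration i-2 */
        prod = x * y;               /* multiply for iteration i-1 */
        x = a[i]; y = b[i];         /* load     for iteration i   */
    }

    /* Epilog: drain the pipeline. */
    sum += prod;                    /* add iteration n-2 */
    prod = x * y;                   /* multiply iteration n-1 */
    sum += prod;                    /* add iteration n-1 */
    return sum;
}
```

On a real VLIW target the three kernel statements would be separate instructions issued in parallel; in C they only show which iteration each stage is working on.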
Software pipelining procedure

◼ Draw a dependency graph


▪ Draw nodes and paths
▪ Write number of cycles for each instruction
▪ Assign functional units
◼ Set up a scheduling table
◼ Obtain code from scheduling table
Software pipelining example
for (i=0; i<16; i++)
    sum = sum + a[i]*b[i];
Dependency Graph
◼ LDH: 5 cycles
◼ MPY: 2 cycles
◼ ADD: 1 cycle
◼ SUB: 1 cycle
◼ LOOP: 6 cycles
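Scheduling Table

Unit | C1, C9… | C2, C10… | C3, C11… | C4, C12… | C5, C13… | C6, C14… | C7, C15… | C8, C16…
.D1  | LDH     |          |          |          |          |          |          |
.D2  | LDH     |          |          |          |          |          |          |
.M1  |         |          |          |          |          | MPY      |          |
.L1  |         |          |          |          |          |          |          | ADD
.L2  |         | SUB      |          |          |          |          |          |
.S2  |         |          | B        |          |          |          |          |

Completed table (prolog: C1 to C7, kernel: C8):

Unit | C1, C9… | C2, C10… | C3, C11… | C4, C12… | C5, C13… | C6, C14… | C7, C15… | C8, C16…
.D1  | LDH     | LDH      | LDH      | LDH      | LDH      | LDH      | LDH      | LDH
.D2  | LDH     | LDH      | LDH      | LDH      | LDH      | LDH      | LDH      | LDH
.M1  |         |          |          |          |          | MPY      | MPY      | MPY
.L1  |         |          |          |          |          |          |          | ADD
.L2  |         | SUB      | SUB      | SUB      | SUB      | SUB      | SUB      | SUB
.S2  |         |          | B        | B        | B        | B        | B        | B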
Assembly Code

;cycle 1
MVK .S2 16,B1 ;loop count
|| ZERO .L1 A7 ;sum
|| LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
;cycle 2
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement count
;cycle 3
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
;cycle 4
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
;cycle 5
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
Assembly code
;cycle 6
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
|| MPY .M1x A2,B2,A6
;cycle 7
LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
|| MPY .M1x A2,B2,A6
;cycles 8-21(loop kernel)
LOOP: LDH .D1 *A4++,A2 ;input in A2
|| LDH .D2 *B4++,B2 ;input in B2
|| [B1] SUB .L2 B1,1,B1 ;decrement
|| [B1] B .S2 LOOP
|| MPY .M1x A2,B2,A6 ;multiplication
|| ADD .L1 A6,A7,A7
;cycle 22 (epilog)
ADD .L1 A6,A7,A7 ;final sum
Example

◼ Use software pipelining in the following example:

for (i=0; i<16; i++)
    sum = sum + a[i]*b[i];
Loop unrolling
•A technique for reducing the loop overhead
•The overhead decreases as the unrolling factor increases, at the expense of code size
•Not useful on DSPs with zero-overhead looping hardware, which have no loop overhead to remove

for (i=0; i<64; i++)
{
    sum += *(data++);
}

Unrolled by a factor of 4:

for (i=0; i<64/4; i++)
{
    sum += *(data++);
    sum += *(data++);
    sum += *(data++);
    sum += *(data++);
}
Loop Unrolling example

◼Unroll the following loop by a factor of 2, 4, and 8
for (i=0; i<64; i++)
{
a[i] = b[i] + c[i+1];
}
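As a starting point for this exercise, the factor-of-2 case might look like the sketch below; factors 4 and 8 follow the same pattern. The function names and the `int` element type are assumptions, and `c` must hold at least N + 1 elements because of the `c[i+1]` access.

```c
#define N 64

/* Original loop: one loop test and branch per element. */
static void add_rolled(int *a, const int *b, const int *c)
{
    for (int i = 0; i < N; i++) {
        a[i] = b[i] + c[i + 1];
    }
}

/* Unrolled by a factor of 2: half as many loop tests and branches.
 * N must be a multiple of the unrolling factor. */
static void add_unrolled2(int *a, const int *b, const int *c)
{
    for (int i = 0; i < N; i += 2) {
        a[i]     = b[i]     + c[i + 1];
        a[i + 1] = b[i + 1] + c[i + 2];
    }
}
```

Both versions write the same values into `a`; only the number of loop-control operations changes.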
Code optimization steps

◼ When code performance is not satisfactory, the following steps can be taken:
▪ Use intrinsic functions
▪ Use compiler optimization levels
▪ Use profiling then convert functions that need
optimization to linear ASM
▪ Optimize code in ASM
Profiling using clock function
#include <stdio.h> /* in order to call printf() */
#include <time.h> /* in order to call clock() */
int main(void) {
    clock_t start, stop, overhead;
    start = clock(); /* Calculate overhead of calling clock */
    stop = clock(); /* and subtract this value from the results */
    overhead = stop - start;
    start = clock();
    /* code to be profiled */
    stop = clock();
    printf("cycles: %ld\n", (long)(stop - start - overhead));
    return 0;
}
Code optimization

◼ Use instructions in parallel


◼ Eliminate NOPs
◼ Unroll loops
◼ Use software pipelining
Using Interrupts

◼ 16 interrupt sources
▪ 2 timer interrupts
▪ 4 external interrupts
▪ 4 McBSP interrupts
▪ 4 DMA interrupts
Loop program with interrupt
interrupt void c_int11() //ISR
{
int sample_data;
sample_data = input_sample(); //input data
output_sample(sample_data); //output data
}
void main()
{
comm_intr(); //init DSK, codec, McBSP
//enable INT11 and GIE
while(1); //infinite loop
}
