Documentation_compiler

This document outlines a compiler designed to convert C++ matrix multiplication operations into instructions for a Processor-in-Memory (PIM) architecture. It details the process of generating LLVM Intermediate Representation, extracting Three-Address Code, and creating ISA instructions for parallel execution across multiple cores. Key components include Clang/LLVM for conversion, a custom LLVM pass for TAC extraction, and a Python script for ISA generation, all aimed at optimizing matrix operations in a PIM environment.

Uploaded by

johnsneak63

Compiler Implementation for Matrix Multiplication using a PIM Architecture

Faculty : Senthil Prakash


Slot : B1+TB1

------------------------------------COMPILATHON--------------------------------------

Team :

Rohit Kumar Singh 22BRS1258

Punish Midha 22BPS1150

Taher Hussain Kapadia 22BPS1113

Overview :
This document describes a compiler that transforms C++ matrix operations into custom instruction set
architecture (ISA) commands for a Processor-in-Memory (PIM) system. The process involves generating
LLVM Intermediate Representation, extracting Three-Address Code, and converting this code into machine
instructions compatible with the PIM architecture.

Process Flow :
C++ Code → LLVM IR → TAC Extraction → ISA Generation → Parallel Execution

Key Components :
• Clang / LLVM: Converts C++ to LLVM IR and provides analysis tools
• Custom LLVM Pass: Extracts Three-Address Code (TAC) from the LLVM IR
• Python Converter: Transforms TAC into ISA-compatible instructions
• Target Architecture: Uses a 24-bit instruction format designed for parallel processing across DRAM subarrays

Implementation Steps :
1. Starting Point

Begin with a C++ program containing predefined matrices and a matrix multiplication function.

2. Generate LLVM IR

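Output : matrix_ops.ll
Compiled with -O1 : at -O0, Clang marks every function with the optnone attribute, which makes opt skip any pass that is not marked as required. Compiling at -O1 avoids the attribute, allowing the custom LLVM pass to analyze the IR.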

3. Extract Three-Address Code

• The custom LLVM pass (TACGenPass.cpp) identifies load, store, address-computation (GEP), and arithmetic operations
• The pass writes these operations to tac_output.txt
• Compilation and execution commands:

Compile the pass : clang++ -shared -fPIC TACGenPass.cpp -o tacgen.so $(llvm-config --cxxflags --ldflags --system-libs --libs core)

Run the pass : opt -load-pass-plugin ./tacgen.so -passes="tacgen" matrix_ops.ll -o /dev/null 2> tac_output.txt

OUTPUT : tac_output.txt

4. Generate ISA Instructions

• A Python script maps TAC operations to the 24-bit ISA format
• The script distributes the instructions across multiple processing elements
• Execution command:

Run the script : python3 modified_tac_to_isa.py

OUTPUT : parallal_output.isa
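
The mapping step can be illustrated with a minimal Python sketch. It is not the contents of modified_tac_to_isa.py; the function name tac_to_opcode is hypothetical, the input is assumed to be lines of the form "KIND: <LLVM instruction>" as in tac_output.txt, and skipping GEP and add operations is an assumption, since the ISA defines opcodes only for LOAD, MULT and STORE.

```python
import re

# Opcode values as listed in the ISA instruction format of this document.
OPCODES = {"LOAD": 0b00, "MULT": 0b01, "STORE": 0b10}


def tac_to_opcode(tac_line):
    """Return the ISA mnemonic for one TAC line, or None if it has none.

    Assumption: each line looks like 'KIND: <LLVM instruction>'.
    """
    kind, _, body = tac_line.partition(":")
    kind = kind.strip().upper()
    if kind in ("LOAD", "STORE"):
        return kind
    # Among arithmetic operations, only multiplication has an opcode.
    if kind == "OP" and re.search(r"=\s*mul\b", body):
        return "MULT"
    # Assumption: GEP and add lines produce no instruction in this sketch.
    return None


if __name__ == "__main__":
    for line in ("LOAD: %20 = load i32, ptr %19, align 4",
                 "OP: %23 = mul nsw i32 %22, %20",
                 "OP: %8 = add nuw nsw i64 %5, 1"):
        print(line, "->", tac_to_opcode(line))
```

How the real script treats address computations and additions is not stated here, so that part of the sketch should be read as a placeholder.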
ISA Instruction Format
Each instruction is 24 bits wide and has the following fields:

• OPCODE (2 bits): 00=LOAD, 01=MULT, 10=STORE
• CORE_ID (6 bits): Processing element ID (0-3)
• Rd/Wr (2 bits): Read/Write flags
• Row Address (9 bits): DRAM row address
• Reserved (5 bits): For future expansion
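
As a rough illustration of the field layout, the following Python sketch packs the five fields into one 24-bit word. It assumes the fields are laid out in the order listed, most significant bits first; the names encode and to_bits are hypothetical and the meaning of the two Rd/Wr bits is left to the caller.

```python
# (field name, width in bits), in the order listed above, MSB first (assumed).
FIELDS = (("opcode", 2), ("core_id", 6), ("rd_wr", 2), ("row", 9), ("reserved", 5))


def encode(opcode, core_id, rd_wr, row, reserved=0):
    """Pack the five fields into a single 24-bit integer."""
    values = (opcode, core_id, rd_wr, row, reserved)
    word = 0
    for (name, width), value in zip(FIELDS, values):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def to_bits(word):
    """Format a 24-bit word as space-separated field groups."""
    s = format(word, "024b")
    groups, pos = [], 0
    for _, width in FIELDS:
        groups.append(s[pos:pos + width])
        pos += width
    return " ".join(groups)


if __name__ == "__main__":
    # LOAD (opcode 0) on processing element 1, Rd/Wr bits 11, row 16.
    print(to_bits(encode(0, 1, 0b11, 16)))
```

Checking each value against its field width catches inputs such as a row address above 511, which a 9-bit field cannot hold.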

Example Instruction

00 000001 11 000010000 00000 = LOAD from row address 0x010 (decimal 16) on Core 1
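
The example can be checked by splitting the bit string back into its fields. The sketch below is only an illustration; the function name decode is hypothetical, and the field order (as listed in the format, most significant bits first) is an assumption.

```python
# (field name, width in bits), MSB first (assumed from the format listing).
FIELDS = (("opcode", 2), ("core_id", 6), ("rd_wr", 2), ("row", 9), ("reserved", 5))


def decode(bits):
    """Split a 24-bit instruction string into a dict of integer field values."""
    s = bits.replace(" ", "")
    if len(s) != 24 or set(s) - {"0", "1"}:
        raise ValueError("expected exactly 24 binary digits")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(s[pos:pos + width], 2)
        pos += width
    return fields


if __name__ == "__main__":
    print(decode("00 000001 11 000010000 00000"))
```

For the example instruction this gives opcode 0 (LOAD), processing element 1, Rd/Wr bits 3 and a row field of 16.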

Parallel Execution Strategy

• Instructions are distributed across 4 cores using round-robin assignment
• Each DRAM subarray processes independent iterations (row 0 on Core 0, row 1 on Core 1, etc.)
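
Round-robin assignment of rows to cores can be sketched in a few lines of Python. The function name assign_rows_round_robin is hypothetical, and the sketch assumes that the unit of work is one output row, as the row-to-core example above suggests.

```python
def assign_rows_round_robin(num_rows, num_cores=4):
    """Map each core ID to the list of row indices it processes."""
    schedule = {core: [] for core in range(num_cores)}
    for row in range(num_rows):
        # Row r goes to core r mod num_cores, so consecutive rows
        # land on different cores and can run in parallel.
        schedule[row % num_cores].append(row)
    return schedule


if __name__ == "__main__":
    for core, rows in assign_rows_round_robin(6).items():
        print(f"Core {core}: rows {rows}")
```

With more rows than cores the assignment wraps around, so row 4 returns to Core 0.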

TAC Output Analysis

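1. OP: %8 = add nuw nsw i64 %5, 1

o This addition increments the outer loop index i (%5 is the row index used by the GEP instructions below).

o nuw (No Unsigned Wrap) and nsw (No Signed Wrap) are LLVM flags asserting that the addition does not overflow; if it does, the result is a poison value.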

2. GEP: %12 = getelementptr inbounds [2 x i32], ptr %2, i64 %5, i64 %11

o This is a GetElementPtr (GEP) instruction that calculates the address of an element in a 2D array, here C[i][j].

o %5 and %11 are the loop indices i and j that select the element.

3. STORE: store i32 0, ptr %12, align 4, !tbaa !8

o This stores the value 0 into memory at %12. This likely corresponds to initializing C[i][j] = 0.

4. OP: %14 = add nuw nsw i64 %11, 1

o This addition increments the middle loop index j.

5. GEP: %19 = getelementptr inbounds [2 x i32], ptr %0, i64 %5, i64 %17

o GEP instruction that calculates the address of A[i][k], where %5 is i and %17 is k.

6. LOAD: %20 = load i32, ptr %19, align 4, !tbaa !8

o Load the value from matrix A.

7. GEP: %21 = getelementptr inbounds [2 x i32], ptr %1, i64 %17, i64 %11

o GEP instruction that calculates the address of B[k][j], where %17 is k and %11 is j.

8. LOAD: %22 = load i32, ptr %21, align 4, !tbaa !8

o Load the value from matrix B.

9. OP: %23 = mul nsw i32 %22, %20

o Multiply two loaded values (A[i][k] * B[k][j]) to compute a partial product.

10. OP: %24 = add nsw i32 %18, %23

o Add the partial product to the accumulator (C[i][j] += A[i][k] * B[k][j]).

11. STORE: store i32 %24, ptr %12, align 4, !tbaa !8

o Store the updated value back into matrix C.

12. OP: %25 = add nuw nsw i64 %17, 1

o Increment loop variable for the innermost loop.
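
Taken together, the twelve operations correspond to a triple loop nest. The Python sketch below mirrors that structure, with comments that tie each statement to the numbered operations above. It is a reconstruction for illustration, not the original C++ source; the function name matmul_like_tac is hypothetical, and treating %18 as the running value of C[i][j] is an assumption, because its definition does not appear in the listed TAC.

```python
def matmul_like_tac(A, B):
    """Multiply A (n x m) by B (m x p) in the order the TAC performs it."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[None] * p for _ in range(n)]
    for i in range(n):                # index %5, incremented by operation 1
        for j in range(p):            # index %11, incremented by operation 4
            C[i][j] = 0               # operations 2-3: address of C[i][j], store 0
            for k in range(m):        # index %17, incremented by operation 12
                a = A[i][k]           # operations 5-6: GEP + load from A
                b = B[k][j]           # operations 7-8: GEP + load from B
                prod = b * a          # operation 9: mul
                acc = C[i][j] + prod  # operation 10: add (accumulator assumed to be %18)
                C[i][j] = acc         # operation 11: store back to C[i][j]
    return C


if __name__ == "__main__":
    print(matmul_like_tac([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The store inside the innermost loop reflects operation 11, which writes the updated sum back to C[i][j] on every iteration.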

