CSA HW 4

The document discusses the performance comparison between unscheduled and scheduled code in computer system architecture, highlighting that the scheduled code is 1.6 times faster, resulting in a 60% speedup. It also details the execution time per element for both codes, with unscheduled code taking 16 clock cycles and scheduled code taking 10 clock cycles. Additionally, it explains the benefits of loop unrolling, which reduces execution time to approximately 6.67 cycles per element by minimizing loop overhead and allowing parallel execution of operations.

CSCI 6461 Vinod Kumar (G39671299) FALL 2024 Semester

Computer System Architecture, HW 4

Dr. Lei He
Q. 3.14 Answer_a)
1. Unscheduled Code:
Clock cycle  Instruction
1            addi x4, x1, #800    // Setup loop upper bound
2            fld F2, 0(x1)        // Load X(i) → F2
3            stall
4            fmul.d F4, F2, F0    // F4 = a * X(i)
5            fld F6, 0(x2)        // Load Y(i) → F6
6            stall
7            fadd.d F6, F4, F6    // F6 = a * X(i) + Y(i)
8            stall
9            stall
10           stall
11           fsd F6, 0(x2)        // Store Y(i) = a * X(i) + Y(i)
12           addi x1, x1, #8      // Increment X index
13           addi x2, x2, #8      // Increment Y index
14           sltu x3, x1, x4      // Test if X(i) < X upper bound
15           stall
16           bnez x3, foo         // Branch if needed
17           stall
Total = 16 clock cycles per element (cycles 2 to 17; the addi in cycle 1 is one-time setup outside the loop).
2. Scheduled Code:
Clock cycle  Instruction
1            addi x4, x1, #800    // Setup loop upper bound
2            fld F2, 0(x1)        // Load X(i) → F2
3            fld F6, 0(x2)        // Load Y(i) → F6 (parallel load)
4            fmul.d F4, F2, F0    // F4 = a * X(i)
5            addi x1, x1, #8      // Increment X index
6            addi x2, x2, #8      // Increment Y index
7            sltu x3, x1, x4      // Test if X(i) < X upper bound
8            fadd.d F6, F4, F6    // F6 = a * X(i) + Y(i)
9            stall                // Floating-point addition delay
10           bnez x3, foo         // Branch if needed
11           fsd F6, -8(x2)       // Store Y(i) = a * X(i) + Y(i)
Total = 10 clock cycles per element (cycles 2 to 11; the setup addi in cycle 1 is not counted).
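Both totals can be checked by counting issue slots in each listing. The sketch below is a minimal Python tally, assuming the addi x4 in cycle 1 is one-time setup outside the loop and is therefore left out of the per-element count. The list names and the tally helper are illustrative and not part of the original answer.

```python
# Loop-body slots transcribed from the two listings (cycle 1 setup excluded).
UNSCHEDULED = [
    "fld", "stall", "fmul.d", "fld", "stall", "fadd.d",
    "stall", "stall", "stall", "fsd", "addi", "addi",
    "sltu", "stall", "bnez", "stall",
]
SCHEDULED = [
    "fld", "fld", "fmul.d", "addi", "addi",
    "sltu", "fadd.d", "stall", "bnez", "fsd",
]

def tally(slots):
    """Return (total cycles, stall cycles) for one loop iteration."""
    stalls = sum(1 for s in slots if s == "stall")
    return len(slots), stalls

for name, slots in (("unscheduled", UNSCHEDULED), ("scheduled", SCHEDULED)):
    total, stalls = tally(slots)
    print(f"{name}: {total} cycles, {stalls} stalls, {total - stalls} instructions")
```

Both bodies issue the same nine instructions; the scheduled version differs only in how many stall slots remain between them.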
Execution Time per Element:
Unscheduled Code = 16 clock cycles per element.
Scheduled Code = 10 clock cycles per element.
The speedup from the scheduled code can be calculated as:
Speedup = Execution Time (Unscheduled) / Execution Time (Scheduled)
Speedup = 16 / 10 = 1.6
This means the scheduled code is 1.6 times faster than the unscheduled code,
or in percentage terms:
Percentage Speedup = (1.6 − 1) × 100 = 60%
So, the scheduled code is 60% faster than the unscheduled code.
Thus, the clock speed must be 60% faster for the unscheduled code to match the performance
of the scheduled code on the original hardware.
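The arithmetic for part (a) can be reproduced in a few lines. This is a small sketch that assumes both versions run at the same clock rate, so the ratio of cycle counts equals the ratio of execution times; the function name speedup is illustrative.

```python
def speedup(old_cycles, new_cycles):
    """Ratio of execution times, assuming the same clock rate for both."""
    return old_cycles / new_cycles

ratio = speedup(16, 10)            # unscheduled vs. scheduled cycles per element
percent_faster = (ratio - 1) * 100
print(f"speedup = {ratio:.2f}, i.e. {percent_faster:.0f}% faster")
```

The same ratio is the factor by which the clock rate would have to rise for the unscheduled code to finish an element in the same wall-clock time.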

Answer_b)
Clock cycle  Instruction
1            addi x4, x1, #800    // Set up loop upper bound
2            fld F2, 0(x1)        // Load X(i) → F2
3            fld F6, 0(x2)        // Load Y(i) → F6 (parallel load)
4            fmul.d F4, F2, F0    // F4 = a * X(i)
5            fld F2, 8(x1)        // Load X(i+1) → F2 for next iteration
6            fld F10, 8(x2)       // Load Y(i+1) → F10 (parallel load)
7            fmul.d F8, F2, F0    // F8 = a * X(i+1)
8            fld F2, 16(x1)       // Load X(i+2) → F2 for next iteration
9            fld F14, 16(x2)      // Load Y(i+2) → F14 (parallel load)
10           fmul.d F12, F2, F0   // F12 = a * X(i+2)
11           fadd.d F6, F4, F6    // F6 = a * X(i) + Y(i) (complete the first iteration)
12           addi x1, x1, #24     // Increment X index for the next 3 iterations
13           fadd.d F10, F8, F10  // F10 = a * X(i+1) + Y(i+1)
14           addi x2, x2, #24     // Increment Y index for the next 3 iterations
15           sltu x3, x1, x4      // Test if X(i+3) < X upper bound for the loop
16           fadd.d F14, F12, F14 // F14 = a * X(i+2) + Y(i+2)
17           fsd F6, -24(x2)      // Store the result Y(i) = a * X(i) + Y(i)
18           fsd F10, -16(x2)     // Store the result Y(i+1) = a * X(i+1) + Y(i+1)
19           bnez x3, foo         // Branch if the loop continues
20           fsd F14, -8(x2)      // Store the result Y(i+2) = a * X(i+2) + Y(i+2)
The unrolled loop completes 3 iterations in 20 clock cycles.
Execution Time Per Element = 20 cycles / 3 elements ≈ 6.67 cycles per element.
• The loop must be unrolled three times to eliminate stalls and minimize loop overhead.
• The execution time per element after unrolling and scheduling is approximately 6.67
cycles per element.
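To put the unrolled figure next to the part (a) results, the short sketch below divides the stated 20 cycles by the unroll factor of 3 and compares the result with the 16- and 10-cycle figures. The constant names are illustrative, and the cycle counts are taken as stated in this answer rather than re-derived from latencies.

```python
CYCLES_UNROLLED = 20    # cycles for one pass of the unrolled loop, as stated above
ELEMENTS_PER_PASS = 3   # unroll factor

per_element = CYCLES_UNROLLED / ELEMENTS_PER_PASS
print(f"unrolled: {per_element:.2f} cycles per element")

# Per-element figures from part (a), for comparison.
for label, cycles in (("unscheduled", 16), ("scheduled", 10)):
    print(f"vs {label}: {cycles / per_element:.2f}x")
```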

Therefore, by unrolling the loop three times, we significantly reduce the loop overhead and
allow multiple independent operations to be executed in parallel, leading to a substantial
improvement in execution time.
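At source level, the transformation can be sketched as follows. This is an illustrative Python version of the Y(i) = a * X(i) + Y(i) loop, not the author's code: the function names are hypothetical, the 100-element size is an assumption based on the 800-byte bound and the 8-byte stride in the listing, and the cleanup loop for leftover elements is an addition that the assembly listing does not show.

```python
def daxpy(a, x, y):
    """Reference loop: y[i] = a * x[i] + y[i], one element per iteration."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]

def daxpy_unrolled3(a, x, y):
    """Same computation with the body replicated three times per iteration."""
    n = len(x)
    i = 0
    while i + 3 <= n:
        # Three independent multiply-adds; none depends on another's result.
        y[i] = a * x[i] + y[i]
        y[i + 1] = a * x[i + 1] + y[i + 1]
        y[i + 2] = a * x[i + 2] + y[i + 2]
        i += 3  # one index update and one loop test per three elements
    while i < n:  # cleanup for leftover elements (assumption, not in the listing)
        y[i] = a * x[i] + y[i]
        i += 1

x = [float(i) for i in range(100)]
y_ref = [1.0] * 100
y_unrolled = [1.0] * 100
daxpy(2.0, x, y_ref)
daxpy_unrolled3(2.0, x, y_unrolled)
print(y_ref == y_unrolled)
```

Because the three statements in the unrolled body touch different elements, a scheduler is free to interleave their loads, multiplies, and adds, which is what the listing in part (b) does by hand.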
