
DEPARTMENT OF COMPUTER

SCIENCE
FEDERAL POLYTECHNIC IDAH
ASSIGNMENT ON

COM 314 – COMPUTER ARCHITECTURE

BY

NAME: OLAWUMI EMMANUEL OMOTAYO


LEVEL: HND1
DEPARTMENT: COMPUTER SCIENCE

QUESTION:
Write about the application of vector processors in image processing, in comparison with scalar processors.

INTRODUCTION
The development and design of modern microprocessor solutions requires considerable effort, so to increase the efficiency of development it is necessary to use complex tools that allow efficiency to be evaluated on a test sample. This makes it possible to rapidly compare alternative approaches and to make well-founded choices of optimal solutions for the development and modification of new microprocessor architectures.
The relevance of this work lies in designing such approaches, and in choosing the methods and tools for their practical application, so as to enhance the design quality of high-performance vector processors at the design stage. The goal of the work is therefore to create optimization approaches for use in the design process, based on new and standard architectures, and to elaborate technologies that significantly improve the efficiency of hardware and software development. The tasks that lead toward this goal amount to investigating the instruction execution flow, monitoring memory accesses, and empirically evaluating the collected data. To assess the productivity and optimality of software solutions, it is convenient to use statistical methods applicable to various metrics of the object under study, for example time, cyclomatic complexity, deviation error, and others.

Scientific and research workloads involve many computations that require extensive, high-power computers; run on a conventional computer, they may take days or weeks to complete. Science and engineering problems can be expressed in terms of vectors and matrices and solved using vector processing.
A vector processor is a central processing unit that can operate on an entire vector with a single instruction. It is a complete unit of hardware resources that processes a sequential set of similar data elements in memory using a single instruction.

FEATURES OF VECTOR PROCESSING


There are various features of vector processing, which are as follows:
1. A vector is a structured set of elements. The elements in a vector are scalar quantities. A vector operand consists of an ordered set of n elements, where n is known as the length of the vector.
2. Each clock period processes two successive pairs of elements. During a single clock period, the dual vector pipes and the dual sets of vector functional units allow two pairs of elements to be processed. As each pair of operations completes, the results are delivered to the appropriate elements of the result register. The operation continues until the number of elements processed equals the count specified by the vector length register.
3. In parallel vector processing, more than two results are generated per clock cycle. Parallel vector operations are started automatically under the following two circumstances:
   - when successive vector instructions use different functional units and different vector registers;
   - when successive vector instructions use the result stream from one vector register as the operand of another operation in a different functional unit. This technique is known as chaining.
4. A vector processor performs better with longer vectors, because the fixed startup delay of the pipeline is amortized over more elements.
5. Vector processing decreases the overhead of maintaining loop-control variables, which makes it more efficient than scalar processing.

PROFILING WITH THE USE OF QEMU


As part of this work the processor architecture is analyzed with the QEMU emulator [4, 5], which allows self-contained user applications written for one architecture to be emulated on a different one. Since the program's source code is open, the metrics needed for this research can be implemented directly in the emulated model.

Fig. 1 shows a diagram illustrating model profiling in the tool set based on the QEMU virtual machine. The simulator interprets guest program instructions; it is in effect a model of the microprocessor and of the parts that form the structure of the computer system. The simulator itself is an application program that runs on a host machine under the host operating system.
For more convenient processing, the data first need to be laid out appropriately. The most effective way is to convert the data for vector processing into a one-dimensional array. Loop unrolling is applicable to loops with a small body. It is similar to manual vectorization and lets each iteration be used more efficiently: the loop body is duplicated several times, depending on the number of execution units. In vector architectures, this optimization can be replaced by SIMD instructions. Such optimization can, however, introduce data dependences; to remove them, additional variables are introduced. The number of iterations and the unroll step must also be considered: their greatest common divisor should equal the unroll step, that is, the iteration count should divide evenly by the unroll factor. When this condition does not hold, the remaining block of elements is processed outside the loop.
Not every algorithm can be vectorized, so loop-optimization methods applied to scalar architectures must also be used. Reordering basic blocks places the code of frequently executed commands close together and shortens the computation of branch-target addresses. Frequently executed command blocks that have many incoming and outgoing edges most likely indicate non-optimal memory use; restructuring them may avoid unnecessary data loads and speed up program execution. Inlining functions used inside loops avoids the stack overhead of calling simple functions, which in some cases increases the performance of algorithms.
A good option is to reorder conditional branches based on their logic and frequency of execution, in order to minimize the cost of branch prediction. It is recommended to place the most probable branches at the beginning of the branching structure. Some logical conditions can also be replaced by arithmetic expressions. This allows fewer conditions to be tested, and fewer conditional jumps to be made; such jumps are among the most resource-intensive operations.
One of the common difficulties in vector programming is this transformation of branching into arithmetic expressions. Code with many branches is difficult to vectorize, and vectorizing it may even degrade performance because of the new operations that replace the branches.
These techniques can be tested on the following image-processing kernels:
 Image filtering by convolution with a window.
 Color space conversion (RGB–YUV).
 Pre- and post-processing for the FDCT and IDCT (forward/inverse discrete cosine transform).
 Quantization and dequantization.
 Motion estimation.
 Intra-prediction.
The proposed solutions make it possible to estimate the parameters of algorithms for a vector processor and to determine the set of commands that contribute significantly to performance and are suitable for implementation on the developed architecture. To estimate the time spent, a high-precision timer from the C++11 <chrono> library was used, together with a test image containing a rainbow gradient, which provides the maximum color gamut. The best version of each algorithm was chosen so as to minimize the time spent.
Measuring the running time and memory use of the algorithms while keeping the error within the permissible level makes it possible to estimate the distortions introduced when converting algorithms to their integer counterparts. Using a standard-deviation estimate takes the image size into account and reduces the individual-perception factor:

σ = sqrt( (1 / (W·H)) · Σ (aᵢ − bᵢ)² )

where W and H are the image sizes in pixels, a is the value in the reference algorithm, and b is the value in its integer version.
An example of the data obtained for the color space transformation algorithm is shown in Fig. 3.

The obtained data indicate that the memory allocated for the temporary variables can be reduced from 16 bits to 7 bits without loss of conversion quality. The estimates described are objective criteria for accuracy, since they depend solely on numerical data. Nevertheless, these criteria do not always correspond to subjective estimates. Images are intended for human perception, so the only thing that can be said is that poor indicators of the objective criteria usually correspond to low subjective estimates, while good indicators of the objective criteria do not guarantee high subjective estimates.

CONCLUSION
This research suggests an area for further work: improving methods of evaluating compiler performance so as to ensure the speed and reliability of the results, depending on the level of optimization. A possible solution is to use the statistical information obtained on a set of test tasks for graphics processing.
A separate problem that requires careful study is the choice of a representative class of tasks (image processing, computer graphics, and computational tasks) for performance analysis. For example, from computer graphics we can take the rendering of a large number of objects using one of the libraries, compiled with different keys. From computational problems we can take the calculation of 100,000 integrals by some complicated method, again varying the build keys. From image processing we can take one of the filters and compile the program with different keys.
Comparing the performance of executable files produced by different compilers is of particular interest. However, it should be remembered that an important step in profiling is the selection of criteria. For example, we can count the execution time, the amount of memory required, the number of operations, and so on. When selecting criteria, it is necessary to study the task and its requirements carefully, and then select the most appropriate profiling method.

REFERENCES

J. Holewinski, R. Ramamurthi, M. Ravishankar, N. Fauzia, L.-N. Pouchet, A. Rountev, and P. Sadayappan. Dynamic trace-based analysis of vectorization potential of applications. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pp. 371–382, New York, NY, USA, 2012.
G. C. Evans, S. Abraham, B. Kuhn, and D. A. Padua. Vector Seeker: A tool for finding vector potential. In Proceedings of the 2014 Workshop on Programming Models for SIMD/Vector Processing, WPMVP '14, pp. 41–48, New York, NY, USA, 2014.
R. Barik, J. Zhao, and V. Sarkar. Automatic vector instruction selection for dynamic compilation. In Proceedings of the 19th International Conference on Parallel Architectures and Compilation Techniques, PACT '10, pp. 573–574, New York, NY, USA, 2010.
QEMU. Emulator user documentation. URL: http://wiki.qemu.org/download/qemu-doc.html.
F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX Annual Technical Conference, ATEC '05, pp. 41–46. Berkeley, CA, USA: USENIX Association, 2005.
J. P. Shen and M. H. Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. New York: McGraw-Hill, 2005.
S. F. Kurmangaleyev. Methods for optimizing C/C++ applications distributed in LLVM bitcode, taking into account hardware specificity. Proceedings of ISP RAS, vol. 24, pp. 127–144, 2013. DOI: 10.15514/ISPRAS-2013-24-7.
R. Levin, I. Newman, and G. Haber. Complementing missing and inaccurate profiling using a minimum cost circulation algorithm. In Proceedings of the 3rd International Conference on High Performance Embedded Architectures and Compilers, HiPEAC '08, pp. 291–304. Berlin, Heidelberg: Springer-Verlag, 2008.
M. Hohenauer, F. Engel, R. Leupers, G. Ascheid, and H. Meyr. A SIMD optimization framework for retargetable compilers. ACM Trans. Archit. Code Optim., 6(1), pp. 1–27, 2009.
A. C. Bovik. Handbook of Image and Video Processing, 2nd ed. San Diego: Elsevier Academic Press, 2005.
GCC 4.8.2 Manual. URL: http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/.
