FEDERAL POLYTECHNIC IDAH
ASSIGNMENT
QUESTION:
Write about the application of vector processors in image processing, and about scalar processors.
INTRODUCTION
The development and design of modern microprocessors requires a great deal of effort; to make their development more efficient, it is therefore necessary to use complex tools that allow efficiency to be evaluated on a test sample.
This makes it possible to rapidly compare alternative approaches and to choose, on a reasoned basis, optimal solutions for the development and modification of new microprocessor architectures.
The relevance of this work lies in designing approaches, and choosing methods and tools, for the practical application of optimization techniques that enhance the design quality of domestic high-performance vector processors at the design stage. The goal of the work is thus to create optimization approaches intended for use in the design process, based on both new and standard architectures, and to elaborate technologies that significantly improve the efficiency of hardware and software development. The tasks that lead toward this goal amount to investigating the instruction execution flow, monitoring the work with memory, and empirically evaluating the data obtained. To assess the productivity and optimality of software solutions, it is convenient to use statistical methods applicable to various metrics of the object under study.
Scientific and research workloads involve many computations that require extensive, high-power computers. Run on a conventional computer, these computations may take days or weeks to complete. Science and engineering problems can be expressed in terms of vectors and matrices and solved using vector processing.
A vector processor is a central processing unit that can operate on an entire vector with a single instruction. It is a complete unit of hardware resources that processes a sequential set of similar data elements in memory using a single instruction.
6. A vector processor performs better with longer vectors, because the pipeline start-up delay is amortized over more elements.
7. Vector processing decreases the overhead of maintaining loop-control variables, which makes it more efficient than scalar processing.
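The loop-control overhead mentioned in point 7 can be illustrated with a small sketch. The 4-element block width and the function names below are illustrative choices, not from the text; the blocked form stands in for what a vector instruction does in one step.

```cpp
#include <cstddef>

// Scalar version: one element per iteration, so the loop-control work
// (increment, compare, branch) is paid once per element.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Vector-style version: the body handles a block of 4 elements (a
// stand-in for the machine's vector width), so the loop-control
// overhead is paid once per block instead of once per element.
void add_blocked(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)  // leftover elements that do not fill a block
        out[i] = a[i] + b[i];
}
```

On a real vector processor the four duplicated statements become a single vector instruction, which is why longer vectors amortize the pipeline start-up delay better.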
Fig. 1 shows a diagram illustrating model profiling in the tool set based on the QEMU virtual machine. The simulator interprets the instructions of guest programs; it is, in effect, a model of the microprocessor and of the parts that form the structure of the computer system. The simulator itself is an application program that runs on a host machine under the host operating system.
In this case, for more convenient processing the data need to be streamlined. The most effective way is to convert the data for vector processing into a one-dimensional array. Loop unrolling is applicable to loops with a small body; it is similar to manual vectorization and allows each iteration to be used more efficiently. The loop body is duplicated several times, depending on the number of execution units. In vector architectures, this optimization can be replaced by SIMD instructions, but it can introduce data dependences, which are removed by introducing additional variables. The number of iterations and the iteration step should also be considered: their greatest common divisor should equal the iteration step. When this condition does not hold, the remaining block of elements is processed outside the loop.
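The two steps above, flattening to a one-dimensional array and unrolling with a remainder handled outside the loop, can be sketched as follows. The unroll factor of 4 and the function names are illustrative assumptions, not from the text.

```cpp
#include <vector>
#include <cstddef>

// Flattening a 2-D image into one 1-D array lets a single loop stream
// over all pixels, which is the easiest shape to vectorize.
std::vector<int> flatten(const std::vector<std::vector<int>>& img) {
    std::vector<int> flat;
    for (const auto& row : img)
        flat.insert(flat.end(), row.begin(), row.end());
    return flat;
}

// Unrolled brightness adjustment over the flattened data. The body is
// duplicated 4 times; since 4 may not divide the pixel count, the
// remaining elements are processed outside the unrolled loop, exactly
// as described in the text.
void brighten(std::vector<int>& px, int delta) {
    std::size_t i = 0;
    for (; i + 4 <= px.size(); i += 4) {
        px[i]     += delta;
        px[i + 1] += delta;
        px[i + 2] += delta;
        px[i + 3] += delta;
    }
    for (; i < px.size(); ++i)  // remainder block, outside the loop
        px[i] += delta;
}
```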
Not every algorithm can be vectorized; it is therefore necessary to use loop optimization methods applied to scalar architectures. Reordering basic blocks places the code of frequently executed commands close together and shortens the time spent calculating branch-target addresses. A frequently executed block of commands with many incoming and outgoing edges most likely indicates non-optimal memory operation; in this case it may be possible to avoid unnecessary loads of data and speed up program execution. Inlining functions called in loops avoids using the stack for simple function calls, which in some cases increases the performance of algorithms.
A good option is to reorder conditional branches according to their logic and execution frequency, in order to minimize the cost of branch prediction. It is recommended to place the most probable branches at the beginning of a branching construct. Some logical conditions can also be replaced by arithmetic expressions. This allows fewer conditions to be tested and fewer conditional jumps to be made, conditional jumps being among the most resource-intensive operations.
One of the common difficulties in vector programming is the transformation of branching into arithmetic expressions. Code with a large number of branches is difficult to vectorize, and vectorization may even degrade its performance because of the new operations added to replace the branches.
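A minimal sketch of this transformation, using a pixel clamp and a conditional select (the function names and the 0..255 range are illustrative assumptions):

```cpp
#include <algorithm>
#include <cstdint>

// Branchy clamp: two conditional jumps per value, hard to vectorize.
int clamp_branchy(int v) {
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return v;
}

// Branch-free clamp: the same logic as min/max operations, which map
// directly onto SIMD min/max instructions.
int clamp_branchless(int v) {
    return std::min(std::max(v, 0), 255);
}

// Branch-free select: replaces `cond ? a : b` with mask arithmetic,
// the extra operations the text mentions being added in place of a branch.
int32_t select_branchless(bool cond, int32_t a, int32_t b) {
    int32_t mask = -static_cast<int32_t>(cond);  // all ones or all zeros
    return (a & mask) | (b & ~mask);
}
```

Note the trade-off the text describes: the branchless forms always execute every operation, so on scalar code they can be slower, but on vector hardware they let a whole vector of pixels be processed without any jumps.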
This can be tested on the following:
Image filtering by convolution with a window.
Color spaces conversion (RGB-YUV).
Pre- and post-processing for the FDCT and IDCT (forward / inverse discrete cosine
transform).
Quantization and dequantization.
Motion estimation.
Intra-prediction.
The proposed solutions make it possible to estimate the parameters of algorithms for a vector processor and to determine a set of commands that contribute significantly to performance and are suitable for implementation on the developed architecture. To measure the time spent, a high-precision timer from the C++11 chrono library was used, together with a test image containing a rainbow gradient, which provides the maximum color gamut. The best version of each algorithm was chosen by minimizing the time spent.
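A minimal sketch of such a measurement with the C++11 chrono library; the placeholder workload and the function name are illustrative, not the algorithms actually profiled in the work.

```cpp
#include <chrono>
#include <vector>

// Times one run of a pixel transform with the C++11 high-resolution
// clock and returns the elapsed time in microseconds.
long long time_filter_us(std::vector<int>& px) {
    auto t0 = std::chrono::high_resolution_clock::now();
    for (auto& v : px)
        v = (v * 3) / 4;  // stand-in for the filter under test
    auto t1 = std::chrono::high_resolution_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
}
```

In practice the run is repeated many times and the minimum or median taken, since single timings are noisy.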
Measuring the running time and memory use of the algorithms, while keeping the error within a permissible level, makes it possible to estimate the distortions introduced when algorithms are converted to their integer counterparts. Using a standard-deviation estimate takes the image size into account and reduces the factor of individual perception:

sigma = sqrt( sum over all pixels of (a - b)^2 / (W * H) )

where W and H are the image dimensions in pixels, a is a value produced by the reference algorithm, and b is the corresponding value in its integer version.
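A direct sketch of this estimate over two flattened, row-major images; the function name and the flattened layout are assumptions for illustration.

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// Standard-deviation estimate between the output `a` of a reference
// algorithm and the output `b` of its integer counterpart, normalized
// by the image size W * H so that images of different sizes are comparable.
double deviation(const std::vector<double>& a, const std::vector<double>& b,
                 std::size_t W, std::size_t H) {
    double sum = 0.0;
    for (std::size_t i = 0; i < W * H; ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum / static_cast<double>(W * H));
}
```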
An example of the data obtained for the color space transformation algorithm is shown in Fig. 3.
The data obtained indicate that the memory allocated for temporary variables can be reduced from 16 bits to 7 bits without loss of conversion quality. The estimates described are objective criteria for accuracy, since they depend solely on numerical data. Nevertheless, these criteria do not always correspond to subjective estimates. Images are intended for human perception, so the only thing that can be said is that poor values of the objective criteria usually correspond to low subjective estimates, while good values of the objective criteria do not guarantee high subjective estimates.
CONCLUSION
An area for further work identified by this research is the improvement of methods for evaluating compiler performance, in order to ensure the speed and reliability of the results at different optimization levels. A possible solution is to use the statistical information obtained on a set of graphics-processing test tasks.
A separate problem requiring careful study is the choice of a representative class of tasks for performance analysis, drawn from image processing, computer graphics, and computational workloads. From computer graphics, for example, we can take the rendering of a large number of objects using one of the libraries, compiled with different compiler flags. From computational problems, we can take the calculation of 100,000 integrals by some complicated method, again changing the compiler flags when building the program. From image processing, we can take one of the filters and compile the program with different flags.
Performance comparison of executables produced by different compilers is of particular interest. However, it should be remembered that an important step in profiling is the selection of criteria: for example, we can measure execution time, the amount of memory required, the number of operations, and so on. When selecting criteria, it is necessary to study the task and its requirements carefully, and then choose the most appropriate profiling method.
REFERENCES