International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 06 Issue: 06 | June 2019 www.irjet.net p-ISSN: 2395-0072
© 2019, IRJET | Impact Factor value: 7.211 | ISO 9001:2008 Certified Journal | Page 1417
Latin Square Computation of Order-3 using OpenCL
Avnish Kansal1, Ashish Chaturwedi2
1,2 Department of Computer Science & Engineering, Carlox Teacher’s University, Ahmedabad, Gujarat
Abstract: Latin squares are widely used in steganography, cryptography, digital watermarking, computer games, Sudoku, graph analysis, error-correcting codes, the generation of magic squares, statistics, and other mathematical fields. Sudoku puzzles are a special case of Latin squares. Computing Latin squares with a sequential algorithm consumes considerable clock time; parallel programming with OpenCL reduces the time taken and increases throughput. Traditional Latin square methods are either based on heuristic, cell-based techniques or generate random Latin squares with a genetic algorithm; both approaches consume high processing time and reduce throughput. Here we present an algorithm that uses a parallel processing environment, OpenCL, to compute a Latin square of order 3.
Keywords: OpenCL, gnuplot, Sequential architecture,
parallel architecture, GPU.
1. Introduction
A Latin square of order n is an n×n array in which each cell contains exactly one symbol chosen from an n-set, such that every symbol occurs exactly once in each row and exactly once in each column. The name "Latin square" was inspired by mathematical papers of Leonhard Euler.

Two Latin squares of the same order are orthogonal if, when one is superimposed on the other, each symbol of the first coincides exactly once with each symbol of the second; that is, every ordered pair of symbols appears exactly once. Two Latin squares are conjugate if the rows of one are the columns of the other: interchanging the rows and columns of one square produces the other. Adjugacy generalizes conjugacy: a permutation of the constraints (row, column, symbol) of one square generates another [12].

Each Latin square can be described as a set of triples (r, c, s), where r is the row, c is the column, and s is the symbol in that cell; this set of n² triples is called the orthogonal array representation of the square. A Latin square of order n in which every row is derived from any other by a cyclic permutation of degree n, or by a power of such a permutation, is a cyclic Latin square [9].
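The definitions above can be checked mechanically. The following is a minimal Python sketch (the paper gives no code, so every function name here is our own, hypothetical choice), assuming the n symbols are the integers 0..n-1. It builds a cyclic Latin square with L[i][j] = (i + j) mod n, tests the Latin property, lists the orthogonal-array triples, forms the transpose (the conjugate described above), and checks orthogonality.

```python
from itertools import product


def cyclic_latin_square(n):
    """Cyclic Latin square on symbols 0..n-1: row i is row 0 shifted left by i."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """True if each row and each column holds every symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    # Columns are only inspected once the rows are known to have length n.
    return rows_ok and all({square[r][c] for r in range(n)} == symbols for c in range(n))


def orthogonal_array(square):
    """The n^2 triples (r, c, s) with s the symbol in row r, column c."""
    n = len(square)
    return [(r, c, square[r][c]) for r, c in product(range(n), repeat=2)]


def transpose(square):
    """Conjugate in the sense used above: rows and columns interchanged."""
    return [list(col) for col in zip(*square)]


def are_orthogonal(a, b):
    """Superimposing a on b must produce every ordered pair of symbols exactly once."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r in range(n) for c in range(n)}
    return len(pairs) == n * n


if __name__ == "__main__":
    L = cyclic_latin_square(3)  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
    M = [[(2 * r + c) % 3 for c in range(3)] for r in range(3)]
    print(is_latin_square(L), is_latin_square(M), are_orthogonal(L, M))  # True True True
```

For order 3, the square with entries (2r + c) mod 3 is orthogonal to the cyclic square because the map (r, c) -> ((r + c) mod 3, (2r + c) mod 3) is a bijection: the difference of the two symbols recovers r, and then c follows.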
Such computations are traditionally implemented with sequential algorithms, but the parallel processing techniques available in high-level languages can decrease the processing time of matrix-processing applications [3]. The problem is divided into a discrete set of instances that are solved concurrently [1], so that two or more instructions execute at the same time. Sequential algorithms executed on a CPU run comparatively slowly. In the proposed system, sequential algorithms that exhibit task parallelism or data parallelism are implemented in OpenCL [14]. With OpenCL we minimize the load on the CPU and make matrix processing run faster and more efficiently, yielding higher throughput.
Concurrency is the sharing of multiple resources within a piece of software. For applications that are naturally parallel, concurrency provides an abstraction for expressing that parallelism [5].
When multiple threads execute in parallel, the active threads run simultaneously on different hardware resources and processing elements [10]. The platform provides this simultaneous execution of threads in order to achieve parallel computing. Modern computing machines include SIMD or MIMD architectures, which can exploit either data-level or task-level parallelism [6].
Task-level parallelism handles different tasks within a single problem at the same time. The efficiency of this model depends on the operations of the tasks being independent of one another [6].
Data-level parallelism handles discrete chunks of the same task at the same time. The efficiency of this model depends on the operations on different chunks being independent of one another [6].
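The difference between the two models can be sketched with Python's standard `concurrent.futures` thread pool. This is an illustration under our own assumptions, not the paper's code: the data-parallel function applies one operation (a sum) to independent rows, while the task-parallel function runs three different, independent operations side by side. In CPython, threads do not execute Python bytecode truly in parallel, so the sketch shows how the work is decomposed rather than any speedup; on a GPU, OpenCL would run the data-parallel case as one kernel over many work-items.

```python
from concurrent.futures import ThreadPoolExecutor


def data_parallel_row_sums(matrix, workers=3):
    """Data parallelism: the same operation (sum) runs on independent chunks (rows)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum, matrix))


def task_parallel_summary(matrix):
    """Task parallelism: different, mutually independent operations run at the same time."""
    n = len(matrix)
    with ThreadPoolExecutor(max_workers=3) as pool:
        total = pool.submit(lambda: sum(sum(row) for row in matrix))
        largest = pool.submit(lambda: max(max(row) for row in matrix))
        trace = pool.submit(lambda: sum(matrix[i][i] for i in range(n)))
        return total.result(), largest.result(), trace.result()


if __name__ == "__main__":
    m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(data_parallel_row_sums(m))  # [6, 15, 24]
    print(task_parallel_summary(m))   # (45, 9, 15)
```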
1.1 Existing System & Proposed System
Traditionally, all computation and instruction execution in a computer is handled by the CPU. Architecturally, a CPU consists of a few cores with a large cache memory, able to handle only a few software threads concurrently. These few cores are optimized for sequential serial
processing. GPUs were introduced to relieve the CPU of the advanced computations needed to produce the final display on the monitor. With 100 or more cores able to process thousands of threads, GPUs can accelerate suitable software by as much as 100x over a CPU alone. GPUs have a massively parallel architecture consisting of thousands of smaller, more efficient cores designed to handle multiple tasks simultaneously, which is why they are now widely accepted in the mainstream. GPU-accelerated computing has grown into a mainstream movement supported by the latest operating systems from Apple (with OpenCL) and Microsoft (using DirectCompute). It uses a graphics processing unit (GPU) together with a CPU to accelerate scientific, analytics, engineering, consumer, and enterprise applications.
1.2 Defining Algorithm
The parallel processing algorithm is designed for computing a Latin square; its steps are shown in Figure 1. The first step is to input a matrix of order 3, which is supplied as a matrix in the OpenCL program code. The matrix is stored as array values; applying a sequential algorithm to it takes more execution time on the CPU, whereas applying parallel processing concepts reduces the time taken to process the matrix.

The second step decomposes the input matrix according to a task-parallel or data-parallel technique. The divided matrix is then used in the third step for concurrent execution.

The third step defines the individual sub-matrices, which are then processed by the various processing elements.

In the fourth step, after the matrix has been divided into sub-matrices, each sub-matrix is sent simultaneously to a processing element in the GPU.

In the fifth step, the processing elements compute the results of the sub-matrices concurrently.

The sixth step combines the results of all sub-matrices in a single processing element to obtain the final output for the input matrix, in reduced time, using the parallel architecture.
Figure 1: Flowchart of Proposed Work
The seventh step stores the final result, assembled from the various sub-matrices, in the dynamic random access memory (DRAM) of the graphics processing unit.

In the eighth step, after the final result has been saved in the GPU's DRAM, it is copied to the CPU's DRAM so that the final output can be displayed to the user.
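The eight steps can be traced in a CPU-only Python sketch. It rests on assumptions the paper does not state: the input is the first row of the order-3 square, each "sub-matrix" is one row, and each "processing element" rotates that row by its row index. A thread pool stands in for the GPU, and plain lists stand in for GPU and host DRAM. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_row(base_row, i):
    """Hypothetical per-processing-element work: row i is base_row rotated left by i."""
    n = len(base_row)
    return [base_row[(i + j) % n] for j in range(n)]


def latin_square_pipeline(base_row):
    n = len(base_row)
    host_input = list(base_row)      # step 1: input (first row of the order-n square)
    sub_problems = list(range(n))    # steps 2-3: one independent sub-problem per row
    with ThreadPoolExecutor(max_workers=n) as pool:  # steps 4-5: processed concurrently
        partial_rows = list(pool.map(lambda i: compute_row(host_input, i), sub_problems))
    # Steps 6-7: combine the partial results into one buffer standing in for GPU DRAM.
    device_buffer = [value for row in partial_rows for value in row]
    host_output = list(device_buffer)  # step 8: copy back to host memory
    return [host_output[r * n:(r + 1) * n] for r in range(n)]


if __name__ == "__main__":
    for row in latin_square_pipeline(["A", "B", "C"]):
        print(" ".join(row))
    # A B C
    # B C A
    # C A B
```

`pool.map` returns results in submission order, so the combining step needs no extra bookkeeping to put the rows back in place.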
2. Implementing Latin Square using OpenCL:
The algorithm we chose parallelizes the matrix over a number of work-groups: the matrix is divided into chunks using a coarse-grained division, and each work-group takes one part of the matrix and stores that sub-matrix in the on-chip local
memory of the GPU. These partial Latin squares are then combined into the single final Latin square.
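A coarse-grained division can be sketched as splitting the row indices into contiguous blocks, one per work-group, and then concatenating the partial results. This Python sketch assumes (our assumption, not stated in the paper) that each work-group produces whole rows of a cyclic Latin square; the helper names are hypothetical.

```python
def coarse_chunks(n_rows, n_groups):
    """Split rows 0..n_rows-1 into n_groups contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n_rows, n_groups)
    chunks, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def compute_block(rows, n):
    """Hypothetical work of one work-group: the rows of the cyclic Latin square in its chunk."""
    return [[(r + c) % n for c in range(n)] for r in rows]


def combine(partials):
    """Merge the partial Latin squares (row blocks) into the single final square."""
    return [row for block in partials for row in block]


if __name__ == "__main__":
    chunks = coarse_chunks(3, 2)
    print([list(c) for c in chunks])                       # [[0, 1], [2]]
    print(combine([compute_block(c, 3) for c in chunks]))  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
```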
2.1 Overview
When designing an OpenCL kernel for particular hardware, we have to take care of the work-items and the work-group size. Local memory belongs to a work-group, and the data in it is shared among that work-group's work-items. Local memory is shared only within a work-group: the results held in one work-group's local memory cannot be shared directly with another work-group. A work-group in OpenCL consists of work-items, which share the local memory of that work-group. We do all this to reduce memory overhead by storing each sub-matrix in local memory.
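Sharing through local memory can be imitated on the CPU with one thread per work-item and a `threading.Barrier` standing in for OpenCL's `barrier(CLK_LOCAL_MEM_FENCE)`. In this hypothetical Python sketch, each work-group handles one row of a flattened matrix: every work-item copies one value from "global" to "local" memory, waits at the barrier, and then reads its neighbour's value, which is possible only because that value sits in the group's shared local buffer. Each group gets its own buffer, so groups never see each other's data. The groups run one after another here, whereas a real device may schedule them in any order.

```python
import threading


def run_work_group(global_in, global_out, group_id, local_size):
    """Simulate one work-group whose work-items share a private 'local memory' list."""
    local_mem = [None] * local_size          # visible only to this group's work-items
    barrier = threading.Barrier(local_size)  # plays the role of barrier(CLK_LOCAL_MEM_FENCE)

    def work_item(local_id):
        gid = group_id * local_size + local_id
        local_mem[local_id] = global_in[gid]                      # global -> local
        barrier.wait()                                            # whole group has loaded
        global_out[gid] = local_mem[(local_id + 1) % local_size]  # read a neighbour's value

    items = [threading.Thread(target=work_item, args=(i,)) for i in range(local_size)]
    for t in items:
        t.start()
    for t in items:
        t.join()


def rotate_each_row(flat, n):
    """One work-group per row of a flattened n x n matrix; groups never share local_mem."""
    out = [None] * len(flat)
    for group_id in range(len(flat) // n):
        run_work_group(flat, out, group_id, n)
    return out


if __name__ == "__main__":
    print(rotate_each_row([0, 1, 2, 1, 2, 0, 2, 0, 1], 3))  # [1, 2, 0, 2, 0, 1, 0, 1, 2]
```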
We then define further work-groups so that more local-memory data can be transferred to global memory. To use local memory efficiently and reduce the overhead on global memory, the number of work-groups should be close to the number of compute units in the hardware.
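Choosing these sizes is simple arithmetic. In OpenCL 1.x the global size must be a multiple of the local (work-group) size, so it is usually rounded up, and the padding work-items are masked off inside the kernel. The hypothetical helper below does that rounding and reports the ratio of work-groups to compute units; on a real device, `compute_units` would come from querying `CL_DEVICE_MAX_COMPUTE_UNITS` with `clGetDeviceInfo`.

```python
def ndrange_sizes(problem_size, local_size, compute_units):
    """Round the global size up to a multiple of the local size and compare groups with compute units."""
    global_size = -(-problem_size // local_size) * local_size  # ceiling division
    num_groups = global_size // local_size
    groups_per_cu = num_groups / compute_units  # below 1.0 means some compute units sit idle
    return global_size, num_groups, groups_per_cu


if __name__ == "__main__":
    print(ndrange_sizes(9, 4, 2))  # (12, 3, 1.5)
    print(ndrange_sizes(9, 3, 3))  # (9, 3, 1.0)
```

For the nine cells of an order-3 square with a local size of 4, three work-items (global IDs 9 to 11) fall in the padding and must do nothing.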
To traverse the whole input matrix, we use global IDs, local IDs, group IDs, the group size, and similar index values.
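These IDs are related: with a zero global offset, OpenCL defines get_global_id(d) = get_group_id(d) * get_local_size(d) + get_local_id(d) in each dimension d. The Python sketch below enumerates a 2D index space the same way and runs a hypothetical kernel body in which each work-item fills one cell (row, col) of the cyclic Latin square with (row + col) mod n. The kernel body is our illustration, not the paper's kernel.

```python
from itertools import product


def ndrange_2d(global_size, local_size):
    """Yield (global_id, group_id, local_id) for every work-item, assuming a zero global offset."""
    (gx, gy), (lx, ly) = global_size, local_size
    for grp in product(range(gx // lx), range(gy // ly)):
        for loc in product(range(lx), range(ly)):
            gid = (grp[0] * lx + loc[0], grp[1] * ly + loc[1])
            yield gid, grp, loc


def latin_kernel(gid, n, out):
    """Hypothetical kernel body: one work-item fills one cell of the cyclic Latin square."""
    row, col = gid
    if row < n and col < n:  # padded work-items outside the matrix do nothing
        out[row][col] = (row + col) % n


if __name__ == "__main__":
    n = 3
    out = [[None] * n for _ in range(n)]
    for gid, grp, loc in ndrange_2d((4, 4), (2, 2)):
        latin_kernel(gid, n, out)
    print(out)  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
```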
Our OpenCL program consists of two parts. The first is the kernel: instances of the kernel are dispatched to the different compute units and executed on each compute unit individually. The second is the host code, which sets up the OpenCL models the kernel relies on and executes on the host (the CPU).
The host program defines the context for the kernels and manages their execution. Each OpenCL device has a command queue, into which the host program enqueues kernel executions and memory transfers.
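The host's use of a command queue can be modelled with a toy in-order queue. This Python sketch is not the OpenCL API; its three commands correspond loosely to `clEnqueueWriteBuffer`, `clEnqueueNDRangeKernel`, and `clEnqueueReadBuffer`, and `finish` plays the role of `clFinish`. Dictionaries and lists stand in for device and host memory, and the "kernel" fills a cyclic Latin square as a hypothetical example.

```python
from collections import deque


class InOrderQueue:
    """Toy stand-in for an in-order OpenCL command queue (not the real API)."""

    def __init__(self):
        self.pending = deque()
        self.completed = []

    def enqueue(self, name, action):
        self.pending.append((name, action))

    def finish(self):
        # Run commands strictly in enqueue order, then return (cf. clFinish).
        while self.pending:
            name, action = self.pending.popleft()
            action()
            self.completed.append(name)


def run_host_program(n):
    device = {}  # stands in for device (global) memory
    host = []    # stands in for host memory
    queue = InOrderQueue()
    queue.enqueue("write", lambda: device.update(n=n))
    queue.enqueue("kernel", lambda: device.update(
        out=[(i + j) % device["n"] for i in range(device["n"]) for j in range(device["n"])]))
    queue.enqueue("read", lambda: host.extend(device["out"]))
    queue.finish()
    return queue.completed, [host[r * n:(r + 1) * n] for r in range(n)]


if __name__ == "__main__":
    print(run_host_program(3))
    # (['write', 'kernel', 'read'], [[0, 1, 2], [1, 2, 0], [2, 0, 1]])
```

Because the queue is in order, the read cannot run before the kernel has produced its output; with an out-of-order queue, the host would need events to express that dependency.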
The core of the OpenCL execution model is the execution of kernels. When a kernel is executed, an index space is defined, and an instance of the kernel runs for each point in that space, as in our problem. Each work-item executes the same code, but the execution path and the data used may differ. Work-items are organized into work-groups. Each work-group is assigned a unique work-group ID, and each work-item in a work-group is assigned a local ID.
A single work-item can be identified by its global ID, or by its work-group ID together with its local ID. The work-items in a single work-group execute concurrently on the processing elements of a single compute unit. Work-items in a work-group can synchronize with each other and share data through the local memory of the compute unit. All work-items have read and write access to any position in global memory. Global and constant memory can also be accessed from the host processor before and after kernel execution.
3. Experiments and Results
For each Latin square, both GPU kernel code and CPU serial code were written. The host-side processing takes place on the CPU, while the kernel code of the algorithm is copied as instances to the GPU. The kernel executes on the compute units, after which the results are copied back to the CPU.
Figure 2: Latin Square Computation (Order-3): order vs. time graph for the procedural programming code, showing higher execution time.
Figure 3: Latin Square Computation (Order-3): time vs. order graph for the OpenCL code, showing reduced execution time.
Computing the Latin square on the GPU significantly improves computing speed: it reduces execution time and increases throughput. The graphs show that, with the procedural programming
technique, the execution time grows as the order of the Latin square increases, whereas the same computation implemented in OpenCL takes substantially less time.
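An order-vs-time curve like the one in Figure 2 could be measured with a small harness such as the hypothetical Python sketch below. It times only a serial CPU baseline (a cell-by-cell cyclic Latin square, which is our assumption about the workload), keeps the best of several runs to reduce noise, and writes two-column "order seconds" lines that gnuplot can plot directly. On the OpenCL side, kernel time is usually read from profiling events (a queue created with `CL_QUEUE_PROFILING_ENABLE`, then `clGetEventProfilingInfo`); whether host-device transfers are included in the GPU time is a design choice that affects how fair the comparison is.

```python
import time


def serial_latin_square(n):
    """CPU serial baseline: fill an n x n cyclic Latin square one cell at a time."""
    square = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            square[i][j] = (i + j) % n
    return square


def time_orders(orders, repeats=5):
    """Return (order, best wall-clock seconds over `repeats` runs) for each order."""
    results = []
    for n in orders:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            serial_latin_square(n)
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results


def gnuplot_lines(results):
    """Two-column 'order seconds' text that gnuplot can plot directly."""
    return [f"{n} {t:.9f}" for n, t in results]


if __name__ == "__main__":
    print("\n".join(gnuplot_lines(time_orders([3, 10, 100, 300]))))
```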
4. Conclusions
In this paper we have tried to develop a framework that provides an efficient algorithmic approach to computing Latin squares and improves on the existing approach. We implemented the Latin square problem of order 3 on a GPGPU (GP2U), which provides a heterogeneous execution environment and the capability to reduce execution time, improving the performance of Latin square computation in compute-intensive domains. We executed the Latin square computation in the OpenCL environment and compared it with the sequential implementation on the CPU. On the CPU the algorithm takes a much larger amount of time, while on the GPU device the time taken is lower. The approach provides a novel and efficient acceleration technique for matrix calculation and is inexpensive to implement in hardware.
Future work is to gain deeper knowledge of parallelization techniques in order to make the best use of the GPU device, and to work with other algorithms related to the concepts of this project.
5. References
[1] Benedict Gaster. Heterogeneous Computing with OpenCL. British Library; printed in USA.
[2] Khronos Group. OpenCL. http://www.khronos.org/OpenCL
[3] Roberto Fontana (2013). Random Latin squares and Sudoku designs generation.
[4] Nan Zhang, Yun-shan Chen, Jian-li Wang (2010). Image Parallel Processing Based on GPU. International Conference on Advanced Computer Control, March 2010.
[5] P. M. Pardalos, J. Xue (1994). The maximum clique problem. Journal of Global Optimization 4: 301-328.
[6] Demetres Christofides, Klas Markström (2003). Random Latin square graphs.
[7] AMD. University Kit 1.0. http://developer.amd.com/pages/default.aspx
[8] C. Colbourn (1984). The complexity of completing partial Latin squares. Discrete Applied Mathematics 8: 25-30. doi:10.1016/0166-218X(84)90075-1
[9] J. Dénes, A. Keedwell (1991). Latin Squares: New Developments in the Theory and Applications. North-Holland.
[10] AMD. AMD Accelerated Parallel Processing OpenCL Programming Guide.
[11] M. T. Jacobson, P. Matthews (1996). Generating uniformly distributed random Latin squares. Journal of Combinatorial Designs 4(6): 405-406.
[12] Brendan D. McKay, Ian M. Wanless (2000). On the number of Latin squares. Australian National University, Canberra, ACT 0200, Australia.
[13] J. A. Bate, G. H. J. van Rees. The size of the smallest strong critical set in a Latin square. University of Manitoba, Winnipeg, Manitoba.

Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning ModelEnhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
Enhanced heart disease prediction using SKNDGR ensemble Machine Learning Model
IRJET Journal
 
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
Utilizing Biomedical Waste for Sustainable Brick Manufacturing: A Novel Appro...
IRJET Journal
 
Kiona – A Smart Society Automation Project
Kiona – A Smart Society Automation ProjectKiona – A Smart Society Automation Project
Kiona – A Smart Society Automation Project
IRJET Journal
 
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
DESIGN AND DEVELOPMENT OF BATTERY THERMAL MANAGEMENT SYSTEM USING PHASE CHANG...
IRJET Journal
 
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based CrowdfundingInvest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
Invest in Innovation: Empowering Ideas through Blockchain Based Crowdfunding
IRJET Journal
 
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUBSPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
SPACE WATCH YOUR REAL-TIME SPACE INFORMATION HUB
IRJET Journal
 
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
A Review on Influence of Fluid Viscous Damper on The Behaviour of Multi-store...
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
Explainable AI(XAI) using LIME and Disease Detection in Mango Leaf by Transfe...
IRJET Journal
 
BRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATIONBRAIN TUMOUR DETECTION AND CLASSIFICATION
BRAIN TUMOUR DETECTION AND CLASSIFICATION
IRJET Journal
 
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
The Project Manager as an ambassador of the contract. The case of NEC4 ECC co...
IRJET Journal
 
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ..."Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
"Enhanced Heat Transfer Performance in Shell and Tube Heat Exchangers: A CFD ...
IRJET Journal
 
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
Advancements in CFD Analysis of Shell and Tube Heat Exchangers with Nanofluid...
IRJET Journal
 
Breast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer VisionBreast Cancer Detection using Computer Vision
Breast Cancer Detection using Computer Vision
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...A Novel System for Recommending Agricultural Crops Using Machine Learning App...
A Novel System for Recommending Agricultural Crops Using Machine Learning App...
IRJET Journal
 
Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.Auto-Charging E-Vehicle with its battery Management.
Auto-Charging E-Vehicle with its battery Management.
IRJET Journal
 
Analysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the HeliosphereAnalysis of high energy charge particle in the Heliosphere
Analysis of high energy charge particle in the Heliosphere
IRJET Journal
 
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
Wireless Arduino Control via Mobile: Eliminating the Need for a Dedicated Wir...
IRJET Journal
 
Ad

Recently uploaded (20)

Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Mohamed905031
 
Introduction to AI agent development with MCP
Introduction to AI agent development with MCPIntroduction to AI agent development with MCP
Introduction to AI agent development with MCP
Dori Waldman
 
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghjfHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
yadavshivank2006
 
Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)
elelijjournal653
 
Structural Health and Factors affecting.pptx
Structural Health and Factors affecting.pptxStructural Health and Factors affecting.pptx
Structural Health and Factors affecting.pptx
gunjalsachin
 
ANFIS Models with Subtractive Clustering and Fuzzy C-Mean Clustering Techniqu...
ANFIS Models with Subtractive Clustering and Fuzzy C-Mean Clustering Techniqu...ANFIS Models with Subtractive Clustering and Fuzzy C-Mean Clustering Techniqu...
ANFIS Models with Subtractive Clustering and Fuzzy C-Mean Clustering Techniqu...
Journal of Soft Computing in Civil Engineering
 
Artificial Power 2025 raport krajobrazowy
Artificial Power 2025 raport krajobrazowyArtificial Power 2025 raport krajobrazowy
Artificial Power 2025 raport krajobrazowy
dominikamizerska1
 
cloud Lecture_2025 cloud architecture.ppt
cloud Lecture_2025 cloud architecture.pptcloud Lecture_2025 cloud architecture.ppt
cloud Lecture_2025 cloud architecture.ppt
viratkohli82222
 
ACEP Magazine Fifth Edition on 5june2025
ACEP Magazine Fifth Edition on 5june2025ACEP Magazine Fifth Edition on 5june2025
ACEP Magazine Fifth Edition on 5june2025
Rahul
 
Principles of Building planning and its objectives.pptx
Principles of Building planning and its objectives.pptxPrinciples of Building planning and its objectives.pptx
Principles of Building planning and its objectives.pptx
PinkiDeb4
 
introduction to Digital Signature basics
introduction to Digital Signature basicsintroduction to Digital Signature basics
introduction to Digital Signature basics
DhavalPatel171802
 
Strength of materials (Thermal stress and strain relationships)
Strength of materials (Thermal stress and strain relationships)Strength of materials (Thermal stress and strain relationships)
Strength of materials (Thermal stress and strain relationships)
pelumiadigun2006
 
"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai
Julio Chai
 
New Microsoft Office Word Documentfrf.docx
New Microsoft Office Word Documentfrf.docxNew Microsoft Office Word Documentfrf.docx
New Microsoft Office Word Documentfrf.docx
misheetasah
 
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
BeHappy728244
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
IOt Based Research on Challenges and Future
IOt Based Research on Challenges and FutureIOt Based Research on Challenges and Future
IOt Based Research on Challenges and Future
SACHINSAHU821405
 
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
gerogepatton
 
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
ijccmsjournal
 
Webinar On Steel Melting IIF of steel for rdso
Webinar  On Steel  Melting IIF of steel for rdsoWebinar  On Steel  Melting IIF of steel for rdso
Webinar On Steel Melting IIF of steel for rdso
KapilParyani3
 
Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Numerical Investigation of the Aerodynamic Characteristics for a Darrieus H-t...
Mohamed905031
 
Introduction to AI agent development with MCP
Introduction to AI agent development with MCPIntroduction to AI agent development with MCP
Introduction to AI agent development with MCP
Dori Waldman
 
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghjfHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
fHUINhKG5lM1WBBk608.pptxfhjjhhjffhiuhhghj
yadavshivank2006
 
Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)Electrical and Electronics Engineering: An International Journal (ELELIJ)
Electrical and Electronics Engineering: An International Journal (ELELIJ)
elelijjournal653
 
Structural Health and Factors affecting.pptx
Structural Health and Factors affecting.pptxStructural Health and Factors affecting.pptx
Structural Health and Factors affecting.pptx
gunjalsachin
 
Artificial Power 2025 raport krajobrazowy
Artificial Power 2025 raport krajobrazowyArtificial Power 2025 raport krajobrazowy
Artificial Power 2025 raport krajobrazowy
dominikamizerska1
 
cloud Lecture_2025 cloud architecture.ppt
cloud Lecture_2025 cloud architecture.pptcloud Lecture_2025 cloud architecture.ppt
cloud Lecture_2025 cloud architecture.ppt
viratkohli82222
 
ACEP Magazine Fifth Edition on 5june2025
ACEP Magazine Fifth Edition on 5june2025ACEP Magazine Fifth Edition on 5june2025
ACEP Magazine Fifth Edition on 5june2025
Rahul
 
Principles of Building planning and its objectives.pptx
Principles of Building planning and its objectives.pptxPrinciples of Building planning and its objectives.pptx
Principles of Building planning and its objectives.pptx
PinkiDeb4
 
introduction to Digital Signature basics
introduction to Digital Signature basicsintroduction to Digital Signature basics
introduction to Digital Signature basics
DhavalPatel171802
 
Strength of materials (Thermal stress and strain relationships)
Strength of materials (Thermal stress and strain relationships)Strength of materials (Thermal stress and strain relationships)
Strength of materials (Thermal stress and strain relationships)
pelumiadigun2006
 
"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai"The Enigmas of the Riemann Hypothesis" by Julio Chai
"The Enigmas of the Riemann Hypothesis" by Julio Chai
Julio Chai
 
New Microsoft Office Word Documentfrf.docx
New Microsoft Office Word Documentfrf.docxNew Microsoft Office Word Documentfrf.docx
New Microsoft Office Word Documentfrf.docx
misheetasah
 
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
Direct Current circuitsDirect Current circuitsDirect Current circuitsDirect C...
BeHappy728244
 
International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)International Journal of Distributed and Parallel systems (IJDPS)
International Journal of Distributed and Parallel systems (IJDPS)
samueljackson3773
 
IOt Based Research on Challenges and Future
IOt Based Research on Challenges and FutureIOt Based Research on Challenges and Future
IOt Based Research on Challenges and Future
SACHINSAHU821405
 
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
May 2025 - Top 10 Read Articles in Artificial Intelligence and Applications (...
gerogepatton
 
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
PREDICTION OF ROOM TEMPERATURE SIDEEFFECT DUE TOFAST DEMAND RESPONSEFOR BUILD...
ijccmsjournal
 
Webinar On Steel Melting IIF of steel for rdso
Webinar  On Steel  Melting IIF of steel for rdsoWebinar  On Steel  Melting IIF of steel for rdso
Webinar On Steel Melting IIF of steel for rdso
KapilParyani3
 

IRJET- Latin Square Computation of Order-3 using Open CL

Avnish Kansal, Ashish Chaturwedi
Department of Computer Science & Engineering, Carlox Teacher's University, Ahmedabad, Gujarat

Abstract: Latin squares are widely used in steganography, cryptography, digital watermarking, computer games, Sudoku, graph analysis, error-correcting codes, the generation of magic squares, statistics and other mathematical fields. Sudoku puzzles are a special case of Latin squares. Computing Latin squares with a sequential algorithm wastes clock time; parallel programming with OpenCL reduces the time taken and increases throughput. Traditional Latin square methods are based either on heuristic, cell-based techniques or on a genetic-algorithm approach that generates random Latin squares, and both consume considerable processing time and reduce throughput. Here we present an algorithm that uses an OpenCL parallel processing environment to compute Latin squares of order 3.

Keywords: OpenCL, gnuplot, sequential architecture, parallel architecture, GPU.

I. Introduction:
A Latin square is an n×n array in which each cell contains one symbol, chosen from a set of n symbols, such that every symbol occurs exactly once in each row and exactly once in each column. (If cells may be left empty and each symbol occurs at most once in each row and each column, the array is a partial Latin square.) The name “Latin square” was inspired by the mathematical papers of Leonhard Euler, who used Latin characters as symbols. Two Latin squares of the same order are orthogonal if, when one is superimposed on the other, each symbol of the first coincides exactly once with each symbol of the second, so that all n² ordered pairs of symbols appear.
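The two definitions above can be checked mechanically. The following is a minimal Python sketch, not the authors' code: it assumes the symbols are the integers 0..n-1, and the helper names is_latin_square and are_orthogonal are hypothetical. The orthogonality test relies on the fact that n² cells producing n² distinct ordered pairs means every possible pair occurs exactly once.

```python
from itertools import product


def is_latin_square(square):
    """True if `square` is n x n and every symbol 0..n-1 occurs exactly
    once in each row and exactly once in each column."""
    n = len(square)
    symbols = set(range(n))
    if any(len(row) != n for row in square):
        return False
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """True if superimposing Latin squares `a` and `b` (same order n)
    produces every ordered symbol pair exactly once, i.e. n*n distinct pairs."""
    n = len(a)
    pairs = {(a[r][c], b[r][c]) for r, c in product(range(n), repeat=2)}
    return len(pairs) == n * n


# Two Latin squares of order 3: A[r][c] = (r + c) mod 3, B[r][c] = (2r + c) mod 3.
A = [[0, 1, 2],
     [1, 2, 0],
     [2, 0, 1]]
B = [[0, 1, 2],
     [2, 0, 1],
     [1, 2, 0]]

print(is_latin_square(A), is_latin_square(B), are_orthogonal(A, B))  # True True True
print(are_orthogonal(A, A))  # False: only the 3 pairs (s, s) occur
```

A square is never orthogonal to itself (for n > 1), because superimposing it on itself yields only the n "diagonal" pairs (s, s).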
Two Latin squares are said to be conjugate if the rows of one are the columns of the other; that is, interchanging the rows and columns of a square generates a conjugate square. Adjugacy generalizes the concept of conjugacy: a permutation of the constraints (row, column, symbol) of one square generates another [12]. Each cell of a Latin square can be written as a triple (r, c, s), where r is the row, c is the column and s is the symbol; the resulting set of n² triples is called the orthogonal array representation of the square. An n×n Latin square in which every row is obtained from any other by a cyclic permutation of degree n, or by a power of such a permutation, is a cyclic Latin square [9].

Such computations are usually implemented with sequential algorithms, but parallel processing techniques in high-level languages can decrease the processing time of matrix-processing applications [3]. In parallel processing the problem is divided into a discrete set of instances that are solved concurrently [1], so that two or more instructions execute at the same time. The same algorithms run more slowly when executed sequentially on a CPU. In the proposed system, sequential algorithms that exhibit task parallelism or data parallelism are implemented in OpenCL [14]. OpenCL lets us reduce the load on the CPU and makes matrix processing run faster and more efficiently, with higher throughput.

Concurrency is a way of sharing multiple resources in software; for applications that are naturally parallel, concurrency provides an abstraction [5]. When multiple threads execute in parallel, the active threads run simultaneously on different hardware resources and processing elements [10]. The platform provides this simultaneous execution of threads, and thereby achieves parallel computing.
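The cyclic construction, the (r, c, s) triple representation and conjugation defined at the start of this passage can be illustrated with a short Python sketch. It is illustrative only, not the paper's implementation: symbols are assumed to be 0..n-1, the helper names (cyclic_latin_square, to_triples, from_triples, conjugate) are hypothetical, and "conjugate" is taken in the standard sense of permuting the (row, column, symbol) roles of every triple, of which the row/column swap (the transpose) is one case.

```python
from itertools import permutations


def cyclic_latin_square(n):
    """Cyclic Latin square: row r is row 0 shifted left by r places,
    i.e. L[r][c] = (r + c) mod n."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def to_triples(square):
    """Orthogonal array representation: the set of n*n (row, column, symbol) triples."""
    return {(r, c, s) for r, row in enumerate(square) for c, s in enumerate(row)}


def from_triples(triples, n):
    """Rebuild an n x n square from (row, column, symbol) triples."""
    square = [[None] * n for _ in range(n)]
    for r, c, s in triples:
        square[r][c] = s
    return square


def conjugate(square, order):
    """Permute the roles of (row, column, symbol) in every triple.
    `order` is a permutation of (0, 1, 2); (1, 0, 2) swaps rows and
    columns, giving the transpose."""
    permuted = {tuple(t[i] for i in order) for t in to_triples(square)}
    return from_triples(permuted, len(square))


L = cyclic_latin_square(3)
print(L)                   # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
print(len(to_triples(L)))  # 9, i.e. n^2 triples

M = [[0, 1, 2], [2, 0, 1], [1, 2, 0]]  # a non-symmetric Latin square
print(conjugate(M, (1, 0, 2)))         # [[0, 2, 1], [1, 0, 2], [2, 1, 0]] (transpose)
for order in permutations(range(3)):   # the six conjugates of M
    print(order, conjugate(M, order))
```

The cyclic square L is symmetric, so its transpose equals itself; the non-symmetric square M is used to make the row/column swap visible. Every one of the six conjugates is again a Latin square.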
Modern computing machines have SIMD or MIMD capabilities that exploit either data-level or task-level parallelism [6]. Task-level parallelism handles different tasks within a single problem at the same time; the efficiency of this model depends on how independent the tasks' operations are [6]. Data-level parallelism handles discrete chunks of the same task simultaneously; the efficiency of this model depends on how independent the operations on each chunk are [6].

1.1 Existing System & Proposed System
Traditionally, all computation and instruction execution in a computer is handled by the CPU. Architecturally, a CPU consists of a few cores with a large cache memory, able to handle only a few software threads concurrently, and these cores are designed for sequential, serial processing. To relieve the CPU of the advanced computations needed to render the final display on the monitor, the GPU was introduced. With hundreds of cores able to process thousands of threads, a GPU can accelerate suitable, highly parallel software by as much as 100x over a CPU alone. GPUs have a massively parallel architecture made of thousands of smaller, more efficient cores designed to handle many tasks simultaneously, which explains their wide, mainstream acceptance today. GPU-accelerated computing has grown into a mainstream movement supported by the latest operating systems from Apple (with OpenCL) and Microsoft (with DirectCompute). Such accelerated computing uses a graphics processing unit (GPU) together with a CPU to accelerate scientific, analytics, engineering, consumer and enterprise applications.

1.2 Defining Algorithm
The parallel processing algorithm is designed for Latin squares; its representation is presented in Figure 1.
First step: the input is a matrix of order 3, given as an array in the OpenCL program. Applying a sequential algorithm to such a matrix takes more execution time on the CPU, whereas applying parallel processing concepts reduces the time taken by the matrix execution.
Second step: the input matrix is decomposed according to a task-parallelism or data-parallelism technique.
Third step: the divided matrix is used for concurrent execution by defining individual sub-matrices, which are processed by various processing elements.
Fourth step: each sub-matrix is sent simultaneously to the processing elements of the GPU.
Fifth step: the processing elements compute the sub-matrix results concurrently.
Sixth step: the results of all sub-matrices are combined in a single processing element to obtain the final output for the input matrix in reduced time.
Seventh step: the final result assembled from the sub-matrices is stored in the dynamic random access memory (DRAM) of the graphics processing unit.
Eighth step: the final result is copied from the GPU's DRAM to the CPU's DRAM and displayed to the user.

Figure 1: Flowchart of Proposed Work

2. Implementing Latin Square using OpenCL:
To parallelize the matrix over a number of work-groups, the algorithm divides the matrix into chunks according to a coarse-grained division. Each work-group handles one part of the matrix, and its sub-matrix is stored in the GPU's on-chip local memory. These partial Latin squares are then reduced into the single final Latin square.

2.1 Overview
When designing an OpenCL kernel for particular hardware, we must take care over the work-items and the work-group size. Local memory belongs to a work-group, and the data in it is shared only among the work-items of that work-group; one work-group cannot share its local-memory results with another. We store each sub-matrix in local memory to reduce memory overhead. Defining more work-groups moves more data between local and global memory, so for efficient use of local memory and lower overhead on global memory, the number of work-groups should be close to the number of compute units in the hardware. To traverse the whole input matrix, we use global IDs, local IDs, group IDs, group sizes and so on.

Our OpenCL program consists of two parts. The first is the kernel: instances of the kernel are dispatched to the different compute units and executed on each compute unit independently. The second is the host code, executed on the host (CPU), which sets up the different OpenCL models (platform, memory and execution). The host program defines the context for the kernels and manages their execution: it enqueues kernel executions and memory transfers on a command queue associated with each OpenCL device. The execution of kernels is the core of the OpenCL execution model.
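The index arithmetic of this execution model (global, local and group IDs, a per-work-group local-memory tile, results combined in global memory and copied back to the host) can be rehearsed in plain Python before writing a real kernel. This is a hedged CPU-only simulation, not OpenCL code and not the authors' kernel: the function names are hypothetical, each work-item is assumed to compute one cell of the cyclic square L[r][c] = (r + c) mod n, a thread pool stands in for the compute units, and nested lists stand in for local and global memory. The relation global ID = group ID × local size + local ID is the standard OpenCL one when no global offset is used.

```python
from concurrent.futures import ThreadPoolExecutor


def latin_cell_kernel(gx, gy, n):
    """Hypothetical kernel body: one work-item computes one cell of the
    cyclic Latin square from its global IDs alone (data parallelism)."""
    return (gx + gy) % n


def run_work_group(group_id, local_size, n, global_mem):
    """Simulate one work-group: its work-items fill a 'local memory' tile,
    which is then written back to 'global memory'."""
    (grp_y, grp_x), (ly_size, lx_size) = group_id, local_size
    local_mem = [[0] * lx_size for _ in range(ly_size)]  # per-group tile
    for ly in range(ly_size):          # work-items (run one after another here)
        for lx in range(lx_size):
            # Global ID = group ID * local size + local ID (no global offset).
            gy = grp_y * ly_size + ly
            gx = grp_x * lx_size + lx
            local_mem[ly][lx] = latin_cell_kernel(gx, gy, n)
    # A real kernel whose work-items read each other's local-memory entries
    # would need barrier(CLK_LOCAL_MEM_FENCE) before this copy-out.
    for ly in range(ly_size):
        for lx in range(lx_size):
            global_mem[grp_y * ly_size + ly][grp_x * lx_size + lx] = local_mem[ly][lx]


def latin_square_ndrange(n, local_size):
    """Build the n x n cyclic Latin square over a simulated 2-D NDRange."""
    ly_size, lx_size = local_size
    if n % ly_size or n % lx_size:
        raise ValueError("local size must divide the global size n")
    global_mem = [[None] * n for _ in range(n)]  # stands in for GPU DRAM
    groups = [(y, x) for y in range(n // ly_size) for x in range(n // lx_size)]
    with ThreadPoolExecutor() as pool:           # work-groups run concurrently
        list(pool.map(lambda g: run_work_group(g, local_size, n, global_mem), groups))
    return [row[:] for row in global_mem]        # "copy back" to the host


square = latin_square_ndrange(3, local_size=(1, 3))  # 3 work-groups, one row each
print(square)  # [[0, 1, 2], [1, 2, 0], [2, 0, 1]]
```

Choosing local_size=(1, 3) mirrors a coarse-grained, row-by-row division of an order-3 matrix; for a larger order, latin_square_ndrange(6, (2, 3)) uses six work-groups that each fill a 2 × 3 tile. In an actual OpenCL C kernel the IDs would come from get_global_id(), get_local_id() and get_group_id() rather than being computed by hand.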
When a kernel is executed, an index space is defined, and each point in that index space executes one instance of the kernel (a work-item) for our problem. Every work-item executes the same code, but the execution path and the data it uses may differ. Work-items are organized into work-groups. Each work-group is assigned a unique work-group ID, and each work-item in a work-group is assigned a local ID. A single work-item can therefore be identified either by its global ID or by the combination of its work-group ID and local ID. The work-items in a single work-group execute concurrently on the processing elements of a single compute unit, and they can synchronize with one another and share data through that compute unit's local memory. All work-items have read and write access to any location in global memory, and global and constant memory can also be accessed by the host processor before and after kernel execution.

3. Experiments and results:
For each Latin square, both a GPU kernel and a serial CPU implementation were written. The host processing takes place on the CPU, and the kernel code of the algorithm is copied to the GPU as instances. The kernel is executed on the compute units, after which the results are copied back to the CPU.

Figure 2: Latin Square Computation (Order-3), order vs. time graph using the procedural programming code technique, having more time complexity.
Figure 3: Latin Square Computation (Order-3), time vs. order graph using OpenCL code, having reduced time complexity.

Computing the Latin square on the GPU significantly improves computing speed, reducing execution time and increasing throughput. The graphs show that with the procedural programming technique the execution time increases as the order of the Latin square increases, whereas when the same computation is implemented using OpenCL the execution time is much lower.

4. Conclusions
In this paper we attempted to develop a framework, an efficient algorithmic approach for computing different Latin squares, to improve on the existing approach. We implemented the order-3 Latin square problem on a GPGPU (GP2U), which provides a heterogeneous execution environment and the capability to reduce execution time, improving the performance of Latin square computation in computation-intensive domains. We executed the Latin square computation in an OpenCL environment and compared it with the sequential implementation on the CPU: the CPU algorithm takes a very large amount of time, while the time taken on the GPU device is much lower. The approach provides a novel and efficient acceleration technique for matrix calculation that is cheap to implement in hardware. Future work is to gain a deeper understanding of parallelization techniques, to make the best use of the GPU device, and to apply them to other algorithms related to this project.

5. References
[1] Benedict Gaster. Heterogeneous Computing with OpenCL. British Library; printed in the USA.
[2] Khronos Group. http://www.khronos.org/OpenCL
[3] Roberto Fontana (2013). Random Latin squares and Sudoku designs generation.
[4] Nan Zhang, Yun-shan Chen, Jian-li Wang (2010). Image parallel processing based on GPU. International Conference on Advanced Computer Control, March 2010.
[5] P. M. Pardalos and J. Xue (1994). The maximum clique problem. Journal of Global Optimization 4, 301-328.
[6] Demetres Christofides and Klas Markstrom (2003). Random Latin square graphs.
[7] University Kit 1.0. http://developer.amd.com/pages/default.aspx
[8] C. Colbourn (1984). The complexity of completing partial Latin squares. Discrete Applied Mathematics 8: 25-30. doi:10.1016/0166-218X(84)90075-1
[9] J. Denes and A. Keedwell (1991). Latin Squares: New Developments in the Theory and Applications. North-Holland.
[10] AMD Accelerated Parallel Processing OpenCL Programming Guide (PDF).
[11] M. T. Jacobson and P. Matthews (1996). Generating uniformly distributed random Latin squares. Journal of Combinatorial Designs 4(6), 405-406.
[12] Brendan D. McKay and Ian M. Wanless (2000). On the number of Latin squares. Australian National University, Canberra, ACT 0200, Australia.
[13] J. A. Bate and G. H. J. van Rees. The size of the smallest strong critical set in a Latin square. University of Manitoba, Winnipeg, Manitoba.