Improving System Performance in Homogeneous Multicore Systems
Abstract. Allocation of parallel load in multicore systems has become a challenging task for high performance computing systems. Several parameters, such as task imbalance and execution time, are used to evaluate the performance of a scheduling algorithm. This paper proposes a task scheduling approach that targets multiple cores connected through an appropriate interconnection network. The proposed approach utilizes the computing resources effectively by assigning tasks dynamically among the different cores of the system in a realistic time. Each task has its own timeline, and multiple sequences of tasks are mapped onto different cores of the system. In particular, performance is evaluated on n x n Mesh, DMesh, ZMesh and Torus networks. Load imbalance and execution time are considered as metrics to evaluate the performance of the proposed algorithm. Simulation results are obtained and compared with the well-known minimum distance scheduling algorithm, showing a reduction in execution time while maintaining the load imbalance. An improvement of 20-30% is obtained in load imbalance for the considered multicore systems, with improved execution time. The simulation study reveals that the proposed algorithm is best suited to take advantage of the architectural benefits of mesh-based multicore systems.

Keywords: Multicore, Scheduling Algorithm, Load Imbalance, Execution Time.

1 Introduction

Multicore systems are found in a variety of computing systems, from high-performance servers to special-purpose embedded systems. Industrial applications are utilizing embedded systems with more cores in processors [1]. The performance of these systems depends upon how extensively parallelism is exploited among the different cores in the system. To address the problem of parallelism in a multicore system, the load is partitioned into small independent tasks that are mapped onto the available cores of the system. The problem of efficiently allocating a group of tasks for parallel execution in multicore systems has drawn the attention of researchers.

Designing an efficient communication network and applying an efficient scheduling algorithm to utilize computing resources are critical for achieving high performance in multiprocessor multicore systems. There are a number of studies related to simulation on mesh-based topologies that emphasize modeling and analyzing the on-chip interconnect [2] [3] [4]. Metrics such as packet delay and load imbalance factor have been used as a function of the communication load, speedup and utilization factor. Some networks are designed specifically for customized applications in order to achieve better performance. The main objective behind customization is to fit the requirements of specific applications under certain conditions [5]. However, a generalized task-based programming model is an inevitable solution for multicore architectures.

In this paper, we explore the interplay between architectures and algorithm design in the context of dynamic task allocation. A dynamic scheduling algorithm is designed and evaluated by mapping tasks onto a number of mesh-based multicore architectures. The proposed approach is based on the standard minimum distance scheduling approach, which has been used extensively for conventional parallel systems in a variety of ways [6]. For better analysis of the results, different data sets are applied to similar architectures for the performance evaluation of the proposed algorithm.

The rest of the paper is organized as follows. In section 2, various approaches related to the scheduling of tasks on homogeneous/heterogeneous multicore systems are presented. Section 3 describes the problem formation and the target systems considered for study. The proposed algorithm is explained in section 4. Based on the experimental results, the performance evaluation is carried out and presented in section 5. Concluding remarks are presented in section 6.

2 Related Work

A programming model schedules tasks dynamically according to the availability of computing resources. Mapping ready-to-execute tasks to different cores of the system requires a criticality-aware task scheduler [7]. The efficient scheduling problem has been extensively studied for asymmetric multicore systems. Some approaches are based on dividing the tasks into groups of critical and non-critical tasks and mapping each group to one core type. In this method, deciding which task is critical is a major issue [8]. Task prioritization is another
DOI: 10.37398/JSR.2022.660207
Journal of Scientific Research, Volume 66, Issue 2, 2022
approach, which assigns priority to different tasks based on information discoverable at run-time [9].

A number of programming models have been developed for high-performance computing, such as task parallelism [10] and data parallelism (for example, OpenMP loops [11]), to exploit parallelism in multicore system architectures. These models support both inter-task and intra-task parallelism. In general, a sequence of tasks is mapped as a group of parallel subtasks that are allowed to execute in parallel on multiple cores. The directed acyclic graph (DAG) is one of the most widely used parallel task models for multicore architectures [12]. A DAG consists of directed edges between a set of nodes, in which each node is a sequential subtask that may execute on any core in the order imposed by the directed edges. Subtasks are allowed to execute on different cores, which can significantly improve resource utilization. On a multicore system, meeting the deadlines of parallel tasks is more complex because of the possible interleaving of threads across the cores. Therefore, achieving full speedup poses a great challenge: maximizing the utilization of parallel multicore architectures while meeting the deadlines of the applications.

List scheduling has been used in a variety of ways to obtain optimal/sub-optimal solutions [13]. List scheduling works by assigning priorities to the tasks of a DAG and arranging the tasks in a list sorted in descending order of priority. The task with the highest priority is executed first. The algorithm performs better with a small heterogeneity factor for randomly generated applications. However, to reduce task execution time, a duplication approach that identifies heavily communicating tasks is applied.

In a heterogeneous computing system, the cost of executing a task may vary from one core to another. The priority of a task is not fixed; rather, it changes when the task is migrated between different cores. To handle this problem, the Heterogeneous Earliest Finish Time (HEFT) scheduler [14] and the Heterogeneity through Limited Duplicated (HLD) approach [15] are used to obtain a single computation cost for a task. However, the performance of these algorithms is limited by significant variations in the execution makespan.

System performance can also be improved by non-contiguous allocation of parallel jobs in multicomputer systems [16]. In this approach, the author claimed better performance in terms of execution time for different traffic patterns, particularly with a uniform-decreasing job size distribution. The algorithm, however, was not tested on Torus-type architectures.

3 Problem Formation and Target Systems

3.1 Task Scheduling Model

[...] active load or when the core becomes idle. At a particular point of time the system manages a uniform distribution of tasks. Resource utilization and uniform allocation of tasks are carried out dynamically and in parallel among the available cores of the system. If the tasks in an application are unbalanced, the overloaded and underloaded cores are identified and task migration takes place until the system obtains an even distribution of tasks. Therefore, in applications on wide-range graphs such as ZMesh and higher-level meshes having a large number of cores, or with a large volume of tasks, the task scheduler reconfigures the tasks dynamically based on the values of the ideal load and the load imbalance factor (LIF).

The minimum distance scheduling (MDS) algorithm is considered suitable for parallel interconnection networks in traditional parallel systems [17]. The algorithm relies on the minimum distance property, in which tasks may be migrated only between adjacent cores. This restriction is intended to reduce the makespan and the complexity of the scheduling algorithm. Several variations of MDS have been proposed and found suitable for particular classes of architectures. The performance of these algorithms has not been studied for multicore systems. The proposed algorithm is an effort to extend the concept of the minimum distance property, with some alterations, and is tested on the considered multicore systems.

3.2 The Target Architectures

To evaluate the performance of the proposed scheduling algorithm, the topology of the target system is modeled as an undirected graph G (Ci, Ei), where C is a finite set of cores/vertices and E is a finite set of connecting edges. A vertex Ci represents processor core i, and Ei represents a bidirectional communication link between adjacent cores. The resource graph is a complete graph consisting of n fully connected cores. We assume contention-free communication between cores.

For the purpose of simulation, four similar topologies, namely Mesh, DMesh, ZMesh and Torus networks, are considered [18]. The system consists of a set of homogeneous cores, and all the considered topologies are modeled as 4 x 4 networks, as shown in Fig. 1. Task-to-core assignment is identical in all the considered topologies.

(a) 4 x 4 Mesh network (b) 4 x 4 DMesh network
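
As a rough illustration of the undirected-graph model described in Section 3.2, the following Python sketch builds adjacency sets for a 4 x 4 Mesh and a 4 x 4 Torus. It is not taken from the paper: the function names `grid_adjacency` and `edge_count` are hypothetical, cores are assumed to be numbered in row-major order, and the Torus is assumed to be the Mesh plus wrap-around links in each row and column. DMesh and ZMesh are omitted because their additional links are not specified in this section.

```python
from itertools import product


def grid_adjacency(n, wrap=False):
    """Return {core: set of adjacent cores} for an n x n grid.

    Assumptions (not from the paper): cores are numbered row-major,
    and wrap=True adds torus-style wrap-around links.
    """
    adjacency = {}
    for row, col in product(range(n), repeat=2):
        neighbours = set()
        for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            r, c = row + d_row, col + d_col
            if wrap:
                # Torus: indices wrap around the grid edges.
                r, c = r % n, c % n
            if 0 <= r < n and 0 <= c < n and (r, c) != (row, col):
                neighbours.add(r * n + c)
        adjacency[row * n + col] = neighbours
    return adjacency


def edge_count(adjacency):
    """Count bidirectional links; each link appears in two adjacency sets."""
    return sum(len(nbrs) for nbrs in adjacency.values()) // 2


mesh = grid_adjacency(4)
torus = grid_adjacency(4, wrap=True)

print(edge_count(mesh), edge_count(torus))  # 24 32
print(sorted(mesh[0]), sorted(torus[0]))    # [1, 4] [1, 3, 4, 12]
```

Under these assumptions, a corner core of the Mesh has only two neighbours while every Torus core has four, so a scheme that migrates tasks only between adjacent cores has more migration targets per core on the Torus.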
Institute of Science, BHU Varanasi, India