Bulletin of Electrical Engineering and Informatics
Vol. 10, No. 6, December 2021, pp. 3393~3402
ISSN: 2302-9285, DOI: 10.11591/eei.v10i6.3135
Journal homepage: http://beei.org
An effective classification approach for big data with parallel
generalized Hebbian algorithm
Ahmed Hussein Ali1, Royida A. Ibrahem Alhayali2, Mostafa Abdulghafoor Mohammed3, Tole Sutikno4
1 ICCI, Informatics Institute for Postgraduate Studies, Baghdad, Iraq
1,2 Department of Computer, College of Education, AL-Iraqia University, Iraq
1 Department of Computer Science, AL Salam University College, Iraq
2 Department of Computer Engineering, College of Engineering, University of Diyala, Diyala, Iraq
3 Imam Aadham University College, Iraq
4 Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
Article Info
Article history:
Received Jun 30, 2021
Revised Aug 19, 2021
Accepted Oct 31, 2021
Keywords:
Big data
Generalized Hebbian algorithm
Machine learning
Neural network
Principal component analysis
Spark Radoop
ABSTRACT
Advancements in information technology are contributing to the rapid generation of big data. Big data refers to datasets that are huge in volume and consume much time and space to process and transmit using the available resources. Big data also covers data in both structured and unstructured formats. Many agencies are currently supporting research on big data analytics because existing data processing techniques cannot keep up with the rate at which big data is generated. This paper presents an efficient classification and reduction technique for big data based on the parallel generalized Hebbian algorithm (GHA), which is one of the commonly used principal component analysis (PCA) neural network (NN) learning algorithms. The proposed method was compared with existing methods to demonstrate its capability to reduce the dimensionality of big data. It is implemented on the Spark Radoop platform.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Ahmed Hussein Ali
Department of Computer Science, AL Salam University College
119 Baghdad, Taji, Iraq
Email: msc.ahmed.h.ali@gmail.com
1. INTRODUCTION
The challenge of big data lies in its size and in the rate at which it is generated from multiple sources (both machines and humans) [1]-[13]. Big data takes many forms, such as web and social media data, business transaction data, machine-to-machine data, and biometric data [14]-[39]. It is not merely a large database; it is often unstructured, and its volume is increasing in all domains. High-dimensional input data streams are central to most information processing tasks, such as communication and pattern recognition, and reducing their dimensionality helps to remove noise and redundancy so that useful information can be extracted from the input signals. Consequently, processing, transmitting, and storing information in both software and hardware become easier once the data dimensionality is reduced. One of the common feature extraction methods is principal component analysis (PCA), which extracts useful information by identifying patterns in the input space. PCA mainly aims to obtain an accurate data representation with the redundant components removed [40]-[47].
The PCA transform [48], [49], also called the Karhunen-Loeve transform (KLT) [50]-[53], is mainly used for tasks such as pattern recognition, data compression, and classification. Despite the wide application of numerous PCA-based algorithms [14], [54], most are not suitable for real-time applications because of their high computational complexity on high-dimensional feature vectors. A number of algorithms can improve the computational speed of PCA, but the problem persists: most of these algorithms are implemented only in software and achieve moderate performance. PCA and its variants can also be implemented in hardware, but this requires substantial resources and complex circuit control systems, so hardware implementation is considered only for small dimensions. A PCA neural network (NN) is another alternative for PCA implementation [55]. This is done using the generalized Hebbian algorithm (GHA) [56], but the GHA converges slowly and therefore needs many iterations, which prolongs the computational time of most GHA-based algorithms. Data reduction techniques save bandwidth and time by enabling users to process large datasets with minimal resources. Because data mining involves large amounts of data and aims to retrieve important information from them, data reduction has become mandatory. Data size reduction remains a persistent problem, however, because most straightforward techniques work only on small data and not on big data. Hence, the software design stage is a crucial phase in building data reduction algorithms for big data processing.
Recent works on parallel dimensionality reduction (DR) for big data are reviewed in this section. The study in [57] presented a hybrid PGO-SVM model that combined SVM with PGO to improve classification accuracy even with a small number of feature subsets. The proposed PGO-SVM was implemented with Spark Radoop, with the data points stored in distributed form on the Hadoop distributed file system (HDFS). Scala was used as the programming language, and the Covtype and Higgs datasets were analyzed. On large datasets, the model classified more efficiently and ran faster than the benchmark method. Another study presented a fast HP-PL model [58] as a new way of improving DR and classification accuracy. The system was implemented on Apache Spark and was capable of selecting the best features within the shortest computational time. Although the level of improvement depends on the data features, the system performed well with respect to the number of evaluated nodes for the tested datasets. The iterated PCA (IPCA) method [59] has been proposed for fault detection in a continuously stirred tank reactor (CSTR) model. The IPCA relies on the GHA to address memory complexity problems. The fault detection problem is addressed in this way to facilitate online, recursive computation of the principal components. The GHA was used to define a function that merges all the major factors affecting the fault detection capability of the developed model. Song et al. proposed the TOC-based PCA algorithm [60], which exploits the advantages of optical computing in big data computation to overcome the limitations of the PCA algorithm on electronic computers. The parallel operation of the system greatly improved its efficiency. Another paper by Jian et al. [59] investigated the convergence of the GHA using the DDT method and demonstrated that the GHA has adaptive learning rates that do not approach zero. Because these adaptive learning rates converge to non-zero constants, it is simple to handle computational roundoff constraints and satisfy the tracking requirements of real applications. As a generalization of the Hebbian learning paradigm, Eraldo and colleagues [60] proposed a new adaptation strategy for linear neural networks. This paper presents an efficient classification and reduction technique for big data based on the parallel generalized Hebbian algorithm (GHA), implemented on the Spark Radoop platform. The proposed method is compared with existing methods to demonstrate its capability to reduce the dimensionality of big data.
This paper is organized as follows: Apache Spark Radoop is presented in section 2. The principle of the GHA and the proposed method are presented in section 3. The materials and methods used in this work are presented in section 4, the results and discussion in section 5, and section 6 presents the conclusion and possible future work.
2. APACHE SPARK RADOOP
RapidMiner [74] is a data mining application, and RapidMiner Radoop was developed as an extension of its in-memory functionality that provides sophisticated operators for in-Hadoop execution [61]-[73]. More than 60 operators are available for data transformation in Radoop [61], and it is also capable of advanced and predictive modeling on Hadoop clusters in a distributed manner. Radoop relies on RapidMiner Studio's visual workflow designer to make the creation, implementation, and maintenance of predictive analytics in Hadoop as simple as possible. Because of Radoop's code-free environment [62], [74] and built-in intelligence, the intricacies of the system are kept to a minimum, allowing the operator to concentrate on business challenges rather than on technical concerns. Predictive analytics on terabytes (TBs) and petabytes (PBs) of data is therefore effective and scalable: the workflow execution is handled by Radoop rather than the user, and all computations are executed in the Hadoop cluster that holds the data. Radoop thus ensures that Hadoop and RapidMiner work together seamlessly, and it simplifies the process of preparing data for machine learning on Hadoop and Spark Radoop (refer to Figure 1). Throughout RapidMiner Studio, all parallel operations and data processes are implemented on the SparkRM platform within the Hadoop cluster so that Apache Spark can be used for task execution, which broadens the applicability of the tool and enables stronger algorithms. Hive and Mahout provide well-optimized data analytics routines and were therefore also used in this study. Figure 1 depicts the overall framework for the integration of Hadoop into RapidMiner. In this study, an extension was developed that connects closely with Hadoop while providing the same features as the memory-based RapidMiner operations. The first step in creating a Radoop process is to include the RadoopNest meta-operator, which contains the basic cluster parameters and serves as the foundation for the remaining operators.
Figure 1. Spark Radoop architecture
Figure 2. The neural model for the GHA
3. THE PROPOSED GENERALIZED HEBBIAN ALGORITHM
The GHA [75] is a linear feedforward NN model that is well suited to unsupervised learning and is often used for PCA. It is computationally efficient because it solves the eigenvalue problem iteratively, which eliminates the need to compute the covariance matrix and solve the eigenvalue problem directly. The GHA was therefore developed as a solution to memory complexity difficulties, particularly for large-scale datasets; its neural model is shown in Figure 2. Besides being memory efficient, the GHA is flexible and can adapt to time-varying distributions. Principal component analysis [75]-[81] is regarded as an attribute reduction method that is beneficial when the data is derived from numerous attributes and contains some redundancy. Redundancy here means that some attributes are correlated with one another, most likely because they measure the same concept. Because of this redundancy, the observed attributes can be reduced to a smaller number of principal components (PCs), each of which represents part of the variance in the observed attributes. The PCA method uses an orthogonal transformation to convert a set of data with correlated attributes into a set of values referred to as principal components (uncorrelated attributes) [82]-[87]. The number of PCs is less than or equal to the number of original attributes. The transformation is defined so that the first PC has the highest possible variance and accounts for the largest share of the data variability, and each succeeding component has the highest possible variance under the condition that it is orthogonal to the preceding PCs [88].
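For reference, the weight change used by the GHA is commonly written in the following form, often called Sanger's rule. This is a sketch of the textbook rule, not a formula taken from this paper: the index convention (w_ij connects input i to output neuron j), the learning rate η, the number of inputs n, and the number of outputs m are notation assumed here.

```latex
y_j(p) = \sum_{i=1}^{n} w_{ij}(p)\, x_i(p), \qquad j = 1, \dots, m

\Delta w_{ij}(p) = \eta\, y_j(p) \left[ x_i(p) - \sum_{k=1}^{j} w_{ik}(p)\, y_k(p) \right]

w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p)
```

For j = 1 this reduces to Oja's rule for the first principal component. The sum over k ≤ j removes the contribution of the components already extracted, which is what drives successive weight vectors toward mutually orthogonal directions.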
This section points out some of the considerations for implementing the proposed algorithm with Radoop. Several steps are involved in the parallelization process. A virtual machine cluster was used for tuning and testing the experimental conditions, and the experiments were carried out on three different supercomputers. Because this work targets big data, the dataset is likely to contain a huge number of transactions. The large transactional datasets are therefore kept in the HDFS, with the data partitions distributed across the cluster nodes, and the Spark engine executes jobs on the partitions in parallel. We generated and processed a collection of resilient distributed datasets (RDDs) in order to construct the set of frequent l-itemsets, which were then arranged in descending order. The proposed parallel GHA (PGHA) is applicable to big data streaming with classification methods. It can reduce any dataset stored as HDFS files and handles datasets with numeric features. Figure 3 presents the overall proposed algorithm for data reduction, which can be implemented as Map and Reduce functions.
Proposed Parallel GHA
Input: S (dense array)
Output: T (reduced array)
1 Begin
2 Start the Spark context (slave)
3 Listen for the master connection
4 Receive a dense array of data
5 Check the number of columns M
6 Parallelize the data with Spark (master)
7 Collect N rows of data
8 Do in parallel
9 Set the initial synaptic weights wij and thresholds θj to small random values, e.g., in the interval [0, 1]. Assign small positive values to the learning rate parameter and the forgetting factor.
10 Calculate the output of the neuron at iteration p.
11 Update the weights in the network: wij(p + 1) = wij(p) + ∆wij(p), // i, j = 1, 2, ..., n; then repeat from step 10 for the next iteration
12 Send the reduced array T
13 Close the connection
14 End
Figure 3. Overall proposed algorithm for data reduction
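The Map and Reduce structure of Figure 3 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not say how the per-partition results are combined, so averaging the weight matrices is an assumption; the thresholds and forgetting factor are omitted; an ordinary loop stands in for the Spark workers; the input rows are assumed to be zero-mean; and the initial weights are drawn from [0, 0.1) rather than [0, 1] to keep the first updates small. All function names are hypothetical.

```python
import random


def gha_step(W, x, eta):
    """One GHA (Sanger's rule) update of W for a single input row x.

    W is a list of k weight vectors, one per extracted component.
    """
    y = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    residual = list(x)
    new_W = []
    for j, w in enumerate(W):
        # Remove the reconstruction from components 1..j before updating w_j.
        residual = [r - y[j] * w_i for r, w_i in zip(residual, w)]
        new_W.append([w_i + eta * y[j] * r for w_i, r in zip(w, residual)])
    return new_W


def gha_partition(W, rows, eta):
    """Map step: run GHA over one partition, starting from the shared W."""
    for x in rows:
        W = gha_step(W, x, eta)
    return W


def average_weights(weight_sets):
    """Reduce step (assumed): element-wise mean of the partition results."""
    n = len(weight_sets)
    k, m = len(weight_sets[0]), len(weight_sets[0][0])
    return [[sum(ws[j][i] for ws in weight_sets) / n for i in range(m)]
            for j in range(k)]


def parallel_gha(data, k, n_partitions=3, epochs=10, eta=0.005, seed=0):
    """Learn k components from zero-mean rows split into partitions."""
    rng = random.Random(seed)
    m = len(data[0])
    W = [[0.1 * rng.random() for _ in range(m)] for _ in range(k)]
    partitions = [data[p::n_partitions] for p in range(n_partitions)]
    for _ in range(epochs):
        # In Spark, this list comprehension would be a map over partitions.
        W = average_weights([gha_partition(W, part, eta) for part in partitions])
    return W


def reduce_rows(W, data):
    """Project each row onto the learned components (the reduced array T)."""
    return [[sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W] for x in data]


def make_demo_data(n, seed=1):
    """Synthetic zero-mean rows with most variance along (0.6, 0.8, 0)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a, b = rng.gauss(0, 2.0), rng.gauss(0, 0.7)
        rows.append([0.6 * a - 0.8 * b + rng.gauss(0, 0.2),
                     0.8 * a + 0.6 * b + rng.gauss(0, 0.2),
                     rng.gauss(0, 0.2)])
    return rows
```

A call such as `reduce_rows(parallel_gha(data, k=2), data)` then maps each M-column row to two columns. Starting every partition from the same broadcast weights keeps the signs of the learned components consistent, which is why simple averaging is usable in this sketch.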
4. MATERIAL AND METHOD
A number of supervised classification approaches were considered in this study, including Naïve Bayes, K-Nearest Neighbours, NN, and Random Forest. Table 1 describes the system, and Table 2 describes the six datasets used in the study, all of which were obtained from the UCI ML repository. The results are presented as the computation times of parallel GHA and parallel PCA on identical hardware. For the six large datasets analyzed in this study, the performance of Apache Spark and MLlib 2.0 was compared. The experiments were run on a Spark cluster with Apache Zeppelin 0.7.1 and an HDFS. The Spark cluster is made up of four nodes, a master node that executes the driver application and three worker nodes, together with a cluster manager. The three worker nodes were configured in a manner similar to that shown in Figure 1. The three worker nodes were each given a memory allocation of 48 GB and were configured with four executors (each with a memory allocation of 4 GB) and two cores. Each worker was allotted three executors (each with a memory size of 5 GB), while the master node was allotted two cores. A total of 16 GB of RAM was assigned to the driver process. Scala 2.11.8 was used as the programming language for MLlib execution in the Spark 2.2.1 cluster, with Hadoop 2.7.3 serving as the distributed storage system. The amount of memory available to the executors in each worker node was tuned to obtain the best possible performance.
Table 1. Description of the system
Operating system          Windows 10
CPU                       Intel® Core™ i7-6700 processor running at 3.40 GHz with eight cores
Memory                    16 GB
No. of workers            3
Computational framework   Apache Spark 2.2.1
Compatible framework      Radoop
DSS                       HDFS (Hadoop 2.7.3)
Code development editor   Apache Zeppelin 0.7.1
Coding language           Scala 2.11.8
Table 2. Datasets description
Data             No. of records   No. of attributes   No. of classes
Covtype          581,012          54                  7
Covtype-2        581,012          54                  2
Higgs            11,000,000       28                  2
Botnet Attacks   7,062,606        115                 10
Dota2            102,944          116                 2
SUSY             5,000,000        18                  2
5. RESULTS AND DISCUSSION
The initial step of the classical GHA is to load the whole dataset into memory, so the data size must fit within the memory of the computer. Memoryup is a performance metric that assesses a parallel clustering algorithm's ability to utilize the memory available on each node efficiently. It is computed by changing the memory size of each node while keeping the dataset and the number of nodes the same. The new GHA approach instead scans the data by rows, so it can still be applied when the data exceeds the computer's memory. In large datasets, a significant amount of CPU time is frequently lost to the unnecessary processing of redundant and non-representative data. Deleting such data can significantly increase processing performance, and it also makes the storage and transmission of huge datasets less difficult. The computational advantages of the proposed system were evaluated using numerical examples. The computation was performed on a third-generation Intel Core i7 2.8 GHz processor with 16 GB of DDR3 memory. All the algorithms in the proposed big data GHA were programmed in C++. The comparison assesses the speed of the algorithm by examining the running time of the PGHA using Radoop against that of parallel PCA. In this example, the support degree varies while the number of computing nodes remains at 3. The runtimes with different support degrees for the Covtype, Covtype-2, Higgs, Botnet Attacks, Dota2, and SUSY datasets are depicted in Figures 4(a)-(f).
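The idea of scanning the data by rows can be illustrated with a short Python sketch that updates the GHA weights one row at a time, so that only the current row and the weight matrix are held in memory. This is an assumption-laden illustration rather than the authors' code: the CSV input format, the function names, the default parameters, and the initial weight range [0, 0.1) are all hypothetical, and the rows are assumed to be numeric and zero-mean.

```python
import csv
import io
import random


def gha_step(W, x, eta):
    """One GHA (Sanger's rule) update of W for a single input row x."""
    y = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    residual = list(x)
    new_W = []
    for j, w in enumerate(W):
        # Remove the reconstruction from components 1..j before updating w_j.
        residual = [r - y[j] * w_i for r, w_i in zip(residual, w)]
        new_W.append([w_i + eta * y[j] * r for w_i, r in zip(w, residual)])
    return new_W


def iter_csv_rows(open_text):
    """Yield one numeric row at a time; open_text() returns a fresh stream."""
    with open_text() as handle:
        for record in csv.reader(handle):
            if record:
                yield [float(value) for value in record]


def stream_gha(open_text, k, eta=0.005, passes=5, seed=0):
    """Learn k components while holding only one row in memory at a time."""
    rng = random.Random(seed)
    W = None
    for _ in range(passes):
        for x in iter_csv_rows(open_text):
            if W is None:
                # Size the weight matrix from the first row that is read.
                W = [[0.1 * rng.random() for _ in x] for _ in range(k)]
            W = gha_step(W, x, eta)
    return W
```

With a file on disk, a call might look like `stream_gha(lambda: open("data.csv", newline=""), k=2)`, where `data.csv` is a hypothetical file name. Each pass rereads the file from the start, trading extra I/O for a memory footprint that does not grow with the number of rows.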
The algorithms are shown on the x-axis, and the running time is shown on the y-axis. As the graphs show, both techniques become more efficient as the support degree increases. Note that our approach is faster than parallel PCA on all datasets, which is a significant advantage. The performance of the proposed method was evaluated on a single processor because, if a parallel algorithm is used, its performance may be overshadowed by that of the other algorithms. Execution times and speed-up ratios are depicted in Figure 4 in relation to the number of objects in the datasets for different numbers of processors.
Figure 4. Running time of PGHA using Radoop compared to parallel PCA under different classification experiments (Naïve Bayes, NN, K-Nearest Neighbours, and Random Forest): (a) Covtype, (b) Covtype-2, (c) Higgs, (d) Botnet, (e) Dota2, (f) SUSY datasets
The analysis of the Covtype dataset is possible only when the number of processors in our computer cluster is equal to or greater than eight. When dealing with large amounts of data, the advantages of a distributed memory system are readily apparent. The experimental results demonstrate that this parallel reduction method has superior speed-up and linear scaling behavior (time complexity), and that it can overcome space complexity limits by using the aggregate memory of the reduction system. The performance evaluation of the new approach was based on the method of inducing the base classifier. The experiments showed that the PGHA, as a data reduction tool, reduced the run time compared to the parallel PCA algorithm, as shown in Figure 4. Moreover, our partial reduction method outperformed full reduction methods in many real-world data analysis and data reduction applications.
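The paper does not define how its speed-up ratios were computed. Under the conventional definitions (speed-up as serial time divided by parallel time, and efficiency as speed-up per worker), they could be calculated as in the sketch below. The function names are hypothetical, and the timings in the usage comment are illustrative placeholders, not measurements from this study.

```python
def speedup(t_serial, t_parallel):
    """Conventional speed-up: serial running time over parallel running time."""
    if t_parallel <= 0:
        raise ValueError("parallel time must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Speed-up per worker; a value of 1.0 corresponds to linear scaling."""
    return speedup(t_serial, t_parallel) / n_workers


# Illustrative placeholder timings in seconds (not results from the paper):
# speedup(90.0, 36.0) gives 2.5, and with 3 workers the efficiency is 2.5 / 3.
```

An efficiency close to 1.0 across increasing worker counts is what "linear scaling behavior" would look like under these definitions.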
6. CONCLUSION
This study introduced the concept of parallel computing and parallel dimensionality reduction (DR) algorithms. It proposed a parallel algorithm based on a classical DR algorithm to handle the issues encountered in big data mining effectively. The proposed framework builds on previous studies and aims to reduce the high volume of input data features while retaining the relevant information. To achieve this aim, both the GHA and the proposed parallel algorithm were used to improve DR and reduce feature complexity. The evaluation results showed that the GHA was better at reducing the redundant features of datasets. Future studies will focus on combining the proposed parallel GHA with other ML methods, as well as on improving performance on more recent datasets using evolutionary optimization techniques.
ACKNOWLEDGEMENTS
The authors would like to thank ICCI, Informatics Institute for Postgraduate Studies, Iraqia University, and Al Salam University College for their facilities, support, and cooperation during this research, and Universitas Ahmad Dahlan for supporting this collaborative research. Special thanks go to the anonymous reviewers for their valuable suggestions and constructive comments.
REFERENCES
[1] T. H. Davenport, P. Barth and R. Bean, “How Big Data Is Different,” MIT Sloan Manag. Rev., vol. 54, no. 1, pp.
22–24, 2012.
[2] V. Chang, “An ethical framework for big data and smart cities,” Technological Forecasting and Social Change,
vol. 165, 2021.
[3] N. A. N. M. Idros, H. Mohamed and R. Jenal, “The use of expert review in component development for customer
satisfaction towards E-hailing,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 17, no. 1. pp. 347–356, 2019, doi: 10.11591/ijeecs.v17.i1.pp347-356.
[4] K. Anam, C. Avian and M. Nuh, “Multilayer extreme learning machine for hand movement prediction based on
electroencephalography,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp. 2404–2410,
2020, doi: 10.11591/eei.v9i6.2626.
[5] M. N. F. Jamaluddin, A. Ismail, A. A. Rashid and T. T. O. Takleh, “Performance comparison of Java based parallel
programming models,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 16, no. 3. pp.
1577–1583, 2019, doi: 10.11591/ijeecs.v16.i3.pp1577-1583.
[6] M. B. Swidan, A. A. Alwan, S. Turaev and Y. Gulzar, “A model for processing skyline queries in crowd-sourced
databases,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 2. pp. 798–
806, 2018, doi: 10.11591/ijeecs.v10.i2.pp798-806.
[7] Mustakim, N. K. Sari, Jasril, I. Kusumanto and N. G. I. Reza, “Eigenvalue of analytic hierarchy process as the
determinant for class target on classification algorithm,” Indonesian Journal of Electrical Engineering and
Computer Science (IJEECS), vol. 12, no. 3. pp. 1257–1264, 2018, doi: 10.11591/ijeecs.v12.i3.pp1257-1264.
[8] S. Berhil, H. Benlahmar and N. Labani, “A review paper on artificial intelligence at the service of human resources
management,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1. pp.
32–40, 2019, doi: 10.11591/ijeecs.v18.i1.pp32-40.
[9] W. A. Jbara, “Ear biometric verification approach based on morphological and geometric invariants,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3. pp. 1479–1484, 2020, doi:
10.11591/ijeecs.v20.i3.pp1479-1484.
[10] H. A. Razak, M. A. M. Saleh and N. M. Tahir, “Review on anomalous gait behavior detection using machine
learning algorithms,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 5. pp. 2090–2096,
2020, doi: 10.11591/eei.v9i5.2255.
[11] M. M. Nasr, F. K. Kamel and Y. S. Abd ElWahab, “A survey on predicting oil spills by studying its causes using
deep learning techniques,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22,
no. 1. pp. 580–589, 2021, doi: 10.11591/ijeecs.v22.i1.pp580-589.
[12] P. Chaudhury and H. K. Tripathy, “Optimising the parameters of a RBFN network for a teaching learning
paradigm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 1. pp. 435–
442, 2019, doi: 10.11591/ijeecs.v15.i1.pp435-442.
[13] A. S. I. Hilaiwah, H. A. A. Abed Allah, B. A. Abbas and T. Sutikno, “Live to learn: learning rules-based artificial
neural network,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1. pp.
558–565, 2021, doi: 10.11591/ijeecs.v21.i1.pp558-565.
[14] A. Labrinidis and H. V Jagadish, “Challenges and opportunities with big data,” Proc. VLDB Endow., vol. 5, no. 12,
pp. 2032–2033, 2012.
[15] Q. Shallal, Z. Hussien and A. A. Abbood, “Method to implement K-NN machine learning to classify data privacy in
IoT environment,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 2.
pp. 985–990, 2020, doi: 10.11591/ijeecs.v20.i2.pp985-990.
[16] M. Abdullah Al-Hagery, M. Abdullah Al-Assaf and F. Mohammad Al-Kharboush, “Exploration of the best
performance method of emotions classification for arabic tweets,” Indonesian Journal of Electrical Engineering
and Computer Science (IJEECS), vol. 19, no. 2. pp. 1010–1020, 2020, doi: 10.11591/ijeecs.v19.i2.pp1010-1020.
[17] E. Sutoyo and A. Almaarif, “Twitter sentiment analysis of the relocation of Indonesia’s capital city,” Bulletin of
Electrical Engineering and Informatics (BEEI), vol. 9, no. 4. pp. 1620–1630, 2020, doi: 10.11591/eei.v9i4.2352.
[18] A. R. Lubis, M. K. M. Nasution, O. S. Sitompul and E. M. Zamzami, “The effect of the TF-IDF algorithm in times
series in forecasting word on social media,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 22, no. 2. pp. 368–376, 2020, doi: 10.11591/ijeecs.v22.i2.pp368-376.
[19] E. S. Negara, R. Andryani and R. Amanda, “Network analysis of YouTube videos based on keyword search with
graph centrality approach,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22,
no. 2. pp. 172–178, 2020, doi: 10.11591/ijeecs.v22.i2.pp172-178.
[20] M. N. Alraja, M. A. Hussein and H. M. S. Ahmed, “What affects digitalization process in developing economies?
An evidence from smes sector in oman,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1.
pp. 441–448, 2020, doi: 10.11591/eei.v10i1.2033.
[21] N. H. M. Kadir and S. Aliman, “Text analysis on health product reviews using r approach,” Indonesian Journal of
Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 3. pp. 1303–1310, 2020, doi:
10.11591/ijeecs.v18.i3.pp1303-1310.
[22] S. Sangam and S. Shinde, “Sentiment classification of social media reviews using an ensemble classifier,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 1. pp. 355–363, 2019,
doi: 10.11591/ijeecs.v16.i1.pp355-363.
[23] S. Manikam, S. Sahibudin and V. Kasinathan, “Business intelligence addressing service quality for big data
analytics in public sector,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16,
no. 1. pp. 491–499, 2019, doi: 10.11591/ijeecs.v16.i1.pp491-499.
[24] B. Jabir, N. Falih and K. Rahmani, “HR analytics a roadmap for decision making: Case study,” Indonesian Journal
of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 2. pp. 979–990, 2019, doi:
10.11591/ijeecs.v15.i2.pp979-990.
[25] M. A. B. W. Nordin, D. Vedenyapin, M. F. Alghifari and T. S. Gunawan, “The disruptometer: An artificial
intelligence algorithm for market insights,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 8, no. 2.
pp. 727–734, 2019, doi: 10.11591/eei.v8i2.1494.
[26] O. A. Dawood, O. I. Hammadi, K. Shaker and M. Khalaf, “Multi-dimensional cubic symmetric block cipher
algorithm for encrypting big data,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp.
2569–2577, 2020, doi: 10.11591/eei.v9i6.2475.
[27] W. A. R. Wan Mohd Isa, A. I. H. Suhaimi, N. Noordin, A. F. Harun, J. Ismail and R. A. Teh, “Factors influencing
cloud computing adoption in higher education institution,” Indonesian Journal of Electrical Engineering and
Computer Science (IJEECS), vol. 17, no. 1. pp. 412–419, 2019, doi: 10.11591/ijeecs.v17.i1.pp412-419.
[28] S. Wilson and R. Sivakumar, “Twitter data analysis using hadoop ecosystems and apache zeppelin,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 3. pp. 1490–1498, 2019, doi:
10.11591/ijeecs.v16.i3.pp1490-1498.
[29] L. Y. Fang, N. F. M. Azmi, Y. Yahya, H. Sarkan, N. N. A. Sjarif and S. Chuprat, “Mobile business intelligence
acceptance model for organisational decision making,” Bulletin of Electrical Engineering and Informatics (BEEI),
vol. 7, no. 4. pp. 650–656, 2018, doi: 10.11591/eei.v7i4.1356.
[30] P. D. Ibnugraha, L. E. Nugroho and P. I. Santosa, “An approach for risk estimation in information security using
text mining and jaccard method,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 7, no. 3. pp. 393–
399, 2018, doi: 10.11591/eei.v7i3.847.
[31] M. Z. H. Jesmeen et al., “A survey on cleaning dirty data using machine learning paradigm for big data analytics,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 3. pp. 1234–1243,
2018, doi: 10.11591/ijeecs.v10.i3.pp1234-1243.
[32] N. Prasanna Moorthi and Mathivananr, “A study about SOA based agriculture management data framework,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 9, no. 1. pp. 39–42, 2018, doi:
10.11591/ijeecs.v9.i1.pp39-42.
[33] A. M. Saleh, H. Y. Abuaddous, O. Enaizan and F. Ghabban, “User experience assessment of a COVID-19 tracking
mobile application (AMAN) in Jordan,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 23, no. 2. pp. 1120–1127, 2021, doi: 10.11591/ijeecs.v23.i2.pp1120-1127.
[34] Hertina et al., “Data mining applied about polygamy using sentiment analysis on twitters in indonesian perception,”
Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 4. pp. 2231–2236, 2021, doi:
10.11591/EEI.V10I4.2325.
[35] T. A. Tran, J. Duangsuwan and W. Wettayaprasit, “A new approach for extracting and scoring aspect using
SentiWordNet,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 3, pp.
1731–1738, 2021, doi: 10.11591/ijeecs.v22.i3.pp1731-1738.
[36] I. S. Nasir, A. H. Mousa and I. L. Hussein Alsammak, “SMUPI-BIS: A synthesis model for users’ perceived impact
of business intelligence systems,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 21, no. 3, pp. 1856–1867, 2021, doi: 10.11591/ijeecs.v21.i3.pp1856-1867.
[37] C. R. Pattnaik, S. N. Mohanty, S. Mohanty, J. M. Chatterjee, B. Jana and V. García-Díaz, “A fuzzy multi-criteria
decision-making method for purchasing life insurance in india,” Bulletin of Electrical Engineering and Informatics
(BEEI), vol. 10, no. 1, pp. 344–356, 2021, doi: 10.11591/eei.v10i1.2275.
[38] N. S. Shaeeali, A. Mohamed and S. Mutalib, “Customer reviews analytics on food delivery services in social
media: A review,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 4, pp. 691–699, 2020, doi:
10.11591/ijai.v9.i4.pp691-699.
[39] A. S. Oh, “Smart urban farming service model with IoT based open platform,” Indonesian Journal of Electrical
Engineering and Computer Science, vol. 20, no. 1, pp. 320–328, 2020, doi: 10.11591/ijeecs.v20.i1.pp320-328.
[40] N. M. Mahfuz, M. Yusoff and Z. Ahmad, “Review of single clustering methods,” IAES International Journal of
Artificial Intelligence, vol. 8, no. 3, pp. 221–227, 2019, doi: 10.11591/ijai.v8.i3.pp221-227.
[41] F. A. N. Rashid, N. S. Suriani and A. Nazari, “Kinect-based physiotherapy and assessment: A comprehensive
review,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 11, no. 3, pp. 1176–
1187, 2018, doi: 10.11591/ijeecs.v11.i3.pp1176-1187.
[42] Z. Faisal and N. K. El Abbadi, “Detection and recognition of brain tumor based on DWT, PCA and ANN,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1, pp. 56–63, 2019, doi:
10.11591/ijeecs.v18.i1.pp56-63.
[43] A. A. Vinaya, S. Yulianto, Q. A. M. O. Arifianti, D. Arifianto and A. S. Aisjah, “Machinery signal separation using
non-negative matrix factorization with real mixing,” Bulletin of Electrical Engineering and Informatics (BEEI),
vol. 9, no. 4, pp. 1468–1476, 2020, doi: 10.11591/eei.v9i4.1956.
[44] M. S. Abdul Razak and C. R. Nirmala, “A computing model for trend analysis in stock data stream classification,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 3, pp. 1602–1609,
2020, doi: 10.11591/ijeecs.v19.i3.pp1602-1609.
[45] K. Gangadharan, G. R. N. Kumari, D. Dhanasekaran and K. Malathi, “Detection and classification of various pest
attacks and infection on plants using RBPN with GA based PSO algorithm,” Indonesian Journal of Electrical
Engineering and Computer Science (IJEECS), vol. 20, no. 3, pp. 1278–1288, 2020, doi:
10.11591/ijeecs.v20.i3.pp1278-1288.
[46] K. Okokpujie, S. John, C. Ndujiuba, J. A. Badejo and E. Noma-Osaghae, “An improved age invariant face
recognition using data augmentation,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1, pp.
179–191, 2021, doi: 10.11591/eei.v10i1.2356.
[47] S. K. Addagarla and A. Amalanathan, “e-SimNet: A visual similar product recommender system for E-commerce,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1, pp. 563–570, 2021,
doi: 10.11591/ijeecs.v22.i1.pp563-570.
[48] A. M. Martinez and A. C. Kak, “PCA versus LDA,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 23, no. 2, pp. 228–233, Feb. 2001, doi: 10.1109/34.908974.
[49] M. A. Ahmed, R. A. Hasan, A. H. Ali and M. A. Mohammed, “The classification of the modern arabic poetry using
machine learning,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 17, no. 5, pp.
2667–2674, 2019, doi: 10.12928/telkomnika.v17i5.12646.
[50] I. Kamal, K. Housni and Y. Hadi, “Online dictionary learning for car recognition using sparse coding and lars,”
IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 1, pp. 164–174, 2020, doi:
10.11591/ijai.v19i1.pp164-174.
[51] B. Vijayalaxmi, C. Anuradha, K. Sekaran, M. N. Meqdad and S. Kadry, “Image processing based eye detection
methods a theoretical review,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 3, pp. 1189–
1197, 2020, doi: 10.11591/eei.v9i3.1783.
[52] M. Z. Alksasbeh et al., “Smart hand gestures recognition using K-NN based algorithm for video annotation
purposes,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1, pp. 242–
252, 2021, doi: 10.11591/ijeecs.v21.i1.pp242-252.
[53] H. M. Salman, A. K. M. Al-Qurabat and A. A. R. Finjan, “Bigradient neural network-based quantum particle
swarm optimization for blind source separation,” IAES International Journal of Artificial Intelligence (IJAI), vol.
10, no. 2, pp. 355–364, 2021, doi: 10.11591/ijai.v10.i2.pp355-364.
[54] A. Parveen, Z. H. Khan and S. N. Ahmad, “Classification and evaluation of digital forensic tools,” TELKOMNIKA
Telecommunication, Computing, Electronics and Control, vol. 18, no. 6, pp. 3096–3106, 2020, doi:
10.12928/telkomnika.v18i6.15295.
[55] S. Xu et al., “The fuzzy comprehensive evaluation (FCE) and the principal component analysis (PCA) model
simulation and its applications in water quality assessment of Nansi Lake Basin, China,” Environmental
Engineering Research, vol. 26, no. 2, pp. 222–232, 2021, doi: 10.4491/eer.2020.022.
[56] G. Gorrell and B. Webb, “Generalized hebbian algorithm for incremental latent semantic analysis,” Ninth European
Conference on Speech Communication and Technology, 2005.
[57] A. H. Ali and M. Z. Abdullah, “A parallel grid optimization of SVM hyperparameter for big data classification
using spark Radoop,” Karbala International Journal of Modern Science, vol. 6, no. 1, article 3, pp. 1-18, 2020, doi:
10.33640/2405-609X.1270.
[58] A. H. Ali and M. Z. Abdullah, “A novel approach for big data classification based on hybrid parallel dimensionality
reduction using spark cluster,” Computer Science, vol. 20, no. 4, 2019, doi: 10.7494/csci.2019.20.4.3373.
[59] R. Baklouti, M. Mansouri, M. Nounou, Z. Ben Messaoud and A. Ben Hamida, "Generalized Hebbian Algorithm for
fault detection of CSTR model," 2016 2nd International Conference on Advanced Technologies for Signal and
Image Processing (ATSIP), 2016, pp. 421-424, doi: 10.1109/ATSIP.2016.7523127.
[60] K. Song, B. Zhang, W. Li, L. Yan and X. Wang, “Research on parallel principal component analysis based on
ternary optical computer,” Optik, vol. 241, 2021, doi: 10.1016/j.ijleo.2021.167176.
[61] M. K. Alsmadi, M. Tayfour, R. A. Alkhasawneh, U. Badawi, I. Almarashdeh and F. Haddad, “Robust feature
extraction methods for general fish classification,” International Journal of Electrical and Computer Engineering
(IJECE), vol. 9, no. 6, pp. 5192–5204, 2019, doi: 10.11591/ijece.v9i6.pp5192-5204.
[62] M. S. Al_Duais and F. S. Mohamad, “Improved Time Training with Accuracy of Batch Back Propagation
Algorithm Via Dynamic Learning Rate and Dynamic Momentum Factor,” IAES International Journal of Artificial
Intelligence, vol. 7, no. 4, pp. 170-178, 2018, doi: 10.11591/ijai.v7.i4.pp170-178.
[63] M. Jupri and R. Sarno, “Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1, pp. 75–87, 2019, doi:
10.11591/ijeecs.v18.i1.pp75-87.
[64] S. Mohamed and A. Ezzati, “A data mining process using classification techniques for employability prediction,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 14, no. 2, pp. 1025–1029,
2019, doi: 10.11591/ijeecs.v14.i2.pp1025-1029.
[65] E. B. B. Palad, M. J. F. Burden, C. R. Dela Torre and R. B. C. Uy, “Performance evaluation of decision tree
classification algorithms using fraud datasets,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9,
no. 6, pp. 2518–2525, 2020, doi: 10.11591/eei.v9i6.2630.
[66] L. M. Padirayon, M. S. Atayan, J. S. Panelo and C. R. Fagela Jr., “Mining the crime data using naïve Bayes
model,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 23, no. 2, pp. 1084–
1092, 2021, doi: 10.11591/ijeecs.v23.i2.pp1084-1092.
[67] Y. Choubik, A. Mahmoudi, M. M. Himmi and L. El Moudnib, “STA/LTA trigger algorithm implementation on a
seismological dataset using hadoop mapreduce,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9,
no. 2, pp. 269–275, 2020, doi: 10.11591/ijai.v9.i2.pp269-275.
[68] D. A. Jasm, M. M. Hamad and A. T. H. Alrawi, “Deep image mining for convolution neural network,” Indonesian
Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 1, pp. 347–352, 2020, doi:
10.11591/ijeecs.v20.i1.pp347-352.
[69] S. W. Kareem, R. Z. Yousif and S. M. J. Abdalwahid, “An approach for enhancing data confidentiality in hadoop,”
Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3, pp. 1547–1555,
2020, doi: 10.11591/ijeecs.v20.i3.pp1547-1555.
[70] E. E. Abel, A. L. M. Shafie and W. H. Chan, “Deployment of internet of things-based cloudlet-cloud for
surveillance operations,” IAES International Journal of Artificial Intelligence (IJAI), vol. 10, no. 1, pp. 24–34,
2021, doi: 10.11591/ijai.v10.i1.pp24-34.
[71] S. Abed, L. Waleed, G. Aldamkhi and K. Hadi, “Enhancement in data security and integrity using minhash
technique,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 3, pp.
1739–1750, 2021, doi: 10.11591/ijeecs.v21.i3.pp1739-1750.
[72] S. M. Mohammed, K. Jacksi and S. R. M. Zeebaree, “A state-of-the-art survey on semantic similarity for document
clustering using GloVe and density-based algorithms,” Indonesian Journal of Electrical Engineering and Computer
Science (IJEECS), vol. 22, no. 1, pp. 552–562, 2021, doi: 10.11591/ijeecs.v22.i1.pp552-562.
[73] A. Joshi and S. D. Munisamy, “Enhancement of cloud performance metrics using dynamic degree memory
balanced allocation algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS),
vol. 22, no. 3, pp. 1697–1707, 2021, doi: 10.11591/ijeecs.v22.i3.pp1697-1707.
[74] N. M. M. Sobran, M. M. Salmi, M. B. Bahar, M. N. Othman and S. H. Johari, “Fuzzy Takagi-Sugeno Method in
Microcontroller Based Water Tank System,” International Journal of Robotics and Automation (IJRA), vol. 7, no.
1, pp. 1–7, 2018, doi: 10.11591/ijra.v7i1.pp1-7.
[75] M. A. I. Al Jewari, A. Jidin, S. A. A. Tarusan and M. Rasheed, “Implementation of SVM for five-level cascaded H-
Bridge multilevel inverters utilizing FPGA,” International Journal of Power Electronics and Drive Systems
(IJPEDS), vol. 11, no. 3, pp. 1132-1144, 2020, doi: 10.11591/ijpeds.v11.i3.pp1132-1144.
[76] M. A. Mohammed, I. A. Mohammed, R. A. Hasan, N. Ţăpuş, A. H. Ali and O. A. Hammood, "Green Energy
Sources: Issues and Challenges," 2019 18th RoEduNet Conference: Networking in Education and Research
(RoEduNet), 2019, pp. 1-8, doi: 10.1109/ROEDUNET.2019.8909595.
[77] M. A. Mohammed, Z. H. Salih, N. Ţăpuş and R. A. K. Hasan, “Security and accountability for sharing the data
stored in the cloud,” in 2016 15th RoEduNet Conference: Networking in Education and Research, 2016, pp. 1–5.
[78] M. A. Mohammed and N. Ţăpuş, “A novel approach of reducing energy consumption by utilizing enthalpy in
mobile cloud computing,” Studies in Informatics and Control, vol. 26, no. 4, pp. 425–434, 2017, doi:
10.24846/v26i4y201706.
[79] N. Q. Mohammed, M. S. Ahmed, M. A. Mohammed, O. A. Hammood, H. A. N. Alshara and A. A. Kamil,
"Comparative Analysis between Solar and Wind Turbine Energy Sources in IoT Based on Economical and
Efficiency Considerations," 2019 22nd International Conference on Control Systems and Computer Science
(CSCS), 2019, pp. 448-452, doi: 10.1109/CSCS.2019.00082.
[80] R. A. I. Alhayali, M. A. Ahmed, Y. M. Mohialden and A. H. Ali, “Efficient method for breast cancer classification
based on ensemble hoffeding tree and naïve Bayes,” Indonesian Journal of Electrical Engineering and Computer
Science, vol. 18, no. 2, pp. 1074–1080, 2020, doi: 10.11591/ijeecs.v18.i2.pp1074-1080.
[81] Z. H. Salih, G. T. Hasan and M. A. Mohammed, "Investigate and analyze the levels of electromagnetic radiations
emitted from underground power cables extended in modern cities," 2017 9th International Conference on
Electronics, Computers and Artificial Intelligence (ECAI), 2017, pp. 1-4, doi: 10.1109/ECAI.2017.8166452.
[82] Z. H. Salih, G. T. Hasan, M. A. Mohammed, M. A. S. Klib, A. H. Ali and R. A. Ibrahim, "Study the Effect of
Integrating the Solar Energy Source on Stability of Electrical Distribution System," 2019 22nd International
Conference on Control Systems and Computer Science (CSCS), 2019, pp. 443-447, doi:
10.1109/CSCS.2019.00081.
[83] N. D. Zaki, N. Y. Hashim, Y. M. Mohialden, M. A. Mohammed, T. Sutikno and A. H. Ali, “A real-time big data
sentiment analysis for iraqi tweets using spark streaming,” Bulletin of Electrical Engineering and Informatics
(BEEI), vol. 9, no. 4, pp. 1411–1419, 2020, doi: 10.11591/eei.v9i4.1897.
[84] M. Pradhan, “Evolutionary computational algorithm by blending of PPCA and EP-Enhanced supervised classifier
for microarray gene expression data,” IAES International Journal of Artificial Intelligence (IJAI), vol. 7, no. 2, pp.
95–104, 2018, doi: 10.11591/ijai.v7.i2.pp95-104.
[85] E. A. Gheni and Z. M. Algelal, “Human face recognition methods based on principle component analysis (PCA),
wavelet and support vector machine (SVM): a comparative study,” Indonesian Journal of Electrical Engineering
and Computer Science (IJEECS), vol. 20, no. 2, pp. 991–999, 2020, doi: 10.11591/ijeecs.v20.i2.pp991-999.
[86] P. V. Kumar and K. M. Jeevan, “Face recognition with frame size reduction and DCT compression using PCA
algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1, pp. 168–
178, 2021, doi: 10.11591/ijeecs.v21.i4.pp168-178.
[87] C. Darujati, S. M. Susiki Nugroho, D. Kurniawan and M. Hariadi, “Enhancing the feature-based 3D deformable
face recognition using hybrid PCA-NN,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 22, no. 1, pp. 215–221, 2021, doi: 10.11591/ijeecs.v21.i4.pp215-221.
[88] N. M. Hussien, Y. M. Mohialden, N. T. Ahmed, M. A. Mohammed and T. Sutikno, “A smart gas leakage
monitoring system for use in hospitals,” Indonesian Journal of Electrical Engineering and Computer Science
(IJEECS), vol. 19, no. 2, pp. 1048–1054, 2020, doi: 10.11591/ijeecs.v19.i2.pp1048-1054.
Unstructured Datasets Analysis: Thesaurus Model
Editor IJCATR
 
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFSIRJET- Performing Load Balancing between Namenodes in HDFS
IRJET- Performing Load Balancing between Namenodes in HDFS
IRJET Journal
 
Ad

An effective classification approach for big data with parallel generalized Hebbian algorithm

Bulletin of Electrical Engineering and Informatics
Vol. 10, No. 6, December 2021, pp. 3393~3402
ISSN: 2302-9285, DOI: 10.11591/eei.v10i6.3135
Journal homepage: http://beei.org

Ahmed Hussein Ali (1), Royida A. Ibrahem Alhayali (2), Mostafa Abdulghafoor Mohammed (3), Tole Sutikno (4)
(1) ICCI, Informatics Institute for Postgraduate Studies, Baghdad, Iraq
(1,2) Department of Computer, College of Education, AL-Iraqia University, Iraq
(1) Department of Computer Science, AL Salam University College, Iraq
(2) Department of Computer Engineering, College of Engineering, University of Diyala, Diyala, Iraq
(3) Imam Aadham University College, Iraq
(4) Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia

Article history: Received Jun 30, 2021; Revised Aug 19, 2021; Accepted Oct 31, 2021

ABSTRACT
Advancements in information technology are contributing to the excessive rate of big data generation in recent years. Big data refers to datasets that are huge in volume and consume much time and space to process and transmit using the available resources. Big data also covers data in both unstructured and structured formats. Many agencies are currently subscribing to research on big data analytics owing to the failure of existing data processing techniques to handle the rate at which big data is generated. This paper presents an efficient classification and reduction technique for big data based on the parallel generalized Hebbian algorithm (GHA), which is one of the commonly used principal component analysis (PCA) neural network (NN) learning algorithms. The proposed method was compared with existing methods to demonstrate its capability in reducing the dimensionality of big data. The proposed method is implemented on the Spark Radoop platform.
Keywords: Big data; Generalized Hebbian algorithm; Machine learning; Neural network; Principal component analysis; Spark Radoop

This is an open access article under the CC BY-SA license.

Corresponding Author:
Ahmed Hussein Ali
Department of Computer Science, AL Salam University College
119 Baghdad, Taji, Iraq
Email: [email protected]

1. INTRODUCTION
The problem of big data lies in its size, volume, and rate of generation from multiple sources (including machines and humans) [1]-[13]. There are many forms of big data, such as web and social media data, business transaction data, machine-to-machine data, and biometric data [14]-[39]. Big data cannot be described simply as a large database; it is often unstructured and is currently increasing in all domains. High dimensional input data streams are important for most information processing tasks, such as communication and pattern recognition, and reducing their noise and redundancy allows useful information to be extracted from the input signals. Consequently, information processing, transmission, and storage on both software and hardware become easier when data dimensionality can be reduced.

One of the common feature extraction methods is principal component analysis (PCA), which extracts useful information by establishing the patterns in the input space. PCA is mainly aimed at obtaining an accurate data representation that reduces the redundant components [40]-[47]. The PCA transform [48], [49] is mainly used for tasks such as pattern recognition, data compression, and classification. It is also called the Karhunen-Loeve transform (KLT) [50]-[53]. Despite the wide application of numerous PCA-based algorithms [14], [54], most are not suitable for real-time applications because of their high computational complexity on high-dimensional feature vectors. A number of algorithms can improve the computational speed of PCA, although the problem persists. Most of these algorithms are implemented only in software and achieve moderate performance. PCA and its variants can also be implemented in hardware, but this requires substantial resources and complex circuit control systems, so it is only considered for small dimensions. Implementation as a PCA neural network (NN) is another alternative [55]. This is done using the GHA [56], but the GHA converges slowly, which makes several iterations necessary and prolongs the computational time of most GHA-based algorithms.

Most data reduction techniques save bandwidth and time by enabling the user to process large datasets using minimal available resources. Because data mining involves large volumes of data, data reduction has become mandatory, as the aim is to retrieve important information from such large datasets. Data size reduction is also a persistent problem because most of the straightforward techniques work only on small data and not on big data. Hence, the software design stage is a crucial phase when building data reduction algorithms for big data processing.

Recent works on parallel big data dimensionality reduction are reviewed in this section. The study in [57] presented a hybrid PGO-SVM-based model that combined SVM with PGO for improved classification accuracy even with a small number of feature subsets. The proposed PGO-SVM was implemented with Spark Radoop, with distributed storage of data points in the Hadoop distributed file system (HDFS).
The classification efficiency of the proposed model on large datasets was better than that of the benchmark method, and its execution time was shorter. Scala was used as the programming language to implement PGO-SVM, and the Covtype and Higgs datasets were analyzed. Another study presented a fast HP-PL model [58] as a new way of improving dimensionality reduction (DR) and classification accuracy. The system was implemented on Apache Spark and was capable of selecting the best features within the shortest computational time. Even though the level of improvement depends on the data features, the system showed good performance with respect to the number of evaluated nodes for the tested datasets.

The iterated PCA (IPCA) method [59] has been proposed for fault detection in a continuously stirred tank reactor (CSTR) model. The proposed IPCA relies on the GHA to address memory complexity problems. The fault detection problem is addressed in this way to facilitate online, recursive computation of the principal components. The GHA was developed to define a function that can merge all the major factors that affect the fault detection capability of the developed model. Song et al. proposed the TOC-based PCA algorithm [60], which exploits the advantage of optical computing in big data computation to solve the issues related to the PCA algorithm on electronic computers. The parallel operation of the system greatly improved efficiency. Another paper by Jian et al. [59] investigated the convergence of the GHA using the DDT method and demonstrated that the GHA admits adaptive learning rates that do not approach zero. Because these adaptive learning rates converge to non-zero constants, it is simple to handle computational round-off constraints and satisfy the tracking requirements of real applications. As a generalization of the Hebbian learning paradigm, Eraldo and colleagues [60] proposed a new adaptation strategy for linear neural networks.
In this paper, an efficient classification and reduction technique for big data, based on the parallel generalized Hebbian algorithm (GHA) and implemented on the Spark Radoop platform, is presented. The proposed method is compared with existing methods to demonstrate its capability in reducing the dimensionality of big data. The paper is organized as follows: Apache Spark Radoop is presented in section 2. The principle of the GHA and the proposed method are presented in section 3. The materials and methods used in this work are presented in section 4, the results and discussion are presented in section 5, and section 6 presents the conclusion and possible future work.

2. APACHE SPARK RADOOP
RapidMiner Radoop was developed as an extension of the in-memory functionality of RapidMiner that provides sophisticated operators for in-Hadoop execution [61]-[73]. More than 60 operators are available for data transformation in Radoop [61]. It is also capable of advanced and predictive modeling on Hadoop clusters in a distributed manner. RapidMiner [74] is a data mining application. Radoop relies on RapidMiner Studio's visual workflow designer to make the creation, implementation, and maintenance of predictive analytics in Hadoop as simple as possible. Because of Hadoop's code-free environment [62], [74] and built-in intelligence, the intricacies of the system are kept to a minimum, allowing the operator to concentrate on addressing business challenges rather than on technical concerns. Predictive analytics on terabytes and petabytes of data is therefore effective and scalable, because workflow execution is handled by Radoop rather than by the user and all computations are executed in the Hadoop cluster that holds the data.
Radoop was developed as an extension to ensure that Hadoop and RapidMiner could work together seamlessly. It is data science software that simplifies the process of preparing data for machine learning on Hadoop and Radoop Spark (refer to Figure 1). Throughout RapidMiner Studio, all parallel operations and data processes are implemented on the SparkRM platform within the Hadoop cluster so that Apache Spark can be used for task execution, which broadens the applicability of the tool and enables stronger algorithms. Hive and Mahout provide well-optimized data analytics routines and were therefore also used in this study. Figure 2 depicts the overall framework for the integration of Hadoop into RapidMiner. In this study, an extension was developed that allows close connection with Hadoop while providing the same features as those used in memory-based RapidMiner operations. The initial stage in creating a Radoop process is to include the RadoopNest meta-operator, which contains the basic cluster parameters. This meta-operator serves as the foundation for the remaining operators.

Figure 1. Spark Radoop architecture

Figure 2. The neural model for the GHA

3. THE PROPOSED GENERALIZED HEBBIAN ALGORITHM
The GHA [75] is a linear feedforward NN framework that is well suited to unsupervised learning applications and is often used for PCA. It is computationally efficient because it solves the eigenvalue problem iteratively, which eliminates the need to compute the covariance matrix directly. Because the eigenvalue problem is handled iteratively, it does not have to be solved directly. As a result, the GHA serves as a solution to memory complexity difficulties, particularly when dealing with large-scale datasets, as shown in Figure 2.
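The iterative behaviour described above can be illustrated with a minimal sketch of the standard GHA update (Sanger's rule) in plain Python. It is not taken from the paper: the function names `gha_update` and `train_gha`, the learning rate, the epoch count, and the toy two-dimensional data are all assumptions chosen for illustration. The point is that the weights are updated one sample at a time and no covariance matrix is ever formed.

```python
import math
import random


def gha_update(W, x, lr):
    """One step of Sanger's rule on weight matrix W (one row per output neuron)."""
    # Outputs of the linear neurons for the current sample.
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    new_W = []
    for i, row in enumerate(W):
        new_row = []
        for j, w_ij in enumerate(row):
            # Part of x[j] already explained by output neurons 0..i.
            recon = sum(y[k] * W[k][j] for k in range(i + 1))
            new_row.append(w_ij + lr * y[i] * (x[j] - recon))
        new_W.append(new_row)
    return new_W


def train_gha(samples, n_components, lr=0.01, epochs=50, seed=0):
    """Hypothetical training loop: small random initial weights, repeated passes."""
    rng = random.Random(seed)
    dim = len(samples[0])
    W = [[rng.uniform(0.0, 0.1) for _ in range(dim)] for _ in range(n_components)]
    for _ in range(epochs):
        for x in samples:
            W = gha_update(W, x, lr)
    return W


# Toy zero-mean data (an assumption): large variance along (1, 1),
# small variance along (1, -1).
rng = random.Random(1)
data = []
for _ in range(500):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.3)
    data.append([(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)])

W = train_gha(data, n_components=1)
first_pc = W[0]  # should point roughly along (1, 1) / sqrt(2), up to sign
```

With a single output neuron the rule reduces to Oja's rule; adding rows to `W` extracts further components in order of decreasing variance, at the cost of slower convergence for the later ones.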
In order to provide a memory-efficient implementation, the GHA is designed to be flexible and adaptable to time-varying distributions. Principal component analysis (PCA) [75]-[81] is regarded as an attribute reduction method that is useful when the data is described by many attributes and contains some redundancy. Redundancy here means that some attributes are correlated, most likely because they measure the same concept. Because of this redundancy, the observed attributes can be reduced to a smaller number of principal components (PCs), each of which accounts for part of the variance in the observed attributes. PCA uses an orthogonal transformation to convert a set of data with correlated attributes into a set of values referred to as principal components (uncorrelated attributes) [82]-[87]. The number of PCs is less than or equal to the number of original attributes. The transformation is defined so that the first PC has the highest possible variance, accounting for most of the data variability, and each succeeding component has the highest possible variance under the condition that it is orthogonal to the preceding PCs [88].

This section points out some of the considerations for implementing the proposed algorithm with Radoop. Several steps are involved in the parallelization process. A virtual machine cluster was used for tuning and testing the experimental conditions. The experiments were carried out on three different supercomputers. Because big data is considered in this work, the dataset is likely to contain a huge number of transactions. As a result, the large transactional datasets are kept in HDFS, while numerous data fractions are distributed across the cluster nodes. The Spark engine executes jobs on the data partitions in parallel. We generated and processed a collection of RDDs in order to construct the set of frequently occurring 1-itemsets, which were then arranged in descending order. The proposed PGHA is applicable to big data streaming using classification methods. It can be used to reduce datasets stored as HDFS files and handles datasets with numeric features. Figure 3 presents the overall proposed algorithm for data reduction, which can be implemented as Map and Reduce functions.

Proposed Parallel GHA
Input: S (dense array)
Output: T (reduced array)
1 Begin
2 Execute the Spark context (slave)
3 Listen to the master connection
4 Receive a dense array of data
5 Check the length of the columns M
6 Parallelize the data with Spark (master)
7 Collect N rows of data
8 do in parallel
9 Set the initial synaptic weights w_ij and thresholds θ_j to small random values, for example in the interval [0, 1]. Assign small positive values to the learning rate parameter and the forgetting factor.
10 Calculate the output of the neuron at iteration p.
11 Update the weights in the network: w_ij(p + 1) = w_ij(p) + ∆w_ij(p), for i, j = 1, 2, ..., n
12 Send the reduced array T
13 Close connection
14 End

Figure 3. Overall proposed algorithm for data reduction

4. MATERIAL AND METHOD
A number of supervised classification approaches were considered in this study, including Naïve Bayes, K-Nearest Neighbours, NN, and Random Forest. Table 1 describes the system, and Table 2 describes the six datasets used in the study. The results are reported as computation times for parallel GHA and parallel PCA on identical hardware. The performance of Apache Spark and MLlib 2.0 was compared on the six large datasets used for the analysis. The six datasets were obtained from the UCI ML repository. The experiments were run on a Spark cluster with Apache Zeppelin 0.7.1 and HDFS. The Spark cluster is made up of four nodes: a master node that executes the driver application and three worker nodes, together with a cluster manager. The three worker nodes were configured in a manner similar to that shown in Figure 1. The three worker nodes were each given a memory allocation of 48 GB and were configured with four executors (each with a memory allocation of 4 GB) and two cores. Each worker was allotted three executors (each with a memory size of 5 GB), while the master node was allotted two cores.
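As a rough illustration of how the Map and Reduce structure of Figure 3 might run across three workers, the sketch below simulates the partitions in plain Python rather than on Spark. The combination rule is an assumption and is not stated in the paper: each partition runs a local GHA pass from the shared weights (map), and the driver averages the resulting weight matrices (reduce). The names `local_gha_pass`, `average_weights`, and `project`, the learning rate, the number of rounds, and the toy array `S` are all hypothetical.

```python
import random


def local_gha_pass(W, rows, lr):
    """Map step (assumed): Sanger's rule over one partition, from the shared weights."""
    W = [row[:] for row in W]  # copy so that partitions do not interfere
    for x in rows:
        y = [sum(w * v for w, v in zip(w_row, x)) for w_row in W]
        updated = []
        for i, w_row in enumerate(W):
            updated.append([
                w_ij + lr * y[i] * (x[j] - sum(y[k] * W[k][j] for k in range(i + 1)))
                for j, w_ij in enumerate(w_row)
            ])
        W = updated
    return W


def average_weights(weight_sets):
    """Reduce step (assumed): element-wise mean of the per-partition weights."""
    n = len(weight_sets)
    n_rows, n_cols = len(weight_sets[0]), len(weight_sets[0][0])
    return [[sum(ws[i][j] for ws in weight_sets) / n for j in range(n_cols)]
            for i in range(n_rows)]


def project(W, x):
    """Reduce one M-dimensional row to len(W) features."""
    return [sum(w * v for w, v in zip(w_row, x)) for w_row in W]


rng = random.Random(0)
# Toy dense array S with M = 3 columns; the third is nearly a copy of the first.
S = []
for _ in range(600):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    S.append([a, b, a + rng.gauss(0.0, 0.05)])

partitions = [S[i::3] for i in range(3)]  # stand-in for three worker nodes
W = [[rng.uniform(0.0, 0.1) for _ in range(3)] for _ in range(2)]
for _ in range(40):  # driver-side rounds
    W = average_weights([local_gha_pass(W, part, 0.005) for part in partitions])

T = [project(W, x) for x in S]  # reduced array: 600 rows, 2 columns
```

On a real cluster the map step would presumably be expressed over RDD partitions and the reduce step as an aggregation on the driver; weight averaging is only one plausible way to merge the partial results and may behave differently from a strictly sequential GHA pass.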
A total of 16 GB of RAM was assigned to the driver process. Scala 2.11.8 was used as the programming language for MLlib execution in the Spark 2.2.1 cluster, with Hadoop 2.7.3 serving as the distributed storage system. The amount of memory available to the executors in each worker node was varied in order to get the best possible performance.
Table 1. Description of the system
Operating system           Windows 10
CPU                        Intel® Core™ i7-6700 processor running at 3.40 GHz with eight cores
Memory                     16 GB
No. of workers             3
Computational framework    Apache Spark 2.2.1
Compatible framework       Radoop
DSS                        HDFS (Hadoop 2.7.3)
Code development editor    Apache Zeppelin 0.7.1
Coding language            Scala 2.11.8
Table 2. Datasets description
Data              No. of records    No. of attributes    No. of classes
Covtype           581,012           54                   7
Covtype-2         581,012           54                   2
Higgs             11,000,000        28                   2
Botnet Attacks    7,062,606         115                  10
Dota2             102,944           116                  2
SUSY              5,000,000         18                   2
5. RESULTS AND DISCUSSION
The initial step of the classical GHA is loading the whole dataset into memory, so the data size must fit within the memory of the computer. Memoryup is a performance metric that assesses a parallel clustering algorithm's ability to efficiently utilize the available memory space on each node. The memoryup can be computed by changing the memory size of each node while keeping the dataset and the number of nodes the same. The concept of the new GHA approach is to scan the data by rows, so the GHA approach can still be implemented even when the data exceeds the computer memory size. A significant amount of CPU time is frequently lost in large datasets as a result of the unnecessary processing of redundant and non-representative data. Deleting this type of data can frequently result in a significant increase in processing performance.
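The row-by-row scanning described above, together with the weight initialisation, output, and update steps of Figure 3, can be illustrated with a minimal single-machine sketch. It is not the authors' Spark/Radoop implementation. It assumes the standard generalized Hebbian (Sanger) update for ∆wij, omits the thresholds and forgetting factor, and uses hypothetical names (`column_means`, `gha_fit`, `gha_transform`, `row_source`) and illustrative values for the learning rate and number of epochs. Rows are pulled from an iterator, so the whole dataset never has to be held in memory at once.

```python
import random


def column_means(row_source):
    """One streaming pass to get per-column means (used to centre the data)."""
    sums, count = None, 0
    for row in row_source():
        if sums is None:
            sums = [0.0] * len(row)
        for i, value in enumerate(row):
            sums[i] += value
        count += 1
    return [s / count for s in sums]


def gha_fit(row_source, n_components, learning_rate=0.05, epochs=20, seed=0):
    """Estimate principal directions with Sanger's rule, one row at a time.

    row_source is a zero-argument callable returning a fresh iterator of rows
    (a hypothetical interface chosen for this sketch).
    """
    rng = random.Random(seed)
    means = column_means(row_source)
    # Small random initial weights, one weight vector per output neuron.
    weights = [[rng.uniform(0.0, 0.1) for _ in means]
               for _ in range(n_components)]
    for _ in range(epochs):
        for row in row_source():
            x = [value - mean for value, mean in zip(row, means)]
            # Neuron outputs y_j = sum_i w_ji * x_i, using the current weights.
            y = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
            residual = x[:]
            for j, w in enumerate(weights):
                # Remove what neurons 0..j already explain from the input.
                residual = [r - w_i * y[j] for r, w_i in zip(residual, w)]
                # w_ji(p + 1) = w_ji(p) + eta * y_j * residual_i
                weights[j] = [w_i + learning_rate * y[j] * r
                              for w_i, r in zip(w, residual)]
    return means, weights


def gha_transform(row, means, weights):
    """Project one row onto the learned components (the reduced row)."""
    x = [value - mean for value, mean in zip(row, means)]
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


# Toy data: two strongly correlated columns, so one component keeps most variance.
_rng = random.Random(42)
_data = []
for _ in range(200):
    t = _rng.uniform(-1.0, 1.0)
    _data.append([t + _rng.uniform(-0.05, 0.05), t + _rng.uniform(-0.05, 0.05)])

means, weights = gha_fit(lambda: iter(_data), n_components=1)
reduced = [gha_transform(row, means, weights) for row in _data]
```

On this toy input the single learned weight vector should settle close to the direction in which both columns vary together, and `reduced` holds one value per row in place of the original two. In a cluster setting the same per-row update could be run independently on each data partition, which is how the "do in parallel" step of Figure 3 is read here; how partition results are combined is not specified in the text and is left out of the sketch.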
A further benefit of eliminating non-representative data from huge datasets is that storage and transmission of these datasets become less difficult. The computational advantages of the proposed system were evaluated using numerical examples. The computation was performed on a third generation Intel Core i7 2.8 GHz processor with 16 GB DDR3 memory. The programming language for all the algorithms in the proposed big data GHA was C++. The comparison assesses the speed of the algorithm by examining the running time of PGHA using Radoop against parallel PCA. In this comparison, the support degree varies while the number of computing nodes remains at 3. Figures 4(a)-(f) show the runtime with different support degrees for the datasets Covtype, Covtype-2, Higgs, Botnet Attacks, Dota2, and SUSY. The x-axis shows the algorithms, and the y-axis shows the running time. Both techniques appear to be more efficient when the support degree is increased, as can be seen in the graphs. Our approach appears to be faster than parallel PCA on all datasets, which is a significant advantage. The performance of the proposed method was evaluated on a single processor because, if a parallel algorithm is used, the performance may be overshadowed by the performance of the other algorithms. Execution times and speed-up ratios are depicted in Figure 4 in relation to the number of objects in the datasets for different numbers of processors.
Figure 4. Running time of PGHA using Radoop compared to parallel PCA under different classification experiments (Naïve Bayes, NN, K-Nearest Neighbours and Random Forest): (a) Covtype, (b) Covtype-2
Figure 4. Running time of PGHA using Radoop compared to parallel PCA under different classification experiments (Naïve Bayes, NN, K-Nearest Neighbours and Random Forest): (c) Higgs, (d) Botnet, (e) Dota2, (f) SUSY datasets (continued)
The analysis of the Covtype dataset is only possible when the number of processors in our computer cluster system is equal to or greater than eight. When dealing with large amounts of data, the advantages of a distributed memory system are readily apparent. The experimental results demonstrate that this parallel reduction method has superior speed-up and linear scaling behavior (time complexity), and that it may be used to overcome space complexity limits by using the aggregate memory of the reduction system. The performance evaluation of the new approach was based on the method of inducing the base classifier. The results of the experiments showed that the PGHA, as a data reduction tool, minimized the run time compared to the parallel PCA algorithm, as shown in Figure 4. Furthermore, our partial reduction method outperformed full reduction methods in many real-world data analysis and data reduction applications.
6. CONCLUSION
The concept of parallel computing and parallel dimensionality reduction algorithms was introduced in this study. This article proposed a parallel algorithm based on the classical DR algorithm for effective handling of the issues encountered in big data mining. The proposed framework builds on previous studies with the aim of reducing the high volume of input data features while retaining the relevant information. To achieve this aim, both GHA and the proposed parallel algorithm were used to improve the DR and reduce feature complexity. The evaluation results showed that GHA was better in reducing redundant features of datasets.
In future studies, effort will focus on combining the proposed parallel GHA with other ML methods, as well as improving performance on the latest datasets using evolutionary optimization techniques.
ACKNOWLEDGEMENTS
The authors would like to thank ICCI, Informatics Institute for Postgraduate Studies, Iraqia University and Al Salam University College for their facilities, support and cooperation during this research, and Universitas Ahmad Dahlan for supporting this collaborative research. Special thanks to the anonymous reviewers for their valuable suggestions and constructive comments.
REFERENCES
[1] T. H. Davenport, P. Barth and R. Bean, “How Big Data Is Different,” MIT Sloan Manag. Rev., vol. 54, no. 1, pp. 22–24, 2012. [2] V. Chang, “An ethical framework for big data and smart cities,” Technological Forecasting and Social Change, vol. 165, 2021. [3] N. A. N. M. Idros, H. Mohamed and R. Jenal, “The use of expert review in component development for customer satisfaction towards E-hailing,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 17, no. 1. pp. 347–356, 2019, doi: 10.11591/ijeecs.v17.i1.pp347-356. [4] K. Anam, C. Avian and M. Nuh, “Multilayer extreme learning machine for hand movement prediction based on electroencephalography,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp. 2404–2410, 2020, doi: 10.11591/eei.v9i6.2626. [5] M. N. F. Jamaluddin, A. Ismail, A. A. Rashid and T. T. O. Takleh, “Performance comparison of Java based parallel programming models,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 16, no. 3. pp. 1577–1583, 2019, doi: 10.11591/ijeecs.v16.i3.pp1577-1583. [6] M. B. Swidan, A. A. Alwan, S. Turaev and Y. Gulzar, “A model for processing skyline queries in crowd-sourced databases,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 2. pp. 798–806, 2018, doi: 10.11591/ijeecs.v10.i2.pp798-806. [7] Mustakim, N. K. Sari, Jasril, I. Kusumanto and N. G. I. Reza, “Eigenvalue of analytic hierarchy process as the determinant for class target on classification algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 12, no. 3. pp.
1257–1264, 2018, doi: 10.11591/ijeecs.v12.i3.pp1257-1264. [8] S. Berhil, H. Benlahmar and N. Labani, “A review paper on artificial intelligence at the service of human resources management,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1. pp. 32–40, 2019, doi: 10.11591/ijeecs.v18.i1.pp32-40. [9] W. A. Jbara, “Ear biometric verification approach based on morphological and geometric invariants,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3. pp. 1479–1484, 2020, doi: 10.11591/ijeecs.v20.i3.pp1479-1484. [10] H. A. Razak, M. A. M. Saleh and N. M. Tahir, “Review on anomalous gait behavior detection using machine learning algorithms,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 5. pp. 2090–2096, 2020, doi: 10.11591/eei.v9i5.2255. [11] M. M. Nasr, F. K. Kamel and Y. S. Abd ElWahab, “A survey on predicting oil spills by studying its causes using deep learning techniques,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1. pp. 580–589, 2021, doi: 10.11591/ijeecs.v22.i1.pp580-589. [12] P. Chaudhury and H. K. Tripathy, “Optimising the parameters of a RBFN network for a teaching learning paradigm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 1. pp. 435– 442, 2019, doi: 10.11591/ijeecs.v15.i1.pp435-442. [13] A. S. I. Hilaiwah, H. A. A. Abed Allah, B. A. Abbas and T. Sutikno, “Live to learn: learning rules-based artificial neural network,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1. pp. 558–565, 2021, doi: 10.11591/ijeecs.v21.i1.pp558-565. [14] A. Labrinidis and H. V Jagadish, “Challenges and opportunities with big data,” Proc. VLDB Endow., vol. 5, no. 12, pp. 2032–2033, 2012. [15] Q. Shallal, Z. Hussien and A. A. 
Abbood, “Method to implement K-NN machine learningto classify data privacy in IoT environment,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 2. pp. 985–990, 2020, doi: 10.11591/ijeecs.v20.i2.pp985-990. [16] M. AbdullahAl-Hagery, M. AbdullahAl-Assaf and F. MohammadAl-Kharboush, “Exploration of the best performance method of emotions classification for arabic tweets,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 2. pp. 1010–1020, 2020, doi: 10.11591/ijeecs.v19.i2.pp1010-1020. [17] E. Sutoyo and A. Almaarif, “Twitter sentiment analysis of the relocation of Indonesia’s capital city,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 4. pp. 1620–1630, 2020, doi: 10.11591/eei.v9i4.2352. [18] A. R. Lubis, M. K. M. Nasution, O. S. Sitompul and E. M. Zamzami, “The effect of the TF-IDF algorithm in times series in forecasting word on social media,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 2. pp. 368–376, 2020, doi: 10.11591/ijeecs.v22.i2.pp368-376. [19] E. S. Negara, R. Andryani and R. Amanda, “Network analysis of YouTube videos based on keyword search with graph centrality approach,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 2. pp. 172–178, 2020, doi: 10.11591/ijeecs.v22.i2.pp172-178. [20] M. N. Alraja, M. A. Hussein and H. M. S. Ahmed, “What affects digitalization process in developing economies? An evidence from smes sector in oman,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1. pp. 441–448, 2020, doi: 10.11591/eei.v10i1.2033. [21] N. H. M. Kadir and S. Aliman, “Text analysis on health product reviews using r approach,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 3. pp. 1303–1310, 2020, doi: 10.11591/ijeecs.v18.i3.pp1303-1310. [22] S. Sangam and S. 
Shinde, “Sentiment classification of social media reviews using an ensemble classifier,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 1. pp. 355–363, 2019, doi: 10.11591/ijeecs.v16.i1.pp355-363. [23] S. Manikam, S. Sahibudin and V. Kasinathan, “Business intelligence addressing service quality for big data analytics in public sector,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 1. pp. 491–499, 2019, doi: 10.11591/ijeecs.v16.i1.pp491-499. [24] B. Jabir, N. Falih and K. Rahmani, “HR analytics a roadmap for decision making: Case study,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 15, no. 2. pp. 979–990, 2019, doi: 10.11591/ijeecs.v15.i2.pp979-990. [25] M. A. B. W. Nordin, D. Vedenyapin, M. F. Alghifari and T. S. Gunawan, “The disruptometer: An artificial intelligence algorithm for market insights,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 8, no. 2. pp. 727–734, 2019, doi: 10.11591/eei.v8i2.1494. [26] O. A. Dawood, O. I. Hammadi, K. Shaker and M. Khalaf, “Multi-dimensional cubic symmetric block cipher algorithm for encrypting big data,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp. 2569–2577, 2020, doi: 10.11591/eei.v9i6.2475. [27] W. A. R. Wan Mohd Isa, A. I. H. Suhaimi, N. Noordin, A. F. Harun, J. Ismail and R. A. Teh, “Factors influencing cloud computing adoption in higher education institution,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 17, no. 1. pp. 412–419, 2019, doi: 10.11591/ijeecs.v17.i1.pp412-419. [28] S. Wilson and R. Sivakumar, “Twitter data analysis using hadoop ecosystems and apache zeppelin,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 16, no. 3. pp. 1490–1498, 2019, doi: 10.11591/ijeecs.v16.i3.pp1490-1498. [29] L. Y. Fang, N. F. M. Azmi, Y. Yahya, H. Sarkan, N. N. A. Sjarif and S.
Chuprat, “Mobile business intelligence acceptance model for organisational decision making,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 7, no. 4. pp. 650–656, 2018, doi: 10.11591/eei.v7i4.1356. [30] P. D. Ibnugraha, L. E. Nugroho and P. I. Santosa, “An approach for risk estimation in information security using text mining and jaccard method,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 7, no. 3. pp. 393– 399, 2018, doi: 10.11591/eei.v7i3.847. [31] M. Z. H. Jesmeen et al., “A survey on cleaning dirty data using machine learning paradigm for big data analytics,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 10, no. 3. pp. 1234–1243, 2018, doi: 10.11591/ijeecs.v10.i3.pp1234-1243. [32] N. Prasanna Moorthi and Mathivananr, “A study about SOA based agriculture management data framework,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 9, no. 1. pp. 39–42, 2018, doi: 10.11591/ijeecs.v9.i1.pp39-42. [33] A. M. Saleh, H. Y. Abuaddous, O. Enaizan and F. Ghabban, “User experience assessment of a COVID-19 tracking mobile application (AMAN) in Jordan,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 23, no. 2. pp. 1120–1127, 2021, doi: 10.11591/ijeecs.v23.i2.pp1120-1127. [34] Hertina et al., “Data mining applied about polygamy using sentiment analysis on twitters in indonesian perception,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 4. pp. 2231–2236, 2021, doi: 10.11591/EEI.V10I4.2325. [35] T. A. Tran, J. Duangsuwan and W. Wettayaprasit, “A new approach for extracting and scoring aspect using SentiWordNet,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 3. pp. 1731–1738, 2021, doi: 10.11591/ijeecs.v22.i3.pp1731-1738. [36] I. S. Nasir, A. H. Mousa and I. L. 
Hussein Alsammak, “SMUPI-BIS: A synthesis model for users’ perceived impact of business intelligence systems,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 3. pp. 1856–1867, 2021, doi: 10.11591/ijeecs.v21.i3.pp1856-1867. [37] C. R. Pattnaik, S. N. Mohanty, S. Mohanty, J. M. Chatterjee, B. Jana and V. García-Díaz, “A fuzzy multi-criteria decision-making method for purchasing life insurance in india,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1. pp. 344–356, 2021, doi: 10.11591/eei.v10i1.2275. [38] N. S. Shaeeali, A. Mohamed and S. Mutalib, “Customer reviews analytics on food delivery services in social media: A review,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 4. pp. 691–699, 2020, doi: 10.11591/ijai.v9.i4.pp691-699. [39] A. S. Oh, “Smart urban farming service model with IoT based open platform,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 20, no. 1. pp. 320–328, 2020, doi: 10.11591/ijeecs.v20.i1.pp320-328. [40] N. M. Mahfuz, M. Yusoff and Z. Ahmad, “Review of single clustering methods,” IAES International Journal of Artificial Intelligence, vol. 8, no. 3. pp. 221–227, 2019, doi: 10.11591/ijai.v8.i3.pp221-227. [41] F. A. N. Rashid, N. S. Suriani and A. Nazari, “Kinect-based physiotherapy and assessment: A comprehensive review,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 11, no. 3. pp. 1176– 1187, 2018, doi: 10.11591/ijeecs.v11.i3.pp1176-1187. [42] Z. Faisal and N. K. El Abbadi, “Detection and recognition of brain tumor based on DWT, PCA and ANN,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1. pp. 56–63, 2019, doi: 10.11591/ijeecs.v18.i1.pp56-63. [43] A. A. Vinaya, S. Yulianto, Q. A. M. O. Arifianti, D. Arifianto and A. S. 
Aisjah, “Machinery signal separation using non-negative matrix factorization with real mixing,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 4. pp. 1468–1476, 2020, doi: 10.11591/eei.v9i4.1956. [44] M. S. Abdul Razak and C. R. Nirmala, “A computing model for trend analysis in stock data stream classification,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 3. pp. 1602–1609, 2020, doi: 10.11591/ijeecs.v19.i3.pp1602-1609.
[45] K. Gangadharan, G. R. N. Kumari, D. Dhanasekaran and K. Malathi, “Detection and classification of various pest attacks and infection on plants using RBPN with GA based PSO algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3. pp. 1278–1288, 2020, doi: 10.11591/ijeecs.v20.i3.pp1278-1288. [46] K. Okokpujie, S. John, C. Ndujiuba, J. A. Badejo and E. Noma-Osaghae, “An improved age invariant face recognition using data augmentation,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 10, no. 1. pp. 179–191, 2021, doi: 10.11591/eei.v10i1.2356. [47] S. K. Addagarla and A. Amalanathan, “e-SimNet: A visual similar product recommender system for E-commerce,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1. pp. 563–570, 2021, doi: 10.11591/ijeecs.v22.i1.pp563-570. [48] A. M. Martinez and A. C. Kak, "PCA versus LDA," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 2, pp. 228-233, Feb. 2001, doi: 10.1109/34.908974. [49] M. A. Ahmed, R. A. Hasan, A. H. Ali and M. A. Mohammed, “The classification of the modern arabic poetry using machine learning,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 17, no. 5, pp. 2667–2674, 2019, doi: 10.12928/telkomnika.v17i5.12646. [50] I. Kamal, K. Housni and Y. Hadi, “Online dictionary learning for car recognition using sparse coding and lars,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 1. pp. 164–174, 2020, doi: 10.11591/ijai.v19i1.pp164-174. [51] B. Vijayalaxmi, C. Anuradha, K. Sekaran, M. N. Meqdad and S. Kadry, “Image processing based eye detection methods a theoretical review,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 3. pp.
1189– 1197, 2020, doi: 10.11591/eei.v9i3.1783. [52] M. Z. Alksasbeh et al., “Smart hand gestures recognition using K-NN based algorithm for video annotation purposes,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 1. pp. 242– 252, 2021, doi: 10.11591/ijeecs.v21.i1.pp242-252. [53] H. M. Salman, A. K. M. Al-Qurabat and A. A. R. Finjan, “Bigradient neural network-based quantum particle swarm optimization for blind source separation,” IAES International Journal of Artificial Intelligence (IJAI), vol. 10, no. 2. pp. 355–364, 2021, doi: 10.11591/ijai.v10.i2.pp355-364. [54] A. Parveen, Z. H. Khan and S. N. Ahmad, “Classification and evaluation of digital forensic tools,” TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 18, no. 6, pp. 3096–3106, 2020, doi: 10.12928/telkomnika.v18i6.15295. [55] S. Xu et al., “The fuzzy comprehensive evaluation (FCE) and the principal component analysis (PCA) model simulation and its applications in water quality assessment of Nansi Lake Basin, China,” Environmental Engineering Research, vol. 26, no. 2, pp. 222–232, 2021, doi: 10.4491/eer.2020.022. [56] G. Gorrell and B. Webb, “Generalized hebbian algorithm for incremental latent semantic analysis,” Ninth European Conference on Speech Communication and Technology, 2005. [57] A. H. Ali and M. Z. Abdullah, “A parallel grid optimization of SVM hyperparameter for big data classification using spark Radoop,” Karbala International Journal of Modern Science, vol. 6, no. 1, article 3, pp. 1-18, 2020, doi: 10.33640/2405-609X.1270. [58] A. H. Ali and M. Z. Abdullah, “A novel approach for big data classification based on hybrid parallel dimensionality reduction using spark cluster,” Computer Science, vol. 20, no. 4, 2019, doi: 10.7494/csci.2019.20.4.3373. [59] R. Baklouti, M. Mansouri, M. Nounou, Z. Ben Messaoud and A. 
Ben Hamida, "Generalized Hebbian Algorithm for fault detection of CSTR model," 2016 2nd International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), 2016, pp. 421-424, doi: 10.1109/ATSIP.2016.7523127. [60] K. Song, B. Zhang, W. Li, L. Yan and X. Wang, “Research on parallel principal component analysis based on ternary optical computer,” Optik (Stuttg)., vol. 241, 2021, doi: 10.1016/j.ijleo.2021.167176. [61] M. K. Alsmadi, M. Tayfour, R. A. Alkhasawneh, U. Badawi, I. Almarashdeh and F. Haddad, “Robust feature extraction methods for general fish classification,” International Journal of Electrical and Computer Engineering (IJECE), vol. 9, no. 6, pp. 5192–5204, 2019, doi: 10.11591/ijece.v9i6.pp5192-5204. [62] M. S. Al_Duais and F. S. Mohamad, “Improved Time Training with Accuracy of Batch Back Propagation Algorithm Via Dynamic Learning Rate and Dynamic Momentum Factor,” IAES International Journal of Artificial Intelligence, vol. 7, no. 4, pp. 170-178, 2018, doi: 10.11591/ijai.v7.i4.pp170-178. [63] M. Jupri and R. Sarno, “Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 18, no. 1. pp. 75–87, 2019, doi: 10.11591/ijeecs.v18.i1.pp75-87. [64] S. Mohamed and A. Ezzati, “A data mining process using classification techniques for employability prediction,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 14, no. 2. pp. 1025–1029, 2019, doi: 10.11591/ijeecs.v14.i2.pp1025-1029. [65] E. B. B. Palad, M. J. F. Burden, C. R. Dela Torre and R. B. C. Uy, “Performance evaluation of decision tree classification algorithms using fraud datasets,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 6. pp. 2518–2525, 2020, doi: 10.11591/eei.v9i6.2630. [66] L. M. Padirayon, M. S. Atayan, J. S. Panelo and C. R. 
Fagela Jr., “Mining the crime data using naïve Bayes model,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 23, no. 2. pp. 1084– 1092, 2021, doi: 10.11591/ijeecs.v23.i2.pp1084-1092. [67] Y. Choubik, A. Mahmoudi, M. M. Himmi and L. El Moudnib, “STA/LTA trigger algorithm implementation on a seismological dataset using hadoop mapreduce,” IAES International Journal of Artificial Intelligence (IJAI), vol. 9, no. 2. pp. 269–275, 2020, doi: 10.11591/ijai.v9.i2.pp269-275.
[68] D. A. Jasm, M. M. Hamad and A. T. H. Alrawi, “Deep image mining for convolution neural network,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 1. pp. 347–352, 2020, doi: 10.11591/ijeecs.v20.i1.pp347-352. [69] S. W. Kareem, R. Z. Yousif and S. M. J. Abdalwahid, “An approach for enhancing data confidentiality in hadoop,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 3. pp. 1547–1555, 2020, doi: 10.11591/ijeecs.v20.i3.pp1547-1555. [70] E. E. Abel, A. L. M. Shafie and W. H. Chan, “Deployment of internet of things-based cloudlet-cloud for surveillance operations,” IAES International Journal of Artificial Intelligence (IJAI), vol. 10, no. 1. pp. 24–34, 2021, doi: 10.11591/ijai.v10.i1.pp24-34. [71] S. Abed, L. Waleed, G. Aldamkhi and K. Hadi, “Enhancement in data security and integrity using minhash technique,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 21, no. 3. pp. 1739–1750, 2021, doi: 10.11591/ijeecs.v21.i3.pp1739-1750. [72] S. M. Mohammed, K. Jacksi and S. R. M. Zeebaree, “A state-of-the-art survey on semantic similarity for document clustering using GloVe and density-based algorithms,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1. pp. 552–562, 2021, doi: 10.11591/ijeecs.v22.i1.pp552-562. [73] A. Joshi and S. D. Munisamy, “Enhancement of cloud performance metrics using dynamic degree memory balanced allocation algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 3. pp. 1697–1707, 2021, doi: 10.11591/ijeecs.v22.i3.pp1697-1707. [74] N. M. M. Sobran, M. M. Salmi, M. B. Bahar, M. N. Othman and S. H. Johari, “Fuzzy Takagi-Sugeno Method in Microcontroller Based Water Tank System,” International Journal of Robotics and Automation (IJRA), vol. 7, no.
1, pp. 1–7, 2018, doi: 10.11591/ijra.v7i1.pp1-7. [75] M. A. I. Al Jewari, A. Jidin, S. A. A. Tarusan and M. Rasheed, “Implementation of SVM for five-level cascaded H-Bridge multilevel inverters utilizing FPGA,” International Journal of Power Electronics and Drive Systems (IJPEDS), vol. 11, no. 3, pp. 1132-1144, 2020, doi: 10.11591/ijpeds.v11.i3.pp1132-1144. [76] M. A. Mohammed, I. A. Mohammed, R. A. Hasan, N. Ţăpuş, A. H. Ali and O. A. Hammood, "Green Energy Sources: Issues and Challenges," 2019 18th RoEduNet Conference: Networking in Education and Research (RoEduNet), 2019, pp. 1-8, doi: 10.1109/ROEDUNET.2019.8909595. [77] M. A. Mohammed, Z. H. Salih, N. Ţăpuş and R. A. K. Hasan, “Security and accountability for sharing the data stored in the cloud,” in 2016 15th RoEduNet Conference: Networking in Education and Research, 2016, pp. 1–5. [78] M. A. Mohammed and N. ŢĂPUŞ, “A novel approach of reducing energy consumption by utilizing enthalpy in mobile cloud computing,” Studies in Informatics and Control, vol. 26, no. 4, pp. 425–434, 2017, doi: 10.24846/v26i4y201706. [79] N. Q. Mohammed, M. S. Ahmed, M. A. Mohammed, O. A. Hammood, H. A. N. Alshara and A. A. Kamil, "Comparative Analysis between Solar and Wind Turbine Energy Sources in IoT Based on Economical and Efficiency Considerations," 2019 22nd International Conference on Control Systems and Computer Science (CSCS), 2019, pp. 448-452, doi: 10.1109/CSCS.2019.00082. [80] R. A. I. Alhayali, M. A. Ahmed, Y. M. Mohialden and A. H. Ali, “Efficient method for breast cancer classification based on ensemble hoffeding tree and naïve Bayes,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 18, no. 2, pp. 1074–1080, 2020, doi: 10.11591/ijeecs.v18.i2.pp1074-1080. [81] Z. H. Salih, G. T. Hasan and M. A.
Mohammed, "Investigate and analyze the levels of electromagnetic radiations emitted from underground power cables extended in modern cities," 2017 9th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), 2017, pp. 1-4, doi: 10.1109/ECAI.2017.8166452. [82] Z. H. Salih, G. T. Hasan, M. A. Mohammed, M. A. S. Klib, A. H. Ali and R. A. Ibrahim, "Study the Effect of Integrating the Solar Energy Source on Stability of Electrical Distribution System," 2019 22nd International Conference on Control Systems and Computer Science (CSCS), 2019, pp. 443-447, doi: 10.1109/CSCS.2019.00081. [83] N. D. Zaki, N. Y. Hashim, Y. M. Mohialden, M. A. Mohammed, T. Sutikno and A. H. Ali, “A real-time big data sentiment analysis for iraqi tweets using spark streaming,” Bulletin of Electrical Engineering and Informatics (BEEI), vol. 9, no. 4, pp. 1411–1419, 2020, doi: 10.11591/eei.v9i4.1897. [84] M. Pradhan, “Evolutionary computational algorithm by blending of PPCA and EP-Enhanced supervised classifier for microarray gene expression data,” IAES International Journal of Artificial Intelligence (IJAI), vol. 7, no. 2. pp. 95–104, 2018, doi: 10.11591/ijai.v7.i2.pp95-104. [85] E. A. Gheni and Z. M. Algelal, “Human face recognition methods based on principle component analysis (PCA), wavelet and support vector machine (SVM): a comparative study,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 20, no. 2. pp. 991–999, 2020, doi: 10.11591/ijeecs.v20.i2.pp991-999. [86] P. V Kumar and K. M. Jeevan, “Face recognition with frame size reduction and DCT compression using PCA algorithm,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1. pp. 168– 178, 2021, doi: 10.11591/ijeecs.v21.i4.pp168-178. [87] C. Darujati, S. M. Susiki Nugroho, D. Kurniawan and M. 
Hariadi, “Enhancing the feature-based 3D deformable face recognition using hybrid PCA-NN,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 22, no. 1. pp. 215–221, 2021, doi: 10.11591/ijeecs.v21.i4.pp215-221. [88] N. M. Hussien, Y. M. Mohialden, N. T. Ahmed, M. A. Mohammed and T. Sutikno, “A smart gas leakage monitoring system for use in hospitals,” Indonesian Journal of Electrical Engineering and Computer Science (IJEECS), vol. 19, no. 2, pp. 1048–1054, 2020, doi: 10.11591/ijeecs.v19.i2.pp1048-1054.