Crop Recommendation

The paper proposes a crop recommendation method using machine learning techniques. It introduces a Wrapper-PART-Grid approach that combines grid search, wrapper feature selection and PART classifier. It compares the approach to other ML models. The proposed method achieves 99.31% accuracy, highest among the approaches, in recommending suitable crops based on soil data.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/374144853

An effective crop recommendation method using machine learning techniques

Article in International Journal of Advanced Technology and Engineering Exploration · September 2023
DOI: 10.19101/IJATEE.2022.10100456
CITATIONS: 1 · READS: 1,107

2 authors:
Disha Garg, Jamia Millia Islamia (7 publications, 41 citations)
Mansaf Alam, Jamia Millia Islamia (154 publications, 1,834 citations)

All content following this page was uploaded by Disha Garg on 24 September 2023.


International Journal of Advanced Technology and Engineering Exploration, Vol 10(102)
ISSN (Print): 2394-5443 ISSN (Online): 2394-7454
Research Article
http://dx.doi.org/10.19101/IJATEE.2022.10100456

An effective crop recommendation method using machine learning techniques


Disha Garg* and Mansaf Alam
Department of Computer Science, Jamia Millia Islamia, New Delhi, India

Received: 12-November-2022; Revised: 17-May-2023; Accepted: 19-May-2023


©2023 Disha Garg and Mansaf Alam. This is an open access article distributed under the Creative Commons Attribution (CC
BY) License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Abstract
The soil plays a vital role in agriculture, and soil testing serves as the initial step in determining the optimal nutrient
levels for cultivating specific crops. Machine learning (ML) classification techniques can leverage soil nutrient data to
recommend suitable crops. The Wrapper-PART-Grid approach, which incorporated crop recommendation data to suggest
appropriate crops, was introduced in this paper. This hybrid method combined the grid search (GS) method for
hyperparameter optimization, wrapper feature selection strategy, and the partial C4.5 decision tree (PART) classifier for
crop recommendation. The proposed approach was compared with other ML techniques, including multilayer perceptron
(MLP), instance-based learning with parameter k (IBk), C4.5 decision tree (CDT), and reduced error pruning (REP) tree.
Evaluation metrics such as true positive rate, false positive rate, precision, recall, F1-score, root mean squared error
(RMSE), and mean absolute error (MAE) were employed to assess these models. The suggested method demonstrated
superior reliability, accuracy, and effectiveness compared to other ML models for crop advisory purposes. This method
attained a remarkable accuracy rate of 99.31%, the highest among all the approaches considered. This paper presents an ML-based crop recommendation technique aimed at assisting farmers in enhancing their knowledge of cultivating appropriate crops. The technique not only seeks to reduce overall wastage but also aims to increase crop yield and improve crop quality.

Keywords
WEKA tool, PART, ML, Smart farming, Crop recommendation, Feature selection, IoT.

1.Introduction
One of the primary strengths of the national and international economy has recently been highlighted as agriculture [1]. There are a variety of crops, but the quality of crops, productivity, and yield forecast have all raised concerns for the future of agriculture [2]. Digital technology has reduced the need for manual labor in agriculture, leading to increased productivity, better living standards, and more people working in the field [3]. Nowadays, agriculture has developed a lot in India. Precision agriculture has achieved better enhancements and is important in recommending crops. The recommendation of crops depends on various parameters. The first and most crucial phase in farming is the prediction of soil properties like pH, humidity, temperature, nitrogen (N), phosphorus (P), and potassium (K). These are directly related to the geographical and climatic conditions of the area being utilized [46].

In recent years, there have been drastic climatic changes occurring because of global warming [58]. The selection of inappropriate crops has a tremendous impact on farmers' hopes and dreams because it uses up all available resources (such as the cost of seeds, fertilizers, etc.). Using machine learning (ML) as a key technology, traditional farming can be reshaped. This research aims to introduce an ML-based crop suggestion system for farmers, hoping to use this information to produce more productive and higher-quality crops with less waste.

ML recommends suitable crops using various mathematical or statistical methods. By employing these methods, we can advise the farmer on the most suitable crop to grow in his particular agricultural region, helping him to maximize his profits. To help farmers make informed decisions about what to grow, crops are classified according to the nutrients they contain. Classification is an ML technique that has enormous potential for the farming sector. Different classifiers are currently available for this purpose [9]. Classification uses training data to categorize new

*Author for correspondence

observations. However, it is impossible to say which is best because it relies on the application and the dataset. Analyzing a collection of training data is initially required before employing a classification technique. The training data predict the relation between the features and the class label.

The novelty of the present study is the Wrapper-PART-Grid method introduced in this paper. The wrapper algorithm selects appropriate features from the collected data, and the partial C4.5 decision tree (PART) algorithm is used for classifying crops in the proposed prediction technique. The wrapper method uses the grid search (GS) algorithm to examine the combinations of all feasible features and choose the subset that performs best for a given ML algorithm; the combination is known as the Wrapper-PART-Grid algorithm. The objective of this study is to suggest optimal crops using input variables such as soil pH, humidity, temperature, nitrogen (N), phosphorus (P), and potassium (K) levels. Then, based on the predicted future yields of different crops, including rice, kidney beans, maize, chickpeas, pomegranate, pigeon peas, moth beans, black gram, lentil, banana, mango, grapes, watermelon, mungbean, muskmelon, apples, oranges, papayas, coconuts, cotton, jute, and coffee, the most suitable crop is suggested using various ML models.

The crop recommendation dataset was utilized for the experiment. The experiment is divided into two main sections. Firstly, feature selection is performed to find the best features because it is well recognized that different features can have varying effects. Then, we assess our approach using the selected features on different ML models after applying hyperparameter tuning. Finally, the results were compared using standard metrics, i.e., accuracy, precision, recall, F1-score, root mean squared error (RMSE), mean absolute error (MAE), and the confusion matrix. This approach performed better than conventional ML methods.

The main contributions are as follows:
• An efficient system for agricultural crop recommendation was proposed, utilizing ML techniques.
• The Wrapper-PART-Grid method was introduced for classifying agricultural data to provide crop recommendations.
• To optimize the models for crop recommendation, the optimal parameters were identified using grid hyperparameter optimization.
• Experiments were conducted to evaluate the effectiveness of the method and compare the results with other approaches.

An overview of the related work on the topic is provided in Section 2. Section 3 includes the methodology, dataset preparation, preprocessing, the proposed method, feature selection, data analysis, k-fold cross-validation, and hyperparameter optimization. Section 4 presents the experimental study and result analysis. Section 5 is dedicated to the discussion of the results and their interpretation. Finally, in Section 6, the paper concludes.

2.Related literature
Previous literature offered numerous works that may be used to predict crops for the user. However, most of the studies are not focused on various soil factors. This makes it necessary to enhance the effectiveness of crop prediction and recommendation systems so that they can match the soil characteristics and climate circumstances in a better way. A model was used to analyze the sufficient amounts of soil nutrients, including nitrogen, potassium, and phosphorus, and advise the crops that should be grown in the future. The crops in [10] were predicted using a neural network, and the accuracy was 89.88%. This paper predicts suitable crops, but crop rotation has not been thoroughly studied. The study has suggested a technique to help farmers choose crops by considering all the variables, including soil type, sowing season, and geographic location. The suggested method considers soil properties like soil type, pH value, and nutrient concentration, as well as climatic factors like rainfall, temperature, and geographic location in terms of the state when recommending a suitable crop to the user. Various ML algorithms were used, but the results are not promising.

Similarly, a Naive Bayes algorithm incorporating the soil's temperature, humidity, and moisture as crucial variables was suggested for crop recommendation [11]. By utilizing ML, one of the most cutting-edge technologies in crop prediction, this research helps beginner farmers with a method that directs them to sow good crops. Furthermore, a supervised learning method called Naive Bayes suggests how to do it. The prediction accuracy of these models must be increased. In order to analyze the many soil properties and recommend the crop for cultivation, another ML approach was proposed [12]. They used k-nearest neighbor (KNN) algorithms, but prediction was based only on soil properties.
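The Wrapper-PART-Grid method described in the introduction couples two exhaustive searches: a wrapper loop over candidate feature subsets and a grid search over hyperparameter settings, each scored by the accuracy of the chosen classifier. The sketch below illustrates only this control flow in plain Python; the `toy_evaluate` scoring function, the feature names, and the `confidence`/`min_leaf` parameter grid are illustrative stand-ins, not the paper's WEKA/PART implementation.

```python
from itertools import combinations, product

def wrapper_grid_search(features, param_grid, evaluate):
    """Exhaustively score every (feature subset, hyperparameter setting)
    pair with `evaluate` and return the best combination found."""
    best = (None, None, float("-inf"))  # (subset, params, score)
    # Every non-empty feature subset (the wrapper part).
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            # Every hyperparameter combination (the grid-search part).
            keys = list(param_grid)
            for values in product(*(param_grid[k] for k in keys)):
                params = dict(zip(keys, values))
                score = evaluate(subset, params)
                if score > best[2]:
                    best = (subset, params, score)
    return best

# Illustrative stand-in for classifier accuracy: it rewards one subset and
# one setting. A real pipeline would train PART and measure CV accuracy here.
def toy_evaluate(subset, params):
    score = len(set(subset) & {"N", "P", "K"}) / 3
    if params.get("confidence") == 0.25:
        score += 0.1
    return score

subset, params, score = wrapper_grid_search(
    ["N", "P", "K", "pH"],
    {"confidence": [0.1, 0.25], "min_leaf": [1, 2]},
    toy_evaluate)
```

Because both loops are exhaustive, the cost grows as (2^features − 1) × (grid size), which is why the paper restricts the search to a handful of soil and climate features.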

Random forest (RF) method was used to predict crop Different factors like N, P, K, pH, temperature,
yields in the agricultural sector [13]. The RF method humidity, and rainfall to advise the crops were
provides the optimal crop production model by discussed [19]. The dataset consists of 2200 instances
considering the fewest number of models possible. and eight features. The best model is created by
The results indicate that crop production prediction is utilizing ML algorithms in Waikato environment for
beneficial in the agricultural sector. knowledge analysis (WEKA). The ML algorithms
chosen for classification are decision tree classifiers,
A winter wheat prediction model was proposed by multilayer perceptron, and rule-based classifiers.
estimating the characteristics of the soil using online They have not evaluated the feature importance in
soil spectroscopy and a prototype sensor [14]. The this study.
model used A self-organizing map with supervised
Kohonen networks, XY-fused networks, and artificial Priya and Yuvaraj [20], deep learning algorithms like
neural networks based on counter-propagation. Even artificial neural network (ANN) are used to produce
though the technique yields valuable data, studying precise crops at the appropriate times. By providing
the parameters related to the soil will not be sufficient inputs like moisture, temperature, pH, and humidity
to maximize crop productivity. Crop prediction utilizing a sensor network and the Internet of Things,
depends on various variables, so feature selection is a deep neural network and graphical user interface
crucial. In order to predict crops utilizing different are used to forecast crops. Farmers can choose crops
classifiers using soil attributes and environmental to cultivate with the help of crop ideas.
data, such as rainfall, season, texture, and
temperature, a comparative evaluation of several In [21], internet of things (IoT) and ML system were
feature selection approaches was conducted [15]. suggested, which uses sensors to allow soil testing. It
is based on measuring and observing soil properties.
Suresh et al. developed a system for crop This method reduces the likelihood of soil
classification based on specific data. Increased deterioration and supports crop vitality. This system
precision and productivity were attained by utilizing uses many sensors to monitor temperature, humidity,
a support vector machine (SVM). The sample dataset soil moisture, pH, and nitrogen, phosphorus and
for location data and the sample dataset for crop data potassium (NPK) nutrients of the soil. These sensors
were the two datasets that were the target of this include soil temperature, soil moisture, pH, and
investigation. With this proposed approach, specific others. They have considered all the features and
crops, including rice, black gram, maize, carrot, and have not analyzed feature importance in this study.
radish, were advised based on the availability of the Also, hyperparameter optimization is not applied to
specific nutrients, i.e., N, P, K, and pH [16]. the input parameters.

Kulkarni et al. [17] suggested a technique to In [22], a recommendation system using an ensemble
accurately recommend the best crop based on the model with majority voting methods employing
kind and features of the soil, such as the average random trees, chi-squared automatic interaction
rainfall and surface temperature. The ML algorithms detection (CHAID), KNN and Naive Bayes as
used by this suggested system included linear SVM, learners to suggest a good crop based on soil data
RF, and Naive Bayes. This crop recommendation with high specific accuracy and efficacy was
algorithm classified the input soil dataset into the suggested. However, there was no result comparison
recommended crop types, Kharif and Rabi. Applying or analysis in this study, and there was no feature
the suggested approach produced a 99.91% accuracy importance evaluation.
rate.
In [23], the best crop prediction model that can assist
The study in [18] accurately compares many ML farmers in selecting the right crop to produce based
algorithms to determine the crop's recommended on local climate factors and soil nutrient levels was
yield, with an overall improvement over multiple identified. This article contrasts two widely used
other techniques of 3.6%. The resultant work assists criteria, Gini and Entropy, for algorithms like KNN,
agronomists in making the proper crop selections for decision tree, and RF classifier. Findings show that
farming. Furthermore, the crops' output will increase RF has the best accuracy. Further, features should be
exponentially. As a result, increasing India's income analyzed to determine the most effective features to
in the process. recommend crops.


AgroConsultant was introduced as a smart system designed to help Indian farmers choose the best crops for their regions [24]. During the planting season, the farm's location, soil, and climatic elements like temperature and rainfall are crucial. In the future, crop rotations can be predicted.

In [25], a recommendation system is proposed that uses Arduino microcontrollers to collect data on the surrounding environment, together with ML techniques like Naive Bayes (multinomial) and SVM, K-means clustering, and natural language processing (sentiment analysis), to make recommendations about what to plant. The neural network achieved the highest accuracy (98.8%) among all algorithms.

In [26], a multiclass soil fertilizer recommendation system for paddy fields was developed. In addition, the SVM parameters are tuned using various optimization techniques, such as the genetic algorithm and particle swarm optimization.

Using a preliminary set on a fuzzy approximation space and a neural network, the authors of [27] could estimate the crop's suitability in the Vellore District. In [28], the most optimal crops for the current climate are predicted. Given the variables mentioned above, the study presented here gives farmers a more accurate idea of what crops to put where in their fields. An overview of some significant studies on various prediction models is given in Table 1.

Table 1 An overview of significant studies

Methods used | Advantage | Limitations | Accuracy | Reference
Neural network | Helps in identifying suitable crops | Does not consider environmental factors | 91% | [25]
Gradient descent | Considers soil factors | No comparison of results with other classifiers | 97% | [15]
Neural network | Predicts suitable crops | There is not a thorough study on crop rotation available | 89.88% | [10]
Naïve Bayes classifier | Uses environmental factors | No detailed result analysis is given | 97% | [11]
Neural networks | Regular feedback is taken from the farmers | No comparison of results with other classifiers | 95% | [26]
Extreme learning machine | Improved classification result | No comparison of results with other classifiers | -- | [28]
Supervised self-organizing maps | Provides better results | Does not use the climate or other variables to predict yields | 81.65% | [14]
KNN classifier | Displays solid efficiency | Only soil properties are used for predicting crops | -- | [7]
Fuzzy approximation | Improved classification accuracy | Does not consider crop predecessors | 93.2% | [13]
Regression-based ensemble | Ideally suited for primary crops | No evaluation of alternative classifier models | 94.78% | [14]
Ensemble model using random forest and XGBoost | Provides improved prediction | Compares poorly to other classifiers | 96.69% | [29]
Majority voting scheme | Helps agronomists in selecting the best crop for their fields | Feature analysis is not done | Overall improvement 3.6% | [18]
Multilayer perceptron | Based on current environmental parameters, the smart module provides irrigation and yield recommendations for the crop | Preprocessing and feature analysis are not done | 98.2273% | [20]

Based on our literature analysis, most crop suggestion and prediction methods use ML techniques such as decision trees, SVM, ANN, RF, logistic regression, KNN, and others [20]. The findings of these researches need to be improved because very few studies have concentrated on determining the significance of the traits for crop recommendation. The main difficulties are finding high-quality publicly available datasets, selecting the best features, and choosing the best algorithms. The literature study found that current comparisons of artificial intelligence (AI) algorithms for crop

recommendation are still lacking in obtaining reliable results.

3.Proposed methodology
The concepts and materials utilized for this experiment are described to make the proposed methodology more easily readable and clear.

3.1Proposed system
In the proposed system, the dataset is processed, and features are chosen. After choosing the relevant features, these were given as input to the ML models. In order to improve the model's effectiveness, the GS performs parameter tuning. In order to build our ML model, we used various ML classification algorithms, such as IBk, multilayer perceptron, C4.5 decision tree (CDT), reduced error pruning (REP) tree, and partial decision tree (PART) algorithms. The best features are extracted through hyperparameter optimization. After the models had been built, a performance assessment of these models was done using performance metrics. The block diagram of the proposed system is illustrated in Figure 1, and a detailed flow diagram is shown in Figure 2.

3.2Dataset preparation and preprocessing
A crop recommendation dataset was used with 2200 records and seven parameters (N, P, K, temperature, humidity, pH, and rainfall). The required soil content was determined for each crop to understand the data's nature better. Cross-validation was carried out after splitting the dataset into a training set and a validation set. Data was obtained from Kaggle [30]. Table 2 gives the summary of the dataset used in this work. This dataset was chosen to train the model because it has parameters crucial for crop suggestion, such as humidity, temperature, rainfall, pH, and the nitrogen, phosphorus, and potassium requirement ratios. Temperature, humidity, rainfall, nitrogen, potassium, and phosphorus values are specific to each crop. The attributes in the crop recommendation dataset do not have any empty fields. After confirming that there are no missing values, the data type of the attributes (int64) is determined, and labels are listed.

Figure 1 Schematic of the developed methodology (pipeline: crop recommendation dataset → preprocessing → feature selection using the wrapper method → classification using the PART method → hyperparameter selection using the grid method → model evaluation)

Table 2 Dataset description

Parameters:
- N: ratio of nitrogen content in soil (kg/ha)
- P: ratio of phosphorous content in soil (kg/ha)
- K: ratio of potassium content in soil (kg/ha)
- Temperature: temperature in degree Celsius
- Humidity: relative humidity in %
- pH: pH value of the soil
- Rainfall: rainfall in mm

Crop to be recommended (label): rice, maize, jute, cotton, coconut, papaya, orange, apple, muskmelon, watermelon, grapes, mango, banana, pomegranate, lentil, blackgram, mungbean, mothbeans, pigeonpeas, kidneybeans, chickpea, coffee


The data must be perceptually prepared before applying ML models to analyze the experimental study. The input features are normalized this way because ML models cannot effectively train and test on the non-uniform distribution of real-world farming data collected by sensors. According to the dataset, attributes like the N, P, and K values of soil play a significant role from a biological point of view because these are the primary macronutrients for crops. These macronutrients' primary contributions can generally be divided into the following categories:
N—Nitrogen is primarily in charge of plant leaf growth.
P—Phosphorus is essential for growing roots, flowers, and fruits.
K—Potassium performs the overall functions of the plant efficiently.

3.3Feature selection
Feature selection is an essential preprocessing step that resolves the issues with large dimensionality in many ML applications. First, a subset of features from the available data must be chosen to use a learning algorithm. Feature selection selects the most significant features from the initial complete feature set and removes the irrelevant, redundant, and noisy ones based on an assessment criterion, narrowing the feature set to those most significant or pertinent to the ML model [29].

Different features, i.e., N, P, K, rainfall, temperature, humidity, and pH, can be selected to find and suggest the best crop. The dependent variable in this experiment is the name of the various crops. Our proposed method considers N, P, K, temperature, humidity, pH, and rainfall as independent variables. In this phase, various feature selection approaches, i.e., filter methods such as principal component analysis (PCA), correlation analysis, and information gain (IG), as well as wrapper methods, are applied to the dataset. Both kinds of methods were used to identify the most beneficial indicators for the agricultural system. By analyzing and choosing features independently of any learning algorithm, filter techniques rely on the features of the datasets to assess their importance [31]. The wrapper approach evaluates all potential feature combinations and chooses the one that produces the best outcome for a particular ML technique. Wrapper techniques choose fewer features to maximize the effectiveness of the learning process [32]. The potential for these strategies to be generalized is thus limited. A simple nonparametric method called PCA is used to extract the most crucial information from a set of redundant or noisy data. PCA uses an orthogonal transformation to turn samples of correlated variables into samples of linearly uncorrelated features. The degree of feature redundancy is considered while searching for feature subsets using correlation-based feature selection. The objective of the evaluation technique is to identify subsets of features that are individually highly correlated with the class but have low inter-correlation. IG calculates the difference in entropy between the presence and absence of a feature. The difficulty of determining the significance of a feature inside a feature space is addressed here by using more generic techniques, such as the measurement of informational entropy [32].

Additionally, filtering methods give each feature a score before selecting the features having the highest scores [33]. It shows how closely and differently each feature matches the output labels. In this study, the effective environment indicators are chosen using a wrapper selection strategy. The comparison of the wrapper feature selection technique to the other feature selection strategies is demonstrated in Table 3.

Table 3 Number of selected features

Algorithm for feature selection | Selected features count | Selected features
PCA | 6 | Temperature, pH, P, K, humidity, N
Correlation | 7 | Temperature, pH, N, P, K, humidity, rainfall
IG | 7 | Temperature, pH, N, P, K, humidity, rainfall
Wrapper | 5 | K, N, P, humidity, rainfall

3.4Resampling
After appropriate feature selection, resampling is done. A few classes (also known as majority classes) frequently occupy the majority of instances in real-world data, whereas many other classes (also known as minority classes) have few instances. This is called a class-imbalanced classification problem. A common technique for balancing class distributions is resampling [34]. It consists of removing samples from the majority class (under-sampling) and adding more examples from the minority class (over-sampling). Resampling alters class distributions using two well-known techniques known as cross-validation and bootstrapping.
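The IG filter described above scores a feature by how much the class entropy drops once the feature's value is known. A minimal pure-Python sketch of that calculation, using a toy table rather than the crop dataset:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting the labels on the feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy example: a feature that perfectly separates two balanced classes has
# IG equal to the full class entropy (1 bit here); a constant feature has IG 0.
labels = ["rice", "rice", "maize", "maize"]
ig_perfect = information_gain(["high", "high", "low", "low"], labels)
ig_useless = information_gain(["x", "x", "x", "x"], labels)
```

Continuous attributes such as temperature would first be discretized before this scoring applies; the wrapper strategy ultimately chosen in the paper replaces this per-feature score with a full model evaluation per subset.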

Several models are fitted to a portion of data using the resampling approach known as cross-validation, and the model is then tested on a different subset of data. While trying to make accurate predictions, resampling throughout the training phase was crucial since it made it possible to determine which algorithms generalized best depending on our data. Also, it considerably aids the process of hyperparameter tuning, which involves modifying specific parameters of algorithms to improve outcomes [35]. The commonly used variations on cross-validation are the train/test split, leave-one-out cross-validation (LOOCV), k-fold cross-validation, etc. LOOCV divides the samples n times, where n is the sample count. Although it is similar to k-fold cross-validation, the main distinction is that n different data splits are carried out. Simple cross-validation uses well-known k values (5 and 10) to reduce complexity [35]. The train-test split typically divides the dataset into training and test data in an 80:20 ratio and mimics how a model would perform on new and unseen data. In other words, the model would be trained using 80% of the data and evaluated using the 20% of test data for which we already know the ground truth. Then, using that 20% of the data, we compared this ground truth with the model prediction. We then check how well our model would work with unseen data. It is the initial method of model evaluation [36]. However, the train-test split has disadvantages. Because this method reduces the quantity of the training data and does not use all our observations for testing, it creates bias. To solve this issue, we repeatedly split the data into training and testing sets using a method known as cross-validation (CV) [37]. As a result, the authors lessen the bias that the train-test split introduced. For this experiment, the k-fold CV approach was used. Each fold out of the training set was taken in turn to create a model using the other folds, and testing was then conducted on the excluded data. This is referred to as the k-fold CV method [38].

Figure 2 Workflow of the Proposed Wrapper-PART-Grid based crop recommendation system
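The k-fold procedure described above — hold each fold out once for testing while training on the rest — can be sketched as an index generator. This is a minimal pure-Python illustration of the splitting logic only, not the WEKA implementation used in the paper:

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) pairs: each fold is held out
    exactly once for testing while the remaining folds form the train set."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread any remainder across the first few folds.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop

# 10 samples, 5 folds: five disjoint test folds of 2 samples each.
splits = list(k_fold_splits(10, 5))
```

Averaging a model's score over all k held-out folds gives the less biased estimate the text motivates, since every observation is used for testing exactly once.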

3.5Classification algorithms
This study applied different ML-based classifiers on the crop recommendation dataset to evaluate overall performance and identify the best classifier for crop prediction. We were mainly interested in multiclass classifiers, which is why the following classifiers were selected: multilayer perceptron,

Instance-based learning with parameter k (IBk), CDT, REP tree, and PART.
• IBk: WEKA uses the KNN method with its IBk algorithm. This paper used the IBk approach with k = 1 and k = 3 [39].
• Multilayer perceptron: A multilayer perceptron has one or more hidden layers whose neurons are called hidden neurons. This algorithm can be used for non-separable problems [40].
• CDT: The decision tree is built using the J48 method, from its root down to its leaf nodes. Starting at the tree's root and progressing through it until we reach a leaf node, which offers the classification of the instance, we may get the class label for a test item from a decision tree [41].
• REP Tree: A decision tree is built through IG and pruned using reduced-error pruning [41].
• PART: This technique generates rules by repeatedly building partial decision trees from a data collection. Because of this, the algorithm is known as PART. PART is C4.5's extended version [42].

The PART algorithm outperformed the others in terms of several different metrics. Witten et al. [43] proposed a separate-and-conquer rule learner. The algorithm produces decision trees, which are ordered sets of rules. The item is given the category of the first matching rule when a new set of data is compared to each rule in the list. Each iteration of the PART classifier creates a partial CDT, with the best leaf being a rule. The method combines rule learning with C4.5 and RIPPER.

Using training vectors Ai ∈ Rn, i = 1, ..., l and a label vector B ∈ Rl, a decision tree recursively splits the space to group samples with similar labels. Let D be a representation of the data at node m. Then, the data are divided into subsets DLEFT(θ) and DRIGHT(θ) for each candidate split θ = (j, tm) with feature j and threshold tm, as shown in Equation 1 and Equation 2:
DLEFT(θ) = {(A, B) | Aj < tm} (1)
DRIGHT(θ) = D \ DLEFT(θ) (2)

Depending on the problem being solved (classification or regression), an impurity function I is used to calculate the impurity at m, as illustrated in Equation 3:
G(D, θ) = (nLEFT/Nm) I(DLEFT(θ)) + (nRIGHT/Nm) I(DRIGHT(θ)) (3)

Parameters are chosen to minimize the impurity, as in Equation 4:
θ* = argminθ G(D, θ) (4)

The procedure recurses on the subsets DLEFT(θ*) and DRIGHT(θ*) until the maximum allowable depth is reached, Nm < min_samples, or Nm = 1. Values 0, 1, ..., K−1 are assigned to the classification result for node m, which represents the region Rm with Nm observations. Let Equation 5:
Pmk = (1/Nm) Σ(Ai ∈ Rm) I(Bi = k) (5)
be the proportion of class k observations in node m. The standard measure of impurity is Gini, as shown in Equation 6:
I(Am) = Σk Pmk (1 − Pmk) (6)
Furthermore, entropy is shown in Equation 7:
I(Am) = −Σk Pmk log(Pmk) (7)
Moreover, misclassification is shown in Equation 8:
I(Am) = 1 − max(Pmk) (8)
Here, Am represents the training data at node m.

3.6Metrics for comparative analysis
The efficiency of numerous supervised ML algorithms was analyzed and compared. The following are crucial parameters employed in this phase:
Accuracy = (TP + TN) / (TP + TN + FP + FN) (9)
Precision = TP / (TP + FP) (10)
Recall = TP / (TP + FN) (11)
F1-score = 2 × (Precision × Recall) / (Precision + Recall) (12)
MAE = (1/n) Σi |yi − ŷi| (13)
RMSE = √((1/n) Σi (yi − ŷi)²) (14)

Accuracy is a measurement of how closely a predicted value corresponds to the actual value, as determined by the percentage of cases that were adequately identified. The ratio of correctly classified instances, true positives (TP) and true negatives (TN), over the total predictions, including TP, TN, and wrong predictions, false positives (FP) and false negatives (FN), is known as accuracy (Equation 9). Precision evaluates how accurate examples with positive labels are (Equation 10). How many instances of the positive class were correctly identified, or how precisely positive examples were classified, is measured by recall (Equation 11). The harmony and balance are measured by the F-measure (Equation 12) [44].
505
Disha Garg and Mansaf Alam

Prediction is measured using MAE (Equation 13). When MAE values are comparable, the RMSE rate determines which classification method is superior (Equation 14). Finally, the similarity level between two or more variables is evaluated using Cohen's Kappa, expressed as Equation 15:
k = (P0 − Pe) / (1 − Pe) (15)

P0 is the total diagonal proportion of the observation frequency, Pe is the total marginal proportion of the observation frequency, and k is the kappa coefficient value. Cohen's kappa coefficient can be understood in terms of the degree of agreement: poor ≤ 0.20; fair = 0.21–0.40; moderate = 0.41–0.60; good = 0.61–0.80; very good = 0.81–1.00. The values of the kappa statistic obtained here are above 0.90, indicating very good agreement.

Determining which model will provide the fastest results is essential, so the time taken by each algorithm was recorded in seconds. This value represents the time required to train the model. Decision trees and other common ML approaches exhibit a bias in favor of the majority class and tend to neglect the minority class: they frequently misclassify the minority class because they tend to forecast the majority class exclusively. The confusion matrix is also used to evaluate how well a classification algorithm is performing. It provides a comparison between actual and predicted values and is utilized to improve ML models. If N is the number of classes or outputs, then N is also the size of the confusion matrix: two classes give a 2×2 confusion matrix, and three classes a 3×3 matrix. The confusion matrix, which displays each class's accurate and inaccurate predictions, may be used to assess the outcomes. The first column of the first row shows how many "True" instances were accurately predicted, whereas the second row shows how many "True" instances were incorrectly predicted; all class "False" items in the second row were predicted as class "Yes." Therefore, the higher the diagonal values of the confusion matrix, the better the prediction [45].

4. Experimental study and result analysis
4.1 Experimental environment
In this experimental investigation, the ML method for crop recommendation is implemented using WEKA. All research communities working on supervised and unsupervised learning approaches can use the open-source WEKA tool. This tool works well with ML approaches and has a Java platform implementation [46]. Furthermore, experiments involving ML techniques written in Python are implemented using WEKA. WEKA was run on Windows 10, equipped with an Intel Core i7-8665U CPU @ 4.80 GHz processor and 8.00 GB RAM.

Various classification approaches have been used to select a crop, including IBk, multilayer perceptron, CDT, REP tree, and PART. The wrapper algorithm selects the appropriate features and the most beneficial environmental indicators for the PART classification algorithm. The wrapper method uses the GS algorithm to examine the combination of all feasible features and choose the subset that performs best for a given ML algorithm. The PART algorithm is used for classifying crops in the proposed prediction technique, "Wrapper-PART-Grid."

4.2 Results analysis
The IG ranking, correlation, PCA, and wrapper ranking filters are the four types used in this study. These filters were applied to the dataset to determine which feature combination is more important for classification models, as demonstrated in Figure 3. In accordance with the ranks, the wrapper ranking filter chose fewer attributes, and it discovered that the five most significant attributes are rainfall, humidity, N, P, and K. Undesirable features (temperature and pH) were removed based on the returned ranking of the features to maximize the performance of the models by updating the dataset. The wrapper method thus selected a minimal set of valuable features. The outcome demonstrates that (as shown in Figure 4) filter methods are less reliable than wrapper feature selection techniques, as the latter identified only the relevant features.

In the selected dataset, 2200 instances are available. With k = 10, 2200/10 = 220 observations fall in each fold. K-fold CV determines test accuracy by using one fold (220 samples) as the testing set and the remaining k−1 folds (9 folds) as the training set. The method is repeated k times (ten times for k = 10), and each time a distinct collection of observations is used as the validation/test set. The k test accuracy estimates produced by this method are then averaged [47].
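The fold arithmetic just described can be sketched in plain Python. This is an illustrative stand-in, not the WEKA implementation used in the paper; a real implementation would typically shuffle or stratify the instances before splitting.

```python
# Illustrative sketch of 10-fold cross-validation index splitting:
# 2200 instances with k = 10 gives 2200/10 = 220 observations per fold.
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    fold_size = n_instances // k
    indices = list(range(n_instances))
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test = indices[start:stop]                # 1 fold for testing
        train = indices[:start] + indices[stop:]  # remaining k-1 folds for training
        yield train, test

splits = list(k_fold_splits(2200, 10))
print(len(splits), len(splits[0][1]), len(splits[0][0]))  # 10 220 1980
```

Averaging the k per-fold accuracies then gives the reported test accuracy.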

506
International Journal of Advanced Technology and Engineering Exploration, Vol 10(102)

[Figure 3 comprises four bar-chart panels ranking the features N, P, K, temperature, humidity, pH, and rainfall:]
a) Feature evaluation using PCA ranking filter
b) Feature evaluation using correlation ranking filter
c) Feature evaluation using Information Gain ranking filter
d) Feature evaluation using wrapper filter ranking
Figure 3 Feature evaluation
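The wrapper selection shown in Figure 3(d), which exhaustively scores feature subsets with the target classifier itself, can be sketched as follows. The evaluate() function here is a hypothetical stand-in for training and validating the actual classifier (in the paper, subsets are scored with PART inside WEKA); its toy scoring merely mimics the outcome that N, P, K, humidity, and rainfall are the informative features.

```python
from itertools import combinations

FEATURES = ["N", "P", "K", "temperature", "humidity", "pH", "rainfall"]

def evaluate(subset):
    # Hypothetical scoring stand-in: in a real wrapper, this would be the
    # cross-validated accuracy of the classifier trained on `subset`.
    informative = {"N", "P", "K", "humidity", "rainfall"}
    hits = len(set(subset) & informative)
    noise = len(set(subset) - informative)
    return hits - 0.5 * noise  # toy score: reward informative, penalise noise

def wrapper_search(features):
    """Score every non-empty feature subset and keep the best one."""
    best_score, best_subset = float("-inf"), ()
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            score = evaluate(subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset

print(wrapper_search(FEATURES))  # ('N', 'P', 'K', 'humidity', 'rainfall')
```

Exhaustive search is 2^7 − 1 = 127 subsets here; for larger feature sets a greedy forward or backward search is usually substituted.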

4.2.1 Hyperparameter tuning
This section discusses the outcomes of the modeling-related experiments, including the hyperparameter tuning and the parameters set for each experiment. The ideal configuration used throughout the modeling phase may significantly improve the performance of an algorithm. The comparative analysis of the developed algorithms is also included in this part. This study was done to decide which supervised ML method would be most effective in crop recommendation. All the studies employed 10-fold cross-validation and a batch size of 100 to assess algorithm performance.

Choosing a set of ideal hyper-parameters is known as hyper-parameter tuning. The value of a hyper-parameter is fixed before the ML task begins, and in ML approaches the hyper-parameter adjustment has a significant impact. First, the data is kept separate from the model parameters; then, the hyper-parameters are tuned to achieve the optimum fit. Given the complexity of the problem, GS and random search methods are utilized to find the optimum hyperparameters, and the accuracy of the ML classifier is improved using this strategy. In the proposed method, GS hyperparameter optimization is used. GS identifies the ideal hyper-parameters for a model, i.e., those that produce the most "correct" predictions. GS examines every possible set of hyper-parameter combinations: the user defines a finite set of values for each hyper-parameter, and the system evaluates the cartesian product of those values. GS cannot focus on the productive areas of the search space on its own [48]; the algorithm is based on brute force, performing a thorough search over a specific subset of the hyperparameter space. If the search space is too big, a different algorithm should be used: random search is faster but does not always guarantee the best outcome [49].

A three-hidden-layered ANN with ten hidden units in each layer made up the multilayer perceptron (MLP) classifier. The number of layers and the hidden units in each layer were decided experimentally. The rectified linear unit (ReLU) served as the activation function of the hidden layers. The hyperparameter settings of the MLP are shown in Table 4:

[Figure 4 is a bar chart of classification accuracy for IBK, multilayer perceptron, multiclass classifier, CDT, REP, and PART under the PCA, correlation, Information Gain, and wrapper feature selection techniques.]
Figure 4 Accuracy of the classification algorithm using various feature selection techniques for selected features

Table 4 Multilayer Perceptron hyperparameter optimization results

Training  Number of hidden layers  Activation function for hidden layers  Initial learning rate  Accuracy%
1  3  ReLU  0.001  98.2
2  6  Tanh  0.01  98
3  5  ReLU  0.05  97.21
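The GS procedure described in Section 4.2.1, evaluating the cartesian product of user-supplied value sets, can be sketched like this. The score() lookup is a hypothetical stand-in for a cross-validated accuracy; its values merely mimic the three trainings reported in Table 4.

```python
from itertools import product

grid = {
    "hidden_layers": [3, 5, 6],
    "activation": ["ReLU", "Tanh"],
    "learning_rate": [0.001, 0.01, 0.05],
}

def score(hidden_layers, activation, learning_rate):
    # Hypothetical stand-in for cross-validated accuracy; values mimic
    # Table 4 (training 1 is best at 98.2%), unseen combinations score low.
    table = {
        (3, "ReLU", 0.001): 98.2,
        (6, "Tanh", 0.01): 98.0,
        (5, "ReLU", 0.05): 97.21,
    }
    return table.get((hidden_layers, activation, learning_rate), 90.0)

# Brute-force evaluation of every combination in the cartesian product.
names = list(grid)
best = max(
    (dict(zip(names, values)) for values in product(*grid.values())),
    key=lambda params: score(**params),
)
print(best)  # {'hidden_layers': 3, 'activation': 'ReLU', 'learning_rate': 0.001}
```

The cartesian product here contains 3 × 2 × 3 = 18 candidates, which illustrates why GS becomes impractical as the grid grows.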

The initial learning rate is one of the most important factors when creating a neural network, and choosing the optimal learning rate can be challenging: if it is too low, the training process may be slowed down; if it is too high, the model may not be optimized appropriately. How successfully the network model learns the training dataset also depends on the activation function used for the hidden layers; hidden layers are only needed in artificial neural networks when non-linear data separation is necessary. The highest accuracy, 98.2%, is achieved for training 1.

The IBk (KNN) algorithm's hyperparameter tuning is done by selecting the number of neighbors and the distance function, as shown in Table 5. In order to avoid either overfitting or underfitting, several values of k must be considered: larger values of k may result in strong bias and low variance, whereas smaller values of k may have high variance but low bias. The distance measure makes finding the closest training data points with known classes easier. The best accuracy of 98.3% is achieved for training 1.

Table 6 presents the outcomes obtained by employing WEKA's confidence factor and unpruned (True/False) choices for the PART method. For PART, four experimental trainings were conducted. The unpruned parameter was set to true for the first two trainings, with confidence factors of 0.25 and 0.50; the model achieved 99.3% and 99%, respectively. However, when the unpruned parameter was set to false in the last two trainings, the model's accuracy reached 98.32% for the 0.25 confidence factor and 98.21% for the 0.50 confidence factor. Therefore, the unpruned option was set to True, meaning that no pruning is performed.

In Table 7, the criterion and maximum depth were used as parameters. The criterion sets the standard by which the impurity of a split is evaluated: "Gini" is the default parameter for measuring impurity, and "entropy" is another option. Keeping the criterion as the Gini index and the maximum depth at the minimum (5) achieved an accuracy of 98.4%. One of the reasons for overfitting in decision trees is allowing the tree to grow too deep, resulting in a more complicated model due to the increased number of splits and the additional data captured. Bag size hyperparameter values of 100, 40, and 20 are shown in Table 8. The accuracy was 97.4% and 96.8% for the 100 and 40 bag sizes, respectively, and 96.32% for the 20 bag size. The default bag size of 100 was considered in WEKA.

Table 5 IBk hyperparameter optimization results

Training  Neighbors (K)  Distance metric  Accuracy%
1  3  Euclidean  98.3
2  5  Manhattan  98.1
3  4  Euclidean  97.7

Table 6 PART hyperparameter optimization results


Training Confidence factor Unpruned Accuracy %
1 .25 True 99.3
2 .50 True 99
3 .25 False 98.32
4 .50 False 98.21

Table 7 CDT hyperparameter optimization results


Training Criterion Max depth Accuracy%
1 Gini index 6 97.21
2 Entropy 8 96.3
3 Gini index 5 98.4

Table 8 REP hyperparameter optimization results

Training  Bag size  Accuracy%
1  20  96.32
2  40  96.8
3  100  97.4
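The Gini and entropy criteria referenced in Table 7 correspond to Equations 6 and 7. A minimal sketch of both impurity measures over a node's class proportions (illustrative values, not the paper's data):

```python
from math import log2

def gini(proportions):
    """Equation 6: I = sum_k p_k * (1 - p_k)."""
    return sum(p * (1 - p) for p in proportions)

def entropy(proportions):
    """Equation 7: I = -sum_k p_k * log(p_k), using log base 2 here."""
    return -sum(p * log2(p) for p in proportions if p > 0)

# A pure node (all one class) has zero impurity under both criteria,
# while an evenly mixed two-class node is maximal.
print(gini([0.5, 0.5]), entropy([0.5, 0.5]))  # 0.5 1.0
```

Either measure can serve as the impurity I in the split objective of Equation 3; Gini avoids the logarithm and is marginally cheaper to compute, which is one reason it is the common default.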

5. Discussion
All algorithms were used to recommend the suitable crop on the crop recommendation dataset to compare the efficiency of the various approaches. The preprocessing of the dataset involves removing unnecessary attributes that do not add value to the result. The Wrapper-PART-Grid technique has proven to be highly effective compared to alternative methods. Several metrics, i.e., recall, accuracy, precision, and F1 measure, supported the analysis. In order to compare the effectiveness of the chosen features with the other approaches, the wrapper feature selection technique was also examined. The wrapper feature selection approach discovers the fewest useful features among the different filter selection strategies. As can be seen in Figure 3, the findings demonstrate that this methodology selected fewer features than the other feature selection techniques. Each feature is assessed using the similarity of its data to the output labels as part of the feature selection methods. The wrapper method estimates all feasible feature combinations and selects the combination set that yields the maximum accuracy when applied to the various ML models, as illustrated in Figure 4. The wrapper selection approach is effective in the situation of high similarity between the data of each attribute, as opposed to filter selection strategies.

Using the PART algorithm led to the highest recall (0.993); in other words, PART will recommend accurately about 99% of the time and incorrectly about 1% of the time. The PART algorithm also achieved the highest precision and f1-score (0.993), indicating that the algorithm is accurate and that its high precision score reflects its low false positive rate. According to Table 10, PART had the lowest RMSE of 0.0249 after hyperparameter tuning, giving the most precise result. The study was further analyzed using kappa statistics and the time taken to construct each model; Table 9 shows a composite chart of these metrics. The kappa statistic was used to judge how well the model performed compared to the actual labels in the dataset. Results showed that the PART algorithm was the best-performing algorithm, with a kappa value of 0.9929 and an MAE of 0.0007. The kappa scores of the MLP, IBk, and CDT are all relatively close to one another (0.9814, 0.9824, and 0.9833, respectively), whereas the kappa score of the REP tree is the lowest of all models (0.9729). All the created models scored over 0.81, close to 1, indicating that their predictions agreed with the actual labels almost perfectly; a value of 1 would indicate perfect agreement. The
comparison chart of kappa statistics, MAE, and time taken by each model is given in Figure 6.

The PART algorithm has the maximum accuracy among all the selected ML models, as seen in Table 10. However, it can be challenging to identify which class (positive or negative) a model predicts well from the accuracy score alone, so assessing a model using accuracy by itself is not always possible. To clarify this, precision, recall, and f1-score were calculated for each model, and the models were then compared using these metrics to determine precisely where one model outperforms another.

MAE was considered to assess the discrepancy between the predicted and actual classes. It compares the predicted labels of the samples to the actual values in the dataset and thus measures the correctness of the constructed model; the model with the lowest MAE is the most successful. With an MAE of 0.0007, the PART classifier was the most accurate; a lower score indicates less likelihood of misclassification during prediction. The MAE results for MLP, IBk, CDT, and REP Tree were 0.004, 0.0026, 0.0076, and 0.003, respectively. Table 10 shows that the results both before and after hyperparameter optimization are also best in the case of the PART algorithm. Hyperparameters were tuned for all the methods to create a precise predictive model; optimizing the parameters can yield many alternative models with potentially different outcomes.

Finally, the time the PART method took to construct the model was considered, as this is additional information supplied by WEKA following the 10-fold cross-validation. The findings indicated that the PART method was the most efficient in training time for the classifier model: during the 10-fold cross-validation process, the time taken to build the model was recorded in seconds, and the PART algorithm took the least training time (0.01 seconds). Training took 0.04 s for MLP, 0.16 s for IBk, 0.051 s for CDT, and 0.06 s for the REP tree, so IBk required the most training time.

Table 9 Results of Kappa statistics

Algorithms  Kappa statistic  MAE  Time taken in seconds
Multilayer perceptron  0.9814  0.004  0.04
IBk  0.9824  0.0026  0.16
CDT  0.9833  0.0076  0.051
REP Tree  0.9729  0.003  0.06
PART  0.9929  0.0007  0.01
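The kappa values in Table 9 follow Equation 15. A minimal sketch of that computation from a square confusion matrix, using illustrative two-class counts rather than the paper's 22-class matrix:

```python
def cohens_kappa(matrix):
    """Equation 15: k = (P0 - Pe) / (1 - Pe) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    # P0: observed agreement, i.e. the diagonal proportion.
    p0 = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Pe: chance agreement, from the row and column marginal proportions.
    pe = sum(
        (sum(matrix[i]) / total) * (sum(row[i] for row in matrix) / total)
        for i in range(len(matrix))
    )
    return (p0 - pe) / (1 - pe)

# Illustrative 2x2 confusion matrix: 185 of 200 instances on the diagonal.
print(round(cohens_kappa([[95, 5], [10, 90]]), 2))  # 0.85
```

On the agreement scale quoted earlier, this illustrative 0.85 already falls in the very good band (0.81–1.00), which the reported 0.97+ values exceed comfortably.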

Figure 5 Confusion matrix (PART)

Table 10 Performance of implemented classifiers

Models  Hyperparameter optimization status  TP  FP  Precision  Recall  F1-score  RMSE
Multilayer Perceptron  Before  0.963  0.002  0.963  0.963  0.963  0.0809
Multilayer Perceptron  After  0.982  0.001  0.983  0.982  0.982  0.035
IBk  Before  0.982  0.001  0.982  0.982  0.982  0.0405
IBk  After  0.983  0.001  0.984  0.983  0.983  0.0346
CDT  Before  0.975  0.001  0.976  0.975  0.975  0.0451
CDT  After  0.984  0.001  0.985  0.984  0.984  0.0405
REP Tree  Before  0.966  0.002  0.968  0.966  0.966  0.051
REP Tree  After  0.974  0.001  0.975  0.974  0.974  0.0454
PART  Before  0.991  0.000  0.991  0.991  0.991  0.028
PART  After  0.993  0.000  0.993  0.993  0.993  0.0249

[Figure 6 comprises three bar charts comparing the Multilayer Perceptron, IBk, CDT, REP Tree, and PART models.]
Figure 6 Comparison based on (a) kappa statistics, (b) MAE, (c) Time taken in seconds

Another way of evaluation is the confusion matrix. Only a few errors were found in the confusion matrix for the model trained using the proposed method. The diagonal elements of the confusion matrix indicate how often the prediction was accurate. Figure 5 shows the confusion matrix for the proposed method after hyperparameter tuning: out of 2200 instances, 27 were misclassified across the different classes, and 98.77% were correctly classified, the best result compared to the other models. A complete list of abbreviations is shown in Appendix I.
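The confusion-matrix bookkeeping described above can be sketched as follows. The tiny 3-class example is illustrative; only the 2200-instance/27-error arithmetic reproduces figures quoted in the text.

```python
def confusion_matrix(actual, predicted, n_classes):
    """N x N matrix indexed (actual, predicted); the diagonal counts correct predictions."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy_from_matrix(matrix):
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal sum
    return correct / total

# Tiny illustrative 3-class example: one class-1 instance mislabelled as 2.
m = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], 3)
assert accuracy_from_matrix(m) == 5 / 6

# The headline numbers from the text: 27 of 2200 instances misclassified.
print(round((2200 - 27) / 2200 * 100, 2))  # 98.77
```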
6. Conclusion
In this paper, Wrapper-PART-Grid was proposed as a prediction technique for decision-making systems in the domain of crop recommendations. The proposed approach utilized wrapper feature selection, GS hyperparameter optimization, and the PART algorithm. The most informative features from the crop recommendation dataset were selected by the wrapper method based on the results of other feature selection methods. The accuracy of each method was evaluated after selecting the optimal parameters for each model. By tuning hyperparameters using the grid optimization method, an impressive accuracy of 99.31% was achieved by the PART algorithm.

The findings of this research and the developed crop recommendation model have the potential to be integrated into a farmer's decision-making system. As a result, farmers may become more inclined to seek crop recommendations during soil testing, leading to a reduction in crop losses. The utilization of clustering for crop classification in classifiers is expected to enhance accuracy in the future. Despite several positive aspects of this study, there are also certain limitations. Only five models were examined in this research, and exploring various machine learning models for this task could be beneficial. In the future, the application of a deep learning-based computer vision system can be explored to enhance productivity in the smart farming sector.

Acknowledgment
None.

Conflicts of interest
The authors have no conflicts of interest to declare.

Author's contribution statement
Disha Garg: Conceptualization, investigation, writing-original draft, editing, data collection, analysis, and interpretation of results. Mansaf Alam: Study conception, design, supervision, investigation.

References
[1] AlZu'bi S, Hawashin B, Mujahed M, Jararweh Y, Gupta BB. An efficient employment of internet of multimedia things in smart and future agriculture. Multimedia Tools and Applications. 2019; 78:29581-605.
[2] Rezk NG, Hemdan EE, Attia AF, El-Sayed A, El-Rashidy MA. An efficient IoT based smart farming system using machine learning algorithms. Multimedia Tools and Applications. 2021; 80:773-97.
[3] Ansari M, Ali SA, Alam M. Internet of things (IoT) fusion with cloud computing: current research and future direction. International Journal of Advanced Technology and Engineering Exploration. 2022; 9(97):1812-45.
[4] Treboux J, Genoud D. High precision agriculture: an application of improved machine-learning algorithms. In 6th SWISS conference on data science (SDS) 2019 (pp. 103-8). IEEE.
[5] Sharma A, Jain A, Gupta P, Chowdary V. Machine learning applications for precision agriculture: a comprehensive review. IEEE Access. 2020; 9:4843-73.
[6] Thilakarathne NN, Yassin H, Bakar MS, Abas PE. Internet of things in smart agriculture: challenges, opportunities and future directions. In Asia-pacific conference on computer science and data engineering 2021 (pp. 1-9). IEEE.
[7] Lawal ZK, Yassin H, Zakari RY. Flood prediction using machine learning models: a case study of Kebbi state Nigeria. In Asia-pacific conference on computer science and data engineering 2021 (pp. 1-6). IEEE.
[8] Lawal ZK, Yassin H, Zakari RY. Stock market prediction using supervised machine learning techniques: an overview. In Asia-pacific conference on computer science and data engineering 2020 (pp. 1-6). IEEE.
[9] Durai SK, Shamili MD. Smart farming using machine learning and deep learning techniques. Decision Analytics Journal. 2022.
[10] Priyadharshini A, Chakraborty S, Kumar A, Pooniwala OR. Intelligent crop recommendation system using machine learning. In 5th international conference on computing methodologies and communication 2021 (pp. 843-8). IEEE.
[11] Kalimuthu M, Vaishnavi P, Kishore M. Crop prediction using machine learning. In third international conference on smart systems and inventive technology 2020 (pp. 926-32). IEEE.
[12] Mariappan AK, Madhumitha C, Nishitha P, Nivedhitha S. Crop recommendation system through soil analysis using classification in machine learning. International Journal of Advanced Science and Technology. 2020; 29(3):12738-47.
[13] Kumar YJ, Spandana V, Vaishnavi VS, Neha K, Devi VG. Supervised machine learning approach for crop yield prediction in agriculture sector. In international conference on communication and electronics systems 2020 (pp. 736-41). IEEE.
[14] Pantazi XE, Moshou D, Alexandridis T, Whetton RL, Mouazen AM. Wheat yield prediction using machine learning and advanced sensing techniques. Computers and Electronics in Agriculture. 2016; 121:57-65.
[15] Anguraj K, Thiyaneswaran B, Megashree G, Shri JP, Navya S, Jayanthi J. Crop recommendation on analyzing soil using machine learning. Turkish Journal of Computer and Mathematics Education. 2021; 12(6):1784-91.
[16] Suresh G, Kumar AS, Lekashri S, Manikandan R. Efficient crop yield recommendation system using machine learning for digital farming. International Journal of Modern Agriculture. 2021; 10(1):906-14.

512
International Journal of Advanced Technology and Engineering Exploration, Vol 10(102)

[17] Kulkarni NH, Srinivasan GN, Sagar BM, Cauvery NK. Improving crop productivity through a crop recommendation system using ensembling technique. In 3rd international conference on computational systems and information technology for sustainable solutions 2018 (pp. 114-9). IEEE.
[18] Garanayak M, Sahu G, Mohanty SN, Jagadev AK. Agricultural recommendation system for crops using different machine learning regression methods. International Journal of Agricultural and Environmental Information Systems. 2021; 12(1):1-20.
[19] Bakthavatchalam K, Karthik B, Thiruvengadam V, Muthal S, Jose D, Kotecha K, et al. IoT framework for measurement and precision agriculture: predicting the crop using machine learning algorithms. Technologies. 2022; 10(1).
[20] Priya PK, Yuvaraj N. An IoT based gradient descent approach for precision crop suggestion using MLP. In journal of physics: conference series 2019 (p. 012038). IOP Publishing.
[21] Gosai D, Raval C, Nayak R, Jayswal H, Patel A. Crop recommendation system using machine learning. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. 2021: 554-69.
[22] Reddy DA, Dadore B, Watekar A. Crop recommendation system to maximize crop yield in ramtek region using machine learning. International Journal of Scientific Research in Science and Technology. 2019; 6(1):485-9.
[23] Rao MS, Singh A, Reddy NS, Acharya DU. Crop prediction using machine learning. In journal of physics: conference series 2022 (p. 012033). IOP Publishing.
[24] Doshi Z, Nadkarni S, Agrawal R, Shah N. AgroConsultant: intelligent crop recommendation system using machine learning algorithms. In fourth international conference on computing communication control and automation 2018 (pp. 1-6). IEEE.
[25] Bandara P, Weerasooriya T, Ruchirawya T, Nanayakkara W, Dimantha M, Pabasara M. Crop recommendation system. International Journal of Computer Applications. 2020; 175(22):22-5.
[26] Suchithra MS, Pai ML. Improving the performance of sigmoid kernels in multiclass SVM using optimization techniques for agricultural fertilizer recommendation system. In soft computing systems: second international conference, ICSCS 2018, Kollam, India, 2018 (pp. 857-68). Springer Singapore.
[27] Anitha A, Acharjya DP. Crop suitability prediction in vellore district using rough set on fuzzy approximation space and neural network. Neural Computing and Applications. 2018; 30:3633-50.
[28] Ashok T, Suresh Varma P. Crop prediction based on environmental factors using machine learning ensemble algorithms. In proceedings of intelligent computing and innovation on data science 2019 (pp. 581-94). Singapore: Springer Singapore.
[29] Bouchlaghem Y, Akhiat Y, Amjad S. Feature selection: a review and comparative study. In E3S web of conferences 2022 (p. 01046). EDP Sciences.
[30] https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset. Accessed 13 April 2013.
[31] Kanyongo W, Ezugwu AE. Feature selection and importance of predictors of non-communicable diseases medication adherence from machine learning research perspectives. Informatics in Medicine Unlocked. 2023.
[32] Khalid S, Khalil T, Nasreen S. A survey of feature selection and feature extraction techniques in machine learning. In science and information conference 2014 (pp. 372-8). IEEE.
[33] Mafarja MM, Mirjalili S. Hybrid binary ant lion optimizer with rough set and approximate entropy reducts for feature selection. Soft Computing. 2019; 23(15):6249-65.
[34] Verma A. Evaluation of classification algorithms with solutions to class imbalance problem on bank marketing dataset using WEKA. International Research Journal of Engineering and Technology. 2019; 5(13):54-60.
[35] Christias P, Mocanu M. A machine learning framework for olive farms profit prediction. Water. 2021; 13(23).
[36] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research. 2011; 12:2825-30.
[37] Villavicencio CN, Macrohon JJ, Inbaraj XA, Jeng JH, Hsieh JG. Covid-19 prediction applying supervised machine learning algorithms with comparative analysis using Weka. Algorithms. 2021; 14(7).
[38] Smith TC, Frank E. Introducing machine learning concepts with WEKA. Statistical genomics: methods and protocols. 2016:353-78.
[39] Reynolds K, Kontostathis A, Edwards L. Using machine learning to detect cyberbullying. In 10th international conference on machine learning and applications and workshops 2011 (pp. 241-4). IEEE.
[40] Amin MN, Habib MA. Comparison of different classification techniques using WEKA for hematological data. American Journal of Engineering Research. 2015; 4(3):55-61.
[41] Sultana J, Jilani AK. Predicting breast cancer using logistic regression and multi-class classifiers. International Journal of Engineering & Technology. 2018; 7(4.20):22-6.
[42] Mazid MM, Ali AS, Tickle KS. Input space reduction for rule based classification. WSEAS Transactions on Information Science and Applications. 2010; 7(6):749-59.
[43] Witten IH, Frank E, Hall MA, Pal CJ. Data mining: practical machine learning tools and techniques. 2005.
[44] Garg D, Alam M. Integration of convolutional neural networks and recurrent neural networks for foliar disease classification in apple trees. International

Journal of Advanced Computer Science and Applications. 2022; 13(4):357-67.
[45] Armah GK, Luo G, Qin K. A deep analysis of the precision formula for imbalanced class distribution. International Journal of Machine Learning and Computing. 2014; 4(5):417-22.
[46] https://www.cs.waikato.ac.nz/ml/weka/. Accessed 13 April 2013.
[47] Nie Y, De Santis L, Carratù M, O'Nils M, Sommella P, Lundgren J. Deep melanoma classification with k-fold cross-validation for process optimization. In international symposium on medical measurements and applications (MeMeA) 2020 (pp. 1-6). IEEE.
[48] Belete DM, Huchaiah MD. Grid search in hyperparameter optimization of machine learning models for prediction of HIV/AIDS test results. International Journal of Computers and Applications. 2022; 44(9):875-86.
[49] Anggoro DA, Mukti SS. Performance comparison of grid search and random search methods for hyperparameter tuning in extreme gradient boosting algorithm to predict chronic kidney failure. International Journal of Intelligent Engineering and Systems. 2021; 14(6):198-207.

Disha Garg is currently pursuing a Ph.D. in the Department of Computer Science, Faculty of Natural Sciences, Jamia Millia Islamia, New Delhi-110025. Her research interests are Big Data Analytics, Machine Learning and Deep Learning. She has presented several articles at international conferences and published several research articles in reputed International Journals. She is working in the agriculture area using IoT and Data Analytics.
Email: [email protected]

Prof. Mansaf Alam is currently serving as a Professor in the Department of Computer Science, Faculty of Natural Sciences, Jamia Millia Islamia, located in New Delhi-110025. He holds the position of Young Faculty Research Fellow, DeitY, Govt. of India, and serves as the Editor-in-Chief of the Journal of Applied Information Science. Prof. Alam has a notable publication record with numerous research articles published in International Journals and Proceedings at conferences by prestigious publishers such as IEEE, Springer, Elsevier Science, and ACM. His research interests encompass various areas including AI, Big Data Analytics, Machine Learning & Deep Learning, Cloud Computing, and Data Mining. Prof. Alam is actively involved in the academic community as a reviewer for International Journals in the field of Computer Sciences. Prof. Alam has authored three books: "Digital Logic Design" published by PHI, "Concepts of Multimedia" by Arihant, and "Internet of Things: Concepts and Applications" published by Springer. He has also contributed to the books "Big Data Analytics: Applications in Business and Marketing" and "Big Data Analytics: Digital Marketing and Decision Making" by Taylor and Francis, as well as "Extended Reality for Healthcare System: Recent Advances in Contemporary Research" by Elsevier, UK. Recently, Prof. Alam achieved an international patent (Australian) for his work on "An AI Based Smart Dustbin," highlighting his innovative contributions in the field.
Email: [email protected]

Appendix I
S. No.  Abbreviation  Description
1  AI  Artificial Intelligence
2  ANN  Artificial Neural Network
3  CV  Cross-Validation
4  CDT  C4.5 Decision Tree
5  CHAID  Chi-Squared Automatic Interaction Detection
6  GS  Grid Search
7  KNN  K-Nearest Neighbor
8  FN  False Negatives
9  FP  False Positives
10  IBk  Instance-Based Learning with Parameter k
11  IG  Information Gain
12  LOOCV  Leave One Out Cross Validation
13  MAE  Mean Absolute Error
14  ML  Machine Learning
15  MLP  Multilayer Perceptron
16  N  Nitrogen
17  NPK  Nitrogen, Phosphorus and Potassium
18  P  Phosphorus
19  PART  Partial C4.5 Decision Tree
20  PCA  Principal Component Analysis
21  K  Potassium
22  KNN  K-Nearest Neighbor
23  RF  Random Forest
24  ReLU  Rectified Linear Unit
25  REP  Reduced Error Pruning
26  RMSE  Root Mean Squared Error
27  SVM  Support Vector Machine
28  TN  True Negatives
29  TP  True Positives
30  WEKA  Waikato Environment for Knowledge Analysis
reviewer for renowned international journals, including
Information Science published by Elsevier Science. He also
serves as a member of the program committee for several
esteemed international conferences. Additionally, he is a
valued member of the Editorial Board for reputable

514
© 2023. This work is published under
http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding
the ProQuest Terms and Conditions, you may use this content in accordance
with the terms of the License.
