Agriculture Yield Estimation Using Machine Learning Algorithms
Management Pune & Symbiosis International (Deemed University), Pune, Maharashtra, India
Electronics Engineering, Malla Reddy Institute of Engineering and Technology, Hyderabad, Telangana, India
Symbiosis International (Deemed University), Pune, Maharashtra, India
K. Elangovan, Department of Mechanical Engineering, Dhanalakshmi Srinivasan Engineering College, Chennai, Tamil Nadu, India
B. Meenakshi, Professor, Department of Electrical and Electronics Engineering, Sri Sairam Engineering College, Chennai, Tamil Nadu, India
S. Srinivasan, Department of Biomedical Engineering, Saveetha School of Engineering, Saveetha Institute of Medical and Technical Sciences, Saveetha University, Chennai, Tamil Nadu, India
Abstract: Storage, distribution, pricing, marketing, import/export, and other policy considerations all depend on accurate and timely crop output estimates. The Directorates of Economics and Statistics issue official advance estimates and projections for key grains and commercial crops. However, these advance estimates are not the actual figures. Using the K-Nearest Neighbors (KNN) method to estimate agricultural yields is a good choice since it is easy to understand, can handle non-linearity, and works well with small to medium datasets. Given these features, KNN is well suited to agricultural stakeholders, including academics, decision-makers, and farmers who may not have a lot of experience with machine learning. With the KNN method, agricultural yield estimation models may be trained to provide precise crop production predictions given pertinent input data. Arriving at these estimations involves a lot of subjective judgment based on a variety of qualitative criteria. As a result, it is important to provide objective, statistically reliable projections of future crop areas and yields. Technological developments in computers and data storage have made an enormous amount of information available. The difficulty has been in gleaning insight from this mountain of data, but recent advances in areas like data mining have opened a door from this data to more precise estimates of agricultural yield. Results show that the accuracy is 90% and the error value is 10% using the KNN algorithm for the given agriculture datasets.

Keywords: Data mining, regression analysis, crop-cutting experiments, yield estimation, K-nearest neighbor algorithm

I. INTRODUCTION

This article combines data mining methodology with up-to-date agricultural data to address the issues of stale data and missing data in the agricultural data collection and sorting processes inherent to the Internet era of data mining. Additionally, this article employs novel approaches to agricultural data mining and statistical techniques, including advancements in time series representations and measurements. Furthermore, this research discovers practical mining techniques for gaining insight through data analysis. In conclusion, this work integrates the real requirements for building data mining and statistical analysis models of precision and intelligent agriculture based on big data analysis and proposes experiments to evaluate the model's performance [1].

Data mining is then used to assess China's agricultural mechanization level and provide recommendations. According to the study's findings, the agricultural goods monitoring system can handle more than 800 events per second across all four linkages of production, processing, transportation, and sales. Response time is 0.021 seconds when there are twenty persons online. It takes roughly 0.12 seconds to respond to messages received by 100 people at once. The findings provide a starting point for further study into related technologies for use in agricultural management and development [2].

Spatial clustering, a kind of unsupervised learning, has quickly become one of the most useful methods for analyzing soil data in the agricultural sector. It is not practical to define a predetermined pattern in advance, and it is not possible to get labeled data that needs human interaction, which makes soil data analysis a challenging field in which to do research. The purpose of this study is to conduct a review of the literature on soil data analysis using spatial clustering algorithms in the context of agricultural applications. There is an initial discussion of soil qualities (physical, chemical, and biological) and the features of spatial soil data [3].

The five main types of spatial clustering methods are then briefly discussed. Agricultural production management zoning, complete soil and land evaluation, soil and land categorization, and correlation research for agro-ecosystems are discussed as examples of agricultural uses of spatial clustering for soil data analysis. Prototype-based clustering approaches are increasingly popular in practice, notwithstanding the success of classic clustering algorithms. The spatial clustering techniques may be improved by using machine learning models to better account for the dataset's unique soil features [4].

Due to the unique characteristics of agriculture and the shortcomings of the current agricultural product logistics distribution systems, this investigation employs data mining technology from the field of artificial intelligence to cut down on the expense of distributing agricultural goods through electronic commerce and boost satisfaction among
buyers. The logistics modes are optimized and enhanced, and common delivery modes for agricultural e-commerce are offered, all based on examinations of the distribution efficiency of the existing mainstream way of distributing agricultural goods. A genetic algorithm and MATLAB are used to test and assess how different logistics modes affect both logistical costs and consumer satisfaction.

The results show that the timely distribution rates of all three modes (self-operated distribution, third-party logistics, and common delivery) drop as the number of orders increases; the timely distribution rate of the common delivery mode fluctuates the most, followed by that of the third-party logistics mode; the timely distribution rate of the self-operated distribution mode is the most stable, but relatively low. The genetic algorithm results show that oil consumption, damage cost, and other factors may be significantly impacted by using the optimal common delivery method of agricultural goods logistics [5].

Oil consumption costs, penalty costs, refrigeration costs, and damage costs are optimized by 26.7%, 31.7%, 30.3%, and 19.6%, respectively, when the standard delivery method is used. In addition, 95% of buyers are pleased overall. Using data mining technology, this study optimizes the agricultural product distribution mode, which is useful for addressing the logistics bottleneck issue in the online retail setting, expanding our understanding of how e-commerce logistics are distributed, and giving us a point of reference as we work to further the evolution of the industry as a whole [6].

The problem statement is as follows. In general, accurate yield estimation is lacking in the agriculture sector. KNN is therefore used to provide an accurate analysis of yield estimation and to enhance crop management and decision-making.

The contributions of this work are as follows.

1) Several facets of the study, implementation, and practical application contribute to the advancement of agricultural yield estimation using the KNN method.
2) Gather important agricultural data, such as records of past yields, meteorological information, soil properties, and other relevant variables.
3) Use the KNN algorithm to estimate agricultural yields. Experiment extensively to find the optimal value for critical factors like the number of neighbors (k). Produce efficient and accurate predictions using a well-optimized model.
4) Optimize the model for efficiency and tackle scalability issues, particularly for real-time applications or massive agricultural datasets. For real-world use in many agricultural contexts, a model that is both efficient and scalable is crucial.
5) To evaluate the accuracy and reliability of the KNN-based yield estimation model, it is necessary to establish rigorous validation processes and performance indicators. Thorough validation helps ensure that the model achieves the targeted accuracy levels and works well in various situations.
6) The creation, improvement, and actual deployment of yield estimation models in agriculture using the KNN method are all aided by these contributions.
7) The overall effect is improved crop management and decision-making, through the provision of accurate, interpretable, and actionable information to stakeholders and farmers.

The rest of the paper is organized as follows. Section 2 presents the literature survey. Section 3 describes the proposed system, which uses the KNN algorithm for agricultural yield estimation. Section 4 presents the results and discussion for the given dataset, with a focus on accuracy and error rate. Finally, the conclusion summarizes the overall performance of the proposed system and gives recommendations for future work.

II. LITERATURE SURVEY

During the COVID-19 epidemic, there have been considerable obstacles to the growth of international agricultural commerce. Effective study of agricultural trade requires better processing of international agricultural trade data using machine learning methods. Accurate yield assessment for the many commodities traded internationally is a crucial problem in the field of international agricultural commerce. Practical and effective solutions to this challenge can only be achieved via the use of data mining tools [7].

Features are extracted from input elements and desired outputs using the Reverse Design and Similarity Relationship methods, respectively. The dataset was collected from the sugarcane field, and the model makes use of data mining to identify and categorize the data. According to the findings, the best model performance is attained using the following settings: (1) five Input Factors, (2) 32 Target Outputs, and (3) the Random Forests method. The model accurately identified the 2019 training data with a 98.21% success rate, and it properly predicted the yields of the 2019 test data with an 89.58% success rate (10.42% error). The Wonder Cane model has an accuracy of 98.69% (error of 1.31%) when predicting the harvest yield of a 2020 dataset. In light of this, the Wonder Cane model is a reliable and effective resource that may help the sugar industry get a more precise estimate of sugarcane production before harvest [8].

Predicting the future of a company is a common function of many business apps. One of the most important functions of a company is the marketing of its goods. The information provided by customers' wants is invaluable for marketing the right items at the right time. In addition, services are increasingly being seen as commodities. The improvement of healthcare and education is dependent on archival records. Problems and crimes committed on social media platforms need a substantial data set in order to be effectively mitigated.

Predicting the future of such companies requires data analysts to use an effective categorization system. However, processing a massive amount of data takes a lot of time. Data mining is a broad term, including a wide range of methodologies for predicting statistical data in different commercial contexts. Classification is a popular method that may be accomplished in many ways. This study takes a look at the accuracy of many categorization methods used in diverse data mining settings [9].

A review of 20 articles from the literature allows for a thorough evaluation. The purpose of these documents is to provide guidance to data analysts as they choose which classification algorithm will be most effective for a broad range of commercial uses, such as those found in online social media networks, agriculture, health, and education.
2024 International Conference on Automation and Computation (AUTOCOM)
According to the findings, when it comes to classifying agricultural datasets, the N.B. (Naive Bayes) method provides the best results. When it comes to classifying data in the health area, OneR is the most reliable algorithm. When it comes to classifying students' records and making predictions about when they will finish their degrees, the C4.5 Decision Tree algorithm performs the best.

This research uses Bayesian networks, a data mining technique, to analyze ecosystem data in an effort to learn more about the static and dynamic advantages of converting cropland back into forests and ecological compensation. To cut down on training expenses and make the model's structure more straightforward, a limited network is recommended. Concurrently, many models are trained to address the same issue using ensemble learning, which improves prediction accuracy over a single model. In addition, this paper uses data mining to investigate the ecosystem's design goals, components, and static frameworks, depict its operational and evolutionary mechanisms, and establish an assessment framework for reforesting abandoned farmland and compensating the environment. This article carefully proves the static and dynamic advantages via tests and quantitative statistics, and it also takes into account the present circumstances. According to the results, the framework developed in this study has a significant enough influence to be considered a viable building model [10].

India's agricultural sector is one of the country's most distinctive and lucrative industries. Changes in weather or other environmental factors may have a devastating effect on farmers and ranchers. Limiting these factors is possible via the use of a strategy determined by data on soil type, strength, reasonable climate, and crop kind. These are the bare minimums we want to see from the Soil Features project: the ability to determine whether or not a certain plot of land is adaptable for agricultural use, the presence of optimal growing conditions, and an increase in the accuracy of calculations and comparisons in order to zero in on the best option. This motivates us to designate some areas as suitable for farming while others are set aside for other uses [11].

This motivates us to advance agriculture and gardening. Data on soil characteristics such as location, availability, soil surface, water system scaling, rotation, yield, soil erosion, wind erosion, slope, drainage, and more are all stored in the system. Features are extracted, selected, and scaled using a chi-square feature computation, which suppresses background noise and emphasizes the data the framework can use. Methods like Random Forest (R.F.) and Linear Discriminant Analysis help build the model and improve its accuracy. The findings demonstrate that the proposed scheme is not only feasible but also useful for ranchers in comprehending their ecological list of homesteads [12].

Neo4j is not only a database but one that can also exploit the relationships between data. In order to actualize the networked management of crop information, "crop portraits," a kind of property graph, represent the crop entities in the real world based on information. One crop variety, insufficient descriptions, and a lack of agricultural understanding are only a few of the problems with the current crop knowledge base. Creating crop portraits may help fill in the gaps and provide a more accurate depiction of crops. After selecting labels to create crop portraits that include three categories (crop, pesticide, and disease and pest), this study used the graph database (Neo4j) to store and display these portrait data. The original data came from agricultural questions-and-answers data and familiar sciences data obtained via text crawling [13].

Based on data mining, crop portraits disclose disease and pest prevalence trends, show that there is little to no fundamental link between various diseases and pests, and provide a range of pesticides with which to deal with them. The findings confirmed the usefulness of crop portrait construction for agricultural analysis, demonstrating the method's potential in both applied and theoretical settings of big data analytics. This study utilizes a piece of software that employs data mining methods and machine learning methods to estimate the impact of various factors on student performance [14].

The model was constructed using an existing data set that includes both input variables and final scores. The focus on higher education, the number of absences, the amount of study time, the education level of the parents, the occupations of the parents, and the number of previous failures are all criteria that are evaluated. This finding demonstrates that only students' study habits and absences have any impact on their grades. Predicting how students will do in school allows teachers to better understand their classrooms and make preventative efforts to boost their pupils' learning. This study also explores how the prediction algorithm might be utilized to zero in on a student's most valuable data points [15].

III. PROPOSED SYSTEM

Over the last several decades, information technology has grown more ubiquitous. Modern agriculture is a prime example of a sector that may benefit greatly from the use of Information Technology (I.T.). These days, a farmer collects more than just agricultural data. These data are specific and limited in scope. However, collecting massive volumes of data is often both a boon and a bane. There is a plethora of data at one's disposal that may be mined for insights on a particular asset. Here, producers may benefit from the favorable soil and crop yield characteristics. Figure 1 shows the system architecture of the proposed system.

[Figure 1: block diagram with the blocks Data Provider, Raw data, Pre-processing, Clean data, Attribute analysis, Useful data, and Experiences with the KNN]
Fig. 1. System architecture of the proposed system

The phrase "data mining" was developed to describe an all too typical issue. The goal of data mining is to discover patterns and insights of relevance to the farmer within the data. Yield prediction is an example of a typical specialized issue. A farmer who is curious about his expected harvest should find out as soon as possible in the growing season how much output he might expect. In the past, farmers' years of experience with a certain yield, crop, and climate were used
and economic situations, and the like, all interact to determine crop area and production. State agencies, credit institutions, seed/fertilizer/pesticide agencies, and many other public and private sector partners are actively engaged in increasing the productivity of various crops in various agro-climate regions through the implementation of a wide variety of schemes. Nonetheless, agricultural yield swings persist as a major source of concern in the industry. It is crucial for government agencies to estimate agricultural yields in order to track the industry's development and provide appropriate insurance coverage. Departments of Economics and Statistics, Agriculture, and Revenue all contribute to the estimating process. The information collected by the government is used by researchers and a wide variety of other organizations. Satellite photos of crop slate are increasingly being utilized to estimate the area, but productivity statistics must come from crop-cutting experiments since they are often only accessible in aggregate form.

[Figure 2: bar chart of error value (vertical axis, 0 to 3) for SVM and KNN]
Fig. 2. Error values of the proposed system

$$\text{Accuracy} = \frac{\text{Total correct predictions}}{\text{Total number of predictions}} \qquad (2)$$
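As a minimal sketch of how a KNN yield classifier can be scored with the accuracy measure of Eq. (2), that is, correct predictions divided by total predictions, the Python example below labels a few field records as "low" or "high" yield. The feature values (rainfall and fertilizer use, pre-scaled to the range 0 to 1), the labels, the choice k = 3, and the helper names `knn_predict` and `accuracy` are illustrative assumptions, not the dataset or implementation used in this study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training sample, nearest first
    neighbours = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k_labels = [label for _, label in neighbours[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Eq. (2): total correct predictions / total number of predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical training records: (scaled rainfall, scaled fertilizer use)
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.30),
           (0.80, 0.90), (0.90, 0.70), (0.85, 0.80)]
train_y = ["low", "low", "low", "high", "high", "high"]

# Hypothetical held-out records with their observed yield class
test_X = [(0.20, 0.20), (0.90, 0.90), (0.40, 0.40), (0.60, 0.60)]
test_y = ["low", "high", "low", "low"]

predictions = [knn_predict(train_X, train_y, q, k=3) for q in test_X]
acc = accuracy(test_y, predictions)
error_rate = 1.0 - acc  # assumption: error is taken as the complement of accuracy

print(f"accuracy={acc:.2f}, error={error_rate:.2f}")
```

On this toy data, three of the four held-out records are classified correctly, so the accuracy is 0.75 and the error rate is 0.25. Because KNN relies on distances, features measured on different scales would normally be normalized before use, and k would be tuned on validation data rather than fixed in advance.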
Time series, cross-sectional, and longitudinal data may be used in either a formal statistical approach or a more informal, judgment-based approach. In hydrology, for instance, the terms "forecast" and "forecasting" may be reserved for estimations of value at certain future times, while the term "prediction" may be used for more general estimates, such as the frequency with which floods will occur over a long period. Indicating the level of uncertainty attached to projections is typically regarded as good practice since risk and uncertainty are fundamental to forecasting and prediction. In any event, for the prediction to be as precise as possible, the data should be up to date.

V. CONCLUSIONS

Using input characteristics that account for data point similarities, the KNN model accurately predicts crop yields. In order to allocate resources, plan harvests, and manage crops effectively, stakeholders and farmers may depend on accurate yield projections. Due to its instance-based design, the KNN algorithm can rapidly adjust to new situations in the real world, skipping the time-consuming training process altogether. Resolving new issues, enhancing model performance, and investigating potential new applications might be part of future work in agricultural yield estimation research using the KNN method. This highlights the need for a non-subjective approach to crop forecasting in the lead-up to harvest. This necessitates the construction of an appropriate prediction model using the KNN algorithm, which has certain advantages over the standard forecasting approach. These benefits include the forecast's impartiality and its ability to deliver a degree of dependability that would be impossible with a more conventional forecasting approach. This emphasizes the significance of using objective methodologies to predict agricultural yields in India prior to harvest. This study uses data mining methods to predict the cost of crops by analyzing historical data. Although several enhancements to the fundamental algorithm exist, the applications that use the KNN technique make use only of the basic algorithm. Not all data mining methods have been tried out on farming issues yet. Regression methods, for instance, may be used to analyze agricultural databases for hidden insights. In order to improve the precision of price forecasts in the future, experts want to build neural networks using genetic algorithms.

REFERENCES

[1] Z. Rao and J. Yuan, "Data mining and statistics issues of precision and intelligent agriculture based on big data analysis," Acta Agriculturae Scandinavica, Section B - Soil & Plant Science, vol. 71, no. 9, pp. 870-883, 2021.
[2] M.U.A. Ayoobkhan and L.A.K.S. Ali, "Web page recommendation system by integrating ontology and stemming algorithm," International Journal of Advances in Signal and Image Sciences, vol. 8, no. 1, pp. 9-16, 2022.
[3] K. Vijayalakshmi, R. Raman, G. Venkatesh, C.J. Rawandale, G. Kalaimani, and C. Srinivasan, "Smart Energy Management using IoT-based Embedded Systems," International Conference on Sustainable Communication Networks and Application, pp. 299-304, 2023.
[4] H. Gao, "Agricultural Soil Data Analysis Using Spatial Clustering Data Mining Techniques," in IEEE 13th International Conference on Computer Research and Development, pp. 83-90, 2021.
[5] C.S. Ranganathan, R. Raman, K.K. Sutaria, R.A. Varma, and S. Murugan, "Network Security in Cyberspace using Machine Learning Techniques," Seventh International Conference on Electronics, Communication and Aerospace Technology, pp. 707-711, 2023.
[6] L. Yan, "Development of international agricultural trade using data mining algorithms-based trade equality," Mobile Information Systems, pp. 1-9, 2021.
[7] B. Tanut, R. Waranusast, and P. Riyamongkol, "High accuracy pre-harvest sugarcane yield forecasting model utilizing drone image analysis, data mining, and reverse design method," Agriculture, vol. 11, no. 7, pp. 1-6, 2021.
[8] C.S. Ranganathan, U.P. Nandekar, R. Raman, C. Srinivasan, and R. Adhvaryu, "Rural Automatic Healthcare Dispatch with Real-Time Remote Monitoring," Second International Conference On Smart Technologies For Smart Nation, pp. 575-579, 2023.
[9] B.R. Babu, M.A. Haile, D.T. Haile, and D. Zerihun, "Real-time sensor data analytics and visualization in cloud-based systems for forest environment monitoring," International Journal of Advances in Signal and Image Sciences, vol. 9, no. 1, pp. 29-39, 2023.
[10] B. Meenakshi, A. Vanathi, B. Gopi, S. Sangeetha, L. Ramalingam, and S. Murugan, "Wireless Sensor Networks for Disaster Management and Emergency Response using SVM Classifier," Second International Conference On Smart Technologies For Smart Nation, pp. 647-651, 2023.
[11] K.I. Taher, A.M. Abdulazeez, and D.A. Zebari, "Data mining classification algorithms for analyzing soil data," Asian Journal of Research in Computer Science, vol. 8, no. 2, pp. 17-28, 2021.
[12] M. Santhanalakshmi, S. Dhanalakshmi, M. Radhika, G. Kavitha, G. Elavel Visuvanathan, and C. Srinivasan, "IoT Enabled Wearable Technology Jacket for Tracking Patient Health and Safety System," Second International Conference On Smart Technologies For Smart Nation, pp. 918-922, 2023.
[13] Y.X. Shi, B.K. Zhang, Y.X. Wang, H.Q. Luo, and X. Li, "Constructing crop portraits based on graph databases is essential to agricultural data mining," Information, vol. 12, no. 6, pp. 1-7, 2021.
[14] R. Raman, V. Sujatha, C.B. Thacker, K. Bikram, M.B. Sahaai, and S. Murugan, "Intelligent Parking Management Systems using IoT and Machine Learning Techniques for Real-Time Space Availability Estimation," International Conference on Sustainable Communication Networks and Application, pp. 286-291, 2023.
[15] A.J. Suarez, B. Singh, F.H. Almukhtar, R. Kler, S. Vyas, and K. Kaliyaperumal, "Identifying smart strategies for effective agriculture solution using data mining techniques," Journal of Food Quality, pp. 1-9, 2022.