IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 2971~2977
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp2971-2977
Journal homepage: http://ijai.iaescore.com
Encoder-decoder approach for describing health of cauliflower
plant in multiple languages
Parag Jayant Mondhe1, Manisha P. Satone1, Namrata N. Wasatkar2
1 Department of Electronics and Telecommunication Engineering, Matoshri College of Engineering and Research Centre, Nashik, India
2 Department of Computer Engineering, Vishwakarma Institute of Information Technology, Pune, India
Article Info

Article history: Received Jun 21, 2023; Revised Jan 24, 2024; Accepted Feb 10, 2024

Keywords: Aerial images; Cauliflower plant; Description generation; Encoder-decoder approach; Multi-language captions

ABSTRACT

Physically examining each plant to determine its state of health, and identifying the disease when a plant is affected, is challenging. An encoder-decoder approach is proposed for describing the health of the cauliflower plant in the English, Hindi, and Marathi languages from aerial images. Experiments are performed with combinations of different convolutional neural network (CNN) models and long short-term memory (LSTM) networks. The multilanguage cauliflower captions dataset (MCCD) is developed to evaluate the performance of the model. The dataset contains 1213 images, each described in three different languages. It contains images of cauliflower plants affected by bacterial spot rot, black rot, and downy mildew diseases, as well as images of healthy plants. Objective metrics such as bilingual evaluation understudy (BLEU) scores and subjective criteria are used to assess the quality of the generated descriptions.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Parag Jayant Mondhe
Department of Electronics and Telecommunication Engineering
Matoshri College of Engineering and Research Centre
Eklahareshivar, Near Odhagaon, Opp Nashik-Aurangabad Highway, Nashik, Maharashtra 422105, India
Email: mondheparag@gmail.com
1. INTRODUCTION
Manual methods of crop monitoring are labour intensive, time consuming, and may result in grossly inaccurate estimates. These challenges can be addressed by automating the monitoring of crop health, which this paper proposes. An encoder-decoder approach is presented for describing the health of the cauliflower plant in the English, Hindi, and Marathi languages. Aerial images of farmland can be taken using an unmanned aerial vehicle (UAV) and then analysed by a machine learning algorithm to describe the health of the crop through captions in multiple languages.
Hindi is the official language of the Government of India and is among the top five most widely spoken languages globally [1]. Marathi, the third most widely spoken language in India, is the official language of the state of Maharashtra [2]. Cauliflower is a member of the cruciferous family and has high nutritional value, containing vitamins, minerals, fiber, and antioxidants [3], [4]. Figure 1 shows the global production/yield quantities of cauliflowers and broccoli from 1994 to 2020 [5]. Figure 2 shows the top cauliflower and broccoli producing countries during 1994 to 2021 [5]; during this period India was the second largest producer. The cauliflower plant is commonly affected by bacterial spot rot, black rot, and downy mildew diseases. Previous research concentrated primarily on recognizing cauliflower diseases from images [6]‒[10] and did not explore generating descriptions of plant health from aerial images in multiple languages. Three types of methodologies are used for description generation: encoder-decoder approaches [11]‒[16], image retrieval [17], and object recognition
[18]. In comparison to other approaches, the encoder-decoder approach produced superior results [19]. As a
result, an encoder-decoder approach is proposed here for generating descriptions in multiple languages.
Figure 1. Global production/yield quantities of cauliflowers and broccoli [5]
Figure 2. Top cauliflower and broccoli producing countries [5]
2. PROPOSED ENCODER–DECODER APPROACH
The proposed encoder-decoder method for describing the health of a cauliflower plant in several languages from an aerial image is depicted in Figure 3. Images of farmland are captured using a UAV such as a drone and act as the input to the encoder block.
Figure 3. Proposed methodology for explaining health of cauliflower plant in multiple languages
Using aerial images of cauliflower fields as input, a convolutional neural network (CNN) acts as the encoder in the model, extracting features from them. To avoid losing spatial information, the output of the CNN model's pooling layer is used instead of the last fully connected layer. The decoder is given the CNN-generated features along with the corresponding descriptions. In the proposed method, a long short-term memory (LSTM) network serves as the decoder, since it overcomes the vanishing gradient problem of recurrent neural networks (RNNs) [20].
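As a concrete illustration, the following Keras sketch wires pre-extracted CNN features and a partial caption into an LSTM decoder using the common "merge" captioning architecture. The feature dimension, vocabulary size, caption length, and embedding size are hypothetical values, since the paper does not report its exact configuration.

```python
# A minimal CNN-LSTM captioning model: image features and the caption
# generated so far are embedded separately, merged, and used to predict
# a probability distribution over the next word.
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model

FEATURE_DIM, VOCAB_SIZE, MAX_LEN, EMBED_DIM = 512, 5000, 20, 256  # assumed sizes

# Encoder branch: CNN features (e.g. pooled VGG-16 output) projected
# into the space shared with the text branch.
image_input = Input(shape=(FEATURE_DIM,))
image_branch = Dense(EMBED_DIM, activation="relu")(Dropout(0.5)(image_input))

# Decoder branch: the partial caption, embedded and fed to an LSTM.
caption_input = Input(shape=(MAX_LEN,))
caption_branch = Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True)(caption_input)
caption_branch = LSTM(256)(Dropout(0.5)(caption_branch))

# Merge both branches and predict the next word.
merged = Dense(256, activation="relu")(add([image_branch, caption_branch]))
next_word = Dense(VOCAB_SIZE, activation="softmax")(merged)

model = Model(inputs=[image_input, caption_input], outputs=next_word)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```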
During the training phase, the decoder learns to produce a description of an image from its features and the previously generated words. Each word is assigned a probability conditioned on the image features and the preceding words. Figure 4 depicts the flowchart of the proposed encoder-decoder approach. The same approach generates captions in the other languages because words are transformed to numbers in step 8 of the flowchart.
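The sketch below illustrates step 8 and the resulting word-by-word generation loop, assuming a Keras Tokenizer fitted per language and the hypothetical model from the previous sketch; the startseq/endseq markers and the greedy decoding strategy are illustrative assumptions rather than details taken from the paper.

```python
# Step 8: words become integer ids via a per-language word-to-number map.
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

english_captions = [
    "startseq bacterial spot rot has infected the head of a cauliflower plant endseq",
    "startseq a healthy and fresh cauliflower crop is growing at a reasonable rate endseq",
]
tokenizer = Tokenizer()                   # Hindi and Marathi get their own tokenizers
tokenizer.fit_on_texts(english_captions)  # builds the word-to-number map
index_to_word = {i: w for w, i in tokenizer.word_index.items()}

def generate_caption(model, features, max_len=20):
    """Greedy decoding: append the most probable next word at every step."""
    caption = "startseq"
    for _ in range(max_len):
        seq = pad_sequences(tokenizer.texts_to_sequences([caption]), maxlen=max_len)
        probs = model.predict([features, seq], verbose=0)[0]
        word = index_to_word.get(int(np.argmax(probs)))
        if word is None or word == "endseq":
            break
        caption += " " + word
    return caption.replace("startseq", "").strip()
```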
Experiments on feature extraction were carried out using several pre-trained CNN architectures, including visual geometry group 16 (VGG-16) [21], InceptionResNetV2 [22], and EfficientNetV2L [23]. The top-1 accuracies of the VGG-16, InceptionResNetV2, and EfficientNetV2L models on the ImageNet validation dataset are 71.30%, 80.30%, and 85.70% respectively [24]; top-1 accuracy is the fraction of images for which the predicted label matches the target label. The top-5 accuracies of the same models are 90.10%, 95.30%, and 97.50% respectively [24]; top-5 accuracy is the fraction of images for which the target label is among the top five predicted labels.
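These definitions can be made concrete in a few lines of NumPy. In this sketch, probs and labels are hypothetical placeholders for a model's predicted class probabilities and the ground-truth classes.

```python
# Top-k accuracy: a prediction is correct if the target label appears
# among the k highest-probability classes (k=1 gives top-1 accuracy,
# k=5 gives top-5 accuracy, as defined above).
import numpy as np

def top_k_accuracy(probs, labels, k):
    top_k = np.argsort(probs, axis=1)[:, -k:]   # indices of the k best classes
    hits = [labels[i] in top_k[i] for i in range(len(labels))]
    return float(np.mean(hits))

probs = np.random.rand(8, 1000)                 # dummy scores: 8 images, 1000 classes
labels = np.random.randint(0, 1000, size=8)
print(top_k_accuracy(probs, labels, k=1), top_k_accuracy(probs, labels, k=5))
```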
VGG-16 is a CNN architecture named after the Visual Geometry Group that developed it; the 16 denotes its 16 weight layers, comprising 13 convolutional layers and 3 fully connected layers. The design stacks small 3×3 convolutional filters on top of each other, allowing more detailed aspects of the input image to be captured, and follows a straightforward, consistent pattern in which convolutional layers are followed by max-pooling layers that reduce the spatial dimensions. The VGG-16 model was trained on the ImageNet dataset, which comprises millions of labelled images from diverse object categories.
The InceptionResNetV2 model combines the Inception and ResNet designs. The underlying Inception architecture combines convolutional layers with varying kernel sizes to collect information at various scales, while the residual connections adopted from ResNet ease the training of such a deep network. EfficientNetV2L is a CNN model from the EfficientNet family; the "L" denotes the large variant of the EfficientNetV2 architecture, which was designed for smaller model size and faster training [23].
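Assuming the backbones come from keras.applications, which is also the source of the accuracy figures cited above [24], the sketch below extracts pooled features with the classification head removed, in line with the use of the pooling-layer output described earlier; the input size and the skipped per-model preprocessing are illustrative simplifications.

```python
# Pooled feature extraction with the three pre-trained backbones.
# include_top=False drops the fully connected classification head and
# pooling="avg" returns the globally pooled convolutional features.
import numpy as np
from tensorflow.keras.applications import VGG16, InceptionResNetV2, EfficientNetV2L

encoders = {
    "vgg16": VGG16(weights="imagenet", include_top=False, pooling="avg"),
    "inception_resnet_v2": InceptionResNetV2(weights="imagenet",
                                             include_top=False, pooling="avg"),
    "efficientnet_v2l": EfficientNetV2L(weights="imagenet",
                                        include_top=False, pooling="avg"),
}

batch = np.random.rand(1, 224, 224, 3).astype("float32")  # dummy aerial image
features = encoders["vgg16"].predict(batch)               # shape (1, 512) for VGG-16
# In practice each backbone's own preprocess_input should be applied first.
```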
Figure 4. Flowchart of the proposed approach
3. DATASET
The multilanguage cauliflower captions dataset (MCCD) is developed to evaluate the proposed model's performance. The dataset comprises 1213 colour images, each accompanied by a caption written in English, Hindi, and Marathi. Of these, 656 images were captured in Bangladesh and provided in [25]; the remaining 557 images were captured by the authors in 2023 using a drone on farmland in the Nashik district of Maharashtra State, India. The composition of the MCCD is shown in Table 1.
Table 1. Composition of MCCD
Category                   Images from Bangladesh [25]   Images captured in India   Total images
Bacterial spot rot         173                           126                        299
Black rot                  100                           158                        258
Downy mildew               177                           106                        283
No disease/healthy plant   206                           167                        373
Total                      656                           557                        1213
An agronomist captioned each image with information on the health of the cauliflower plant in Marathi, Hindi, and English; the dataset therefore contains 1213 captions in each of the three languages. It includes images of cauliflower plants infected by bacterial spot rot, black rot, and downy mildew, as well as images of healthy plants. The images were acquired from multiple sources and in various weather conditions, resulting in a diversified collection.
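The sketch below shows one way such image-caption triples could be paired in code; the captions.csv layout (columns image, english, hindi, marathi) is purely hypothetical, since the paper does not describe the on-disk format of MCCD.

```python
# Pair each MCCD image with its three reference captions.
import csv

dataset = []
with open("captions.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        dataset.append({
            "image": row["image"],
            "captions": {lang: row[lang] for lang in ("english", "hindi", "marathi")},
        })

assert len(dataset) == 1213  # 656 images from Bangladesh [25] + 557 captured in India
```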
4. RESULTS AND DISCUSSION
4.1. Evaluation on objective metrics
The quality of the captions produced by the proposed approach is validated using the quantitative bilingual evaluation understudy (BLEU) metric. BLEU was established in [26]; it uses a weighted average to compare phrase matches of various lengths against the reference sentence. It counts how often each n-gram, an ordered sequence of one or more words, appears in both the generated caption and the dataset's reference caption. BLEU is a precision-based score ranging from 0 to 1, with higher values indicating a better match. For calculation of the score, the brevity penalty (BP) is computed as (1):
$BP = \begin{cases} 1, & \text{if } c > r \\ e^{(1 - r/c)}, & \text{if } c \le r \end{cases}$ (1)
where $c$ is the length of the generated caption and $r$ is the length of the reference caption. The BLEU score is then computed as (2):
$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$ (2)
where $p_n$ is the modified n-gram precision, $N$ is the maximum n-gram length, and $w_n$ are positive weights that sum to one.
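Equations (1) and (2) translate directly into code. The sketch below is a simplified single-reference implementation over whitespace-tokenized captions; an established implementation such as NLTK's corpus_bleu would normally be preferred.

```python
# BLEU with brevity penalty, following equations (1) and (2).
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    clipped = sum(min(count, ref[g]) for g, count in cand.items())  # clip counts
    return clipped / max(1, sum(cand.values()))

def bleu(candidate, reference, N=4):
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)                      # equation (1)
    weights = [1.0 / N] * N                                         # w_n sum to one
    log_sum = sum(w * math.log(max(modified_precision(candidate, reference, n), 1e-12))
                  for n, w in zip(range(1, N + 1), weights))
    return bp * math.exp(log_sum)                                   # equation (2)

cand = "bacterial spot rot has infected the head of a cauliflower plant".split()
print(bleu(cand, cand, N=4))  # identical sentences give BLEU-4 = 1.0
```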
The BLEU score is computed for various values of N. BLEU-1 uses unigram precision, while BLEU-2 uses the geometric average of unigram and bigram precision. BLEU-3 uses the geometric average of unigram, bigram, and trigram precision, and BLEU-4 the geometric average of unigram, bigram, trigram, and four-gram precision. Table 2 presents the BLEU metrics for the CNN models considered. From VGG-16 through InceptionResNetV2 to EfficientNetV2L, caption quality improves on all four BLEU measures. For a given model, the BLEU scores of the captions generated in the different languages are also comparable.
Table 2. Results on objective metrics
CNN model           Language of caption   BLEU-1   BLEU-2   BLEU-3   BLEU-4
VGG-16              English               0.79     0.76     0.73     0.69
VGG-16              Hindi                 0.78     0.74     0.72     0.69
VGG-16              Marathi               0.78     0.75     0.73     0.68
InceptionResNetV2   English               0.86     0.81     0.77     0.74
InceptionResNetV2   Hindi                 0.86     0.82     0.77     0.73
InceptionResNetV2   Marathi               0.85     0.81     0.76     0.72
EfficientNetV2L     English               0.90     0.84     0.80     0.77
EfficientNetV2L     Hindi                 0.90     0.83     0.79     0.75
EfficientNetV2L     Marathi               0.89     0.82     0.80     0.75
4.2. Evaluation on subjective criteria
The BLEU score only compares the generated caption against the reference captions. In the subjective evaluation, an independent agronomist thoroughly verifies each generated description. Based on quality, the generated captions are placed into three categories: correct, partially correct, and incorrect. The findings on the subjective criteria for the CNN models are displayed in Table 3. Overall, caption quality improves from VGG-16 to InceptionResNetV2 to EfficientNetV2L.
Figure 5 compares the performance of the CNN models on the subjective criteria. As with the objective assessment, the performance of a given model is similar across the languages. The InceptionResNetV2 and EfficientNetV2L models beat the VGG-16 model, which is comparatively shallow because of its low number of layers.
Figure 6 shows results for the EfficientNetV2L CNN model, while Figure 7 shows results for the InceptionResNetV2 CNN model. Figure 6 contains a test image of a cauliflower plant affected by bacterial spot rot disease, and Figure 7 contains a test image of a healthy cauliflower plant.
If the generated description accurately portrays the health of the cauliflower plant with no grammatical errors, it falls into the correct category. The captions generated in English and Hindi for the test image in Figure 6 are categorized as correct, as is the English caption for the test image in Figure 7. If the description is erroneous, linguistically faulty, or useless, it is placed in the incorrect category; the Marathi caption generated for the test image in Figure 7 falls into this category because it is meaningless.
Table 3. Results on subjective criteria
CNN model           Language of caption   Correct caption (%)   Partially correct caption (%)   Incorrect caption (%)
VGG-16              English               77.00                 15.11                           7.89
VGG-16              Hindi                 76.08                 15.13                           8.79
VGG-16              Marathi               76.03                 14.09                           9.88
InceptionResNetV2   English               84.19                 11.03                           4.78
InceptionResNetV2   Hindi                 85.16                 8.87                            5.97
InceptionResNetV2   Marathi               83.97                 10.11                           5.92
EfficientNetV2L     English               87.60                 10.81                           1.59
EfficientNetV2L     Hindi                 86.97                 11.94                           1.09
EfficientNetV2L     Marathi               86.97                 11.02                           2.01
Figure 5. Comparison of results on subjective criteria
Language   Reference caption                                                   Generated caption                                                   Category of generated caption
English    Bacterial spot rot has infected the head of a cauliflower plant.   Bacterial spot rot has infected the head of a cauliflower plant.   Correct
Hindi      (caption in Hindi)                                                  (caption in Hindi)                                                  Correct
Marathi    (caption in Marathi)                                                (caption in Marathi)                                                Partially correct
Figure 6. Results for EfficientNetV2L CNN model
Language   Reference caption                                                        Generated caption                                                        Category of generated caption
English    A healthy and fresh cauliflower crop is growing at a reasonable rate.   A healthy and fresh cauliflower crop is growing at a reasonable rate.   Correct
Hindi      (caption in Hindi)                                                       (caption in Hindi)                                                       Partially correct
Marathi    (caption in Marathi)                                                     (caption in Marathi)                                                     Incorrect
Figure 7. Results for InceptionResNetV2 CNN model
A caption that correctly characterizes the plant's health but does not give specifics is classified as partially correct. The Marathi caption generated for the test image in Figure 6 is categorized as partially correct because it indicates that the cauliflower plant is affected but fails to name the disease. Similarly, the Hindi caption generated for the test image in Figure 7 falls into the same category because it omits other details of the healthy plant.
5. CONCLUSIONS
An encoder-decoder approach is proposed to generate captions in multiple languages describing the health of the cauliflower plant from aerial images. Experiments were performed with combinations of an LSTM decoder and CNN encoders such as VGG-16, InceptionResNetV2, and EfficientNetV2L. On both the BLEU scores and the subjective criteria, the EfficientNetV2L-LSTM combination provided superior results. InceptionResNetV2 and EfficientNetV2L performed substantially better than VGG-16 because their greater depth yields better feature extraction. The captions generated in the various languages were of comparable quality.
REFERENCES
[1] Ministry of Electronics and Information Technology India, “Internet of things,” MeitY, 2016. [Online]. Available: https://www.meity.gov.in/content/internet-things
[2] “Maharashtra state government.” [Online]. Available: https://www.maharashtra.gov.in/
[3] J. Wang et al., “A comparative study on the nutrients, mineral elements, and antioxidant compounds in different types of
cruciferous vegetables,” Agronomy, vol. 12, no. 12, 2022, doi: 10.3390/agronomy12123121.
[4] “United States of America (USA), Department of Agricultural Research Service,” USDA. [Online]. Available: https://fdc.nal.usda.gov/
[5] FAO, “Food and agriculture organization of the United Nations,” International Organization, 1947. [Online]. Available: https://www.fao.org
[6] T. Y. Orin, M. U. Mojumdar, S. M. T. Siddiquee, and N. R. Chakraborty, “Cauliflower leaf disease detection using computerized
techniques,” 2021 IEEE 6th International Conference on Computing, Communication and Automation, ICCCA 2021. IEEE, pp.
730–733, 2021, doi: 10.1109/ICCCA52192.2021.9666437.
[7] S. K. Maria, S. S. Taki, M. J. Mia, A. A. Biswas, A. Majumder, and F. Hasan, “Cauliflower disease recognition using machine
learning and transfer learning,” Smart Innovation, Systems and Technologies, vol. 235. Springer Singapore, pp. 359–375, 2022,
doi: 10.1007/978-981-16-2877-1_33.
[8] M. A. Malek, S. S. Reya, N. Zahan, M. Z. Hasan, and M. S. Uddin, “Deep learning-based cauliflower disease classification,”
Algorithms for Intelligent Systems. Springer Singapore, pp. 171–186, 2022, doi: 10.1007/978-981-16-9991-7_11.
[9] S. Kashyap, T. Thaware, S. R. Sahu, and K. M. Rao, “Multi-crop leaf disease detection using deep learning methods,” INDICON
2022 - 2022 IEEE 19th India Council International Conference. IEEE, 2022, doi: 10.1109/INDICON56171.2022.10040099.
[10] K. P. A. Rani and S. Gowrishankar, “Pathogen-based classification of plant diseases: a deep transfer learning approach for
intelligent support systems,” IEEE Access, vol. 11, pp. 64476–64493, 2023, doi: 10.1109/ACCESS.2023.3284680.
[11] B. Qu, X. Li, D. Tao, and X. Lu, “Deep semantic understanding of high resolution remote sensing image,” 2016 International
Conference on Computer, Information and Telecommunication Systems. IEEE, 2016, doi: 10.1109/CITS.2016.7546397.
[12] X. Zhang, X. Li, J. An, L. Gao, B. Hou, and C. Li, “Natural language description of remote sensing images based on deep
learning,” International Geoscience and Remote Sensing Symposium (IGARSS), vol. 2017, pp. 4798–4801, 2017, doi:
10.1109/IGARSS.2017.8128075.
[13] X. Zhang, Q. Wang, S. Chen, and X. Li, “Multi-scale cropping mechanism for remote sensing image captioning,” International
Geoscience and Remote Sensing Symposium (IGARSS), vol. 2019, pp. 10039–10042, 2019, doi: 10.1109/IGARSS.2019.8900503.
[14] X. Lu, B. Wang, X. Zheng, and X. Li, “Exploring models and data for remote sensing image caption generation,” IEEE
Transactions on Geoscience and Remote Sensing, vol. 56, no. 4, pp. 2183–2195, 2018, doi: 10.1109/TGRS.2017.2776321.
[15] S. C. Kumar, M. Hemalatha, S. B. Narayan, and P. Nandhini, “Region driven remote sensing image captioning,” Procedia
Computer Science, vol. 165, pp. 32–40, 2019, doi: 10.1016/j.procs.2020.01.067.
[16] G. Hoxha and F. Melgani, “A novel SVM-based decoder for remote sensing image captioning,” IEEE Transactions on
Geoscience and Remote Sensing, vol. 60, pp. 1–14, 2022, doi: 10.1109/TGRS.2021.3105004.
[17] B. Wang, X. Lu, X. Zheng, and X. Li, “Semantic descriptions of high-resolution remote sensing images,” IEEE Geoscience and
Remote Sensing Letters, vol. 16, no. 8, pp. 1274–1278, 2019, doi: 10.1109/LGRS.2019.2893772.
[18] Z. Shi and Z. Zou, “Can a machine generate humanlike language descriptions for a remote sensing image?,” IEEE Transactions
on Geoscience and Remote Sensing, vol. 55, no. 6, pp. 3623–3634, 2017, doi: 10.1109/TGRS.2017.2677464.
[19] P. J. Mondhe, M. P. Satone, and G. K. Kharate, “Automatic caption generation for aerial images: a survey,” Indonesian Journal of
Electrical Engineering and Informatics, vol. 11, no. 1, pp. 267–285, 2023, doi: 10.52549/ijeei.v11i1.4342.
[20] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997, doi:
10.1162/neco.1997.9.8.1735.
[21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd International
Conference on Learning Representations, ICLR 2015-Conference Track Proceedings, pp. 1-14, 2015.
[22] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-ResNet and the impact of residual connections on
learning,” 31st AAAI Conference on Artificial Intelligence, AAAI 2017, vol. 31, no. 1, pp. 4278–4284, 2017, doi:
10.1609/aaai.v31i1.11231.
[23] M. Tan and Q. V. Le, “EfficientNetV2: smaller models and faster training,” in Proceedings of Machine Learning Research, 2021,
vol. 139, pp. 10096–10106.
“Simple. Flexible. Powerful,” Keras. [Online]. Available: https://keras.io
[25] A. Rajbongshi, U. S. Sara, R. Shakil, B. Akter, and M. S. Uddin, “VegNet: an extensive dataset of cauliflower images to recognize the diseases using machine learning and deep learning models,” Mendeley Data, version 3, 2022, doi: 10.17632/t5sssfgn2v.3.
[26] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “BLEU: a method for automatic evaluation of machine translation,”
Proceedings of the 40th Annual Meeting on Association for Computational Linguistics(ACL), pp. 311-318, 2002, doi:
10.3115/1073083.1073135.
BIOGRAPHIES OF AUTHORS

Parag Jayant Mondhe is a Ph.D. Research Scholar at Matoshri College of
Engineering and Research Centre, Nashik, India. He is working as an Assistant Professor at
K. K. Wagh Institute of Engineering Education and Research, Nashik, India since 2014. He
has completed Master of Engineering and Bachelor of Engineering from Savitribai Phule
Pune University, India in 2014 and 2012 respectively. His research papers are published in
journals and proceedings of international conferences indexed by Scopus. His area of
interest includes signal processing, artificial intelligence, and embedded systems. He can be
contacted at email: mondheparag@gmail.com.
Dr. Manisha P. Satone is a Professor of Electronics and Telecommunication
Engineering at Matoshri College of Engineering and Research Centre, Nashik, India. She
was awarded Ph.D. by Savitribai Phule Pune University, India in 2015. She has research
and teaching experience of more than 31 years. Her research papers are published in
journals indexed by Web of Science and Scopus. She has secured a research grant and
acquired copyrights and patents. She has provided consultancy to many multinational
companies. Her area of expertise includes signal processing, artificial intelligence, and
embedded systems. She can be contacted at email: mps.eltx@gmail.com.
Dr. Namrata N. Wasatkar is working as an Assistant Professor in the
Department of Computer Engineering, Vishwakarma Institute of Information Technology,
Pune, India. She was awarded Ph.D. by Savitribai Phule Pune University, India in 2022.
She has teaching experience of more than 10 years. Her research papers are published in
journals indexed by Scopus. She has secured a research grant from Savitribai Phule Pune
University, India. Her area of expertise includes machine learning, natural language
processing, and signal processing. She can be contacted at email:
namratakharate1@gmail.com.