Harvestable Black Pepper Recognition Using Computer Vision
Abstract - The objective of the research presented in this paper is to speed up recognition of harvestable black pepper using computer vision for automated black pepper harvesting. In this paper we introduce a novel dataset of black pepper images acquired using a digital camera. The proposed system combines several image processing techniques with a deep learning model to recognize and detect harvestable black pepper among the other elements of the scene, such as leaves, tree trunks, branches and unripe pepper. The system is composed of a 3-stage image processing and verification model designed to achieve 100% accuracy. This approach not only increases the accuracy but also reduces the processing time and computational resources required, as the system moves from one stage to the next only if a set of pre-defined conditions is met. After trial-and-error evaluation of a number of different classifiers, we chose ResNet-50, a CNN-based classifier, for the final validation of test results because of its speed and accuracy. The experimental results show a promising 100% global accuracy with a reasonable scan time, which will enable real-time application.

Index Terms - Computer vision; Image processing; Image segmentation; Deep learning; Morphological operations; Object recognition

I. INTRODUCTION

Computer vision is used in many of the applications we rely on today. The aim of this paper is to overcome the shortage of skilled labor for black pepper harvesting by automating the recognition of black pepper and the identification of its location in an image using computer vision [1] and deep learning [2][3]. The work presented in this paper finds its application in an automated harvesting drone, where it eases the identification of harvestable black pepper.

Computer vision is a part of artificial intelligence that uses information obtained from digital images and videos to produce useful data. The main steps in computer vision are image acquisition, processing [4][5] and analysis. In recent years deep learning has gained popularity in pattern recognition [6] and machine learning [7]. Deep learning is a kind of machine learning that uses multi-layer [8] non-linear processing [9] units that imitate the human brain in processing data and recognizing patterns for use in decision making. Convolutional neural networks (CNN) [10] are a class of deep learning models composed of convolutional layers, pooling layers, ReLU layers, fully connected layers and loss layers [11]. Each layer learns and transforms its input into a more abstract and composite representation, which is finally used for decision making. Some of the prominent neural networks are R-CNN, Fast R-CNN, Faster R-CNN, LeNet, AlexNet, VGG, GoogLeNet, ResNet and YOLO. What sets these CNN variants apart from each other is the number, type and order of the layers used.

We introduce a high-quality dataset of black pepper images acquired from the natural canopies of Kerala using a digital camera. The architecture of the system involves a 3-stage image processing and verification exercise to detect black pepper. The images first undergo preprocessing [12] to improve overall image quality. In the first stage, the system scans the input image for the presence of red components: the input image is converted into the HSV color space and the color range of the red components is defined as the ROI. At the end of the first stage the system produces a segmented image of all the harvestable black pepper. The goal of image segmentation [13] is to simplify the input image so that the resources and time required for the identification process are reduced. In the second stage, the segmented image passes through several morphological operations [14], which remove all minute noise [15] particles and identify the centroids of the objects. Once the centroids are identified, circles are drawn around the objects to clearly mark the presence and position of the black pepper. The system then checks for the presence of circles and counts them. If no circles are present, the system determines that there is no harvestable black pepper; otherwise it continues to the third stage of image processing. In the third stage, the input image is presented to a residual neural network [16] trained on over 500 images from the newly introduced dataset; through feature extraction and deep learning, the network classifies the input image as harvestable black pepper or not.

The software platform used to implement the proposed system is MATLAB R2018b. Its high-performance Computer Vision Toolbox, combined with GPUs and the Parallel Computing Toolbox, can drastically improve the speed of black pepper recognition. Speed and accuracy of black pepper recognition are vital for the productivity and reliability of automated black pepper harvesting. To achieve higher speed and accuracy, a system with a high-performance GPU and processor is desirable.

The paper is arranged as follows: Section II reviews previous work. Section III describes the methodology of the proposed algorithm. Section IV presents the pseudo-code for the proposed algorithm. Section V discusses the results and analysis of this work, and finally, Section VI presents the future scope and development of the proposed system.
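
As a rough illustration of the first-stage color test and the stage gate described in this section, the following sketch marks red pixels in HSV space and continues only if any were found. It is written in plain Python rather than the MATLAB used in this work, and it is not the authors' implementation: the hue, saturation and value thresholds and the function names `is_red`, `red_mask` and `has_red` are assumptions chosen for illustration, since the exact red range is not stated here.

```python
import colorsys

# Assumed thresholds (hypothetical): hue is in [0, 1) and wraps around at red.
RED_HUE_LOW = 0.05      # hue at or below this counts as red
RED_HUE_HIGH = 0.95     # hue at or above this also counts as red
MIN_SATURATION = 0.5    # ignore washed-out pixels
MIN_VALUE = 0.3         # ignore very dark pixels


def is_red(pixel):
    """Return True if an (R, G, B) pixel with 0-255 channels is in the assumed red range."""
    r, g, b = (channel / 255.0 for channel in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hue_is_red = h <= RED_HUE_LOW or h >= RED_HUE_HIGH
    return hue_is_red and s >= MIN_SATURATION and v >= MIN_VALUE


def red_mask(image):
    """Stage 1: build a binary mask (rows of 0/1) marking the red pixels."""
    return [[1 if is_red(pixel) else 0 for pixel in row] for row in image]


def has_red(mask):
    """Stage gate: later stages run only if at least one red pixel exists."""
    return any(any(row) for row in mask)


if __name__ == "__main__":
    green, red = (30, 160, 40), (200, 20, 30)
    image = [[green, red], [red, green]]
    mask = red_mask(image)
    print(mask, has_red(mask))
```

For an image with no red pixels `has_red` returns `False`, which corresponds to the system stopping before the second stage.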
978-1-7281-5523-4/19/$31.00 ©2019 IEEE
II. RELATED WORKS

A similar system was designed by King Hann Lim and Alpha Agape Gopalai in 2013 using the active contour method [17]. Even though the system had three stages, its applicability in real-time situations was limited, as its detection rate was only 91.3% and its success rate for extracting the pepper region from the scene was only 84.35%. In 2016 Inkyu Sa and team proposed a novel approach to fruit detection using deep convolutional neural networks [18]. The aim was to build a fast and reliable fruit detection system using Faster R-CNN. The system had a noticeable false negative rate due to small training images: if an object in a test image is significantly smaller than those in the training set, the detection is missed. Another drawback is that the system uses an expensive NIR camera in addition to a normal RGB camera for image acquisition.

Another study, presented in 2018, designed an algorithm for detecting field-grown cucumbers for robotic harvesting automation [19]. The algorithm was based on a combination of several processing and data mining techniques to achieve a classification system capable of segmenting cucumbers, and it also includes an SVM pixel classifier. Its evaluation shows a 91.79% hit rate and a false negative rate (FNR) of 8.21%. The average precision of the algorithm was only 85.65%, which makes it unreliable for real-life applications. In the same year Michael Halstead’s team presented a robotic vision system [20] that can accurately estimate the quantity and ripeness of sweet pepper. The system contained three parts: detection, ripeness estimation and tracking. It was based on Faster R-CNN and suffered from the same drawback as the system proposed by King Hann Lim and Alpha Agape Gopalai in 2013.

One of the latest studies in fruit detection and classification is by Yang-Yang Zheng and team. In order to apply the advanced deep learning technology YOLOv3, they collected a dataset, CropDeep [21], consisting of 31,147 images with over 49,000 annotated instances from 31 different classes. The study showed promising results in terms of both accuracy and speed.

III. METHODOLOGY

The proposed system is composed of a 3-stage image processing and verification model designed to achieve 100% accuracy. This approach not only increases the accuracy but also reduces the processing time and computational resources required, as the system moves from one stage to the next only if the pre-defined conditions are met. The first stage checks for the presence of red components in the input image and segments all of them. In the second stage the segmented image undergoes morphological transformation and is scanned for the presence of circles. In the final stage the input image is presented to a ResNet-50 classifier trained with over 500 images from the newly introduced dataset, and through deep learning the ResNet-50 classifier classifies the image as harvestable pepper or not. The flow chart of the proposed system is shown in Fig. 1, and each stage is explained in detail below.

Fig. 1. Proposed System Flow Chart

A. Image acquisition

The image acquisition tool used in this methodology is an RGB camera. Other widely used image acquisition techniques include magnetic resonance imaging (MRI), near-infrared (NIR) cameras, electrical tomography, ultrasound and computed tomography. A better-quality image provides more detail, which is very helpful for image processing. Proper selection of camera and lighting is critical to obtaining usable images for image processing. However, it should be kept in mind that cameras which produce high-quality images are usually very expensive and will in turn increase the overall cost of the implementation. In this paper we selected a normal RGB camera, as it provides suitable images without excessive spending on the camera. Nowadays even a smartphone camera is sufficient to capture good-quality images. An image captured with a digital camera is shown below.

B. Systems implementation

The software platform used for image processing is MATLAB R2018b. Its high-performance Computer Vision Toolbox, combined with GPUs and the Parallel Computing Toolbox, can drastically improve the speed of black pepper recognition. Speed and accuracy of black pepper recognition are vital for the productivity and reliability of automated black pepper harvesting. To achieve higher speed and accuracy, a system with a high-performance GPU and processor is desirable. The choice of image processing methods and classifier further determines the accuracy and speed of image processing and black pepper recognition.

C. Image Preprocessing

Fig. 3. Image after Increasing Local Contrast Using Local Laplacian Filtering

The output image after Local Laplacian Filtering is equalized using a histogram equalization function so that the pepper present in the image is clearly visible, as shown in Fig. 4.
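
The histogram equalization step can be illustrated with a minimal sketch in plain Python. This is a generic global equalization of an 8-bit grayscale image, assumed here for illustration only; it is not MATLAB's histogram equalization function, it omits the Local Laplacian Filtering step, and the name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(gray, levels=256):
    """Globally equalize a grayscale image given as a list of rows of ints in [0, levels)."""
    flat = [pixel for row in gray for pixel in row]
    if not flat:
        return []

    # Histogram of intensity values.
    hist = [0] * levels
    for pixel in flat:
        hist[pixel] += 1

    # Cumulative distribution: number of pixels at or below each intensity.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(value for value in cdf if value > 0)
    total = len(flat)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [list(row) for row in gray]

    # Map each intensity so that the cumulative distribution becomes roughly linear.
    scale = (levels - 1) / (total - cdf_min)
    lookup = [max(0, round((value - cdf_min) * scale)) for value in cdf]
    return [[lookup[pixel] for pixel in row] for row in gray]


if __name__ == "__main__":
    dull = [[10, 20], [30, 40]]          # low-contrast toy image
    print(equalize_histogram(dull))
```

The toy image occupies only intensities 10 to 40; after equalization its values are spread over the full 0 to 255 range, which is the contrast-stretching effect the preprocessing relies on.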
2019 9th International Conference on Advances in Computing and Communication (ICACC)

Fig. 7. (a) Image after Segmentation, (b) Image after removing noise

All objects in Fig. 7(a) smaller than 60 pixels are considered noise and are removed from the image, as seen in Fig. 7(b). The harvestable pepper is now isolated, as in Fig. 8.

Fig. 8. Segmented RGB Image

E. Feature Extraction

In this stage the initial set of measured data is reduced to a more manageable, informative and non-redundant group of features for processing. The extracted features form feature vectors that define the shape, color, texture and other attributes of the object uniquely and precisely. Here we use morphological feature (size and shape) extraction, as leaf and black pepper are easy to distinguish by comparing size and shape. Fig. 9 shows the resulting image after feature extraction, and Table I lists all the centroids found in the segmented image; circles are drawn around the centroids to identify the location of the black pepper.

F. Classification

        +VE            -VE
P       0.9375 (TP)    0.0625 (FP)
N       0.0104 (FN)    0.9896 (TN)

The classifier used in this paper is the residual neural network ResNet-50. The accuracy of the classifier alone is 98.44%, but when coupled with the full 3-stage image processing and verification architecture the system achieves a global accuracy of 100%, as shown in Table III. The system was subjected to rigorous testing to ensure that the test results are sustainable and applicable in real-time situations.

TABLE III. CLASSIFIER ACCURACY

Classifier Accuracy    Global Accuracy
96.35%                 100%

... computational resources required, as the system moves from one stage to another only if the pre-defined conditions are met.

The final stage of the 3-stage architecture, the feature extractor and classifier, was carefully selected after numerous runs evaluating the execution speed and accuracy of different classifiers on the newly introduced dataset. It was noted that the accuracy dropped slightly from ResNet-50 to ResNet-101; this is attributed to the increased number of layers in ResNet-101 compared to ResNet-50. ResNet-50 appears to be the optimal choice for this experiment, with its superior accuracy and execution time, as shown in Fig. 10 and Fig. 11 respectively.

PSEUDO-CODE FOR THE SYSTEM

A = imread(Input image location)
A = imresize(A,[300,400])
Pre-process the image to improve the quality of the input image
Convert the input RGB image to HSV color space and threshold it
Remove all objects < 60 pixels
Diameters = average length of major axis and minor axis
Centers = values of centroid
Radius = Diameters/2
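
The object-level steps of the pseudo-code (removing objects below 60 pixels, then deriving a center and radius for each remaining object) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the MATLAB code used in this work: objects are found with a simple 8-connected flood fill, and the major and minor axis lengths are approximated by the bounding-box width and height, which is a simplification. The names `label_objects` and `find_circles` are hypothetical.

```python
from collections import deque

MIN_OBJECT_PIXELS = 60  # size threshold taken from the text


def label_objects(mask):
    """Group the 1-pixels of a binary mask into 8-connected objects (lists of (row, col))."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            objects.append(pixels)
    return objects


def find_circles(mask, min_pixels=MIN_OBJECT_PIXELS):
    """Return [((center_x, center_y), radius), ...] for objects of at least min_pixels."""
    circles = []
    for pixels in label_objects(mask):
        if len(pixels) < min_pixels:
            continue  # treated as noise and dropped
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        center = (sum(xs) / len(xs), sum(ys) / len(ys))  # centroid
        width = max(xs) - min(xs) + 1    # stand-in for one axis length (assumption)
        height = max(ys) - min(ys) + 1   # stand-in for the other axis length (assumption)
        diameter = (width + height) / 2
        circles.append((center, diameter / 2))
    return circles


if __name__ == "__main__":
    mask = [[0] * 20 for _ in range(20)]
    for r in range(2, 12):
        for c in range(3, 13):
            mask[r][c] = 1          # 100-pixel object, kept
    mask[16][16] = 1                # single-pixel speck, removed as noise
    print(find_circles(mask))
```

An empty result from `find_circles` corresponds to the case where no circles are present and the system reports no harvestable black pepper without invoking the classifier.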
c) False Positive – The input image does not contain harvestable pepper and the system falsely reports the presence of harvestable pepper.
d) False Negative – The input image contains harvestable pepper and the system falsely reports no harvestable pepper.

Precision, Recall and F1 Score, reported in Table V, were calculated using the standard formulae:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 Score = 2 × Precision × Recall / (Precision + Recall)

All three parameters are found to be 1, as the global accuracy of the system is 100%. The accuracy of the proposed system was calculated by running the algorithm on 115 test images, as shown in Table IV.

TABLE IV. ACCURACY OF THE PROPOSED SYSTEM

         +ve    -ve    Detection Rate
True     65     50     115/115 (100%)
False    0      0      0/0 (100%)

TABLE V. RECALL, PRECISION & F1 SCORE OF PROPOSED SYSTEM

Recall    Precision    F-Score
100%      100%         1.00

V. CONCLUSIONS AND FUTURE WORK

We have presented a vision-only system that can accurately detect and recognize the presence of harvestable black pepper. In that respect the research was successful, as we achieved 100% accuracy in identifying the presence of harvestable black pepper on the 115 test images. The novel dataset of 500+ black pepper images will be an added asset for further research and development in the field of fruit recognition and identification.

Future work will consider the following points:
- Using live videos instead of images for real-time identification and harvesting of black pepper.
- Identification of the cutting point so that the harvesting of black pepper can be automated.
- Implementation of the proposed system on unmanned ground vehicles (UGVs) or unmanned aerial vehicles (UAVs) capable of recording high-quality videos and equipped with a cutting mechanism that uses the cutting point information gathered by the computer vision system.
- Developing a mobile application that farmers can use directly for yield estimation and harvesting.

REFERENCES

[1] Szeliski, R., 2010. Computer vision: algorithms and applications. Springer Science & Business Media.
[2] LeCun, Y., Bengio, Y. and Hinton, G., 2015. Deep learning. Nature, 521(7553), p.436.
[3] Schmidhuber, J., 2015. Deep learning in neural networks: An overview. Neural networks, 61, pp.85-117.
[4] Sonka, M., Hlavac, V. and Boyle, R., 2014. Image processing, analysis, and machine vision. Cengage Learning.
[5] Baxes, G.A., 1994. Digital image processing: principles and applications (pp. I-XVIII). New York: Wiley.
[6] Gan, M. and Wang, C., 2016. Construction of hierarchical diagnosis network based on deep learning and its application in the fault pattern recognition of rolling element bearings. Mechanical Systems and Signal Processing, 72, pp.92-104.
[7] Sonka, M., Hlavac, V. and Boyle, R., 2014. Image processing, analysis, and machine vision. Cengage Learning.
[8] Svozil, D., Kvasnicka, V. and Pospichal, J., 1997. Introduction to multi-layer feed-forward neural networks. Chemometrics and intelligent laboratory systems, 39(1), pp.43-62.
[9] Dougherty, E.R. and Astola, J., 1994. An introduction to nonlinear image processing (Vol. 16). SPIE press.
[10] Ciresan, D.C., Meier, U., Masci, J., Gambardella, L.M. and Schmidhuber, J., 2011, June. Flexible, high performance convolutional neural networks for image classification. In Twenty-Second International Joint Conference on Artificial Intelligence.
[11] Yu, D., Xiong, W., Droppo, J., Stolcke, A., Ye, G., Li, J. and Zweig, G., 2016, September. Deep Convolutional Neural Networks with Layer-Wise Context Expansion and Attention. In Interspeech (pp. 17-21).
[12] Förstner, W., 2000. Image preprocessing for feature extraction in digital intensity, color and range images. In Geomatic method for the analysis of data in the earth sciences (pp. 165-189). Springer, Berlin, Heidelberg.
[13] Haralick, R.M. and Shapiro, L.G., 1985. Image segmentation techniques. Computer vision, graphics, and image processing, 29(1), pp.100-132.
[14] Comer, M.L. and Delp, E.J., 1999. Morphological operations for color image processing. Journal of electronic imaging, 8(3), pp.279-290.
[15] Lee, J.S., 1980. Digital image enhancement and noise filtering by use of local statistics. IEEE Transactions on Pattern Analysis & Machine Intelligence, (2), pp.165-168.
[16] He, K., Zhang, X., Ren, S. and Sun, J., 2016. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).
[17] Lim, K.H. and Gopalai, A.A., 2013, October. Robotic vision system design for black pepper harvesting. In 2013 IEEE International Conference of IEEE Region 10 (TENCON 2013) (pp. 1-5). IEEE.
[18] Sa, I., Ge, Z., Dayoub, F., Upcroft, B., Perez, T. and McCool, C., 2016. Deepfruits: A fruit detection system using deep neural networks. Sensors, 16(8), p.1222.
[19] Fernández, R., Montes, H., Surdilovic, J., Surdilovic, D., Gonzalez-De-Santos, P. and Armada, M., 2018. Automatic Detection of Field-Grown Cucumbers for Robotic Harvesting. IEEE Access, 6, pp.35512-35527.
[20] Halstead, M., McCool, C., Denman, S., Perez, T. and Fookes, C., 2018. Fruit quantity and ripeness estimation using a robotic vision system. IEEE Robotics and Automation Letters, 3(4), pp.2995-3002.
[21] Zheng, Y.Y., Kong, J.L., Jin, X.B., Wang, X.Y. and Zuo, M., 2019. CropDeep: The Crop Vision Dataset for Deep-Learning-Based Classification and Detection in Precision Agriculture. Sensors, 19(5), p.1058.