Food Classification and Meal Intake Amount Estimation
through Deep Learning
Ji-hwan Kim 1 , Dong-seok Lee 2 and Soon-kak Kwon 1, *

1 Department of Computer Software Engineering, Dong-Eui University, Busan 47340, Republic of Korea;
[email protected]
2 AI Grand ICT Center, Dong-Eui University, Busan 47340, Republic of Korea; [email protected]
* Correspondence: [email protected]; Tel.: +82-51-890-1727

Abstract: This paper proposes a method to classify food types and to estimate meal intake amounts from
pre- and post-meal images through a deep learning object detection network. The food types and the
food regions are detected through Mask R-CNN. In order to bring both pre- and post-meal images into
the same capturing environment, the post-meal image is corrected through a homography transformation
based on the meal plate regions in both images. The 3D shape of each food is determined as one
of a spherical cap, a cone, and a cuboid depending on the food type. The meal intake amount is
estimated as the food volume difference between the pre-meal and post-meal images. In the simulation
results, the food classification accuracy and the food region detection accuracy are up to 97.57%
and 93.6%, respectively.

Keywords: object detection; food classification; food volume estimation; meal intake amount
estimation

1. Introduction
For people who need strict dietary management, such as those with diabetes, it is very important
to know the amount of food eaten. The meal state can be checked by asking or by
directly looking at the meal plate. However, directly checking the meal state is not only
inconvenient, but also has problems where the meal intake measurement is inaccurate
and biased [1]. Various studies have been conducted on automated methods to objectively
recognize food types and meal intake amounts. Since object detection has improved noticeably
through artificial neural networks, various studies [2–5] have been conducted to detect
foods in an image through object detection networks. Even though the performance of
food detection improves noticeably through artificial neural networks, it is still
difficult to measure food volumes in a single image. A typical RGB image contains
no 3D spatial information, so the 3D shape of the food is hard to reconstruct from the image.
Several attempts [6–11] have been made to estimate food volumes using stereo vision
images or an RGB-D image. Some studies [12–17] estimate the food volume based on a
shape-known reference object in a single image. Nevertheless, these are unsuitable for
practical applications so far because the meal intake amount estimation is possible only under
limited conditions.
Institutions such as welfare centers analyze the dietary status of people who are receiving
certain meal services in order to manage their health. The health management is achieved
by checking the food types and measuring the meal intake amount. Since the meal intake
amount is measured in some fixed levels, a slight imprecision of the meal amount measurement
is acceptable. In this paper, we propose methods for food type classification
and intake amount estimation through an image pair for pre- and post-meal.
The flow of the proposed method is as follows: Mask R-CNN [18], which detects object
regions in pixel units, finds the food regions and classifies the food types in the pre- and post-
meal images. The post-meal image is corrected by homography transformation through


two meal plate regions in the images. For each food, the 3D shape type is determined as
one of a spherical cap, a cone, and a cuboid based on the food type. The food volume
is calculated through the 3D shape type and the food area. The meal intake amount is
estimated as the difference in the food volumes in the pre- and post-meal images.
The contribution of this paper is that only pre- and post-meal images are needed to
measure the meal intake. That is, if a dedicated system with an attached camera, or a smartphone,
takes a picture of the meal plate before and after a meal, then the food types can be
automatically classified. In addition, the meal volume can be measured based on the detected
food region and its predefined 3D shape, without the need for a complex device to measure
the volume.

2. Related Works for Food Detection and Meal Intake Amount Estimation
Research on food type classification and meal intake amount estimation can be divided
into sensor-based and image-based methods. The food type and meal intake amount
can be calculated by sensors that measure sound, pressure, inertia, and so on, caused by
eating food. The meal intake amount is measured by attaching weight or pressure sensors
to a tray where meal plates are placed [19,20]. However, the estimation methods by the
tray with the sensors have the disadvantage of extremely limited mobility. Some wearable
devices can be utilized to measure the meal intake amount [21–23]. Olubanjo et al. [21]
classifies the food type and measures the meal intake amount by template matching on
the characteristics of sounds generated by chewing. Thomaz et al. [22] recognizes the food
types and the meal intake amount through a smartwatch with a three-axis accelerometer.
However, these methods cannot estimate the meal intake amount while the wearable device
is not worn.
Recently, the technology for object detection in images has greatly been improved
due to the development of artificial intelligence technology through deep neural networks.
Various studies [2–5] for the food type classification have been conducted through object
detection networks such as YOLO [24], VGG19 [25], AlexNet [26], and Inception V3 [27].
Wang et al. [28] pre-processes the meal image through morphological operators before
detecting foods through an object detection network. The food detection methods through
deep neural networks outperform the traditional methods [29], which use scale-invariant
feature transform (SIFT), histogram of oriented gradients (HOG), and local binary patterns
(LBP). However, it is very difficult to measure the meal intake amount through only one
image.
For the image-based methods, the meal intake amount should be measured by images
captured from two or more viewpoints, an RGB-D image, or prior modeling of the foods.
The methods based on multiple images [6,7] measure the food volume by reconstructing
the 3D shape through correspondences among the pixels of images. Bándi et al. [6] finds
some feature points in the food images captured with stereo vision in order to generate a
point cloud that is a set of points on 3D space. The food volumes are measured through the
point cloud for images. The food volume can also be estimated through the RGB-D image,
which adds a channel for the distance to subjects [8–11]. Lu et al. [8] detects the food
regions and classifies the food types through convolution layers applied to the RGB channels of
the captured image. The food volumes are estimated by reconstructing 3D surfaces through
a depth channel that has distance information. The 3D shape can be predicted by pre-
modeling for the template of the food or bowl [30–32]. Hippocrate et al. [30] estimates the
food volumes based on a known bowl shape. The food volume estimation through a single
image requires a distinct reference object [12–14]. The meal intake amount can be estimated
by the ratio of the number of pixels between the food and the reference object regions [12].
However, this estimation method has a large error for thin food. Smith et al. [13] calculate
a projection model from a food object to the image through the reference object region.
After that, one 3D shape type among sphere, ellipsoid, and cuboid is manually assigned
to the food region to estimate the food volume. Liu et al. [14] crop a food region without
background through Faster R-CNN, Grabcut algorithm, and median filter. The relative
food volume is estimated through a CNN that learns the relationship between the
background-free food image and the food volume. The actual volume is calculated through the
area ratio between the food and the size-known reference object. The estimation methods
based on the reference object have an inconvenience where the reference object should be
captured with the food. In order to overcome this inconvenience, a shape-known dish or bowl
can be utilized as the reference object [15–17]. Jia et al. [15] generate a 3D bowl model by
measuring distances between line marks in the graduated tape attached at the bowl to estimate
the meal intake amount. Fang et al. [16] and Yue et al. [17] calculate the camera parameters
such as the camera pose and the camera focal length through the shape-known dish. The 3D food
shape is generated through the camera parameters to calculate the food volume. However,
these methods can only estimate the foods on certain containers.
3. Food Classification and Meal Intake Amount Estimation through Deep Learning
The proposed method classifies food types and estimates meal intake amounts by
comparing pre- and post-meal images. Figure 1 shows the flow of the proposed method. In
both pre- and post-meal images, the regions of the food and the meal plates are detected
and the food types are classified through the object detection network. In order to compare
food amounts between two images under the same capturing environment, the post-meal
image is corrected by homography transformation through the detected plate regions in
both images. For each food in the images, the 3D shape type is determined as one of a
spherical cap, a cone, and a cuboid based on the food type. The food volume is estimated
through the 3D shape type and the food area. The meal intake amount is estimated by
comparing the food volumes between the pre- and post-meal images.

Figure 1. Flow of the proposed method.

3.1. Dataset
Images with Korean food and the meal plate which are captured by ourselves are
utilized as the dataset for the proposed method. The dataset has 20 types of Korean food as
shown in Table 1. The foods in the dataset are classified into three categories as follows: rice,
soup, and side-dish. The dataset has 520 images. The dataset is divided into 416 training
images and 104 validation images at a ratio of about 8:2. The foods are placed on the
concave surface within the designated meal plate as shown in Figure 2. The size of the
meal plate is 40.5 cm × 29.5 cm. The soup food is served in a separate bowl whose radius
is 3.5 cm.

Table 1. Categorization of foods in dataset.

Category     Food Name               No. of Images in Training Set   No. of Images in Validation Set
Rice         Rice                    412                             95
Soup         Bean sprout soup        108                             16
             Miso soup               216                             32
             Radish leaf soup        80                              12
             Seaweed soup            68                              12
Side-dish    Eggplant                86                              13
             Fruit salad             43                              5
             Grilled fish            246                             43
             Jeon                    104                             23
             Kimchi                  258                             29
             Pepper seasoned         81                              11
             Seastring seasoned      84                              14
             Stewed fish             185                             20
             Stir-fried fish cake    84                              23
             Stir-fried mushroom     121                             29
             Stir-fried octopus      58                              5
             Stir-fried pork         179                             29
             Stir-fried squash       87                              8
             Tofu                    116                             26
             Yellow pickled radish   142                             17

Figure 2. Food placement in meal plate.

Data augmentation is applied in order to increase the efficiency of the network training.
Data augmentation is a strategy to increase the number of data for the network training.
Data augmentation increases the data without losing the main characteristics through image
processing. In the proposed method, image blurring, image rotation, and image flip are
applied probabilistically for the data augmentation, as shown in Figure 3.
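As a rough illustration of this augmentation step, the sketch below applies blur, rotation, and horizontal flip, each with an independent probability. The torchvision-based pipeline, the probabilities, and the parameter ranges are illustrative assumptions, not the exact configuration used for training in this paper.

from torchvision import transforms

# Minimal probabilistic augmentation in the spirit of Figure 3: blur, rotation,
# and flip are each applied with an independent probability. Values below are
# placeholders chosen for illustration only.
augment = transforms.Compose([
    transforms.RandomApply([transforms.GaussianBlur(kernel_size=5)], p=0.3),
    transforms.RandomApply([transforms.RandomRotation(degrees=15)], p=0.3),
    transforms.RandomHorizontalFlip(p=0.5),
])

# Usage: augmented = augment(pil_image)  # pil_image is a PIL.Image of a meal plate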
Figure 3. Data augmentation for network training: (a) original image; (b) blurring; (c) rotation; (d) flip.
3.2. Food Detection through Mask R-CNN

The bounding boxes detected by the usual object detection are inappropriate for estimating
the food volume since they do not have information on the shapes of the foods. The food
volume estimation requires the food region in pixel units. The proposed method detects the
foods through Mask R-CNN [18], which can detect object regions in pixel units. ResNet-50 [33]
is applied as the backbone of Mask R-CNN.
Figure the
4 for
object
shows the object
detection
the detection
in in
pixel units.
pixel ResNet-50
units. ResNet-50
the flow through
flow for food detection [33]
for food Mask is
[33]
detection applied
is
R-CNN.throughas
applied the asbackbone
the
Mask R-CNN.
ResNet-50 of
backbone Mask
extractsResNet-50 ofR-CNN.
Mask Figure
R-CNN.
the featureextracts
maps from 4 shows
Figure the
4
the feature maps shows the
flow
a meal plate for
flow
from
image. food
fora food
The detection
meal detection
plate of
regions through
image. through
interest TheMask R-CNN.
Mask
regions
(ROIs) ofR-CNN.
are ResNet-50
interest
extracted by aextracts
ResNet-50
(ROIs) are the feature
extracts
extracted
region proposal bythe maps
anet-feature
region from
maps from
proposal
work (RPN) a meal
afrom
mealplate
network image.
plate
the (RPN)
image.
feature The
from
maps.regions
the
The of interest
feature
regions
ROIAlign maps.
ofcrops (ROIs)
ROIAlign
interest are crops
and (ROIs) extracted
interpolatesareand by a region
interpolates
extracted
the feature aproposal
by maps the feature
region net-
maps net-
proposal
work
through ROIs. (RPN)
through
For (RPN)
work from
ROIs.
each ROI, the
fromFor feature
thetheeach
food maps.
ROI,
type and
feature the ROIAlign
food type
the bounding
maps. crops
and
ROIAlignbox the and interpolates
bounding
cropsare and box
predicted the
are feature
predicted maps
through
throughthe feature maps
interpolates
through
fully connected fullyROIs.
layers
through (FC
ROIs.For
connected each
layers
layers).
For ROI,
The
each (FC the
food
ROI, food
layers).
regiontype
The
the food and
infood
pixel the
typeregion
units
and bounding
in pixel
for
the each box
boundingROIare
units for
is predicted
each
predicted
box are ROI through
is predicted
predicted through
fully
through a fully connected
through a
convolutional layers
fully (FC
network layers).
convolutional (FCN) The food
network
[34]. region
(FCN)
Figure 5 in pixel
[34].
shows units
Figure
the 5 for
shows
results each
of theROI
the is
results
food predicted
of the food
fully connected layers (FC layers). The food region in pixel units for each ROI is predicted
typeand
through
type classification a classification
fully and the
convolutional food region
network (FCN) detection
[34]. through
Figure Mask R-CNN.
5 shows the results of the food
through athe food
fully region
convolutional detection through
network Mask
(FCN) R-CNN.
[34]. Figure 5 shows the results of the food
type classification and the food region detection through Mask R-CNN.
type classification and the food region detection through Mask R-CNN.

Figure 4. Food detection


Figure 4.through Mask R-CNN.
Food detection through Mask R-CNN.
Figure 4. Food detection through Mask R-CNN.

Figure 4. Food detection through Mask R-CNN.

(a) (b)
(a) (b)
Figure 5. Food detection results: (a) food region detection; (b) food type classification.
FigureFigure
5. Food detection
5. Food results:
detection (a) food
results: region
(a) food detection;
region (b) food
detection; typetype
(b) food classification.
classification.
(a)
3.3. Image Correction for Food Amount Comparison

The size of an object in an image depends on the capturing environments such as the
camera pose and the distance. Therefore, both pre- and post-meal images should have the
same capturing environment for accurately comparing the food amounts. However, the
capturing environments of the two meal images are often different in usual situations. In
order to match both images to the same capturing environment, the post-meal image is
corrected based on the meal plate regions in both images. For each image, one rectangle
and its vertices are found that enclose the meal plate region through the rotating calipers
algorithm [35]. A homography matrix is calculated from a pair of four vertices in the two
images. The post-meal image is corrected by the homography transformation with the
calculated matrix as shown in Figure 6.

Figure 6. Image correction to have same capturing environment: (a) pre- and post-meal images; (b) rectangles and vertices enclosing meal plate regions; (c) corrected post-meal image by homography transformation; (d) food regions of post-meal image before correction; (e) food regions after correction.
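The sketch below outlines this correction step under the assumption that OpenCV is used: cv2.minAreaRect plays the role of the minimum-area enclosing rectangle (which can equivalently be computed with rotating calipers), four corresponding vertices define the homography, and the post-meal image is warped with it. Function names and the vertex-ordering step are illustrative simplifications, not the authors' implementation.

import cv2
import numpy as np

def correct_post_meal_image(pre_plate_mask, post_plate_mask, post_image):
    """Warp the post-meal image onto the pre-meal plate geometry.

    Both masks are uint8 binary images of the detected meal plate region.
    """
    def plate_corners(mask):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        largest = max(contours, key=cv2.contourArea)
        rect = cv2.minAreaRect(largest)   # minimum-area enclosing rectangle
        corners = cv2.boxPoints(rect)     # four rectangle vertices
        # Order vertices consistently (by angle around the centroid) so that
        # corresponding corners are matched between the two images.
        center = corners.mean(axis=0)
        angles = np.arctan2(corners[:, 1] - center[1], corners[:, 0] - center[0])
        return corners[np.argsort(angles)].astype(np.float32)

    src = plate_corners(post_plate_mask)
    dst = plate_corners(pre_plate_mask)
    H, _ = cv2.findHomography(src, dst)   # 3x3 homography from the 4 point pairs
    h, w = post_image.shape[:2]
    return cv2.warpPerspective(post_image, H, (w, h))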
3.4. Meal Intake Amount Estimation

The meal intake amounts are estimated as the differences in the food volumes between
the pre- and post-meal images. Estimating a volume from an image is known to be a
very challenging task. Nevertheless, we propose a method of food volume estimation
by assuming that the foods on the meal plate have a few simple 3D shape types. In the
proposed method, the food is modeled as a 3D shape according to the type and the area,
as shown in Figure 7.

Figure 7. Flow of food volume estimation.
Food has a specific 3D shape depending on its food type. Foods in the rice and the
soup categories are similar to a spherical cap shape, as shown in Figure 8a. In Figure 8b,
the foods which consist of bunches of an item are shaped like cones. Figure 8c shows that
the shape of the food is close to a cuboid if the food is one chunk. The proposed method
determines the 3D shape type for the characteristic of each food as a spherical cap, a cone,
or a cuboid shape. Table 2 shows the 3D shape types by the food types.
Figure 8. Shapes of foods: (a) spherical cap shape type; (b) cuboid shape type; (c) cone shape type.

Table 2. 3D shape types for food types in proposed method.

Category     Food Name           3D Shape Type    Category     Food Name                3D Shape Type
Rice         Rice                spherical cap    Side-dish    Fruit salad              cone
Soup         Bean sprout soup    spherical cap                 Kimchi                   cone
             Miso soup           spherical cap                 Pepper seasoned          cone
             Radish leaf soup    spherical cap                 Seastring seasoned       cone
             Seaweed soup        spherical cap                 Stir-fried fish cake     cone
Side-dish    Tofu                cuboid                        Stir-fried mushroom      cone
             Jeon                cuboid                        Stir-fried octopus       cone
             Stewed fish         cuboid                        Stir-fried pork          cone
             Grilled fish        cuboid                        Stir-fried squash        cone
             Eggplant            cone                          Yellow pickled radish    cone

The food volume is estimated through the base area and the 3D shape types. The base
area is calculated through the actual meal plate area as follows:

A_{food} = \frac{n(f)}{n(p)} \times A_{plate},  (1)

where n(.) is the number of pixels; p and f are the regions of the meal plate and the food,
respectively; and Afood and Aplate are the base area and the meal plate area, respectively.
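A minimal sketch of Equation (1) on the detected masks is given below. Treating the full 40.5 cm × 29.5 cm rectangle from Section 3.1 as the plate area, and the function and constant names, are assumptions for illustration.

import numpy as np

# Assumed plate area from Section 3.1 (40.5 cm x 29.5 cm); using the full
# rectangle as A_plate is a simplification for this sketch.
A_PLATE_CM2 = 40.5 * 29.5

def base_area(food_mask: np.ndarray, plate_mask: np.ndarray) -> float:
    """Equation (1): scale the food pixel count by the known plate area."""
    n_f = np.count_nonzero(food_mask)   # n(f): food pixels
    n_p = np.count_nonzero(plate_mask)  # n(p): meal plate pixels
    return n_f / n_p * A_PLATE_CM2      # A_food in cm^2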
Even though the base area and the shape type are determined, the 3D shape can be
different. For example, the shape of a cone with a specific base depends on its height. It
is very difficult to estimate the height through one image. However, the foods cannot be
sloped beyond a certain angle, which is the angle of repose. Though the angles of repose
are different depending on the material of the food, the proposed method supposes that
the angle of repose is 30 degrees, which is the average angle [36]. In other words, the food is
assumed to have a slope angle of 30 degrees.
The food volume is estimated through the 3D shape type of the food. For the food of
the spherical cap type, the food height h is equal to R − a as shown in Figure 9a, where R
is the radius of the whole sphere, and a is a distance between the spherical center and the
base area. The slope angle θslope is 30 degrees; thus,

\sqrt{(R/a)^2 - 1} = \tan\theta_{slope} = 1/\sqrt{3},  (2)

From (2), a and h are

a = \sqrt{3}R/2,  (3)

h = R - a = (2 - \sqrt{3})R/2.  (4)

The base area A_{food} is equal to \pi r^2; then, r is

r = \sqrt{A_{food}/\pi}.  (5)

Since the angle between R and h is also equal to θslope, R is calculated as follows:

R = r \times \csc\theta_{slope} = 2r = 2\sqrt{A_{food}/\pi}.  (6)

The result of substituting R into (4) is

h = 2\sqrt{A_{food}/\pi} \times (2 - \sqrt{3})/2 = (2 - \sqrt{3})\sqrt{A_{food}/\pi}.  (7)

The volume equation of the spherical cap is

V_{sph} = \frac{1}{3}\pi h^2 (3R - h).  (8)

By substituting (6) and (7) into (8), the food volume V_{sph} for the spherical cap is
calculated as follows:

V_{sph} = \frac{16 - 9\sqrt{3}}{3\sqrt{\pi}} A_{food}^{3/2} \approx 0.08\, A_{food}^{3/2}.  (9)

Though the food in the soup category has the spherical cap shape, it is liquid contained
in a bowl. The slope angle of the food is not fixed at 30 degrees but depends on the curvature
of the bowl. If R is given, h is calculated through the Pythagorean theorem as follows:

h = R - \sqrt{R^2 - r^2}.  (10)

For the food with the cone shape, the radius of the base surface r is also estimated
through (5). The food height h is calculated to be r\tan\theta_{slope} = \sqrt{A_{food}/3\pi} as shown in
Figure 9b. The food volume of the cone-shaped food V_{cone} is estimated as follows:

V_{cone} = \frac{1}{3} A_{food}\, h \approx 0.11\, A_{food}^{3/2}.  (11)

Though the height of the cuboid is hard to estimate, it can be empirically predicted
that the food volume decreases in proportion to the base area of the food. Therefore, the
food volume for the cuboid shape is

V_{cuboid} = A_{food} H_{food},  (12)

where H_{food} is the predefined height depending on the food type. For each food, the meal
intake amount is estimated by comparing the food volumes between the pre- and post-meal
images.
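The sketch below combines Equations (9)-(12) into a single volume estimator and takes the intake amount as the pre/post difference. The shape labels, the placeholder cuboid heights, and the treatment of the soup bowl radius R are assumptions for illustration rather than the values used in the experiments.

import math

FOOD_HEIGHT_CM = {"Tofu": 2.0}  # assumed H_food per cuboid-type food; placeholder values

def food_volume(area_cm2, shape, food_name="", bowl_radius_cm=None):
    """Volume from the base area and the 3D shape type (Equations (9)-(12))."""
    if shape == "spherical cap":
        if bowl_radius_cm is None:
            # Free-standing food with a 30-degree angle of repose, Equation (9).
            return (16 - 9 * math.sqrt(3)) / (3 * math.sqrt(math.pi)) * area_cm2 ** 1.5
        # Soup in a bowl with given radius R: Equation (10), then Equation (8).
        r = math.sqrt(area_cm2 / math.pi)
        R = bowl_radius_cm
        h = R - math.sqrt(max(R * R - r * r, 0.0))
        return math.pi * h * h * (3 * R - h) / 3
    if shape == "cone":
        # Equation (11): V = A * h / 3 with h = sqrt(A / (3 * pi)).
        return area_cm2 * math.sqrt(area_cm2 / (3 * math.pi)) / 3
    if shape == "cuboid":
        # Equation (12): V = A * H_food with a predefined height per food type.
        return area_cm2 * FOOD_HEIGHT_CM.get(food_name, 1.0)
    raise ValueError(f"unknown shape type: {shape}")

def intake_amount(pre_area, post_area, shape, food_name="", bowl_radius_cm=None):
    """Meal intake as the pre/post food-volume difference."""
    return (food_volume(pre_area, shape, food_name, bowl_radius_cm)
            - food_volume(post_area, shape, food_name, bowl_radius_cm))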

(a) (b)
Figure 9. 3D food 9.
Figure model for proposed
3D food model for method:
proposed(a)method:
spherical
(a)cap; (b) cone.
spherical cap; (b) cone.

4. Simulation

4.1. Simulation Results

We measure the accuracies of the food type classification and the meal intake amount
by the trained Mask R-CNN. In the training of Mask R-CNN, the batch size is set as 64 and
the epochs are set as 10,000, 30,000, 50,000, and 70,000. The performances of the proposed
method are measured through 65 images with 206 food objects. In addition, Mask R-CNN
and YOLOv8 [37] are applied as the food detection network to compare the performances
of the proposed method.
The accuracies of the food existence detection and the food type classification are
measured as shown in Table 3. The accuracy of the food existence detection increases as the
epochs increase up to 50,000. All foods are detected with Mask R-CNN trained over 50,000
epochs. The accuracy of the food type classification is improved up to 97.57% until the food
detection network is trained with 50,000 epochs. The accuracy of the food type classification
hardly increases for epochs greater than 50,000. Mask R-CNN is better than YOLOv8 for the
food existence detection and the food type classification.
Table 3. Performance of food type classification for 206 foods.

Network       Epochs    No. of Detected Objects (Accuracy %)    No. of Correct Classification (Accuracy %)
Mask R-CNN    10,000    202 (98.06%)                            191 (92.72%)
              30,000    204 (99.03%)                            196 (95.15%)
              50,000    206 (100%)                              201 (97.57%)
              70,000    206 (100%)                              201 (97.57%)
YOLOv8        10,000    191 (92.72%)                            183 (88.83%)
              30,000    196 (95.15%)                            190 (92.23%)
              50,000    200 (97.09%)                            196 (95.15%)
              70,000    202 (98.06%)                            197 (95.63%)

Table 4 shows the accuracies of the food type classification by the food categories
through Mask R-CNN trained with 50,000 epochs. All foods in the rice category are
accurately classified. However, some of the foods in the soup and the side-dish categories
are misclassified as different food types within the same category. The food detection
network occasionally classifies foods as different food types with similar color.
Table 4. Performance of food type classification by food categories.

Food Category    No. of Total Objects    No. of Detected Objects (Accuracy %)    No. of Correct Classification (Accuracy %)
Rice             63                      63 (100%)                               63 (100%)
Soup             18                      18 (100%)                               17 (94.44%)
Side-dish        94                      94 (100%)                               89 (94.68%)
The accuracies of the food region detection are measured by calculating intersection
over union (IoU) as follows:

IoU = \frac{n(g \cap d)}{n(g) + n(d) - n(g \cap d)},  (13)

where g and d are the ground truth and the detected regions, respectively. Figure 10 shows
the accuracies of the food area detection. Similar to the food type classification, the accuracy
of the food area detection is also increased up to 93.6% until the food detection network is
trained with 50,000 epochs. Mask R-CNN is also more accurate than YOLOv8 for the food
area detection.

Figure 10. Accuracies of food area detection with comparison between Mask R-CNN and YOLOv8.
Figure 11 shows the images of the meal plates with average meal intakes of about
40%, 80%, and 90% for the food, respectively. The meal intake amounts for Figure 11 are
estimated through the proposed method as shown in Table 5. The proposed method
estimates the meal intake amounts closely to the actual amounts for most foods except the
soup category. For foods in the soup category, the estimated intake amounts are smaller
than the actual. The soup bowl has a small curvature, that is, a large slope. Therefore, the
change in the food area is excessively small compared to the decrease in the food volume.

4.2. Discussion
The proposed method can accurately detect food regions in the meal image and classify
food types. In addition, the meal intake amount can be estimated through a pair of images
of the pre- and post-meal when foods are on the designated meal plate. RGB-D images or
images from multiple viewpoints are not necessary to estimate the food volumes in the
proposed method. However, the proposed method does not handle different foods mixed
in a single dish, as shown in Figure 12. The proposed method is also not suitable for
precise food volume estimation with an extremely small tolerance.
Even though only Korean foods are covered, the proposed method can be applied
to foods from other countries. The type detection for foods of another nation is possible by
training the object detection network with images of the corresponding foods. The 3D
shape types for another nation's foods are similar to the types presented in this paper, as
shown in Figure 13. Therefore, this meal intake amount estimation can be widely applied
to foods of various countries.

Figure 11. Pairs of images between pre- and post-meal: (a) intake of about 40%; (b) intake of about 80%; (c) intake of about 90%.

Table 5. Results of food volume estimation through proposed method.

Target        Food Category    Food Name              Food Volume of Pre-Meal Image (cm3)    Food Volume of Post-Meal Image (cm3)    Meal Intake Amount (cm3)
Figure 11a    Rice             Rice                   89.12                                  48.51                                   40.61
              Soup             Seaweed soup           104.81                                 82.80                                   22.01
              Side-dish 1      Stir-fried octopus     50.74                                  23.58                                   27.16
              Side-dish 2      Fruit salad            26.41                                  17.15                                   9.26
              Side-dish 3      Stir-fried squash      36.31                                  23.58                                   12.73
              Side-dish 4      Stir-fried mushroom    50.74                                  36.31                                   14.43
Figure 11b    Rice             Rice                   89.12                                  26.41                                   62.71
              Soup             Sirak soup             104.81                                 49.69                                   55.12
              Side-dish 1      Stir-fried pork        50.74                                  5.69                                    45.05
              Side-dish 2      Stir-fried squash      36.31                                  5.69                                    30.62
              Side-dish 3      Pepper seasoned        17.15                                  3.30                                    13.85
              Side-dish 4      Seastring seasoned     36.90                                  17.15                                   19.75
Figure 11c    Rice             Rice                   89.12                                  17.15                                   71.97
              Soup             Seaweed soup           104.81                                 37.73                                   67.08
              Side-dish 1      Stir-fried octopus     50.74                                  4.54                                    46.20
              Side-dish 2      Fruit salad            26.41                                  3.30                                    23.11
              Side-dish 3      Stir-fried squash      36.31                                  4.54                                    31.77
              Side-dish 4      Stir-fried mushroom    50.74                                  12.84                                   37.90
Figure 12. Various foods mixed in a single dish.

Figure 13. 3D shape types for foods of various countries: (a) spherical cap shape; (b) cuboid shape; (c) cone shape.
5. Conclusions

In this paper, we proposed the methods of the food type classification and the meal
intake amount estimation. The food regions and the food types were detected through
Mask R-CNN. The post-meal image was corrected to have the same capturing environment
based on the vertex points of the meal plate in the two images. The 3D shape type was
determined as one of a spherical cap, a cone, and a cuboid for each food in the images.
The food volumes were estimated through the detected area sizes and the 3D shape types.
The meal intake amounts were estimated as the food volume differences between pre- and
post-meal. In the simulation results, the accuracies of the food type classification and the

food region detection were up to 97.57% and 93.6%, respectively. The proposed method can
be applied not only to Korean food, but also to other countries’ foods, such as other Asian
countries or countries in Europe. It is possible to analyze the ingested nutrients through
the proposed method. The ingested nutrients are identified through the classified food
types. The amounts of the ingested nutrients are calculated through the food types and the
estimated meal intake amount. The nutrient analysis through the proposed method allows
us to suggest a diet that provides a balanced nutrient intake. The adherence of a patient
to dietary restrictions can be checked by analyzing the ingested nutrients. Moreover, it
is possible to recommend the intake of the corresponding food for the part lacking in a
specific nutrient.

Author Contributions: Conceptualization, J.-h.K., D.-s.L. and S.-k.K.; software, J.-h.K.; writing—
original draft preparation, J.-h.K., D.-s.L. and S.-k.K.; supervision, S.-k.K. All authors have read and
agreed to the published version of the manuscript.
Funding: This research was supported by the BB21+ Project in 2022 and supported by the MSIT
(Ministry of Science and ICT), Korea, under the Grand Information Technology Research Center
support program (IITP-2023-2020-0-01791) supervised by the IITP (Institute for Information & com-
munications Technology Planning & Evaluation).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Westerterp, K.R.; Goris, A.H. Validity of The Assessment of Dietary Intake: Problems of Misreporting. Curr. Opin. Clin. Nutr.
Metab. Care 2002, 5, 489–493. [CrossRef] [PubMed]
2. Elbassuoni, S.; Ghattas, H.; Ati, J.E.; Shmayssani, Z.; Katerji, S.; Zoughbi, Y.; Semaan, A.; Akl, C.; Gharbia, H.B.; Sassi, S.
DeepNOVA: A Deep Learning NOVA Classifier for Food Images. IEEE Access 2022, 10, 128523–128535. [CrossRef]
3. Tiankaew, U.; Chunpongthong, P.; Mettanant, V. A Food Photography App with Image Recognition for Thai Food. In Proceedings
of the Seventh ICT International Student Project Conference, Nakhonpathom, Thailand, 11–13 July 2018.
4. Mezgec, S.; Seljak, B.K. Using Deep Learning for Food and Beverage Image Recognition. In Proceedings of the IEEE International
Conference on Big Data, Los Angeles, CA, USA, 9–12 December 2019.
5. Islam, M.T.; Siddique, B.M.K.; Rahman, S.; Jabid, T. Food Image Classification with Convolutional Neural Network. In Proceedings
of the International Conference on Intelligent Informatics and Biomedical Sciences, Bangkok, Thailand, 21–24 October 2018.
6. Bándi, N.; Tunyogi, R.B.; Szabó, Z.; Farkas, E.; Sulyok, C. Image-Based Volume Estimation Using Stereo Vision. In Proceedings of
the IEEE International Symposium on Intelligent Systems and Informatics, Subotica, Serbia, 17–19 September 2020.
7. Okinda, C.; Sun, Y.; Nyalala, I.; Korohou, T.; Opiyo, S.; Wang, J.; Shen, M. Egg Volume Estimation Based on Image Processing and
Computer Vision. J. Food Eng. 2020, 283, 110041. [CrossRef]
8. Lu, Y.; Stathopoulou, T.; Vasiloglou, M.F.; Christodoulidis, S.; Blum, B.; Walser, T.; Meier, V.; Stanga, Z.; Mougiakakou., S. An
Artificial Intelligence-Based System for Nutrient Intake Assessment of Hospitalised Patients. IEEE Trans. Multimedia 2020, 23,
1136–1147. [CrossRef]
9. Lo, F.P.W.; Sun, Y.; Qiu, J.; Lo, B. Food Volume Estimation Based on Deep Learning View Synthesis from a Single Depth Map.
Nutrients 2018, 10, 2005. [CrossRef] [PubMed]
10. Suzuki, T.; Futatsuishi, K.; Yokoyama, K.; Amaki, N. Point Cloud Processing Method for Food Volume Estimation Based on Dish
Space. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Montreal,
QC, Canada, 20–24 July 2020.
11. Ando, Y.; Ege, T.; Cho, J.; Yanai, K. Depthcaloriecam: A Mobile Application for Volume-Based Foodcalorie Estimation Using
Depth Cameras. In Proceedings of the International Workshop on Multimedia Assisted Dietary Management, New York, NY,
USA, 21 October 2019.
12. Okamoto, K.; Yanai, K. An Automatic Calorie Estimation System of Food Images on A Smartphone. In Proceedings of the
International Workshop on Multimedia Assisted Dietary Management, Amsterdam, The Netherlands, 16 October 2016.
13. Smith, S.P.; Adam, M.T.P.; Manning, G.; Burrows, T.; Collins, C.; Rollo, M.E. Food Volume Estimation by Integrating 3D Image
Projection and Manual Wire Mesh Transformations. IEEE Access 2022, 10, 48367–48378. [CrossRef]
14. Liu, Y.; Lai, J.; Sun, W.; Wei, Z.; Liu, A.; Gong, W.; Yang, Y. Food Volume Estimation Based on Reference. In Proceedings of the
International Conference on Innovation in Artificial Intelligence, Xiamen, China, 8–11 May 2020.

15. Jia, W.; Ren, Y.; Li, B.; Beatrice, B.; Que, J.; Cao, S.; Wu, Z.; Mao, Z.H.; Lo, B.; Anderson, A.K.; et al. A Novel Approach to Dining
Bowl Reconstruction for Image-Based Food Volume Estimation. Sensors 2022, 22, 1493. [CrossRef] [PubMed]
16. Yue, Y.; Jia, W.; Sun, M. Measurement of Food Volume Based on Single 2-D Image without Conventional Camera Calibration. In
Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, San Diego, CA,
USA, 28 August–1 September 2012.
17. Fang, S.; Liu, C.; Zhu, F.; Delp, E.J.; Boushey, C.J. Single-view Food Portion Estimation Based on Geometric Models. In Proceedings
of the IEEE International Symposium on Multimedia, Miami, FL, USA, 14–16 March 2015.
18. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer
Vision, Venice, Italy, 22–29 October 2017.
19. Chang, K.H.; Liu, S.Y.; Chu, H.H.; Hsu, J.Y.J.; Chen, C.; Lin, T.Y.; Chen, C.Y.; Huang, P. The Diet-aware Dining Table: Observing
Dietary Behaviors over A Tabletop Surface. In Proceedings of the 4th International Conference on Pervasive Computing, Dublin,
Ireland, 7–10 May 2006; pp. 366–382.
20. Zhou, B.; Cheng, J.; Sundholm, M.; Reiss, A.; Huang, W.; Amft, O.; Lukowicz, P. Smart Table Surface: A Novel Approach
to Pervasive Dining Monitoring. In Proceedings of the 2015 IEEE International Conference on Pervasive Computing and
Communications (PerCom), St. Louis, MO, USA, 23–27 March 2015.
21. Olubanjo, T.; Moore, E.; Ghovanloo, M. Detecting Food Intake Acoustic Events in Noisy Recordings Using Template Matching. In
Proceedings of the International Conference on Biomedical and Health Informatics, Las Vegas, NV, USA, 25–27 February 2016.
22. Thomaz, E.; Essa, I.; Abowd, G.D. A Practical Approach for Recognizing Eating Moments with Wrist-mounted Inertial Sensing. In
Proceedings of the International Joint Conference on Pervasive and Ubiquitous Computing, Osaka, Japan, 7–11 September 2015.
23. Ye, X.; Chen, G.; Gao, Y.; Wang, H.; Cao, Y. Assisting Food Journaling with Automatic Eating Detection. In Proceedings of the
CHI Conference Extended Abstracts on Human Factors in Computing Systems, San Jose, CA, USA, 7–12 May 2016.
24. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
25. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the
International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015.
26. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet Classification with Deep Convolutional Neural Networks. Commun. ACM
2017, 60, 84–90. [CrossRef]
27. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
28. Wang, Y.; Wu, J.; Deng, H.; Zeng, X. Food Image Recognition and Food Safety Detection Method Based on Deep Learning. Comput.
Intell. Neurosci. 2021, 2021, 1268453. [CrossRef] [PubMed]
29. Lo, F.P.W.; Sun, Y.; Qiu, J.; Lo, B. Image-Based Food Classification and Volume Estimation for Dietary Assessment: A Review.
IEEE J. Biomed. Health. Inf. 2020, 24, 1926–1939. [CrossRef] [PubMed]
30. Hippocrate, E.; Suwa, H.; Arakawa, Y.; Yasumoto, K. Food Weight Estimation Using Smartphone and Cutlery. In Proceedings of
the Annual International Conference on Mobile Systems, Applications, and Services, Singapore, 25–30 June 2016.
31. Xu, C.; He, Y.; Khanna, N.; Boushey, C.; Delp, E. Model-Based Food Volume Estimation Using 3D Pose. In Proceedings of the
IEEE International Conference on Image Processing, Melbourne, Australia, 15–18 September 2013.
32. Chang, X.; Ye, H.; Albert, P.; Edward, D.; Nitin, K.; Carol, B. Image-Based Food Volume Estimation. In Proceedings of the ACM
Multimedia Conference, Barcelona, Spain, 21–25 October 2013.
33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
34. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the Computer
Vision and Pattern Recognition Conference, Boston, MA, USA, 8–10 June 2015.
35. Shamos, M.I. Computational Geometry; Yale University: Connecticut, CT, USA, 1978.
36. Al-Hashemi, H.M.B.; Al-Amoudi, O.S.B. A Review on The Angle of Repose of Granular Materials. Powder Technol. 2018, 330,
397–417. [CrossRef]
37. YOLOv8. Available online: https://github.com/ultralytics/ultralytics (accessed on 22 April 2023).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
