Deep Learning-Based Quality Assessment of 3D Point Clouds Without Reference
1 Laboratoire PRISME, Université d’Orléans, Orléans, France
2 L2S, CentraleSupélec, Université Paris-Saclay, Gif-sur-Yvette, France
Fig. 1. General framework of the proposed method.
...attributes: geometric distance, mean curvature and gray-level. The resulting patches (i.e., one per attribute) are stacked and fed as input to a Convolutional Neural Network (CNN) model to predict their quality. The global quality index is finally given by aggregating the predicted patch quality indexes. We assess the performance of our method on two datasets, including a large dataset more suited to deep models. The high correlations obtained show the potential of the approach. The main contribution of this paper is the use of classical CNN models to predict the quality of 3D PCs through extracted features.

The rest of this paper is structured as follows: Section 2 describes the proposed framework, experimental results are presented in Section 3, and the conclusion is given in Section 4.

2. PROPOSED METHOD

As illustrated by Fig. 1, the framework proposed for estimating the quality of 3D point clouds without reference is based on two main steps: 1) feature extraction, which characterizes the PC through a set of features organized as patches around selected points, and 2) quality prediction, which estimates the quality of the distorted PC via a CNN model. Both steps are described in detail in this section.

2.1. Feature extraction as patches

Given a distorted PC, we first randomly select a set of N points. The latter are here considered as reference points from which the quality is predicted. Then, we delimit a region around each point N_i by finding its K nearest neighbors. Next, the neighbors of each point N_i are characterized by three attributes: geometric distance, mean curvature and gray-level. The geometric distance D describes the spatial distribution of the neighbors and is computed as follows:

    D(K_j^i, N_i) = \sqrt{(x_{K_j^i} - x_{N_i})^2 + (y_{K_j^i} - y_{N_i})^2 + (z_{K_j^i} - z_{N_i})^2},    (1)

where D(K_j^i, N_i) represents the Euclidean distance between N_i and its j-th nearest neighbor K_j^i, and {x, y, z}_{K_j^i} and {x, y, z}_{N_i} are the spatial coordinates of the j-th neighbor and of the point N_i, respectively.

The mean curvature characterizes the shape variations and is computed for each point of the delimited region, including the point N_i; it is estimated through quadric fitting. The gray level gives information more related to the perceived rendering and is derived from the color information (i.e., RGB converted to gray level). The features obtained for each attribute are encapsulated into a patch, and the three resulting patches are then stacked to form a new patch SP_i of size P × P × 3, where three is the number of attributes considered.
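To make the patch construction described above concrete, the following sketch (not taken from the paper) builds one stacked P × P × 3 patch with NumPy/SciPy. It assumes K = P × P neighbors so that each attribute exactly fills a P × P patch, and it takes the per-point mean curvature and gray levels as precomputed inputs rather than re-implementing the quadric fitting; all function and variable names are illustrative.

```python
# Illustrative sketch of the patch construction described above (not the authors' code).
# Assumptions: K = P*P neighbors per selected point so that each attribute fills a
# P x P patch; mean curvature is supplied by a user-provided estimator (the paper
# uses quadric fitting, which is not reproduced here).
import numpy as np
from scipy.spatial import cKDTree

def build_stacked_patch(points, gray, mean_curv, center_idx, P=32):
    """Build one P x P x 3 patch (distance, curvature, gray level) for a selected point.

    points    : (M, 3) xyz coordinates of the distorted point cloud
    gray      : (M,) gray levels derived from the RGB attributes
    mean_curv : (M,) per-point mean curvature estimates (e.g. from quadric fitting)
    center_idx: index of the selected reference point N_i
    """
    tree = cKDTree(points)
    k = P * P
    # K nearest neighbors of N_i (Euclidean distance of Eq. (1)); the first
    # neighbor returned is N_i itself (distance 0), kept here for simplicity.
    dist, idx = tree.query(points[center_idx], k=k)
    patch = np.stack([
        dist.reshape(P, P),             # geometric-distance attribute
        mean_curv[idx].reshape(P, P),   # mean-curvature attribute
        gray[idx].reshape(P, P),        # gray-level attribute
    ], axis=-1)
    return patch.astype(np.float32)     # shape (P, P, 3), CNN input

def extract_patches(points, gray, mean_curv, N=1023, P=32, seed=0):
    """Patches for N randomly selected reference points (N and P follow Section 3)."""
    rng = np.random.default_rng(seed)
    sel = rng.choice(len(points), size=N, replace=False)
    return np.stack([build_stacked_patch(points, gray, mean_curv, i, P) for i in sel])
```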
...in order to compare the performance of different architectures and to analyze the impact of the depth.

• AlexNet [17]: This model is one of the pioneering models proposed by Alex Krizhevsky. It highlighted the relevance of using CNN models for classification tasks. The authors brought out three main points: the use of the ReLU (Rectified Linear Unit) activation function, the use of dropout to prevent over-fitting, and overlapping pooling.

...where N is the number of stacked patches (i.e. the number of selected points).

It is worth noting that a plethora of handcrafted and deep learning-based methods already aggregate local predicted scores to derive a global quality index, such as SSIM [7], VDP [24] and the deep learning-based methods [20, 25]. This strategy was also applied to 3D content, as done in [6, 8].
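Since the excerpt above refers to aggregating the N per-patch predictions into a single global quality index, here is a minimal PyTorch sketch of the quality-prediction step. It assumes a pre-trained VGG16 backbone adapted with a one-output regression head and a simple average as the aggregation; the exact head, pooling and fine-tuning procedure used by the authors are not given in the extracted text, so these choices are illustrative.

```python
# Minimal sketch (assumption-laden, not the authors' implementation): a pre-trained CNN
# adapted to regress one quality index per 32 x 32 x 3 patch, with the global quality
# index (GQI) obtained by averaging the N patch predictions.
import torch
import torch.nn as nn
from torchvision import models

class PatchQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = backbone.features    # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)   # copes with the small 32 x 32 input
        self.head = nn.Linear(512, 1)         # single quality score per patch

    def forward(self, patches):               # patches: (N, 3, 32, 32), normalized
        f = self.pool(self.features(patches)).flatten(1)
        return self.head(f).squeeze(-1)       # (N,) patch quality indexes

@torch.no_grad()
def global_quality_index(model, patches):
    """Average the predicted patch quality indexes over the N selected points."""
    model.eval()
    return model(patches).mean().item()
```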
The number of selected points (N) and the size of the patch (P) were fixed to 1023 and 32 × 32, respectively. The patch size was set following studies in which its impact on the performance was analyzed [20, 25].

3.2. Performance evaluation

Two evaluation criteria commonly used to evaluate the performance of quality metrics are adopted here: 1) the Pearson Correlation Coefficient (PCC) and 2) the Spearman Rank-Order Correlation Coefficient (SROCC). Both vary between 0 and 1 in absolute value, with 1 being the best performance. These correlations are computed for each dataset over each fold, and the mean correlations are then reported as results. It is worth noting that the same procedure was applied to the compared state-of-the-art metrics.
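For reference, the two criteria can be computed per fold with SciPy as in the short sketch below (illustrative, not the authors' evaluation code); the per-fold predicted scores and subjective scores are assumed inputs.

```python
# Illustrative computation of the two evaluation criteria described above.
# `folds` is an assumed list of (predicted_scores, subjective_scores) pairs, one per fold.
import numpy as np
from scipy.stats import pearsonr, spearmanr

def mean_correlations(folds):
    """Return the mean absolute PCC and SROCC over the folds."""
    pcc = [abs(pearsonr(pred, mos)[0]) for pred, mos in folds]
    srocc = [abs(spearmanr(pred, mos)[0]) for pred, mos in folds]
    return float(np.mean(pcc)), float(np.mean(srocc))
```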
3.3. Model comparison

Model      PCC              SROCC
AlexNet    0.894 ± 0.055    0.874 ± 0.083
VGG16      0.923 ± 0.031    0.907 ± 0.045
VGG19      0.925 ± 0.035    0.912 ± 0.042
ResNet18   0.886 ± 0.091    0.856 ± 0.113
ResNet50   0.885 ± 0.086    0.855 ± 0.109

Table 1. Model comparison on the sjtu dataset. The two best results are highlighted in bold.

Table 1 shows the mean correlations obtained for each considered model. The two best results are highlighted in bold. From these results, several observations can be made:

• First of all, the performance differs from one model to another, with a high gap between them (from 0.875 to 0.925 in terms of mean PCC and from 0.855 to 0.912 in terms of mean SROCC). The stability of the results over the folds differs as well: ResNet-based models obtain the highest standard deviations, followed by AlexNet, while VGG-based models are more stable, with the lowest standard deviations.

• ResNet50 achieves the worst performance, lower even than the simplest and shallowest model used (i.e. AlexNet). However, it is competitive with the full-reference metric PCQM and outperforms all the other compared full-reference metrics on sjtu (see Table 2).

• For VGG-based and ResNet-based models, we can see that the depth does not strongly impact the performance, since the differences are statistically not significant (p-value > 0.05).

• The architecture seems to have an impact, since VGG-based models give the best correlations, with a mean PCC gain of around 5%. Moreover, unlike for classification tasks, the integration of residual modules does not improve the performance. The p-values between the VGG-based and ResNet-based models are smaller than 0.05, indicating a statistically significant difference between the two.

Based on these results, it seems that the architecture of the considered models has more impact on the performance than their depth.
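The statistical comparison mentioned in these observations can be illustrated with a paired test over the per-fold correlations of two models. The sketch below uses a paired t-test as an assumed choice, since the extracted text reports p-values but does not name the test used.

```python
# Assumed illustration of the per-model statistical comparison: a paired t-test on the
# per-fold PCC values of two models (the text reports p-values but not the test used).
from scipy.stats import ttest_rel

def significantly_different(pcc_model_a, pcc_model_b, alpha=0.05):
    """pcc_model_a, pcc_model_b: per-fold PCC values of two models on the same folds."""
    _, p_value = ttest_rel(pcc_model_a, pcc_model_b)
    return p_value, p_value < alpha  # significant difference if p-value < alpha
```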
3.4. Comparison with state-of-the-art metrics

Our method is here compared with state-of-the-art metrics. More precisely, we consider po2point-based and po2plane-based metrics pooled with MSE, PSNR and the Hausdorff distance. We also consider the recent po2dist metric [10] (i.e. point-to-distribution) pooled with MSE and PSNR, as well as PCQM, which is based on both geometry and color features [9]. Results on the sjtu and ICIP20 datasets are shown in Tables 2 and 3, respectively, with the top-2 results highlighted in bold.

Method                      PCC     SROCC
Baseline methods (full reference)
po2pointMSE                 0.686   0.801
PSNRpo2pointMSE             0.799   0.844
po2pointHausdorff           0.517   0.686
PSNRpo2pointHausdorff       0.638   0.682
po2planeMSE                 0.642   0.717
PSNRpo2planeMSE             0.744   0.722
po2planeHausdorff           0.539   0.682
PSNRpo2planeHausdorff       0.755   0.825
po2distMSE (mmd)            0.710   0.603
PSNRpo2distMSE (mmd)        0.621   0.603
po2distMSE (msmd)           0.706   0.603
PSNRpo2distMSE (msmd)       0.642   0.715
PCQM                        0.879   0.888
Our method (no reference)
GQI-VGG16                   0.923   0.907
GQI-VGG19                   0.925   0.912

Table 2. Comparison with state-of-the-art methods on the sjtu dataset. The two best results are highlighted in bold.
On sjtu (see Table 2), the results obtained by our method with both models (i.e. VGG16 and VGG19) surpass all the compared state-of-the-art metrics, with a gain in terms of mean PCC that varies between 5% and 70%. PCQM achieves the third best performance, with 0.879 and 0.888 as mean PCC and SROCC, respectively. The worst result is obtained by po2pointHausdorff, closely followed by po2planeHausdorff. PSNRpo2pointMSE, PSNRpo2planeHausdorff and po2distMSE obtain the best results among the po2point-based, po2plane-based and po2dist-based metrics, respectively, whereas po2pointHausdorff, po2planeHausdorff and PSNRpo2distMSE achieve the worst results among the po2point-based, po2plane-based and po2dist-based metrics, respectively. Unlike the po2dist-based metrics, the po2point-based and po2plane-based metrics pooled with MSE obtain higher correlations than those pooled with PSNR. Globally, the compared metrics, except PCQM, achieve low correlations since they focus more on geometric information, failing to catch other distortions.

Method                      PCC     SROCC
Baseline methods (full reference)
po2pointMSE                 0.945   0.950
PSNRpo2pointMSE             0.880   0.934
po2pointHausdorff           0.717   0.690
PSNRpo2pointHausdorff       0.597   0.763
po2planeMSE                 0.945   0.959
PSNRpo2planeMSE             0.916   0.953
po2planeHausdorff           0.753   0.763
PSNRpo2planeHausdorff       0.939   0.970
po2distMSE (mmd)            0.965   0.963
PSNRpo2distMSE (mmd)        0.865   0.965
po2distMSE (msmd)           0.967   0.965
PSNRpo2distMSE (msmd)       0.902   0.972
PCQM                        0.796   0.832
Our method (no reference)
GQI-VGG16                   0.956   0.966
GQI-VGG19                   0.952   0.966

Table 3. Comparison with state-of-the-art methods on the ICIP20 dataset. The two best results are highlighted in bold.

On ICIP20 (see Table 3), po2distMSE (msmd) obtains the best mean PCC (0.967), closely followed by po2distMSE (mmd) (0.965), whereas the two best mean SROCC values are reached by PSNRpo2distMSE (msmd) (0.972) and PSNRpo2planeHausdorff (0.970), respectively. Our metric achieves the third best mean PCC (0.956 for VGG16 and 0.952 for VGG19) and the fourth best mean SROCC (0.966 for both models) without accessing the pristine PC (i.e. NR approach). PCQM is outperformed by most of the compared metrics, except the po2point-based metrics pooled with Hausdorff (i.e. po2pointHausdorff and PSNRpo2pointHausdorff) and po2planeHausdorff. Similarly to the results obtained on sjtu, po2point-based and po2plane-based metrics pooled with MSE obtain higher correlations than those pooled with PSNR, while the po2dist-based metrics pooled with PSNR give better results than those pooled with MSE.

Globally, the correlations reached on ICIP20 are higher than those obtained on sjtu, except for PCQM. These results can be justified by the fact that ICIP20 is composed only of compressed PCs with joint distortions of geometry and attributes, while sjtu contains a wider set of distortions, including color noise. We also evaluated the generalization capacity of the proposed method; however, the correlations are not as high as expected.

4. CONCLUSION

In this paper, we proposed a deep learning-based method that efficiently predicts the quality of distorted PCs without reference. After randomly selecting a set of points from the PC, we defined a region around each of them by finding its nearest neighbors. The delimited regions are then characterized through three attributes (geometric distance, mean curvature and gray-level), and the obtained values are stacked into patches of size 32 × 32 × 3 to predict their quality indexes (i.e. patch quality indexes) through a CNN model. The global quality index is finally given by aggregating the predicted patch quality indexes. We compared the performance of five pre-trained CNN models, and the best results were compared with state-of-the-art methods. The results obtained on two datasets show the potential of the proposed approach.

Despite the effectiveness of our method, some points should be analyzed more deeply, including the use of more efficient deep learning models and the impact of the number of selected points, as well as of the patch size, on the performance. Other point selection strategies will also be considered.

5. REFERENCES

[1] D. Graziosi, O. Nakagami, S. Kuma, A. Zaghetto, T. Suzuki, and A. Tabatabai, “An overview of ongoing point cloud compression standardization activities: video-based (V-PCC) and geometry-based (G-PCC),” APSIPA Trans. on Signal and Information Process., vol. 9, 2020.

[2] Maurice Quach, Giuseppe Valenzise, and Frédéric Dufaux, “Learning Convolutional Transforms for Lossy Point Cloud Geometry Compression,” in 2019 IEEE Intl. Conf. on Image Process. (ICIP), Sept. 2019, pp. 4320–4324, ISSN: 1522-4880.
[3] Maurice Quach, Giuseppe Valenzise, and Frédéric Dufaux, “Improved Deep Point Cloud Geometry Compression,” in 2020 IEEE Intl. Workshop on Multimedia Signal Process. (MMSP), Oct. 2020.

[4] Dong Tian et al., “Geometric distortion metrics for point cloud compression,” in 2017 IEEE Intl. Conf. on Image Process. (ICIP), Beijing, Sept. 2017, pp. 3460–3464, IEEE.

[5] Evangelos Alexiou and Touradj Ebrahimi, “Point Cloud Quality Assessment Metric Based on Angular Similarity,” in 2018 IEEE Intl. Conf. on Multimedia and Expo (ICME), July 2018, pp. 1–6, ISSN: 1945-788X.

[6] Gabriel Meynet, Julie Digne, and Guillaume Lavoué, “PC-MSDM: A quality metric for 3D point clouds,” in 2019 11th Intl. Conf. on Quality of Multimedia Experience (QoMEX), June 2019, pp. 1–3, ISSN: 2472-7814, 2372-7179.

[7] Zhou Wang et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.

[8] Guillaume Lavoué, “A Multiscale Metric for 3D Mesh Visual Quality Assessment,” Computer Graphics Forum, vol. 30, pp. 1427–1437, July 2011.

[9] Gabriel Meynet et al., “PCQM: A Full-Reference Quality Metric for Colored 3D Point Clouds,” in 2020 12th Intl. Conf. on Quality of Multimedia Experience (QoMEX 2020), Athlone, Ireland, May 2020.

[10] A. Javaheri et al., “Mahalanobis Based Point to Distribution Metric for Point Cloud Geometry Quality Evaluation,” IEEE Signal Process. Lett., vol. 27, pp. 1350–1354, 2020.

[11] I. Viola, S. Subramanyam, and P. Cesar, “A Color-Based Objective Quality Metric for Point Cloud Contents,” in 2020 12th Intl. Conf. on Quality of Multimedia Experience (QoMEX), May 2020, pp. 1–6, ISSN: 2472-7814.

[12] E. Alexiou and T. Ebrahimi, “Towards a Point Cloud Structural Similarity Metric,” in 2020 IEEE Intl. Conf. on Multimedia Expo Workshops (ICMEW), July 2020, pp. 1–6.

[13] A. Javaheri et al., “Improving PSNR-based Quality Metrics Performance For Point Cloud Geometry,” in 2020 IEEE Intl. Conf. on Image Process. (ICIP), Oct. 2020, pp. 3438–3442, ISSN: 2381-8549.

[14] Maurice Quach, Aladine Chetouani, Giuseppe Valenzise, and Frédéric Dufaux, “A deep perceptual metric for 3D point clouds,” in Image Quality and System Performance, IS&T International Symposium on Electronic Imaging (EI 2021), San Francisco, United States, Jan. 2021.

[15] Ilyass Abouelaziz, Aladine Chetouani, Mohammed El Hassouni, Longin Jan Latecki, and Hocine Cherifi, “No-reference mesh visual quality assessment via ensemble of convolutional neural networks and compact multilinear pooling,” Pattern Recognition, vol. 100, pp. 107174, 2020.

[16] S. Dodge and L. Karam, “Understanding how image quality affects deep neural networks,” in 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), 2016, pp. 1–6.

[17] A. Krizhevsky, “One weird trick for parallelizing convolutional neural networks,” CoRR, abs/1404.5997, 2014.

[18] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, abs/1409.1556, 2014.

[19] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.

[20] L. Kang, P. Ye, Y. Li, and D. Doermann, “Convolutional neural networks for no-reference image quality assessment,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1733–1740.

[21] Aladine Chetouani, “A blind image quality metric using a selection of relevant patches based on convolutional neural network,” in 2018 26th European Signal Processing Conference (EUSIPCO), 2018, pp. 1452–1456.

[22] Oussama Messai, Aladine Chetouani, F. Hachouf, and Zianou Seghir, “No-reference stereoscopic image quality predictor using deep features from cyclopean image,” Jan. 2021.

[23] Ilyass Abouelaziz, Aladine Chetouani, Mohammed El Hassouni, Longin Jan Latecki, and Hocine Cherifi, “3D visual saliency and convolutional neural network for blind mesh quality assessment,” Neural Computing and Applications, 2019.

[24] Scott J. Daly, “Visible differences predictor: an algorithm for the assessment of image fidelity,” in Human Vision, Visual Processing, and Digital Display III, Bernice E. Rogowitz, Ed. International Society for Optics and Photonics, 1992, vol. 1666, pp. 2–15, SPIE.

[25] S. Bosse, D. Maniry, K. Müller, T. Wiegand, and W. Samek, “Deep neural networks for no-reference and full-reference image quality assessment,” IEEE Trans. on Image Process., vol. 27, no. 1, pp. 206–219, Jan. 2018.

[26] Q. Yang, Z. Ma, Y. Xu, R. Tang, and J. Sun, “Predicting the perceptual quality of point cloud: A 3D-to-2D projection-based exploration,” IEEE Trans. on Multimedia, 2020.

[27] Stuart Perry et al., “Quality Evaluation Of Static Point Clouds Encoded Using MPEG Codecs,” in 2020 IEEE Intl. Conf. on Image Process. (ICIP), Oct. 2020, pp. 3428–3432, ISSN: 2381-8549.