PQA-Net: Deep No-Reference Point Cloud Quality Assessment via Multi-View Projection
Fig. 1. Overview of the proposed PQA-Net, in which CNN denotes a convolutional neural network, CON denotes a convolution layer, MP denotes maxpooling, GDN denotes the generalized divisive normalization (used to replace the activation function), BN is the batch normalization unit, FC is the fully connected module, DO is the dropout unit, l1 is the cross-entropy loss function, l2 is defined based on the Pearson linear correlation coefficient (PLCC), and the overall loss function l is a linear weighting of l1 and l2.
based on DNNs. Usually, a DNN needs a large dataset for training. Currently, the existing relatively large public point cloud quality assessment dataset, i.e., the Waterloo Point Cloud (WPC) dataset¹ [12], contains only 20 annotations and 720 degraded point clouds, which is not enough for direct training. To overcome this problem, inspired by [14] and [15], which deal with small-sample datasets very well, we propose a DNN-based NR point cloud quality assessment method to estimate the quality of point clouds with a small-sample dataset. The NR point cloud quality assessment problem is divided into two sub-tasks. Sub-task I classifies a point cloud into a specific distortion type from a set of pre-defined categories. Sub-task II predicts the perceptual quality of the corresponding point cloud by taking advantage of the distortion information obtained from the distortion type classification task. The two sub-tasks are accomplished by two sub-networks with shared features, as shown in Fig. 1. The contributions of this paper are as follows.
• We propose to use a DNN to assess the quality of point clouds. To the best of our knowledge, this is the first NR point cloud quality assessment method, and it can be used directly without any original information.
• In the proposed network, the point cloud quality assessment task is divided into two cooperative sub-tasks, i.e., a distortion type classification task and a quality prediction task, to deal with the small sample dataset efficiently.
• A multi-view projection strategy is adopted to extract the features of a point cloud effectively and comprehensively.
• Experimental results on the Waterloo dataset demonstrate that the proposed method achieves comparable or even better performance than existing state-of-the-art FR and NR methods.
The remainder of this paper is organized as follows. A brief review of related work on point cloud quality assessment is given in Section II. Then, the proposed method is presented in detail in Section III. Comprehensive experimental results and analyses are provided in Section IV. Finally, Section V concludes this paper.

¹https://github.com/qdushl/Waterloo-Point-Cloud-Database

II. RELATED WORK

Existing point cloud quality metrics can be broadly categorized into three-dimensional direct metrics and two-dimensional indirect metrics. Three-dimensional direct metrics rely on finding correspondences for all points, or for the relevant characteristics, between the degraded point cloud and the original point cloud, while two-dimensional indirect metrics rely on weighting the quality of multiple two-dimensional projections of the point cloud.

Usually, the Euclidean distance between corresponding points of the degraded and the original point clouds is used to measure the distortion, e.g., the point-to-point error adopted by MPEG [16]. However, it ignores surface structures. Tian et al. [17] proposed the point-to-plane distance to measure geometric distortion, which is less dependent on complex surface construction. Besides the Euclidean distance, Alexiou and Ebrahimi [18] proposed a promising alternative objective quality metric for point clouds based on the angular similarity of the geometric distortion. Javaheri et al. [19] used the generalized Hausdorff distance to evaluate the geometry quality of point clouds. Meynet et al. [20] used local curvature statistics to evaluate the quality of point clouds. Viola et al. [21] extracted color statistics such as histograms and correlograms to assess the quality of point clouds. Similarly, Meynet et al. [22] extracted a set of geometry and color features for each point based on a correspondence between the distorted and reference point clouds, and then proposed an FR quality metric as a linear combination of an optimal subset of the extracted features. Diniz et al. [23]–[25] proposed local luminance pattern descriptor-based and local binary pattern descriptor-based distances to assess the perceived quality of the test
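(As a concrete illustration of the point-to-point and point-to-plane errors discussed above, the following is a minimal NumPy/SciPy sketch; it is our own illustration, with our own function names and MSE pooling, not the MPEG reference software [16].)

```python
# Sketch of the point-to-point (p2point) and point-to-plane (p2plane)
# geometry errors; our own illustration, not the MPEG reference code.
import numpy as np
from scipy.spatial import cKDTree

def p2point_mse(degraded, reference):
    """Mean squared Euclidean distance from each degraded point to its
    nearest neighbor in the reference cloud (point-to-point error)."""
    dist, _ = cKDTree(reference).query(degraded)
    return np.mean(dist ** 2)

def p2plane_mse(degraded, reference, ref_normals):
    """Point-to-plane error [17]: each error vector is projected onto
    the normal of the corresponding (nearest) reference point."""
    _, idx = cKDTree(reference).query(degraded)
    err = degraded - reference[idx]                # per-point error vectors
    proj = np.sum(err * ref_normals[idx], axis=1)  # projection onto normals
    return np.mean(proj ** 2)
```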
Fig. 3. Detailed architecture of the proposed PQA-Net, where the parameterization of each convolutional layer is denoted as "height × width | input channels × output channels | stride | padding". DTI and QVP share the features extracted by MVFEF, and the outputs of DTI and QVP are multiplied to calculate the quality of the degraded point cloud.
networks (CNN) blocks, i.e., CNN1, CNN2, CNN3, and CNN4. The first three CNNs all consist of convolution (CON) layers, generalized divisive normalization (GDN)², and maxpooling (MP), whereas CNN4 additionally adds a batch normalization unit compared with the previous CNNs. The model parameters of the feature extractor are collectively denoted by W, and the parameterizations of convolution, maxpooling, and connectivity from layer to layer are given in Fig. 3.

²GDN is a differentiable transform that can be trained with any preceding or subsequent layers. It is biologically inspired [14]. Its effectiveness has been proven in image quality assessment [40], Gaussianizing image densities [41], and digital image compression [42].
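(For context on footnote 2: the GDN transform, as defined in [41] rather than in the present paper, normalizes the i-th channel activation jointly across channels as

\[
y_i = \frac{x_i}{\left(\beta_i + \sum_j \gamma_{ij}\, x_j^2\right)^{1/2}},
\]

with trainable parameters \beta_i and \gamma_{ij}, which is what allows GDN to serve in place of a fixed activation function.)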
Note that Fig. 3 only shows the detailed CNN structure for one projected image as an example. The parameters of the CNNs for the six projected images are shared. Since point clouds have various shapes and sizes, the resolution of the projected images must be large enough to represent them, and the occupied effective pixels used to represent the point cloud in the projected images also differ. Therefore, we first detect the center pixel of the point cloud on the six projection planes, and then perform center cropping to obtain six cropped images of size 235 × 235. As a result, we represent a point cloud by six 235 × 235 images with three channels (red, green, and blue), which are then fed into the feature extractor to generate a 384-dimensional feature vector.
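(A minimal sketch of the center-cropping step just described, assuming the six projected images have already been rendered; the function name, background convention, and padding behavior are our own assumptions.)

```python
# Center-crop a projected image around the center of its occupied
# ("effective") pixels, as described above. Our own sketch.
import numpy as np

def center_crop(img, size=235, bg=0):
    """img: HxWx3 projected image; bg: background pixel value."""
    occupied = np.any(img != bg, axis=2)        # effective pixels
    ys, xs = np.nonzero(occupied)
    cy, cx = int(ys.mean()), int(xs.mean())     # center pixel of the cloud
    half = size // 2
    # pad so the crop window never leaves the image
    img = np.pad(img, ((half, half + 1), (half, half + 1), (0, 0)),
                 mode="constant", constant_values=bg)
    cy, cx = cy + half, cx + half
    return img[cy - half:cy - half + size, cx - half:cx - half + size]
```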
The distortion identification and quality prediction sub-tasks are conducted by GDN, fully connected (FC), dropout (DO), and softmax operations. Specifically, the architecture of the DTI module is composed of two FC layers to compactly represent the input feature, one GDN transform to increase nonlinearity, and a dropout layer to reduce the probability of overfitting. We also adopt the softmax function to convert the unnormalized outputs of the last FC layer into a D-dimensional distortion probability vector p̂^(k)(X^(k); W, w1), where D is the number of distortion types and k is the index within the mini-batch. Since the cross-entropy loss is very suitable for classification and its convergence speed is fast [43], to classify the distortion type effectively and quickly, the cross-entropy loss l_1(X^{(k)}; W, w_1) over the mini-batch is used,

l_1(X^{(k)}; W, w_1) = -\sum_{k=1}^{K} \sum_{i=1}^{D} p_i^{(k)} \log\left[\hat{p}_i^{(k)}(X^{(k)}; W, w_1)\right], \quad (2)

where w_1 denotes the model parameters of the DTI module.
The QVP module for sub-task II has a similar structure to the distortion identification module but lacks the softmax unit, resulting in an architecture of two FCs, one GDN, and one DO. The QVP module takes in the features from MVFEF. Its output is then multiplied by the estimated distortion type probability vector p̂^(k) to obtain the final quality score of a point cloud. The aim of this module is to predict the perceptual quality of X^(k) in the form of a scalar value Q̂^(k); its parameters are collectively denoted by w2. The quality predictor produces a score vector ŝ^(k) whose i-th entry represents the perceptual quality score corresponding to the i-th distortion type. The final scalar value Q̂^(k) is computed as the inner product of p̂^(k) and ŝ^(k),

\hat{Q}^{(k)} = \hat{p}^{(k)\mathsf{T}} \hat{s}^{(k)} = \sum_{i=1}^{D} \hat{p}_i^{(k)} \hat{s}_i^{(k)}. \quad (3)

The Pearson linear correlation coefficient (PLCC) represents the linear correlation between the objective scores and the subjective scores. As it is a commonly used evaluation criterion in the context of perceptual quality assessment, we define the loss function l2 of sub-task II to improve the PLCC directly. Once the predicted score Q̂^(k) is obtained, the PLCC between the predicted scores and the ground truth q^(k) in the mini-batch is computed as

l_2(X^{(k)}; W, w_1, w_2) = \frac{\sum_{k=1}^{K} \left(\hat{Q}^{(k)} - \hat{Q}_m\right)\left(q^{(k)} - q_m\right)}{\sqrt{\sum_{k=1}^{K} \left(\hat{Q}^{(k)} - \hat{Q}_m\right)^2} \sqrt{\sum_{k=1}^{K} \left(q^{(k)} - q_m\right)^2}}, \quad (4)
where Q̂_m and q_m denote the means of Q̂^(k) and q^(k) across the mini-batch. The advantage of choosing the PLCC loss instead of the widely used L1 or L2 loss functions is that human beings are more consistent in ranking perceptual quality than in assigning absolute scores [44]. By taking the accuracy of the distortion type classification task into account, we define the hybrid loss function of PQA-Net as

l(X^{(k)}; W, w_1, w_2) = l_1 - \lambda l_2, \quad (5)

where λ is a positive weight value to account for the scale difference between the DTI and QVP modules. The hybrid loss function contains the cross-entropy (l1) and the PLCC (l2), whose lower and higher values, respectively, indicate better performance. Therefore, the PLCC (l2) needs to be subtracted from the cross-entropy (l1) loss.
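(To make Eqs. (2)–(5) concrete, here is a minimal PyTorch-style sketch of the hybrid objective as we read it from the equations above; it is our own illustration, with our own names such as hybrid_loss, not the authors' released code.)

```python
# Hybrid training objective of PQA-Net, Eqs. (2)-(5); our own sketch.
import torch
import torch.nn.functional as F

def hybrid_loss(logits, labels, s_hat, q, lam=10.0, eps=1e-8):
    """logits: (K, D) DTI outputs; labels: (K,) distortion types;
    s_hat: (K, D) QVP score vectors; q: (K,) ground-truth scores."""
    # Eq. (2): cross-entropy over the mini-batch (sub-task I)
    l1 = F.cross_entropy(logits, labels)
    # Eq. (3): quality score as the inner product of the distortion
    # probability vector and the per-distortion score vector
    p_hat = torch.softmax(logits, dim=1)
    q_hat = (p_hat * s_hat).sum(dim=1)
    # Eq. (4): PLCC between predicted and ground-truth scores
    a, b = q_hat - q_hat.mean(), q - q.mean()
    l2 = (a * b).sum() / (a.norm() * b.norm() + eps)
    # Eq. (5): minimize l1 while maximizing the PLCC l2
    return l1 - lam * l2, q_hat
```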
a quality predictor by taking the distortion types as a strong regularizer.

IV. EXPERIMENTAL RESULTS AND ANALYSES

In this section, we first describe the experimental setups, including the implementation of PQA-Net and the Waterloo Point Cloud Sub-Dataset (WPCSD). We then conduct an ablation study to confirm the influence of λ and of the DTI module. After that, we compare the proposed PQA-Net with optimal parameters against state-of-the-art FR and RR quality metrics and display the comparative visual results. Finally, we test the performance of the proposed PQA-Net with different loss functions and also test its performance on other datasets.
computed as

PLCC = \frac{\sum_i (Q_i - Q_m)\left(\hat{Q}_i - \hat{Q}_m\right)}{\sqrt{\sum_i (Q_i - Q_m)^2} \sqrt{\sum_i \left(\hat{Q}_i - \hat{Q}_m\right)^2}}, \quad (8)
shown in Table II, from which we can see that λ = 10 achieves the best performance. Besides, Fig. 7 shows the convergence curves of the training procedure with λ = 10.

3) Experimental Results on the Testing Dataset of WPCSD: We compared the proposed PQA-Net with classic and state-of-the-art FR and RR point cloud quality metrics on WPCSD.
Fig. 7. The loss curves for λ = 10, where the x-coordinate indicates the step number and the y-coordinate indicates the loss value.
TABLE III
ACCURACY COMPARISON WITH EXISTING FR AND RR POINT CLOUD QUALITY METRICS FOR DIFFERENT CONTENTS
there are too many features used in the metric, which limits its generalization ability. If the number of features is reduced judiciously and the weights are retrained on a larger dataset, the results should be better.
Fig. 8. Scatter plots of objective scores vs. MOSs. The best fitting logistic functions are also shown as solid curves. Note: the smaller the value of PCQM and PCMRR, the higher the MOS value of the distorted point cloud.
The PointSSIM is the worst, with overall PLCC and SRCC of 0.14 and 0.18, respectively.

All the projection-based methods, i.e., SSIM_p, MS-SSIM_p, and VIFP_p, achieve good performance because both geometric and color information are taken into account, as in the proposed PQA-Net. It is worth pointing out that the projected images employed in the proposed PQA-Net are the same as those used in the image-based metrics. We can see from Table III that VIFP_p has the best performance, with PLCC and SRCC both as high as 0.82, and that PQA-Net is the second best, with PLCC and SRCC of 0.70 and 0.69, respectively. Limited by the unfair usage of the original information, the performance of the NR method is reasonably worse than that of the FR/RR methods. From the results, however, we can see that the performance of the proposed PQA-Net is only slightly lower than that of the best FR method, VIFP_p, indicating the advantage of the proposed PQA-Net. From Table III we can also see that the prediction accuracy for the point cloud "Banana" is not good. The reason is that the point cloud "Banana" has a simple geometric structure and a high-brightness single yellow color, which makes it difficult for subjects to distinguish quality changes. As a result, the quality labels of "Banana" used for training are noisy, and the performance of PQA-Net is reduced accordingly.

TABLE IV
THE CONFUSION MATRICES PRODUCED BY PQA-NET

To compare the accuracy of these quality metrics uniformly, a nonlinear four-parameter logistic function [60] is applied to map raw model predictions to the MOS scale. Scatter plots of objective scores and MOSs are shown in Fig. 8, and the best fitting logistic functions are shown as solid curves. All the compared methods have score ranges different from the MOSs, and their correspondence with the MOSs is given by the fitting curve in each subplot of Fig. 8.
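(The four-parameter logistic mapping [60] just mentioned can be fitted, for example, with scipy.optimize.curve_fit. The parameterization below is one common choice and our own sketch; the exact form is specified in [60].)

```python
# Fit a four-parameter logistic function [60] mapping raw objective
# scores to the MOS scale; our own sketch, one common parameterization.
import numpy as np
from scipy.optimize import curve_fit

def logistic4(x, b1, b2, b3, b4):
    return (b1 - b2) / (1.0 + np.exp(-(x - b3) / b4)) + b2

def map_to_mos(obj, mos):
    p0 = [mos.max(), mos.min(), obj.mean(), obj.std()]  # rough init
    params, _ = curve_fit(logistic4, obj, mos, p0=p0, maxfev=10000)
    return logistic4(obj, *params)
```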
As a by-product, PQA-Net also outputs the distortion type of a point cloud. The confusion matrix [61], which is used to summarize the performance of a classification algorithm, is shown in Table IV. Its elements represent the probabilities that the distortion types are accurately predicted by the classifier; the higher the diagonal values of the confusion matrix, the better. Since the statistical behavior of the noise distortion is clearly distinct from that of the other three kinds of distortion, PQA-Net predicts the noise distortion perfectly.
TABLE V
ACCURACY COMPARISON WITH EXISTING FR AND RR POINT CLOUD QUALITY METRICS FOR DIFFERENT DISTORTION TYPES
On the other hand, the compression distortions generated by G-PCC and V-PCC are more easily confused by the classifier. Since the number of samples with G-PCC distortion is about 1.33 times that with V-PCC distortion, the classification accuracy for G-PCC is better. Accordingly, we can speculate that using a larger dataset is likely to result in better performance. To measure the performance of the classifier, the accuracy of the distortion type classifier was calculated. The accuracy is defined as the proportion of the total number of predictions that are correct:

DT_a = \frac{DT_r}{DT_t}, \quad (11)

where DT_r denotes the number of correct predictions made by the distortion type classifier, and DT_t denotes the total
Fig. 9. Subjective comparison between two point clouds that have the same quality as predicted by the evaluated quality metrics.
number of all the tested point clouds. From Table IV, we can compute that the mean DT_a achieved by PQA-Net is as high as (0.72 + 1.00 + 0.99 + 0.49) ÷ 4 = 0.80.
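(The confusion matrix of Table IV and the accuracy of Eq. (11) can be computed as below; a NumPy sketch of our own, where the 0.80 above is the mean of the four diagonal entries.)

```python
# Confusion matrix (Table IV) and distortion-type accuracy, Eq. (11).
import numpy as np

def confusion_matrix(true_types, pred_types, num_types=4):
    cm = np.zeros((num_types, num_types))
    for t, p in zip(true_types, pred_types):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)  # rows sum to 1; the mean
                                               # diagonal gives 0.80 here

def dt_accuracy(true_types, pred_types):
    true_types, pred_types = np.asarray(true_types), np.asarray(pred_types)
    return np.mean(true_types == pred_types)   # DT_a = DT_r / DT_t
```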
4) Experimental Results on Distortion Types: We also compared the proposed PQA-Net with classic and state-of-the-art FR and RR point cloud quality metrics for different distortion types. These results are shown in Table V. First of all, for the downsample distortion type, the performance of all methods is relatively good. Since this kind of distortion is caused by directly discarding some points of the original point cloud, which damages the geometry and color information severely, the methods that only consider the geometry structure (i.e., PSNR_{MSE,p2p0}, PSNR_{MSE,p2pl}, PSNR_{HF,p2p0}, PSNR_{HF,p2pl}, AS_{Mean}, AS_{RMS}, and AS_{MSE}) also perform well. For the noise distortion type, white Gaussian noise was added independently to the geometry and color elements. PSNR_Y performs the best since it focuses precisely on the color noise at the corresponding points.
For the G-PCC and V-PCC distortion types, which introduce both geometry and color coding distortions, the projection-based methods are better than the point-based and the angular-based methods (as mentioned in Sec. IV-C3) due to their comprehensive consideration of both geometry and color distortions. In summary, VIFP_p performs the best for all distortion types. PCMRR, PCQM, and PointSSIM also perform well for specific distortion types, but they are not robust enough. In comparison, the performance of the proposed PQA-Net is robust, and its accuracy is very close to that of the FR metric VIFP_p.

TABLE VI
INFLUENCE OF THE DIFFERENT LOSS FUNCTIONS l_2 IN (12)

5) Visual Result: As we know, human vision is the ultimate standard by which to evaluate different quality metrics [62]. Accordingly, we further compared the consistency between the visual results of PQA-Net and the state-of-the-art methods. Since the distribution ranges of the point cloud quality scores predicted by different evaluation methods are very different, we compared the visual difference of two degraded point clouds that have the same predicted quality score according to the evaluated quality metrics. It is worth mentioning that these two degraded point clouds are selected from the same distortion type for a fair comparison. Take the subjective visual difference of the point cloud pineapple as an example: as shown in Fig. 9(a), the visual perception of the two degraded point clouds is very different, but PointSSIM predicts that the quality of the two degraded point clouds is the same, indicating that the quality metric PointSSIM is inaccurate. Similar results can also be found in Fig. 9(b), (c), (d), and (f) for GraphSIM, PCQM, PSNR_Y, MS-SSIM_p, and SSIM_p. The visual perceptions of the two degraded point clouds in Fig. 9(g) and Fig. 9(h) are very close, and the quality scores predicted by VIFP_p and the proposed PQA-Net are also very close, indicating that the quality scores predicted by VIFP_p (an FR method) and the proposed PQA-Net (an NR method) are consistent with subjective vision. To better compare the trend similarity between the predicted MOSs of each quality metric and the ground truth, we divided the MOS interval into three sub-intervals ([0, 40), [40, 65), and [65, 100], respectively), and then compared the box plot statistical diagrams, as shown in Fig. 10. Note that we applied the nonlinear four-parameter logistic function [60] to map raw model predictions to the MOS scale in Fig. 10. It can be seen that the trend of the median line of the predicted MOSs of our proposed PQA-Net is very similar to the trend of the ground truth MOSs. The median line of the predicted MOSs of PCMRR is also close to that of the ground truth MOSs in the sub-intervals [0, 40)
and [40, 65), but unfortunately, there are no raw model predictions mapped into the [65, 100] interval.

TABLE VII
INTRODUCTION OF OTHER DATASETS

TABLE VIII
THE PERFORMANCE OF PQA-NET ON DIFFERENT DATASETS

6) Ablation Studies for Loss Functions: We investigated more loss functions for PQA-Net to explore its performance. The widely used loss functions are chosen, e.g., the L1 loss, which measures the mean absolute error between each element in the input objective and the target objective [14]; the L2 loss, which measures the mean squared error between each element in the input objective and the target objective [63]; and the smooth L1 loss, which uses a squared term if the absolute element-wise error falls below a threshold and an L1 term otherwise [64]. In this ablation study, we adjusted the hybrid loss function of PQA-Net as

l(X^{(k)}; W, w_1, w_2) = l_1 + \lambda_{ablation} l_2, \quad (12)

where λ_ablation is set to 0.025 to balance the difference between the losses of the distortion classifier and quality predictor tasks, l_1 is the cross-entropy measuring the classification error of sub-task I, l_2 is chosen from the L1, L2, or smooth L1 losses to measure the error between the actual and predicted scores for sub-task II, and W, w_1, and w_2 are the parameters of the MVFEF, DTI, and QVP modules, respectively. The results for the different loss functions are given in Table VI. We can see that the highest PLCC and SRCC of the L1/L2/smooth L1 loss functions are 0.54 and 0.62, respectively, and the lowest MAE is 14.49. However, the average PLCC and SRCC of the PLCC-based loss function (5) are 0.70 and 0.69, respectively, and the MAE is 12.36 (see Table III), indicating its advantage.
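(The three alternative regression losses compared in Table VI map directly onto standard PyTorch functions; a sketch of our own for the l2 term of Eq. (12):)

```python
# The regression losses compared in Table VI, used as the l2 term in
# the adjusted hybrid loss of Eq. (12). PyTorch-style sketch (ours).
import torch.nn.functional as F

def ablation_l2_term(q_hat, q, kind="smooth_l1"):
    if kind == "l1":                   # mean absolute error [14]
        return F.l1_loss(q_hat, q)
    if kind == "l2":                   # mean squared error [63]
        return F.mse_loss(q_hat, q)
    return F.smooth_l1_loss(q_hat, q)  # squared below a threshold,
                                       # L1 otherwise [64]

# Eq. (12): l = l1 + 0.025 * ablation_l2_term(q_hat, q, kind)
```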
7) Ablation Studies on Different Datasets: To fully test the performance of PQA-Net, we also conducted further experiments on other subjective datasets, which are briefly introduced in Table VII. For the SJTU-PCQA dataset, as there are some distorted point clouds with mixed distortion types, which are useless for our testing, only the distorted point clouds with individual distortion types are tested. In the experiments, the weighting factor λ in the proposed hybrid loss function (5) was set to 15 for IRPC [65], 15 for SJTU-PCQA [36], and 25 for M-PCCD [66] to adapt to the different distortion types of these datasets. We randomly chose 80% of the reference point clouds for training, and the remaining 20% of the reference point clouds were left out for testing on each dataset. There are no overlapping samples between the training and testing sets.

Detailed experimental results are given in Table VIII, from which we can see that the performance on the IRPC dataset is the worst. This is because the amount of training data in the IRPC dataset is very small, which greatly limits the performance of the proposed PQA-Net. The performance of PQA-Net on M-PCCD is better, but not yet perfect. This is because the differences between the distortion types are small, especially for the distortion types of Octree-Lifting, Octree-RAHT, TriSoup-Lifting, and TriSoup-RAHT, which were all generated by the G-PCC encoder. Distinguishing these similar distortion types is difficult for humans, let alone for a neural network trained on a limited dataset. The performance of PQA-Net is the best on the SJTU-PCQA dataset because the differences between the distortion types in SJTU-PCQA are obvious.

V. CONCLUSION

We proposed a deep learning-based NR point cloud quality assessment method, namely PQA-Net, which contains feature extraction, distortion type identification, and quality vector prediction modules. The most important advantage of the proposed PQA-Net is that it is the first NR quality assessment method for point clouds, which is crucial for point cloud communication systems. In addition, a DNN is introduced for point cloud quality assessment for the first time, and the limitation of small sample data is also overcome. Experimental results demonstrated that it can achieve comparable or even better performance than most of the existing FR and RR point cloud quality metrics.

One potential future direction is to reasonably and effectively expand the subjective dataset to further improve the performance. Another potential direction is to optimize the network architecture by introducing advanced neural modules.

REFERENCES

[1] Y. Tian, K. Wang, Y. Wang, Y. Tian, Z. Wang, and F.-Y. Wang, "Adaptive and azimuth-aware fusion network of multimodal local features for 3D object detection," Neurocomputing, vol. 411, pp. 32–44, Oct. 2020.
[2] Z. Xie, J. Chen, and B. Peng, "Point clouds learning with attention-based graph convolution networks," Neurocomputing, vol. 402, pp. 245–255, Aug. 2020.
[3] Q. Liu, H. Yuan, R. Hamzaoui, and H. Su, "Coarse to fine rate control for region-based 3D point cloud compression," in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2020, pp. 1–6.
[4] S. Gu, J. Hou, H. Zeng, H. Yuan, and K.-K. Ma, "3D point cloud attribute compression using geometry-guided sparse representation," IEEE Trans. Image Process., vol. 29, pp. 796–808, Aug. 2019.
[5] H. Yuan, D. Zhang, W. Wang, and Y. Li, "A sampling-based 3D point cloud compression algorithm for immersive communication," Mobile Netw. Appl., vol. 25, no. 5, pp. 1863–1872, Oct. 2020.
[6] Q. Liu, H. Yuan, J. Hou, H. Liu, and R. Hamzaoui, "Model-based encoding parameter optimization for 3D point cloud compression," in Proc. Asia–Pacific Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA ASC), 2018, pp. 1981–1986.
[7] L. Li, Z. Li, S. Liu, and H. Li, "Occupancy-map-based rate distortion optimization and partition for video-based point cloud compression," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 1, pp. 326–338, Jan. 2021.
[8] S. Gu, J. Hou, H. Zeng, and H. Yuan, "3D point cloud attribute compression via graph prediction," IEEE Signal Process. Lett., vol. 27, pp. 176–180, Jan. 2020.
[9] Q. Wang, Y. Tan, and Z. Mei, "Computational methods of acquisition and processing of 3D point cloud data for construction applications," Arch. Comput. Methods Eng., vol. 27, no. 2, pp. 479–499, Apr. 2020.
[10] S. Schwarz et al., "Emerging MPEG standards for point cloud compression," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 9, no. 1, pp. 133–148, Mar. 2018.
[11] H. Liu, H. Yuan, Q. Liu, J. Hou, and J. Liu, "A comprehensive study and comparison of core technologies for MPEG 3-D point cloud compression," IEEE Trans. Broadcast., vol. 66, no. 3, pp. 701–717, Sep. 2020.
[12] H. Su, Z. Duanmu, W. Liu, Q. Liu, and Z. Wang, "Perceptual quality assessment of 3D point clouds," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 3182–3186.
[13] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, "Subjective and objective quality evaluation of compressed point clouds," in Proc. IEEE 19th Int. Workshop Multimedia Signal Process. (MMSP), Oct. 2017, pp. 1–6.
[14] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, "End-to-end blind image quality assessment using deep neural networks," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1202–1213, Mar. 2017.
[15] W. Liu, Z. Duanmu, and Z. Wang, "End-to-end blind quality assessment of compressed videos using deep neural networks," in Proc. 26th ACM Int. Conf. Multimedia, Oct. 2018, pp. 546–554.
[16] Common Test Conditions for Point Cloud Compression, document ISO/IEC/JTC1/SC29/WG11/MPEG/N19324, 3DG, Apr. 2020.
[17] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro, "Geometric distortion metrics for point cloud compression," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 3460–3464.
[18] E. Alexiou and T. Ebrahimi, "Point cloud quality assessment metric based on angular similarity," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2018, pp. 1–6.
[19] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, "A generalized Hausdorff distance based quality metric for point cloud geometry," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[20] G. Meynet, J. Digne, and G. Lavoué, "PC-MSDM: A quality metric for 3D point clouds," in Proc. 11th Int. Conf. Qual. Multimedia Exper. (QoMEX), Jun. 2019, pp. 1–3.
[21] I. Viola, S. Subramanyam, and P. Cesar, "A color-based objective quality metric for point cloud contents," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[22] G. Meynet, Y. Nehmé, J. Digne, and G. Lavoué, "PCQM: A full-reference quality metric for colored 3D point clouds," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[23] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Local luminance patterns for point cloud quality assessment," in Proc. IEEE 22nd Int. Workshop Multimedia Signal Process. (MMSP), Sep. 2020, pp. 1–6.
[24] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Towards a point cloud quality assessment model using local binary patterns," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[25] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Multi-distance point cloud quality assessment," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2020, pp. 3443–3447.
[26] Q. Yang, Z. Ma, Y. Xu, Z. Li, and J. Sun, "Inferring point cloud quality via graph similarity," IEEE Trans. Pattern Anal. Mach. Intell., early access, Dec. 24, 2020, doi: 10.1109/TPAMI.2020.3047083.
[27] Q. Yang, S. Chen, Y. Xu, J. Sun, M. S. Asif, and Z. Ma, "Point cloud distortion quantification based on potential energy for human and machine perception," 2021, arXiv:2103.02850. [Online]. Available: https://arxiv.org/abs/2103.02850
[28] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, "Mahalanobis based point to distribution metric for point cloud geometry quality evaluation," IEEE Signal Process. Lett., vol. 27, pp. 1350–1354, Jul. 2020.
[29] E. M. Torlig, E. Alexiou, T. A. Fonseca, R. L. de Queiroz, and T. Ebrahimi, "A novel methodology for quality assessment of voxelized point clouds," Proc. SPIE, vol. 10752, Sep. 2018, Art. no. 107520I.
[30] S. Wolf and M. H. Pinson, Reference Algorithm for Computing Peak Signal to Noise Ratio (PSNR) of a Video Sequence With a Constant Delay, document ITU-T Contribution COM9-C6-E, 2009.
[31] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[32] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[33] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
[34] R. L. de Queiroz and P. A. Chou, "Motion-compensated compression of dynamic voxelized point clouds," IEEE Trans. Image Process., vol. 26, no. 8, pp. 3886–3895, Aug. 2017.
[35] Q. Liu, H. Yuan, J. Hou, R. Hamzaoui, and H. Su, "Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression," IEEE Trans. Multimedia, early access, Sep. 10, 2020, doi: 10.1109/TMM.2020.3023294.
[36] Q. Yang, H. Chen, Z. Ma, Y. Xu, R. Tang, and J. Sun, "Predicting the perceptual quality of point cloud: A 3D-to-2D projection-based exploration," IEEE Trans. Multimedia, early access, Oct. 23, 2020, doi: 10.1109/TMM.2020.3033117.
[37] E. Alexiou and T. Ebrahimi, "Towards a point cloud structural similarity metric," in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2020, pp. 1–6.
[38] Q. Liu, H. Yuan, R. Hamzaoui, H. Su, J. Hou, and H. Yang, "Reduced reference perceptual quality model and application to rate control for 3D point cloud compression," 2020, arXiv:2011.12688. [Online]. Available: https://arxiv.org/abs/2011.12688
[39] Vocabulary for Performance and Quality of Service, ITU-T Recommendation P.10, Geneva, Switzerland, 2006.
[40] Q. Li and Z. Wang, "Reduced-reference image quality assessment using divisive normalization-based image representation," IEEE J. Sel. Topics Signal Process., vol. 3, no. 2, pp. 202–211, Apr. 2009.
[41] J. Ballé, V. Laparra, and E. P. Simoncelli, "Density modeling of images using a generalized normalization transformation," in Proc. 4th Int. Conf. Learn. Represent. (ICLR), 2016, pp. 1–14.
[42] J. Ballé, V. Laparra, and E. P. Simoncelli, "End-to-end optimized image compression," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), 2017, pp. 1–27.
[43] K. Zhang, X. Wang, Y. Guo, Z. Zhao, and Z. Ma, "Competing ratio loss for multi-class image classification," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2019, pp. 1–4.
[44] R. K. Mantiuk, A. Tomaszewska, and R. Mantiuk, "Comparison of four subjective methods for image quality assessment," Comput. Graph. Forum, vol. 31, no. 8, pp. 2478–2491, 2012.
[45] Agisoft. (2010). Agisoft PhotoScan. [Online]. Available: http://www.agisoft.com
[46] E. Alexiou and T. Ebrahimi, "Exploiting user interactivity in quality assessment of point cloud imaging," in Proc. 11th Int. Conf. Qual. Multimedia Exper. (QoMEX), Jun. 2019, pp. 1–6.
[47] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980
[48] P. Sedgwick, "Pearson's correlation coefficient," Brit. Med. J., vol. 345, p. e4483, Jul. 2012.
[49] D. Sheskin, "Spearman's rank-order correlation coefficient," in Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL, USA: Chapman & Hall, 2007, pp. 1353–1370.
[50] H. Yeganeh and Z. Wang, "Objective quality assessment of tone-mapped images," IEEE Trans. Image Process., vol. 22, no. 2, pp. 657–667, Feb. 2013.
[51] Q. Wu, H. Li, F. Meng, and K. N. Ngan, "A perceptually weighted rank correlation indicator for objective image quality assessment," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2499–2513, May 2018.
[52] R. Mekuria, Z. Li, C. Tulvan, and P. Chou, Evaluation Criteria for PCC (Point Cloud Compression), document MPEG N16332, 2016.
[53] R. Mekuria, S. Laserre, and C. Tulvan, "Performance assessment of point cloud compression," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2017, pp. 1–4.
[54] I. Viola and P. Cesar, "A reduced reference metric for visual quality evaluation of point cloud contents," IEEE Signal Process. Lett., vol. 27, pp. 1660–1664, Sep. 2020.
[55] R. Mekuria, K. Blom, and P. Cesar, "Design, implementation, and evaluation of a point cloud codec for tele-immersive video," IEEE Trans. Circuits Syst. Video Technol., vol. 27, no. 4, pp. 828–842, Apr. 2016.
[56] Q. Yang, Z. Ma, Y. Xu, Z. Li, and J. Sun. (2020). Inferring Point Cloud Quality Via Graph Similarity. [Online]. Available: https://github.com/NJUVISION/GraphSIM
[57] G. Meynet, Y. Nehmé, J. Digne, and G. Lavoué. (2021). PCQM. [Online]. Available: https://github.com/MEPP-team/PCQM
[58] E. Alexiou and T. Ebrahimi. (2020). PointSSIM: Point Cloud Structural Similarity Metric. [Online]. Available: https://github.com/mmspg/pointssim
[59] I. Viola and P. Cesar. (2020). PCM_RR. [Online]. Available: https://github.com/cwi-dis/PCM_RR
[60] Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, VQEG Meeting, Ottawa, ON, Canada, 2000.
[61] K. M. Ting, Confusion Matrix. Boston, MA, USA: Springer, 2010, p. 209.
[62] A. Fang, X. Zhao, and Y. Zhang, "Cross-modal image fusion guided by subjective visual attention," Neurocomputing, vol. 414, pp. 333–345, Nov. 2020.
[63] Q. Jiang, Z. Peng, S. Yang, and F. Shao, "Authentically distorted image quality assessment by learning from empirical score distributions," IEEE Signal Process. Lett., vol. 26, no. 12, pp. 1867–1871, Dec. 2019.
[64] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[65] A. Javaheri, C. Brites, F. M. B. Pereira, and J. M. Ascenso, "Point cloud rendering after coding: Impacts on subjective and objective quality," IEEE Trans. Multimedia, early access, Nov. 11, 2020, doi: 10.1109/TMM.2020.3037481.
[66] E. Alexiou, I. Viola, T. M. Borges, T. A. Fonseca, R. L. de Queiroz, and T. Ebrahimi, "A comprehensive study of the rate-distortion performance in MPEG point cloud compression," APSIPA Trans. Signal Inf. Process., vol. 8, pp. 1–27, Nov. 2019.

Qi Liu received the B.S. degree from Shandong Technology and Business University, Shandong, China, in 2011, and the M.S. degree from the School of Telecommunication Engineering, Xidian University, Xi'an, China, in 2014. She is currently pursuing the Ph.D. degree with Shandong University, Shandong. From September 2018 to August 2019, she worked as a Visiting Graduate Student with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. Her research interests include point cloud coding, processing, and quality assessment.

Hui Yuan (Senior Member, IEEE) received the B.E. and Ph.D. degrees in telecommunication engineering from Xidian University, Xi'an, China, in 2006 and 2011, respectively. He was with Shandong University (SDU), Jinan, China, as a Lecturer from April 2011 to December 2014, an Associate Professor from January 2015 to August 2016, and has been a Full Professor since September 2016. He was with the Department of Computer Science, City University of Hong Kong (CityU), as a Post-Doctoral Fellow (granted by the Hong Kong Scholar Project) from January 2013 to December 2014, and a Research Fellow from November 2017 to February 2018. From November 2020 to November 2021, he also worked as a Marie Curie Fellow (granted by the Marie Skłodowska-Curie Actions Individual Fellowship under Horizon 2020 Europe) with the School of Engineering and Sustainable Development, De Montfort University, Leicester, U.K. His current research interests include video/image/immersive media processing, compression, adaptive streaming, and computer vision. He served as an Area Chair for IEEE ICME 2021, IEEE ICME 2020, and IEEE VCIP 2020.

Honglei Su (Member, IEEE) received the B.A.Sc. degree from Shandong University of Science and Technology, Qingdao, China, in 2008, and the Ph.D. degree from Xidian University, Xi'an, China, in 2014. Since September 2014, he has been working as an Assistant Professor with the School of Electronic Information, Qingdao University, Qingdao, China. From March 2018 to March 2019, he worked as a Visiting Scholar with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. His research interests include perceptual image processing, immersive media processing, and computer vision.

Hao Liu received the B.E. degree from the Department of Communication Engineering, Shandong Agricultural University, Shandong, China, in 2017. He is currently pursuing the Ph.D. degree with Shandong University. His research interests are point cloud compression and processing.

Yu Wang received the B.S. degree in communication engineering, the M.S. degree in software engineering, and the Ph.D. degree in computer applications and techniques from Tianjin University in 2013, 2016, and 2020, respectively. He was an Outstanding Visitor Scholar with the University of Waterloo in 2019. He is currently an Assistant Professor with Tianjin University. He has published many peer-reviewed papers in world-class conferences and journals, such as the IEEE TRANSACTIONS ON FUZZY SYSTEMS (TFS), IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS (TNNLS), IEEE TRANSACTIONS ON CYBERNETICS (TCYB), and IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (TKDE). His research interests focus on hierarchical learning and large-scale classification in industrial scenarios and computer vision applications, data mining, and machine learning.

Huan Yang (Member, IEEE) received the B.S. degree in computer science from the Heilongjiang Institute of Technology, China, in 2007, the M.S. degree in computer science from Shandong University, China, in 2010, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore, in 2015. She is currently working with the College of Computer Science and Technology, Qingdao University, Qingdao, China. Her research interests include image/video processing and analysis, perception-based modeling and quality assessment, object detection/recognition, and machine learning.

Junhui Hou (Senior Member, IEEE) received the B.Eng. degree in information engineering (Talented Students Program) from the South China University of Technology, Guangzhou, China, in 2009, the M.Eng. degree in signal and information processing from Northwestern Polytechnical University, Xi'an, China, in 2012, and the Ph.D. degree in electrical and electronic engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2016. In January 2017, he joined the Department of Computer Science, City University of Hong Kong, as an Assistant Professor. His research interests fall into the general areas of visual computing, such as image/video/3D geometry data representation, processing and analysis, semi/un-supervised data modeling, and data compression and adaptive transmission. He is currently an Elected Member of MSA-TC and VSPC-TC, IEEE CAS. He was a recipient of several prestigious awards, including the Chinese Government Award for Outstanding Students Study Abroad from the China Scholarship Council in 2015 and the Early Career Award (3/381) from the Hong Kong Research Grants Council in 2018. He served as an Area Chair for ACM MM19/20/21, IEEE ICME20, VCIP20/21, and WACV21. He is an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE TRANSACTIONS ON IMAGE PROCESSING, Signal Processing: Image Communication, and The Visual Computer. He also served as a Guest Editor for the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.