

PQA-Net: Deep No Reference Point Cloud Quality Assessment via Multi-View Projection

Qi Liu, Hui Yuan, Senior Member, IEEE, Honglei Su, Member, IEEE, Hao Liu, Yu Wang, Huan Yang, Member, IEEE, and Junhui Hou, Senior Member, IEEE

Abstract—Recently, 3D point clouds have become popular due to their capability to represent the real world as an advanced content modality in modern communication systems. In view of their wide applications, especially for immersive communication towards human perception, quality metrics for point clouds are essential. Existing point cloud quality evaluations rely on the full original point cloud or a certain portion of it, which severely limits their applications. To overcome this problem, we propose a novel deep learning-based no-reference point cloud quality assessment method, namely PQA-Net. Specifically, PQA-Net consists of a multi-view-based joint feature extraction and fusion (MVFEF) module, a distortion type identification (DTI) module, and a quality vector prediction (QVP) module. The DTI and QVP modules share the features generated by the MVFEF module. Using the distortion type labels, the DTI and MVFEF modules are first pre-trained to initialize the network parameters, based on which the whole network is then jointly trained to finally evaluate the point cloud quality. Experimental results on the Waterloo Point Cloud dataset show that PQA-Net achieves better or equivalent performance compared with state-of-the-art quality assessment methods. The code of the proposed model will be made publicly available to facilitate reproducible research: https://github.com/qdushl/PQA-Net.

Index Terms—No-reference point cloud quality assessment, deep neural network, multi-task learning, multi-view.

Manuscript received December 13, 2020; revised April 8, 2021; accepted July 18, 2021. Date of publication July 26, 2021; date of current version December 6, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant 61871342, in part by the Open Project Program of the State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, under Grant VRLAB2021A01, in part by the Shandong Provincial Natural Science Foundation, China, under Grant ZR2018PF002, in part by the Hong Kong Research Grants Council (RGC) under Grant 9042955 (CityU 11202320), and in part by the OPPO Research Fund. This article was recommended by Associate Editor Z. Li. (Corresponding author: Hui Yuan.)

Qi Liu is with the School of Control Science and Engineering, Shandong University, Jinan 250061, China, and also with the School of Information Science and Engineering, Shandong University, Qingdao 266237, China (e-mail: [email protected]).
Hui Yuan is with the School of Control Science and Engineering, Shandong University, Jinan 250061, China (e-mail: [email protected]).
Honglei Su is with the School of Electronic Information, Qingdao University, Qingdao 266071, China (e-mail: [email protected]).
Hao Liu is with the School of Information Science and Engineering, Shandong University, Qingdao 266237, China (e-mail: [email protected]).
Yu Wang is with the School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China (e-mail: armstrong_wangyu@tju.edu.cn).
Huan Yang is with the College of Computer Science and Technology, Qingdao University, Qingdao 266071, China (e-mail: cathy_huanyang@hotmail.com).
Junhui Hou is with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSVT.2021.3100282.
Digital Object Identifier 10.1109/TCSVT.2021.3100282

I. INTRODUCTION

WITH the increasing capability of 3D data acquisition devices, point clouds are becoming more and more popular in a wide range of applications, from manufacturing and construction to 3D telepresence [1]–[3]. A 3D point cloud comprises a set of points consisting of geometry and attribute information [4], [5]. The geometry denotes the 3D space coordinates of the points, and the attributes usually include the color, reflectance, or normal vector of the points [6], [7]. Therefore, a point cloud can represent an object or a scene accurately and completely by millions of points directly [8].

Due to the content complexity of actual 3D scenes and the limitations of specific sensors [9], the acquired raw point clouds always contain various types of noise, which are harmful for further applications. In addition, because of the huge data volume [9] of the acquired point clouds, it is difficult to achieve real-time lossless transmission under limited bandwidth [10], [11]. Therefore, lossy compression must be conducted before transmission, and thus compression distortion is inevitable. As mentioned, point clouds are subject to distortions induced by acquisition, processing, and compression, any of which may lead to quality degradation [12]. Consequently, how to assess the quality of point clouds, and thus evaluate the performance of the corresponding acquisition, processing, and compression systems, becomes a critical issue [13].

Similar to image/video quality assessment, point cloud quality assessment can be classified into three categories, i.e., full-reference (FR), reduced-reference (RR), and no-reference (NR), based on the way the original point cloud is exploited. Specifically, FR requires full access to the original point cloud, and RR relies on a certain portion of the original point cloud. Recently, major progress in FR and RR quality assessment has been made (see Section II). However, in many point cloud application scenarios, the original point cloud cannot be obtained. Especially for some terminal devices with limited storage and transmission capabilities, FR and RR quality assessment cannot be conducted. Estimating the point cloud quality without the original one, i.e., NR quality assessment, is important for practical applications since it can be carried out readily. To the best of our knowledge, there is no NR quality assessment method for point clouds by now.

Motivated by the great success of deep neural networks (DNNs) on image/video processing and analysis, we propose an NR quality assessment method for point clouds

Fig. 1. Overview of the proposed PQA-Net, in which CNN denotes a convolutional neural network, CON denotes a convolution layer, MP denotes max-pooling, GDN denotes the generalized divisive normalization used in place of the activation function, BN is the batch normalization unit, FC is the fully connected module, DO is the dropout unit, l1 is the cross-entropy loss function, l2 is defined based on the Pearson linear correlation coefficient (PLCC), and the overall loss function l is a linear weighting of l1 and l2.

based on DNNs. Usually, a DNN needs a large dataset for training. Currently, the existing relatively large public point cloud quality assessment dataset, i.e., the Waterloo Point Cloud (WPC) dataset¹ [12], contains only 20 annotated reference point clouds and 720 degraded point clouds, which are not enough for direct training. To overcome this problem, inspired by [14] and [15], which handle small-sample datasets very well, we propose a DNN-based NR point cloud quality assessment method to estimate the quality of point clouds with a small sample dataset. The NR point cloud quality assessment problem is divided into two sub-tasks. Sub-task I classifies a point cloud into a specific distortion type from a set of pre-defined categories. Sub-task II predicts the perceptual quality of the corresponding point cloud by taking advantage of the distortion information obtained from the distortion type classification task. The two sub-tasks are accomplished by two sub-networks with shared features, as shown in Fig. 1. The contributions of this paper are as follows.

• We propose to use a DNN to assess the quality of point clouds. To the best of our knowledge, this is the first NR point cloud quality assessment method, which can be used directly without any original information.
• In the proposed network, the point cloud quality assessment task is divided into two cooperative sub-tasks, i.e., the distortion type classification and quality prediction tasks, to deal with the small sample dataset efficiently.
• A multi-view projection strategy is adopted to effectively and comprehensively extract the features of a point cloud.
• Experimental results on the Waterloo dataset demonstrate that the proposed method achieves comparable or even better performance than existing state-of-the-art FR and RR methods.

The remainder of this paper is organized as follows. A brief review of related work on point cloud quality assessment is given in Section II. Then, the proposed method is presented in detail in Section III. Comprehensive experimental results and analyses are provided in Section IV. Finally, Section V concludes this paper.

¹ https://github.com/qdushl/Waterloo-Point-Cloud-Database

II. RELATED WORK

Existing point cloud quality metrics can be broadly categorized into three-dimensional direct metrics and two-dimensional indirect metrics. Three-dimensional direct metrics rely on finding correspondences for all points, or for the relevant characteristics, between the degraded point cloud and the original point cloud, while two-dimensional indirect metrics rely on weighting the quality of multiple two-dimensional projections of the point cloud.

Usually, the Euclidean distance between corresponding points of the degraded and the original point clouds is used to measure the distortion, i.e., the point-to-point error adopted by MPEG [16]. However, it ignores the surface structures. Tian et al. [17] proposed the point-to-plane distance to measure the geometric distortion, which is less dependent on a complex surface construction. Besides the Euclidean distance, Alexiou and Ebrahimi [18] proposed a promising alternative objective quality metric for point clouds based on the angular similarity of the geometric distortion. Javaheri et al. [19] used the generalized Hausdorff distance to evaluate the geometry quality of point clouds. Meynet et al. [20] used local curvature statistics to evaluate the quality of point clouds. Viola et al. [21] extracted color statistics such as histograms and correlograms to assess the quality of point clouds. Similarly, Meynet et al. [22] extracted a set of geometry and color features for each point based on a correspondence between the distorted and reference point clouds, and then proposed an FR quality metric as a linear combination of an optimal subset of the extracted features. Diniz et al. [23]–[25] proposed local luminance pattern descriptor-based and local binary pattern descriptor-based distances to assess the perceived quality of the test


point cloud. Besides, Yang et al. [26] constructed a local graph to aggregate color gradient moments and estimate the quality of a point cloud; they also exploited the multiscale potential energy discrepancy to measure point cloud geometry and color differences in [27]. Javaheri et al. [28] proposed a geometry quality metric which computes the Mahalanobis distance between a point on the reference point cloud and a distribution of points in a small region of the degraded one to measure the geometry quality.

Two-dimensional indirect metrics project a point cloud onto multiple 2D images from different viewpoints, and finally predict the quality by weighting the quality of these 2D images [29]. In this kind of method, existing image distortion metrics such as the peak signal-to-noise ratio (PSNR) [30], structural similarity (SSIM) [31], visual information fidelity in the pixel domain (VIFP) [32], and multi-scale structural similarity (MS-SSIM) [33] are usually adopted directly to evaluate the quality of the projected 2D images. De Queiroz and Chou [34] applied the PSNR to guide rate distortion optimization in codec design. In our previous work [35], we derived a distortion metric by linearly combining the geometric distortion and the color distortion for point clouds, and also proposed a corresponding rate distortion optimization method for point cloud compression. Yang et al. [36] offered an objective FR metric via image feature aggregation by weighting global and local features extracted from the two-dimensional color and depth images of all projection planes. In contrast to the above state-of-the-art FR quality metrics, Alexiou and Ebrahimi [37] reduced the dependence on the original point cloud information to some extent, and proposed an RR quality metric which utilizes a family of statistical dispersion measurements for the prediction of perceptual degradations. In our previous work [38], we also proposed an RR quality metric for point clouds based on the video-based point cloud coding (V-PCC) platform given by MPEG. All of the above methods, whether they work in 3D or 2D, rely heavily on the whole or part of the original point cloud; that is to say, there is no NR quality assessment method for point clouds by now.

III. PROPOSED PQA-NET

First, we introduce the pre-processing method briefly. Then, we describe the details of the proposed PQA-Net. Finally, we wrap up this section by introducing the training and testing procedures for the proposed PQA-Net.

A. Pre-Processing

For a point cloud, one can only see its 2D appearance from a certain viewpoint at any one time. Thus, we first project a point cloud onto multiple 2D images, and use them to classify the distortion type and evaluate the quality. Specifically, to completely capture the characteristics of a point cloud, we project it onto six 2D image planes along the three orthogonal directions (up, down, left, right, front, and back). A point P_i in the point cloud is denoted by P_i = (x_i, y_i, z_i, r_i, g_i, b_i), in which the first three elements (x_i, y_i, z_i) stand for the geometry coordinates and the last three elements (r_i, g_i, b_i) stand for the red-green-blue (RGB) color values. The six projected images are defined as XY_1, XY_2, XZ_1, XZ_2, YZ_1, YZ_2, where XY_1 and XY_2 are generated from the XY-plane, XZ_1 and XZ_2 are generated from the XZ-plane, and YZ_1 and YZ_2 are generated from the YZ-plane. Pixel i of XY_1, XY_2, XZ_1, XZ_2, YZ_1, YZ_2 can be obtained as

\begin{cases}
XY_1(i) = (x_i, y_i, r_{\min(z_i)}, g_{\min(z_i)}, b_{\min(z_i)}), \\
XY_2(i) = (x_i, y_i, r_{\max(z_i)}, g_{\max(z_i)}, b_{\max(z_i)}), \\
XZ_1(i) = (x_i, z_i, r_{\min(y_i)}, g_{\min(y_i)}, b_{\min(y_i)}), \\
XZ_2(i) = (x_i, z_i, r_{\max(y_i)}, g_{\max(y_i)}, b_{\max(y_i)}), \\
YZ_1(i) = (y_i, z_i, r_{\min(x_i)}, g_{\min(x_i)}, b_{\min(x_i)}), \\
YZ_2(i) = (y_i, z_i, r_{\max(x_i)}, g_{\max(x_i)}, b_{\max(x_i)}).
\end{cases} \quad (1)

Take the XY_1 plane as an example: x_i and y_i denote the coordinates of pixel i in the XY_1 plane. There may be multiple points projected onto the same position of the XY_1 plane; therefore, we select the point with the smallest Z-coordinate to be projected onto the XY_1 plane, i.e., its color (r_{min(z_i)}, g_{min(z_i)}, b_{min(z_i)}). The XY_2 plane is generated in the same way as the XY_1 plane, except that it selects the color of the point with the maximal Z-coordinate. The remaining projections are generated similarly, as shown in Fig. 2. The resolution of all the projected images is set to 1024 × 1024, as the side of the bounding box of the used point clouds is 1024.

Fig. 2. Illustration of projecting a point cloud onto 2D image planes.
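As a concrete illustration of Eq. (1), the following NumPy sketch (our own reconstruction, not the authors' released code; the integer voxel grid and the black background are assumptions) renders the XY_1/XY_2 pair by keeping, at each (x, y) position, the color of the point with the smallest and the largest z-coordinate, respectively. The XZ and YZ pairs follow by permuting the axes.

```python
import numpy as np

def project_xy(points, colors, res=1024):
    """Render XY1/XY2 of Eq. (1) for a voxelized point cloud.

    points: (N, 3) integer coordinates in [0, res); colors: (N, 3) uint8 RGB.
    XY1 keeps the color of the point with the smallest z per (x, y) pixel,
    XY2 the one with the largest z (the two 'sides' of the object).
    """
    xy1 = np.zeros((res, res, 3), dtype=np.uint8)    # black background assumed
    xy2 = np.zeros((res, res, 3), dtype=np.uint8)
    z_min = np.full((res, res), res, dtype=np.int64)  # sentinel above any z
    z_max = np.full((res, res), -1, dtype=np.int64)   # sentinel below any z
    for (x, y, z), c in zip(points, colors):
        if z < z_min[y, x]:
            z_min[y, x] = z
            xy1[y, x] = c
        if z > z_max[y, x]:
            z_max[y, x] = z
            xy2[y, x] = c
    return xy1, xy2
```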
B. Network Architecture

The projected 2D images are then fed into the proposed PQA-Net. We denote the input mini-batch training data by {(X^{(k)}, p^{(k)}, q^{(k)})}_{k=1}^{K}, where X^{(k)} is the k-th degraded point cloud represented by its corresponding six projected images, p^{(k)} is a multi-class indicator vector with only one entry activated to represent the ground truth distortion type, and q^{(k)} is the mean opinion score (MOS) [39]¹ of the k-th input point cloud.

¹ MOS is a commonly used indicator of perceived media quality. It is defined as the average of opinion scores across subjects. An opinion score is a value in a predefined range (e.g., 0-100) that a subject assigns, based on his or her opinion, to the performance of a system.

The feature extractor transforms the six projections X^{(k)} into a 64 × 6 = 384 dimensional quality-related feature vector. It includes four lightweight convolutional neural



Fig. 3. Detailed architecture of the proposed PQA-Net. The parameterization of each convolutional layer is denoted as "height × width | input channels × output channels | stride | padding". DTI and QVP share the features extracted from MVFEF, and the outputs of the DTI and QVP are multiplied to calculate the quality of the degraded point cloud.

networks (CNN) blocks, i.e., CNN1, CNN2, CNN3, and CNN4. The first three CNNs each consist of convolution (CON) layers, generalized divisive normalization (GDN)², and max-pooling (MP), whereas CNN4 additionally adds a batch normalization unit compared with the previous CNNs. The model parameters of the feature extractor are collectively denoted by W, and the parameterizations of the convolution, max-pooling, and layer-to-layer connectivity are given in Fig. 3. Note that Fig. 3 only shows the detailed CNN structure for one projected image as an example; the parameters of the CNNs for the six projected images are shared. Since point clouds have various shapes and sizes, the resolution of the projected images must be large enough to represent them, and the effective pixels occupied by the point cloud in the projected images also differ. Therefore, we first detect the center pixel of the point cloud on each of the six projection planes, and then perform center cropping to obtain six cropped images of size 235 × 235. As a result, we represent a point cloud by six 235 × 235 images with three channels (red, green, and blue), which are then fed into the feature extractor to generate a 384-dimensional feature vector.

² GDN is a differentiable transform that can be trained with any preceding or subsequent layers. It is biologically inspired [14]. Its effectiveness has been proven in image quality assessment [40], Gaussianizing image densities [41], and digital image compression [42].
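A minimal PyTorch sketch of the shared-branch feature extractor described above (our own paraphrase of Fig. 3, not the released model: the channel widths, kernel sizes, the global average pooling, and the PReLU stand-in for GDN are all assumptions, since PyTorch has no built-in GDN layer):

```python
import torch
import torch.nn as nn

class CNNBlock(nn.Module):
    """CON -> (GDN stand-in) -> MP; the last block adds batch normalization."""
    def __init__(self, cin, cout, last=False):
        super().__init__()
        layers = [nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1)]
        if last:
            layers.append(nn.BatchNorm2d(cout))
        layers += [nn.PReLU(), nn.MaxPool2d(2)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class MVFEF(nn.Module):
    """One branch shared by all six 235x235 projections; each view yields a
    64-dim vector, concatenated to the 64 x 6 = 384-dim feature of the paper."""
    def __init__(self):
        super().__init__()
        self.branch = nn.Sequential(
            CNNBlock(3, 16), CNNBlock(16, 32), CNNBlock(32, 64),
            CNNBlock(64, 64, last=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, views):                 # views: (B, 6, 3, 235, 235)
        feats = [self.branch(views[:, v]) for v in range(6)]  # shared weights
        return torch.cat(feats, dim=1)        # (B, 384)
```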

The distortion identification and quality prediction sub-tasks are conducted by GDN, fully connected (FC), dropout (DO), and softmax operations. Specifically, the architecture of the DTI module is composed of two FC layers to compactly represent the input feature, one GDN transform to increase nonlinearity, and a dropout layer to reduce the probability of overfitting. We also adopt the softmax function to convert the unnormalized outputs of the last FC layer into a D-dimensional distortion probability vector p̂^{(k)}(X^{(k)}; W, w_1), where D is the number of distortion types and k is the index within the mini-batch. Since the cross-entropy loss is very suitable for classification and its convergence speed is fast [43], to classify the distortion type effectively and quickly, the cross-entropy loss l_1(X^{(k)}; W, w_1) over the mini-batch is used:

l_1(X^{(k)}; W, w_1) = -\sum_{k=1}^{K}\sum_{i=1}^{D} p_i^{(k)} \log\big[\hat{p}_i^{(k)}(X^{(k)}; W, w_1)\big], \quad (2)

where w_1 denotes the model parameters of the DTI module.

The QVP module for Sub-task II has a structure similar to the distortion type identification module, but lacks the softmax unit, resulting in an architecture of two FCs, one GDN, and one DO. The QVP module takes in the features from the MVFEF module; its output is then multiplied by the estimated distortion type probability vector p̂^{(k)} to obtain the final quality score of a point cloud. The aim of this module is to predict the perceptual quality of X^{(k)} in the form of a scalar value Q̂^{(k)}; its parameters are collectively denoted by w_2. The quality predictor produces a score vector ŝ^{(k)} whose i-th entry represents the perceptual quality score corresponding to the i-th distortion type. The final scalar value Q̂^{(k)} is computed as an inner product of p̂^{(k)} and ŝ^{(k)}:

\hat{Q}^{(k)} = \hat{p}^{(k)\mathsf{T}} \hat{s}^{(k)} = \sum_{i=1}^{D} \hat{p}_i^{(k)} \hat{s}_i^{(k)}. \quad (3)

The Pearson linear correlation coefficient (PLCC) represents the linear correlation between objective scores and subjective scores. As it is a commonly used evaluation criterion in the context of perceptual quality assessment, we define the loss function l_2 of Sub-task II to improve the PLCC directly. Once the predicted score Q̂^{(k)} is obtained, the PLCC between the predicted scores and the ground truth q^{(k)} in the mini-batch is computed as

l_2(X^{(k)}; W, w_1, w_2) = \frac{\sum_{k=1}^{K} (\hat{Q}^{(k)} - \hat{Q}_m)(q^{(k)} - q_m)}{\sqrt{\sum_{k=1}^{K} (\hat{Q}^{(k)} - \hat{Q}_m)^2}\,\sqrt{\sum_{k=1}^{K} (q^{(k)} - q_m)^2}}, \quad (4)


Fig. 4. Samples of the distortion types.

where Q̂_m and q_m denote the means of Q̂^{(k)} and q^{(k)} across the mini-batch. The advantage of choosing the PLCC loss instead of the widely used L1 or L2 loss functions is that human beings are more consistent in the rankings of perceptual quality than in the scores themselves [44]. By taking the accuracy of the distortion type classification task into account, we define the hybrid loss function of PQA-Net as

l(X^{(k)}; W, w_1, w_2) = l_1 - \lambda l_2, \quad (5)

where λ is a positive weight value to account for the scale difference between the DTI and QVP modules. The hybrid loss function contains the cross-entropy (l_1) and the PLCC (l_2), whose lower and higher values, respectively, indicate better performance. Therefore, the PLCC (l_2) needs to be subtracted from the cross-entropy (l_1) loss.
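The two heads and the objective of Eqs. (2)-(5) can be sketched as follows (again an illustration under assumptions: the 128-unit FC width and the dropout rate are ours; plcc implements Eq. (4) and hybrid_loss Eq. (5) with the paper's λ = 10):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Heads(nn.Module):
    def __init__(self, dims=384, num_types=4):   # D = 4 distortion types on WPCSD
        super().__init__()
        self.dti = nn.Sequential(nn.Linear(dims, 128), nn.PReLU(), nn.Dropout(0.5),
                                 nn.Linear(128, num_types))   # distortion logits
        self.qvp = nn.Sequential(nn.Linear(dims, 128), nn.PReLU(), nn.Dropout(0.5),
                                 nn.Linear(128, num_types))   # score vector s_hat

    def forward(self, f):
        logits = self.dti(f)
        p_hat = F.softmax(logits, dim=1)          # distortion probability vector
        q_hat = (p_hat * self.qvp(f)).sum(dim=1)  # Eq. (3): inner product
        return logits, p_hat, q_hat

def plcc(x, y, eps=1e-8):                         # Eq. (4) over the mini-batch
    xm, ym = x - x.mean(), y - y.mean()
    return (xm * ym).sum() / (xm.norm() * ym.norm() + eps)

def hybrid_loss(logits, labels, q_hat, mos, lam=10.0):
    l1 = F.cross_entropy(logits, labels)          # Eq. (2)
    return l1 - lam * plcc(q_hat, mos)            # Eq. (5): l = l1 - lambda * l2
```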


C. Training and Testing

The PQA-Net models are trained on the newly collected, relatively large dataset [12] with two categories of labels, i.e., distortion types and MOS scores. PQA-Net adopts a two-step strategy to train the multi-task neural network. The distortion type classification task and the quality prediction task share the features extracted by the MVFEF module, and the quality of a degraded point cloud is the dot product of the outputs of the DTI and QVP modules. In the first step, the model parameters W of the MVFEF module and the model parameters w_1 of the DTI module are trained by minimizing the loss function

(\hat{W}, \hat{w}_1) = \arg\min_{W, w_1} l_1(X^{(k)}; W, w_1). \quad (6)

In the second step, the model parameters (W, w_1) of the MVFEF and DTI modules are initialized as (Ŵ, ŵ_1), respectively, and the overall loss is minimized by jointly fine-tuning the whole network. As a result, the optimal QVP module parameters ŵ_2 and the optimal MVFEF and DTI module parameters (W̃, w̃_1) are obtained:

(\tilde{W}, \tilde{w}_1, \hat{w}_2) = \arg\min_{W, w_1, w_2} l(X^{(k)}; W, w_1, w_2). \quad (7)

In the two-step training strategy, the first pre-training step allows us to train a quality-related feature extractor and distortion classifier, while the joint optimization step trains a quality predictor by taking the distortion types as a strong regularizer.
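A compact version of the two-step schedule of Eqs. (6) and (7), reusing the MVFEF, Heads, and hybrid_loss sketches above (the epoch counts are placeholders; the optimizer settings follow Section IV-B, i.e., Adam with mini-batch 30 and learning rate 10^-4):

```python
import torch
import torch.nn.functional as F

def train_pqa_net(mvfef, heads, loader, lam=10.0, lr=1e-4,
                  pretrain_epochs=20, joint_epochs=50):
    # Step 1, Eq. (6): pre-train W (MVFEF) and w1 (DTI) with cross-entropy only.
    opt = torch.optim.Adam(list(mvfef.parameters()) +
                           list(heads.dti.parameters()), lr=lr)
    for _ in range(pretrain_epochs):
        for views, labels, _ in loader:
            logits, _, _ = heads(mvfef(views))
            loss = F.cross_entropy(logits, labels)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 2, Eq. (7): jointly fine-tune the whole network with the hybrid loss.
    opt = torch.optim.Adam(list(mvfef.parameters()) +
                           list(heads.parameters()), lr=lr)
    for _ in range(joint_epochs):
        for views, labels, mos in loader:
            logits, _, q_hat = heads(mvfef(views))
            loss = hybrid_loss(logits, labels, q_hat, mos, lam=lam)
            opt.zero_grad(); loss.backward(); opt.step()
```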
IV. EXPERIMENTAL RESULTS AND ANALYSES

In this section, we first describe the experimental setups, including the implementation of PQA-Net and the Waterloo Point Cloud Sub-Dataset (WPCSD). We then perform ablation studies to confirm the influence of λ and of the DTI module. After that, we compare the proposed PQA-Net with optimal parameters against state-of-the-art FR and RR quality metrics and display comparative visual results. Finally, we test the performance of the proposed PQA-Net with different loss functions as well as on other datasets.

A. Waterloo Point Cloud Sub-Dataset

We compared the proposed PQA-Net with classic and state-of-the-art FR and RR point cloud quality assessment metrics on the Waterloo Point Cloud Dataset [12], which covers various levels of geometry and texture complexity. In the experiments, we selected 20 high-resolution point clouds with nearly pristine quality as the basis to construct the sub-dataset for PQA-Net. The 3D point clouds of the Waterloo Point Cloud Dataset were constructed in three steps. First of all, a single-lens-reflex camera and a turntable were employed to take multiview images of an object. Second, the Agisoft PhotoScan [45] software was used to calibrate and align the captured multiview images and represent the object as a point cloud. Finally, each point cloud was normalized to a unit cube with a step size of 0.001. One can refer to [12] for detailed information on the point cloud generation procedure. We considered four distortion types, i.e., downsampling distortion (60 degraded point clouds), Gaussian white noise (180 degraded point clouds), and coding distortion induced by V-PCC (240 degraded point clouds with different geometry and color distortions) and G-PCC (180 degraded point clouds with different geometry and color distortions), respectively, as shown in Fig. 4. To augment the dataset, each point cloud was treated as 12 versions by rotating it to 12 different viewpoints, as shown in Fig. 5. The twelve viewpoints were placed uniformly around the object based on the twelve

vertices of a regular icosahedron [46]. We thus obtained (60 + 180 + 240 + 180) × 12 = 7920 distorted point clouds from the twelve viewpoints, which define the Waterloo Point Cloud Sub-Dataset (WPCSD). The generated 7920 × 6 = 47520 projections were then fed into PQA-Net. The content as well as the number of points of each point cloud are shown in Fig. 6.

Fig. 5. Example of the 12 viewpoints of one point cloud.

B. Experimental Setups

The Adam optimizer with a mini-batch size of 30 was adopted for training. In both sub-tasks, we set the learning rate (α) to 10^{-4} and subsequently decreased it by a factor of 0.1 until the loss converged. The remaining parameters of Adam were set to their defaults [47]. The parameters β and γ in GDN [14] were clipped to nonnegative values after each update. Additionally, we enforced γ to be symmetric by averaging it with its transpose, as suggested in [42]. In our work, the balance weight λ was set to 10. To reduce computational complexity, as mentioned in Part B of Section III, the six projected 1024 × 1024 images were cropped into a 235 × 235 × 3 × 6 matrix for each point cloud. Eighty percent of the point clouds in the dataset were randomly selected for training, while the remaining twenty percent were used for testing. The training set includes "Bag", "Biscuits", "Cake", "Flowerpot", "Glasses_case", "Honeydew_melon", "House", "Litchi", "Pen_container", "Ping-pong_bat", "Puer_tea", "Pumpkin", "Ship", "Statue", "Stone", and "Tool_box", while the testing set includes "Banana", "Cauliflower", "Mushroom", and "Pineapple".

C. Experimental Results

The common evaluation criteria, i.e., PLCC, SRCC, and KRCC, were adopted to compare the performance.

(1) The Pearson linear correlation coefficient (PLCC) [48] is a parametric measure of linear correlation; it can be computed as

PLCC = \frac{\sum_i (Q_i - Q_m)(\hat{Q}_i - \hat{Q}_m)}{\sqrt{\sum_i (Q_i - Q_m)^2}\,\sqrt{\sum_i (\hat{Q}_i - \hat{Q}_m)^2}}, \quad (8)

where Q_i and Q̂_i denote the ground truth MOS and the predicted MOS of the i-th point cloud, respectively, and Q_m and Q̂_m denote their means.

(2) Spearman's rank-order correlation coefficient (SRCC) [49] is a non-parametric measure commonly used in the quality assessment field and is defined as

SRCC = 1 - \frac{6\sum_i d_i^2}{I(I^2 - 1)}, \quad (9)

where I is the number of tested point clouds and d_i is the rank difference between the ground truth MOS and the predicted MOS of the i-th point cloud. SRCC is independent of monotonic mappings.

(3) Kendall's rank-order correlation coefficient (KRCC) [50] aims to evaluate the association between two ordinal variables (two ranked variables, not necessarily on interval scales) and is computed as

KRCC = \frac{N_c - N_d}{\frac{1}{2} N(N - 1)}, \quad (10)

where N_c and N_d are the numbers of concordant (consistent rank order) and discordant (inconsistent rank order) pairs in the data set, respectively, and N is the number of elements of the variables.

Besides the above criteria, a perceptually weighted rank correlation (PWRC) indicator, which rewards the capability of correctly ranking high-quality images and suppresses the attention toward insensitive rank mistakes [51], is also used to compare the performance of the different quality assessment algorithms.

In addition, we also computed the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) between the ground truth MOS and the predicted MOS of the tested point clouds to further compare the accuracy. To compare with the ground truth MOS, the raw model predictions are mapped to the MOS scale by using a four-parameter logistic function.
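For reference, the criteria above and the four-parameter logistic mapping can be computed with SciPy as follows (a sketch under assumptions: the logistic parameterization and the initial guess are one common choice, not necessarily the exact form of [60]):

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr, kendalltau

def logistic4(x, b1, b2, b3, b4):
    # Four-parameter logistic mapping raw predictions to the MOS scale.
    return b2 + (b1 - b2) / (1.0 + np.exp(-(x - b3) / np.abs(b4)))

def evaluate(pred, mos):
    pred, mos = np.asarray(pred, float), np.asarray(mos, float)
    p0 = [mos.max(), mos.min(), pred.mean(), pred.std() + 1e-6]
    popt, _ = curve_fit(logistic4, pred, mos, p0=p0, maxfev=10000)
    mapped = logistic4(pred, *popt)
    return {
        "PLCC": pearsonr(mapped, mos)[0],   # Eq. (8), after the mapping
        "SRCC": spearmanr(pred, mos)[0],    # Eq. (9); rank-based, mapping-invariant
        "KRCC": kendalltau(pred, mos)[0],   # Eq. (10)
        "MAE": np.mean(np.abs(mapped - mos)),
        "RMSE": np.sqrt(np.mean((mapped - mos) ** 2)),
    }
```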
1) Results Without Distortion Type Classification: To illustrate the importance of the distortion type classification sub-task, we tested the results without the DTI module (namely PQA_W/O_DTI) and compared them with our proposed PQA-Net. To make a fair comparison, all the training and testing point clouds are the same for PQA_W/O_DTI and PQA-Net. The network parameters are also the same except for the parameters required by the DTI module. The results presented in Table I show that the DTI module can significantly improve the accuracy. The average PLCC and SRCC increments are 0.55 and 0.56, respectively, after adding the DTI module, indicating that the DTI module contributes a lot to handling the small sample dataset.

2) Experimental Results With Different λ: The parameter λ in the loss function (5) also plays a key role in the proposed PQA-Net. It is used to balance the importance of the two sub-tasks. The performance for different values of λ is shown in Table II, from which we can see that λ = 10 achieves the best results. Besides, Fig. 7 shows the convergence curves of the training procedure with λ = 10.

Fig. 6. Snapshots and point counts of the used point clouds.

3) Experimental Results on the Testing Dataset of WPCSD: We compared the proposed PQA-Net with classic and state-of-the-art FR and RR point cloud quality metrics on WPCSD.


Fig. 7. The loss curves for λ = 10, where the x-coordinate indicates the step number and the y-coordinate indicates the loss value.
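As context for the projection-based baselines discussed below, the following sketch (our own illustration using scikit-image; the exact evaluation pipeline may differ in detail) shows how a metric such as SSIM_p is obtained by averaging an image metric over the six projections:

```python
from skimage.metrics import structural_similarity as ssim

def ssim_p(ref_views, dist_views):
    """Average SSIM over the six projected images of the reference and
    distorted point clouds (same six viewpoints for both).

    ref_views, dist_views: lists of six HxWx3 uint8 arrays.
    """
    scores = [ssim(r, d, channel_axis=-1) for r, d in zip(ref_views, dist_views)]
    return sum(scores) / len(scores)
```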

TABLE I. Comparison between PQA_W/O_DTI and PQA-Net.

TABLE II. Influence of the parameter λ in (5).

The compared metrics are chosen to cover a diversity of design philosophies, including point-based, angular-based, and projection-based approaches. The most representative point-based point cloud quality metrics are the point-to-point PSNRs for geometry only or for color (Y component) [52], [53], [55], namely PSNR_{MSE,p2po}, PSNR_{HF,p2po}, and PSNR_Y, respectively. The geometry difference between a point in the reference point cloud and the nearest point (evaluated by the Euclidean distance or the Hausdorff distance) in the distorted point cloud is used to calculate PSNR_{MSE,p2po} and PSNR_{HF,p2po}, while the color difference (MSE of the Y component) of the matched points (evaluated by the Euclidean distance) in the reference and the distorted point clouds is used to calculate PSNR_Y. In addition, the point-to-plane PSNRs for geometry information [17], which are based on the Euclidean and Hausdorff distances (namely PSNR_{MSE,p2pl} and PSNR_{HF,p2pl}, respectively), were also compared. For the angular-based point cloud quality metrics, the angular similarity of associated points belonging to a reference point cloud and a point cloud under evaluation is used to assess the quality [18], e.g., AS_{Mean}, AS_{RMS}, and AS_{MSE}. For the projection-based point cloud quality metrics, the most essential idea is to estimate the average quality of the six projected images as the final point cloud quality, and the algorithms SSIM [31], MS-SSIM [33], and VIFP [32] are usually employed to assess the projected image quality, namely SSIM_p, MS-SSIM_p, and VIFP_p, respectively. The state-of-the-art GraphSIM was also compared; in this method, a graph is constructed in both the reference and the distorted point clouds to calculate a similarity index for quality assessment [26]. All the parameters of GraphSIM follow the published version [56]. We also compared the point cloud quality assessment method in [22] (namely, PCQM) and the method in [37] (namely, PointSSIM). PCQM uses an optimally weighted linear combination of geometry and color features to assess the point cloud quality; we followed the weight values recommended by [57] in our experiments. PointSSIM uses a wide set of features representing local changes to estimate point cloud quality; we used the parameters recommended by [58]. Besides the above FR quality metrics, a state-of-the-art RR quality metric, PCM_RR [54], which evaluates the perceptual quality score by a linear combination of the differences of geometry, color, and normal vector features, was also compared. When implementing PCM_RR, all the parameters were set based on [59].

The SRCC, PLCC, KRCC, MAE, RMSE, and PWRC results are given in Table III. We should note that the quality metrics can either directly evaluate the quality or indirectly predict the quality; therefore, the criteria (e.g., PLCC, SRCC, and KRCC) can be positive or negative. The closer the absolute value of PLCC, SRCC, or KRCC is to 1, the better the performance. These results provide some useful insights with respect to the approaches for point cloud quality assessment. First of all, the methods (i.e., PSNR_{MSE,p2po}, PSNR_{MSE,p2pl}, PSNR_{HF,p2po}, PSNR_{HF,p2pl}, AS_{Mean}, AS_{RMS}, and AS_{MSE}) that only consider the geometry structure do not perform well. The major reason is that these methods do not take the color information into consideration, which has been shown to be the major impact factor of point cloud quality [54]. Indeed, this is quite apparent from our test results, where even PSNR_Y, a crude quality metric, performs significantly better than those that only consider geometry information. PCQM gives better performance than PSNR_Y due to its additional geometry features. For GraphSIM, the PLCC and SRCC are 0.47 and 0.46, because it is very hard to construct the local graph accurately due to the complex geometric structure of point clouds. However, GraphSIM provides a new methodology for point cloud quality assessment, and may be more suitable for other datasets. The PLCC and SRCC of PCM_RR are 0.29 and 0.26 based on the optimal feature weights provided by [59]. The reason is that


there are too many features used in the metric, which limits its generalization ability. If the set of features were reduced more judiciously, and the weights were trained again on a larger dataset, the results should be better.

TABLE III. Accuracy comparison with existing FR and RR point cloud quality metrics for different contents.

PointSSIM is the


Fig. 8. Scatter plots of objective scores vs. MOSs. The best-fitting logistic functions are also shown as solid curves. Note: the smaller the value of PCQM and PCM_RR, the higher the MOS value of the distorted point cloud.

worst, with an overall PLCC and SRCC of only 0.14 and 0.18, respectively.

All the projection-based methods, i.e., SSIM_p, MS-SSIM_p, and VIFP_p, achieve good performance, owing to the fact that both geometric and color information are taken into account, as in the proposed PQA-Net. It is worth pointing out that the projected images employed in the proposed PQA-Net are the same as those used in the image-based metrics. We can see from Table III that VIFP_p has the best performance, with PLCC and SRCC both as high as 0.82, and that PQA-Net is the second best, with PLCC and SRCC as high as 0.70 and 0.69, respectively. Given its lack of access to the original information, the performance of an NR method is reasonably worse than that of the FR/RR methods. From the results, however, we can see that the performance of the proposed PQA-Net is only slightly lower than that of the best FR method, VIFP_p, indicating the advantage of the proposed PQA-Net. From Table III we can also see that the prediction accuracy for the point cloud "Banana" is not good. The reason is that "Banana" has a simple geometric structure and a high-brightness single yellow color, which make it difficult for subjects to distinguish quality changes. As a result, the quality labels of "Banana" used for training are noisy, and the performance of PQA-Net is accordingly reduced.

TABLE IV. The confusion matrices produced by PQA-Net.

To compare the accuracy of these quality metrics uniformly, a nonlinear four-parameter logistic function [60] is applied to map the raw model predictions to the MOS scale. Scatter plots of objective scores and MOSs are shown in Fig. 8, with the best-fitting logistic functions shown as solid curves. All the compared methods have score ranges different from the MOSs, and their correspondence with the MOSs is given by the fitting curve in each subplot of Fig. 8.

As a by-product, PQA-Net also outputs the distortion type of a point cloud. The confusion matrix [61], which is used to summarize the performance of a classification algorithm, is shown in Table IV. The elements represent the probabilities that the distortion types are accurately predicted by the classifier; the higher the diagonal values of the confusion matrix, the better. Since the statistical behavior of noise distortion is clearly distinct from the other three kinds of distortion, PQA-Net predicts noise distortion perfectly. On the

other hand, the compression distortions generated by G-PCC and V-PCC are more easily confused by the classifier. Since the number of samples with G-PCC distortion is about 1.33 times that of V-PCC, the classification accuracy of G-PCC is better. Accordingly, we can speculate that using a larger dataset is likely to result in better performance.

TABLE V. Accuracy comparison with existing FR and RR point cloud quality metrics for different distortion types.

To measure the performance of the classifier, the accuracy of the distortion type classifier was calculated. The accuracy is defined as the proportion of the total number of predictions that are correct:

DT_a = \frac{DT_r}{DT_t}, \quad (11)

where DT_r denotes the number of correct predictions made by the distortion type classifier, and DT_t denotes the total


Fig. 9. Subjective comparison between two point clouds which have the same quality as predicted by the evaluated quality metrics.

number of all the tested point clouds. From Table IV, we can compute that the mean DT_a predicted by PQA-Net is as high as (0.72 + 1.00 + 0.99 + 0.49) ÷ 4 = 0.80.
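The macro-averaged accuracy used here is simply the mean of the diagonal of Table IV; a one-liner makes the arithmetic explicit (the per-class probabilities are copied from Table IV; the class order is an assumption):

```python
def mean_dta(diag):
    # Mean per-class accuracy: the average of the confusion-matrix diagonal.
    return sum(diag) / len(diag)

# Diagonal of Table IV (order assumed: downsampling, noise, G-PCC, V-PCC).
print(mean_dta([0.72, 1.00, 0.99, 0.49]))  # -> 0.8
```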
4) Experimental Results on Distortion Types: We also compared the proposed PQA-Net with classic and state-of-the-art FR and RR point cloud quality metrics for the different distortion types. These results are shown in Table V. First of all, for the downsampling distortion type, the performance of all methods is relatively good. Since this kind of distortion is caused by directly discarding some points of the original point cloud, it damages the geometry and color information crudely. This is also the reason why the methods (i.e., PSNR_{MSE,p2po}, PSNR_{MSE,p2pl}, PSNR_{HF,p2po}, PSNR_{HF,p2pl}, AS_{Mean}, AS_{RMS}, and AS_{MSE}) that only consider the geometry structure also perform well here. For the noise distortion type, white Gaussian noise was added independently to both the geometry and the color elements. PSNR_Y performs the best since it focuses precisely on the color noise at the corresponding points. For the G-PCC and V-PCC distortion types, which


introduce both geometry and color coding distortions, the projection-based methods perform better than the point-based and the angular-based methods (as mentioned in Section IV-C.3) due to their comprehensive consideration of geometry and color distortion. In summary, VIFP_p performs the best for all distortion types. PCM_RR, PCQM, and PointSSIM also perform well for specific distortion types, but they are not robust enough. In comparison, the performance of the proposed PQA-Net is robust, and its accuracy is very close to that of the FR metric VIFP_p.

Fig. 10. Box plot visualization of the MOS intervals.

TABLE VI. Influence of the different loss functions l̃_2 in (12).

5) Visual Results: As we know, human vision is the ultimate standard for evaluating different quality metrics [62]. Accordingly, we further compared the consistency between the visual results of PQA-Net and the state-of-the-art methods. Since the distribution ranges of the point cloud quality scores predicted by the different evaluation methods are very different, we compared the visual difference between two degraded point clouds that receive the same predicted quality score from the evaluated quality metrics. It is worth mentioning that the two degraded point clouds are selected from the same distortion type for a fair comparison. Taking the subjective visual difference of the point cloud Pineapple as an example, as shown in Fig. 9, we can see that the visual perception of the two degraded point clouds is very different in Fig. 9(a), but PointSSIM predicts that the quality of the two degraded point clouds is the same, indicating that the quality metric PointSSIM is inaccurate. Similar results can also be found in Fig. 9(b), (c), (d), and (f) for GraphSIM, PCQM, PSNR_Y, MS-SSIM_p, and SSIM_p. The visual perceptions of the two degraded point clouds in Fig. 9(g) and Fig. 9(h) are very close, and the quality scores predicted by VIFP_p and the proposed PQA-Net are also very close, indicating that the quality scores given by VIFP_p (an FR method) and the proposed PQA-Net (an NR method) are consistent with subjective vision. To better compare the trend similarity between the predicted MOSs of each quality metric and the ground truth, we divided the MOS interval into three sub-intervals ([0, 40), [40, 65), and [65, 100], respectively), and then compared the box plot statistical diagrams, as shown in Fig. 10. Note that we applied a nonlinear four-parameter logistic function [60] to map the raw model predictions to the MOS scale in Fig. 10. It can be seen that the trend of the median line of the predicted MOSs of our proposed PQA-Net is very similar to that of the ground truth MOSs. The median line of the predicted MOSs of PCM_RR is also close to that of the ground truth MOSs in the sub-intervals [0, 40)

and [40, 65); unfortunately, however, no raw model predictions are mapped into the [65, 100] interval.

6) Ablation Studies on Loss Functions: We investigated more loss functions for PQA-Net to explore its performance. The widely used loss functions are chosen: the L1 loss, which measures the mean absolute error between each element of the network output and the target [14]; the L2 loss, which measures the mean squared error between each element of the network output and the target [63]; and the smooth L1 loss, which uses a squared term if the absolute element-wise error falls below a threshold and an L1 term otherwise [64]. In this ablation study, we adjusted the hybrid loss function of PQA-Net to

l(X^{(k)}; W, w_1, w_2) = l_1 + \lambda_{ablation}\,\tilde{l}_2, \quad (12)

where λ_ablation is set to 0.025 to balance the difference between the losses of the distortion classifier and the quality predictor, l_1 is the cross-entropy measuring the classification error of Sub-task I, l̃_2 is chosen from the L1, L2, or smooth L1 loss to measure the error between the actual and predicted scores for Sub-task II, and W, w_1, w_2 are the parameters of the MVFEF, DTI, and QVP modules, respectively. The results for the different loss functions are given in Table VI. We can see that the highest PLCC and SRCC among the L1/L2/smooth-L1 loss functions are 0.54 and 0.62, respectively, and the lowest MAE is 14.49. By contrast, the average PLCC and SRCC of the PLCC-based loss function (5) are 0.70 and 0.69, respectively, and the MAE is 12.36 (see Table III), indicating its advantage.
performance of PQA-Net, we also conducted further experi- One potential future direction is to reasonably and effec-
ments on the other subjective datasets. The subjective datasets tively expand the subjective dataset to further improve the
are briefly introduced in Table VII. For SJTU-PCQA dataset, performance. Another potential direction is to optimize the
as there are some distorted point clouds with mixed distortion network architecture by introducing advanced neural modules.
types which are useless for our testing, only the distorted
point clouds with individual distortion types are tested. In the R EFERENCES
experiments, the weighting factor λ in the proposed hybrid [1] Y. Tian, K. Wang, Y. Wang, Y. Tian, Z. Wang, and F.-Y. Wang, “Adaptive
and azimuth-aware fusion network of multimodal local features for 3D
loss function (5) was set to be 15 for IRPC [65], 15 for SJTU- object detection,” Neurocomputing, vol. 411, pp. 32–44, Oct. 2020.
PCQA [36], and 25 for M-PCCD [66] to adapt to the different [2] Z. Xie, J. Chen, and B. Peng, “Point clouds learning with attention-based
graph convolution networks,” Neurocomputing, vol. 402, pp. 245–255,
distortion types of these dataset. We randomly choose 80% Aug. 2020.
reference point clouds for training and the rest 20% reference [3] Q. Liu, H. Yuan, R. Hamzaoui, and H. Su, “Coarse to fine rate control
point clouds are left out for testing on each dataset. There are for region-based 3D point cloud compression,” in Proc. IEEE Int. Conf.
Multimedia Expo Workshops (ICMEW), Jul. 2020, pp. 1–6.
no overlapping samples between the training and the testing [4] S. Gu, J. Hou, H. Zeng, H. Yuan, and K.-K. Ma, “3D point cloud
sets. attribute compression using geometry-guided sparse representation,”
Detailed experiment results are given in Table VIII, from IEEE Trans. Image Process., vol. 29, pp. 796–808, Aug. 2019.
which we can see that the performance on the IRPC dataset [5] H. Yuan, D. Zhang, W. Wang, and Y. Li, “A sampling-based 3D point
cloud compression algorithm for immersive communication,” Mobile
is the worst. This is because the amount of training data Netw. Appl., vol. 25, no. 5, pp. 1863–1872, Oct. 2020.
of the IRPC dataset is very small, which greatly limits the [6] Q. Liu, H. Yuan, J. Hou, H. Liu, and R. Hamzaoui, “Model-based
performance of the proposed PQA-Net. The performance of encoding parameter optimization for 3D point cloud compression,”
in Proc. Asia–Pacific Signal Inf. Process. Assoc. Annu. Summit Conf.
PQA-Net on M-PCCD is better, but is not yet perfect. This is (APSIPA ASC), 2018, pp. 1981–1986.

Authorized licensed use limited to: University Roma Tre AREA SCIENTIFICO TECNOLOGICA. Downloaded on February 07,2024 at 05:28:28 UTC from IEEE Xplore. Restrictions apply.
LIU et al.: PQA-Net: DEEP NR POINT CLOUD QUALITY ASSESSMENT VIA MULTI-VIEW PROJECTION 4659

[7] L. Li, Z. Li, S. Liu, and H. Li, "Occupancy-map-based rate distortion optimization and partition for video-based point cloud compression," IEEE Trans. Circuits Syst. Video Technol., vol. 31, no. 1, pp. 326–338, Jan. 2021.
[8] S. Gu, J. Hou, H. Zeng, and H. Yuan, "3D point cloud attribute compression via graph prediction," IEEE Signal Process. Lett., vol. 27, pp. 176–180, Jan. 2020.
[9] Q. Wang, Y. Tan, and Z. Mei, "Computational methods of acquisition and processing of 3D point cloud data for construction applications," Arch. Comput. Methods Eng., vol. 27, no. 2, pp. 479–499, Apr. 2020.
[10] S. Schwarz et al., "Emerging MPEG standards for point cloud compression," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 9, no. 1, pp. 133–148, Mar. 2018.
[11] H. Liu, H. Yuan, Q. Liu, J. Hou, and J. Liu, "A comprehensive study and comparison of core technologies for MPEG 3-D point cloud compression," IEEE Trans. Broadcast., vol. 66, no. 3, pp. 701–717, Sep. 2020.
[12] H. Su, Z. Duanmu, W. Liu, Q. Liu, and Z. Wang, "Perceptual quality assessment of 3D point clouds," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2019, pp. 3182–3186.
[13] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, "Subjective and objective quality evaluation of compressed point clouds," in Proc. IEEE 19th Int. Workshop Multimedia Signal Process. (MMSP), Oct. 2017, pp. 1–6.
[14] K. Ma, W. Liu, K. Zhang, Z. Duanmu, Z. Wang, and W. Zuo, "End-to-end blind image quality assessment using deep neural networks," IEEE Trans. Image Process., vol. 27, no. 3, pp. 1202–1213, Mar. 2017.
[15] W. Liu, Z. Duanmu, and Z. Wang, "End-to-end blind quality assessment of compressed videos using deep neural networks," in Proc. 26th ACM Int. Conf. Multimedia, Oct. 2018, pp. 546–554.
[16] Common Test Conditions for Point Cloud Compression, document ISO/IEC/JTC1/SC29/WG11/MPEG/N19324, 3DG, Apr. 2020.
[17] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro, "Geometric distortion metrics for point cloud compression," in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2017, pp. 3460–3464.
[18] E. Alexiou and T. Ebrahimi, "Point cloud quality assessment metric based on angular similarity," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), Jul. 2018, pp. 1–6.
[19] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, "A generalized Hausdorff distance based quality metric for point cloud geometry," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[20] G. Meynet, J. Digne, and G. Lavoué, "PC-MSDM: A quality metric for 3D point clouds," in Proc. 11th Int. Conf. Qual. Multimedia Exper. (QoMEX), Jun. 2019, pp. 1–3.
[21] I. Viola, S. Subramanyam, and P. Cesar, "A color-based objective quality metric for point cloud contents," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[22] G. Meynet, Y. Nehmé, J. Digne, and G. Lavoué, "PCQM: A full-reference quality metric for colored 3D point clouds," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[23] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Local luminance patterns for point cloud quality assessment," in Proc. IEEE 22nd Int. Workshop Multimedia Signal Process. (MMSP), Sep. 2020, pp. 1–6.
[24] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Towards a point cloud quality assessment model using local binary patterns," in Proc. 12th Int. Conf. Qual. Multimedia Exper. (QoMEX), May 2020, pp. 1–6.
[25] R. Diniz, P. G. Freitas, and M. C. Q. Farias, "Multi-distance point cloud quality assessment," in Proc. IEEE Int. Conf. Image Process. (ICIP), Oct. 2020, pp. 3443–3447.
[26] Q. Yang, Z. Ma, Y. Xu, Z. Li, and J. Sun, "Inferring point cloud quality via graph similarity," IEEE Trans. Pattern Anal. Mach. Intell., early access, Dec. 24, 2020, doi: 10.1109/TPAMI.2020.3047083.
[27] Q. Yang, S. Chen, Y. Xu, J. Sun, M. S. Asif, and Z. Ma, "Point cloud distortion quantification based on potential energy for human and machine perception," 2021, arXiv:2103.02850. [Online]. Available: https://arxiv.org/abs/2103.02850
[31] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[32] H. R. Sheikh and A. C. Bovik, "Image information and visual quality," IEEE Trans. Image Process., vol. 15, no. 2, pp. 430–444, Feb. 2006.
[33] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, Nov. 2003, pp. 1398–1402.
[34] R. L. de Queiroz and P. A. Chou, "Motion-compensated compression of dynamic voxelized point clouds," IEEE Trans. Image Process., vol. 26, no. 8, pp. 3886–3895, Aug. 2017.
[35] Q. Liu, H. Yuan, J. Hou, R. Hamzaoui, and H. Su, "Model-based joint bit allocation between geometry and color for video-based 3D point cloud compression," IEEE Trans. Multimedia, early access, Sep. 10, 2020, doi: 10.1109/TMM.2020.3023294.
[36] Q. Yang, H. Chen, Z. Ma, Y. Xu, R. Tang, and J. Sun, "Predicting the perceptual quality of point cloud: A 3D-to-2D projection-based exploration," IEEE Trans. Multimedia, early access, Oct. 23, 2020, doi: 10.1109/TMM.2020.3033117.
[37] E. Alexiou and T. Ebrahimi, "Towards a point cloud structural similarity metric," in Proc. IEEE Int. Conf. Multimedia Expo Workshops (ICMEW), Jul. 2020, pp. 1–6.
[38] Q. Liu, H. Yuan, R. Hamzaoui, H. Su, J. Hou, and H. Yang, "Reduced reference perceptual quality model and application to rate control for 3D point cloud compression," 2020, arXiv:2011.12688. [Online]. Available: https://arxiv.org/abs/2011.12688
[39] Vocabulary for Performance and Quality of Service, ITU-T Recommendation P.10, Geneva, Switzerland, 2006.
[40] Q. Li and Z. Wang, "Reduced-reference image quality assessment using divisive normalization-based image representation," IEEE J. Sel. Topics Signal Process., vol. 3, no. 2, pp. 202–211, Apr. 2009.
[41] J. Ballé, V. Laparra, and E. P. Simoncelli, "Density modeling of images using a generalized normalization transformation," in Proc. 4th Int. Conf. Learn. Represent. (ICLR), 2016, pp. 1–14.
[42] J. Ballé, V. Laparra, and E. P. Simoncelli, "End-to-end optimized image compression," in Proc. 5th Int. Conf. Learn. Represent. (ICLR), 2019, pp. 1–27.
[43] K. Zhang, X. Wang, Y. Guo, Z. Zhao, and Z. Ma, "Competing ratio loss for multi-class image classification," in Proc. IEEE Vis. Commun. Image Process. (VCIP), Dec. 2019, pp. 1–4.
[44] R. K. Mantiuk, A. Tomaszewska, and R. Mantiuk, "Comparison of four subjective methods for image quality assessment," Comput. Graph. Forum, vol. 31, no. 8, pp. 2478–2491, 2012.
[45] Agisoft. (2010). Agisoft PhotoScan. [Online]. Available: http://www.agisoft.com
[46] E. Alexiou and T. Ebrahimi, "Exploiting user interactivity in quality assessment of point cloud imaging," in Proc. 11th Int. Conf. Qual. Multimedia Exper. (QoMEX), Jun. 2019, pp. 1–6.
[47] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: https://arxiv.org/abs/1412.6980
[48] P. Sedgwick, "Pearson's correlation coefficient," Brit. Med. J., vol. 345, p. e4483, Jul. 2012.
[49] D. Sheskin, "Spearman's rank-order correlation coefficient," in Handbook of Parametric and Nonparametric Statistical Procedures. Boca Raton, FL, USA: Chapman & Hall, 2007, pp. 1353–1370.
[50] H. Yeganeh and Z. Wang, "Objective quality assessment of tone-mapped images," IEEE Trans. Image Process., vol. 22, no. 2, pp. 657–667, Feb. 2013.
[51] Q. Wu, H. Li, F. Meng, and K. N. Ngan, "A perceptually weighted rank correlation indicator for objective image quality assessment," IEEE Trans. Image Process., vol. 27, no. 5, pp. 2499–2513, May 2018.
[52] R. Mekuria, Z. Li, C. Tulvan, and P. Chou, Evaluation Criteria for PCC (Point Cloud Compression), document MPEG N16332, 2016.
[53] R. Mekuria, S. Laserre, and C. Tulvan, "Performance assessment of point cloud compression," in Proc. IEEE Vis. Commun. Image Process.
https://arxiv.org/abs/2103.02850 (VCIP), Dec. 2017, pp. 1–4.
[28] A. Javaheri, C. Brites, F. Pereira, and J. Ascenso, “Mahalanobis based [54] I. Viola and P. Cesar, “A reduced reference metric for visual quality
point to distribution metric for point cloud geometry quality evaluation,” evaluation of point cloud contents,” IEEE Signal Process. Lett., vol. 27,
IEEE Signal Process. Lett., vol. 27, pp. 1350–1354, Jul. 2020. pp. 1660–1664, Sep. 2020.
[29] E. M. Torlig, E. Alexiou, T. A. Fonseca, R. L. de Queiroz, and [55] R. Mekuria, K. Blom, and P. Cesar, “Design, implementation, and
T. Ebrahimi, “A novel methodology for quality assessment of voxelized evaluation of a point cloud codec for tele-immersive video,” IEEE Trans.
point clouds,” Proc. SPIE, vol. 10752, Sep. 2018, Art. no. 107520I. Circuits Syst. Video Technol., vol. 27, no. 4, pp. 828–842, Apr. 2016.
[30] S. Wolf and M. H. Pinson, Reference Algorithm for Computing Peak [56] Q. Yang, Z. Ma, Y. Xu, Z. Li, and J. Sun. (2020). Infer-
Signal to Noise Ratio (PSNR) of a Video Sequence With a Constant ring Point Cloud Quality Via Graph Similarity. [Online]. Available:
Delay, document ITU-T Contribution COM9-C6-E, 2009. https://github.com/NJUVISION/GraphSIM

[57] G. Meynet, Y. Nehmé, J. Digne, and G. Lavoué. (2021). PCQM. [Online]. Available: https://github.com/MEPP-team/PCQM
[58] E. Alexiou and T. Ebrahimi. (2020). PointSSIM: Point Cloud Structural Similarity Metric. [Online]. Available: https://github.com/mmspg/pointssim
[59] I. Viola and P. Cesar. (2020). PCM_RR. [Online]. Available: https://github.com/cwi-dis/PCM_RR
[60] Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, VQEG Meeting, Ottawa, ON, Canada, 2000.
[61] K. M. Ting, Confusion Matrix. Boston, MA, USA: Springer, 2010, p. 209.
[62] A. Fang, X. Zhao, and Y. Zhang, “Cross-modal image fusion guided by subjective visual attention,” Neurocomputing, vol. 414, pp. 333–345, Nov. 2020.
[63] Q. Jiang, Z. Peng, S. Yang, and F. Shao, “Authentically distorted image quality assessment by learning from empirical score distributions,” IEEE Signal Process. Lett., vol. 26, no. 12, pp. 1867–1871, Dec. 2019.
[64] R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[65] A. Javaheri, C. Brites, F. M. B. Pereira, and J. M. Ascenso, “Point cloud rendering after coding: Impacts on subjective and objective quality,” IEEE Trans. Multimedia, early access, Nov. 11, 2020, doi: 10.1109/TMM.2020.3037481.
[66] E. Alexiou, I. Viola, T. M. Borges, T. A. Fonseca, R. L. de Queiroz, and T. Ebrahimi, “A comprehensive study of the rate-distortion performance in MPEG point cloud compression,” APSIPA Trans. Signal Inf. Process., vol. 8, pp. 1–27, Nov. 2019.

Qi Liu received the B.S. degree from Shandong Technology and Business University, Shandong, China, in 2011, and the M.S. degree from the School of Telecommunication Engineering, Xidian University, Xi’an, China, in 2014. She is currently pursuing the Ph.D. degree with Shandong University, Shandong. From September 2018 to August 2019, she worked as a Visiting Graduate Student with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. Her research interests include point cloud coding, processing, and quality assessment.

Hui Yuan (Senior Member, IEEE) received the B.E. and Ph.D. degrees in telecommunication engineering from Xidian University, Xi’an, China, in 2006 and 2011, respectively.
He was with Shandong University (SDU), Jinan, China, as a Lecturer from April 2011 to December 2014, an Associate Professor from January 2015 to August 2016, and has been a Full Professor since September 2016. He was with the Department of Computer Science, City University of Hong Kong (CityU), as a Post-Doctoral Fellow (granted by the Hong Kong Scholar Project) from January 2013 to December 2014, and a Research Fellow from November 2017 to February 2018. From November 2020 to November 2021, he also worked as a Marie Curie Fellow (granted by the Marie Skłodowska-Curie Actions Individual Fellowship under Horizon 2020 Europe) with the School of Engineering and Sustainable Development, De Montfort University, Leicester, U.K. His current research interests include video/image/immersive media processing, compression, adaptive streaming, and computer vision. He served as an Area Chair for IEEE ICME 2021, IEEE ICME 2020, and IEEE VCIP 2020.

Honglei Su (Member, IEEE) received the B.A.Sc. degree from Shandong University of Science and Technology, Qingdao, China, in 2008, and the Ph.D. degree from Xidian University, Xi’an, China, in 2014. Since September 2014, he has been working as an Assistant Professor with the School of Electronic Information, Qingdao University, Qingdao, China. From March 2018 to March 2019, he worked as a Visiting Scholar with the Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. His research interests include perceptual image processing, immersive media processing, and computer vision.

Hao Liu received the B.E. degree from the Department of Communication Engineering, Shandong Agricultural University, Shandong, China, in 2017. He is currently pursuing the Ph.D. degree with Shandong University. His research interests include point cloud compression and processing.

Yu Wang received the B.S. degree in communication engineering, the M.S. degree in software engineering, and the Ph.D. degree in computer applications and techniques from Tianjin University in 2013, 2016, and 2020, respectively. He was an Outstanding Visitor Scholar with the University of Waterloo in 2019. He is currently an Assistant Professor with Tianjin University. He has published many peer-reviewed papers in world-class conferences and journals, such as IEEE Transactions on Fuzzy Systems (TFS), IEEE Transactions on Neural Networks and Learning Systems (TNNLS), IEEE Transactions on Cybernetics (TCYB), and IEEE Transactions on Knowledge and Data Engineering (TKDE). His research interests focus on hierarchical learning and large-scale classification in industrial scenarios and computer vision applications, data mining, and machine learning.

Huan Yang (Member, IEEE) received the B.S. degree in computer science from the Heilongjiang Institute of Technology, China, in 2007, the M.S. degree in computer science from Shandong University, China, in 2010, and the Ph.D. degree in computer engineering from Nanyang Technological University, Singapore, in 2015. She is currently working with the College of Computer Science and Technology, Qingdao University, Qingdao, China. Her research interests include image/video processing and analysis, perception-based modeling and quality assessment, object detection/recognition, and machine learning.

Junhui Hou (Senior Member, IEEE) received the B.Eng. degree in information engineering (Talented Students Program) from the South China University of Technology, Guangzhou, China, in 2009, the M.Eng. degree in signal and information processing from Northwestern Polytechnical University, Xi’an, China, in 2012, and the Ph.D. degree in electrical and electronic engineering from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2016.
In January 2017, he joined the Department of Computer Science, City University of Hong Kong, as an Assistant Professor. His research interests fall into the general areas of visual computing, such as image/video/3D geometry data representation, processing and analysis, semi-/un-supervised data modeling, and data compression and adaptive transmission. He is currently an Elected Member of the MSA-TC and VSPC-TC, IEEE CAS. He was a recipient of several prestigious awards, including the Chinese Government Award for Outstanding Students Study Abroad from the China Scholarship Council in 2015 and the Early Career Award (3/381) from the Hong Kong Research Grants Council in 2018. He served as an Area Chair for ACM MM19/20/21, IEEE ICME20, VCIP20/21, and WACV21. He is an Associate Editor for IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Image Processing, Signal Processing: Image Communication, and The Visual Computer. He also served as a Guest Editor for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing.