
Study of Subjective Quality and Objective Blind Quality Prediction of Stereoscopic 3D Videos

SUBMITTED BY:
D T D ARAVIND KUMAR 21EC65R03
CHRIST LARSEN KUMAR EKKA 21EC65R05
SOURABH KUMAR SAHU 21EC65R06
SOURAV NAG 21EC65R11
SUDIP DAS 21EC65R12
SUBHAM GUCHHAIT 21EC65R27
Introduction
A study of quality is required because, after acquisition of 3D content, several post-acquisition processing steps such as sampling and quantization are applied to the video, and these degrade the overall perceived 3D video quality. Quality assessment is of two types: subjective and objective.
Subjective quality assessment of S3D videos
In subjective assessment, human subjects perform the quality rating task; the resulting human opinion scores serve as a valuable benchmark for objective assessment algorithms. Quality analysis in this domain rests on the observations reported by several authors from their studies of S3D video.
- De Silva et al. [9] created an S3D video dataset containing H.264 and H.265 compression artifacts. The dataset has 14 reference and 116 test sequences of full HD resolution, downsampled to 960×1080. They concluded that higher quantization step sizes caused more significant perceptual quality differences than lower quantization step sizes.
- Hewage et al. [10] created an S3D video dataset which they used to explore the effects of random packet losses on the overall perception of S3D videos. They used 9 reference sequences, 54 test sequences, and 6 different packet loss rates. They concluded that S3D perceptual quality was significantly affected by the loss of packets from either the left or right view of an S3D video.

Objective Quality Assessment of 3D videos


The basic objective models for stereoscopic 3D videos are based on reusing popular 2D image quality assessment (IQA) and 2D video quality assessment (VQA) algorithms on the individual views and disparity maps of the S3D videos. These algorithms are applied on a frame-by-frame or per-view basis to estimate the quality of S3D videos.
Generally, VQA algorithms perform better than IQA algorithms, and the addition of depth information improves the assessment technique.
One such assessment is an S3D RR model based on motion vector strength and binocular fusion. Here, motion vector strength relates to the accuracy of the two-dimensional motion vector, based on inter-frame motion prediction of an object, which provides an estimate of the object's coordinates over time in a video. Binocular fusion refers to the process, or set of processes, through which information from the two eyes is combined to yield single vision rather than double vision (diplopia); in this case it is deduced from the left and right frames of a 3D video.
Algorithms like S3D RR (reduced-reference, i.e., the evaluator has some knowledge of the reference, such as some of its statistical measures) VQA are based on combining the 3D information in the depth map with chrominance information (colour images contain more significant edge information than the corresponding depth maps).
An S3D NR (no-reference, i.e., the evaluator has no knowledge of the reference) VQA algorithm has been based on spatiotemporal segmentation (sequences of temporally adjacent, arbitrarily shaped spatial regions), spatial structural loss measurement, motion inconsistencies (based on motion vector strengths within a segment), and inter- and intra-disparity variation. In another NR method, a multi-view model based on binocular perception computes spatial texture features and temporal features on an optical flow before estimating quality by pooling the features using temporal weights.
In addition, autoregressive prediction based on disparity measurements and estimated natural S3D video statistical model parameters can be used to predict the quality of an S3D video.
In another assessment, univariate generalized Gaussian distribution (UGGD) and asymmetric GGD model parameters, together with spatial and spectral entropies from a tensor decomposition, are estimated, and a random forest classifier is then used to predict the quality of the video.
In the blind no-reference algorithm discussed here, video quality assessment is based on the joint statistical dependencies that exist between motion and disparity, and on measured motion variation.

Subjective Quality Assessment of S3D videos and Video Set


Reference video sequences comprise pristine videos from the publicly available RMIT3D uncompressed video dataset, consisting of 46 left-and-right sequences captured with a Panasonic AG-3DA1 professional stereoscopic camera in full HD 1920×1080, YUV 422P 10-bit format. They were rated on a scale of 0 to 5 (higher meaning better quality) by a group of individuals based on perceived disparity, spatial activity, and motion information, and were converted to YUV 420 8-bit to ensure smooth playback on a TV.
The spatial and temporal indices of each video were calculated as the mean over the individual left and right sequences. Disparity spatial and temporal indices were computed from the disparity maps of the reference videos using an SSIM-based stereo matching algorithm, and the selected set was observed to span a broad spectrum of spatial, temporal, and disparity information.
Test video sequences were generated by introducing 4 independent and mutually exclusive forms of distortion:
1. H.264 and H.265 compression: pristine left and right views were compressed with H.264 and H.265 encoders while varying the constant rate factor (CRF), which acts as the compression quality parameter (lower CRF means better quality), since CRF governs how much of a frame's detail relative to its neighbours is retained and is inversely related to quality. Encoding may be symmetric (same CRF for both views) or asymmetric (different CRFs for the two views).
2. Blur: occurs mostly spatially due to camera defocus, and can be interpreted as smoothing of pixel variation in a region by removing high-frequency components within a circular region, whose radius serves as the distortion parameter.
3. Frame freezes: this distortion occurs when the display appears unchanged for a period because the renderer fails to decode data. It was mimicked by dropping frames (5, 7, or 9) after a random frame and replacing them with the first of the dropped frames; H.264 compression was then applied to the frame-dropped videos.

The test videos were similarly rated by a group of individuals on the same 0 to 5 scale.
The difference scores between the reference and distorted S3D videos were calculated as per the ITU-R recommendation:

$$d_{q_i q_j} = sub^{ref}_{q_i q_j} - sub_{q_i q_j}$$

where $q_i$ indexes the subject and $q_j$ the video id, $sub^{ref}_{q_i q_j}$ is the score of the reference and $sub_{q_i q_j}$ is the score of the distorted video. This is followed by calculating the DMOS scores (the difference between the "reference" and "processed" mean opinion scores in a full-reference test) over the $Z$ subjects:

$$DMOS_{q_j} = \frac{\sum_{q_i=1}^{Z} d_{q_i q_j}}{Z}$$
The efficacy of the subjective study was verified by examining the internal structure of the dataset: all collected DMOS scores were randomly divided into two halves with mutually exclusive human subjects, and the LCC (linear correlation coefficient) and Spearman's rank order correlation coefficient (SROCC) were computed between the two halves. This split was repeated 100 times to ensure statistical consistency, and the mean, median, and standard deviation over all iterations were computed, indicating how consistent opinion was across subjects.
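A minimal sketch of this split-half consistency check, assuming a dmos array of per-subject difference scores with shape (subjects × videos); the array name and shape are illustrative assumptions:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def split_half_consistency(dmos: np.ndarray, iterations: int = 100, seed: int = 0):
    """Repeatedly split subjects into two halves and correlate the halves' DMOS."""
    rng = np.random.default_rng(seed)
    num_subjects = dmos.shape[0]
    lccs, sroccs = [], []
    for _ in range(iterations):
        perm = rng.permutation(num_subjects)        # random, mutually exclusive split
        half = num_subjects // 2
        g1 = dmos[perm[:half]].mean(axis=0)         # DMOS from first half of subjects
        g2 = dmos[perm[half:]].mean(axis=0)         # DMOS from second half
        lccs.append(pearsonr(g1, g2)[0])
        sroccs.append(spearmanr(g1, g2)[0])
    # Mean, median, and standard deviation over all iterations.
    return ((np.mean(lccs), np.median(lccs), np.std(lccs)),
            (np.mean(sroccs), np.median(sroccs), np.std(sroccs)))
```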
Objective Quality Assessment:

Various psycho-visual experiments have been performed on the mammalian visual cortex to find the disparity selectivity in visual area MT and the dependencies that exist between motion and disparity. The middle temporal visual area (MT or V5) is a region of extrastriate visual cortex. In several species of both New World and Old World monkeys, the MT area contains a high concentration of direction-selective neurons. The MT in primates is thought to play a major role in the perception of motion, the integration of local motion signals into global percepts, and the guidance of some eye movements. The result of these experiments was that a large portion of area MT is responsible for disparity processing and that these components exhibit patchy, distributive, and directional dependencies. Luminance and disparity sub-band coefficients of S3D pictures have sharp peaks and heavy tails that can be modelled using a UGGD (univariate generalized Gaussian distribution). A series of experiments has been performed on the S3D scene components (spatial, disparity, and motion/temporal) of natural S3D images and videos to explore the statistical dependencies that arise among these scene components. S3D scene components exhibit strong dependencies, and these dependencies can be well modelled as following a BGGD (bivariate generalized Gaussian distribution). The psychovisual studies and the S3D scene component statistical studies lead to a completely blind S3D NR VQA algorithm based on a BGGD model of the joint statistical dependencies between motion and disparity.
Let the multivariate random vector x ∈ R^N follow a multivariate generalized Gaussian distribution (MGGD) with density function given by

$$p(\mathbf{x} \mid \mathbf{M}, \alpha, \beta) = \frac{1}{|\mathbf{M}|^{\frac{1}{2}}}\, g_{\alpha,\beta}\!\left(\mathbf{x}^{T}\mathbf{M}^{-1}\mathbf{x}\right)$$

$$g_{\alpha,\beta}(y) = \frac{\beta\,\Gamma\!\left(\frac{N}{2}\right)}{\left(2^{\frac{1}{\beta}}\pi\alpha\right)^{\frac{N}{2}}\Gamma\!\left(\frac{N}{2\beta}\right)}\, e^{-\frac{1}{2}\left(\frac{y}{\alpha}\right)^{\beta}}$$

where M is an N×N covariance matrix, α is a scale parameter, β is a shape parameter, and $g_{\alpha,\beta}(\cdot)$ is the density generator. The popular maximum likelihood estimation (MLE) method is used to compute the parameters α, β, and M of the BGGD.
In the model, motion and disparity provide the primary features, so N = 2 and the multivariate GGD becomes a bivariate GGD (BGGD). The BGGD model parameters α and β, and the coherence score Ψ, are used for quality prediction. The coherence score is defined as

$$\Psi = \frac{(\lambda_{max} - \lambda_{min})^{2}}{(\lambda_{max} + \lambda_{min})^{2}}$$

where $\lambda_{max}$ and $\lambda_{min}$ represent the maximum and minimum eigenvalues of M. These eigenvalues are capable of accurately capturing directional dependencies between the motion and disparity components.
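A minimal sketch of the coherence score computed from a 2×2 covariance matrix M. Here M is estimated as a sample covariance of paired motion/disparity coefficients, a simplifying stand-in for the full BGGD MLE fit:

```python
import numpy as np

def coherence_score(M: np.ndarray) -> float:
    """Psi = (lmax - lmin)^2 / (lmax + lmin)^2 from the eigenvalues of M."""
    lam = np.linalg.eigvalsh(M)                    # eigenvalues, ascending order
    lam_max, lam_min = lam[-1], lam[0]
    return (lam_max - lam_min) ** 2 / (lam_max + lam_min) ** 2

# Example with synthetic, correlated motion/disparity coefficients.
rng = np.random.default_rng(0)
motion = rng.standard_normal(1000)                 # placeholder sub-band coefficients
disparity = 0.6 * motion + 0.4 * rng.standard_normal(1000)
M = np.cov(np.vstack([motion, disparity]))         # 2x2 sample covariance
print(coherence_score(M))
```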
The motion vector maps and disparity maps are decomposed at multiple scales (3 scales) and multiple orientations (0°, 30°, 60°, 90°, 120°, 150°) using the steerable pyramid decomposition. Motion vectors and disparity maps are computed on a frame-by-frame basis: a three-step search is employed to estimate the motion vectors, and an SSIM-based algorithm to conduct disparity estimation. Each pair of corresponding motion and disparity sub-bands is jointly modeled using the BGGD.
Figure (a) shows the 150th frame of the left view of the 'Domain Parklands' S3D video. Figures (b), (c), and (d) show distorted versions of the same frame: H.264 (CRF = 50), H.265 (CRF = 50), and blur (radius = 7), respectively. Figures (e), (f), and (g) show the frame-wise α, β, and Ψ scores of the 'Domain Parklands' pristine S3D video and its H.264 compressed versions, respectively. Figures (i), (j), and (k) show scatter plots of the α, β, and Ψ scores of the same reference and distorted S3D video sequences.

From the plots, it is clear that the features follow a number of trends:
1. The features can clearly discriminate videos having large perceptual quality differences; e.g., the computed BGGD features (α, β, Ψ) of the S3D video compressed at (CL = 35, CR = 35) differ significantly from those of the S3D video compressed at (CL = 50, CR = 50).
2. The features take similar values on videos that are perceptually similar in quality. For example, the BGGD features computed on the S3D videos compressed at (CL = 45, CR = 45), (CL = 45, CR = 50), and (CL = 50, CR = 50) yield similar feature values.
These observations motivate their use as quality features in the proposed MoDi3D algorithm. The plots correspond to the first scale, 0° orientation of the steerable pyramid decomposition, and use the negative logarithm of all feature scores for better visualization. The x-axis represents the frame number within the S3D video. The frame-wise average NIQE scores, and scatter plots of the average NIQE scores, of the left and right views of the 'Domain Parklands' pristine S3D video and its H.264 compressed versions are shown in Figures (h) and (l), respectively. The plots clearly show quality variations with respect to the distortion levels.
Proposed method:
The first stage of the proposed method computes the motion and disparity features of the S3D video. The second stage performs the MoDi2D score computation. In the third stage, spatial features of the individual views of the S3D video are computed using the NIQE model. In the last stage, the MoDi3D score of the S3D video is computed.

Flowchart of the proposed MoDi3D algorithm

Motion and Disparity Feature Extraction:


Motion feature set:
The motion vector map of the S3D video is used to compute the motion feature set; the motion vectors are computed using the three-step search motion estimation algorithm with a macroblock size of 8×8. The magnitude of the motion vector is used as the motion feature in the algorithm:

$$T_t = \sqrt{T_H^2 + T_V^2}$$

where $T_t$ represents the motion vector strength and $T_H$, $T_V$ are the horizontal and vertical motion vector components, respectively.
The three-step search motion estimation algorithm was taken from the reference "Block-based motion estimation algorithms: a survey"; a short summary of that reference follows. In block-based motion estimation (BBME), the current frame is divided into N×N-pixel macroblocks (MBs), and for each MB a certain area of the reference frame is searched to minimize a block difference measure (BDM), which is usually the sum of absolute differences (SAD) between the current and the reference MB. The displacement within the search area (SA) that gives the minimum BDM value is called a motion vector (MV). MVs, together with transformed and quantized block differences (residua), are entropy-coded into the video bitstream.

The simple and regular three-step search (TSS) uses a square search pattern with eight search points (SPs) around the centre at each step. The initial step size is d/2 and is halved in the subsequent steps. When d equals 7, the number of steps is 3 and the number of SPs used is 25. For larger search ranges, TSS can easily be extended to n steps, with the number of search points equal to 1 + 8[log2(d + 1)].
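A minimal sketch of TSS block matching for a single macroblock, using SAD as the BDM; the frame arrays, block size, and search range d are illustrative assumptions:

```python
import numpy as np

def sad(a: np.ndarray, b: np.ndarray) -> int:
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def tss_motion_vector(cur: np.ndarray, ref: np.ndarray,
                      top: int, left: int, block: int = 8, d: int = 7):
    """Return the (dy, dx) motion vector for the block at (top, left)."""
    target = cur[top:top + block, left:left + block]
    best = (0, 0)
    step = (d + 1) // 2                               # initial step size ~ d/2
    while step >= 1:
        costs = []
        for sy in (-1, 0, 1):                         # square pattern: 8 SPs
            for sx in (-1, 0, 1):                     # around (and including) centre
                dy, dx = best[0] + sy * step, best[1] + sx * step
                y, x = top + dy, left + dx
                if 0 <= y <= ref.shape[0] - block and 0 <= x <= ref.shape[1] - block:
                    costs.append((sad(target, ref[y:y + block, x:x + block]), (dy, dx)))
        best = min(costs)[1]                          # keep the lowest-SAD point
        step //= 2                                    # halve the step each stage
    return best
```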

Disparity Features Set:


The computation of the disparity map is complex and sensitive to distortion. An SSIM-based stereo matching algorithm is used to compute the disparity, chosen for its trade-off between time complexity and accuracy.

The algorithm finds the best matching block in the right view for each corresponding block of the left view to estimate the disparity, with the search limited to 30 pixels. The disparity map is computed for a given S3D pair on a frame-wise basis.

The SSIM-based stereo matching algorithm was taken from the reference "Image Quality Assessment: From Error Visibility to Structural Similarity", and a short summary is given below.

The luminance of the surface of an object being observed is the product of the illumination and the reflectance, but the structures of the objects in the scene are independent of the illumination. Consequently, to explore the structural information in an image, we wish to separate out the influence of the illumination. We define the structural information in an image as those attributes that represent the structure of objects in the scene, independent of the average luminance and contrast.

Suppose x and y are two nonnegative image signals that have been aligned with each other (e.g., spatial patches extracted from each image). If we consider one of the signals to have perfect quality, then the similarity measure can serve as a quantitative measurement of the quality of the second signal. The system separates the task of similarity measurement into three comparisons: luminance, contrast, and structure.

First, the luminance of each signal is compared; assuming discrete signals, it is estimated as the mean intensity $\mu_x$. The luminance comparison function $l(x, y)$ is then a function of $\mu_x$ and $\mu_y$. Second, we remove the mean intensity from the signal and use the standard deviation (the square root of the variance) as an estimate of the signal contrast; an unbiased estimate in discrete form is

$$\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \mu_x\right)^2\right)^{\frac{1}{2}}$$

The contrast comparison $c(x, y)$ is then the comparison of $\sigma_x$ and $\sigma_y$. Third, each signal is normalized (divided) by its own standard deviation, so that the two signals being compared have unit standard deviation, and the structure comparison $s(x, y)$ is conducted on these normalized signals. Finally, the three components are combined to yield an overall similarity measure:

$$S(x, y) = f\big(l(x, y),\, c(x, y),\, s(x, y)\big)$$

The luminance, contrast, and structure components take the standard forms

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

and the resulting similarity measure, the SSIM index between signals x and y, is

$$SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma}$$

where α > 0, β > 0, and γ > 0 are parameters used to adjust the relative importance of the three components.
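A minimal sketch of the SSIM index for two aligned grayscale patches, using the common default constants and the usual simplification C3 = C2/2 (which merges the contrast and structure terms); a full implementation would evaluate this over local windows:

```python
import numpy as np

def ssim_patch(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    """SSIM between two aligned patches with alpha = beta = gamma = 1."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(ddof=1), y.var(ddof=1)           # unbiased estimates
    cov_xy = ((x - mu_x) * (y - mu_y)).sum() / (x.size - 1)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)  # c*s merged
    return float(luminance * contrast_structure)
```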

The steerable pyramid decomposition was performed on the motion and disparity maps at multiple scales and multiple orientations. Since the motion vectors were estimated using a block size of 8×8, the sub-bands of the disparity map were downsampled to the same size by averaging over 8×8 blocks.
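A minimal sketch of the 8×8 block-average downsampling used to match a disparity sub-band to the motion vector grid; cropping the sub-band to a multiple of the block size is a simplifying assumption:

```python
import numpy as np

def block_average(subband: np.ndarray, block: int = 8) -> np.ndarray:
    """Downsample by averaging non-overlapping block x block regions."""
    h = (subband.shape[0] // block) * block            # crop to multiples of 8
    w = (subband.shape[1] // block) * block
    cropped = subband[:h, :w]
    return cropped.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

subband = np.random.rand(540, 960)                     # placeholder sub-band
print(block_average(subband).shape)                    # -> (67, 120)
```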

The steerable pyramid decomposition is a linear multi-resolution image decomposition method by which an image is subdivided into a collection of sub-bands localized at different scales and orientations. Using a high-pass and low-pass filter pair (H0, L0), the input image is initially decomposed into two sub-bands: a high-pass and a low-pass sub-band, respectively. The low-pass sub-band is further decomposed into K oriented band-pass portions B0, ..., BK-1 and a low-pass sub-band L1. The decomposition proceeds recursively by subsampling the lower low-pass sub-band (LS) by a factor of two along the rows and columns. Each recursive step captures different directional information at a given scale. The basis functions of the steerable pyramid are directional derivative operators that come in different sizes and orientations, and the number of orientations may be adjusted by changing the derivative order. In short, the steerable pyramid is an architecture for efficient and accurate linear decomposition of an image into scale and orientation sub-bands, whose basis functions are directional derivative operators of any desired order.
Spatial feature estimation: NIQE (natural image quality evaluator) is a completely blind 2D NR IQA model. NIQE scores are computed on a frame-by-frame basis on both views, and the mean of all frame-level scores estimates the spatial quality of each S3D video:

$$S = \frac{1}{Q}\sum_{j=1}^{Q}\frac{NIQE_j^L + NIQE_j^R}{2}$$

where j represents the frame number and Q the total number of S3D video frames; L and R denote the left and right views of the S3D video. $NIQE_j^L$ and $NIQE_j^R$ are the frame-level NIQE scores of the left and right views, and S denotes the overall spatial quality of the S3D video.
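A minimal sketch of this spatial pooling. A real NIQE implementation is assumed behind the hypothetical niqe(frame) callable (e.g., one provided by a package such as scikit-video); the frame lists are illustrative placeholders:

```python
import numpy as np

def spatial_quality(left_frames, right_frames, niqe) -> float:
    """Mean over frames of the per-frame average of left/right NIQE scores."""
    scores = [(niqe(fl) + niqe(fr)) / 2.0
              for fl, fr in zip(left_frames, right_frames)]
    return float(np.mean(scores))                      # S in the equation above
```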

MoDi2D computation:
As stated previously, the motion vector maps and disparity maps of an S3D video are decomposed at three scales and six orientations using the steerable pyramid decomposition. The BGGD model parameters (α, β) and coherence score (Ψ) are estimated at each sub-band and denoted as

$$f^{\alpha} = [\alpha_{ji}], \quad f^{\beta} = [\beta_{ji}], \quad f^{\Psi} = [\Psi_{ji}]$$

where i represents the sub-band index (1 ≤ i ≤ 18). The total number of motion vector maps computed for an S3D video is Q − 1, so 1 ≤ j ≤ Q − 1. $f^{\alpha}$, $f^{\beta}$, and $f^{\Psi}$ are the video-level feature sets of the α, β, and Ψ scores, respectively.

Pristine Multivariate Gaussian Models:

Reference video sequences from the RMIT S3D video dataset are used to estimate the parameters of the pristine MVG model. The videos used as pristine S3D sequences in the subjective study are excluded; specifically, the 34 remaining uncompressed RMIT S3D videos are used as the reference set in the objective evaluations.

The BGGD model parameters α and β and the Ψ scores are estimated over all sub-bands of the motion vector and disparity maps of the 34 pristine S3D video sequences, giving three feature sets for the reference S3D video set:

$$f^{\alpha_p} = [\alpha_{ji}^{p}], \quad f^{\beta_p} = [\beta_{ji}^{p}], \quad f^{\Psi_p} = [\Psi_{ji}^{p}]; \quad 1 \le p \le P$$

where p indexes a pristine S3D video and P represents the total number of pristine videos (P = 34).

As in NIQE, the pristine S3D video parameter sets ($f^{\alpha_p}$, $f^{\beta_p}$, and $f^{\Psi_p}$) are modeled using a multivariate Gaussian (MVG) distribution denoted N(ν, Σ), where ν and Σ are the mean vector and covariance matrix, respectively. Specifically, the means ($\nu_{\alpha}^{p}$, $\nu_{\beta}^{p}$, $\nu_{\Psi}^{p}$) and covariances correspond to the $f^{\alpha_p}$, $f^{\beta_p}$, and $f^{\Psi_p}$ sets, respectively.
Distorted Feature Set
First, the BGGD parameters α and β and the Ψ scores are calculated over all sub-bands of the frame-wise motion vector and disparity maps of each distorted S3D video. The feature sets of the distorted S3D videos are

$$f^{\alpha_d} = [\alpha_{ji}^{d}], \quad f^{\beta_d} = [\beta_{ji}^{d}], \quad f^{\Psi_d} = [\Psi_{ji}^{d}]$$

To check whether a given test video is pristine or distorted, the likelihood of its parameters is evaluated under the pristine MVG distribution.

The likelihood is estimated on a frame-level basis and is a single value per frame. The video-level likelihood is estimated by averaging the frame-level estimates, and the resulting scores denote the mean values of the frame-level likelihood estimates of the individual features (α, β, Ψ).

Next, the pairwise product of the likelihood estimate scores is computed, yielding γ, which represents the overall departure of a distorted video's statistics with respect to the pristine model.
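A minimal sketch of the likelihood scoring against the pristine MVG model, with a pairwise-product pooling into γ; the exact pooling is an assumed reading of the description above, and the feature arrays are illustrative placeholders:

```python
import numpy as np
from scipy.stats import multivariate_normal

def video_likelihood(frame_features: np.ndarray, mean: np.ndarray,
                     cov: np.ndarray) -> float:
    """Average frame-level MVG likelihood; rows of frame_features are frames."""
    mvg = multivariate_normal(mean=mean, cov=cov, allow_singular=True)
    return float(np.mean(mvg.pdf(frame_features)))

def gamma_score(l_alpha: float, l_beta: float, l_psi: float) -> float:
    """Pairwise products of the per-feature video-level likelihood means."""
    return l_alpha * l_beta + l_beta * l_psi + l_alpha * l_psi
```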

Global Motion Strength:

An SSIM score computed between successive frames of each view is used to measure the degree of frame-level motion variation of the S3D videos: ΔLj and ΔRj denote the frame-level motion of the left and right views, and Δ represents the global motion strength of the S3D video.

The product of the Δ and γ scores measures the joint quality of the motion and disparity components of each test S3D video:

MoDi2D = log(γ × Δ).

Overall Quality Computation:

The spatial feature S and the MoDi2D score jointly determine the overall quality estimate of an S3D video:

MoDi3D = MoDi2D × S.
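A minimal sketch combining the pieces into the final score; all input values below are placeholders for illustration:

```python
import math

def modi2d(gamma: float, delta: float) -> float:
    """Joint motion/disparity quality term."""
    return math.log(gamma * delta)

def modi3d(gamma: float, delta: float, spatial_s: float) -> float:
    """Overall quality: MoDi2D scaled by the NIQE-based spatial score S."""
    return modi2d(gamma, delta) * spatial_s

print(modi3d(gamma=0.8, delta=1.5, spatial_s=4.2))     # example usage
```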

Results and Discussion


The performance of the proposed MoDi3D objective algorithm was evaluated on the following datasets: the IRCCYN S3D video dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and the proposed LFOVIAS3DPh2 S3D video dataset.

The LCC, SROCC, and root mean square error (RMSE) are used to evaluate the proposed MoDi3D algorithm's performance. LCC measures the linear relationship between two variables, SROCC measures the monotonic relationship between two input sets, and RMSE measures the magnitude of the error between estimated objective scores and subjective DMOS scores. Higher LCC and SROCC values indicate good agreement between subjective and objective measures, and lower RMSE signifies more accurate prediction.
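A minimal sketch of these three metrics, computed between predicted objective scores and subjective DMOS values:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(predicted: np.ndarray, dmos: np.ndarray):
    lcc = pearsonr(predicted, dmos)[0]         # linear correlation
    srocc = spearmanr(predicted, dmos)[0]      # rank-order (monotonic) correlation
    rmse = float(np.sqrt(np.mean((predicted - dmos) ** 2)))
    return lcc, srocc, rmse
```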

Conclusion
There were two main contributions in this report. First, a comprehensive subjective quality evaluation was performed on a symmetrically and asymmetrically distorted full HD S3D video dataset. The dataset contains 12 pristine S3D video sequences and 288 test stimuli. The test video sequences are a combination of H.264 and H.265 compression, blur distortions, and frame freezes. 20 subjects were involved in the study, which was conducted using the ACR-HR method.
Second, a completely blind S3D NR VQA algorithm was proposed, based on computing the joint statistical dependencies between the motion and disparity sub-band coefficients of an S3D video. The BGGD parameters (α, β) and the coherence score (Ψ) from the eigenvalues of the covariance matrix were extracted, and these features were found to discriminate distortions. An unsupervised 2D NR IQA model (NIQE) was used to estimate spatial quality.
Finally, these features were pooled to predict the overall quality of an S3D video. The proposed objective algorithm, MoDi3D, demonstrates competitive performance compared with popular 2D and 3D FR and supervised NR image and video quality assessment models, even though it is trained neither on distorted S3D videos nor on any annotations of them.
Q. Compute the performance of the objective blind predictor on the databases.

DATA SET            LCC     SROCC   RMSE
IRCCYN S3D          0.6060  0.6233  0.9853
LFOVIA              0.6759  0.6552  9.5929
WaterlooIVC         0.4834  0.4265  18.1095
LFOVIAS3DPh2 S3D    0.699   0.661   0.657

The best performance is achieved by Video Quality Evaluation using Motion and Depth Statistics (VQUEMODES), a supervised 3D NR VQA model, which is shown to outperform popular state-of-the-art 2D IQA/VQA and 3D IQA/VQA models when evaluated over all the datasets. Its performance on all the datasets is as follows:

DATA SET            LCC     SROCC   RMSE
IRCCYN S3D          0.9697  0.9637  0.2635
LFOVIA              0.8943  0.8890  5.9124
WaterlooIVC         0.8519  0.8266  7.1526
LFOVIAS3DPh2 S3D    0.878   0.839   0.444

Although NIQE and other NR QA algorithms perform well on symmetrically distorted videos,
they fall short when it comes to asymmetrically distorted S3D videos. On both symmetrically
and asymmetrically distorted S3D videos, the proposed MoDi3D model (combination of
MoDi2D and Spatial NIQE score) performs consistently.

Due to the blank frames with H.264 compression artifacts during frame freezes, the joint dependencies between the motion and disparity components vary more than under H.264 and H.265 compression alone. The MoDi3D model effectively captures these statistical variations and delivers better performance on frame freezes than on the other compression artifacts. Blur is a spatial distortion that does not significantly change the motion information of an S3D video, so the dependency variation between the motion and disparity components is lower than for compression artifacts. As a result, the model is less able to capture the statistical connections between the motion and disparity components, and MoDi3D performs worse on blur than on compression-based distortions. Finally, and most importantly, the proposed approach is entirely unaware of subjective opinion, i.e., it is 'fully blind', yet it performs well across a wide range of distortions and datasets.
References
[1] Y. Liu, L. K. Cormack, and A. C. Bovik, "Statistical modeling of 3-D natural scenes with application to Bayesian stereopsis," IEEE Transactions on Image Processing, vol. 20, pp. 2515–2530, Sept 2011.
[2] "eMarketer: Better research. Better business decisions," https://www.emarketer.com/.
[3] R. Tenniswood, L. Safonova, and M. Drake, "3D's effect on a film's box office and profitability," 2010.
[4] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, "Study of subjective and objective quality assessment of video," IEEE Transactions on Image Processing, vol. 19, pp. 1427–1441, June 2010.
[5] J. Wang, S. Wang, and Z. Wang, "Asymmetrically compressed stereoscopic 3D videos: Quality assessment and rate-distortion performance evaluation," IEEE Transactions on Image Processing, vol. 26, pp. 1330–1343, March 2017.
[6] B. Appina, K. Manasa, and S. S. Channappayya, "Subjective and objective study of the relation between 3D and 2D views based on depth and bitrate," Electronic Imaging, vol. 2017, no. 5, pp. 145–150, 2017.
[7] K. Wang, M. Barkowsky, R. Cousseau, K. Brunnstrom, R. Olsson, P. Le Callet, and M. Sjostrom, "Subjective evaluation of HDTV stereoscopic videos in IPTV scenarios using absolute category rating," in Proc. SPIE, vol. 7863, 2011.
[8] Z. Chen, W. Zhou, and W. Li, "Blind stereoscopic video quality assessment: From depth perception to overall experience," IEEE Transactions on Image Processing, vol. 27, no. 2, pp. 721–734, 2018.
[9] E. Dumić, S. Grgić, K. Šakić, P. M. R. Rocha, and L. A. da Silva Cruz, "3D video subjective quality: a new database and grade comparison study," Multimedia Tools and Applications, vol. 76, no. 2, pp. 2087–2109, 2017.
[10] "Lab for Video and Image Analysis (LFOVIA) Downloads," http://www.iith.ac.in/~lfovia/downloads.html; S. L. P. Yasakethu, C. T. E. R. Hewage, W. A. C. Fernando, and A. M. Kondoz, "Quality analysis for 3D video using 2D video quality models," IEEE Transactions on Consumer Electronics, vol. 54, pp. 1969–1976, November 2008.
[11] J. Yang, H. Wang, W. Lu, B. Li, A. Badiid, and Q. Meng, "A no-reference optical flow-based quality evaluator for stereoscopic videos in curvelet domain," Information Sciences, vol. 414, pp. 133–146, 2017.
[12] G. Jiang, S. Liu, M. Yu, F. Shao, Z. Peng, and F. Chen, "No reference stereo video quality assessment based on motion feature in tensor decomposition domain," Journal of Visual Communication and Image Representation, 2017.
[13] B. Appina, A. Jalli, S. S. Battula, and S. S. Channappayya, "No-reference stereoscopic video quality assessment algorithm using joint motion and depth statistics," in 25th International Conference on Image Processing, IEEE, pp. 2800–2804, 2018.
[14] E. Cheng, P. Burton, J. Burton, A. Joseski, and I. Burnett, "RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database," in Fourth International Workshop on Quality of Multimedia Experience, pp. 212–217, July 2012.
[15] M. T. Pourazad, Z. Mai, P. Nasiopoulos, K. Plataniotis, and R. K. Ward, "Effect of brightness on the quality of visual 3D perception," in International Conference on Image Processing, IEEE, pp. 989–992, Sept 2011.
