Study of Subjective Quality and Objective Blind Quality Prediction of Stereoscopic Videos
SUBMITTED BY:
D T D ARAVIND KUMAR 21EC65R03
CHRIST LARSEN KUMAR EKKA 21EC65R05
SOURABH KUMAR SAHU 21EC65R06
SOURAV NAG 21EC65R11
SUDIP DAS 21EC65R12
SUBHAM GUCHHAIT 21EC65R27
Introduction
A study of quality is required after the acquisition of 3D content, because several post-acquisition processing steps, such as sampling and quantization, are applied to the video, and these in turn degrade the overall perceived 3D video quality. Quality assessment is of two types: subjective and objective.
Subjective quality assessment of S3D videos
In subjective assessment, human subjects perform the quality assessment task, and the resulting human opinion scores serve as a valuable benchmark for objective assessment algorithms. Quality analysis in this domain builds on the work of several authors, who drew the following conclusions from their observations of S3D videos:
- De Silva et al. [9] created an S3D video dataset containing H.264 and H.265 compression artifacts. The dataset has 14 reference and 116 test sequences of full HD resolution, downsampled to 960×1080. They concluded that higher quantization step sizes caused more significant perceptual quality differences than lower quantization step sizes.
- Hewage et al.[10] created an S3D video dataset which they used to explore the effects of
random packet losses on the overall perception of S3D videos. They used 9 reference
sequences, 54 test sequences, and 6 different packet loss rates. They concluded that
S3D perceptual quality was significantly affected by the loss of packets from either the
left or right views of an S3D video.
The test videos were rated by a group of individuals on a common 0 to 5 scale. The difference scores between the reference and distorted S3D videos were calculated as per the ITU-R recommendation:
$$d_{q_i q_j} = sub^{ref}_{q_i q_j} - sub_{q_i q_j}$$

where $q_i$ indicates the subject and $q_j$ indicates the video ID, $sub^{ref}_{q_i q_j}$ is the reference score and $sub_{q_i q_j}$ is the distorted score. This was followed by calculating the DMOS scores (the difference between the "reference" and "processed" Mean Opinion Scores in full-reference testing):

$$DMOS_{q_j} = \frac{\sum_{q_i = 1}^{Z} d_{q_i q_j}}{Z}$$

where $Z$ is the total number of subjects.
The efficacy of the subjective study was verified by examining the internal structure of the dataset: all of the collected DMOS scores were randomly divided into two halves in which the human subjects were mutually exclusive, and the LCC (Linear Correlation Coefficient) and Spearman's Rank Order Correlation Coefficient (SROCC) were computed between these two halves. This procedure was repeated 100 times to ensure statistical consistency, after which the mean, median and standard deviation were computed over all iterations, indicating how consistent the opinions were among the subjects. A minimal sketch of this check is given below.
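The following Python sketch illustrates the split-half consistency check; `dmos` is a hypothetical (subjects × videos) array of the difference scores $d_{q_i q_j}$, and the function name is illustrative, not from the report.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def split_half_consistency(dmos, n_iter=100, seed=0):
    """Split-half consistency check of subjective scores.

    `dmos` is a (subjects x videos) array of difference scores.
    Subjects are randomly split into two mutually exclusive halves,
    per-video DMOS is computed for each half, and LCC/SROCC are
    measured between the two halves; repeated n_iter times.
    """
    rng = np.random.default_rng(seed)
    n_subj = dmos.shape[0]
    lccs, sroccs = [], []
    for _ in range(n_iter):
        perm = rng.permutation(n_subj)
        half_a, half_b = perm[: n_subj // 2], perm[n_subj // 2:]
        dmos_a = dmos[half_a].mean(axis=0)   # DMOS per video, half A
        dmos_b = dmos[half_b].mean(axis=0)   # DMOS per video, half B
        lccs.append(pearsonr(dmos_a, dmos_b)[0])
        sroccs.append(spearmanr(dmos_a, dmos_b)[0])
    # mean, median and standard deviation over all iterations
    summarize = lambda v: (np.mean(v), np.median(v), np.std(v))
    return summarize(lccs), summarize(sroccs)
```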
Objective Quality Assessment:
Various psycho-visual experiments have been performed on the mammalian visual cortex to
find out the disparity selectivity in visual area MT and the dependencies that exist between
motion and disparity. The middle temporal visual area (MT or V5) is a region of the extrastriate visual cortex. In several species of both New World and Old World monkeys, the MT area contains a high concentration of direction-selective neurons. MT in primates is thought to play a major role in the perception of motion, the integration of local motion signals into global percepts, and the guidance of some eye movements. The result of the
experiments was that a large portion of area MT is responsible for disparity processing and
that these components exhibit patchy, distributive, and directional dependencies. Luminance
and disparity sub-band coefficients of S3D pictures have sharp peaks and heavy tails that can
be modelled using an UGGD (Univariate Generalized Gaussian Distribution). A series of
experiments has been performed on S3D scene components (spatial, disparity and
motion/temporal) of natural S3D images and videos to explore the statistical dependencies
that arise among these scene components. These experiments showed that S3D scene components exhibit strong dependencies, and that these dependencies can be well modelled by a BGGD (Bivariate Generalized Gaussian Distribution). The psychovisual studies and the S3D scene component statistical studies led to a completely blind S3D NR VQA algorithm based on a BGGD model of the joint statistical dependencies between motion and disparity.
Let the multivariate random vector $\mathbf{x} \in \mathbb{R}^N$ follow a Multivariate Generalized Gaussian Distribution (MGGD) with density function given by

$$p(\mathbf{x} \mid \mathbf{M}, \alpha, \beta) = \frac{1}{|\mathbf{M}|^{1/2}}\, g_{\alpha,\beta}\!\left(\mathbf{x}^{T} \mathbf{M}^{-1} \mathbf{x}\right)$$

$$g_{\alpha,\beta}(y) = \frac{\beta\, \Gamma\!\left(\frac{N}{2}\right)}{\left(2^{1/\beta}\pi\alpha\right)^{N/2} \Gamma\!\left(\frac{N}{2\beta}\right)}\, e^{-\frac{1}{2}\left(\frac{y}{\alpha}\right)^{\beta}}$$

where $\mathbf{M}$ is an $N \times N$ covariance matrix, $\alpha$ is a scale parameter, $\beta$ is a shape parameter and $g_{\alpha,\beta}(\cdot)$ is the density generator. The popular Maximum Likelihood Estimation (MLE) method is used to compute the parameters $\alpha$, $\beta$ and $\mathbf{M}$ of the BGGD.
In the model, motion and disparity provide the primary features and this results in N = 2.
Therefore, the multivariate GGD becomes a bivariate GGD (BGGD). The BGGD model
parameters α and β, and the coherence score (Ψ) are used for quality prediction. The coherence score is defined as:
$$\Psi = \frac{\left(\lambda_{max} - \lambda_{min}\right)^2}{\left(\lambda_{max} + \lambda_{min}\right)^2}$$

where $\lambda_{max}$ and $\lambda_{min}$ represent the maximum and minimum eigenvalues of $\mathbf{M}$. These eigenvalues are capable of accurately capturing directional dependencies between the motion and disparity components. A minimal sketch of this computation follows.
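The coherence score follows directly from the eigenvalues of the estimated covariance matrix; a short sketch, assuming `M` is the 2×2 BGGD covariance matrix as a NumPy array:

```python
import numpy as np

def coherence_score(M):
    """Coherence score Psi from the BGGD covariance matrix M (2x2):
    Psi = (l_max - l_min)^2 / (l_max + l_min)^2."""
    eigvals = np.linalg.eigvalsh(M)          # ascending order (M symmetric)
    l_min, l_max = eigvals[0], eigvals[-1]
    return (l_max - l_min) ** 2 / (l_max + l_min) ** 2
```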
The motion vector and disparity maps are decomposed at multiple scales (3 scales) and multiple orientations (0°, 30°, 60°, 90°, 120°, 150°) using the steerable pyramid decomposition. Motion vectors and disparity maps are computed on a frame-by-frame basis. A three-step search is employed to estimate the motion vectors, and an SSIM-based algorithm is used for disparity estimation. Each pair of corresponding motion and disparity sub-bands is jointly modeled using the BGGD, as sketched below.
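A sketch of the decomposition step is shown below. It assumes the third-party `pyrtools` package, whose `SteerablePyramidFreq` implements the steerable pyramid (order = 5 yields six orientation bands); the function name and structure here are illustrative, not from the report.

```python
import pyrtools as pt

def subband_maps(component_map, n_scales=3, n_orients=6):
    """Decompose a motion or disparity map at 3 scales and 6
    orientations (0 to 150 degrees in 30-degree steps) with a
    steerable pyramid. Returns only the oriented bands, keyed by
    (scale, orientation) tuples; residual bands are dropped."""
    pyr = pt.pyramids.SteerablePyramidFreq(
        component_map, height=n_scales, order=n_orients - 1)
    return {k: v for k, v in pyr.pyr_coeffs.items() if isinstance(k, tuple)}
```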
Figure (a) shows the 150th frame of the left video of the ‘Domain Parklands’ S3D video. Figures (b), (c) and (d) show the distorted versions of the same frame. The distortions are H.264 (CRF = 50), H.265 (CRF = 50) and blur (radius = 7), respectively. Figures (e), (f) and (g) show the frame-wise α, β and Ψ scores of the ‘Domain Parklands’ pristine S3D video and its H.264 compressed versions, respectively. Figures (i), (j) and (k) show scatter plots of the α, β and Ψ scores of the same reference and distorted S3D video sequences.
From the plots, it is clear that the features follow a number of trends:
1. The features can clearly discriminate videos having large perceptual quality differences, e.g., the computed BGGD features (α, β, Ψ) of the S3D video compressed at (CL = 35, CR = 35) significantly differ from those of the S3D video compressed at (CL = 50, CR = 50).
2. The features take similar values on videos that are perceptually similar in quality. For example, the BGGD features computed on the S3D videos compressed at (CL = 45, CR = 45), (CL = 45, CR = 50) and (CL = 50, CR = 50) yield similar feature values.
These observations further motivate us to use them as quality features in the proposed MoDi3D algorithm. The plots correspond to the first scale, 0° orientation of the steerable pyramid decomposition, and they use the negative logarithmic scores of all features for better visualization. The x-axis represents the frame sequence number of the S3D video set. The frame-wise average NIQE scores and scatter plots of the average NIQE scores of the left and right views of the ‘Domain Parklands’ pristine S3D video and its H.264 compressed versions are shown in Figures (h) and (l), respectively. The plots clearly show quality variations with respect to the distortion levels.
Proposed method:
The first stage of the proposed method computes the motion and disparity features of the S3D video. The second stage performs the MoDi2D score computation. In the third stage, spatial features of the individual views of an S3D video are computed using the NIQE model. In the last stage, we compute the MoDi3D score of an S3D video.
$$T_t = \sqrt{T_H^2 + T_V^2}$$

where $T_t$ represents the motion vector strength and $T_H$, $T_V$ are the horizontal and vertical motion vector components, respectively.
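In code, the strength map follows directly from the per-block motion components; a minimal sketch, assuming `th` and `tv` are NumPy arrays of horizontal and vertical motion vectors over the block grid:

```python
import numpy as np

def motion_strength(th, tv):
    """Motion vector strength T_t = sqrt(T_H^2 + T_V^2),
    computed element-wise over the block grid."""
    return np.sqrt(th.astype(np.float64) ** 2 + tv.astype(np.float64) ** 2)
```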
The three-step search motion estimation algorithm was taken from the reference “Block-based motion estimation algorithms: a survey”; a short summary of that reference follows. In block-based motion estimation (BBME), the current frame is divided into N×N pixel-size macroblocks (MBs), and for each MB a certain area of the reference frame is searched to minimize a block difference measure (BDM), which is usually a sum of absolute differences (SAD) between the current and the reference MB. The displacement within the search area (SA) which gives the minimum BDM value is called a motion vector (MV). MVs, together with transformed and quantized block differences (residuals), are entropy-coded into the video bitstream.
The simple and regular three-step search (TSS) utilizes a square search pattern with eight search points (SPs) around the centre at each step. The initial step size is d/2 and is halved in the subsequent steps. When d equals 7, the number of steps is 3 and the number of SPs used is 25. For larger search ranges, TSS can be easily extended to n steps, with the number of search points equal to 1 + 8⌈log₂(d + 1)⌉. A minimal sketch of TSS is given below.
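The following sketch implements TSS for a single macroblock, assuming grayscale NumPy frames and a SAD block-difference measure; the function names and block layout are illustrative, not from the report.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences (the BDM) between two blocks."""
    return np.abs(a.astype(np.int32) - b.astype(np.int32)).sum()

def three_step_search(cur, ref, top, left, block=8, d=7):
    """Three-step search motion estimation for one macroblock.

    (top, left) locates the macroblock in the current frame `cur`;
    the search starts with step size ~d/2 around the same position
    in the reference frame `ref` and halves the step each iteration.
    Returns the motion vector (dy, dx) minimizing the SAD.
    """
    h, w = ref.shape
    cur_blk = cur[top:top + block, left:left + block]
    cy, cx = top, left                       # current search centre
    step = (d + 1) // 2                      # 4 for d = 7 -> 3 steps
    while step >= 1:
        best = (sad(cur_blk, ref[cy:cy + block, cx:cx + block]), cy, cx)
        for dy in (-step, 0, step):          # 8 points around the centre
            for dx in (-step, 0, step):
                y, x = cy + dy, cx + dx
                if 0 <= y <= h - block and 0 <= x <= w - block:
                    cost = sad(cur_blk, ref[y:y + block, x:x + block])
                    if cost < best[0]:
                        best = (cost, y, x)
        _, cy, cx = best                     # recentre on the best point
        step //= 2
    return cy - top, cx - left
```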
To estimate disparity, the algorithm computes the best-matching block in the right view for each corresponding block of the left view; the disparity search range was limited to 30. The disparity map is computed for a given S3D pair on a frame-wise basis.
The SSIM-based stereo matching algorithm was taken from the reference “Image Quality Assessment: From Error Visibility to Structural Similarity”; a short summary is written below.
The luminance of the surface of an object being observed is the product of the illumination
and the reflectance, but the structures of the objects in the scene are independent of the
illumination. Consequently, to explore the structural information in an image, we wish to
separate the influence of the illumination. We define the structural information in an image as
those attributes that represent the structure of objects in the scene, independent of the
average luminance and contrast.
Suppose x and y are two nonnegative image signals, which have been aligned with each other
(e.g., spatial patches extracted from each image). If we consider one of the signals to have
perfect quality, then the similarity measure can serve as a quantitative measurement of the
quality of the second signal. The system separates the task of similarity measurement into
three comparisons: luminance, contrast and structure.
First, the luminance of each signal is compared. Assuming discrete signals, this is estimated as the mean intensity:

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$$

The luminance comparison function $l(x, y)$ is then a function of $\mu_x$ and $\mu_y$. Second, we remove the mean intensity from the signal and use the standard deviation (the square root of the variance) as an estimate of the signal contrast. An unbiased estimate in discrete form is given by

$$\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \mu_x\right)^2\right)^{1/2}$$

The contrast comparison $c(x, y)$ is then the comparison of $\sigma_x$ and $\sigma_y$. Third, the signal is normalized (divided) by its own standard deviation, so that the two signals being compared have unit standard deviation, and the structure comparison $s(x, y)$ is conducted on these normalized signals. Finally, the three components are combined to yield an overall similarity measure:

$$S(x, y) = f\big(l(x, y),\ c(x, y),\ s(x, y)\big)$$

The expressions for the luminance, contrast and structure components are

$$l(x,y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x,y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

and the combined index is $SSIM(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma}$, where $\alpha > 0$, $\beta > 0$ and $\gamma > 0$ are parameters used to adjust the relative importance of the three components.
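The sketch below computes the three comparison terms and their product for two aligned patches, with all exponents set to 1 and the usual stabilizing constants from the SSIM paper (assuming 8-bit signals, K1 = 0.01, K2 = 0.03, C3 = C2/2):

```python
import numpy as np

def ssim_patch(x, y, c1=6.5025, c2=58.5225):
    """SSIM between two aligned patches as the product of the
    luminance, contrast and structure comparison terms."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(ddof=1), y.std(ddof=1)          # unbiased std
    sxy = ((x - mx) * (y - my)).sum() / (x.size - 1)
    c3 = c2 / 2
    lum = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    con = (2 * sx * sy + c2) / (sx ** 2 + sy ** 2 + c2)
    struct = (sxy + c3) / (sx * sy + c3)
    return lum * con * struct
```

In stereo matching, this patch similarity replaces the SAD cost: for each left-view block, the right-view block maximizing SSIM within the disparity search range is taken as the match.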
The steerable pyramid decomposition was performed on the motion and disparity maps at multiple scales and multiple orientations. Since the motion vectors were estimated using a block size of 8×8, we downsampled the sub-bands of the disparity map to the same size by averaging over 8×8 blocks, as in the sketch below.
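A short sketch of the block-averaging downsampling step, assuming NumPy sub-bands (boundaries are cropped when the dimensions are not multiples of the block size):

```python
import numpy as np

def block_average(band, bs=8):
    """Downsample a sub-band by averaging non-overlapping bs x bs
    blocks, so disparity sub-bands match the 8x8 motion-block grid."""
    h, w = band.shape
    band = band[: h - h % bs, : w - w % bs]    # crop to a multiple of bs
    return band.reshape(h // bs, bs, w // bs, bs).mean(axis=(1, 3))
```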
The overall spatial quality of an S3D video is obtained by averaging the frame-level NIQE scores of both views:

$$S = \frac{1}{2Q}\sum_{j=1}^{Q}\left(NIQE_j^L + NIQE_j^R\right)$$

where j represents the frame number, Q represents the total number of S3D video frames, and L, R represent the left and right views of an S3D video. $NIQE_j^L$ and $NIQE_j^R$ represent the frame-level NIQE scores of the left and right views, while S denotes the overall spatial quality of an S3D video.
MoDi2D computation:
As stated previously, the motion vector maps and disparity maps of an S3D video are
decomposed at three scales and six orientations using the steerable pyramid decomposition.
We estimate the BGGD model parameters (α, β) and the coherence score (Ψ) at each sub-band of each view of an S3D video, denoted as:

$$f_\alpha = [\alpha_{ji}], \quad f_\beta = [\beta_{ji}], \quad f_\Psi = [\Psi_{ji}]$$

where i represents the sub-band level (1 ≤ i ≤ 18). The total number of motion vector maps computed in an S3D video is Q − 1; therefore, 1 ≤ j ≤ Q − 1. $f_\alpha$, $f_\beta$ and $f_\Psi$ are video-level feature sets of the α, β and Ψ scores, respectively.
The BGGD model parameters α, β and the Ψ scores are estimated over all sub-bands of the motion vector and disparity maps from the 34 pristine S3D video sequences. Three individual feature sets are created from the α, β and Ψ scores of the reference S3D video set:

$$f_\alpha^p = [\alpha_{ji}^p], \quad f_\beta^p = [\beta_{ji}^p], \quad f_\Psi^p = [\Psi_{ji}^p]; \quad 1 \le p \le P$$

where p represents a pristine S3D video and P represents the total number of pristine videos (P = 34).
As in NIQE, the pristine S3D video parameter sets ($f_\alpha^p$, $f_\beta^p$ and $f_\Psi^p$) are modeled using a Multivariate Gaussian (MVG) distribution denoted by N(ν, Σ), where ν and Σ are the mean vector and covariance matrix, respectively. Specifically, the means ($\nu_\alpha^p$, $\nu_\beta^p$, $\nu_\Psi^p$) and the corresponding covariances are estimated for the $f_\alpha^p$, $f_\beta^p$ and $f_\Psi^p$ sets, respectively.
Distorted Feature Set
First, the BGGD parameters α, β and the Ψ scores are calculated over all sub-bands of the frame-wise motion vector and disparity maps of each distorted S3D video. The feature sets of the distorted S3D videos are

$$f_\alpha^d = [\alpha_{ji}^d], \quad f_\beta^d = [\beta_{ji}^d], \quad f_\Psi^d = [\Psi_{ji}^d]$$
Now, to check whether a given test video is pristine or distorted, the likelihood of its parameters is evaluated under the pristine MVG distribution. The likelihood is calculated on a frame-level basis and is a single value per frame. The video-level likelihood is estimated by averaging the frame-level estimates. The Δ terms denote the mean values of the frame-level likelihood estimation scores of the individual features, and γ represents the overall departure of a distorted video's statistics with respect to the pristine model. The product of the Δ and γ scores measures the joint quality of the motion and disparity components of each test S3D video, giving the MoDi2D score; a minimal sketch of this step follows.
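The sketch below illustrates the MVG modeling and likelihood evaluation, assuming feature matrices with rows as frame-level samples; the function names are illustrative, and the pooling follows the description above.

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_pristine_mvg(pristine_feats):
    """Fit an MVG N(nu, Sigma) to pristine feature vectors
    (rows = frame-level samples, columns = feature dimensions)."""
    nu = pristine_feats.mean(axis=0)
    sigma = np.cov(pristine_feats, rowvar=False)
    return multivariate_normal(mean=nu, cov=sigma, allow_singular=True)

def video_level_likelihood(mvg, frame_feats):
    """Frame-level likelihoods under the pristine MVG, averaged
    into a single video-level estimate."""
    return mvg.pdf(frame_feats).mean()

# Pooling, per the description above: the Delta terms are the mean
# frame-level likelihoods of the individual features, gamma is the
# overall departure from the pristine model, and
#   MoDi2D = Delta * gamma
```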
MoDi3D = MoDi2D × S.
The LCC, SROCC, and Root Mean Square Error (RMSE) are used to evaluate the proposed MoDi3D algorithm's performance. LCC measures the linear relationship between two variables, SROCC measures the monotonic relationship between two input sets, and RMSE measures the magnitude of the error between estimated objective scores and subjective DMOS scores. Higher LCC and SROCC values indicate good agreement between subjective and objective measures, and lower RMSE signifies more accurate prediction performance. A sketch of these metrics is given below.
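A minimal sketch of the evaluation metrics (note that in VQA practice a logistic fit is often applied to the objective scores before computing LCC and RMSE; that step is omitted here):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate(objective, dmos):
    """LCC, SROCC and RMSE between objective predictions and
    subjective DMOS scores."""
    objective, dmos = np.asarray(objective), np.asarray(dmos)
    lcc = pearsonr(objective, dmos)[0]
    srocc = spearmanr(objective, dmos)[0]
    rmse = np.sqrt(np.mean((objective - dmos) ** 2))
    return lcc, srocc, rmse
```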
Conclusion
There were two main contributions in this report. First, a comprehensive subjective quality evaluation was performed on a symmetrically and asymmetrically distorted full HD S3D video dataset. The dataset contains 12 pristine S3D video sequences and 288 test stimuli. The test video sequences are a combination of H.264 and H.265 compression, blur distortions, and frame freezes. 20 subjects were involved in the study, which was conducted using the ACR-HR method.
Secondly, a completely blind S3D NR VQA algorithm based on computing the joint
statistical dependencies between motion and disparity sub band coefficients of an S3D video
was proposed. The BGGD parameters (α, β), and the coherence score (Ψ) from the
eigenvalues of the covariance matrix were extracted and the features were found to be
distortion discriminable. An unsupervised 2D NR IQA model (NIQE) was used to estimate
spatial quality.
Finally, these features were pooled to predict the overall quality of an S3D video. The
proposed objective algorithm MoDi3D demonstrates competitive performance as compared
to popular 2D and 3D FR and supervised NR image and video quality assessment models,
even though it is not trained on any distorted S3D videos nor on any annotations of them.
Q. Compute the performance of the objective blind predictor on the databases.
The best performance was achieved by the Video Quality Evaluation using Motion and Depth Statistics (VQUEMODES) model under 3D NR VQA (supervised), which is shown to outperform popular state-of-the-art 2D IQA/VQA and 3D IQA/VQA models when evaluated over all the datasets. The performance of this algorithm for all the datasets and performance metrics is as shown.
Although NIQE and other NR QA algorithms perform well on symmetrically distorted videos,
they fall short when it comes to asymmetrically distorted S3D videos. On both symmetrically
and asymmetrically distorted S3D videos, the proposed MoDi3D model (combination of
MoDi2D and Spatial NIQE score) performs consistently.
Due to the blank frames with H.264 compression artifacts during frame freezes, the joint
dependencies between motion and disparity components varied more when compared to the
H.264 and H.265 compressions. The MoDi3D model effectively captures these statistical
variations and delivers better performance on frame freezes than on other compression
artifacts. Blur is a spatial distortion that does not significantly change the motion information
properties of an S3D video. Therefore, the dependency variation between motion and
disparity components is lower compared to compression artifacts. As a result, the proposed model cannot capture the statistical connections between motion and disparity components as well, and MoDi3D performs worse on blur than on compression-based distortions. Finally, and most importantly, the proposed approach is completely unaware of subjective opinion, i.e., it is 'fully blind', and it performs well across a wide range of distortions and datasets.
References
[1] Y. Liu, L. K. Cormack, and A. C. Bovik, “Statistical modeling of 3-D natural scenes with application to Bayesian stereopsis,” IEEE Transactions on Image Processing, vol. 20, pp. 2515–2530, Sept 2011.
[2] “eMarketer: Better research. Better business decisions.” https://www.emarketer.com/.
[3] R. Tenniswood, L. Safonova, and M. Drake, “3D's effect on a film's box office and profitability,” 2010.
[4] K. Seshadrinathan, R. Soundararajan, A. C. Bovik, and L. K. Cormack, “Study of subjective and objective quality assessment of video,” IEEE Transactions on Image Processing, vol. 19, pp. 1427–1441, June 2010.
[5] J. Wang, S. Wang, and Z. Wang, “Asymmetrically compressed stereoscopic
3D videos: Quality assessment and rate-distortion performance evaluation,”
IEEE Transactions on Image Processing, vol. 26, pp. 1330– 1343, March 2017.
[6] B. Appina, K. Manasa, and S. S. Channappayya, “Subjective and objective
study of the relation between 3D and 2D views based on depth and bitrate,”
Electronic Imaging, vol. 2017, no. 5, pp. 145–150, 2017.
[7] K. Wang, M. Barkowsky, R. Cousseau, K. Brunnström, R. Olsson, P. Le Callet, and M. Sjöström, “Subjective evaluation of HDTV stereoscopic videos in IPTV scenarios using absolute category rating,” in Proc. SPIE, vol. 7863, 2011.
[8] Z. Chen, W. Zhou, and W. Li, “Blind stereoscopic video quality assessment:
From depth perception to overall experience,” IEEE Transactions on Image
Processing, vol. 27, no. 2, pp. 721–734, 2018.
[9] E. Dumic, S. Grgić, K. Šakić, P. M. R. Rocha, and L. A. da Silva Cruz, “3D video subjective quality: a new database and grade comparison study,” Multimedia Tools and Applications, vol. 76, no. 2, pp. 2087–2109, 2017.
[10] “Lab for Video and Image Analysis (LFOVIA) Downloads,” http://www.iith.ac.in/∼lfovia/downloads.html.
[20] S. L. P. Yasakethu, C. T. E. R. Hewage, W. A. C. Fernando, and A. M. Kondoz, “Quality analysis for 3D video using 2D video quality models,” IEEE Transactions on Consumer Electronics, vol. 54, pp. 1969–1976, November 2008.
[11] J. Yang, H. Wang, W. Lu, B. Li, A. Badii, and Q. Meng, “A no-reference optical flow-based quality evaluator for stereoscopic videos in curvelet domain,” Information Sciences, vol. 414, pp. 133–146, 2017.
[12] G. Jiang, S. Liu, M. Yu, F. Shao, Z. Peng, and F. Chen, “No reference stereo
video quality assessment based on motion feature in tensor decomposition
domain,” Journal of Visual Communication and Image Representation, 2017.
[13] B. Appina, A. Jalli, S. S. Battula, and S. S. Channappayya, “No-reference stereoscopic video quality assessment algorithm using joint motion and depth statistics,” in 25th International Conference on Image Processing, IEEE, pp. 2800–2804, 2018.
[14] E. Cheng, P. Burton, J. Burton, A. Joseski, and I. Burnett, “RMIT3DV: Pre-announcement of a creative commons uncompressed HD 3D video database,” in Fourth International Workshop on Quality of Multimedia Experience, pp. 212–217, July 2012.
[15] M. T. Pourazad, Z. Mai, P. Nasiopoulos, K. Plataniotis, and R. K. Ward,
“Effect of brightness on the quality of visual 3D perception,” in International
Conference on Image Processing, IEEE, pp. 989–992, Sept 2011.