STEREOSCOPIC IMAGE QUALITY METRIC BASED ON BINOCULAR PERCEPTION MODEL

Seungchul Ryu, Dong Hyun Kim, and Kwanghoon Sohn
Digital Image Media Laboratory (DIML), Department of Electrical and Electronic Engineering, Yonsei University, South Korea
{ryus01, dhkim.leo, khsohn}@yonsei.ac.kr

ABSTRACT

Measuring the perceptual quality of an image is an important task in various applications such as image coding, processing, enhancement, and monitoring systems. Although active research on objective quality assessment of 2D images has been conducted for decades, comparatively little effort has been devoted to 3D image quality assessment. In this paper, we propose a new quality metric for stereoscopic images based on a binocular perception model that accounts for the asymmetric properties of a stereoscopic image pair. Experiments on publicly available databases show that the proposed metric provides consistent correlations with subjective quality scores and outperforms state-of-the-art metrics.

Index Terms— Stereoscopic image, objective quality metric, binocular perception model, structural similarity index

1. INTRODUCTION

In recent years, remarkable progress has been made in the theoretical and practical understanding of image quality perception by the human visual system (HVS) [1]. However, most research has concentrated on 2D images [1, 2, 3]. While the technologies required for 3D imaging are rapidly emerging, the perceptual quality of 3D images has not been thoroughly studied. Studies of 3D perceptual quality are therefore required for perceptually optimized 3D imaging systems.

Quality assessment of 3D images is a challenging task because the perceptual quality of a 3D image is a multidimensional perception of 2D artifacts, 3D artifacts, depth sense, and visual comfort. The general distortions of 3D imaging systems are well studied in [4]. Several severe distortions, such as crosstalk between views, keystone distortion, depth-plane curvature, the puppet-theater effect, the cardboard effect, shear distortion, the picket-fence effect, and image flipping, can appear in general 3D applications [4]. Subjective evaluations of 3D images have been conducted to investigate the influence of 3D factors such as depth and fatigue on perceived quality. Over the last decades, safety and fatigue issues related to stereoscopic displays have been extensively studied [5, 6]. Several studies [7, 8, 9] report that image quality degradation has no effect on the perceived depth sense, whereas Tam et al. [10] found that perceived depth is correlated with the quality of the stereo image.

A few objective quality metrics for 3D images have been proposed in [9, 11, 12, 13, 14, 15]. Several studies [9, 10, 12] have evaluated the appropriateness of 2D image quality metrics for 3D images. In [11], Campisi et al. evaluated several 2D metrics for stereoscopic images; the results indicate that the performance of 2D metrics is moderate but not satisfactory when they are applied to 3D images. In [9, 12], color-plus-depth images are evaluated using 2D metrics after being rendered to stereo images, and the results show that the employed 2D metrics can predict both the perceived quality and the depth sense well. However, these approaches consider no 3D features such as depth or the characteristics of stereovision. To measure the objective quality of a stereoscopic image, several full-reference quality metrics using depth information have been proposed [13, 14, 15]. In [13, 14], Benoit et al. proposed a quality metric for stereoscopic images that fuses 2D quality metrics with depth distortion. In [15], indices of image quality and stereo sense are proposed based on absolute disparity images. A no-reference stereoscopic image quality metric is proposed in [16]; it is based on segmented local features, edges, and disparity.

In this paper, we propose a new quality metric for stereoscopic images based on a binocular perception model and the three components of the Structural SIMilarity (SSIM) index [2]. The remainder of this paper is organized as follows. Section 2 describes the proposed objective quality metric for stereoscopic images. Section 3 presents experimental evaluations of the proposed metric on publicly available databases. Section 4 concludes the paper.

Fig. 1. Block diagram of the proposed metric

Fig. 2. Examples of binocular perception: (a) original (left) + original (right), (b) original (left) + blurred (right), (c) original (left) + blocked (right). Note that these are anaglyph images, and anaglyph glasses are required to view them in stereovision.

2. PROPOSED STEREOSCOPIC IMAGE QUALITY METRIC

In this section, we describe the proposed objective quality metric for stereoscopic images, which accounts for the asymmetric property of a stereo pair and is based on a mathematical model of binocular perception. Fig. 1 shows the overall block diagram of the proposed metric. First, luminance, contrast, and structural similarities are computed between the original and distorted stereo images. Then, using the binocular perception model, binocular perceptual luminance, contrast, and structural similarities are derived and integrated into an overall quality index. Details of the proposed metric are presented in the following sections.

2.1. Binocular perception model

The theory of binocular suppression holds that the perceptual quality of a stereo image pair is dominated by its high-quality component [7]. Based on this theory, when one image of a stereo pair is compressed while maintaining high quality, the other view can be compressed more heavily without introducing visible degradation in stereovision. This asymmetric coding concept was introduced by Perkins [18], and several subsequent studies reported its effectiveness [19, 20]. However, contradictory findings are presented in [10, 17], which suggest that the quality of stereovision is approximately the average of the left and right images for specific distortions such as blocking artifacts. This contradiction can be explained by the fact that the quality perception of a stereoscopic image pair depends on the type of distortion. Here, we classify distortions into two groups: information-loss distortion (ILD) and information-additive distortion (IAD). ILD occurs when details of an image are lost; for example, detailed information is wiped out when an image is blurred or magnified. In contrast, IAD is a kind of degradation that modifies the inherent information of the original image; blockiness, Gaussian noise, and structural distortion are typical examples. For ILD, the perceived quality of a stereo image pair is dominated by the high-quality component, since a high-quality component containing sufficient information can supplement the lost information of the other view. For IAD, on the other hand, the perceptual quality of stereovision does not follow the high-quality component, since IAD cannot be offset or removed by the other view. In this paper, these phenomena are referred to as binocular perception. Examples are shown in Fig. 2: when viewing the second stereo pair, the lost details of the blurred right image are supplemented by the undistorted left image, whereas for the third stereo pair the blocking artifacts remain visible even in stereovision.

To integrate the respective qualities of the left and right images into an overall quality, we employ a mathematical model of binocular perception. The efficient function suggested in [17] is adopted:

Q_s = \left[ w Q_{low}^{n} + (1 - w) Q_{high}^{n} \right]^{1/n},    (1)

where Q_low and Q_high represent the quality scores of the lower- and higher-quality images, respectively, and w and n are parameters controlling the weighting of binocular stereovision. A higher value of n indicates that the HVS assigns a higher weight to the high-quality image, and vice versa; w controls the degree of suppression more precisely.

2.2. Objective stereoscopic image quality index

The SSIM index [2] is one of the most popular objective metrics for 2D image quality. SSIM measures the structural similarity between an original and a distorted image through three components: luminance, contrast, and structural similarities, computed as

l(x, y) = \frac{2 u_x u_y + C_1}{u_x^2 + u_y^2 + C_1},    (2)

c(x, y) = \frac{2 \sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},    (3)

s(x, y) = \frac{\sigma_{xy} + C_2 / 2}{\sigma_x \sigma_y + C_2 / 2},    (4)

where l(x, y), c(x, y), and s(x, y) are the pixel-wise luminance, contrast, and structural similarities, respectively, with x and y denoting corresponding pixels in the original and distorted images. u and σ represent the mean and the standard deviation of the signals, and the constants C_1 and C_2 are included to avoid instability. The pixel-wise values l(x, y), c(x, y), and s(x, y) are pooled with a Gaussian window into the luminance (L), contrast (C), and structural (S) similarities of an image.
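A minimal NumPy/SciPy sketch of Eqs. (2)-(4) follows. It assumes Gaussian-windowed local statistics and simple mean pooling of the per-pixel maps; the function and parameter names (e.g. ssim_components, sigma) are illustrative rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ssim_components(x, y, sigma=1.5, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Per-view luminance (L), contrast (C), and structure (S) similarities,
    following Eqs. (2)-(4) with Gaussian-windowed local statistics."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = gaussian_filter(x, sigma), gaussian_filter(y, sigma)
    var_x = gaussian_filter(x * x, sigma) - mu_x ** 2
    var_y = gaussian_filter(y * y, sigma) - mu_y ** 2
    cov_xy = gaussian_filter(x * y, sigma) - mu_x * mu_y
    sd_x = np.sqrt(np.maximum(var_x, 0.0))
    sd_y = np.sqrt(np.maximum(var_y, 0.0))
    l = (2 * mu_x * mu_y + C1) / (mu_x ** 2 + mu_y ** 2 + C1)  # Eq. (2)
    c = (2 * sd_x * sd_y + C2) / (sd_x ** 2 + sd_y ** 2 + C2)  # Eq. (3)
    s = (cov_xy + C2 / 2) / (sd_x * sd_y + C2 / 2)             # Eq. (4)
    # Pool the pixel-wise maps into scalar L, C, S similarities
    # (mean pooling here; the paper pools with a Gaussian window).
    return l.mean(), c.mean(), s.mean()
```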

As discussed in Section 2.1, binocular perception should be considered when measuring the quality of a stereo image. We therefore extend the three SSIM components to a stereo version by employing the binocular perception model:

C_s = \left[ w_c C_{high}^{n_c} + (1 - w_c) C_{low}^{n_c} \right]^{1/n_c},    (5)

L_s = \left[ w_L L_{high}^{n_L} + (1 - w_L) L_{low}^{n_L} \right]^{1/n_L},    (6)

S_s = \left[ w_S S_{high}^{n_S} + (1 - w_S) S_{low}^{n_S} \right]^{1/n_S},    (7)

where the subscripts high and low denote the component values of the higher- and lower-quality views, respectively. The binocular perceptual contrast similarity C_s, luminance similarity L_s, and structural similarity S_s are then integrated into an overall quality index Q:

Q = L_s^{\alpha} C_s^{\beta} S_s^{\gamma},    (8)

where α > 0, β > 0, and γ > 0 are parameters used to adjust the relative importance of the three components. The parameter values used in this paper are wc = 0.7, wL = 0.2, wS = 0.6, nc = 1, nL = 1, nS = 2, α = 15, β = 0.2, and γ = 0.5.
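To show how Eqs. (5)-(8) fit together, here is a hedged sketch that reuses the hypothetical ssim_components helper above. It assumes the view with the larger per-component similarity is treated as the higher-quality one, which is an illustrative simplification rather than the authors' procedure.

```python
PARAMS = dict(w_c=0.7, w_L=0.2, w_S=0.6, n_c=1, n_L=1, n_S=2,
              alpha=15, beta=0.2, gamma=0.5)

def combine(v_left, v_right, w, n):
    """Eqs. (5)-(7): weight the higher-quality view by w and the other by 1 - w.
    Here the view with the larger similarity value is taken as the higher-quality one
    (an assumption for this sketch)."""
    high, low = max(v_left, v_right), min(v_left, v_right)
    return (w * high ** n + (1.0 - w) * low ** n) ** (1.0 / n)

def stereo_quality(ref_l, dist_l, ref_r, dist_r, p=PARAMS):
    """Overall quality index Q of Eq. (8) for a stereo pair of NumPy images."""
    # Per-view SSIM components (Eqs. 2-4), via the ssim_components sketch above.
    L_l, C_l, S_l = ssim_components(ref_l, dist_l)
    L_r, C_r, S_r = ssim_components(ref_r, dist_r)
    Cs = combine(C_l, C_r, p["w_c"], p["n_c"])   # Eq. (5)
    Ls = combine(L_l, L_r, p["w_L"], p["n_L"])   # Eq. (6)
    Ss = combine(S_l, S_r, p["w_S"], p["n_S"])   # Eq. (7)
    return Ls ** p["alpha"] * Cs ** p["beta"] * Ss ** p["gamma"]  # Eq. (8)
```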

3. PERFORMANCE EVALUATIONS

This section presents the performance evaluation of the proposed metric, which is compared with several existing metrics: PSNR [1], SSIM [2], 3DM1 [13, 14], and 3DM2 [15]. PSNR and SSIM are popular quality metrics for 2D images; in the comparisons, the average of the left- and right-image scores is used for PSNR and SSIM. 3DM1 combines the 2D quality metric SSIM with the distortion of the depth obtained by a disparity estimation algorithm, and 3DM2 is based on the absolute disparity image. In the experiments, we use images and videos from the IVC [14] and DIML [21] databases.

3.1. Databases

The IVC stereo image database consists of 6 original stereo images and 82 distorted stereo images. The distortion types are JPEG, JPEG 2000, and Gaussian blurring, and the average image size is about 512 × 448. The subjective tests were conducted using SAMVIQ with a continuous scale graded from 0 to 100 and seventeen subjects per image; the difference mean opinion score (DMOS) is computed for each image. We use all the images in our experiments.

To construct the stereoscopic video database, we conducted a subjective assessment using the double stimulus continuous quality scale (DSCQS) according to the ITU-R BT.500 recommendation [22]. The DIML stereo video database consists of symmetrically and asymmetrically compressed stereo videos. The original videos are two MPEG 3DV test sequences [23, 24] (‘Café’ and ‘Poznan_Hall2’, 1920 × 1080, 10 s) and stereo videos from 3dtv.at [25] (‘Heidelberg1’ and ‘Heidelberg2’, 10 s).

Table 1. PCC comparisons for the IVC database.

                 PSNR [1]  SSIM [2]  3DM1 [14]  3DM2 [15]  Proposed
Gaussian blur      0.79      0.75      0.76       0.57       0.81
JPEG               0.74      0.85      0.81       0.73       0.87
JPEG 2000          0.74      0.81      0.88       0.78       0.89
ALL                0.76      0.80      0.82       0.73       0.85

The DIML database consists of 4 original and 96 symmetrically and asymmetrically compressed stereo videos. The videos were displayed in random order on a Samsung 55'' UN55C8WXF 3DTV with shutter glasses. For each video, twenty-one subjects rated quality on a continuous linear scale divided into the five grades “Excellent”, “Good”, “Fair”, “Poor”, and “Bad”. The DMOS is computed for each video and scaled to the range from 0 to 100. The DIML database, together with the DMOS values, is available at [21].

3.2. Results

To evaluate the performance of the proposed metric, we follow the recommendations of the VQEG report [26]. A four-parameter logistic function, as recommended in [26], is used for non-linear regression before calculating the performance measures: the Pearson correlation coefficient (PCC, prediction accuracy), the Spearman rank-order correlation coefficient (SROCC, prediction monotonicity), the root mean squared error (RMSE, prediction consistency), and the mean absolute error (MAE, prediction consistency). For a well-performing metric, PCC and SROCC should be high while RMSE and MAE should be low. Only a few training images (14 stereo images from the IVC database) are used to determine the parameters; the remaining 74 stereo images of the IVC database and all videos of the DIML database are used for the performance evaluations.
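For reference, the following sketch outlines this evaluation procedure, assuming one common four-parameter logistic form and SciPy for fitting; the exact parameterization used in [26] may differ.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import pearsonr, spearmanr

def logistic4(q, b1, b2, b3, b4):
    """Four-parameter logistic mapping from objective scores to the DMOS scale."""
    return b2 + (b1 - b2) / (1.0 + np.exp(-(q - b3) / np.abs(b4)))

def evaluate(objective, dmos):
    """Fit the non-linear regression, then report PCC, SROCC, RMSE, and MAE."""
    objective, dmos = np.asarray(objective, float), np.asarray(dmos, float)
    p0 = [dmos.max(), dmos.min(), np.median(objective), 1.0]  # rough initial guess
    params, _ = curve_fit(logistic4, objective, dmos, p0=p0, maxfev=10000)
    pred = logistic4(objective, *params)
    pcc = pearsonr(pred, dmos)[0]          # prediction accuracy
    srocc = spearmanr(objective, dmos)[0]  # prediction monotonicity
    rmse = float(np.sqrt(np.mean((pred - dmos) ** 2)))
    mae = float(np.mean(np.abs(pred - dmos)))
    return pcc, srocc, rmse, mae
```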

Table 1 presents the results of the proposed metric compared with the existing objective metrics for the IVC stereo image database. The PCC of the proposed metric is 0.83, 0.91, and 0.89 for Gaussian blur, JPEG, and JPEG 2000, respectively, and 0.87 over all distortions. The results show that the proposed metric outperforms the other metrics on the IVC database. Table 2 summarizes the comparison results for the DIML stereo video database. The proposed metric shows higher performance than the other metrics in terms of all measurements (PCC, SROCC, RMSE, and MAE), indicating that it is an accurate, monotonic, and consistent objective metric for the perceptual quality of stereo images and videos. The results in Tables 1 and 2 indicate that the proposed metric provides consistent correlations with subjective scores for both symmetric and asymmetric stereo images.

Table 2. Comparison results (PCC, SROCC, RMSE, and MAE) for the DIML database.

         PSNR [1]  SSIM [2]  3DM1 [14]  3DM2 [15]  Proposed
PCC        0.70      0.54      0.50       0.67       0.76
SROCC      0.70      0.66      0.51       0.68       0.82
RMSE      12.87     18.69     18.69      13.87      11.86
MAE       10.43     16.13     16.13      11.77      14.37

4. CONCLUSION AND FUTURE WORK

In this paper, an objective quality metric for stereoscopic images was proposed based on a binocular perception model. Using the mathematical binocular perception model, the luminance, contrast, and structural similarities of the left and right images are integrated into an overall quality index. To validate the proposed metric, we conducted experiments on publicly available stereo image and video databases. The results show that the proposed metric provides consistent and outstanding performance for both symmetrically and asymmetrically distorted stereoscopic images compared with four existing metrics. Future research will focus on measuring quality based on local distortion, since distortions generally vary spatially; to this end, we will consider employing stereo feature matching to find matched local pixels. Another direction for future research is a no-reference method, which is useful in many practical applications.

5. REFERENCES

[1] Z. Wang and A.C. Bovik, Modern Image Quality Assessment, Morgan & Claypool, USA, 2006.
[2] Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[3] S. Ryu and K. Sohn, “No-reference sharpness metric based on inherent sharpness,” IET Electronics Letters, vol. 47, no. 21, pp. 1178-1180, Oct. 2011.
[4] L.M.J. Meesters, W.A. IJsselsteijn, and P.J.H. Seuntiëns, “A survey of perceptual evaluations and requirements of three-dimensional TV,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 3, pp. 381-391, Mar. 2004.
[5] M. Lambooij and W. IJsselsteijn, “Visual discomfort and visual fatigue of stereoscopic displays: a review,” Journal of Imaging Science and Technology, vol. 53, no. 3, pp. 030201-(1-14), Apr. 2009.
[6] D. Kim and K. Sohn, “Visual fatigue prediction for stereoscopic image,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 21, no. 2, pp. 231-236, Feb. 2011.
[7] P. Seuntiëns, L. Meesters, and W. IJsselsteijn, “Perceived quality of compressed stereoscopic images: effects of symmetric and asymmetric JPEG coding and camera separation,” ACM Transactions on Applied Perception, vol. 3, no. 2, pp. 95-109, Apr. 2006.
[8] G. Leon, H. Kalva, and B. Furht, “3D video quality evaluation with depth quality variations,” Proc. IEEE 3DTV-Conference: The True Vision – Capture, Transmission and Display of 3D Video, Istanbul, Turkey, May 2008.
[9] C.T.E.R. Hewage, S.T. Worrall, S. Dogan, S. Villette, and A.M. Kondoz, “Quality evaluation of color plus depth map-based stereoscopic video,” IEEE Journal of Selected Topics in Signal Processing, vol. 3, no. 2, pp. 304-318, Apr. 2009.
[10] W.J. Tam, L.B. Stelmach, and P. Corriveau, “Psychovisual aspects of viewing stereoscopic video sequences,” Proc. SPIE, vol. 3295, Jan. 1998.

[11] P. Campisi, P. Le Callet, and E. Marini, “Stereoscopic images quality assessment,” Proc. European Signal Processing Conference (EUSIPCO), Poznan, Poland, Sep. 2007.
[12] P. Joveluro, H. Malekmohamadi, W.A.C. Fernando, and A.M. Kondoz, “Perceptual video quality metric for 3D video quality assessment,” Proc. IEEE 3DTV-Conference: The True Vision – Capture, Transmission and Display of 3D Video, Tampere, Finland, Jun. 2010.
[13] A. Benoit, P. Le Callet, P. Campisi, and R. Cousseau, “Using disparity for quality assessment of stereoscopic images,” Proc. IEEE International Conference on Image Processing, San Diego, CA, USA, Oct. 2008.
[14] A. Benoit, P. Le Callet, P. Campisi, and R. Cousseau, “Quality assessment of stereoscopic images,” EURASIP Journal on Image and Video Processing, vol. 2008, 2008.
[15] J. Yang, C. Hou, R. Xu, and J. Lei, “New metric for stereo image quality assessment based on HVS,” International Journal of Imaging Systems and Technology, vol. 20, no. 4, pp. 301-307, Dec. 2010.
[16] Z.M.P. Sazzad, S. Yamanaka, Y. Kawayoke, and Y. Horita, “Stereoscopic image quality prediction,” Proc. IEEE International Workshop on Quality of Multimedia Experience (QoMEX), San Diego, CA, USA, Jul. 2009.
[17] D.V. Meegan, L.B. Stelmach, and W.J. Tam, “Unequal weighting of monocular inputs in binocular combination: implications for the compression of stereoscopic imagery,” Journal of Experimental Psychology: Applied, vol. 7, no. 2, pp. 143-153, Jun. 2001.
[18] M.G. Perkins, “Data compression of stereopairs,” IEEE Transactions on Communications, vol. 40, no. 4, pp. 684-696, Apr. 1992.
[19] L. Stelmach, W.J. Tam, D. Meegan, and A. Vincent, “Stereo image quality: effects of mixed spatio-temporal resolution,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 2, pp. 188-193, Mar. 2000.
[20] P. Aflaki, M.M. Hannuksela, J. Häkkinen, P. Lindroos, and M. Gabbouj, “Impact of downsampling ratio in mixed-resolution stereoscopic video,” Proc. IEEE 3DTV-Conference: The True Vision – Capture, Transmission and Display of 3D Video, Tampere, Finland, Jun. 2010.
[21] DIML stereo video databases, 2011, http://diml.yonsei.ac.kr/dimldb/stereo/.
[22] ITU-R Recommendation BT.500-11, “Methodology for the subjective assessment of the quality of television pictures,” ITU-R, Geneva, Switzerland, 2000.
[23] H. Schwarz, D. Marpe, and T. Wiegand, “Description of exploration experiments in 3D video coding,” ISO/IEC JTC1/SC29/WG11 MPEG2010/N11274, Dresden, Germany, Apr. 2010.
[24] M. Domanski, T. Grajek, K. Klimaszewski, M. Kurc, O. Stankiewicz, J. Stankowski, and K. Wegner, “Poznan multiview video test sequences and camera parameters,” ISO/IEC JTC1/SC29/WG11 MPEG2009/M17050, Xian, China, Oct. 2009.
[25] http://3dtv.at/Movies/Heidelberg_en.aspx.
[26] VQEG, “Final report from the Video Quality Experts Group on the validation of objective models of video quality assessment, Phase II,” Aug. 2003.