DIBR SYNTHESIZED IMAGE QUALITY ASSESSMENT BASED ON MORPHOLOGICAL PYRAMIDS

Dragana Sandić-Stanković (a), Dragan Kukolj (b), Patrick Le Callet (c)

(a) Institute for Telecommunication and Electronics, IRITEL, Beograd, Serbia
(b) University of Novi Sad, Faculty of Technical Sciences, Serbia
(c) Ecole polytechnique de l'Universite de Nantes, IRCCyN Lab, France

ABSTRACT

Most Depth Image Based Rendering (DIBR) techniques produce synthesized images which contain non-uniform geometric distortions affecting edge coherency. This type of distortion is challenging for common image quality metrics. Morphological filters maintain important geometric information such as edges across different resolution levels. There is inherent congruence between the morphological pyramid decomposition scheme and human visual perception. In this paper, a multi-scale measure, morphological pyramid peak signal-to-noise ratio (MP-PSNR), based on morphological pyramid decomposition is proposed for the evaluation of DIBR synthesized images. It is shown that MP-PSNR achieves much higher correlation with human judgment than the state-of-the-art image quality measures in this context.

Index Terms: Multi-scale PSNR, morphological pyramids, DIBR synthesized image quality assessment

1. INTRODUCTION

Depth image-based rendering techniques which generate new viewpoints can be used for different 3D video applications, either in a 2D context such as free viewpoint video (FVV) or in a 3D context such as reproduction of stereoscopic vision on 3D displays [1]. The main difference between the two applications is the stereopsis phenomenon (fusion of the left and the right view in the human visual system) used by 3DTV. Depending on the context, the factors affecting the visual experience may differ: the distortions perceptible in a 2D and in a 3D context may be different, and so may the quality assessment protocols. The success of 3D video applications depends on the perceptual quality of the synthesized views. This paper is concerned with the quality assessment of still images from free viewpoint video in a 2D context. The evaluation of still images is an important scenario, for example when the user pauses the video [1]. Virtual views synthesized either from original data or from decoded and distorted data should be assessed. In this paper, still images from virtual views synthesized from original data are assessed.

The evaluation of DIBR synthesized views from uncompressed data has already been discussed in the literature for still images from FVV in a 2D context [2] using the IRCCyN/IVC DIBR image dataset [3]. It has been demonstrated that 2D quality metrics originally designed to address image compression distortions are far from effective in assessing the visual quality of synthesized views.

978-1-4673-8090-4/15/$31.00 ©2015 IEEE

New metrics, VSQA [4] and 3DswIM [5], dedicated to the evaluation of DIBR synthesized images from FVV have been proposed to improve on the performance obtained by standard quality metrics. Both metrics target synthesis-related artifacts without compression-related artifacts, and both are tested using the IRCCyN/IVC DIBR images database [3]. The VSQA metric achieves a gain of 17.8% over SSIM in correlation with subjective measurements. The 3DswIM metric outperforms the conventional 2D metrics and the metrics dedicated to DIBR-synthesized views under test. Although these results are encouraging, there is still room for improvement, which motivates our contribution.

As in most other areas of image processing and analysis, multi-resolution methods have improved performance compared to single-resolution methods for image quality assessment. The image pyramid offers a flexible, convenient multiresolution format that matches the multiple scales found in visual scenes and mirrors the multiple scales of processing in the human visual system [6]. The multi-scale structural similarity measure MS-SSIM [7] is based on linear low-pass pyramid decomposition. The multi-scale image quality measures using information content weighted pooling, IW-SSIM and IW-PSNR [9], use Laplacian pyramid decomposition. CW-SSIM [10] is based on multi-orientation steerable pyramid decomposition using multi-scale bandpass oriented filters.

DIBR algorithms introduce new types of artifacts mostly located around disoccluded regions [1]. Unlike the artifacts of usual 2D video compression, these artifacts are non-uniformly distributed over the image. They consist mainly of geometric transformations affecting edge coherency in the synthesized images. Typical rendering errors include black holes, boundary blur, and edge displacements or misalignments. These artifacts are consequently challenging for standard quality metrics, which are usually tuned for other types of distortion.
A multi-scale approach with linear filters is not sufficient in the context of synthesized DIBR images because of the specific geometric distortions which affect edge coherency in the synthesized images. We propose to combine multi-scale analysis with morphological operators to better deal with this problem. More precisely, by introducing non-linear morphological filters, important geometric information such as edges is maintained across different resolution levels [11]. In previous work [13], we explored morphological wavelet decompositions for the multi-scale metric MW-PSNR, which achieves much higher correlation with human judgment than the state-of-the-art image quality measures. In this paper, we explore morphological pyramid decomposition for the multi-scale metric, the morphological pyramid peak signal-to-noise ratio measure MP-PSNR. There is inherent congruence between the morphological pyramid decomposition scheme and human visual perception [12].

MP-PSNR is based on a morphological bandpass pyramid (MBP) decomposition generated using the Laplacian-type pyramid decomposition scheme [14], where morphological filters are used instead of linear filters. MBP can be interpreted as a structural image decomposition tending to enhance image features such as edges, which are segregated by scale at the various pyramid levels [12]. We propose to calculate the MSE at all pyramid levels. These MSE values are then combined to obtain the multi-scale MP-PSNR metric. Since the morphological operators involve only integer, max, min and addition operations and the calculation of MSE is not computationally demanding, the proposed MP-PSNR approach has low computational complexity.

The next section introduces the proposed measure MP-PSNR with its main stages: morphological pyramid decomposition, distortion and pooling. The experimental protocol and the achieved MP-PSNR performance are presented in Section 3.

2. MORPHOLOGICAL PYRAMID PEAK SIGNAL-TO-NOISE RATIO (MP-PSNR)

In this section, the proposed multi-scale measure MP-PSNR is described. Its first stage is morphological bandpass pyramid decomposition. In the second stage, squared error maps are calculated at all pyramid levels and mean squared errors (MSE) are calculated from these maps. The MSE values from all pyramid levels are combined into the multi-scale MP-MSE, which is transformed into MP-PSNR.
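The edge-preservation argument made in the introduction can be illustrated on a one-dimensional step edge. The following is a minimal sketch and not part of the proposed measure: the test signal `step`, the window width of 3 samples, and the helper names `box_filter` and `erode_1d` are illustrative assumptions. A linear mean filter introduces intermediate grey values at the edge, whereas a flat erosion only displaces the edge and keeps it two-valued.

```python
import numpy as np

def box_filter(signal, width=3):
    """Linear low-pass filter: moving average with edge replication."""
    r = width // 2
    padded = np.pad(signal.astype(float), r, mode="edge")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def erode_1d(signal, width=3):
    """Flat erosion: minimum over a sliding window of `width` samples."""
    r = width // 2
    padded = np.pad(signal.astype(float), r, mode="edge")
    n = signal.size
    # Stack the shifted copies of the signal and take the pointwise minimum.
    return np.minimum.reduce([padded[i:i + n] for i in range(width)])

step = np.array([0, 0, 0, 0, 10, 10, 10, 10])
smoothed = box_filter(step)   # contains values between 0 and 10 at the edge
eroded = erode_1d(step)       # still contains only the values 0 and 10
print(np.round(smoothed, 2))
print(eroded)
```

In this toy case the mean filter produces two intermediate samples (about 3.33 and 6.67), while the eroded signal remains a sharp step shifted by one sample.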

[Fig. 2. Multi-scale MP-PSNR based on morphological band-pass pyramid decomposition and MSE: mr morphological reduce filter, me morphological expand filter.]

2.1 Morphological band-pass pyramid decomposition

The reference and DIBR-synthesized images are decomposed by morphological band-pass pyramid (MBP) [12], [15], [16] in the first stage of MP-PSNR computation. MBP is an array of detail images of decreasing size $d_j$, $j = 1, \ldots, M-1$, with the lowest resolution image copy $f_{M-1}$ at the top of the pyramid, Fig. 2. MBP uses morphological reduce filters mr for low-pass filtering and morphological expand filters me for interpolation filtering. A detail pyramid (band-pass pyramid) is derived by subtracting from each level an interpolated version of the next coarser level. MBP tends to enhance image features such as edges which are segregated by scale in the various pyramid levels. Enhanced features are segregated by size: fine details are prominent in the lower level images while progressively coarser features are prominent in the images on the higher decomposition levels.

In this paper, the morphological operator erosion (1) is used as the reduce operator and the morphological operator dilation (2) as the expand operator in MBP:

$$E:\ \mathrm{erosion}(f)(x) = \min_{y \in SE} f(x + y) \tag{1}$$

$$D:\ \mathrm{dilation}(f)(x) = \max_{y \in SE} f(x - y) \tag{2}$$

where f is a grayscale image and SE is a binary structuring element.

The erosion/dilation filter pair in the MBP scheme constitutes a morphological adjunction pyramid ED [15]. Using MBP ED decomposition, the pixels in the approximation images have lower intensities than in the original image and therefore the detail images are nonnegative [15].

2.2 Distortion and pooling stage

After the morphological pyramid decomposition of the reference and DIBR-synthesized images, squared error maps are calculated between the corresponding images of the two pyramids in the second stage of the multi-scale metric MP-PSNR. Mean squared errors (MSE) are calculated from these maps, Fig. 2. $MSE_j$ is calculated as the mean value of the squared error map at scale j:

$$MSE_j = \frac{1}{K_j N_j} \sum_{k=1}^{K_j} \sum_{n=1}^{N_j} \left( x_j(k,n) - y_j(k,n) \right)^2 \tag{3}$$

where $x_j$ and $y_j$ denote the reference and distorted pyramid image at scale j, whose size is $K_j \times N_j$. The multi-scale mean squared error MP-MSE is calculated as the weighted product of the $MSE_j$ of all M pyramid images, with equal weights $\beta_j$:

$$MP\_MSE = \prod_{j=1}^{M} \left[ MSE_j \right]^{\beta_j} \tag{4}$$

Finally, the multi-scale peak signal-to-noise ratio MP-PSNR is calculated as:

$$MP\_PSNR = 10 \cdot \log_{10} \left( \frac{R^2}{MP\_MSE} \right) \tag{5}$$

where R is the maximum dynamic range of the image.
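The two stages of Section 2 can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the SE is restricted to an odd-sized square centred on the pixel, samples outside the image and samples inserted by upsampling are treated as neutral for min/max (+inf and -inf), the equal weights are taken as $\beta_j = 1/M$, and R is taken as 255.

```python
import numpy as np

def _flat_filter(img, k, reducer, pad_value):
    """Min or max over a k x k square window centred on each pixel."""
    if k < 3 or k % 2 == 0:
        raise ValueError("this sketch only handles odd SE sizes >= 3")
    r = k // 2
    h, w = img.shape
    padded = np.pad(img, r, constant_values=pad_value)
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reducer.reduce(windows)

def erode(img, k):
    # Eq. (1); +inf padding so out-of-image samples never win the min (assumption).
    return _flat_filter(img, k, np.minimum, np.inf)

def dilate(img, k):
    # Eq. (2); -inf padding so out-of-image samples never win the max (assumption).
    return _flat_filter(img, k, np.maximum, -np.inf)

def reduce_level(img, k):
    """mr: erosion followed by downsampling by 2."""
    return erode(img, k)[::2, ::2]

def expand_level(coarse, shape, k):
    """me: upsampling by 2 followed by dilation."""
    up = np.full(shape, -np.inf)  # inserted samples are neutral for max (assumption)
    up[::2, ::2] = coarse
    return dilate(up, k)

def mbp_decompose(img, levels, k):
    """Return `levels` detail images plus the coarsest approximation (M = levels + 1)."""
    f = np.asarray(img, dtype=float)
    bands = []
    for _ in range(levels):
        coarse = reduce_level(f, k)
        bands.append(f - expand_level(coarse, f.shape, k))
        f = coarse
    bands.append(f)
    return bands

def mp_psnr(ref, syn, levels=5, k=3, peak=255.0):
    ref_bands = mbp_decompose(ref, levels, k)
    syn_bands = mbp_decompose(syn, levels, k)
    # Eq. (3): one MSE per pyramid image.
    mse = np.array([np.mean((x - y) ** 2) for x, y in zip(ref_bands, syn_bands)])
    beta = np.full(mse.size, 1.0 / mse.size)  # equal weights, assumed to sum to 1
    mp_mse = np.prod(mse ** beta)             # Eq. (4)
    return 10.0 * np.log10(peak ** 2 / mp_mse)  # Eq. (5)

# Usage on synthetic data (random images, for illustration only).
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(64, 64))
syn = np.clip(ref + rng.integers(-20, 21, size=ref.shape), 0, 255)
print(mp_psnr(ref, syn, levels=3, k=3))
```

With `levels=0` the sketch reduces to ordinary PSNR. If the two images are identical at any pyramid level, the corresponding MSE is zero and the result is infinite, so a practical implementation would need to guard that case. The 2x2 SE explored in Section 3.2 is not covered by this sketch.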

3. RESULTS

In this section, the experimental setup for the validation of the MP-PSNR measure and its performance are presented.

3.1 Experimental setup

The multi-scale image quality measure MP-PSNR is evaluated using the IRCCyN/IVC DIBR images database [3] introduced in [2]. This database contains frames from 3 multiview video plus depth (MVD) sequences: Book Arrival (1024x768, 16 cameras with 6.5 cm spacing), Lovebird1 (1024x768, 12 cameras with 3.5 cm spacing) and Newspaper (1024x768, 9 cameras with 5 cm spacing). For each sequence, four virtual views are generated according to Table I, at positions corresponding to those of the real cameras. One key frame from each synthesized sequence is randomly chosen for the DIBR images database [2].

Seven depth image based rendering algorithms, named A1-A7, are used. In algorithm A1 [17] the depth image is preprocessed by a low-pass filter. The image border is cropped and then the image is interpolated to reach its original size. Algorithm A2 is based on A1 except that the borders are not cropped but inpainted by the method described in [18]. Algorithm A3 [19] uses the inpainting method [18] to fill in the missing parts in the virtual image, which introduces blur in the disoccluded area. This algorithm was adopted as the reference software for MPEG standardization experiments in the 3D Video group. Algorithm A4 performs hole filling aided by depth information [20]. Algorithm A5 uses patch-based texture synthesis as the hole-filling method [21]. Algorithm A6 uses depth temporal information to improve synthesis in the disoccluded areas [22]. The frames generated by algorithm A7 contain unfilled holes.

The shifting artifact is most noticeable in the frames synthesized by algorithm A1. Therefore, these frames are excluded from the tests. In this paper, the part of the DIBR images database containing the 72 frames generated by algorithms A2-A7 is used.

Table I. The IRCCyN/IVC DIBR images database: 3 MVD sequences, 4 views synthesized by 7 DIBR algorithms

sequence      orig. view   synth. view   frame number
BookArrival   10           8             60
BookArrival   8            9             54
BookArrival   10           9             60
BookArrival   8            10            54
Lovebird      8            6             112
Lovebird      6            7             104
Lovebird      8            7             112
Lovebird      6            8             104
Newspaper     6            4             104
Newspaper     4            5             136
Newspaper     6            5             104
Newspaper     4            6             136

In the subjective assessment session [2], 43 non-expert observers scored every key frame on the five-level ACR-5 scale. The observers' opinion scores are then averaged into a mean opinion score (MOS). Absolute categorical rating with hidden reference removal was used to collect perceived quality scores [2]. The stimuli were displayed on a TVLogic LVM401W in accordance with ITU-R BT.500. Considering the large size of the tested database, only key frames were presented to the observers, since still images can be a plausible scenario for FVV [2]. The difference mean opinion score (DMOS) is calculated as the difference between the reference frame's MOS and the synthesized frame's MOS.

3.2 Performances of MP-PSNR

To compare the performances of the image quality measures, the following evaluation metrics are used: root mean squared error between the subjective and objective scores (RMSE), Pearson correlation coefficient with non-linear mapping between the subjective scores and objective measures (PCC), and Spearman's rank order correlation coefficient (SCC). The calculation of DMOS from the given MOS and the non-linear mapping between the subjective scores and objective measures are done according to [23].

Morphological pyramid decompositions with different numbers of decomposition levels (1-7) and with different sizes of square SE (2x2 to 13x13) are explored. The number of decomposition levels giving the best MP-PSNR performance depends on the size of the SE, Table II. When the size of the SE is from 3x3 to 9x9, the best performance of MP-PSNR is obtained with 5 decomposition levels (M=6). The shape and the size of the SE determine which geometrical features are preserved in the filtered image, especially the direction of an object's enlargement or shrinking. More features are removed as a larger SE is used. Equal weights $\beta_j$ are used for the MP-MSE (4). The MP-PSNR performance improves as the SE is enlarged up to 11x11 and drops slightly at 13x13. The highest correlation with subjective scores, Pearson 0.887 and Spearman 0.817, and the lowest RMSE are obtained for MP-PSNR using the MBP ED pyramid with an SE of size 11x11 pixels.

Table II. Performances of MP-PSNR with different sizes of SE

SE       levels   RMSE     PCC      SCC
2x2      6        0.4101   0.8019   0.7083
3x3      5        0.3996   0.8131   0.7101
5x5      5        0.3561   0.8549   0.7759
7x7      5        0.3264   0.8796   0.8050
9x9      5        0.3263   0.8798   0.8015
11x11    4        0.3165   0.8874   0.8175
13x13    4        0.3221   0.8830   0.8021
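The three evaluation metrics of Section 3.2 can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the cubic polynomial fit is only a stand-in for the non-linear mapping of [23] (which is not reproduced here), the Spearman helper assumes there are no tied scores, and the score values in the usage lines are synthetic.

```python
import numpy as np

def dmos_from_mos(mos_reference, mos_synthesized):
    """DMOS as described in Section 3.1: reference MOS minus synthesized MOS."""
    return np.asarray(mos_reference, float) - np.asarray(mos_synthesized, float)

def spearman(a, b):
    """Spearman rank order correlation; assumes no tied values."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return np.corrcoef(rank_a, rank_b)[0, 1]

def evaluate(objective, subjective, degree=3):
    """Return RMSE and PCC after a polynomial mapping, and SCC on raw scores."""
    objective = np.asarray(objective, float)
    subjective = np.asarray(subjective, float)
    # Assumption: a polynomial fit stands in for the mapping prescribed by [23].
    coeffs = np.polyfit(objective, subjective, degree)
    mapped = np.polyval(coeffs, objective)
    return {
        "RMSE": float(np.sqrt(np.mean((mapped - subjective) ** 2))),
        "PCC": float(np.corrcoef(mapped, subjective)[0, 1]),
        "SCC": float(spearman(objective, subjective)),
    }

# Synthetic example: DMOS decreases monotonically as the objective score grows.
scores = np.linspace(20.0, 40.0, 12)
dmos = 2.0 - 0.001 * (scores - 30.0) ** 3 - 0.05 * (scores - 30.0)
result = evaluate(scores, dmos)
print(result)
```

The sketch returns the signed Spearman coefficient, which is negative when a higher objective score corresponds to a lower DMOS; take the absolute value if only the strength of the association is of interest.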

In order to find the influence of the DIBR algorithm on the correlation of MP-PSNR with DMOS, we used 6 parts of the DIBR database. Each part contains 60 frames. In each part, the frames synthesized by two algorithms, A1 and An, n=2,...,7, are excluded (see Table III). We calculated the Pearson correlation of MP-PSNR using SE=11x11 with DMOS for these database parts. The highest score is achieved for the part of the database which does not contain frames synthesized by algorithms A1 and A3, which means that MP-PSNR has the lowest correlation with DMOS for the images synthesized by algorithm A3 (after A1). MP-PSNR has the best correlation with DMOS for the frames synthesized by algorithm A7, then A6, A5, etc.

Table III. Rank of algorithms A2-A7: Pearson correlation of MP-PSNR with DMOS for the parts of the dataset without frames synthesized by algorithms A1 and An, n=2,...,7

Algor.   Rank   PCC
A7       1      0.845
A6       2      0.877
A5       3      0.884
A4       4      0.893
A3       6      0.913
A2       5      0.894
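The ranking logic behind Table III can be reproduced directly from the reported PCC values. In this short sketch the variable names are illustrative; the numbers are those of Table III. An algorithm whose removal lowers the PCC the most is the one whose frames MP-PSNR predicts best, so sorting by ascending PCC yields the rank.

```python
# PCC of MP-PSNR (SE 11x11) with DMOS when A1 and the named algorithm
# are excluded from the dataset (values from Table III).
pcc_without = {
    "A7": 0.845, "A6": 0.877, "A5": 0.884,
    "A4": 0.893, "A3": 0.913, "A2": 0.894,
}

# Lower PCC after exclusion means the excluded frames supported the
# correlation, so that algorithm receives a better (smaller) rank.
ranking = sorted(pcc_without, key=pcc_without.get)
for rank, algorithm in enumerate(ranking, start=1):
    print(rank, algorithm, pcc_without[algorithm])
```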

The performances of some commonly used 2D metrics, SSIM [8] using code [24], MS-SSIM [7], IW-SSIM [9], PSNR and IW-PSNR [9], are presented in Table IV. MP-PSNR shows much better performances than the commonly used 2D metrics.

Table IV. Performances of commonly used 2D metrics

metric    RMSE     PCC      SCC
SSIM      0.5513   0.5956   0.4424
MS-SSIM   0.5127   0.6649   0.5188
IW-SSIM   0.5350   0.6265   0.4856
PSNR      0.4525   0.7519   0.6766
IW-PSNR   0.5267   0.6411   0.5320
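The size of the improvement can be checked with simple arithmetic on the tabulated values. The sketch below uses the PCC of MP-PSNR with SE 11x11 from Table II and the PCC values from Table IV; the variable names are illustrative. It reports the gain both as an absolute difference in PCC and as a relative increase, since the two readings differ.

```python
mp_psnr_pcc = 0.8874  # Table II, SE 11x11
pcc_2d = {
    "SSIM": 0.5956, "MS-SSIM": 0.6649, "IW-SSIM": 0.6265,
    "PSNR": 0.7519, "IW-PSNR": 0.6411,
}

# Absolute PCC difference between MP-PSNR and each 2D metric.
gains = {name: round(mp_psnr_pcc - pcc, 4) for name, pcc in pcc_2d.items()}

# Best-performing 2D metric in Table IV and the relative gain over it.
best_2d = max(pcc_2d, key=pcc_2d.get)
relative_gain_percent = 100.0 * (mp_psnr_pcc / pcc_2d[best_2d] - 1.0)

print(gains)
print(best_2d, round(relative_gain_percent, 1))
```

Among the tested 2D metrics PSNR has the highest PCC; MP-PSNR exceeds it by 0.1355 in absolute terms, which corresponds to a relative increase of roughly 18%.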

In this paper, the MSE values from all pyramid levels are used for the multi-scale measure MP-MSE (4). In future work, only the MSE values with the best performances will be used for MP-MSE in order to obtain a multi-scale measure with better performance. In this paper, equal weights $\beta_j$ are used (4). In future work, these weights can be optimized to obtain a measure with better performance. In this paper, only the distortions introduced by view synthesis algorithms are evaluated using the proposed morphological pyramid measure. In future work, the measure can be evaluated on a database which contains both synthesis and compression distortions. The proposed measure is used for the quality assessment of still images from FVV as a preliminary step. In future work, the measure can also be used for the quality assessment of video sequences from FVV.

4. CONCLUSION

A multi-scale measure, morphological pyramid peak signal-to-noise ratio (MP-PSNR), based on morphological bandpass pyramid decomposition has been proposed for DIBR synthesized image quality assessment. The measure performs better than the tested commonly used 2D metrics. MP-PSNR shows a significant improvement in performance for the evaluation of DIBR synthesized images: its PCC is 13.55 percentage points higher than that of PSNR (0.8874 versus 0.7519). High correlations with human judgment, Pearson 0.887 and Spearman 0.817, are obtained by MP-PSNR based on the morphological bandpass pyramid with erosion/dilation as analysis/synthesis operators and an SE of size 11x11 pixels. The morphology-based multi-scale measure is a computationally efficient procedure because the morphological operators involve only integers and only max, min and addition in their calculations. It provides reliable DIBR synthesized image quality assessment even without a precise registration process in the preprocessing stage and without parameter optimization.

5. ACKNOWLEDGEMENT

This work was partially supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia under Grant TR-32034 and by the Secretary of Science and Technology Development of the Province of Vojvodina under Grant 114-451-813/2015-03.

6. REFERENCES

[1] E. Bosc, P. Le Callet, L. Morin and M. Pressigout, "Visual quality assessment of synthesized views in the context of 3DTV", 3D-TV system with Depth-Image-Based Rendering, C. Zhu, Y. Zhao, L. Yu, and M. Tanimoto, Springer, New York, pp. 439-473, 2013.
[2] E. Bosc, R. Pepion, P. Le Callet, M. Koppel, P. Ndjiki-Nya, M. Pressigout and L. Morin, "Towards a New Quality Metric for 3-D Synthesized View Assessment", IEEE Journal on Selected Topics in Signal Processing, Sept. 2011.
[3] ftp://ftp.ivc.polytech.univnantes.fr/IRCCyN_IVC_DIBR_Images
[4] P. Conze, P. Robert, L. Morin, "Objective View Synthesis Quality Assessment", Proc. SPIE, Stereoscopic Displays and Applications, Feb. 2012.
[5] F. Battisti, E. Bosc, M. Carli, P. Le Callet, S. Perugia, "Objective image quality assessment of 3D synthesized views", Elsevier Signal Processing: Image Communication, vol. 30, pp. 78-88, January 2015.
[6] E. Adelson, E. Simoncelli, W. Freeman, "Pyramids and multiscale representations", Proc. European Conf. on Visual Perception, Aug. 1990.
[7] Z. Wang, E. Simoncelli and A. C. Bovik, "Multi-scale structural similarity for image quality assessment", Asilomar Conference on Signals, Systems and Computers, 2003.
[8] Z. Wang, A. C. Bovik, H. R. Sheikh and E. Simoncelli, "Image Quality Assessment: From Error Visibility to Structural Similarity", IEEE Trans. on Image Processing, April 2004.
[9] Z. Wang and Q. Li, "Information Content Weighting for Perceptual Image Quality Assessment", IEEE Trans. on Image Processing, pp. 1185-1198, May 2011.
[10] Z. Wang, E. Simoncelli, "Translation insensitive image similarity in complex wavelet domain", ICASSP, 2005.
[11] P. Maragos, R. Schafer, "Morphological systems for multidimensional signal processing", Proc. IEEE, Apr. 1990.
[12] A. Toet, "A morphological pyramidal image decomposition", Pattern Recognition Letters, vol. 9, pp. 255-261, 1989.
[13] D. Sandić-Stanković, D. Kukolj, P. Le Callet, "DIBR synthesized image quality assessment based on morphological wavelets", International Workshop on Quality of Multimedia Experience QoMEX, May 2015.
[14] P. J. Burt and E. H. Adelson, "The Laplacian pyramid as a compact image code", IEEE Trans. on Communications, COM-31, pp. 532-540, Apr. 1983.
[15] J. Goutsias and H. Heijmans, "Nonlinear Multiresolution Signal Decomposition Schemes, Part I: Morphological Pyramids", IEEE Trans. on Image Processing, vol. 9, pp. 1862-1876, Nov. 2000.
[16] D. Sandić-Stanković, "Multiresolution decomposition using morphological filters for 3D volume image decorrelation", EUSIPCO, 2011.
[17] C. Fehn, "Depth image based rendering (DIBR), compression and transmission for a new approach on 3D-TV", SPIE IS&T Electronic Imaging, 5291, 93, 2004.
[18] A. Telea, "An image inpainting technique based on the fast marching method", Journal of graphics, GPU and game tools, 9(1), 23-34, 2004.
[19] Y. Mori, N. Fukushima, T. Yendo, T. Fujii and M. Tanimoto, "View generation with 3D warping using depth information for FTV", Signal Processing: Image Communication, 24(12), 65-72, 2009.
[20] K. Muller, A. Smolic, K. Dix, P. Merkle, P. Kauff and T. Wiegand, "View synthesis for advanced 3D video systems", EURASIP Journal on image and video processing, 1-12, 2008.
[21] P. Ndjiki-Nya, P. Koppel, M. Doshkov, H. Lakshman, P. Merkle, K. Muller and T. Wiegand, "Depth image based rendering with advanced texture synthesis for 3D video", IEEE Int. Conf. on Multimedia & Expo, 2010.
[22] M. Koppel, P. Ndjiki-Nya, M. Doshkov, H. Lakshman, P. Merkle, K. Muller and T. Wiegand, "Temporally consistent handling of disocclusions with texture synthesis for depth-image-based rendering", IEEE Int. Conf. on Image Processing, 2010.
[23] VQEG HDTV Group, "Test Plan for Evaluation of Video Quality Models for Use with High Definition TV Content", 2009.
[24] https://ece.uwaterloo.ca/~z70wang/research/ssim/ssim_index.m