Perceived Quality of the Variation of the Video Temporal ... - eurasip

PERCEIVED QUALITY OF THE VARIATION OF THE VIDEO TEMPORAL RESOLUTION FOR LOW BIT RATE CODING

Quan Huynh-Thu (1,2) and Mohammed Ghanbari (2)

(1) Psytechnics Ltd, UK
(2) Department of Electronic Systems Engineering, University of Essex, UK

ABSTRACT

We conducted a subjective quality assessment experiment to measure the impact of video frame rate decimation and variation in relation to impairment duration as well as content motion and texture. We found that for intermediate and high frame rates, quality was similar regardless of the duration of the frame rate decimation. For very low frame rates, on the other hand, quality decreased as the duration of the frame rate decimation increased. Our results also do not confirm the traditional view that higher-motion content requires a higher frame rate to produce a given level of quality. Our observations indicate that, for a given frame rate, perceived quality does not necessarily increase with decreasing motion speed, and that a reduction of the temporal resolution over the entire video does not necessarily lead to a significant loss of quality.

Index Terms: MOS, frame rate decimation, jerkiness, mobile video broadcasting, video-conferencing.

1. INTRODUCTION

Digital transmission of video services occurs mostly over a fixed bandwidth and requires videos to be encoded at a constant target bit rate, i.e. using a variable quantizer step (QS) size, before transmission. Because the encoding bit rate actually varies with content complexity, a buffer and a rate control mechanism are used to hold the encoding bit rate at the target value. This regulation mechanism is not always optimal from the point of view of human visual perception of quality.

Video encoding applies compression both spatially and temporally. When video is to be transmitted over a limited bandwidth, e.g. mobile video broadcasting or video-conferencing, low encoding bit rates are typically achieved by temporally down-sampling the video signal to reduce the number of frames per second, which defines the temporal resolution of the video.
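As a rough illustration of why temporal down-sampling is attractive at low bit rates, the average bit budget per picture at a fixed target bit rate is simply the bit rate divided by the frame rate. The sketch below is not taken from the paper: the function name, the 64 kbit/s target and the example frame rates are assumptions chosen only for illustration.

```python
def average_bits_per_picture(target_bit_rate_bps: float, frame_rate_fps: float) -> float:
    """Average bit budget per picture at a constant target bit rate."""
    if target_bit_rate_bps <= 0 or frame_rate_fps <= 0:
        raise ValueError("bit rate and frame rate must be positive")
    return target_bit_rate_bps / frame_rate_fps


# Hypothetical target bit rate, chosen only for illustration.
TARGET_BIT_RATE_BPS = 64_000

# Example frame rates: full rate, then decimation by factors of 2 and 5.
budgets = {
    fps: average_bits_per_picture(TARGET_BIT_RATE_BPS, fps)
    for fps in (25.0, 12.5, 5.0)
}

for fps, bits in budgets.items():
    print(f"{fps:5.2f} fps -> {bits:8.0f} bits per picture on average")
```

Halving the frame rate doubles the average budget per picture. A real encoder distributes bits unevenly across picture types, so this is only a first-order view.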
For a specified target bit rate, a higher frame rate reduces the average number of bits allocated per picture and therefore introduces stronger spatial artifacts such as blocking, blurring and ringing. On the other hand, allocating more bits per picture reduces the average frame rate and can lead to temporal artifacts known as jerkiness and jitter. Jerkiness is motion perceived as a series of snapshots rather than as smooth and continuous movement. Jitter is motion perceived as being irregularly interrupted due to a variation of the period between successive frames (or variation of the frame rate) and isolated events of frozen or dropped frames. Both jerkiness and jitter have a negative impact on perceived video quality. For a given source content and bandwidth, a critical issue remains the choice of encoding parameters that achieve the best trade-off between picture quality and motion fluidity in order to optimize the overall perceived video quality.

For a chosen target encoding frame rate and bit rate, the quantizer step size varies with the complexity of the video. To cope with an increase in content complexity, the encoder increases the QS size in order to keep the bit rate within the limit and prevent the buffer from filling up. However, with sudden increases in content complexity, if the target bit rate cannot be reached even with the maximum QS size, a buffer overflow occurs, causing the encoder to freeze and drop frames, regularly or irregularly in time. For applications where the content can be pre-encoded and stored before transmission, e.g. video streaming, an offline verification of the encoded video is possible and the presence of such temporal artifacts may be addressed to some extent using a trial-and-error approach to choose the encoder settings that produce the best subjective quality. However, this is not possible for mobile video broadcasting and video-conferencing scenarios, where an adequate choice of the encoding frame rate remains an important issue in order to provide high-quality live content over low bandwidths.

In this paper, we investigate the perceived impact of frame rate decimation and variation in the video. We study more specifically the impact of the duration and strength of the impairment. We also study the influence of frame rate decimation in relation to content motion and texture.
Our results provide useful insight into user preference and perception of quality, and can be used to design better rate control mechanisms in video encoders and to develop perceptual video quality metrics.

2. BACKGROUND

Several related studies on the characterization of visual quality in motion sequences, e.g. [1][2][3], have shown that the overall perceived video quality results from the interaction between a spatial and a temporal perceptual quality axis, corresponding respectively to human responses to spatial errors and motion distortions.

Most previous studies addressing frame rate decimation only considered the case where the frame rate is constant across the entire video. The effect of frame rate decimation alone or combined with a burst of dropped frames was studied in [4]. Frame rate decimation was reported to have a negative impact on perceived video quality, and the visibility of the impairment was found to be content- and motion-dependent. Similar findings were reported in [5] using videos impaired by several frame rate decimation factors. The authors conducted two experiments, one using QCIF and the other using CIF-resolution videos. Results indicated that, with the exception of very low-motion sequences such as talking heads, quality variation with frame rate was similar for both resolutions. In [6], the authors showed that traditional rate control schemes in video encoders are not optimal in terms of human perception of quality and investigated the impact of spatial resolution, frame rate and quantization parameter on subjective video quality. They compared different combinations of frame rates and QS sizes, and observed that the frame rate should be reduced when the QS size reaches a mid-range value. Irregular frame dropping was addressed in [7], and a temporal quality metric was proposed based on motion activity in each group of frame dropping.

All previous studies except [7] considered a constant frame rate over the entire sequence; frame rate variation was not investigated. In all of these studies, the impact on perceived quality was reported to be only motion-dependent. The focus of our investigation is therefore to study the impact of frame rate decimation and variation in relation to content motion and also texture.
We consider scenarios of real-time communication over low bandwidths, such as video-conferencing and mobile broadcasting, where the frame rate can be adjusted dynamically.

3. QUALITY ASSESSMENT EXPERIMENT

3.1. Test environment and procedure

We conducted a subjective quality assessment experiment in a soundproof room with controlled lighting conforming to international Recommendations [8]. A calibrated 17-inch computer LCD monitor was used to display the video sequences. Videos were displayed at 1:1 pixel ratio in the center of the screen at native resolution, surrounded by a neutral grey background. The experiment was computer-based and quality ratings were recorded electronically using a computer mouse.

We conducted the experiment using the single-stimulus ACR method with the 5-point categorical quality scale [8]. Each processed video sequence was presented one at a time and rated individually. The presentation order of the test videos was randomized between viewers so that each viewer saw the test sequences in a different order. After each video presentation, viewers were asked to judge its overall quality. The voting period was not time-limited. After each vote, a neutral grey background was displayed on the screen for 1 second before the next sequence was presented. Viewing distance was not fixed: participants were allowed to adjust to their most comfortable viewing distance. The test procedure and monitor selection adhere to the latest findings and best practice recommendations from the Video Quality Experts Group (see footnote 1), which supports ITU standardization activities.

A total of 15 naïve viewers participated in our experiment. All participants reported having normal vision. Before the start of the test, viewers went through 4 practice trials covering a representative range of the quality levels included in the actual test but using content different from that of the test.

3.2. Source material

We conducted our experiment using seven source videos covering different categories of content: drink (advertisement), mountain (documentary), ski (home video), music (music video), football (sports), boxing (movie/advertisement) and talk (talking-head). These spanned a wide range of spatial and motion complexities. Source videos were progressive, 8 seconds in duration and did not contain any audio track. The image resolution was QCIF (176 x 144) and the original frame rate was 25 fps. Source videos were generated from original standard-definition or high-definition material.

3.3. Experimental design and stimuli

Each source video (SRC) was processed through each condition (HRC) to generate the processed video sequences (PVS) used in the experiment. The HRCs included only temporal degradations so as to investigate the impact of such impairments independently of the influence of spatial artifacts. We included a hidden reference condition, i.e. one error condition was the original unprocessed sequence. We considered frame rates of 25, 12.5, 8.33, 6.25, 5 and 2.5 fps. We varied the duration of the frame rate decimation between 100%, 50% and 25% of the video duration.
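One way such temporally impaired conditions could be generated is sketched below. It rests on assumptions that are not stated in the text: decimation is simulated by frame repetition (each retained frame is held for `factor` display slots), the impaired segment is centred in the clip, and the function name and parameters are hypothetical. With a 25 fps source, integer factors of 2, 3, 4, 5 and 10 correspond to the 12.5, 8.33, 6.25, 5 and 2.5 fps conditions.

```python
def decimate_frame_indices(num_frames: int, factor: int,
                           duration_fraction: float = 1.0) -> list[int]:
    """Return, for every display slot, the index of the source frame shown.

    Inside the impaired segment only every `factor`-th frame is kept and
    held (frame repetition); outside it the source frames pass through.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not 0.0 < duration_fraction <= 1.0:
        raise ValueError("duration_fraction must be in (0, 1]")

    impaired_len = round(num_frames * duration_fraction)
    start = (num_frames - impaired_len) // 2  # centre the impaired segment (assumption)
    end = start + impaired_len

    shown = []
    for i in range(num_frames):
        if start <= i < end:
            # Hold the most recent frame on the decimation grid.
            shown.append(start + ((i - start) // factor) * factor)
        else:
            shown.append(i)
    return shown


# An 8-second clip at 25 fps has 200 frames.
full = decimate_frame_indices(200, factor=10)                             # 2.5 fps throughout
partial = decimate_frame_indices(200, factor=10, duration_fraction=0.25)  # 2.5 fps for 25% of the clip
print(len(set(full)), len(set(partial)))
```

The number of distinct frames in each list gives a quick check of how much temporal information a condition removes.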
When the temporal impairment was introduced during only a portion of the video, it was placed approximately in the middle of the video.

Footnote 1: See Multimedia Test Plan on http://www.vqeg.org

4. RESULTS AND DISCUSSION

4.1. Quality variation with impairment strength

First, we examine the impact of frame rate decimation when it occurs over the entire video. Figure 1 shows the condition MOS (averaged over content) against frame rate, with 95% confidence intervals. A first interesting finding is that MOS remains in the good quality category for a frame rate of 12.5 fps. This indicates that frame rate decimation by a factor of 2 did not significantly affect the perceived quality. Another interesting observation is that the confidence interval around the MOS widens as the frame rate decreases, i.e. subjects' opinions of quality seemed to diverge as the frame rate decreased. From 25 to 8.33 fps, quality decreases only slowly as the frame rate decreases. Below 8.33 fps, however, quality drops dramatically with the frame rate.
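For readers who want to reproduce this kind of summary, a minimal sketch of computing a MOS and its 95% confidence interval from individual ACR ratings is given below. The ratings are hypothetical placeholders, not data from the experiment, and the interval uses a normal approximation, which is an assumption of this sketch; with only 15 viewers a Student t quantile would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mos_with_confidence_interval(ratings: list[int],
                                 confidence: float = 0.95) -> tuple[float, float]:
    """Return (MOS, half-width of the confidence interval) for one condition."""
    n = len(ratings)
    if n < 2:
        raise ValueError("at least two ratings are needed")
    mos = mean(ratings)
    # Two-sided normal quantile (about 1.96 for 95%); normal approximation assumed.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(ratings) / sqrt(n)
    return mos, half_width


# Hypothetical ratings on the 5-point ACR scale from 15 viewers (illustrative only).
ratings_tight = [4, 4, 5, 3, 4, 4, 5, 4, 3, 4, 4, 5, 4, 4, 3]
ratings_spread = [1, 5, 2, 4, 3, 1, 5, 2, 4, 3, 1, 5, 2, 4, 3]

for label, ratings in (("tight", ratings_tight), ("spread", ratings_spread)):
    mos, hw = mos_with_confidence_interval(ratings)
    print(f"{label}: MOS = {mos:.2f} +/- {hw:.2f}")
```

A wider interval, as in the second hypothetical set, reflects stronger disagreement between viewers about the same condition.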

Figure 1 - Quality variation with frame rate decimation.

The effect of jerkiness by content is shown in Figure 2. For each content, MOS decreases with the frame rate. Some data points seem to indicate an occasional inversion, i.e. MOS becomes higher for a lower frame rate. However, these points in reality indicate statistically equivalent MOS values with overlapping confidence intervals (not plotted for graph clarity). For a given frame rate, quality varies with content. Furthermore, the extent of quality loss with decreasing frame rate is also content-dependent. For the content 'ski', quality varies within 1 MOS unit between 25 and 5 fps, whereas for the content 'drink', quality varies within 2 MOS units over the same range of frame rates.

Figure 2 - Content dependency.

It was reported in previous studies that the higher the motion in a video sequence, the more negatively a motion impairment such as frame rate decimation should affect quality [5][9]. Our results do not confirm this and show that the content with the lowest motion magnitude ('talk') did not necessarily obtain the highest quality at a given frame rate. On the contrary, we see that the talking-head content was overall one of the most affected contents. If the 'talk' sequence is not taken into account, 'mountain' is the content with the lowest motion magnitude. However, we observe that this sequence did not always receive the highest quality. This suggests that content texture and/or semantics also play a role in the perceived quality of video impaired by frame rate decimation. In particular, motion impairment of natural movement such as lip movement seems to have a stronger negative impact on quality judgment.

A 3-way repeated measures analysis of variance (ANOVA) indicated main effects of content and frame rate. On the other hand, the main effect of duration was not significant at the same level (p=0.0554). The p-value (p=0.0006) for the 2-way interaction between frame rate and duration indicated that the impact of the duration of the frame rate decimation depended on the value of the frame rate. The p-value (p=0.0001) for the 2-way interaction between content and frame rate indicated that the impact of frame rate decimation was content-dependent. Finally, the ANOVA found that the three-way interaction between content, frame rate and duration was not significant (p=0.1276).

4.2. Quality variation with impairment duration

We now examine the impact of jerkiness when frame rate decimation occurs only during a portion of the video. In our experiment, we considered frame rate decimation during 100%, 50% and 25% of the video duration. We see in Figure 3 that the decrease of average quality with frame rate has a similar shape for all durations. Figure 4 shows the regression curves for each duration of the frame rate decimation. Each regression curve has a correlation of 0.99 with its corresponding data.

Between 25 and 12.5 fps, the difference between the three impairment durations is only marginal. In other words, in this range of frame rates, quality decreases slightly with frame rate but independently of the duration of the frame rate decimation. Our experimental results in Figure 3 show that MOS at 12.5 fps is equivalent for all durations of the impairment. In other words, quality was the same whether a frame rate of 12.5 fps occurred during the entire video or during a portion (50% or 25%) of the video, with the remainder of the video at the full frame rate of 25 fps. This indicates that down-sampling the video temporal resolution from 25 to 12.5 fps before encoding and transmission does not significantly affect end-user quality. Conversely, our results suggest that, from the point of view of end-user quality, there is no advantage in keeping a very high frame rate (i.e. using a higher QS size for a specified bit rate) if this frame rate drops at some point during video playback.

We further note that between 12.5 and 8.33 fps the regression curves remain similar for impairment durations of 100% and 25% of the video. This means that frame rate decimation down to 8.33 fps had the same effect whether it occurred during the entire video or during a short portion of the video. On the other hand, within this range of frame rates, the curve for an impairment duration of 50% starts to diverge, indicating a slightly lower perceived quality. These results suggest that viewers were less annoyed by jerkiness if it occurred over the entire video or during a short portion of the video than if the same impairment occurred during a significantly long portion of the video.

Interestingly, we note that between 8.33 and 3.5 fps, a longer impairment duration at a given frame rate does not necessarily produce a lower quality. Indeed, in this range quality is ranked from highest to lowest for an impairment occurring during 25%, 100% and 50% of the video respectively. However, the opposite effect appears at very low frame rates (around 2.5 fps): in this range, quality increases with decreasing impairment duration at a given frame rate.

Figure 3 - Quality variation with duration of frame rate decimation

Figure 4 - Regression curves

5. CONCLUSION

We conducted a subjective experiment to characterize the impact of jerkiness on perceived video quality. We considered impairments occurring during a portion of or during the entire duration of the video. For intermediate/high frame rate values, we found that quality was similar regardless of the duration of the frame rate decimation impairment. For low/intermediate frame rate values, we found that quality was similar whether the impairment occurred during the entire video or during a short portion of the video. For very low frame rates, we found that quality decreased as the duration of the frame rate impairment increased. We found that, at a given frame rate, perceived quality is not necessarily highest for the lowest-motion content. We observed that the amount of motion is not the only factor affecting the perceived quality of temporal artifacts. In particular, we suggest that low-motion talking-head sequences represent a specific type of content that needs to be considered separately from other general content.

The results of this study can also be considered for the optimization of encoders operating at low bit rates. They indicate that a higher video quality may be achieved by decreasing the video temporal resolution by a factor of 2 or 3. Such a reduction of the temporal resolution would allow a significantly higher average number of bits per picture and potentially lead to a significant gain in picture quality, and therefore improve overall video quality. However, decreasing the frame rate further to increase picture quality further will have a negative impact on video quality, because excessive jerkiness leads to a dramatic drop in video quality even if the picture quality is very high. In other words, the frame rate can be decreased down to the perceptual breakpoint at which jerkiness becomes more annoying than compression artifacts.

6. REFERENCES

[1] M. Masry, S.S. Hemami, A.M. Rohaly, and W. Osberger, “Subjective Quality Evaluation of Low Bit Rate Video,” in Proc. SPIE Human Vision and Electronic Imaging, June 2001, pp. 102-113.

[2] F. Speranza, A. Vincent, D. Wang, A. Mainguy, P. Blanchfield, and R. Renaud, “Rate Control for Improved Picture Quality in Low-bit Rate Video Coding,” in Proc. SPIE Visual Communications and Image Processing, Jan. 2002, vol. 4671, pp. 722-733.

[3] R.R. Pastrana-Vidal, JC Gicquel, JL Blin, and H. Cherifi, “Predicting Subjective Video Quality From Separated Spatial and Temporal Assessment,” in Proc. SPIE Human Vision and Electronic Imaging, Jan. 2006, vol. 6057, pp. 276-286.

[4] R.R. Pastrana-Vidal, JC Gicquel, and H. Cherifi, “Frame Dropping Effects on User Quality Perception,” in Proc. 5th Workshop on Image Analysis for Multimedia Interactive Services, Apr. 2004.

[5] Z. Lu, W. Lin, C.S. Boon, S. Kato, S. Yao, E. Ong, and X.K. Yang, “Measuring the Negative Impact of Frame Dropping on Perceptual Visual Quality,” in Proc. SPIE Human Vision and Electronic Imaging, Jan. 2005, vol. 5666, pp. 554-562.

[6] D. Wang, F. Speranza, A. Vincent, T. Martin, and P. Blanchfield, “Towards Optimal Rate Control: A Study of the Impact of Spatial Resolution, Frame Rate, and Quantization on Subjective Video Quality and Bit Rate,” in Proc. SPIE Visual Communications and Image Processing, Jan. 2003, vol. 5150, pp. 198-209.

[7] K.C. Yang, C.C. Guest, P.K. Das, and K. El-Maleh, “Perceptual Temporal Quality Metric for Compressed Video,” in Proc. 3rd Int. Workshop on Video Processing and Quality Metrics for Consumer Electronics, Jan. 2007.

[8] ITU-T, “Subjective Video Quality Assessment Methods for Multimedia Applications,” Rec. P.910, Geneva, Sep. 1999.

[9] R. Feghali, F. Speranza, D. Wang, and A. Vincent, “Video Quality Metric for Bit Rate Control via Joint Adjustment of Quantization and Frame Rate,” IEEE Trans. Broadcast., vol. 53, no. 1, pp. 441-446, Mar. 2007.