Mobile stereo video broadcast - Tampereen teknillinen yliopisto

MOBILE3DTV

D3.2

Mobile stereo video broadcast
Gozde B. Akar, M. Oguz Bici, Anil Aksay, Antti Tikanmäki, Atanas Gotchev


MOBILE3DTV project has received funding from the European Community’s ICT programme in the context of the Seventh Framework Programme (FP7/2007-2011) under grant agreement n° 216503. This document reflects only the authors’ views and the Community or other project partners are not liable for any use that may be made of the information contained therein.

MOBILE3DTV Project No. 216503


Abstract: In this report, the DVB-H channel is studied for its capabilities to transmit stereo video in different formats and to provide adequate error protection for such content. The following stereo video representations are considered: two-channel stereo, encoded either independently with an H.264 encoder (simulcast) or by multi-view coding (MVC), and video plus depth, both components encoded by H.264. These encoding approaches are implemented and combined with an IP streaming application. DVB-H specific tools, such as an MPE-FEC encapsulator and decapsulator and a physical layer simulator, complement the software environment built to run end-to-end video transmission simulations. In the experimental tests, various physical channel models are used to represent different reception conditions. These are put up against different stereo video encoding modes and MPE-FEC rates. The preliminary results show the feasibility of the DVB-H channel for handling encoded stereo video streams. They also illustrate the problems of transmission in bad channel conditions, which should be addressed by optimizing the MPE-FEC rates and developing novel application layer error protection schemes.

Keywords: Error characteristics, stereo, error resilience, 2D+depth


Executive Summary
The DVB-H standard has been developed to provide broad and flexible broadcast of rich data and to overcome the challenges of the mobile reception environment. While largely based on the terrestrial digital video broadcasting (DVB-T) standard, it also introduces time slicing for lower power consumption and special error protection through combining multiprotocol encapsulation with forward error correction codes (MPE-FEC). As such, it looks quite capable of broadcasting 3D video to mobile users. In this report, the DVB-H channel is studied for its capabilities to transmit stereo video in different formats and to provide adequate error protection for such content. In order to make the content self-contained, the report incorporates parts of the previous report on the most typical error patterns of the DVB-H channel.

The following questions are addressed and studied in the report: ensuring transmission of stereo video in different formats by designing a proper streamer for such content; building a real physical channel and designing simulation tools for the link layer and the physical layer; and performing simulation tests for different channel conditions and video formats. The following stereo video representations are considered: two-channel stereo, encoded either independently with an H.264 encoder (simulcast) or by multi-view coding (MVC), and video plus depth, both components encoded by H.264. These encoding approaches are implemented and combined with an IP streaming application. DVB-H specific tools such as an MPE-FEC encapsulator and decapsulator and a physical layer simulator have been developed and are described in the report. Thus, a complete software environment has been built to run end-to-end video transmission simulations. It accompanies the fully operational DVB-H broadcast system set up at Tampere University of Technology (TUT). The system encompasses the whole broadcast chain, including encoders, encapsulator, transmitter and receivers.
In the experimental tests, various physical channel models have been used to represent different reception conditions. These are put against different stereo video encoding modes and MPE-FEC rates. While these preliminary results show the feasibility of the DVB-H channel to handle encoded stereo video streams, they also show that optimal MPE-FEC rates and other application layer error protection schemes have to be studied in order to ensure an adequate delivery of 3D video content over the targeted system.


Table of Contents
Executive Summary
1. Introduction
2. Overview of DVB-H System
3. Broadcasting of 3D Video in Different Formats
   3.1. 3D Video Characteristics
      3.1.1. Characteristics of two simulcast video
      3.1.2. Characteristics of joint coded stereo video
      3.1.3. Characteristics of 2D video+depth data
   3.2. 3D Video Coder
   3.3. 3D Video Streamer/Client
   3.4. 3D Video Decoder
4. DVB-H Transmission Channel
   4.1. Recommended Transmission Modes
   4.2. DVB-H Broadcasting System for Experiments and Demonstrations
5. DVB-H Channel Simulation
   5.1. Physical Channel Simulator
   5.2. Physical Layer Experiments
6. Simulation of 3D Coded Video Transmission over DVB-H
   6.1. Simulation Environment
   6.2. Experimental Results
7. Conclusion
8. References


1. Introduction
Currently, there are a few candidate formats for delivery and storage of 3D video. The first alternative is to transmit a pair of stereoscopic views captured by stereoscopic cameras. These views can either be coded separately, also known as simulcast [5], or they can be coded together using a multi-view codec such as the MVC extension of H.264 [7]. Multi-view codecs achieve higher coding efficiency than simulcast by exploiting the inter-view correlation in compression. Alternatively, the 3D video can be represented as a conventional monoscopic video sequence that is augmented with depth information. In this case, the receiver has to take care of generating the views appropriate for the display device. There are three main ways to encode these representations, as shown in Figure 1. In all these cases, the dependencies between streams differ from those of monoscopic video.

Figure 1 Different ways to encode 3D data

On the other hand, wireless networks are often error prone due to factors such as multipath fading and interference. In addition, the channel conditions of these networks are often nonstationary: the available bandwidth and channel error rates change over time with large variations. In order to maintain a satisfactory Quality of Service (QoS), a number of technologies have been proposed, targeting different layers of the network. DVB-H uses Forward Error Correction (FEC) for error protection and comes with an optional FEC tool at the link layer. This tool uses Reed-Solomon (RS) FEC codes encapsulated into multiprotocol encapsulated sections (MPE-FEC). MPE-FEC was introduced to provide the additional robustness required for hand-held mobile terminals: it improves the carrier-to-noise (C/N) and Doppler performance in the DVB-H channel while also providing improved tolerance of impulse interference. However, MPE-FEC might fail under very erroneous conditions. Using a priori knowledge of the transmitted media, and tuning how MPE-FEC is applied across the media datagrams, can provide better robustness. Since 3D data has a layered structure with dependencies between layers that differ from those of 2D video, this can be addressed intelligently to provide better robustness. In the following sections, we first give a brief overview of the DVB-H system. Then we summarize the different 3D representations that can be used for DVB-H transmission, the dependencies in these representations and the effect of errors on the overall quality. We then overview the DVB-H transmission channel and the channel simulations. Finally, we present the preliminary results that show the effect of MPE-FEC on 3D coded video. We conclude the report by introducing ideas for more robust transmission over DVB-H.


2. Overview of DVB-H System

A conceptual description of a DVB-H system is shown in Figure 2. In the transmitter, the IP input streams coming from different sources as single elementary streams are multiplexed according to the time-slicing method. The MPE-FEC error protection is calculated and added separately for each elementary stream. Encapsulation of the IP packets and embedding into the transport stream follow. The next block is the DVB-T modulator. In addition to the 2K and 8K modes that DVB-T provides, DVB-H also uses an intermediate 4K mode with a 4096-point Fast Fourier Transform (FFT) in the OFDM modulation. The objective of the 4K mode is to improve network planning flexibility by trading off mobility and SFN size. To further improve the robustness of the DVB-T 2K and 4K modes under mobile and impulse noise reception conditions, an in-depth symbol interleaver is also standardized.
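As a quick illustration of the FFT modes, the useful OFDM symbol duration scales with the FFT size. The following sketch (our own illustration, not code from the project) assumes an 8 MHz channel, for which the ETSI EN 300 744 elementary period is T = 7/64 microseconds:

```python
# Useful (guard-free) OFDM symbol durations for the DVB-T/H FFT modes
# in an 8 MHz channel, where the elementary period is T = 7/64 us.
T_US = 7 / 64  # elementary period in microseconds (8 MHz channel)

def useful_symbol_duration_us(fft_size: int) -> float:
    """Useful OFDM symbol duration (without guard interval), in us."""
    return fft_size * T_US

for mode, n in (("2K", 2048), ("4K", 4096), ("8K", 8192)):
    print(f"{mode}: Tu = {useful_symbol_duration_us(n):.0f} us")
# 2K: Tu = 224 us, 4K: Tu = 448 us, 8K: Tu = 896 us
```

Longer symbols (8K) tolerate larger echo delays and bigger SFN cells but are more sensitive to Doppler shift; the 4K mode sits between the two, which is the trade-off mentioned above.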

Figure 2 A conceptual description of a DVB-H system [5].

The objective of time slicing is to reduce the average power consumption of the terminal and to enable smooth and seamless service handover. Time slicing consists of sending data in bursts using a significantly higher instantaneous bitrate than would be required if the data were transmitted using traditional streaming mechanisms. To indicate to the receiver when to expect the next burst, the time (delta-t) to the beginning of the next burst is signalled within the burst currently being received. Between the bursts, data of the elementary stream is not transmitted, allowing other elementary streams to share the capacity otherwise allocated. Time slicing enables a receiver to stay active for only a fraction of the time, i.e. when receiving bursts of a requested service. Note that the transmitter is constantly on (i.e. the transmission of the transport stream is never interrupted). Time slicing also supports the possibility to use the receiver to

MOBILE3DTV

D3.2

monitor neighbouring cells during the off-times (between bursts). By switching reception from one transport stream to another during an off period, a quasi-optimum handover decision as well as seamless service handover can be accomplished. The objective of the MPE-FEC is to improve the C/N and Doppler performance in mobile channels and to improve the tolerance to impulse interference. This is accomplished by introducing an additional level of error correction at the MPE layer. By adding parity information calculated from the datagrams and sending this parity data in separate MPE-FEC sections (Figure 3), error-free datagrams can be output (after MPE-FEC decoding) even under bad reception conditions. With MPE-FEC, a flexible amount of the transmission capacity is allocated to parity overhead. For a given set of transmission parameters providing 25% parity overhead, a receiver with MPE-FEC may require about the same C/N as a receiver with antenna diversity and without MPE-FEC. The MPE-FEC overhead can be fully compensated by choosing a slightly weaker transmission code rate, while still providing far better performance than DVB-T (without MPE-FEC) for the same throughput.
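The power saving from time slicing can be estimated with simple arithmetic: averaged over a burst cycle, the receiver front end only needs to be on for the fraction of time given by the ratio of the service bitrate to the burst bitrate. The numbers below are illustrative assumptions, not figures from the report:

```python
# Rough time-slicing duty-cycle sketch (illustrative numbers).
def on_time_fraction(service_bitrate_bps: float,
                     burst_bitrate_bps: float) -> float:
    """Fraction of time the front end must be on to sustain the service.

    The receiver listens only during bursts, so the average on-time
    fraction is service_rate / burst_rate (ignoring the small wake-up
    margin a real receiver needs before each burst).
    """
    return service_bitrate_bps / burst_bitrate_bps

# e.g. a 384 kbps service delivered in bursts at 8 Mbps:
frac = on_time_fraction(384_000, 8_000_000)
print(f"receiver on {frac:.1%} of the time")  # receiver on 4.8% of the time
```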

Figure 3 The structure of the MPE-FEC frame [11].

3. Broadcasting of 3D Video in Different Formats
3.1. 3D Video Characteristics
In this section we briefly describe the characteristics of the different 3D representations to show their effects on decoder complexity and transmission errors.

3.1.1. Characteristics of two simulcast video
Simulcast coding is the process of coding each view independently of the others. It only utilizes the correlation between adjacent frames within a single stream; the correlation between adjacent streams (inter-view correlation) is not exploited. This type of coding is backward compatible, i.e. it is possible to decode both streams using conventional 2D video decoders. In addition, extra decoder complexity is eliminated. The effect of errors on the transmitted streams is similar to 2D video streaming; however, the overall quality is related to how the human visual system responds to the artefacts in either view.

3.1.2. Characteristics of joint coded stereo video
Stereo video is a special case of multiview video with the number of views equal to two. In stereo video coding, both temporal and inter-view prediction are utilized. One of the views (usually the left view) is coded without reference to the right view using a conventional 2D video codec, which guarantees backward compatibility. The right view is coded using both inter-view prediction (with reference to the left view) and temporal prediction. Joint coding of stereo video utilizes the bandwidth more efficiently at the cost of a slight decrease in the Peak Signal-to-Noise Ratio (PSNR). Table 1 shows the PSNR (dB) and bitrate values for the original, simulcast and joint coded stereoscopic sequence "Botanic" [3].

                       Original   Simulcast   Joint
Normalized bitrate     N/A        2           1.79
PSNR overall (dB)      INF        36.09       35.96
PSNR left (dB)         INF        36.05       36.05
PSNR right (dB)        INF        36.13       35.88

Table 1 Quality and bitrate differences for simulcast and joint coded stereo video.

However, since the right view is predicted from the left view, any error in the left view directly affects the quality of the right view, decreasing the overall quality.

3.1.3. Characteristics of 2D video+depth data
3D video can also be represented by a 2D video signal and a per-sample depth map. The depth map determines the position of the associated colour data in 3D space. Using a single 2D video signal and a depth map, a stereo view can be rendered at the receiver. Such a representation can be coded with conventional 2D video encoders, since the depth map can be treated as a monochromatic video signal. In this case, similar to simulcast video, the 2D video and the depth video can be coded as two independent streams. The reconstructed quality of the stereo video at the receiver side depends heavily on the accuracy of the depth map, so any error introduced in the depth map during transmission has a severe effect on the reconstruction quality. Figure 4 gives an example showing the importance of the depth map.
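The rendering step can be illustrated with a minimal depth-image-based warping sketch. This is our own toy model (one scanline, integer shifts, crude hole filling) of why depth-map errors corrupt the rendered view; the actual renderer used in the project [19] is more sophisticated:

```python
# Toy depth-image-based rendering: shift each left-view pixel
# horizontally by a disparity derived from its depth value.
def render_right_row(left_row, depth_row, max_disparity=4):
    """Warp one scanline; depth in [0, 255], nearer pixels shift more."""
    width = len(left_row)
    right_row = [None] * width
    for x in range(width):
        d = round(depth_row[x] / 255 * max_disparity)
        tx = x - d  # target column in the right view
        if 0 <= tx < width:
            right_row[tx] = left_row[x]
    # crude hole filling: reuse the co-located left-view pixel,
    # mimicking the behaviour described in the Figure 4 caption
    return [right_row[x] if right_row[x] is not None else left_row[x]
            for x in range(width)]

# zero depth => no shift; maximal depth => every pixel shifts left
print(render_right_row([10, 20, 30, 40], [0, 0, 0, 0]))
```

A corrupted depth value changes the shift of its pixel, so depth errors directly translate into geometric distortions in the synthesized view.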


Figure 4 (a) The 2D video sequence (left view) and associated depth map; (b) a zoomed segment of the rendered right view using the original depth map; (c) a zoomed segment of the depth map where some blocks were lost during transmission; (d) a zoomed segment of the rendered right view using the depth map in (c); since some depth values are missing, these pixels are filled with the values from the left view; (e) difference between (b) and (d).


3.2. 3D Video Coder
As stated before, there are a few candidate formats for delivery and storage of 3D video: simulcast coding, joint coding (MVC) and video-plus-depth (2D+depth). Hence, there are three main ways to encode these representations, and our system supports all of them:
a) Encoding both views independently using H.264/AVC. Our system uses the H.264/AVC reference encoder, version 10.1 [8].
b) The video-plus-depth (2D+depth, VPD) representation is coded by independent coding of the video and depth signals with H.264/AVC, with a small amount of information about the depth data embedded into the high-level syntax. MPEG specified a corresponding container format, "ISO/IEC 23002-3 Auxiliary Video Data Representations", also known as MPEG-C Part 3 [9]. The same encoder is used with the extra syntax.
c) Exploiting temporal and inter-view redundancy by interleaving the camera views and coding them in a hierarchical manner with Multi-view Video Coding (MVC) specific tools such as illumination and colour compensation, improved disparity estimation and coding, and some high-level syntax changes. MVC is an amendment (Amendment 4) to H.264/AVC [10]. The MVC encoder used in our system is JMVM 3.0.2.
The main MVC prediction structure is quite complex, introducing many dependencies between images and views. This holds even when the number of coded views is two (stereoscopic video). These dependencies exploit the redundancies present in both the inter-view and temporal directions to reduce the bitrate; however, they also impose many restrictions on decoding and increase packet loss sensitivity. An alternative simplified structure is presented in [13] and shown to be very close to the main prediction structure in terms of overall coding efficiency.
In the simplified prediction structure used in this study, temporal prediction using hierarchical B-pictures remains unchanged compared to the original MVC prediction structure, but inter-view references are limited to anchor frames, i.e. inter-view prediction is only allowed at the beginning of a group of pictures (GOP), between I and P pictures. This simplified structure is shown in Figure 5 for the case of two views.

Figure 5 Simplified MVC coding scheme in case of two views.
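The error-containment benefit of this structure can be illustrated with a toy dependency model (our own simplification, not code from the project): since the right view references the left view only at the anchor frame, a lost left anchor damages both views for the rest of the GOP, while a lost right-view frame stays confined to the right view:

```python
# Simplified loss-propagation model for the prediction structure of
# Figure 5 (two views, inter-view prediction only at the GOP anchor).
def affected_views(lost_view: str, frame_index_in_gop: int) -> set:
    """Return the set of views degraded for the rest of the GOP."""
    if lost_view == "left":
        # The left view feeds left-view temporal prediction and, at the
        # anchor (index 0), the right view's inter-view prediction too.
        return {"left", "right"} if frame_index_in_gop == 0 else {"left"}
    # Right-view frames are never referenced by the left view.
    return {"right"}

print(affected_views("left", 0))   # both views damaged
print(affected_views("right", 3))  # damage confined to the right view
```

In the full MVC prediction structure, where non-anchor inter-view references are allowed, a lost left-view frame at any position could propagate into the right view, which is exactly the packet loss sensitivity the simplified structure reduces.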

3.3. 3D Video Streamer/Client


In the current streaming structure, the video coder produces coded frames which are passed to the Network Abstraction Layer (NAL). A NAL Unit (NALU) consists of a one-byte header followed by a bit stream that contains the macroblocks of a frame or a slice. The NALU type specifies which type of data structure is contained in the NALU; the NALU type for MVC data differs from that of monoscopic video. The NALUs produced are then sent to the RTP packetizer (Figure 6). The streaming of H.264/AVC streams over RTP is standardized by the IETF in RFC 3984 [15], which defines the RTP header usage and the necessary packetization rules for H.264/AVC. RFC 3984 is applicable to the streaming of MVC bit streams as a whole, because they are H.264/AVC compatible. Because of this, for simulcast coded representations (two independent video streams or 2D+depth data), [15] is used. For MVC data, stream packetization is applied in a similar fashion to [15], with some modifications as proposed in [3] and [16]. Instead of sending the MVC stream as a single H.264/AVC stream over RTP, we break the stream up into several parts: NALUs belonging to different views are streamed over different UDP port pairs as if they were separate H.264/AVC streams. For synchronization, the time stamps of left and right frames belonging to the same time instant must be equal. Contrary to simulcast streaming, in the case of MVC streaming only the stream for the independently encoded view is decodable by itself. The final step of the streamer is the UDP/IP packetization. IP datagrams corresponding to the right and left views are encapsulated in the link layer and put into different elementary streams with different program identifiers (PIDs).
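The per-view packetization described above can be sketched as follows. This is a simplified illustration, not the project's streamer: each view gets its own packet stream (modelled here with distinct SSRC values), and paired left/right frames carry the same RTP timestamp so the client can match them. See RFC 3550/3984 for the real header semantics:

```python
import struct

# Toy RTP packetization: one NALU per packet, fixed 12-byte base header.
def rtp_packet(seq: int, timestamp: int, ssrc: int, nalu: bytes,
               payload_type: int = 96) -> bytes:
    header = struct.pack("!BBHII",
                         0x80,                    # V=2, no padding/ext/CSRC
                         payload_type & 0x7F,     # marker bit cleared
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc)
    return header + nalu

# Left and right frames of the same time instant: same timestamp,
# different streams (in our system, different UDP port pairs).
left_pkt = rtp_packet(seq=1, timestamp=9000, ssrc=0x11, nalu=b"left-nalu")
right_pkt = rtp_packet(seq=1, timestamp=9000, ssrc=0x22, nalu=b"right-nalu")
```

The equal timestamps are what lets the receiver pair the views for display even though the two streams arrive on separate ports.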

Figure 6 RTP packetization: a sequence of RTP packets, each carrying one NALU (a header followed by coded data).

3.4. 3D Video Decoder
We currently use the FFMPEG library [14] for real-time decoding of H.264/AVC streams. In order to decode MVC streams, we modified the FFMPEG library with the appropriate syntax changes and interleaving of the two views. First, the DPB (Decoded Picture Buffer) size is increased, since MVC prediction requires more pictures to be held in the buffer. The SPS (Sequence Parameter Set) is modified to signal the prediction structure and the use of MVC encoding, and the NALU header is modified to signal the view ID and related information. Since the prediction of each frame and the memory management depend on the view ID of the frame, a view ID tag is added to each frame in the DPB, and related functions such as prediction list generation and buffer management are modified accordingly. Support for new picture reordering commands is also implemented to adapt the reference lists more efficiently for inter-view prediction. Since MVC requires the base view to be standard compatible, the view ID and related information cannot be signalled in the NALU

MOBILE3DTV

D3.2

header of the base-view NALUs. It was therefore decided to follow each base-view NALU with another NALU carrying the required information, called a suffix NALU. Also for compatibility reasons, B-pictures that are marked as non-reference for the base view can still be used for inter-view prediction and need to be stored in the DPB. Finally, illumination compensation is added to the motion/disparity compensation module. In order to cope with losses, the decoder examines the picture order count (POC) of each frame and identifies missing frames. For 2D+depth decoding, the FFMPEG library is again used for real-time decoding of the H.264/AVC streams. Once decoded, the stereo view is generated using the depth-image-based renderer [19].
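The POC-based loss detection mentioned above can be sketched as a gap scan over the POC values of arriving frames. The fixed POC step of 2 is a hypothetical illustration; the real step depends on the encoder configuration:

```python
# Sketch of POC-based missing-frame detection (our own illustration).
def find_missing_pocs(received_pocs, poc_step=2):
    """Return POC values absent between the first and last received POC."""
    expected = range(min(received_pocs), max(received_pocs) + 1, poc_step)
    received = set(received_pocs)
    return [poc for poc in expected if poc not in received]

print(find_missing_pocs([0, 2, 4, 8, 10]))  # [6]
```

Once a missing POC is identified, the decoder knows which frame to conceal (e.g. by repeating the previous frame) instead of silently mis-ordering the output.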

4. DVB-H Transmission Channel
4.1. Recommended Transmission Modes
The physical layer operation of DVB-H is mostly inherited from DVB-T, although DVB-H adds a few new features that cannot be used in DVB-T broadcasts. Both technologies are specified in the ETSI EN 300 744 standard [20]. The new DVB-H specific transmission modes (the 4K mode, 5 MHz bandwidth and the in-depth interleaver) are defined in annexes F and G of the standard. The transmitter operations for creating the DVB-H baseband signal from the MPEG Transport Stream (TS) are briefly explained below:
1. The transport stream is randomized and adapted for energy dispersal.
2. Outer coder: a shortened Reed-Solomon RS(204,188) code, which allows correction of up to 8 erroneous bytes at the receiver, is applied to each randomized TS packet.
3. Outer (bytewise) interleaver.
4. Inner convolutional coder with code rate 1/2. The code can be punctured to code rates 2/3, 3/4, 5/6 and 7/8, which provide a higher bit rate for application data at the cost of lower error correction capability.
5. Inner bitwise interleaver and symbol interleaver. The DVB-H specific in-depth interleaving mode can be used instead of the 'native' DVB-T interleaver to gain some additional protection against impulsive interference.
6. The output of the symbol interleaver is mapped to the carriers of the OFDM frame according to the constellation in use. If hierarchical modulation is used, two separate transport streams can be multiplexed in the high- and low-priority parts of the broadcast.
7. Pilot and TPS carriers that aid the receiver devices are inserted.
8. OFDM modulation.
9. Guard interval insertion.

There are two coders at the physical layer adding redundant error correction information to the transmission. The MPE-FEC mechanism at the DVB-H link layer acts as a third, additional layer of protection. The constellation, convolutional code rate and MPE-FEC code rate are the parameters with the most significant impact on the channel C/N performance. Simpler constellations (QPSK) are more robust against channel errors than more complex ones (64-QAM), and stronger FEC codes naturally improve the reception performance under difficult conditions. The Doppler performance is also affected by the OFDM mode [22].

There are altogether six tunable physical layer parameters: constellation (QPSK, 16-QAM, 64-QAM), OFDM mode (2K, 4K, 8K), code rate of the convolutional coder (1/2, 2/3, 3/4, 5/6, 7/8), inner interleaver mode (native, in-depth), bandwidth (5-8 MHz) and guard interval (1/4, 1/8, 1/16, 1/32). The hierarchical transmission mode adds even more alternative ways to construct the broadcast. The large number of modulation parameters and their combinations gives flexibility to network planning but makes it more difficult to find the optimal transmission mode.

The Wing-TV project [23] studied the physical, link and application layer performance of different DVB-H transmission modes, and developed new physical channel models and more efficient methods for MPE-FEC decoding. The project used channel simulations and field experiments to compare the C/N and Doppler performance of the different constellations and various combinations of MPE-FEC and convolutional code rates. In the study, the following transmission modes, which provided the best performance for the corresponding usable bit rate, were recommended:

Constellation   Convolutional code rate   MPE-FEC code rate   IP bit rate (Mbps)
QPSK            1/2                       3/4                 3.73
QPSK            1/2                       5/6                 4.15
QPSK            2/3                       5/6                 5.53
16-QAM          1/2                       3/4                 7.46
16-QAM          1/2                       5/6                 8.29
16-QAM          2/3                       5/6                 11.06

Table 2 Recommended DVB-H transmission modes
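The IP bit rates in Table 2 can be reproduced with a back-of-the-envelope calculation from the standard DVB-T bitrate formula. The sketch below assumes the 8K mode, guard interval 1/4 and an 8 MHz channel, and ignores MPE/TS header overhead; these assumptions are ours, as the table does not state them:

```python
# Approximate IP bit rate for a DVB-H mode (8 MHz channel assumed).
DATA_CARRIERS = {"2K": 1512, "4K": 3024, "8K": 6048}
BITS_PER_CARRIER = {"QPSK": 2, "16-QAM": 4, "64-QAM": 6}
USEFUL_SYMBOL_S = {"2K": 224e-6, "4K": 448e-6, "8K": 896e-6}

def ip_bitrate_mbps(constellation, conv_cr, mpe_fec_cr,
                    mode="8K", guard=1/4):
    symbol_s = USEFUL_SYMBOL_S[mode] * (1 + guard)  # total symbol time
    ts_rate = (DATA_CARRIERS[mode] * BITS_PER_CARRIER[constellation]
               * conv_cr * (188 / 204)   # RS(204,188) outer-code overhead
               / symbol_s)
    return ts_rate * mpe_fec_cr / 1e6    # MPE-FEC parity reduces IP rate

print(round(ip_bitrate_mbps("QPSK", 1/2, 3/4), 2))    # ~3.73
print(round(ip_bitrate_mbps("16-QAM", 2/3, 5/6), 2))  # ~11.06
```

The close match to the table's 3.73 and 11.06 Mbps suggests the Wing-TV figures were computed for exactly this kind of configuration.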

If high-speed reception (Doppler frequency 80 Hz) is desired, only the QPSK modes and 16-QAM with convolutional code rate 1/2 provide adequate performance. Additionally, the study showed that, at low Doppler frequencies, the 64-QAM mode with code rates 2/3 (convolutional) and 5/6 (MPE-FEC) could also provide acceptable quality. As a rule of thumb, the study suggests that when choosing the code rates for the convolutional coder and the MPE-FEC, the best overall performance is achieved by spending most of the redundancy on the convolutional coder and using a relatively weaker RS code for the MPE-FEC [21][23]. The bit rate requirements within the Mobile3DTV project are moderate, as transmission of more than a few programs will not be needed. Therefore, one of the more robust QPSK or 16-QAM modes can be selected for use in future experiments and demonstrations. The optimal MPE-FEC code rate as well as various UEP schemes for 3D video transmission will be studied later in the project.


4.2. DVB-H Broadcasting System for Experiments and Demonstrations
The project members have various DVB-H broadcasting equipment and software tools for performing field and laboratory experiments as well as for building demonstrations of end-to-end transmission systems. There is a full-featured transmitter setup at TUT that can be used for field tests (Figure 7). It consists of proprietary broadcasting equipment, is equipped with 50 W power amplifiers and rooftop antennas, and is capable of delivering DVB-H broadcasts within the town district of Hervanta. The DVB-H playout server (Cardinal) is used for creating time-sliced transport streams of A/V content from MPEG-4 files or live IP streams. File delivery over FLUTE data carousels is also supported. The playout server can create all the PSI/SI tables as well as the Electronic Service Guide (ESG) files needed for a complete DVB-H service. The transmitted streams can be recorded for future use, which provides a means for using the playout server to create TS files for simulations. Transmitting complete, pre-generated or recorded transport streams is also possible.

Figure 7 DVB-H transmitter setup at TUT: source content (.mp4) is streamed over RTP through an IP network to the Cardinal Compact DVB-H playout server, whose MPEG-2 transport stream output feeds a Rohde & Schwarz exciter and power amplifier for RF transmission.

The DVB-H exciter takes the transport stream as its input and creates an OFDM modulated DVB signal to be broadcast via the power amplifier and rooftop antenna. Optionally, it is possible to utilize a diversity unit that prepares the RF signal for two-antenna transmitter diversity broadcasting. For laboratory experiments and demonstrations, TUT, METU and MMS have all obtained a PC based transmitter setup that uses a DVB-H modulator card (Dektec DTA-115/DTA-110) for creating the DVB signal (Figure 8). A set of open-source software tools (FATCAPS, JustDVB-IT) is used for creating the transport stream. Currently, this does not offer as full-featured a solution as the Cardinal playout server, because the ESG data cannot be created automatically. On the other hand, the open-source tools offer more flexibility for experimenting with new source/channel coding or content delivery schemes. In the link layer, the IP datagrams are first read column-wise into the application data table part of the MPE-FEC frame, which consists of a total of 191 columns. The application data table is encoded row-wise using a systematic Reed-Solomon (RS) code; the resulting 64 correction bytes per row are then added to the RS data table part of the MPE-FEC frame. The MPE-FEC code rate can be adjusted by either zero-padding the application data table or puncturing the RS data table. After the MPE-FEC frame is constructed, the application data table IP datagrams are encapsulated into MPE sections. In a similar fashion, the RS data table columns are encapsulated into MPE-FEC sections. A stream of MPE and MPE-FEC sections is then put into an

MOBILE3DTV

D3.2

Elementary Stream (ES), i.e. a stream of MPEG-2 TS packets with a particular Packet Identifier (PID). The ESs are time-sliced for transmission. The concept of time slicing is to send the ESs in bursts using a significantly higher bitrate than would be required if the data were transmitted using conventional bandwidth management. Within a burst, the time before the start of the next burst (delta-t) is indicated. Between the bursts, data of the ES is not transmitted, allowing other ESs to use the bandwidth otherwise allocated. This enables a receiver to stay active only for a fraction of the time, while receiving bursts of a requested service, which enables considerable power savings. In order to realize the DVB-H link layer, we used FATCAPS, a free, Linux-based open-source DVB-H IP encapsulator [25]. The FATCAPS implementation builds on JustDVB-IT [26], another open-source project that provides a low-cost, highly configurable environment for DVB-T playout together with a set of tools to create, transform and multiplex MPEG transport streams, and adds the DVB-H specific features. The FATCAPS software reads IP packets and encapsulates them in MPE sections and then in MPEG-2 transport stream packets. Finally, the timeslicer component of the software collects the different DVB-H transport streams and multiplexes them using a time division mechanism, so that only one DVB-H stream is seen on the channel at any given time. MPE-FEC sections can optionally be generated and inserted into the stream at this stage in order to provide additional error protection. The final output can be forwarded to a hardware modulator to perform real broadcasting. In our implementation, we have also modified the FATCAPS software for offline simulation purposes. The current implementation of the software captures IP datagrams from a streamer and encapsulates them for transmission in real-time.
In our modified implementation, stored IP datagrams can be fed to the software at a given rate, and the output TS packets can be stored offline. In this way, TS packet loss simulations can be performed.
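The offline simulation step described above can be sketched as follows. This is an illustrative fragment, not the actual FATCAPS modification: it assumes a recorded error trace marks each transport stream packet as lost or received, and drops the lost ones. Only the 188-byte TS packet size is taken from the standard.

```python
# Sketch: map a recorded packet error trace onto a stored MPEG-2
# transport stream, dropping the TS packets marked as erroneous.
# trace[i] == True means "packet i was lost"; the trace is repeated
# cyclically if it is shorter than the stream.

TS_PACKET_SIZE = 188  # fixed MPEG-2 TS packet length in bytes

def apply_error_trace(ts_bytes, trace):
    """Return the surviving TS packets and the TS packet error rate."""
    n = len(ts_bytes) // TS_PACKET_SIZE
    survivors = [
        ts_bytes[i * TS_PACKET_SIZE:(i + 1) * TS_PACKET_SIZE]
        for i in range(n)
        if not trace[i % len(trace)]  # keep packets not marked lost
    ]
    ts_per = 1 - len(survivors) / n
    return survivors, ts_per
```

The surviving packets can then be fed to the IP decapsulator exactly as if they had been received over the air.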

[Figure 8 block diagram: Source content (.mp4) → RTP Streamer → IP Network → IP encapsulator software (FATCAPS) → MPEG-2 Transport Stream → Modulator card (Dektec) → RF]

Figure 8 DVB-H Transmitter setup

The DVB-H terminal devices available on the consumer market can be used for viewing the broadcast as long as the TS contains the full ESG data and the broadcast audio and video streams follow the profile and level restrictions imposed by the terminal device. At TUT, we have Nokia N92 and N77 devices, which could be used as legacy clients receiving the backward-compatible part of the 3D video transmission. In practice this is difficult, since these devices do not support video resolutions in the range considered for the 3D video.


For research and development work, a PC-based receiver offers greater flexibility. For this, we use Linux computers equipped with DVB-H/T frontends. The Linux operating system has built-in support for decapsulating the IP data in DVB-H transport streams (dvbnet). Alternatively, the streams can be decoded using the Decaps software [27] developed within this project. Compared to dvbnet, it adds several features such as MPE-FEC error correction and the collection of error statistics.

The IP decapsulator performs the reverse operations of the IP encapsulator. It extracts MPE and MPE-FEC sections from the TS. Errors or erasures in the TS can be reliably detected by the CRC-32 code included in each section. The data and RS tables of an MPE-FEC frame are filled with the correctly received sections, and RS decoding is performed to recover the lost data. The integrity of the recovered data is finally verified using the UDP checksum. After error correction, the IP datagrams are fed to the stereo video client, either over a network connection or by saving them to a file. In our system, we implemented the IP decapsulator according to the specifications given in [28][29]. Video players with RTP support can be used to view the decapsulated broadcasts. The software components of the Linux receiver setup are illustrated in Figure 9.
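The section-level error detection used by the decapsulator can be sketched as below. This is a toy framing, not the Decaps implementation: Python's `zlib.crc32` stands in for the MPEG-2 CRC-32 used by DVB sections, which differs in bit ordering and initialisation, so the check logic rather than the exact checksum value is the point.

```python
# Sketch: each MPE/MPE-FEC section carries a CRC-32 over its contents;
# sections whose CRC fails are treated as erasures when filling the
# MPE-FEC frame. zlib's CRC-32 is used here for illustration only
# (DVB sections use the non-reflected MPEG-2 CRC-32 variant).

import zlib

def crc32_of(body):
    return zlib.crc32(body) & 0xFFFFFFFF

def make_section(body):
    """Append a 4-byte CRC to a section body (toy framing)."""
    return body + crc32_of(body).to_bytes(4, "big")

def section_ok(section):
    """True if the trailing CRC matches the section body."""
    return crc32_of(section[:-4]) == int.from_bytes(section[-4:], "big")
```

Sections failing this check are simply not written into the MPE-FEC frame, leaving erasures for the RS decoder to fill.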

[Figure 9 block diagram: RF → DVB-T/H receiver → TS capture → IP Decapsulator (MPEG-2 Transport Stream → packet data) → IP Network → RTP client; alternatively, the Linux DVB API performs kernel decapsulation via dvbnet]

Figure 9 Linux based DVB-H receiver

5. DVB-H Channel Simulation

5.1. Physical Channel simulator

For simulating the physical transmission channel, we have a MATLAB/Simulink tool that models the DVB-T/H modulation and demodulation processes and the physical transmission channel, as illustrated in Figure 10. The channel is modeled as a multipath Rayleigh fading channel with additive white Gaussian noise. Various commonly used channel models have been predefined in the simulator (Table 3). This tool can be used to collect reception statistics such as Bit Error Rate (BER), TS-Packet Error Rate (TS-PER), Average Burst Error Length (ABEL) and Variance in Burst Error Length (VBEL), as well as to record packet error traces for video transmission simulations.
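The burst statistics named above can be computed from a packet error trace as sketched below; this is a minimal illustration of the definitions (ABEL as the mean run length of consecutive lost packets, VBEL as the variance of those run lengths), not the simulator's own code.

```python
# Sketch: collect error-burst statistics from a TS packet error trace.
# trace: iterable of booleans, True = packet lost.

from itertools import groupby

def burst_stats(trace):
    """Return (ABEL, VBEL): mean and variance of error burst lengths."""
    # Group the trace into runs of identical values; keep the lost runs.
    bursts = [sum(1 for _ in run) for lost, run in groupby(trace) if lost]
    if not bursts:
        return 0.0, 0.0
    abel = sum(bursts) / len(bursts)
    vbel = sum((b - abel) ** 2 for b in bursts) / len(bursts)
    return abel, vbel
```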


[Figure 10 block diagram: the Simulink model implements the DVB transmitter chain ((204,188) shortened RS encoder, convolutional interleaver with I=12, punctured convolutional code, DVB inner interleaver, DVB M-QAM mapper, TPS and pilot insertion, OFDM modulation and guard interval insertion; 8 MHz bandwidth, 2k/8k mode, non-hierarchical transmission), the channel model for dynamic multipath fading, and the corresponding receiver chain (guard interval removal, OFDM receiver, pilot processing, demapper, inner deinterleaver, Viterbi decoder, convolutional deinterleaver and RS decoder), with BER/SER probes at several points.]

Figure 10 Simulink model of DVB transmission

COST 207
o Typical Urban (6 and 12 taps)
o Bad Urban (6 and 12 taps)
o Rural Area (4 and 6 taps)
o Hilly Terrain (6 and 12 taps)
JTC
o Indoor Commercial (7 taps)
o Outdoor Residential (10 taps)
Wing-TV
o Pedestrian Indoor
o Pedestrian Outdoor
o Vehicular Urban
o Motorway

Table 3 Physical channel models

5.2. Physical layer experiments

The Simulink model was used to simulate the behaviour of the DVB-H transmission channel with two physical channel models. A mobile use case is modelled with the Typical Urban channel having a maximum Doppler frequency of 30 Hz. The Residential Outdoor channel with a maximum Doppler of 1 Hz represents a situation closer to conventional terrestrial reception using a rooftop antenna. The


aim of this study was to compare the channel performances and to see whether the mobility of receiver devices also introduces characteristically different errors at the application layer. Both channel models were tested using the same DVB-H transmission mode:

Constellation: 16-QAM
Convolutional code rate: 2/3
OFDM mode: 8K
Bandwidth: 8 MHz
Interleaver: native

The channel was simulated using SNR levels ranging from 11 dB to 24 dB. The simulation results are illustrated as bit error rates and packet error rates in Figure 11. At low SNR levels, both channel models show similar performance. At high SNR levels, the Residential Outdoor channel has lower error rates, while Typical Urban continues to suffer from errors caused by the Doppler shift.

Figure 11 Bit error rate and transport stream packet error rate for both channels

Besides error rates, the temporal distribution of errors plays a crucial role in real performance. Because MPE and MPE-FEC sections are transmitted in adjacent TS packets, dispersed errors may cause many sections to each contain a few errors, whereas dense error bursts result in only a few sections being lost completely. If erroneous sections are marked as wholly lost in MPE-FEC decoding, as proposed in the standard (known as the packet erasure method), the end result is that more dispersed errors cause more severe packet loss at the application layer. Figure 12 shows that although both channel models have a similar average error burst length, the variance of the burst length is much higher in the Residential Outdoor channel. This is caused by relatively few very long error bursts.
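The dispersed-versus-dense effect described above can be made concrete with a toy model in which one section spans a fixed group of TS packets; the grouping is illustrative only, not the actual section-to-packet mapping of the broadcast.

```python
# Toy illustration of the packet erasure effect: with one section per
# group of TS packets, N dispersed packet errors can erase up to N
# sections, while a dense burst of N errors erases only a few.

def sections_lost(packet_errors, packets_per_section):
    """A section is discarded wholly if any of its packets is in error."""
    lost = {i // packets_per_section
            for i, err in enumerate(packet_errors) if err}
    return len(lost)
```

With 40 packets and 10 packets per section, four dispersed errors erase all four sections, while a burst of four consecutive errors erases only one.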


Figure 12 Transport stream burst error length statistics

The application layer performance was approximated by mapping the transport stream error trace to an IP packet error trace, under the assumption that all IP packets are of the same size, equal to the number of rows in the MPE-FEC frame. These assumptions simplify simulation of the FEC decoding, because both IP packets and RS data sections then map directly to columns of the MPE-FEC frame, and the so-called column erasure decoding method can be used. An MPE-FEC code rate of 3/4 was used in the simulation (191 application data columns, 64 RS columns). The IP packet error rates before and after FEC decoding are shown for both channel models in Figure 13. In the Typical Urban channel, a clear improvement in PER can be seen at SNR values from 16 to 22 dB, whereas with the Residential Outdoor channel the impact is negligible. These results show that MPE-FEC is quite effective against the dispersed errors caused by Doppler shift. The MPE-FEC frame size also affects the error correction performance, as it is directly related to the interleaving depth. This is, however, not studied within this report.
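The column erasure approximation used above can be sketched as follows: each IP packet maps to one MPE-FEC column, and a frame with 191 data columns and 64 RS columns is fully recovered whenever at most 64 columns are erased. This is a sketch of the decoding condition, not the Decaps implementation.

```python
# Sketch of column erasure decoding for one MPE-FEC frame:
# RS(255,191) erasure decoding succeeds if the number of erased
# columns does not exceed the number of RS parity columns (64).

def ip_per_after_fec(column_erased, app_cols=191, rs_cols=64):
    """IP packet error rate for one frame after column erasure decoding.

    column_erased: list of booleans for the frame's 255 columns,
    True = column (i.e. IP packet or RS section) erased.
    """
    assert len(column_erased) == app_cols + rs_cols
    if sum(column_erased) <= rs_cols:
        return 0.0  # erasure decoding restores every column
    # Decoding fails: only the intact application-data columns survive.
    lost_app = sum(column_erased[:app_cols])
    return lost_app / app_cols
```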


Figure 13 Application layer performance with and without MPE-FEC (left: Typical Urban, 6 taps; right: Residential Outdoor, 10 taps)

6. Simulation of 3D Coded Video Transmission over DVB-H

6.1. Simulation Environment

The transmission simulation environment is based on the same open-source tools, FATCAPS and Decaps, as the Linux-based transmitter and receiver toolset. When a measured or simulated packet error pattern is applied to the transport stream, the system forms an end-to-end simulation of the DVB-H transmission chain. The simulator can be used for both online and offline simulations. In the online case, live IP streams are encapsulated in transport streams and sent back to the IP network by the decapsulator. In offline simulations, the different phases can be separated from each other, and it is not necessary to use live IP streams as input and output. For instance, one can first generate a transport stream that is then passed through various different channel conditions. The building blocks of our system can be seen in Figure 14.

The input videos (stereo video content with right and left views, or 2D video plus depth) are first compressed with a 3D encoder (a 2D video encoder for simulcast coding, or a stereo encoder). The resulting NALUs are fed to the 3D video streamer. The streamer encapsulates the NALUs into Real-time Transport Protocol (RTP), User Datagram Protocol (UDP) and finally Internet Protocol (IP) datagrams. The resulting IP datagrams are encapsulated in the DVB-H link layer, where Multi-Protocol Encapsulation Forward Error Correction (MPE-FEC) and time slicing take place. The link layer output MPEG-2 Transport Stream (TS) packets are passed to the physical layer, where the transmission signal is generated with a DVB-T modulator. After the transmission over a wireless channel, the receiver obtains a distorted signal, and possibly erroneous TS packets are generated by the DVB-T demodulator. Error correction is then attempted in the link layer by the MPE-FEC functionality,


and the TS packets are decapsulated into IP datagrams. The IP datagrams are handled in the 3D video streamer client, and the resulting NAL units are decoded with the 3D video decoder (a 2D video decoder, a 2D video decoder plus a depth-based image renderer, or a stereo video decoder) to generate the right and left views. Finally, these views are put into an appropriate format to be displayed as 3D on the display.


Figure 14 The system overview

The display unit of the clients is an autostereoscopic display, which allows 3D viewing without any need for eyeglasses or special equipment. In an autostereoscopic display, the right and left views go through an interdigitation process in which an appropriate picture is generated from the pixels of the left and right views. In the displayer of our system, as a pair of left and right frames is decoded, the interdigitation process takes place and the generated image is shown on the autostereoscopic display.

An important feature of our system is backward compatibility, i.e., existing mobile users without 3D features are able to receive the stereo video broadcast and play the monoscopic video (either the right or the left view). Backward compatibility is accomplished in the link layer, where the IP datagrams corresponding to the right and left views are encapsulated and put into different ESs with different PIDs. In this way, the data for the left and right views are transmitted in different bursts, so that a non-stereo-compliant user can play monoscopic video by receiving only the corresponding ES. In this approach, the IP decapsulator is modified so that the receiver knows that it needs to filter two PIDs.
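The PID filtering idea behind this backward compatibility can be sketched as below: a legacy receiver keeps only the ES carrying the left view, while a 3D-capable receiver filters both PIDs. The PID values are illustrative, not taken from the actual broadcast; the PID extraction itself follows the MPEG-2 TS packet header layout.

```python
# Sketch: filter MPEG-2 TS packets by PID. The 13-bit PID sits in the
# low 5 bits of byte 1 and all of byte 2 of each 188-byte packet.

LEFT_PID, RIGHT_PID = 0x101, 0x102  # hypothetical elementary stream PIDs

def ts_pid(packet):
    """Extract the 13-bit PID from a 188-byte TS packet."""
    return ((packet[1] & 0x1F) << 8) | packet[2]

def filter_pids(packets, wanted):
    """Keep only the packets whose PID is in the wanted set."""
    return [p for p in packets if ts_pid(p) in wanted]
```

A legacy client would call `filter_pids(packets, {LEFT_PID})`, while a stereo client passes `{LEFT_PID, RIGHT_PID}`.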

6.2. Experimental Results

To validate the performance of the simulation setup, we performed several transmission experiments where (a) an MVC coded test sequence, (b) simulcast coded test sequences, and (c) independently coded 2D + depth data are transmitted over a simulated DVB-H channel. We simulated three transmission modes: 1) QPSK constellation, convolutional code rate of 1/2 and MPE-FEC rate of 3/4 over the Typical Urban (TU) channel with 6 taps; 2) 16-QAM constellation, convolutional code rate of 2/3 and MPE-FEC rate of 3/4 over the TU channel with 6 taps; and 3) 16-QAM constellation, convolutional code rate of 2/3 and MPE-FEC rate of 3/4 over the residential


outdoor channel. The test sequence was encapsulated in a transport stream by putting each view, or the 2D + depth data, in its own MPE stream. The erroneous channel conditions were simulated on the physical layer using the previously mentioned Simulink model of the transmission system. Different channel conditions were tested by varying channel SNR values, and the simulation results were saved as transport-stream-level packet error traces [17]. The error traces were then mapped onto the test transport stream in order to introduce errors. We evaluated the impact of MPE-FEC by comparing two scenarios when extracting the H.264/MVC video (or simulcast video) from the transport stream: 1) the client receives the FEC codes within the MPE-FEC frame; 2) the client receives only the data part of the MPE-FEC frame, which reduces power consumption. Finally, the quality of the received and decoded video was measured by computing the Y-PSNR of the views using the original, uncompressed video as the reference. This simulation process was repeated 50 times for each channel SNR condition in order to obtain figures corresponding to the average behaviour of the channel. In all the figures, PSNR values are in dB and calculated according to the following formulas, where Dl and Dr represent the distortions (mean squared errors) in the left and right views, respectively:

PSNR_joint = 10 log10( 255^2 / ((Dl + Dr)/2) ),  PSNR_left = 10 log10( 255^2 / Dl ),  PSNR_right = 10 log10( 255^2 / Dr )    (Eq. 1)
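Eq. 1 can be transcribed directly as a small sketch; the distortions are the mean squared errors of the decoded views against the originals.

```python
# Direct transcription of Eq. 1: joint, left and right PSNR in dB
# from the left/right view distortions (mean squared errors).

import math

def psnr(mse):
    """PSNR in dB for 8-bit video with mean squared error `mse`."""
    return 10 * math.log10(255 ** 2 / mse)

def stereo_psnrs(d_left, d_right):
    """Return (PSNR_joint, PSNR_left, PSNR_right) per Eq. 1."""
    return psnr((d_left + d_right) / 2), psnr(d_left), psnr(d_right)
```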

For the simulations, the same sequences (Hands and Horse) prepared by HHI are used [ref]. For 2D+depth coding, the 2D video is assumed to be the left view, and both the original depth map and the received depth map are used to render the right view. The PSNRs are computed according to Eq. 1. For joint stereo and simulcast coding, the sequences are coded in such a way that the no-loss PSNRs are almost the same. Thus the bitrate of joint coding is slightly less than that of simulcast coding. The summary of the 3D sequences and coding parameters is shown in Table 4 to Table 6.

Producer: KUK
Sequences: Hands, Horse
Length [frames]: 129 (both)
Framerate [frames/second]: 30
Resolution [pixel]: 480 x 272
Coding Type / Data Format: Simulcast (VL + VR), Stereo (VL + VR), 2D+Depth (VL + VD)

Table 4 3D video data set

HANDS

Coding Type   VL Rate [kbps]  VL PSNR [dB]  VR Rate [kbps]  VR PSNR [dB]  VD Rate [kbps]  VD PSNR [dB]
Simulcast     727.0203        32.893        624.9873        33.653        N/A             N/A
Stereo        727.0185        32.893        600.7522        33.682        N/A             N/A
2D+Depth      727.0203        32.893        N/A             30.534        142.4655        37.613

Table 5 Hands sequence coding parameters

HORSE

Coding Type   VL Rate [kbps]  VL PSNR [dB]  VR Rate [kbps]  VR PSNR [dB]  VD Rate [kbps]  VD PSNR [dB]
Simulcast     353.9844        32.499        359.7293        32.399        N/A             N/A
Stereo        353.9826        32.499        198.0669        32.295        N/A             N/A
2D+Depth      353.9844        32.499        N/A             32.745        21.31177        44.331

Table 6 Horse sequence coding parameters

Simulation set 1: QPSK constellation, convolutional code rate of 1/2 and MPE-FEC rate of 3/4 over the TU channel with 6 taps

Figure 15 Simulation results for the Hands sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 16 Simulation results for the Hands sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 17 Simulation results for the Hands sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence+original depth map and the received sequence+received depth map.

Figure 18 Simulation results for the Horse sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 19 Simulation results for the Horse sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 20 Simulation results for the Horse sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence+original depth map and the received sequence+received depth map.

Simulation set 2: 16-QAM constellation, convolutional code rate of 2/3 and MPE-FEC rate of 3/4 over the TU channel with 6 taps

Figure 21 Simulation results for the Hands sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 22 Simulation results for the Hands sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 23 Simulation results for the Hands sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence + original depth map and the received sequence + received depth map.

Figure 24 Simulation results for the Horse sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 25 Simulation results for the Horse sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 26 Simulation results for the Horse sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence+original depth map and the received sequence+received depth map.

Simulation set 3: 16-QAM constellation, convolutional code rate of 2/3 and MPE-FEC rate of 3/4 over the Residential Outdoor channel (10 taps)

Figure 27 Simulation results for the Hands sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 28 Simulation results for the Hands sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 29 Simulation results for the Hands sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence+original depth map and the received sequence+received depth map.

Figure 30 Simulation results for the Horse sequence. The videos are jointly coded where the stereo GOP size is 16. (a) joint PSNR, (b) individual PSNRs.

Figure 31 Simulation results for the Horse sequence. The videos are simulcast coded. (a) joint PSNR, (b) individual PSNRs.

Figure 32 Simulation results for the Horse sequence. 2D+depth coding is utilized. PSNR is computed using the right views rendered by the original sequence + original depth map and the received sequence + received depth map.

In the figures, “No loss” corresponds to the scenario where there is no loss in the transmission, “FEC” to the scenario where MPE-FEC is decoded in the case of lossy transmission, and “No FEC” to the scenario where the receiver discards the FEC bytes. The results show a clear improvement in quality when the received MPE-FEC data is used, especially in the low channel SNR cases. Another observation is that using the MPE-FEC frame achieves almost no-loss performance for high channel SNR values. The results also show that, because of the dependencies between the right and left views, joint coding is less robust to errors. This robustness can, however, be increased by using Unequal Error Protection (UEP).

In terms of complexity at the decoder side, conventional video decoders can be used for simulcast decoding and for 2D+depth decoding, although for the depth data an additional rendering step has to be performed. For joint coding, a simplified MVC decoder has to be implemented at the receiver. The figures show that all representations give approximately the same average PSNR. When the bitrates are compared, however, simulcast coding has the highest bitrate and 2D+depth the lowest. Note that for 2D+depth coding, we do not use the original right view for computing the PSNR of the right view; instead, we use the right view rendered from the original left view and the depth information.

In Figure 33 to Figure 35, we also present the nature of the packet losses for different channel conditions. The results are given for the Horse sequence transmitted over the TU 6 channel with SNR = 18 dB and SNR = 23 dB. In both cases, only frames from the left view are lost. For the case of SNR = 18 dB, the lost frames are shown in Table 7.
For the case of SNR = 23 dB, only the 32nd frame (an I frame) of the left view stream is lost. In both cases, the right view streams are received without any losses. However, as the anchor frames of the right view depend on the left view, the right view is affected by the losses as well.

POC  48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
CO   33 57 53 58 51 59 54 60 50 61 55 62 52 63 56 64 49 73 69 74 67 75 70 76
REC   1  0  0  0  1  0  0  0  1  0  0  0  1  0  0  0  1  0  1  0  1  0  1  0

POC  72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
CO   66 77 71 78 68 79 72 80 65 89 85 90 83 91 86 92 82 93 87 94 84 95 88 96
REC   1  1  0  0  0  1  0  0  0  1  0  0  0  1  0  0  0  1  0  1  0  1  0  1

Table 7 Loss distribution of frames for the left view for channel SNR = 18 dB. POC: Picture Order Count (the order displayed on the screen); CO: Coding or Transmission Order (in our case, the order obeys the hierarchical structure); REC: 1 = received, 0 = lost
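How a single loss propagates through the prediction hierarchy can be sketched with a toy dependency check: a frame is usable only if it was received and every frame it predicts from is itself usable. The reference map in the test is illustrative, not the exact JMVM prediction structure used in the experiments.

```python
# Toy sketch of loss propagation through a (cycle-free) prediction
# structure such as hierarchical B frames: a frame is usable only if
# it was received and all of its reference frames are usable.

def usable_frames(received, refs):
    """received: set of frame ids; refs: frame id -> list of reference ids."""
    usable = {}

    def ok(f):
        if f not in usable:
            # Recursion terminates because the prediction graph is acyclic.
            usable[f] = f in received and all(ok(r) for r in refs.get(f, []))
        return usable[f]

    return {f for f in received if ok(f)}
```

With this model, losing an anchor frame renders every frame that references it (directly or transitively) unusable, which is why the single lost I frame at SNR = 23 dB still degrades both views.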


Figure 33 PSNRs for reconstruction of individual frames for the loss conditions given in Table 7

Figure 34 The reconstructed frames (74-79) for the loss conditions given in Table 7


Figure 35 PSNRs for reconstruction of individual frames for channel SNR = 23 dB

7. Conclusion

We have developed software tools for real-time transmission of stereo video over a DVB-H channel and for its offline simulation. Our system supports coding of stereo video in three formats: simulcast, MVC and video plus depth. Each of these can be properly packetized into RTP streams for further encapsulation into MPE-FEC tables, followed by creating transport streams and putting them into time slices for channel transmission. The system also supports a receiver application where the MPE-FEC packets are properly recognized within the DVB-T TS, decapsulated and fed into a decoder/player application. The latter application works under Linux for three types of portable terminals. The system is fully operational and broadcasts stereo video over DVB-H continuously in the Hervanta suburb of the city of Tampere. A mobile version of the system utilizing a PC-based DVB transmitter has been demonstrated at the NEM Summit, October 2008, Saint-Malo, France, and at the ICT Event, November 2008, Lyon, France.

In addition to the real working application, we have built simulation tools to further study the impact of different channel conditions on the quality of the delivered stereo video. This includes a simulator of the physical layer (allowing experiments with different channel models) and software control over the DVB-H link layer (the possibility of changing FEC rates, etc.). As shown in the


experimental results, although MPE-FEC provides the much-needed data robustness for 3D video transmission in wireless channels, it may fail under very erroneous conditions. In our further studies, we will utilize the developed tools to further optimize the MPE-FEC functionality. Using a-priori knowledge of the transmitted media and tuning the way MPE-FEC is applied across the media datagrams can provide better robustness. Unequal Error Protection (UEP) is such a scheme: it uses a-priori knowledge of the media to protect data differentially using FEC. In UEP, the coded data is divided into layers of different importance; high-priority (HP) layers are well protected and low-priority (LP) layers are less protected. In the next phase of the project we plan to apply UEP to the 3D data.

8. References

[1] W. Matusik and H. Pfister, "3D TV: A Scalable System for Real-Time Acquisition, Transmission and Autostereoscopic Display of Dynamic Scenes," ACM Transactions on Graphics (TOG) SIGGRAPH, ISSN 0730-0301, Vol. 23, Issue 3, pp. 814-824, August 2004 (ACM Press).
[2] K. Willner, K. Ugur, M. Salmimaa, A. Hallapuro and J. Lainema, "Mobile 3D Video Using MVC and N800 Internet Tablet," 3DTV-CON 2008, Istanbul, Turkey.
[3] E. Kurutepe, A. Aksay, C. Bilen, C. G. Gurler, T. Sikora, G. Bozdagi Akar and A. M. Tekalp, "A Standards-Based, Flexible, End-to-End Multi-View Video Streaming Architecture," Packet Video Workshop 2007, Lausanne, Switzerland.
[4] S. Cho, N. Hur, J. Kim, K. Yun and S-I. Lee, "Carriage of 3D audio-visual services by T-DMB," Electronics and Telecommunications Research Institute, Republic of Korea, in Proc. ICME 2006.
[5] Digital Video Broadcasting (DVB): Transmission System for Handheld Terminals (DVB-H), ETSI EN 302 304 V1.1.1 (2004-11).
[6] ITU-T Rec. H.264 & ISO/IEC 14496-10 AVC, "Advanced video coding for generic audiovisual services," ITU-T, May 2003.
[7] K. Müller, P. Merkle, H. Schwarz, T. Hinz, A. Smolic and T. Wiegand, "Multi-view Video Coding Based on H.264/AVC Using Hierarchical B-Frames," in Proc. PCS 2006, Picture Coding Symposium, Beijing, China, April 2006.
[8] JM Reference Software Version 10.1. http://iphome.hhi.de/suehring/tml/download/
[9] A. Bourge and C. Fehn (Editors), "White Paper on ISO/IEC 23002-3 Auxiliary Video Data Representations," ISO/IEC JTC 1/SC 29/WG 11, Doc. N8039, Montreux, Switzerland, April 2006.
[10] A. Vetro, P. Pandit, H. Kimata, and A. Smolic, "Joint Draft 3.0 on Multiview Video Coding," Joint Video Team, Doc. JVT-W209, 2007.

[11] A. Bria and D. Gomez-Barquero, "Application Layer FEC for Improved Mobile Reception of DVB-H Streaming Services," in Proc. IEEE 64th Vehicular Technology Conference, Sept. 2006, pp. 1-5.

[12] P. Pandit, A. Vetro, and Y. Chen, "JMVM 3 Software," ITU-T Doc. JVT-V208, 2007.

[13] P. Merkle, A. Smolic, K. Mueller, and T. Wiegand, "Comparative Study of MVC Prediction Structures," ITU-T Doc. JVT-V132, 2007.

[14] FFmpeg homepage. [Online]. Available: http://ffmpeg.sourceforge.net/

[15] S. Wenger, M. M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer, "RTP Payload Format for H.264 Video," IETF RFC 3984. [Online]. Available: http://tools.ietf.org/html/rfc3984

[16] Y.-K. Wang and T. Schierl, "RTP Payload Format for MVC Video," IETF Audio/Video Transport (avt) Internet-Draft, August 21, 2008. [Online]. Available: http://tools.ietf.org/html/draft-wang-avt-rtp-mvc-02.txt

[17] M. Oksanen, A. Tikanmäki, A. Gotchev, and I. Defee, "Delivery of 3D Video over DVB-H: Building the Channel," in Proc. NEM Summit, 2008.

[18] G. Faria, J. A. Henriksson, E. Stare, and P. Talmola, "DVB-H: Digital Broadcast Services to Handheld Devices," Proc. IEEE, vol. 94, no. 1, pp. 194-209, Jan. 2006.

[19] C. Fehn, "Depth-Image-Based Rendering (DIBR), Compression, and Transmission for a Flexible Approach on 3DTV," Dissertation, Technical University of Berlin. Mensch & Buch Verlag, Berlin, Germany, 2006. ISBN 3-86664-118-4.

[20] ETSI, EN 300 744, "Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television," V1.5.1 (2004-11).

[21] M. Aparicio (Ed.), "WING TV - Services to Wireless, Integrated, Nomadic, GPRS-UMTS & TV Handheld Terminals, D6: Common Field Trials Report," 2006.

[22] T. Bouttevin (Ed.), "WING TV - Services to Wireless, Integrated, Nomadic, GPRS-UMTS & TV Handheld Terminals, D8: Wing-TV Measurement Guidelines & Criteria," 2006.

[23] "WING TV - Services to Wireless, Integrated, Nomadic, GPRS-UMTS & TV Handheld Terminals, D15: Simulation Report," 2006.

[24] Microsoft Research, 3D Video Download. [Online]. Available: http://research.microsoft.com/IVM/3DVideoDownload/
[25] FATCAPS: A Free, Linux-Based Open-Source DVB-H IP-Encapsulator. [Online]. Available: http://amuse.ftw.at/downloads/encapsulator

[26] JustDvb-It. [Online]. Available: http://www.cineca.tv/labs/mhplab/JustDVbIt%202.0.html

[27] Decaps software. [Online]. Available: http://www.mobile3dtv.eu/

[28] ETSI, TR 102 377, "Digital Video Broadcasting (DVB); DVB-H Implementation Guidelines," V1.2.1 (2005-11).

[29] ETSI, EN 301 192, "Digital Video Broadcasting (DVB); DVB Specification for Data Broadcasting," V1.4.1 (2004-11).

Mobile 3DTV Content Delivery Optimization over DVB-H System

MOBILE3DTV - Mobile 3DTV Content Delivery Optimization over DVB-H System - is a three-year project which started in January 2008. The project is partly funded by the European Union 7th RTD Framework Programme in the context of the Information & Communication Technology (ICT) Cooperation Theme. The main objective of MOBILE3DTV is to demonstrate the viability of the new technology of mobile 3DTV. The project develops a technology demonstration system for the creation and coding of 3D video content, its delivery over DVB-H, and its display on a mobile device equipped with an auto-stereoscopic display. The MOBILE3DTV consortium is formed by three universities, a public research institute, and two SMEs from Finland, Germany, Turkey, and Bulgaria. The partners span diverse yet complementary expertise in the areas of 3D content creation and coding, error resilient transmission, user studies, visual quality enhancement, and project management. For further information about the project, please visit www.mobile3dtv.eu.

Tuotekehitys Oy Tamlink (FINLAND): Project coordinator

Tampereen Teknillinen Yliopisto (FINLAND): Visual quality enhancement, Scientific coordinator

Fraunhofer Gesellschaft zur Förderung der angewandten Forschung e.V. (GERMANY): Stereo video content creation and coding

Technische Universität Ilmenau (GERMANY): Design and execution of subjective tests

Middle East Technical University (TURKEY): Error resilient transmission

MM Solutions Ltd. (BULGARIA): Design of prototype terminal device