Multimedia Conferencing Standards

Sub: Multimedia Communication Batch : BIT VIII CHAPTER 6

Multimedia Conferencing Standards DAVID LINDBERGH


6.1 INTRODUCTION

The International Telecommunication Union Telecommunication Standardization Sector (ITU-T, known before 1993 as the CCITT) has produced a number of international standards ("Recommendations," in ITU parlance) for real-time digital multimedia communication, including video and data conferencing. This chapter covers the most important of these standards, the ITU-T H-series, including H.320 through H.324, and H.310, together with their associated video and audio codecs and component standards, as well as the ITU-T T.120 series for data/graphics conferencing and conference control. Audio and video codecs used by the ITU-T H-series are covered from a systems viewpoint, focusing on what the codecs do, not how they do it. Table 6.1 summarizes the ITU-T H-series standards, their target networks, and the basic video, audio, multiplex, and control standards for each.

Table 6.1 ITU-T Multimedia Conferencing Standards (Basic Modes)

Standard       Network         Video    Audio     Multiplex   Control
H.320 (1990)   ISDN            H.261    G.711     H.221       H.242
H.321 (1995)   ATM/B-ISDN      Adapts H.320 to ATM/B-ISDN network
H.322 (1995)   IsoEthernet     Adapts H.320 to IsoEthernet network
H.323 (1996)   LANs/Internet   H.261    G.711     H.225.0     H.245
H.324 (1995)   PSTN            H.263    G.723.1   H.223       H.245
H.310 (1996)   ATM/B-ISDN      H.262    MPEG-1    H.222       H.245

All ITU-T H-series systems standards support real-time conversational two-way video and audio (limited to one stream of each in H.320, H.321, and H.322), with provisions for optional data channels for T.120 data/graphics conferencing and other purposes. Extensions allow multipoint operation (in which three or more sites can join in a group conference) and, in some systems, encryption, remote control of far-end cameras, and broadcast applications. Each standard specifies a common baseline mode to guarantee interoperability but allows other optional modes, both standard and nonstandard, to be negotiated automatically using the control protocol.

These systems fall into two generations. H.320, H.321, and H.322 are first-generation standards, based on H.320 for ISDN networks, which was approved in 1990. H.321 and H.322 specify the adaptation of H.320 terminals for use on ATM and IsoEthernet networks, respectively. H.323, H.324, and H.310 are the second-generation H-series system standards. Approved in 1995 and 1996, they benefit from industry's experience with H.320, avoiding the problems and limitations that were discovered. They all use the new H.245 control protocol and support a common set of improved media codecs.

© 2000 College of Information Technology & Engineering. All rights reserved, Circulation copy for CITians only.


H.324, which, like H.320, is intended for low-bit-rate circuit-switched networks [initially the analog public switched telephone network (PSTN), often known as plain old telephone service (POTS)], makes use of some H.320 extension standards, including H.233/H.234 encryption and H.224/H.281 far-end camera control. All these H-series terminals can interoperate with each other through appropriate gateways and can participate in multipoint conferences, as illustrated in Figure 6.1.

6.2 H.320 FOR ISDN VIDEOCONFERENCING

The ITU-T H.320 standard, known during its development as "px64" for its use of bandwidth in 64-Kbit/s increments, covers videoconferencing and videotelephony over ISDN and switched-56 circuits at rates from 56 Kbit/s to 2 Mbit/s. Like the other H-series systems, H.320 supports real-time conversational two-way video and audio (one channel each), with provisions for optional data channels. Extensions allow multipoint operation (in which three or more sites can join in a group conference), encryption, remote control of far-end cameras, and broadcast applications. H.320 was developed during the late 1980s and approved by the CCITT (now ITU-T) in 1990. It was the first successful low-bit-rate video communications standard and remains the universally accepted standard for ISDN videoconferencing.

6.2.1 The H.320 Standards Suite

The H.320 document is a "systems standard" that calls out a number of other ITU-T standards for various parts of the system, as shown in Figure 6.2. The core components of H.320 are the following:
• H.221 multiplex: Mixes audio, video, data, and control information into a single bit stream. Uses synchronous time division multiplexing with 10-ms frames.
• H.230/H.242 control: Mode control commands and indications, capabilities exchange. Operates over a fixed 400-bit-per-second (bps) channel (BAS) in the H.221 multiplex.
• H.231/H.243 multipoint: Specifies central multipoint bridges and operation for multiway group conferences (optional in the H.320 standard, but universally implemented).
• H.261 video coding: Compresses color motion video into a low-rate bit stream. Quarter Common Intermediate Format (QCIF) (176 x 144) and Common Intermediate Format (CIF) (352 x 288) resolutions.
• G.711 audio coding: 8-kHz sample rate, 8-bit log-PCM (64 Kbit/s total) for toll-quality narrowband audio (3-kHz bandwidth).
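The figures quoted in the component list can be cross-checked with a little arithmetic. The following Python sketch is illustrative only: the constant and function names are hypothetical, and it assumes a single 64-Kbit/s channel and 8-bit BAS control codes.

```python
# Illustrative arithmetic only; all names here are hypothetical.

G711_SAMPLE_RATE_HZ = 8_000   # G.711 sample rate
G711_BITS_PER_SAMPLE = 8      # log-PCM sample size
H221_FRAME_MS = 10            # H.221 frame duration
BAS_NET_RATE_BPS = 400        # net rate of the BAS control channel
CHANNEL_RATE_BPS = 64_000     # assumption: one 64-Kbit/s channel


def g711_bit_rate_bps(bits_per_sample: int = G711_BITS_PER_SAMPLE) -> int:
    """Bit rate of log-PCM audio: samples per second times bits per sample."""
    return G711_SAMPLE_RATE_HZ * bits_per_sample


def bits_per_h221_frame(channel_rate_bps: int = CHANNEL_RATE_BPS) -> int:
    """Number of bits carried by one 10-ms frame on the given channel."""
    return channel_rate_bps * H221_FRAME_MS // 1000


def bas_codes_per_second(code_bits: int = 8) -> float:
    """How many control codes of `code_bits` bits fit in the BAS channel per second."""
    return BAS_NET_RATE_BPS / code_bits


def pixel_count(width: int, height: int) -> int:
    """Luminance pixels in one picture of the given resolution."""
    return width * height


if __name__ == "__main__":
    print("G.711 bit rate (bit/s):", g711_bit_rate_bps())
    print("Bits per H.221 frame:", bits_per_h221_frame())
    print("BAS codes per second:", bas_codes_per_second())
    print("CIF/QCIF pixel ratio:", pixel_count(352, 288) / pixel_count(176, 144))
```

The point of the sketch is that the control channel is slow: at 400 bit/s it carries only a few dozen 8-bit codes per second, so the control messages themselves have to be compact.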

Baseline H.320 components are shown in bold in Figure 6.2 and include QCIF resolution H.261 video, G.711 logarithmic Pulse Code Modulation (PCM) audio, the H.221 multiplexer, and H.230/H.242 control. Improved standard modes, such as H.263 video, Common Intermediate Format (CIF) resolution H.261 video, and improved-quality or lower-bit-rate audio modes (which leave more bandwidth for video), can be negotiated using control protocol procedures, as can nonstandard or proprietary modes.
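To see why a lower-bit-rate audio mode matters, consider how much of a call is left for video once audio is accounted for. The sketch below is a simplified illustration, not a description of the H.221 bit allocation: the function name and the `overhead_kbps` parameter are hypothetical, and the overhead defaults to zero rather than modeling the real framing and control channels. The sample figures are a 128-Kbit/s call carrying either 64-Kbit/s or 16-Kbit/s audio.

```python
def video_budget_kbps(call_rate_kbps: float,
                      audio_rate_kbps: float,
                      overhead_kbps: float = 0.0) -> float:
    """Bandwidth left for video after audio and (assumed) overhead are subtracted.

    `overhead_kbps` is a hypothetical stand-in for framing, control, and data
    channels; it is zero by default for simplicity.
    """
    budget = call_rate_kbps - audio_rate_kbps - overhead_kbps
    if budget < 0:
        raise ValueError("audio and overhead exceed the call rate")
    return budget


if __name__ == "__main__":
    call = 128.0  # sample figure: two 64-Kbit/s channels
    for label, audio in (("64-Kbit/s audio", 64.0), ("16-Kbit/s audio", 16.0)):
        print(f"{label}: {video_budget_kbps(call, audio)} Kbit/s left for video")
```

Under these simplifying assumptions, dropping from 64-Kbit/s to 16-Kbit/s audio raises the video share of a 128-Kbit/s call from 64 to 112 Kbit/s.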


In addition to the core components of H.320, optional standard extensions support remote control and pointing of far-end cameras (H.224/H.281), encryption according to H.233/H.234, and data conferencing using T.120 for sophisticated graphics and conference control. T.120 supports applications like Joint Photographic Experts Group (JPEG) still image transfer, shared document annotation, and personal computer (PC) application sharing. The H.331 standard (not shown) specifies how H.320 terminals can be used for low-bit-rate broadcast (send or receive only) applications.

6.2.2 Multiplex

The multiplexer component of a multimedia conferencing system mixes together the audio, video, data, and control streams into a single bit stream for transmission. In H.320, the H.221 time division multiplexer (TDM) is used for this purpose. H.221 supports a total of eight independent media channels, not all of which are present in every call. The BAS and FAS channels carry H.320 system control and frame synchronization information and are always required. There is provision for one channel each of audio and video and three user data channels, LSD, HSD, and MLP. The optional ECS channel carries encryption control messages if encryption is used. The multiplexing scheme uses repeating 10-ms frames of 80 bytes (640 bits) each. Various bit positions within each frame are allocated to the different channels in use. This system makes efficient use of the available bandwidth, except that the allocation of bits to different channels can change among only a small number of allowed configurations.

6.2.3 System Control Protocol

The control protocol operates between a pair of H.320 systems to govern the terminals' overall operational mode, including negotiation of common capabilities, selection of video and audio modes, opening and closing of channels, and transmission of miscellaneous commands and indications to the far end. H.242 and H.230 together define the basic H.320 control protocol and procedures. All H.320 control messages are drawn from tables in the H.221 standard, with each 8-bit code assigned a particular meaning. Each table entry is called a codepoint. Longer or less frequently used messages can be sent as multibyte messages, using escape values in the initial table. The meaning of and procedures for using the messages are defined variously in H.221, H.230, H.242, and, for multipoint-related messages, H.243. The messages are sent directly in the (net) 400-bit/s H.221 BAS channel without any packetization, headers, or CRC, and without the use of any acknowledgment or retransmission protocol. The low error rate of Integrated Services Digital Network (ISDN) channels, combined with the forward error correction applied to all BAS messages, ensures that control messages are nearly always received without error. Although this system has worked well for H.320 terminals, the potential for undetected errors in the control channel sometimes results in added procedural complexity, such as redundant transmission. The newer H.245 control protocol, which replaces H.230/H.242 control in the second-generation ITU-T conferencing standards, is based instead on a reliable link layer that retransmits errored messages automatically until a positive acknowledgment is received, thus allowing the control protocol to assume that messages are guaranteed to arrive correctly.

6.2.4 Audio Coding

6.2.4.1 G.711 Baseline Audio

The baseline audio mode for H.320 is the G.711 log-PCM codec, a simple 8-kHz sample rate logarithmic PCM scheme that has long been used as the primary voice telephony codec for digital telephone networks (long-distance voice telephone calls are today carried on digital networks, even if they originate from analog telephones). G.711 is defined to use 8-bit samples, for a total bit rate of 64 Kbit/s, but for use with H.320 each sample can be truncated to 6 or 7 bits, resulting in alternative bit rates of 48 or 56 Kbit/s. G.711 provides excellent toll-quality narrowband (3-kHz audio bandwidth) audio with insignificant codec delay (well under 1 ms) and very low implementation complexity. To provide compatibility with normal G.711 voice telephone calls, all H.320 calls start by sending and receiving G.711 audio while performing initial synchronization and mode negotiation in the H.221 FAS and BAS channels. Unfortunately, G.711 specifies two alternative encoding laws, A-law and μ-law; both schemes were already in use in different parts of the world at the adoption of G.711, and the CCITT was unable to agree on a single law. As a result, H.320 systems must attempt to automatically detect the coding law in use by the far end at the start of each call or avoid using audio until H.320 control procedures can be used to establish another audio mode.

6.2.4.2 Lip Sync

Audio codecs generally involve less delay than video codecs. H.320 provides lip synchronization, in which the decoded audio output matches lip movements displayed in the video, by adding delay in the audio path of both the transmitter and receiver, keeping the audio and video signals roughly synchronized as they are transmitted. This makes it impossible for receivers to present audio with minimal delay if the user does not want lip sync. Second-generation systems support lip sync without adding audio delay at the transmitter, instead using timestamp or time-skew methods that let the receiver alone add all necessary audio delay if desired.

6.2.4.3 Optional Audio Modes

G.711 was chosen as the baseline H.320 audio mode for its low complexity and compatibility with ordinary telephone traffic, but it is quite inefficient in its use of bandwidth compared to optional H.320 audio modes. The data bandwidth saved by switching to an alternative audio mode can be used to send additional video bits, making a big difference to H.320 video quality, especially on common 2-B (128-Kbit/s) H.320 calls. Table 6.2 summarizes the set of approved and planned ITU-T audio codecs used in H-series conferencing. [Note that narrowband codecs pass 200-3400 Hz audio, and wideband codecs pass 50-7000 Hz. In Table 6.2 and throughout this chapter, these are referred to as "3-kHz" and "7-kHz" audio bandwidth codecs, respectively. Also note that audio codec delay is highly dependent on implementation. The delay values in Table 6.2, and in the rest of this chapter, use a (3 × frame size) + look-ahead rule of thumb, which includes algorithmic delay, one frame time for processing, and one frame time for buffering. Gibson et al. (1998) provides a more complete discussion of codec delay and other noncodec factors contributing to total end-to-end delay.] The most important and commonly supported optional H.320 audio modes are G.728, for 16-Kbit/s narrowband audio, and G.722, for 56-Kbit/s wideband (7-kHz audio bandwidth) audio.

Table 6.2 Audio Codecs Used in Multimedia Conferencing

Standard         Bit Rates (Kbit/s)   Audio Bandwidth (kHz)   Complexity (Fixed-Point MIPS)   Frame Size
G.711 (1977)     48, 56, 64           3                       Near zero                       125 μs
G.728 (1992)     16                   3                       ~35-40                          625 μs
G.723.1 (1995)   5.3, 6.4             3                       ~18-20                          30 ms
G.729 (1995)     8                    3                       ~18                             10 ms
G.729A (1996)    8                    3                       11                              10 ms
G.722 (1988)     48, 56, 64           7                       ~10                             125 μs
G.722.1 (1999)   24, 32               7                       ~14                             20 ms

Delay