Eve M. Schooler and Stephen L. Casner

A Packet-switched Multimedia Conferencing System

Reprinted from the ACM SIGOIS Bulletin, Vol. 1, No. 1, pp. 12-22 (January 1989).

University of Southern California
Information Sciences Institute
4676 Admiralty Way, Marina del Rey, CA 90292-6695
213-822-1511

This research was sponsored by the Defense Advanced Research Projects Agency under contract number MDA903-87-C-0719. Views and conclusions contained in this report are the authors’ and should not be interpreted as representing the official opinion or policy of DARPA, DSSW, the U.S. Government, or any person or agency connected with them.

A Packet-switched Multimedia Conferencing System

Eve M. Schooler and Stephen L. Casner
Information Sciences Institute
University of Southern California
4676 Admiralty Way, Marina del Rey, CA 90292
213-822-1511

1. Overview

The Multimedia Conferencing project, a collaborative effort between ISI and BBN Systems and Technologies Corporation, has developed an experimental system for real-time, multisite conferences. The three essential components of the conferencing environment are voice, video, and a shared workspace. The underlying communication model is based on packet-switched voice and video protocols for real-time data, operating in the Internet [8]. Due to high bandwidth requirements, communication is currently possible only over the Wideband Satellite Network (WBnet) [5]. The BBN Diamond/MMCONF system [7][11] provides a shared workspace and is used for the presentation and collaborative editing of mixed-media documents. It runs on a Sun workstation along with the multimedia conference control program (MMCC) that establishes conference connections and controls the video cameras and monitor.

This paper describes the architecture of the multimedia conferencing system, giving an introduction to each of the system's components. It presents the prototype conference configuration at ISI, describes the usage of the system, and discusses our experience with the system. It concludes with a summary of future directions.

1.1 Voice and Video

During a conference, video images from all sites (up to four) are displayed simultaneously in quadrants on the video screen. Since the video codec takes two camera inputs, each site has a room-view camera plus a copy-stand camera. The room-view camera captures the panoramic view of the conference participants at a site, while the copy-stand camera is intended for separate slides or graphic stills. There are remote controls for the room-view camera, enabling it to pan from left to right, to zoom in for a closer view, or to adjust the focus.

At each site, voice data from the remote sites is mixed for playback, allowing all sites to talk at once if they wish. Voice is played back through headphones so that it will not be picked up by the open microphone and echoed back to the remote sites. The headphones are attached to individual infrared or radio receivers so participants are not wired to the table. This arrangement may be replaced in the future by a loudspeaker and an acoustic echo canceller if adequate cancellation can be achieved.
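As an illustration of the playback mixing described above, the following minimal C sketch sums one frame of decoded voice samples from up to three remote sites into a single output frame, saturating rather than wrapping when several sites talk at once. The function and type names are hypothetical; the paper does not specify the actual mixing code inside the Voice Funnel.

    /*
     * Minimal sketch of digital voice mixing for playback, assuming 16-bit
     * linear PCM samples after mu-law decoding.  Names are hypothetical.
     */
    #include <stdint.h>
    #include <stddef.h>

    #define MAX_REMOTE_SITES 3   /* up to four sites total, so three remote */

    /* Sum one frame of samples from each active remote site, saturating to
     * the 16-bit range so simultaneous talkers do not wrap around. */
    void mix_remote_voice(const int16_t *site_frames[MAX_REMOTE_SITES],
                          int active_sites,
                          int16_t *out, size_t samples_per_frame)
    {
        for (size_t i = 0; i < samples_per_frame; i++) {
            int32_t acc = 0;
            for (int s = 0; s < active_sites; s++)
                acc += site_frames[s][i];
            if (acc > INT16_MAX) acc = INT16_MAX;   /* clip positive overflow */
            if (acc < INT16_MIN) acc = INT16_MIN;   /* clip negative overflow */
            out[i] = (int16_t)acc;
        }
    }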


Figure 1. Multimedia Conferencing System. [Diagram: the conference room (microphone, mixer, transmitter, receiver and headphones, video cameras, video monitor) and a multimedia workstation on the Ethernet; the video codec plus control PC (NTSC and RGB video); and the Voice Funnel/ST Gateway Butterfly containing the STNI (Switched Telephone Network Interface), VT (voice packetizer), PVP (video packetizer), and VF (Voice Funnel/ST Gateway) components, connected to a BSAT.]

Voice and video are processed by separate programs in a dedicated BBN Butterfly multiprocessor called the Voice Funnel (see Figure 1). The VT and PVP programs packetize the voice and video data, provide reordering and buffering of packets on receipt, et cetera. The VF program acts as a gateway to the WBnet, allocating reserved bandwidth for voice and video traffic. It is connected as a host to a Butterfly Satellite IMP (BSAT) [5], the switch node of the WBnet.

Video is compressed and decompressed at the rate of 128 Kb/s by the Image 30 video codec from Concept Communications, Inc. Commercially available codecs, including the Image 30, are designed for use with circuit-switched communication links, not packet-switched networks. The Image 30, however, has several features that made it easier to adapt than others:

o  Data generated for each video frame is absolute, not relative, and is processed independently. Therefore loss of a packet does not require resynchronization. Typically packet loss only results in a small area of the screen not being updated.

o  Because the compression and decompression delay is less than 50 milliseconds total, compared to 500 milliseconds for some other commercial codecs, more packet-switching delay can be tolerated.

o  The RS449 interface is programmable for HDLC framing as required to connect to the high-speed ports of the Butterfly multiprocessor.

o  It is capable of screen quadrant partitioning.

The Image 30 was originally designed for point-to-point conferencing; its firmware was modified by ISI to accept a merged stream of packets from multiple sites and display them in separate quadrants. The broadcast satellite network allows each site to receive packets from all the others. The PVP program queues the packet streams separately and then delivers a merged stream to the codec.

The analog voice signal from the microphone is converted to a 64 Kb/s PCM digital signal by the Switched Telephone Network Interface (STNI) in the Voice Funnel. Digitized voice information is forwarded by the STNI to the VT program for packetization. Likewise, incoming digital voice information from all sites is mixed digitally and then converted back to an analog signal through the STNI. Essentially, the STNI acts as a voice codec, but does not perform any compression or decompression of data.

Although currently unused, one of the original functions of the STNI was to allow use of the packet voice system from any telephone. The STNI converts touch-tone input to ASCII characters for recognition by the VT program. While conference connections can still be initiated this way, they are normally handled by the MMCC program.
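The per-site queuing and merging performed by the PVP program can be pictured with the following sketch. It is an illustration under stated assumptions, not the BBN/ISI implementation: per-site ring buffers are drained into a single stream for the codec, and a gap in the sequence numbers is simply noted and skipped, since each packet carries absolute frame data.

    /*
     * Illustrative merge of per-site video packet streams into a single
     * stream for the codec.  Structure and policy (round-robin drain with a
     * per-site sequence check) are assumptions for exposition only.
     */
    #include <stdint.h>
    #include <stddef.h>

    #define MAX_SITES   4
    #define QUEUE_DEPTH 8

    struct video_packet {
        uint16_t seq;              /* per-site sequence number */
        uint8_t  site;             /* which site/quadrant this packet belongs to */
        size_t   len;
        uint8_t  data[1024];
    };

    struct site_queue {
        struct video_packet pkts[QUEUE_DEPTH];
        int head, tail;            /* simple ring-buffer indices */
        uint16_t next_seq;         /* next sequence number expected */
    };

    /* Pop the oldest queued packet for one site, or return NULL if empty. */
    static struct video_packet *dequeue(struct site_queue *q)
    {
        if (q->head == q->tail)
            return NULL;
        struct video_packet *p = &q->pkts[q->head];
        q->head = (q->head + 1) % QUEUE_DEPTH;
        return p;
    }

    /* Drain the per-site queues in round-robin order into the merged stream
     * handed to the codec.  A missing sequence number needs no
     * resynchronization; that screen region simply is not updated. */
    void deliver_merged_stream(struct site_queue queues[MAX_SITES],
                               void (*send_to_codec)(const struct video_packet *))
    {
        for (int s = 0; s < MAX_SITES; s++) {
            struct video_packet *p;
            while ((p = dequeue(&queues[s])) != NULL) {
                if (p->seq != queues[s].next_seq) {
                    /* gap detected: continue without resynchronizing */
                }
                queues[s].next_seq = (uint16_t)(p->seq + 1);
                send_to_codec(p);
            }
        }
    }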


1.2 Protocols

The Internet Protocol (IP) [9] and Transmission Control Protocol (TCP) [10] are widely used in the DARPA Internet for data-oriented packet communication. These protocols, however, are inappropriate for real-time packet voice and video. TCP uses an end-to-end acknowledgement and retransmission strategy to ensure reliable delivery of data, but does so by adding extra delay. For interactive voice and video, it is more important to minimize delay than to ensure reliable delivery [1]. Therefore, an alternate suite of protocols was developed: the Stream Protocol (ST) [6][12], Network Voice Protocol (NVP) [2][3], and Packet Video Protocol (PVP) [4]. Using this set of protocols, the conferencing system tolerates occasional packet damage or loss without severe degradation in quality.

ST operates at the same level as IP but is connection-oriented instead of datagram-oriented (see Figure 2). Features include network resource reservation facilities, multidestination addressing for multisite conferences, small packet headers to reduce delay, and aggregation of packets from multiple users into more efficient large packets for transmission across long-haul paths. Unlike IP gateways, ST gateways such as VF must keep track of state information about connections established through them. Connection setup involves processing in both hosts and gateways to establish a static route according to resource requirements.
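To make the connection-oriented model concrete, the following hypothetical C structure gathers the kind of information an ST-style setup request carries: a multidestination target list and a flow specification against which hosts and gateways reserve resources. The field names are invented for illustration; the actual ST message formats are defined in [6] and [12].

    /*
     * Hypothetical sketch of an ST-style connection setup request: a
     * multidestination address list plus a flow specification used for
     * resource reservation along the route.  Field names are invented.
     */
    #include <stdint.h>

    #define MAX_TARGETS 4

    struct flow_spec {
        uint32_t peak_bits_per_sec;   /* e.g. 64000 for PCM voice, 128000 for video */
        uint32_t max_packet_bytes;    /* largest packet the connection will carry */
        uint32_t max_delay_ms;        /* delay bound requested along the route */
    };

    struct st_connect_request {
        uint32_t connection_id;            /* state kept by each ST gateway on the path */
        uint32_t origin;                   /* address of the calling host */
        uint32_t targets[MAX_TARGETS];     /* multidestination address list */
        int      target_count;
        struct flow_spec flow;             /* resources to reserve end to end */
    };

Because every ST gateway along the route keeps per-connection state, a request of this form must be processed hop by hop before any data flows, which is the host-and-gateway setup cost noted above.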

Figure 2. Protocol Relationships. [Diagram: NVP and PVP are carried over ST, TCP over IP; ST operates at the same level as IP.]

The specifications for NVP and PVP both consist of a control protocol and a data protocol. The protocol messages are to be carried in IP datagrams before the ST connection is established (although the current PVP implementation does not follow this model). This allows the callee to accept or reject the connection and to verify the compatibility of the data encoding algorithms. Once the encoding has been selected, the appropriate resource allocation parameters are then used to establish the ST connection. The data protocols for NVP and PVP simply specify the headers to be included in data packets. The header fields provide sequencing and other control information as required for the particular medium.
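For illustration, a data-packet header in the spirit of the NVP and PVP data protocols might look like the following C structure, carrying sequencing and timing information for the receiver. The layout shown is an assumption for exposition only and does not reproduce the published NVP-II or PVP formats [2][3][4].

    /*
     * Illustrative per-packet media header: sequencing plus timing
     * information the receiver uses for reordering and playout.
     * The exact field layout here is assumed, not the published formats.
     */
    #include <stdint.h>

    struct media_data_header {
        uint16_t sequence;    /* detects loss and reordering */
        uint16_t flags;       /* medium-specific control bits, e.g. start of talkspurt or frame */
        uint32_t timestamp;   /* sampling instant, used for receiver playout timing */
    };

A timestamp of this kind lets the receiver smooth out network jitter without retransmission, which is the property that makes the suite suitable for real-time traffic where TCP is not.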


1.3 Multimedia Conferencing Control

The user interface for establishment of voice and video connections is the multimedia conferencing control program (MMCC). We have implemented a control panel that appears as a small window with eight software buttons at the top of the workstation screen. The current version of MMCC is responsible for connection management and video camera and monitor control.

MMCC establishes a conference connection by communicating with the VT and PVP programs to create separate voice and video connections. Connections are initiated with the "connect" button, which presents the user with a pop-up menu of sites from which to choose. Multiple sites may be selected at once. Connections are terminated with the "disconnect" button. If there are only two sites in the conference, the disconnect button completely shuts down the connection. Otherwise, it only disconnects one's own site, leaving the other participants still in conference. A site's conference status is always reflected in the header of the MMCC window.

MMCC buttons are provided to control switching between room-view and copy-stand cameras. Buttons also exist to control remote camera selection, although it is possible to restrict this functionality if desired. There is also the option to use the video monitor in window mode or full-screen mode. The monitor can be partitioned into four quadrants (windows) with video information from four different sites. Alternatively, any one of the four quadrants can be expanded to fill the full screen. Eventually, as conferences scale up to include more sites than it is possible to display, a new site selection scheme will become imperative.

1.4 Diamond/MMCONF

The live audio and video components of the conferencing system are combined with a shared workspace for collaboration on computer-based Diamond multimedia documents [11]. A Diamond mixed-media document may contain text, graphics, bit-mapped images, speech, and spreadsheets. Such a document may serve as the basis for the conference's technical discussions, the meeting's agenda, or for note-taking as the conference progresses.

MMCONF [7] runs on a workstation and acts as a conferencing umbrella program to Diamond (or other programs), in that it allows the Diamond editor, which is otherwise a single-user program, to be shared among conference participants. It takes keyboard and mouse inputs generated by any participant who has the conference "floor" and duplicates them at all sites. Once a Diamond mixed-media document is replicated at each site, MMCONF keeps the copies in synchronization during display and editing. If a section of text is highlighted by one participant, all participants see it. If a speaker scrolls through a document or uses the mouse as a pointing device, these events are echoed at every copy of the document. Finally, MMCONF provides some degree of authentication of conference users. When MMCONF connections are established, the callee must accept the connection by giving his or her login identification and password to protect access to files.
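The floor-control behavior of MMCONF can be sketched as follows: input events are honored only when they come from the site holding the conference floor, then applied locally and echoed to every other site so all copies of the document stay in step. The types and routines below are hypothetical placeholders, not MMCONF's actual interfaces.

    /*
     * Conceptual sketch of floor-controlled input duplication.  Events from
     * the floor holder are applied to the local document copy and broadcast
     * to all other sites; everything else is ignored.  Names are assumed.
     */
    #include <stdint.h>

    enum event_kind { KEY_PRESS, MOUSE_MOVE, MOUSE_CLICK, SCROLL };

    struct input_event {
        enum event_kind kind;
        uint32_t site_id;   /* site where the event originated */
        int32_t  x, y;      /* pointer position, if applicable */
        uint32_t key;       /* key code, if applicable */
    };

    /* Stub: update the local copy of the shared document. */
    static void apply_event_locally(const struct input_event *ev) { (void)ev; }

    /* Stub: forward the event to every other conference site. */
    static void broadcast_event(const struct input_event *ev) { (void)ev; }

    static uint32_t floor_holder;   /* site currently holding the conference "floor" */

    void handle_input(const struct input_event *ev)
    {
        if (ev->site_id != floor_holder)
            return;                    /* only the floor holder may edit */
        apply_event_locally(ev);
        broadcast_event(ev);           /* keep all replicated copies synchronized */
    }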


Figure 3. ISI Conference Room. [Diagram (dimensions 11' and 18' marked): room-view camera and video monitor, copy-stand camera, workstation monitors, microphone, backup speakerphone and remote camera control, drapes, and sound-absorbing panels.]


Currently, the MMCONF program operates independently of the MMCC program, which controls voice and video. Later, these two will be merged into one integrated conferencing program.

2. Site Configuration

The teleconference room at ISI has a custom-designed half-circle table, 6' in diameter, with angled insets (from the flat side of the half circle) where two Sun monitors are mounted (see Figure 3). At ISI we have slaved the two monitors to one Sun-3 for better viewing and have angled the monitors back to keep them out of the way.

The geometry is somewhat tricky. The room-view camera angle is about 60 degrees. Using stacking chairs that are narrower than normal conference room chairs, six people can sit around the table while viewing the monitors and still remain within the field of view of the camera. The 37'' video monitor is placed on its own stand and the room-view camera is mounted on top of it. These go at the far end of the room. Because the camera is coaxial with the monitor, participants face the camera when they look at the monitor. The conference room also has a side table where a copy-stand camera has been placed.

3. Using the Multimedia System

The prototype conferencing system has been used for several all-day tele-meetings. Most meetings have been between two sites, but it is now possible to conference among three or more sites. The first multisite meeting, a two-day meeting among people located at BBN, DARPA, and ISI, took place in July 1988. It was felt that the three-site conference was effective, but that participants needed to concentrate more, resulting in a more tiring session than in previous teleconferences with two sites. A contributing factor may have been that this meeting included the largest total number of participants of any thus far; 2-4 people per site is best. Two mechanisms were suggested to help with this problem:

1. As a guide to the chairman, a tool running on the workstation showing a queue of people who would like to speak, as an improvement over hands held in the air.

2. The ability to zoom in, audibly and/or visually, to help focus the conference during discussions.

Even though video zooming is possible now, it may not be easy enough to use except by a non-participant operator.

One weakness in the current implementation of the conferencing system is the limited resolution of the video image, about 128x100. While the resolution has been described as very usable, it is not comparable to ordinary television resolution. Although other commercial codecs give better resolution, they cost substantially more and are not tolerant of packet loss. We expect in one or two years' time to see improvements in the products of Concept and others that will provide better resolution for lower cost. In the meantime, we are investigating how to make the best use of the available resolution and whether improvements could be made by trading off other parameters.

Although the copy-stand camera can display paper documents, the resolution of the motion video is often insufficient. In the future, we plan to improve this by implementing a higher-resolution still-frame mode. The best resolution is obtained by creating documents using the text and graphics capabilities of the Diamond system and then displaying them directly through MMCONF. As an alternative, sites with scanners can scan documents on paper into bitmaps to be shown in the MMCONF window, approximately 1000x800 pixels. For a large number of pages, however, this may be inconvenient. Since bitmap files tend to be large, it may also use up a considerable amount of disk space.

Another outstanding issue is sound quality. Conference participants wear headphones; some users have complained that they find headphones inconvenient, while others have noticed a crisper sound through the headsets. An alternative approach has been to use loudspeakers with echo cancellation hardware. Our first pass at such a configuration was only somewhat effective. While participants enjoyed the freedom from headphones, conversations seemed hard to hear, remote participants seemed farther away, and people consequently raised their voices. Headphones seem to offset the problem of participants shouting during conferences: since the remote voices are right at one's ears, people relax their own voices to normal levels. We would like eventually to use a loudspeaker and echo canceller if satisfactory sound quality can be achieved.

4. Future Issues

Other aspects of the project continue to evolve. There is an ongoing effort at BBN to integrate ST gateway functions, currently implemented only in the VF program, with the IP gateway functions of the existing Butterfly Internet Gateway. This would allow the Butterfly used in the conferencing system to serve a dual purpose: conferencing and normal gateway services. Butterfly Internet Gateways are already installed in many locations around the Internet, so the ST protocol would become available for use in more applications. As more high-speed networks become part of the Internet, conferencing will not be constrained to the WBnet.

Our goal is to provide a unified interface to the teleconferencing environment that transparently handles the underlying pieces of system software. Therefore the functionality of Diamond/MMCONF should be integrated with that of the MMCC program sometime in the future.

There are planned improvements to both the Diamond and MMCONF software. The Diamond editor has been undergoing changes to support embedded video segments in multimedia documents, the Office Document Architecture (ODA), and the MIT X window system interface. MMCONF is being redesigned for generalization, so that it may work in conjunction with programs besides Diamond. For example, a video-intensive map previewer tool is being prototyped for multi-user conferencing use. Another application in development to run under MMCONF is a conference presentation tool that should simplify the task of making presentations during the course of a conference.

Finally, the notion of making conferences private and/or secure needs to be addressed. Exclusive conferences should be provided. Perhaps a smart scheduler should be resident to remind non-participating sites that the conferencing medium has been reserved during certain times. Even though the Diamond/MMCONF system requires a login id and password before commencing a conference, the MMCC program does not. When these programs become more integrated, there should be one point at which login is required and which affects all aspects of the conference -- shared documents, as well as voice and video. A more serious degree of security could be enforced using encryption techniques.

Acknowledgements

This work is the result of the ongoing efforts of many people at ISI and BBN Systems and Technologies Corporation in addition to the authors: Dave Walden at ISI, and Lou Berger, Terry Crowley, Harry Forsdick, Phil Park, Vanessa Rudin, and Claudio Topolcic at BBN. This work has been funded by the Defense Advanced Research Projects Agency (DARPA).

References

[1] Casner, S.L., Cohen, D., Cole, E.R., "Issues in Satellite Packet Video Communication", Technical Report ISI/RS-83-5, USC/Information Sciences Institute, Marina del Rey, CA (July 1983).

[2] Cohen, D., "Specifications for the Network Voice Protocol", Technical Report ISI/RR-75-39, USC/Information Sciences Institute, Marina del Rey, CA (Mar 1976).

[3] Cohen, D., "A Network Voice Protocol NVP-II" and "Sample NVP/ST Scenarios" (unpublished memoranda), USC/Information Sciences Institute, Marina del Rey, CA (Apr 1981).

[4] Cole, E.R., "PVP - A Packet Video Protocol" (unpublished memorandum), W-Note 28, USC/Information Sciences Institute, Marina del Rey, CA (Aug 1981).

[5] Edmond, W.B., Blumenthal, S., Echenique, A., Storch, S., Calderwood, T., Rees, T., "The Butterfly Satellite IMP for the Wideband Packet Satellite Network", Proceedings ACM SIGCOMM, pp. 194-203 (Aug 1986).

[6] Forgie, J.W., "IEN 119: ST -- A Proposed Internet Stream Protocol" (unpublished memorandum), MIT Lincoln Laboratory (Sept 1979).

[7] Forsdick, H.C., "Explorations into Real-time Multimedia Conferencing", Proceedings 2nd International Symposium on Computer Message Systems, pp. 299-315 (Sept 1985).

[8] Leiner, B., Cole, R., Postel, J., Mills, D., "The DARPA Internet Protocol Suite", Proceedings INFOCOM '85, Computers and Communications Integration: The Confluence at Mid-decade (Mar 1985).

[9] Postel, J.B., "Internet Protocol", RFC 791, USC/Information Sciences Institute, Marina del Rey, CA (Sept 1981).

[10] Postel, J.B., "Transmission Control Protocol", RFC 793, USC/Information Sciences Institute, Marina del Rey, CA (Sept 1981).

[11] Thomas, R.H., Forsdick, H.C., Crowley, T.R., Schaaf, R.W., Tomlinson, R.S., Travers, V.M., Robertson, G.G., "Diamond: A Multimedia Message System Built on a Distributed Architecture", IEEE Computer, pp. 65-77 (Dec 1985).

[12] Topolcic, C., Park, P., Draft of "Proposed Changes to the Experimental Internet Stream Protocol (ST)", BBN Laboratories, Cambridge, MA (Apr 1987).

Note: RFC and IEN references are available from the Network Information Center, SRI International, Menlo Park, CA.

Appendix: Glossary

BSAT: Butterfly Satellite IMP.

Butterfly: The BBN Butterfly Parallel Processor, a tightly coupled, coarse-grained multiprocessor.

circuit-switching: Method by which a dedicated communication path is established for the transmission of a stream of data. Bandwidth is guaranteed and delay is equivalent to propagation time. The telephone system is a circuit-switched communication medium.

codec: Coder-decoder. Acts as an analog-to-digital converter (coder) and a digital-to-analog converter (decoder) of data. In the case of a video codec, the conversions are usually accompanied by compression and decompression.

Diamond: A multimedia document editor developed at BBN for combining text, graphics, spreadsheets, bitmaps, and voice data. It is the basis of the shared workspace in the Multimedia Conferencing project.

HDLC: High-level Data Link Control. An ISO (International Standards Organization) bit-oriented data link protocol.

IMP: Interface Message Processor. A specialized computer that serves as a network switching element.

IP: Internet Protocol. Provides connectionless service across multiple packet-switched networks.

MMCC: Multimedia Conferencing Control program. Establishes and terminates conference connections, and controls the video cameras and monitor.

MMCONF: Acts as a conferencing umbrella program to Diamond (or other programs), in that it allows the Diamond editor, which is otherwise a single-user program, to be shared among conference participants. It takes keyboard and mouse inputs generated by any participant who has the conference "floor" and duplicates them at all sites. Once a Diamond mixed-media document is replicated at each site, MMCONF keeps the copies in synchronization during display and editing.

NTSC: National Television System Committee (the title of the standard for composite video). In contrast to RGB, NTSC is a single composite color video signal.

NVP protocol: Network Voice Protocol. Works in conjunction with the Stream Protocol to provide real-time transmission of voice information over packet-switched networks.

packet-switching: Method by which messages and streams of data are subdivided into packets for transmission through a communication network. Each packet includes the destination address since it is independently stored and forwarded from node to node toward the destination.

PVP protocol: Packet Video Protocol. Works in conjunction with the Stream Protocol to provide real-time transmission of video information over packet-switched networks.

PVP program: Resides in the Voice Funnel Butterfly; packetizes video information received from the Image 30 video codec and reorders any out-of-order packets coming from the WBnet.

RGB: Individual red, green, and blue signals for color video.

ST: Stream Protocol. Operates at the same level as IP, but is connection-oriented instead of datagram-oriented and provides multicast delivery and mechanisms for resource reservation.

STNI: Switched Telephone Network Interface. Converts analog voice data to a 64 Kb/s PCM digital signal.

T1: A 1.544 Mb/s data communication circuit.

TCP: Transmission Control Protocol. A connection-oriented transport-level protocol designed to work with IP and to provide a reliable communication path.

VF: Voice Funnel program. Runs on the Voice Funnel Butterfly and serves as the ST gateway to the WBnet, allocating reserved bandwidth for voice and video traffic.

VT: Voice Terminal program. Runs on the Voice Funnel Butterfly and packetizes digital voice information received from the STNI into NVP packets, which it then forwards to the VF program.

WBnet: Wideband Network. A broadcast satellite network in which multiple BSAT nodes share a single 3 Mb/s channel using a packet-based demand-assigned multiple-access scheme.
