Remote Conferencing with Multimedia Objects

Ehud Gudes, Carmel Domshlak, and Natalia Orlov
Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel
{ehud,dcarmel,orlovn}@cs.bgu.ac.il

Abstract. Advances in the Internet infrastructure, with its high bandwidth, will provide for the first time the opportunity for cooperative work over the Internet. A typical example is a group of physicians discussing together, or browsing separately, a patient file that includes CT images, voice fragments, test results, etc. This paper presents an architecture and a prototype for a multimedia cooperative work environment. The highlights of this system include: a general architecture for cooperative work over the Internet; an object-oriented multimedia database; a new and original presentation module that makes presentation decisions based on author preferences, user needs and actions, and network constraints; and the integration of several multimedia processing modules that can process images, voice, or textual documents. Some examples of the system's operation in the medical domain are also presented.

1 Introduction

Advances in the Internet infrastructure, with its high bandwidth, will provide for the first time the opportunity for cooperative work over the Internet [15]. A typical example is a group of physicians discussing together, or browsing separately, a patient file that includes CT images, voice fragments, test results, etc. While discussing the case, some of them may want to consider similar cases, either from the same database or from other medical databases [15]. Furthermore, some of them may want to support their views with articles from databases on the web, whether from known sources or from dynamically searched sites. The results of the discussions, whether in the form of text, marks on the images, or speech, may be stored in the file or in other locations for future search and reference. The above scenario calls for a very flexible and powerful distributed multimedia database, and a semantically rich knowledge-based layer above it that enables fast and intelligent retrieval of the information [2]. Additionally, the complex structure of the scenario calls for making decisions about what material is presented, based on a set of preferences that depends on context (the structure of the conveyed information), as well as on the existing bandwidth, which may limit the ability to share some parts of the multimedia objects. This paper describes the design and implementation of a software system that supports this type of multimedia conferencing. The highlights of this system include: a general architecture for cooperative work over the Internet; an object-oriented multimedia database; a new and original presentation module that makes presentation decisions based on author preferences, user needs and actions, and network constraints; and the integration of several multimedia processing modules that can process images, voice, or textual documents. The rest of the paper is organized as follows. Section 2 presents some related work. Section 3 presents the overall architecture of the system. Section 4 discusses in detail the presentation module, which is a major contribution of this paper. Section 5 discusses some implementation details and gives examples of the system's operation. Section 6 is the summary.

A.B. Chaudhri et al. (Eds.): EDBT 2002 Workshops, LNCS 2490, pp. 526–543, 2002. © Springer-Verlag Berlin Heidelberg 2002

2 Related Work

Multimedia databases are an active area of research [24]. Topics that have been investigated heavily include query languages [14] and multimedia and spatial indexing [16]. The main objective of these fields is to define general-purpose access structures that represent the relevant "features" of the data, given a set of media sources, each of which contains information represented in a way that is (possibly) unique to that medium, and to provide query and update facilities that can operate on these heterogeneous objects. In terms of cooperative systems, we should mention [19], which discusses some military applications of such systems. However, that article is concerned more with operations carried out sequentially by a group of agents (such as media generation or media analysis) and less with their concurrent viewing of the information, as in a telemedicine application. In terms of medical applications, we should mention [26], which discusses the potential and the problems of telemedicine, especially from the business point of view. In terms of presentation of multimedia documents, we should mention the ZyX system [25], which discusses the presentation of multimedia document content under various layout constraints and user profiles. For work on knowledge-based multimedia presentations we refer the reader to [4,9,17,23], a list that is by no means exhaustive. Our system extends the above capabilities, especially in the presentation area. In particular, most other works deal with the planning and design of presentations before the presentation is performed, while our system also allows presentations to change dynamically based on user interaction.

3 System Concepts and Architecture

Figure 1 shows the general architecture of the system. This is a client/server architecture with three major components:

1. Client module. This module resides at the user site. It is responsible for displaying the multimedia documents as requested by the server. The client can reside at any Internet site authorized to use the system. Multiple clients may enter a shared "room"; in that case, each of them sees the actions of the others. For example, when one user writes some text on an image (or speaks), the others can see the text (or hear the speech). The objects the


clients see are brought into the room from the multimedia database, and their presentation format is decided by the presentation module.

2. Interaction server. This module is responsible for the cooperative work in the system. It also calls the presentation module when needed. The interaction server keeps track of all objects in and out of shared rooms. If a client makes a change to a multimedia object, that change is immediately propagated to the other clients in the room. The interaction server also calls the database server to fetch and store objects in the system. In addition, the interaction server keeps track of user actions and transfers them to the presentation module, since such actions may change the way presentation is done. For example, a "zoom" action may not only zoom the image pointed at, but may also hide other objects in the user's window. Which objects are "hidden" is the decision of the presentation module.

3. Database server. This module is responsible for storing and fetching multimedia objects from the database. We currently use an Oracle object-relational database, where many of the components of the multimedia objects are stored as "blobs". Each type of object may have a set of methods to process it. Such methods may include segmentation of an image, drawing text on an image, identifying speakers within voice segments, etc. The objects and their corresponding methods are imported from the database into their respective Java classes, thus assuring the portability of the software.

To enhance the functionality of the system, we have integrated it with three existing modules for processing multimedia documents:

1. The image-processing module. This is a standard IP module that enables several image-processing operations to be performed and made visible to all partners in an interaction. The main operations are:
– Zooming of a selected part of an image.
– Deleting text elements and line elements.
– Adding a segmentation grid, with the possibility of filling different segments of the segmentation with different colors or patterns.
– Freezing of multimedia objects (by one partner from the rest) and releasing the freeze.

2. The voice-processing module. This module was developed by A. Cohen [8]. It enables searching and sounding of voice fragments; in particular, it supports both keyword and speaker identification. Since the module is integrated, it is by default cooperative, that is, if one partner performs a keyword search, the results are visible and usable to the other partners in the "chat room". Voice, audio, and various other one-dimensional signals are an integral part of the tele-consulting system. In a tele-consulting task, it is often required to browse an audio file and answer questions such as: How many speakers participate in a given conversation? Who are the speakers? In what language are they talking? What is the subject of the talk? The main issues tackled in order to develop efficient audio browsing for a tele-consulting system were:

Fig. 1. The architecture of the conferencing system. (Clients #1 through #N exchange queries and Java objects with the interaction server, which in turn issues queries over JDBC to the object-relational Oracle database and receives Oracle data.)

Automatic segmentation of audio signals. The segmentation algorithm is able to distinguish between signal and background noise, and among the various types of signals present in the audio information. The audio data may contain speech, music, or audio artifacts, which are automatically segmented. Speech segmentation is the process of segmenting speech data into various types of speech signals, such as male speech, female speech, child speech, various types of pronunciation, etc. Word spotting is a basic function of speech browsing. Word spotting algorithms [22] accept a list of keywords and raise a flag when one of these words is present in the continuous speech data. Word spotting systems are usually based on keyword models and a "garbage" model that models all speech that is not a keyword. A word spotting algorithm, based on word models, has


been developed. This algorithm works well when the keyword list is known a priori and keyword models can be trained in advance. Speaker spotting [8] is the dual of word spotting. Here the algorithm is given a list of key speakers and is requested to raise a flag when one of them is speaking. The general problem is defined as text-independent speaker spotting: the assumption is that the text the speaker is going to utter is unknown a priori, so the algorithm has to "spot" the speaker independently of what she is saying. The main tool by means of which the above algorithms were implemented is the Continuous Density Hidden Markov Model (CD-HMM). This model has proven to model the speech signal, and many other audio signals, very effectively. It was used both for training and for matching purposes.

3. The image-compression-transfer module. This module was developed by A. Averbuch [1,3]. It enables the compression and transfer of images at various degrees of resolution. By integrating it with the cooperative architecture and the intelligent objects presentation module (see below), one is able to customize the way the same image is shown at different resolutions to the various partners in the chat room. The scheme is based on a new paradigm for image compression [20]: a hybrid (multi-layered) representation of the image. An image is encoded as the superposition of one main approximation and a sequence of residuals. The strength of the multi-layered method comes from the fact that we use different bases to encode the main approximation and the residuals: a wavelet compression algorithm encodes the main approximation of the image, and a wavelet packet or local cosine compression algorithm encodes the sequence of compression residuals. By selecting different wavelet and wavelet packet or local cosine bases, we allow different features to be discovered in the image.
Furthermore, with each new basis we can encode and compensate for the artifacts created by the quantization of the coefficients of the previous bases. The above integrated modules enhance the functionality of the system. Their integration was quite easy thanks to the general design of the Interaction server. We next discuss in detail the presentation module of our system.

4 The Presentation Module

In this section we describe the concept of a "multimedia document" in our system, a concept that is largely shaped by the issue of document presentation. A multimedia document can be viewed as a richly structured collection of lower-layer multimedia components. In our work we assume that multimedia documents use a hierarchical component structure. For example, consider a medical record of a patient. It may contain CT and X-ray images, test results in a special format, texts, voice fragments, etc. These information items may be organized in a hierarchical, tree-like structure, with the root being the actual medical record. Note that different parts of the medical record need not be stored together: all


the components of the record can be retrieved from their actual storage on demand. An example and some discussion of a hierarchical medical record structure is presented in [21]. The presentation of a multimedia document raises many interesting issues:

1. The amount of information (the number of different components) in a multimedia document may be very large. Consider our medical record, in which (multimedia) information about the patient is continuously gathered. It arrives from different clinics, diagnostic centers, home and nursing care, laboratories, etc. Therefore, in general, presentation of a multimedia document cannot be a total exposure of all the document's content. Hence, a decision about what to present from the whole content of the document is essential.

2. The authors of a multimedia document can be considered experts on the document's content. The knowledge of the authors may be the central factor in how the document should be presented. For example, the author of the document may prefer to present a CT image together with a voice fragment containing expert commentary and a graph describing some tendencies. Likewise, if a CT image is presented, the author may prefer a correlated X-ray image to be hidden, or to be presented as a small icon. Hence, the authors of a multimedia document may define both what to present from the whole content of the document and how to present the selected parts of it.

In addition, one of the central goals of multimedia document management is providing viewer-oriented personalization of the document's presentation. Much of the content personalization literature focuses on learning user profiles. Although this technique is useful, it generally suffers from low availability and tends to address only long-term user preferences. These schemes are thus typically applicable only to frequent viewers who are, in addition, amenable to having information about their behavior managed by an external agent.
The presentation module in our system is based on a conceptually new model for representing multimedia document content that was proposed in [11], where it was also illustrated on a prototype system for web-page authoring and presentation. This model is unique in two ways. First, it emphasizes the role of the author in the process, viewing her as a content expert whose knowledge and taste are important factors in how the document will be presented. The resulting model exhibits dynamic response to user preferences, but does not require learning long-term user profiles. Second, to accomplish this behavior, well-founded tools for preference elicitation and preference optimization are used. These tools are grounded in qualitative decision theory [13]; they help the author structure her preferences over document content off-line, in an intuitive manner, and support fast algorithms for determining an optimal configuration. The preference elicitation, and the subsequent preference representation, are done in an intuitive yet expressive manner using a CP-net [5,6,7,10]. The CP-net is an intuitive, qualitative, graphical model of preferences that captures statements of conditional preferential independence. The description of these preferences, as captured by the CP-net, becomes a static part of the multimedia


document, and sets the parameters of its initial presentation. Then, for each particular session, the actual presentation changes dynamically based on the user's actual choices. These choices exhibit the user's content preferences; they are monitored and reasoned about during each session. No long-term learning of a user profile is required, although it can be supported. Using this approach, content personalization is achieved through dynamic preference-based reconfiguration of the document presentation. Whenever new user input is obtained (e.g., a click indicating a desire to view some item in a particular form), the configuration algorithm attempts to determine the best presentation of all document components with respect to the author's preferences, among those presentations that satisfy the user's viewing choices. This process is based on an algorithm for constrained optimization in the context of a CP-net. The resulting behavior is dynamic and user dependent. The personalization stems from the interaction between the user's choices and the author's preferences.

4.1 Configuration and Qualitative Preferences

In this section we present preference-based multimedia document presentation, and show how decision-theoretic tools provide a basis for this application. Any multimedia document can be considered as a set of components C = {c_1, ..., c_n}. Each component is associated with its content. For example, the content of a component may be a block of text, an image, etc. Each component may have several optional presentations to the viewer; the options for c_i are denoted by D(c_i) = {c_i^1, ..., c_i^{m_i}}. For example, a CT image in a medical record can be presented in flat form, in segmented form, or omitted altogether. The document's components define a configuration space C = D(c_1) × ... × D(c_n). Each element σ in this space is a possible presentation (= configuration) of the document content. Our task will be to determine the preferentially optimal presentation, and to present it to the current viewers of the document. In terms of decision theory, the set of components of the document is a set of features, the presentation options of a component are the values of the corresponding feature, and presentations are outcomes, over which a preference ranking can be defined. First we define a preference order ⪰ over the configuration space: σ_1 ⪰ σ_2 means that the decision maker views configuration σ_1 as equally or more preferred than σ_2. This preference ranking is a partial order, and, of course, it will be different for different decision makers. Given a preference order ⪰ over the configuration space, an optimal configuration is any σ ∈ C such that σ ⪰ σ′ for every σ′ ∈ C. The preference order reflects the preferences of a decision maker. The typical decision maker in preference-based product configuration is the consumer. However, in our application the role of the decision maker is relegated to another actor: the document authors.
The authors are the content experts, and they are likely to have considerable knowledge about appropriate content presentation. We would like the document to reflect their expertise as much as possible.
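Concretely, the component set, its domains, and the size of the resulting configuration space can be sketched in Java (the system's implementation language). The component and option names below are invented for illustration and are not taken from the paper's prototype.

```java
import java.util.*;

/** Minimal sketch of Section 4.1's configuration space. Each document
 *  component c_i has a domain D(c_i) of presentation options; a
 *  configuration assigns one option to every component. */
public class ConfigurationSpace {
    static final Map<String, List<String>> domains = new LinkedHashMap<>();
    static {
        domains.put("ctImage", List.of("flat", "segmented", "hidden"));
        domains.put("voiceNote", List.of("playable", "hidden"));
        domains.put("labResults", List.of("table", "icon", "hidden"));
    }

    /** |C| = |D(c_1)| * ... * |D(c_n)|, the number of possible presentations. */
    static long size() {
        long n = 1;
        for (List<String> d : domains.values()) n *= d.size();
        return n;
    }

    public static void main(String[] args) {
        System.out.println("configurations: " + size()); // 3 * 2 * 3 = 18
    }
}
```

Even this tiny example shows why the space cannot be searched naively: the number of configurations is exponential in the number of components, which is what motivates the structured CP-net representation below.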


During the, possibly ongoing, creation of the document, the authors describe their expectations regarding content presentation. Therefore, the preference order ⪰ represents the static subjective preferences of the document authors, not of its viewers. Thus, preference elicitation is performed only with the document authors, off-line, once for all subsequent accesses to the created document. The dynamic nature of the document presentation stems from the interaction between the statically defined author preferences and the constantly changing content constraints imposed by the recent choices of the current viewers. Because of these requirements, the model for preference-based data presentation in [11] exploits the advantages of the CP-network model developed in [6]. This is an intuitive, qualitative, graphical model that represents statements of conditional preference under a ceteris paribus (all else equal) assumption. Each CP-network is a directed acyclic graph, and each node in the CP-network stands for a variable, which in our domain is a document component. The immediate predecessors of a variable v in the CP-network, denoted by Π(v), are the variables whose values affect the preference ordering over the values of v. Formally, if Y = C − ({v} ∪ Π(v)), then v and Y are conditionally preferentially independent given Π(v). This standard notion of multi-attribute utility theory can be defined as follows [18]: Let X, Y, and Z be non-empty sets that form a partition of a feature set F. X and Y are conditionally preferentially independent given Z if, for each assignment z on Z and for all x_1, x_2, y_1, y_2, we have that x_1 y_1 z ⪰ x_2 y_1 z iff x_1 y_2 z ⪰ x_2 y_2 z. Finally, each node v is annotated with a table CPT(v) of preference orderings over the values of v, given each assignment on Π(v).
In terms of our domain, this conditional ceteris paribus semantics requires the document author to specify, for any specific component c_i of interest, which other components Π(c_i) have content presentations that can impact her preferences over the presentation options of c_i. For each presentation configuration A(Π(c_i)) of Π(c_i), the designer must specify her preference ordering over the presentation options of c_i given A(Π(c_i)). For example, suppose that c_i is a component with a binary domain D(c_i) = {c_i^1, c_i^2}, and suppose that an author determines that Π(c_i) = {c_j, c_k} and that c_i^1 is preferred to c_i^2 given that c_j is presented by c_j^x and c_k is presented by c_k^y, all else being equal. This means that given any two configurations that agree on all components other than c_i, and in which c_j = c_j^x and c_k = c_k^y, the configuration in which c_i = c_i^1 is preferred to the configuration in which c_i = c_i^2. An example CP-network with the corresponding preference table is shown in Figure 2. We see that the designer specifies an unconditional preference for presenting the content of component c_1 (denoted in the figure by c_1^1 ≻ c_1^2). However, if c_1 is presented by c_1^1 and c_2 is presented by c_2^2, then the designer prefers to present the content of c_3 by c_3^2 (denoted by (c_1^1 ∧ c_2^2) : c_3^2 ≻ c_3^1). One of the central properties of the CP-net model is that, given a CP-net N, one can easily determine the preferentially optimal outcome [6]: traverse the nodes of N according to a topological ordering and set the value of each processed node to its preferred value, given the (already fixed) values of its parents. Indeed, any CP-net determines a unique best outcome. More generally, suppose that we


Fig. 2. An example CP-network. The graph contains the arcs c_1 → c_3, c_2 → c_3, c_3 → c_4, and c_3 → c_5, and the conditional preference tables CPT(c_i) are:

c_1: c_1^1 ≻ c_1^2
c_2: c_2^2 ≻ c_2^1
c_3: (c_1^1 ∧ c_2^1) ∨ (c_1^2 ∧ c_2^2) : c_3^1 ≻ c_3^2 ;  (c_1^1 ∧ c_2^2) ∨ (c_1^2 ∧ c_2^1) : c_3^2 ≻ c_3^1
c_4: c_3^1 : c_4^1 ≻ c_4^2 ;  c_3^2 : c_4^2 ≻ c_4^1
c_5: c_3^1 : c_5^1 ≻ c_5^2 ;  c_3^2 : c_5^2 ≻ c_5^1

are given "evidence" constraining outcomes in the form of a partial assignment π on the variables of N. Determining the best completion of π, i.e., the best outcome consistent with π, can be achieved in a similar fashion by projecting π on the corresponding variables in N before the top-down traversal described above.
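The top-down sweep is easy to implement. The following Java sketch hard-codes the example CP-net of Fig. 2 (binary domains, with option indices 1 and 2) and computes the best completion of a partial assignment; the class and method names are ours, not the paper's.

```java
import java.util.*;

/** Sketch of the top-down CP-net sweep of Section 4.1, hard-coded for the
 *  example network of Fig. 2. A value of 1 or 2 selects the component's
 *  first or second presentation option. */
public class CpNetSweep {

    /** Returns the optimal completion of the partial assignment `evidence`
     *  (component name -> fixed option), traversing variables in topological
     *  order and picking the preferred option given the parents' values. */
    public static Map<String, Integer> optimalOutcome(Map<String, Integer> evidence) {
        Map<String, Integer> out = new LinkedHashMap<>(evidence);
        // c1: unconditionally option 1 preferred; c2: option 2 preferred
        out.putIfAbsent("c1", 1);
        out.putIfAbsent("c2", 2);
        // c3: option 1 preferred iff c1 and c2 take "matching" option indices
        out.putIfAbsent("c3", out.get("c1").equals(out.get("c2")) ? 1 : 2);
        // c4 and c5 follow whatever option was chosen for c3
        out.putIfAbsent("c4", out.get("c3"));
        out.putIfAbsent("c5", out.get("c3"));
        return out;
    }

    public static void main(String[] args) {
        // No viewer choices: the author's unconditional preferences decide.
        System.out.println(optimalOutcome(Map.of()));
        // A viewer pinned c2 to its first presentation option.
        System.out.println(optimalOutcome(Map.of("c2", 1)));
    }
}
```

Note how pinning c2 to option 1 (a viewer's "evidence") flips the preferred option of c3, and hence of c4 and c5, exactly as the CP-tables of Fig. 2 dictate.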

4.2 Online Document Update

One of the differences between web pages, whose presentation was investigated in [11], and multimedia documents is that the latter may be updated online by any of the current viewers. Possible updates are: (1) adding a component; (2) removing a component; (3) performing an operation on a component. For each kind of update we should provide a policy for updating the CP-network associated with the document, since we do not want to ask the viewer to update the underlying CP-network (although that is possible). For the first two kinds of update it is easy to provide simple yet reasonable CP-network update policies, so we omit this discussion. The last case, however, is trickier. Suppose that a component c_i stands for an X-ray image that can be presented at three different levels of resolution, so D(c_i) = {c_i^1, c_i^2, c_i^3}. Now suppose that a viewer performed a segmentation operation on this image while it was presented by its value c_i^2. In this case we add to the CP-network a variable c_i′ that stands for the segmentation of c_i, with D(c_i′) = {c_i′^1, c_i′^2}, where c_i′^1 and c_i′^2 stand for presenting c_i in segmented and in flat form, respectively. This new variable is incorporated into the CP-network as follows: Π(c_i′) = {c_i}, and c_i′^1 ≻ c_i′^2 iff c_i = c_i^2. Clearly, this way the domain of the variable c_i remains unchanged, and thus we need not revisit the CP-tables of c_i or of the variables that depend on c_i. In fact, this provides an additional important flexibility. After performing an operation on one of the components of the document, the viewer can decide how important this operation is for the rest of the viewers. If she decides that the result of her operation emphasizes something that should be important to all or most of the potential viewers of the document, then


the CP-network is updated globally. Otherwise, the change is saved as an extension of the CP-network for this particular viewer. Note that the original CP-network need not be duplicated; only the new variables, with the corresponding CP-tables, are saved separately (see footnote 1).
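As a sketch of this update rule, the following hypothetical Java fragment encodes the CP-table added for the derived segmentation variable c_i′: the segmented option is preferred exactly when c_i is shown at the presentation option (here index 2) on which the segmentation was performed. The class and method names are illustrative, not the system's actual code.

```java
import java.util.*;

/** Sketch of the CP-table of the derived segmentation variable ci'
 *  (Section 4.2). D(ci') = {1, 2}, where 1 stands for the segmented
 *  form and 2 for the flat form; ci' depends only on ci. */
public class SegmentationUpdate {
    /** Preference order over D(ci') given ci's current option index:
     *  index 0 of the result holds the preferred option. Segmented (1)
     *  is preferred iff ci takes the option on which segmentation was
     *  performed (option 2 in the paper's X-ray example). */
    static int[] preferenceOrder(int ciValue) {
        return ciValue == 2 ? new int[]{1, 2} : new int[]{2, 1};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(preferenceOrder(2)));
        System.out.println(Arrays.toString(preferenceOrder(1)));
    }
}
```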

4.3 Overview of the Presentation System

The architecture of the presentation system is illustrated in Fig. 3. It consists of sub-modules of the overall system depicted in Figure 1, i.e., the client modules, the interaction server, and the document storage module.

Fig. 3. General illustration of the system.

Client module – Each viewer can examine the content of a multimedia document using a separate client module. This module is used only for (i) presenting the content of the document, and (ii) serving as an interface between the viewer's interactions with the document and the interaction server. The GUI of a client module is illustrated in Fig. 5. At the left side of the window, the viewer is exposed to the hierarchical structure of the whole document. The right side of the window is devoted to the actual presentation of the document content. Recall that the appearance of the document content is dynamic, and is supposed to be optimal w.r.t. the preferences of the document's authors and the recent choices (= interests) of all current viewers of this particular document. By a choice of a viewer we mean an explicit specification of the presentation form for some component. Note that one of the possible presentation forms is hiding the component. Interaction server – Each interaction server serves an arbitrary number of concurrent viewers, and provides both access to the stored documents and reasoning about the presentation of the currently examined documents. When a new client module appears and a request for a document D is received by the interaction server, the interaction server acts according to the use-case description presented in Fig. 4.a.

Footnote 1: In this case, the system design presented in Section 5.1 should be extended to deal with these viewer-oriented parts of the CP-network, but this change can be easily incorporated.


After the initial deployment of the documents to the viewers, the interaction server continuously receives from the client modules information about the recent choices of the viewers. Given these choices, it determines the optimal presentations for all relevant documents, and returns to the client modules specifications of the updated optimal presentations of these documents. The corresponding use case is presented in Fig. 4.b. For a detailed description of the presentation model we refer the reader to [11].

Fig. 4. Use cases: (a) Retrieving a document; (b) Updating the presentation.

4.4 Performance Issues

The dynamic nature of presenting multimedia documents raises a performance issue: large amounts of information must be delivered to the user quickly, on demand. To see the issues involved, consider a medical record in which (multimedia) patient information is continuously gathered. In addition, some record components have several presentation options; e.g., a CT image can be presented either plain or segmented. Presentation options may have to be pre-generated and stored, to be delivered later on demand. A medical record may be accessed remotely, as in the case of the web pages in [11], or by other means from a centralized database serving a number of physically distant clinics. In all such cases, the viewing physician should be provided with the lowest possible response time. Two related problems hamper our ability to provide fast response times for user-dependent presentation needs: (i) communication bandwidth limitations, and (ii) limited client buffer size. There are two potential approaches to handling this issue in a well-defined manner. First, if the above parameters are measurable, we can add corresponding "tuning" variables to the preference model of the document presentation, and condition on them the preferential ordering of the presentation



Fig. 5. GUI for the client module.

alternatives for the various bandwidth/buffer-consuming components. Such a model extension can be done automatically, according to some predefined ordering templates. The other alternative, currently being implemented in our system, is pre-fetching likely components ahead of time. Ideally, we would have liked to download the whole document ahead of time; however, the limited buffer size and communication bandwidth prevent this. Instead, we download the components most likely to be requested by the user, using the user's buffer as a cache. Thus, the model for CP-net based multimedia systems [11] is extended by preference-based optimized pre-fetching of the document components. For a formal description of this approach we refer the reader to [12]. Note also that, although not currently used, the pre-fetching option allows the use of various transcoding formats of the multimedia objects, according to the communication bandwidth and the client's software.
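The pre-fetching idea can be sketched as a simple greedy loop: rank components by some likelihood score derived from the preference model, then fetch them into the bounded client buffer in rank order. The scores, sizes, and names below are invented placeholders; the actual optimization is the one formalized in [12].

```java
import java.util.*;

/** Greedy sketch of preference-based pre-fetching (Section 4.4). */
public class Prefetcher {
    /** A fetchable component with an (assumed) likelihood-of-request score. */
    record Component(String name, int sizeKb, double score) {}

    /** Picks the highest-scored components that fit in bufferKb, in rank order. */
    static List<String> plan(List<Component> comps, int bufferKb) {
        List<Component> sorted = new ArrayList<>(comps);
        sorted.sort(Comparator.comparingDouble((Component c) -> c.score()).reversed());
        List<String> chosen = new ArrayList<>();
        int used = 0;
        for (Component c : sorted) {
            if (used + c.sizeKb() <= bufferKb) {  // skip what overflows the cache
                chosen.add(c.name());
                used += c.sizeKb();
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Component> comps = List.of(
            new Component("ctSegmented", 800, 0.9),
            new Component("xrayFlat", 500, 0.6),
            new Component("voiceNote", 300, 0.4));
        System.out.println(plan(comps, 1000));
    }
}
```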

5 System Implementation and Operation

In this section we give some more details on the system implementation and also present some of its operation screens. The interaction server interface is implemented in Java, and the image- and voice-processing algorithms are implemented in C and invoked through the Java Native Interface.

5.1 Implementation of Multimedia Documents

Each multimedia document in our system (MultimediaDocument) consists of the actual hierarchically structured multimedia data (MultimediaComponent) and the preference specification for the presentation of this data (CPNetwork). An object-oriented description of the entire entity relation is presented in Figure 6. As mentioned earlier, all multimedia objects are mapped into corresponding Java classes, which are also described below.

Fig. 6. Multimedia component OOD

MultimediaComponent is an abstract class with two ground specifications: CompositeMultimediaComponent and PrimitiveMultimediaComponent. A CompositeMultimediaComponent stands for an internal node in the hierarchical structure of the document, while a PrimitiveMultimediaComponent stands for a leaf node. Each component can be presented in various manners; we call the set of a component's possible presentations the domain of this component. An instance of PrimitiveMultimediaComponent may have a domain of arbitrary size, while instances of CompositeMultimediaComponent are restricted to binary domains, since a composite component can only be either presented or hidden. Each instance of PrimitiveMultimediaComponent contains a list of MMPresentation instances, in which the ith element stands for the ith option of presenting this PrimitiveMultimediaComponent. MMPresentation is an abstract class whose ground specifications represent the different presentation alternatives, such as Text, JPGImage, SegmentedJPGImage, etc.

Returning to the MultimediaDocument class, its interface is as follows:

Method                           Description
getContent                       Accessor method for the MultimediaComponent data member.
defaultPresentation()            Returns a description of the optimal presentation of the component given no choices by the viewers (delegated to the CPNetwork data member).
reconfigPresentation(eventList)  Given a list of the viewers' recent decisions, returns the resulting optimal configuration (delegated to the CPNetwork data member).
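A skeletal Java rendering of this design may help; the method bodies are placeholders, and field names such as children and presentations are our own assumptions, not taken from the actual system:

```java
import java.util.Collections;
import java.util.List;

// Abstract presentation alternative; ground specifications include Text,
// JPGImage, SegmentedJPGImage, etc.
abstract class MMPresentation { }

class Text extends MMPresentation { String content; }

class JPGImage extends MMPresentation { byte[] data; }

// A node in the hierarchical structure of the document. The "domain" of a
// component is its set of possible presentations.
abstract class MultimediaComponent {
    abstract int domainSize();
}

// Internal node: binary domain, since it can only be presented or hidden.
class CompositeMultimediaComponent extends MultimediaComponent {
    List<MultimediaComponent> children;
    int domainSize() { return 2; }
}

// Leaf node: the i-th list element is the i-th presentation option.
class PrimitiveMultimediaComponent extends MultimediaComponent {
    List<MMPresentation> presentations;
    int domainSize() { return presentations.size(); }
}

// Preference specification over the presentation variables; the CP-net
// reasoning itself is elided here.
class CPNetwork {
    String optimalConfiguration(List<String> viewerEvents) {
        return "optimal configuration for " + viewerEvents.size() + " events";
    }
}

class MultimediaDocument {
    private final MultimediaComponent content;
    private final CPNetwork preferences;

    MultimediaDocument(MultimediaComponent content, CPNetwork preferences) {
        this.content = content;
        this.preferences = preferences;
    }

    MultimediaComponent getContent() { return content; }

    // Optimal presentation given no viewer choices.
    String defaultPresentation() {
        return preferences.optimalConfiguration(Collections.emptyList());
    }

    // Reconfiguration given the viewers' recent decisions.
    String reconfigPresentation(List<String> eventList) {
        return preferences.optimalConfiguration(eventList);
    }
}
```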

5.2 Mapping Multimedia Objects to the Database

Figure 7 shows a simplified database schema for the multimedia objects. Multimedia objects are stored in the Oracle database as Binary Large Objects (BLOBs), an Oracle data type that allows storing binary objects of up to 4GB. The main table, MULTIMEDIA_OBJECTS_TABLE, contains the list of all supported multimedia types (audio, image, etc.) together with references to the tables that contain the multimedia objects themselves. For example, a record of the Image type contains a reference to the IMAGE_OBJECTS_TABLE table, which in turn contains all objects of type Image. This approach was adopted to allow the addition of new data types as the system evolves, and to keep the format of each data type independent.

5.3 Implementing the Interaction Server

The system depicted in Figure 1 is implemented using modern Java technology, such that each of the three modules can reside on a geographically separate site; in particular, the clients may reside anywhere on the network. The interaction server is implemented using two Java packages: RMI and JDBC. The Remote Method Invocation (RMI) package serves to define interfaces whose methods may be invoked from a non-local virtual machine, thus eliminating the need for code duplication and decreasing the code size. The JDBC package provides a remote interface from a Java program to the database server, taking care of security, information retrieval and database modification in a way natural for a Java programmer, without requiring any additional software for the connection to the database server. Furthermore, a change of database server location, software version or security requirements has only a minor effect on a system that uses JDBC (i.e., it can be handled on-the-fly). The "chat" room is implemented by a large memory buffer which maintains the changes made on the shared objects. These changes are propagated quickly to all clients, since the hierarchical structure of an object permits sending only the relevant parts of the object for redisplay by the client. The changed objects are saved, and discarded from the room as soon as they are no longer needed by the clients.

The interaction server is responsible for maintaining a connection to the Oracle database, and for retrieving, modifying and deleting objects from the database. Each client can request the server to show all objects stored in the database, display additional information about an object, modify an object, or add a new object (provided that the client has the appropriate permissions). The fetching and the actual storage of multimedia objects occur at the server's side.

[Figure: database schema. MULTIMEDIA_OBJECTS_TABLE (ID, FLD_NAME, FLD_MIME, FLD_ACCESSTYPE, OBJECTTABLES, DESCRIPTION) references a list of object tables (AUDIO_OBJECTS_TABLE, IMAGE_OBJECTS_TABLE, CMP_OBJECTS_TABLE), each holding an ID, type-specific metadata fields, and BLOB columns such as FLD_DATA.]

Fig. 7. Schema for Multi-media objects

5.4 Examples for System Operation

The following figures demonstrate some of the capabilities of the system using a medical multimedia database. Figure 8 shows a user entering a shared "room". Figure 9 shows the presentation of the same CT image for two users in the same "room", in two different resolutions, based on their preference networks. Figure 10 shows a demonstration of speaker identification in our system: the two colored regions correspond to voice segments of two different speakers.
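To make the server-side object retrieval of Section 5.3 concrete, the following JDBC sketch fetches a single object's BLOB column. The table and column names follow the schema of Figure 7; the class name ObjectFetcher and the dispatch through a per-type table name are our own illustration, and a real Connection to the Oracle server would be obtained elsewhere (e.g. via DriverManager):

```java
import java.sql.Blob;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ObjectFetcher {
    // Builds the query for a given per-type object table (in the schema of
    // Figure 7, the table name would come from the OBJECTTABLES reference
    // in MULTIMEDIA_OBJECTS_TABLE).
    static String selectDataSql(String objectTable) {
        return "SELECT FLD_DATA FROM " + objectTable + " WHERE ID = ?";
    }

    // Fetches the binary data of one multimedia object, or null if absent.
    static byte[] fetchData(Connection conn, String objectTable, long id)
            throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement(selectDataSql(objectTable))) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return null;
                Blob blob = rs.getBlob("FLD_DATA");
                // BLOB offsets are 1-based in the JDBC API.
                return blob.getBytes(1, (int) blob.length());
            }
        }
    }
}
```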

6 Summary

This paper presented the architecture and a prototype implementation of a cooperative multimedia system. The innovation and strength of the system stem from its preference-based presentation module, and from the ease with which different multimedia types and processing algorithms can be incorporated and integrated into the system. Future work includes the integration of additional voice and image processing algorithms, enhancement of the presentation module with an advanced authoring tool, and the integration of broadcasting and dynamic event triggers into the system. We also plan to test the system on a real-life case of cooperative consultation on ultrasound images.

Fig. 8. Example of "room" interface

Fig. 9. Multi-resolution views


Fig. 10. Speaker identification interface

Acknowledgments. We thank E. Shimony, A. Meisels, A. Cohen, and A. Averbuch for their helpful comments, and Y. Gleyzer for his contribution to implementing parts of the prototype.

References

1. A. Averbuch and R. Nir. Still Image Compression Using Coded Multiresolution Tree. Unpublished manuscript, 1996.
2. A. Abu-Hanna and W. Jansweijer. Modeling Domain Knowledge Using Explicit Conceptualization. IEEE Expert, 9(5):53-64, 1994.
3. A. Averbuch, G. Aharoni, R. Coifman, and M. Israeli. Local Cosine Transform - A Method for the Reduction of the Blocking Effect in JPEG. Journal of Mathematical Imaging and Vision, Special Issue on Wavelets, 3:7-38, 1993.
4. M. Bordegoni, G. Faconti, M. Maybury, T. Rist, S. Ruggieri, P. Trahanias, and M. Wilson. A Standard Reference Model for Intelligent Multimedia Presentation Systems. Computer Standards and Interfaces, 18(6-7):477-496, December 1998. Special Issue on Intelligent Multimedia Presentation Systems.
5. C. Boutilier, R. Brafman, C. Geib, and D. Poole. A Constraint-Based Approach to Preference Elicitation and Decision Making. In AAAI Spring Symposium on Qualitative Decision Theory, Stanford, 1997.
6. C. Boutilier, R. Brafman, H. Hoos, and D. Poole. Reasoning with Conditional Ceteris Paribus Preference Statements. In Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 71-80. Morgan Kaufmann, 1999.
7. R. Brafman and C. Domshlak. CP-networks for Preference-based CSP. In Proceedings of the Workshop on Soft Constraints (in CP-01), December 2001.
8. A. Cohen and V. Lapidus. Unsupervised, Text Independent, Speaker Classification. In Proceedings of ICSPAT, volume 2, pages 1745-1749, 1996.
9. A. Csinger, K. S. Booth, and D. Poole. AI Meets Authoring: User Models for Intelligent Multimedia. Artificial Intelligence Review, 8:447-468, 1995. Special Issue on User Modeling.
10. C. Domshlak and R. Brafman. CP-nets - Reasoning and Consistency Testing. In Eighth International Conference on Principles of Knowledge Representation and Reasoning, Toulouse, France, April 2002.
11. C. Domshlak, R. Brafman, and S. E. Shimony. Preference-based Configuration of Web Page Content. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pages 1451-1456, Seattle, August 2001.
12. C. Domshlak and S. E. Shimony. Predicting Likely Components in CP-net based Multimedia Systems. Technical Report CS-01-09, Dept. of Computer Science, Ben-Gurion Univ., 2001.
13. J. Doyle and R. H. Thomason. Background to Qualitative Decision Theory. AI Magazine, 20(2):55-68, 1999.
14. R. Fagin. Fuzzy Queries in Multimedia Database Systems. In PODS, 1998.
15. T. Gaasterland. Cooperative Answering through Controlled Query Relaxation. IEEE Expert, 12(5):48-59, 1997.
16. H. Samet and A. Soffer. Image Database Systems and Techniques: A Symbolic Approach. Morgan Kaufmann, 2002.
17. C. Karagiannidis, A. Koumpis, and C. Stephanidis. Adaption in IMMPS as a Decision Making Process. Computer Standards and Interfaces, 18(6-7), December 1998. Special Issue on Intelligent Multimedia Presentation Systems.
18. R. L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, 1976.
19. M. T. Maybury. Toward Cooperative Multimedia Interaction. In Multimodal Human-Computer Communication, 1995.
20. F. Meyer, A. Averbuch, and R. Coifman. Multi-layered Image Transcription: Application to a Universal Lossless Compression. In Wavelet Applications in Signal and Image Processing VI, SPIE, 1998.
21. D. Pothen and B. Parmanto. XML Furthers CPR Goals. Journal of AHIMA, October 2000.
22. R. C. Rose. Keyword Detection in Conversational Speech Utterances Using Hidden Markov Model Based Continuous Speech Recognition. Computer Speech and Language, 9:309-333, 1995.
23. S. Roth and W. Hefley. Intelligent Multimedia Presentation Systems: Research and Principles. In M. Maybury, editor, Intelligent Multimedia Interfaces, pages 13-58. AAAI Press, 1993.
24. V. S. Subrahmanian. Principles of Multimedia Database Systems. Morgan Kaufmann Series in Data Management, Morgan Kaufmann, 1998.
25. S. Boll and W. Klas. ZYX - A Multimedia Document Model for Reuse and Adaptation of Multimedia Content. IEEE Transactions on Knowledge and Data Engineering, 13(3):361-382, 2001.
26. T. L. Huston and J. L. Huston. Is Telemedicine a Practical Reality? Communications of the ACM, 43(6):91-95, 2000.
Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition. In Comuter Speech and Language, Vol. 9, pp. 309-333, 1995. 23. S. Roth and W. Hefley. Intelligent Multimedia Presentation Systems: Research and Principles. In M. Maybury, editor, Intelligent Multimedia Interfaces, pages 13–58. AAAI Press, 1993. 24. V. S. Subrahmanian. Principles of Multimedia Database Systems. In Morgan Kaufmann Series in Data Management, Morgan Kaufmann Publishers, 1998. 25. Wolfgang Klas Susanne Boll. Zyx-a Multimedia Document Model for Reuse and Adaptation of Multimedia Content. In TKDE 13(3): 361-382, 2001. 26. Janis L. Huston Terry L. Huston. Is Telemedicine a Practical Reality? In CACM 43(6): 91-95, 2000.