Multimedia Databases: Integrated Storage and Retrieval of Text, Images, Sound, and Video

Klaus Meyer-Wegener
Friedrich-Alexander-Universität Erlangen-Nürnberg
Institut für Mathematische Maschinen und Datenverarbeitung (Informatik)
Martensstr. 3, W-8520 Erlangen

Abstract

New hardware offered at reasonable prices makes it possible to develop improved user interfaces for computer systems: I/O devices can capture and present not only text and graphics, but also photos, sound, and video; storage devices (mainly optical disks) provide the capacity needed for those types of data. The effect is not only better user interfaces, but also more information held in the system. Applications that use multimedia thus have to handle I/O on the one hand and storage and retrieval of images, sound, video, etc. on the other. Multimedia databases are being developed to reduce the programmer's burden with respect to the second task. They provide powerful operations to store and retrieve the multimedia data objects while insulating the programs from the specifics of the new storage devices and the plethora of data formats. In this paper we describe the tasks of a multimedia database management system in more detail. An overview is given of the new types of data that have to be managed, i.e. text, graphics, image, sound, and video. This shows what the data objects look like, what their size and structure are, and how they are accessed from various applications. Then, the embedding of those data objects into a data model is discussed. While many researchers claim that a multimedia database must be object-oriented, a way of extending relational systems for multimedia is also shown. The paper concludes with an outlook on the important problem of search in large sets of multimedia data objects and with a discussion of architectures for multimedia database management systems.

1 Introduction

A broad range of computer systems, including personal computers, now offers input and output facilities that allow for multimedia capture and presentation of data. Scanners, audio and video adapters, and high-resolution monitors are no longer very expensive devices used only in special applications. The consequences for the design of computer-system applications are:

- On input, the need to convert data to numbers or text diminishes. Instead, images (photos) and sound can be captured directly. They can later be converted to formatted representations (automatically, if possible) or be kept in the system as they are.

- On output, a much richer set of presentation alternatives is available. Data can be presented in the most suitable medium, e.g. tables can be converted to graphs, histograms, or pie charts; text can be read aloud and thus be transmitted over the telephone; instead of describing a house (that is for sale) by a long text, a photo can be shown. Ideally, the system will dynamically switch to the medium the user prefers. However, not all conversions from one medium to another can be done automatically, so redundant storage of the same information represented in different media might be necessary.

Multimedia I/O devices are thus improving the user interface of computer systems. This has also been described as “increasing the information bandwidth” between the user and the system [Woel85a]. However, we would like to point out another aspect that follows from the use of those devices: there is also more information stored in the system and accessible by the user. It is simply not possible to convert all the information contained in an image or a sound recording into forms (tables) or text. Only if users can look at the image, or listen to the recording, will they grasp the complete information. This is the reason why multimedia is not just a matter of input and output; multimedia data must also be kept in the system. Again, progress in the development of computer hardware has made it possible to do that. Multimedia data tend to be very large compared to the usual formatted (tabular) data. The advent of new storage devices, mainly optical disks, provides at low cost the capacity that is needed for multimedia data.

The next question is how these multimedia data should be organized. I/O hardware typically comes with device drivers that can write data to files in a very specific format and later read data from those files. Keeping track of all the files, locating the right file, and perhaps converting it to another format for output on some other device is then the task of the user. While it is possible to build multimedia applications like document archives, computer-aided instruction, and electronic publishing on the basis of files, the development and maintenance effort will in general be prohibitive. Potential users like teachers and architects need system support and tools. One such tool could be a database management system (DBMS) that provides integrated storage of formatted data and multimedia data. DBMS have been very successful in managing formatted data; they offer data independence, centralized control, easy access to the data through high-level query languages, and system-enforced integrity. The task of designing and building a multimedia DBMS is to apply the same principles to the management of multimedia data.

In this paper, we shall proceed step by step on the path that finally leads to such a multimedia DBMS. First, we shall take a closer look at the hardware devices that enable the use of multimedia in computer systems in chapter 2. Then, in chapter 3, we shall briefly introduce some applications that can benefit from multimedia capabilities. That will give us an idea of how the devices can be used. To support the development of such applications by a data management system, we shall investigate in detail the data structures produced by the I/O devices and accessed and manipulated by the applications. This will first be done medium by medium and then from the perspective of integrating data objects from different media into higher-level objects, e.g. documents. Section 4.4 will discuss ways of integrating abstract views of those multimedia data objects into data models. Section 4.5 identifies the important problem of search for multimedia data, which must be solved by a multimedia DBMS in a way that meets the user's needs. Finally, section 4.6 discusses some architectures of multimedia DBMS that show how such a system could be built: either by adding on to existing systems, or by starting from scratch.

2 Hardware

Hardware is the driving force in the development of multimedia information systems. It offers capabilities that we are only beginning to exploit in the development of applications. In this chapter, an overview of some types of devices will be given. Emphasis is on the way a programmer has to operate them, the data they produce, and the data they need as input. We shall not go into the technical details; instead, we shall try to give an abstract view of what the devices can do and how they can be used.

2.1 I/O Devices

Input devices such as keyboard and mouse are well established by now and need not be explained. A keyboard is used to produce text, while a mouse yields graphical information (2D coordinates). Other pointing devices like the trackball or the digitizer tablet can be regarded as variants of the mouse. The first input device that offers additional capabilities with respect to multimedia is the scanner. A scanner photographs a flat, two-dimensional surface (usually paper) like a copier and produces a raster image file, i.e. a matrix of pixels. It does so by “scanning” the original in lines according to an adjustable resolution. A typical value is 300 lines (and pixels) per inch. Higher resolution yields better images at the expense of storage capacity. There are color scanners that digitize the original in three scans, exposing it to a red, a green, and a blue lamp, respectively [Thom89a].

The other input device to produce raster images is the video camera. It generates a continuous analog signal for recording by a VCR. This signal can also be fed into a video capture board called a frame grabber that picks a single frame (an image) from the continuous stream and turns it into a raster image file. Hence, the combination of video camera and frame grabber makes it possible to take pictures not only of flat surfaces (as does the scanner), but of anything that can be photographed by a camera.

To record sound on a computer, a microphone, an amplifier, an A/D converter, and some filtering and compression hardware are needed. The latter often is a digital signal processor (DSP), a programmable real-time processor that accepts a sequence of numbers as input and produces another sequence of numbers as output [Mind89a]. The quality of the recording is determined by the sampling rate and the resolution. The sampling rate is the frequency of measuring the sound volume. As stated by the Nyquist theorem, it must be at least twice the highest sound frequency to be captured. Thus telephone quality (3,000 Hz) requires 6,000 measurements per second, while hi-fi quality (22,000 Hz) needs 44,000. Higher recording quality increases the storage demand significantly. The resolution gives the number of volume levels to be distinguished and thus determines the number of bits occupied by a single measurement value. With 8 bits, 256 levels can be distinguished. Hi-fi quality (e.g. the Audio-CD) usually requires a resolution of 65,536 levels and thus needs 16 bits per value. Since the storage requirements of sound are very high, compression is mandatory. It has to be done in real time during the recording, and the DSP can be programmed to do that. Many different compression techniques have been developed and tested for sound recordings; for an overview see [Lee83a].

To capture video, we do not need another input device. We only have to combine some of the ones already mentioned, i.e. the video camera and the microphone. Digitizing the continuous video signal in real time so that all frames are captured (not just a single one as with the frame grabber) still goes beyond the capabilities of today's computer hardware. Also, the volume of data generated can hardly be held on storage devices, not even on optical disks. As a consequence, VCRs are used to store the video data. They can be controlled by the computer (see next section).

Complementing the input devices, there are also new output devices that can display or reproduce the captured data. The standard line printer is tailored to text data. A laser printer can offer a sophisticated interface (e.g. PostScript) which accepts readable ASCII text with a specific syntax. It nevertheless remains the device best suited for raster images, in particular for black-and-white (bitmap) images. The same is true for the range of high-resolution raster displays. Graphics can be displayed directly on a plotter or on a vector display. A sound recording must be decompressed and filtered (again by a DSP) before it can be sent through a D/A converter and an amplifier to a speaker or headphone. Needless to say, sampling rate, resolution, and compression technique must be obeyed to correctly reproduce the sound. Finally, a video recording can be shown on a monitor while the sound track is played through a speaker. Video boards are available that can display a video signal on a computer screen, so that no separate monitor is needed. Some of them occupy only a portion of the screen, which can be moved around and scaled like a window.
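The storage demand implied by these figures is easy to work out. The following minimal Python sketch (ours, not from the paper) applies the quality levels just quoted:

```python
def audio_bytes(samples_per_second, bits_per_sample, seconds):
    """Uncompressed storage demand of a mono sound recording."""
    return samples_per_second * bits_per_sample // 8 * seconds

# Telephone quality: 3,000 Hz bandwidth -> 6,000 samples/s at 8 bits.
print(audio_bytes(6000, 8, 60))     # 360,000 bytes per minute
# Hi-fi quality: 22,000 Hz bandwidth -> 44,000 samples/s at 16 bits.
print(audio_bytes(44000, 16, 60))   # 5,280,000 bytes per minute
```

The two orders of magnitude between the results explain why compression has to run on the DSP already during recording.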


[Figure 1: Relationship of Input/Output Devices and Media Data. Input devices (keyboard; mouse and trackball; scanner, frame grabber, and video camera; microphone with audio board) produce the media data types text, graphics, image, video, and sound, which in turn feed the output devices (character display and line printer; plotter and laser printer; raster display; video board; audio board with speaker).]

Figure 1 gives an overview of the input devices and the types of data they produce. It also shows the output devices that can be fed with those data. Please note that no conversions are shown, to keep the figure simple. Of course, optical character recognition (OCR) can be used to convert an image read from a scanner into text (provided that a text page has been scanned). Also, graphics can be shown on a raster display; the conversion to an image is often done by the hardware without the user even noticing. A “multimedia workstation” will in fact house all or at least most of the I/O devices shown in figure 1.

2.2 Storage Devices

Multimedia data objects, as they are created with the help of the input devices introduced in the previous section, are very large, even if compression methods are used. For instance, a single image and a single sound recording of just a few minutes easily occupy a megabyte of storage each, not to speak of video. Although magnetic disks have been improved to offer more than 1 gigabyte of memory, they do not suffice to hold large collections of images or sound recordings. Also, they are usually not removable at such a capacity. Optical disks provide an alternative. They offer the same capacity (sometimes even more) at much lower cost, and they are removable. However, some of them have characteristics different from magnetic disks which restrict their use to certain applications. We shall discuss the three main classes of optical disks, i.e. CD-ROM, WORM, and Erasable.

CD-ROM

The “Compact Disk - Read-Only Memory” has been marketed since 1984. It has been derived from the Audio-CD. Both have the same size and format, so the CD-ROM can benefit from the mass production facilities established for the Audio-CD. However, mass production usually causes some defects in the individual products, and while these hardly matter on the Audio-CD (nobody can “hear” a single wrong byte among the 44,000 that produce a second of sound), they certainly can have bad effects in machine-processable data, e.g. in executable computer programs. As a remedy, sophisticated error-correcting codes are stored with the data. They take up 288 bytes for a block of 2048 bytes. Despite this “waste,” a single disk can hold 540 megabytes (not counting the error codes) [Laub86a, p. 164]. Another heritage of the Audio-CD is that sequential reading is the preferred access method. Random access to a specific block is possible, but takes 100 milliseconds on average and one second at maximum; that is, it is much slower than on a magnetic disk. The main point about the CD-ROM is the fact that it is read-only. A CD-ROM cannot be written like a magnetic disk; it must be manufactured. That is, the user produces a magnetic tape with all the data to be included on the CD-ROM, and the manufacturer presses the requested number of copies. This is much like having a book printed (or a record pressed) by a publishing company. Nevertheless, due to its large capacity and its robustness, the CD-ROM can be used with great success for certain kinds of data:

- Reference books and dictionaries
- Catalogues (products, parts, medicine, ...)
- Economic data and statistics
- Manuals (auto repair, airplanes, ships, ...)
- Law codes and comments
- Self-assessment material and courses
- Standard and public-domain software.

Whenever there is a need to disseminate large volumes of information in large quantities, with updates required only on a time scale of weeks or months, the CD-ROM is certainly a strong candidate.

WORM

The “Write Once, Read Many times” optical disk differs from the CD-ROM in that it can be written by the users themselves, but only once. A powerful writing laser burns holes (“pits”) into the surface of the disk, and these holes can later be read by a weaker reading laser. Once the holes have been burned, they cannot be erased. Because of this property, the WORM is regarded as “electronic paper” and can be used for documents that must have legal quality. There are several applications where users write to WORMs only data they will read themselves if necessary, but that will not be passed to others like a CD-ROM. Archiving, history logging, and backup are examples of that. As a consequence, having a single common format is not as important as in the case of the CD-ROM. Several different disk sizes and formats exist side by side. For the disk size of 5.25 inch alone there are 24 manufacturers with 6 different disk formats and 8 different recording formats [Kamp90a]. Random access is slightly better than on the CD-ROM, since the reading lasers of the drives are often equipped with adjustable mirrors. These mirrors make it possible to read neighboring tracks without mechanically moving the head. The area that can be covered by the mirrors is called a scan (typically 100 tracks). Because of this, the average access time for a single block comes near the time needed on a magnetic disk. As already mentioned, the typical applications suited for the WORM disk are different from those of the CD-ROM. Archiving, backup, and logging are the most prominent. Another advantage of the WORM also favors these applications: it has a lifetime of more than 10 years. In contrast, magnetic tapes must be refreshed after 2 or 3 years. In summary, we regard WORM disks as a strong competitor for the magnetic tape in providing long-term archival storage, not so much for the magnetic disk in providing a stable, but active workspace.

Erasable Optical Disk

In 1988, the erasable optical disk became available. Different techniques can be used to allow for repeated overwriting of the data; for the details see [Burk89a]. Storage density is not as high as with the WORM disk, but still much higher than with magnetic tape. Some of the early systems were restricted in use by an upper limit on the number of writes. Although this limit was around one million, it still prevented the disks from being used as a dynamic workspace, e.g. as a paging device. We assume that technological progress has overcome this limitation by now. Erasable optical disks are only beginning to be used in standard computer systems. One example is the NeXT computer that comes with an Erasable [Thom88a]. Right now, the Erasables are used rather as a large storage medium next to and along with the magnetic disks. Whether they will replace the magnetic disks is still an open question. It should be considered that magnetic disks have been improved and optimized almost to the limits over the past decades, while optical disks are just at the beginning of their development.

Video Cassette Recorder (VCR)

Finally, we have to talk about a storage device that does not belong to the class of optical disks (it uses magnetic recording), but that is simply a necessity when aiming at “full” multimedia. Video recordings are still too voluminous, even for optical disks and even if compression is used. (This might change in the near future, since many researchers are working on it; see CD-I and DVI [Brew87a, Ripl89a].) Fortunately, VCRs are available as a storage device developed especially for video. Professional machines use a digital recording format that allows for sophisticated manipulation [Mack89a] and can be controlled by a computer. Commands to wind, rewind, play, and show a still image can be issued by programs. VCR output can be directed to a separate monitor, to a video board that generates a display on the computer screen, or to a frame grabber (see previous section). VCR input can be taken from a camera or from the computer (frame by frame when producing an animation). The main deficiency of a VCR is slow access to a specific frame.

Certainly there are other storage devices than the ones we mentioned. It is not our intention to go into all the details, since we just want to pave the ground for a discussion of multimedia data management. The reader should now have an impression of the variety of the hardware devices used and their capabilities.
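To give a flavor of the programmatic VCR control mentioned above, here is a small sketch; the command strings and the serial transport are purely hypothetical, since the paper names no concrete protocol:

```python
class VCRControl:
    """Hypothetical computer-controlled VCR.

    Only the operations named in the text (wind, rewind, play,
    still image) are modelled; the command syntax is invented.
    """

    def __init__(self, port):
        self.port = port                      # e.g. an open serial line

    def _send(self, command):
        self.port.write((command + "\r\n").encode("ascii"))

    def play(self):
        self._send("PLAY")

    def wind(self):
        self._send("WIND")

    def rewind(self):
        self._send("REWIND")

    def still(self, frame_number):
        # Slow: the tape must be wound mechanically to the frame.
        self._send(f"STILL {frame_number}")
```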

3 Applications

In this chapter we shall look into ways in which the hardware can actually be used to implement various types of applications. We shall give an overview of multimedia applications and try to classify them with respect to data handling. Two criteria can be used: First, the whole set of (multimedia) data held in the system can be either static or dynamic. Static means that the data are primarily read, while dynamic means that they are repeatedly written or modified. Of course a static data set is also updated, but its purpose is rather to be available for read access. Second, the set of data can be either passive or active. Passive data sets simply wait for user commands that read or update them, while active data may trigger the display or the modification of some (other) data. The criteria will become clearer as we discuss examples of applications. By combining them, we get four classes of applications:

- Archiving (static / passive);
- Teaching, advertising, and entertainment (static / active);
- Design, authoring, and publishing (dynamic / passive);
- Supervision (dynamic / active).

3.1 Archiving

Archiving gets a new dimension when extended to multimedia data. In office automation, there is a strong need to archive documents; quite a few experimental systems have been built for this purpose (e.g. MINOS [Chri86a, Chri86b, Chri86c]). Newspaper archives continue to be converted from paper to electronic storage. Hospitals consider storing X-ray photographs as well as tomograms and scintigrams along with the formatted patient data. Also, public libraries could undergo a transition to computer-supported reading: electronic books have been proposed and developed for some time [Yank85a, Weye85a]. The most important concepts for these systems are information retrieval and hypertext. While information retrieval supports direct access to a subset of the stored data (documents) with the help of descriptors [Salt83a], hypertext supports navigational access from one document to related documents. Both concepts have been developed for text only, but can easily be extended to other media. In the case of hypertext, this leads to the notion of hypermedia. It should be noted that archiving is not restricted to documents, although they certainly are of prime importance. Formatted data (files and records) can also be archived, but are usually not considered documents. A useful distinction assumes that documents are primarily for humans to read, while “non-documents,” i.e. formatted data, are for machine processing. The main operation to be supported by an archival system is search. Computer archives can easily outperform paper archives here, since following a link or a reference, as well as locating all the documents with a certain descriptor, can be done much faster. In addition, new search techniques could be developed to exploit computer capabilities even further.
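Locating all documents carrying a given descriptor is fast precisely because the archive can maintain an inverted index; a minimal sketch (Python, illustrative only, with invented document identifiers):

```python
from collections import defaultdict

class DescriptorIndex:
    """Inverted index: descriptor -> set of document identifiers."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, descriptors):
        for d in descriptors:
            self._postings[d.lower()].add(doc_id)

    def lookup(self, descriptor):
        return self._postings.get(descriptor.lower(), set())

index = DescriptorIndex()
index.add("report-17", ["x-ray", "thorax"])
index.add("report-42", ["x-ray", "skull"])
print(index.lookup("X-Ray"))     # {'report-17', 'report-42'}
```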

3.2 Teaching, Advertising, and Entertainment

In this large field of applications, the system plays a more active role in presenting information, while the users remain passive as readers, spectators, or listeners. Of course, they can influence and control the presentation, i.e. interrupt it, skip portions, repeat others, and so on. However, a moment of surprise is not only accepted, it is essential for entertainment. In addition to showing text and pictures and playing a sound recording, the presentation can include animation and video sequences (shots). Architects can give their clients a better impression of the house they are going to build if they can show a three-dimensional model from arbitrary viewpoints. They could even guide the clients on a “video tour” through the house [Phil88a]. The same applies to real estate agents, who could add photos of the house they are trying to sell; this is already being done with video tapes. Another example is “surrogate travelling” [Ripl89a]. Based on a set of video shots recorded in digitized form, an interesting building, town, or landscape is modelled in the system. The users may “walk” through the region, looking left or right as they please, and decide on their own what to visit when. The video shots are selected according to the position and view of the users. In case there are no appropriate shots, the scene is “interpolated” from existing shots, e.g. from fisheye photographs. This is an example of what is often called “artificial reality” [Woel85a]. Along the same lines, the system could simulate some kind of virtual office with mailbox, telephone, file drawer, etc. that can be used in the same way as the real facilities. Similar approaches can be imagined for process control and robotics in factories and power plants.

3.3 Design, Authoring, and Publishing

The multimedia data sets contained in an archive or used for teaching and entertainment must be created before they can be used. Right now, detailed systems knowledge is required to do so. This must be changed if we want architects, authors, teachers, and designers to produce the material themselves without the help of a programmer.

-9-

Appropriate tools should be provided to help organize the work, particularly for publishing purposes, i.e. preparing reports and articles. Text segments, pictures, graphics, sketches, and notes or dictation on audio tape are to be linked in an arbitrary, multi-dimensional manner. Yet the organization is highly personal, not suited to be understood by someone else before the final editing is done. The document that results from this process may retain some of the links and references built up during preparation, but usually many of them are discarded to focus the paper on the important issues. Some hypotheses or links may have proven to be wrong or misleading. This is why the set of data used in these applications is termed “dynamic.” It is also “passive” in that it waits for the author's commands and does not generate output by itself as do the documents in teaching and entertainment. In fact, preparing a document for publishing is just a special case of design, so most of what has been said holds for computer-aided design (CAD) as well. The central tool for all the applications of this class is an editor. In addition to the well-known text editors, there is also a need for graphics editors, pixel editors, and sound editors. Also, some of the hypertext systems have been designed to support authoring rather than archiving, e.g. NoteCards [Hala88a]. Intermedia shows how editors can be integrated with hypertext links [Yank88a].

3.4 Supervision

The fourth and final class of applications deals with collecting data from various sensors, including cameras, microphones, radio receivers, radar, etc. Examples are traffic control, weather and climate analysis, prisons, reconnaissance, and process control in industrial plants. A continuous stream of data from different media is received and must be recorded, analysed, and cross-referenced. The data are certainly dynamic, and they are also “active” in that they should alert the user in case of critical or abnormal situations (e.g. geo-scientific sensors and earthquake warning). Systems like that are operating already, but they have separate subsystems for different media, e.g. a process-control computer for the numerical sensor data and a camera video system. Usually the two systems are not connected with each other. Whenever the sensors signal an abnormal situation, the monitor has to be switched to the right camera manually. It would be helpful to integrate archive data with the highly dynamic data, i.e. technical manuals, the history of repair for machines and parts, service phone numbers, etc. This opens a new perspective on performance improvement: whenever the sensors detect something unusual at a machine, the system could start gathering all the data on that particular machine (prefetch). Thus the data can be displayed instantaneously when the operator asks for them. An important problem in many of these applications is so-called “data fusion.” This denotes the task of recognizing whether two different input data, e.g. a photo and a radar signal, refer to the same object or to two different objects. A related question is whether two recordings from the same medium at two different points in time refer to the same object or not. No pat solution has been found yet; many approaches use heuristics and expert systems [Wils85a].

This concludes our short overview of application classes that could benefit (and often do so already) from the multimedia facilities offered by the devices presented in chapter 2. Figure 2 shows the matrix of the four classes together with some transitions. Frequently, data will be moved from one class to another. Design may borrow from the material already published and available in the archives, and it will deliver new documents, articles, and material to be included in the archives. Supervision data can also be used for presentation purposes. They can be archived, and they can also be used in instructional material. Teaching will access the archives to select courses and textbooks, and it may enter private or public annotations as well as test results into the archive.

[Figure 2: Multimedia Application Classes and Data Transfers Among Them. The four classes form a matrix (dynamic/passive: Design; dynamic/active: Supervision; static/passive: Archiving; static/active: Teaching), with transitions labelled Visualization, Building Blocks, Release, Training, Selection, Annotations, and Test Results.]

4 Multimedia Data Management

We have seen that many applications could benefit from the use of multimedia facilities. To build such applications still requires much effort. Hence, powerful tools are needed. A DBMS is one such tool. In this chapter, we shall identify the basic services that a DBMS has to provide to support the applications.

4.1 Task

If they exist at all, the applications today are based on highly specialized solutions for data storage and organization. Often several system platforms are used, one for image processing, another one for voice recording, yet another for signal processing. Even if the same object is represented in each of the systems, there is no system-maintained link among the data describing it. For instance, ships can be represented in a picture database, in a database of structured (formatted) data giving length, manufacturer, year built, capacity, etc., and in an audio database that holds the sound pattern of the ship's engine. This necessarily leads to a significant amount of redundancy, because some of the structured data required to identify the object or to process the images and sounds are repeated in each of the systems. In addition, the same piece of information, e.g. an image, may be stored several times in the different formats needed to display it on different output devices. In contrast to this, it is the general philosophy of database systems to manage all the data shared by a set of applications and to provide each single application with the specific view of the data it needs. This avoids redundancy and makes it much easier to maintain consistency among all the data that belong to an object. In addition, new applications can be built that make use of the cross-referencing provided by a system that holds all the information. Database systems further provide mechanisms to handle multiuser operation, to preserve consistency, and to recover after various kinds of failures. Extending these capabilities to the management of multimedia data leads to the concept of a multimedia database management system.

The task of a DBMS is storage and retrieval, not processing [Masu87a]. Some proposals for multimedia DBMS suggest that powerful editing facilities for the different media data and management of the I/O devices should also be included. Object-oriented systems make it possible to attach application algorithms to the data. This tends to collect more and more functionality under the control of the DBMS and hence leads to a very large and hard-to-manage software system. In favor of a more modular approach, we would like to concentrate on the traditional task of a DBMS, namely storage and retrieval. Instead of integrating the applications (or significant parts of them), we would like to provide them with the appropriate data management support. Media-specific editors should be able to access (load) not only files, but media data in the database as well. Even when restricting the task of a multimedia DBMS to storage and retrieval (of multimedia data and formatted data), a number of properties and subtasks immediately follow that require sound technical solutions. They are:

- Device independence
- Format independence
- Relationships among data objects
- Search (content addressability).

As we have seen in chapter 2, the storage of multimedia data objects requires the use of new storage devices (optical disks, VCRs) that have characteristics different from the standard magnetic disks. If the applications have to take these characteristics into account, they will be tied to the device used for a specific object. Moving the data object to another device or replacing the device with a more advanced one will cause program modifications. The lesson learned from the standard DBMS for formatted data is that device independence must be provided to the applications, so that reorganization of device allocation will leave the programs as they are.


Multimedia data objects also come in a rich variety of data formats (data structures). For instance, images can be coded in GIF, TIFF, PBM, FBM, Pixmap, and many other formats. Which format is most appropriate depends on the application environment and the I/O devices at hand. Since a multimedia DBMS is supposed to support many applications, it will have to host many different formats, too. In particular, a certain media object, e.g. an image, can be shared by different applications that would like to see it in their respective formats. It is the task of the multimedia DBMS to provide format independence to the applications, i.e. to supply each application with the format it needs while hiding the internal storage format actually used. So if 80 % of the applications need an image in GIF, the internal format could be GIF, too, and a conversion is performed for the remaining 20 %. However, if the application profile changes and now 80 % need the same image in PBM format, the internal format can be switched to PBM without any application program being affected. Format independence also covers the choice of using two different internal formats for the same image if the applications are split fifty-fifty. Since this is completely under the control of the DBMS, consistent update of the two copies is guaranteed; a sketch of this conversion-on-retrieval mechanism follows after the list of relationship types below.

Single-medium data objects are usually not on their own, but are related to other media objects and to formatted data. There are at least three types of relationships that a multimedia DBMS should be able to represent and to use for retrieval:

- Object-attribute relationship: Objects (or entities) like ships, cars, persons, etc. can also be described by photos, graphics, and sound recordings. Thus a media object represents a property of the object modelled in the system.

- Aggregate-component relationship: Multimedia data objects, mostly documents, are composed of a number of single-medium data objects. The components are not just attributes, but entities in their own right. There may be additional relationships among the components, of which synchronization (e.g. show text and graphic together) is the most important [Stei89a].

- Equivalence: The same information can often be represented in different media, e.g. as a table (formatted) or as a graph. Automatic translation is possible in some cases, but in others it is not (sound to text). Hence, it can be necessary to store both representations to be able to cope with different workstation configurations or user preferences. In that case, a link should be maintained between the two to allow for system control of the equivalence. This means the DBMS can switch to the alternative representation when required, and it can warn the users when just one of the equivalent objects is updated.
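Here is the promised sketch of the conversion-on-retrieval mechanism behind format independence (Python; the format names come from the text, while the converter is a placeholder for real codec calls):

```python
class ImageStore:
    """One internal format per image; conversion happens on demand,
    invisibly to the applications (format independence)."""

    def __init__(self, internal_format="GIF"):
        self.internal_format = internal_format
        self._raw = {}                  # image id -> bytes in internal format

    def put(self, image_id, data, source_format):
        self._raw[image_id] = convert(data, source_format, self.internal_format)

    def get(self, image_id, wanted_format):
        data = self._raw[image_id]
        if wanted_format == self.internal_format:
            return data                 # the common case needs no conversion
        return convert(data, self.internal_format, wanted_format)

def convert(data, source_format, target_format):
    """Placeholder standing in for a real converter (e.g. GIF -> PBM)."""
    if source_format == target_format:
        return data
    raise NotImplementedError(f"{source_format} -> {target_format}")
```

Switching the internal format then amounts to a one-time reorganization inside the store; no application program needs to change.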

Finally, search for media data objects is the task of the DBMS. It cannot be done efficiently by the applications themselves. If the media objects to be retrieved can be identified with the help of associated formatted data, standard DBMS techniques can be used. If the media objects themselves have to be inspected, pattern matching can be performed. Data structures and algorithms have been developed to increase its efficiency significantly (e.g. signature files for text [Falo85a] and iconic indexing for images [Chan87a]).
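To illustrate the signature-file idea cited above [Falo85a]: each text gets a fixed-width bit mask with a few hash-selected bits set per word (superimposed coding), and a query can be rejected cheaply whenever its mask is not covered. A minimal sketch (Python, illustrative):

```python
def signature(words, width=64, bits_per_word=3):
    """Superimpose a few hash-selected bits per word into one mask."""
    sig = 0
    for w in words:
        for i in range(bits_per_word):
            sig |= 1 << (hash((w.lower(), i)) % width)
    return sig

def may_contain(text_sig, query_sig):
    """False means: definitely no match.  True means: possible match,
    which must still be verified (signatures allow false drops)."""
    return text_sig & query_sig == query_sig

docs = ["the ship left the harbor", "storm over the north sea"]
sigs = [signature(d.split()) for d in docs]
query = signature(["storm"])
print([d for d, s in zip(docs, sigs) if may_contain(s, query)])
```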


Neither of the two covers what is usually called content addressability of media data objects: by no means everything represented in an image will also be coded in formatted data (“storm”, “foggy night”), and it can be very hard to find the right pattern to retrieve only relevant images. There are no pat solutions at hand for this; we shall discuss some preliminary results in section 4.5. Device independence, format independence, representation of relationships, and content search must all be reflected in the data model. The modelling concepts must be complemented with a set of operations that store, manipulate, and retrieve the data in a consistent way. Finding a basic and wisely restricted set of DBMS functions that support a variety of application programs as well as interactive users seems to be the most important design issue for a multimedia database system.

4.2 Media Data

In this section, we want to describe more precisely what kind of data are to be stored and retrieved by a multimedia DBMS in addition to the formatted data. We shall use the term “multimedia data” to refer to the whole collection of them, and “media data” to indicate that we are talking about just one medium. Multimedia data have been introduced as text, graphics, images, voice, sound, and signal. They all have in common that a single “value” or object of that type tends to be large, i.e. in the range of 100 K to 10 M bytes. They are often referred to as being unformatted, which means they consist of a large and varying number of small items, e.g. characters, pixels, lines (vectors), or volume levels (numbers). They all carry a more complex structure which varies strongly from value to value and is often not known when the object is stored. Detecting it requires some level of understanding and recognition. Although this can only be done to a very limited extent, multimedia data is still much more than just unformatted data, as we shall see when we take a closer look at each of the media data types and their specific properties.

4.2.1 Text

There is a long tradition of storing text in computer systems. Word processing is used to produce the texts, mainly for paper output. Texts are also kept in the system, sometimes only to serve as a template for future text production, but in a growing portion to be accessed and read in the system without a hardcopy on paper. Usually, texts are managed in files. Collections of texts may be structured using features of the underlying file system, i.e. directories or folders. A text consists of a sequence of printable characters. This includes some formatting characters like “tabulator” and “line feed” that are also considered to be printable. To properly handle a text, we must know quite a few things that are mostly determined by the environment, so that we are hardly aware of them. However, in the more general setting of a multimedia DBMS that serves many different environments, they must be made explicit. This even includes the number of bits used for a single character: although we can hardly think of values other than 8, people in Japan and China easily can. Next, the character code must be known; it could be EBCDIC or ASCII, for instance. The list of these so-called registration data does not end with the special characters used to separate lines or to terminate the whole text; the definition of a neutral data structure for storing text, as in standards for document interchange, e.g. ODIF [Hora85a], must encompass much more. Typical operations on text are:

- get length,
- get substring (position, length),
- search for pattern,
- append other text,
- replace substring.
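A minimal sketch of a text ADT offering exactly these operations (Python, illustrative; the character code is carried as registration data):

```python
class Text:
    """Text value: a character sequence plus registration data."""

    def __init__(self, chars="", character_code="ASCII"):
        self._chars = chars
        self.character_code = character_code   # registration data

    def length(self):
        return len(self._chars)

    def substring(self, position, length):
        return self._chars[position:position + length]

    def search(self, pattern):
        """Position of the first occurrence, or -1 if absent."""
        return self._chars.find(pattern)

    def append(self, other):
        self._chars += other._chars

    def replace_substring(self, position, length, replacement):
        self._chars = (self._chars[:position] + replacement +
                       self._chars[position + length:])
```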

Many other operations can be imagined. However, some of them are restricted to specific types of texts, e.g. to natural-language text (spell checker) or to source code (syntax checker). Searching in a large set of texts is covered by the scientific discipline of information retrieval [Lanc73a, Shar64a], which is also called information science. The abstract or the full text of a document is augmented with keywords, also known as descriptors. This can be done manually or semi-automatically, using a given set of descriptors (listed in a thesaurus). Automatic indexing assigns keywords to a text and uses special forms of text understanding to do so. However, many problems remain to be solved before such systems can beat a human reader. Sometimes the most important keyword is not mentioned in the text, e.g. due to a terminological rivalry among authors. Sometimes a word from the thesaurus appears in the text, but should not be selected as a descriptor: “The issue of ... is not covered here due to space limitations.” In competition with information retrieval, hypertext took a different approach to locating text objects [Conk87a]. All texts are decomposed into fragments, i.e. chapters, sections, or even single paragraphs, that can be linked in arbitrary ways. Indexes are special kinds of text fragments with links to all documents that treat a certain subject. The reader navigates through the “hyper-document” by following the links, be it within a document from one fragment to the next, be it across documents to follow references (“see also”). Hypertext enjoys a growing popularity and must be considered a convenient way (among others) to access texts.
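The fragment-and-link structure of hypertext fits in a few lines; a sketch (Python, illustrative):

```python
class Fragment:
    """A hypertext node: some text plus labelled outgoing links."""

    def __init__(self, text):
        self.text = text
        self.links = {}                    # label -> target Fragment

intro = Fragment("Multimedia databases store text, images, and sound.")
detail = Fragment("Registration data accompany the raw data.")
intro.links["see also"] = detail

# Navigation is simply following labelled links:
print(intro.links["see also"].text)
```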

4.2.2 Graphics

The term “graphics” (or drawing) is used to denote pictures defined as a set of geometrical objects, i.e. lines, circles, curves, and (filled) areas. They are also called “vector images” to distinguish them from raster images. Please note that the elements form a set, and not a sequence as in the case of text. The basic items of graphics are rather complex compared to the characters in text and the pixels in a raster image. They must carry coordinates that position them on a two-dimensional drawing board. Fig. 3 shows a simple example. Graphics here is mainly a means of presenting information. It can be associated with a three-dimensional model of some object in that it shows a particular projection of the model. Manipulating the model, e.g. in a CAD system, invalidates the projection. However, once the design is finished, this will be a rare event, while output of the drawing could still be done frequently. Storing the projection as graphics instead of generating it over and over again could offer significant savings.

starting point    ending point    line width
(0, 30)           (20, 30)        2
(9, 8)            (10, 10)        1
(20, 10)          (20, 30)        2
(0, 10)           (0, 30)         2
(10, 0)           (10, 10)        1
(11, 8)           (10, 10)        1
(20, 10)          (0, 10)         2

a) Set of Line Definitions

b) Drawing Produced: the lines drawn on a board whose corners are (0, 0), (20, 0), (0, 30), and (20, 30)

Figure 3: Data Structures Representing Vector Graphics
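In code, the line set of Figure 3 is a collection of coordinate pairs plus a width attribute; the following Python sketch (ours, for illustration) reproduces it and derives the extent of the drawing board shown in Figure 3b:

```python
# Each element: ((x1, y1), (x2, y2), line_width) -- the rows of Figure 3a.
lines = {
    ((0, 30), (20, 30), 2),
    ((9, 8), (10, 10), 1),
    ((20, 10), (20, 30), 2),
    ((0, 10), (0, 30), 2),
    ((10, 0), (10, 10), 1),
    ((11, 8), (10, 10), 1),
    ((20, 10), (0, 10), 2),
}

# A set, not a sequence: unlike text, graphics elements are unordered.
xs = [x for start, end, _ in lines for x, _y in (start, end)]
ys = [y for start, end, _ in lines for _x, y in (start, end)]
print((min(xs), min(ys)), (max(xs), max(ys)))   # (0, 0) (20, 30)
```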

Graphical editors are the software packages used to generate and manipulate drawings. They offer a large set of operations, and they store their data in specific formats. To move graphics from one system to another, a converter has to be employed. The definition of exchange formats reduces the number of converters needed. Examples are the GKS Metafile and the Computer Graphics Metafile (CGM) [Bono85a]. It has been stated several times that the standard data models are not well suited to representing two-dimensional objects. Instead, new data models have been proposed that allow for new types and operations to manage so-called complex objects [Hask82a]. Several prototype database systems are currently under development to support extensibility. Sooner or later multimedia systems will borrow from their concepts.

4.2.3 Raster Images

As we mentioned earlier, images originate from scanners, cameras, or video recordings and can be stored in the video signal format or in the bitmap (raster) format [Woel85a]. The video signal format is used on a video tape, which is relatively slow in access, or on an optical video disk. The latter can hold up to 54,000 image frames with an access time of 1-2 seconds for a single frame. Images in the bitmap or raster format can be compressed by at least an order of magnitude. (More compression can be achieved by detecting lines and segments, that is, by converting the image to graphics. Unfortunately, this always incurs a loss of information.) Even then, an 8.5" by 11" page will require more than 4 M bytes, and a color display of up to 48" by 80" needs between 2 M and 40 M bits [Woel85a].


In the raster format the image is represented as a matrix of pixels (picture elements). Each pixel may occupy just one bit to indicate black or white, but it might as well need several bits to code color and greyness. For instance, the RGB encoding uses real numbers between 0 and 1 to quantify the intensity of each of the three colors red, green, and blue. Alternatively, the IHS system or the YIQ system can be used [Ball82a]. Formulae are available to calculate one encoding from the other. Different image-displaying devices (monitors) need different encodings for the pixels. Most formats include a colormap to avoid the repeated definition of colors (in terms of red, green, and blue portions) in the pixels. The colormap holds the definition of all the colors used in the image, and each pixel simply refers to a colormap entry by giving the appropriate index. This not only saves storage, but also supports some image processing (improving the contrast by modifying the colors) and allows primitive animations. See [Shou79a] for details. Concepts for managing images in databases have been developed since 1974 [Kuni74a]. Between 1977 and 1981, several conferences and special issues of journals were dedicated to the subject of “pictorial information systems.” Quite a few prototype systems have been built, but they tended to be too narrow in range (e.g. tailored to satellite photos). Afterwards, the activity calmed down. Overview articles summarized what had been achieved [Lee84a, Tamu84a]. Since the database community was already heading towards “non-standard” and “extensible” DBMS, the expectation was that these new systems could handle images as well and thus provide the full functionality of the former stand-alone picture databases in a more general setting.
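Returning to the colormap mechanism described above, a minimal sketch (Python, illustrative): pixels hold small indices, the colors live in one table, and an image-wide change touches only that table:

```python
# Colormap: index -> (red, green, blue), intensities in [0, 1] as in RGB.
colormap = [
    (0.0, 0.0, 0.0),   # 0: black
    (1.0, 1.0, 1.0),   # 1: white
    (0.7, 0.1, 0.1),   # 2: dark red
]

# The pixel matrix stores only colormap indices, not full color values.
pixels = [[0, 1, 2],
          [2, 1, 0]]

def brighten(cmap, factor=1.3):
    """Change the whole image by rewriting the colormap alone --
    the kind of cheap image manipulation alluded to in the text."""
    return [tuple(min(1.0, c * factor) for c in rgb) for rgb in cmap]

colormap = brighten(colormap)
print(colormap[2])    # the red entry, brightened; no pixel was touched
```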

4.2.4 Sound

Voice recording seems to be a much more convenient way of data input, especially for people like doctors or managers who are known to be unwilling to type. As we have seen in chapter 2, the technical equipment is available to record and play sound on a computer. There are different types of speech coding [Woel85a]: source modelling (VOCODER), parametric methods, and waveform coding methods. Data encoding rates range from 2,400 bits per second of speech for Linear Predictive Coding to 64,000 bits per second of speech for pulse code modulation [Lee83a]. This amounts to 18,000 - 480,000 bytes for a one-minute voice note. Of course, the compression method used must be known to properly replay the recording. Sound is the first medium we discuss that has a real-time constraint. When recording sound, writing the data to disk must be fast enough not to lose information. Accordingly, reading the data from disk when replaying the recording should not create pauses in the sound produced. Fortunately, even PCs can be fast enough to do that, if the quality of the recording need not be too high and if only few software layers have to be crossed. The operations on a piece of sound recording best simulate a tape recorder with buttons for play, wind, and rewind plus a position indicator (track, minutes played). Also, one would like to label “pieces of tape” arbitrarily, attach them to pictures and documents, and find them fast. Without knowledge of the internal structure, access to parts of a tape is only possible by referring to time, e.g. go back 10 minutes. A low-level structuring aid can detect pauses that separate portions of the recordings [Terr88a]. As with some dictation machines, the user can also indicate structure explicitly by pushing a button to generate acoustic marks [Chri86a].
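Time-based positioning reduces to arithmetic on the registration data (sampling rate and resolution); a sketch (Python, assuming uncompressed mono samples):

```python
def byte_offset(seconds, samples_per_second, bits_per_sample):
    """Map a playback time to an offset in the raw sample data."""
    return int(seconds * samples_per_second) * (bits_per_sample // 8)

# "Go back 10 minutes" on a hi-fi recording (44,000 samples/s, 16 bits):
position = byte_offset(30 * 60, 44000, 16)    # currently at 30 minutes
position -= byte_offset(10 * 60, 44000, 16)   # rewind by 10 minutes
print(position)                               # 105,600,000 bytes (20 min)
```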

4.2.5 Video

Video is special because it combines media data that have been introduced already, namely raster image and sound. Among the components, a strict synchronization must be represented and obeyed. The data volume is much higher than in the other media: assuming 25 pictures a second at 250 K bytes each, a single second of video occupies 6.25 M bytes of memory! This does not count the sound, which at the usual sampling rate and resolution would consume only some K bytes and thus would not change the picture anyway. Because of this, VCRs must be used to store video recordings. Like the other types of media objects, video comes with a variety of recording formats, e.g. Betacam, D-2, MII, 1-Inch-B, and U-Matic. There are techniques under development to compress video recordings so that they may fit on an optical disk; CD-I and DVI are the most prominent ones [Brew87a, Ripl89a]. Operations on video are much like those on sound: play, wind, cut, and paste. Additionally, one would often like to extract a single frame and store it as a raster image. Replacing the sound track could be another useful operation. Recording and playing must be done under real-time constraints, and in this case most computers today are not fast enough. The best they can do is switch a connection from a camera to the VCR, or from the VCR to a monitor, respectively, and then stay out of the data transfer.

4.2.6 Summary

Having looked into the structure of the different media objects and the operations used to generate and manipulate them, we must conclude that they are not just unformatted data. They certainly comprise a set or sequence of elements; we shall call that the raw data. However, each type of media object needs additional (formatted) data to allow for proper interpretation of the raw data. If the height and width of a raster image are not known, it is impossible to generate a screen output from the string of pixels that forms the raw data. We have already used the name “registration data” to refer to these data that must accompany the raw data. Finally, the media objects may also come with description data. These data are not mandatory as are the registration data, but they can be very helpful in working with the media objects. They can be formatted or unformatted, and their task is to represent in another medium some information or structure contained in the media object. Description data can hold the results obtained from a lengthy analysis, and they can be used to enable sophisticated search over large sets of media objects. We shall return to this in section 4.5.


4.3 Multimedia Data

The next step now is to combine single-media data objects into multimedia data objects. To do so, we shall take a look at some experimental systems that manage multimedia data. They do not use a database system for their data, but the structures they maintain define the requirements that a multimedia DBMS must fulfill, too. In the end, it should be possible to rebuild these systems on top of a multimedia DBMS with the same external functionality and the additional option to have other applications access the data.

4.3.1 Documents

The most important class of multimedia data objects are certainly documents. Proposals for multimedia DBMS have often been generalized from office information systems; MINOS is the best example of that. Also, hypertext systems have been enhanced to allow media data objects other than text as nodes in the network of fragments and links. This leads to the notion of “hypermedia.” Remember that documents are intended to be read by humans, rather than to be processed by machines. This means that there may well be other types of multimedia objects that are not considered documents.

MINOS has been developed at the Universities of Toronto and Waterloo [Chri86a, Chri86b, Chri86c]. It manages highly structured multimedia objects that consist of attributes, a text part, an image part, and a voice part. Objects are either in an editing state where they can be modified, or in an archived state where they are amenable to presentation and browsing only. Sophisticated browsing features let the user follow the object's structure, stepping through visual pages and audio pages as well as sections and paragraphs. Logical messages (visual or audio) can be attached to text, image, or audio segments, so that they are shown or played along with them. An important design goal has been to regard voice information as equivalent to text (“symmetric approach”). Voice allows for faster input of information, especially by non-typing users like managers and doctors. It also makes it possible to access the database through the telephone. Multimedia objects can be linked to each other. Further document components can be transparencies, tours (sequences of images played automatically), and process simulations (animated images). The emphasis is on the user interface. Storage management is kept as simple as possible, i.e. a file system is used, and the data schema is fixed, so that it consists of a predefined set of elements with associated operations. As updates are only possible after the complete “checkout” of the whole multimedia object into the editing mode, synchronization of updates is fairly simple, but adding a small annotation to an object requires significant overhead.

An overview of the numerous projects developing hypermedia systems is given in [Conk87a]. One of the most mature systems seems to be Intermedia, developed by the Institute for Research in Information and Scholarship (IRIS) at Brown University [Yank88a]. It is based on twenty years of research and three prior generations of hypertext systems. It currently consists of:

- a text processor,
- a graphics editor,
- a timeline editor,
- a viewer for sections of 3D objects,
- a scanned-image viewer.

A video editor and a 2D animation editor are under development. Intermedia is clearly aimed at teaching [Conk87a]: it is meant to be a tool for professors to organize and present their lesson material. It also serves as an interactive medium for the students to study the materials and add their own annotations and reports. There are many other hypermedia systems that we cannot describe in detail due to space limitations: KMS [Aksc88a], NoteCards [Hala88a], HyperCard [Will87a], Neptune [Deli86a], CONCORDE [Hofm91a], and Hyperties [Shne87a], to name just a few. The interested reader is referred to the literature.

4.3.2 Other

Unfortunately, we cannot present a prototype system that does not use the paradigm of a document when offering a view of its data to the users. However, we are convinced that there is a chance to employ such a system, for instance in supervision applications. Whenever the incoming multimedia data are primarily fed into automatic evaluation and analysis systems, they need not be regarded as documents. Of course, this does not preclude presenting them in the form of a document whenever the users would like to see them, but this will be the case for only a small portion of the data actually stored (e.g. process data of the time when an accident occurred). Consequently, the data model of a multimedia DBMS must not be tied to a document model, yet it must be general enough to support any document model of either office information systems or hypermedia systems.

4.4 Data Model

The role of a data model is to provide a collection of (abstract) object types for describing and organizing data. It must also include a complete set of operations on objects of those types. In the case of the relational data model [Codd70a], the object types are domains, relations, attributes, tuples, primary keys, and foreign keys, while the operations are defined by the relational algebra. Many scientists claim that the data model of a multimedia DBMS can only be object-oriented [Woel86a, Woel87a, Masu87a]. Unfortunately, object-oriented DBMS are still subject to lively development, and the numerous proposals and prototypes differ in many aspects. It is far from clear today which of the proposals will finally prevail. So building multimedia management on top of one of these systems will yield only a specific solution with little relevance for the other environments. Even worse, the proposals for multimedia DBMS usually define their own object-oriented data models and thus add to the variety.


So we decided that the one thing we do not want is to invent yet another data model. Instead, we try to define abstract data types (ADTs) for the multimedia data objects that could later be integrated into a standard relational data model as well as into an object-oriented data model. While the various object-oriented models certainly offer a richer set of object types and more modelling power (e.g. inheritance), the relational model is compatible with the large number of databases currently in use. For a discussion of the pros and cons, see the two manifestos [Atki89a, Ston90a]. So we try to discuss multimedia data types independently of a specific data model. We shall, however, use the relational model as an example to show the consistent embedding of those types. The reason is that most readers will be familiar with the relational model, so lengthy explanations are not required. Please keep in mind that this does not mean that object-orientation will be irrelevant for multimedia data!

4.4.1 The Data Type IMAGE as an Example

We shall use raster images as an example to demonstrate the definition of an abstract data type for media data objects. It will be called IMAGE, and it will have its own set of operators to access and manipulate its values. The definition of data types and operators for the other media has been done in a similar way and will not be presented here.

As we have seen, a value of type IMAGE is more than just a matrix of pixels. It contains some other information such as height, width, pixel depth, color definition, colormap, etc. Since the registration data are mandatory, they are included in the values of the IMAGE type. This is shown in Fig. 4.

Figure 4: Conceptual View of an Instance or Value of the Abstract Data Type IMAGE (registration data: height, width, depth, encoding, and a colormap with its length and depth; raw data: bitmap/raster; description data, e.g. “Reagan and Gorbachev signing a treaty; the event occurred on December 7, 1987; it took place in Washington, D.C.”)

As in the case of any abstract data type, the components are only accessible through functions. This hides the internal storage structures and formats used to represent them and thus fulfills the requirement to make the programs device independent and format independent. The function create_image assembles an IMAGE value from the contents of program variables and constants. It checks the parameter values for consistency and maps them to whatever internal format is used for the images. Please note that the encoding parameter indicates only how the input in pixelmatrix and colormap has to be interpreted; it does not imply any preference for the internal format.

create_image (height : INTEGER,
              width : INTEGER,
              depth : INTEGER,
              aspect_ratio : REAL,
              encoding : CODE,
              colormap_length : INTEGER,
              colormap_depth : INTEGER,
              colormap : ARRAY [2:*, 1:*] OF INTEGER,
              pixelmatrix : ARRAY [1:*] OF BIT) : IMAGE;

Other create functions can be defined that use specific image formats as input parameters and thus reduce the number of parameters significantly. Access to components of an IMAGE value is performed with the help of functions like:

height (i : IMAGE) : INTEGER;
width (i : IMAGE) : INTEGER;

etc. Again, specific data structures used in some application environments (e.g. Ximage) can be filled with an IMAGE value in a single function call. In an interactive environment, loading data structures is of no use; one would like to display an image on a monitor or in a separate window of the screen:

display (i : IMAGE; d : DEVICE) : BOOLEAN;


The return value of this function indicates whether the display operation was successful or not. Modifying IMAGE values must also be done through functions that guarantee a consistent update of raw data and registration data. Examples are:

replace_colormap (i : IMAGE,
                  encoding : CODE,
                  colormap_length : INTEGER,
                  colormap_depth : INTEGER,
                  colormap : ARRAY [2:*, 1:*] OF INTEGER) : IMAGE;

replace_pixelvalue (i : IMAGE,
                    x, y : INTEGER,
                    pixelvalue : ARRAY [1:*] OF BIT) : IMAGE;

This list of examples is anything but complete. We are convinced that each application will add specific functions to derive other data from an IMAGE value or to perform complex updates in a single call. The DBMS must be flexible enough to allow for the definition of those functions. After some experience, a sufficiently large library will be available to new users.
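One could imagine, for instance, an operator that derives an overall brightness measure from the raw data, or one that performs a complex update by resampling the image to a new resolution. Both signatures below are hypothetical illustrations of such application-specific functions, not operators defined in this paper:

mean_luminance (i : IMAGE) : INTEGER;
scale (i : IMAGE, factor : REAL) : IMAGE;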

4.4.2 Embedding in the Relational Data Model

In the relational model, the abstract data types defined for media data objects (like the IMAGE type shown above) can be used as domains. If a relational DBMS offers a facility to define new domains, the embedding can be done in a straightforward manner. University Ingres [Ong84a] and its successor Postgres [Ston86a] offer such a facility, although it is not perfectly suited to multimedia data. In the following we shall assume that such a facility exists.

Since IMAGE is a domain, an image is supposed to be handled as an attribute value of some object or entity (a ship or an aircraft, for instance). Usually it is an attribute of the object shown in the picture, but that need not be the case. Making the image an attribute does not prevent the treatment of pictures as objects in their own right. We shall introduce different relational schema types to show the modelling power of this approach.

Schema Definition

The simplest way of assigning an image to an object leads to a relation schema like this:

OBJECT (O-ID, ..., O-IMAGE)

OBJECT is the name of the relation, followed by a list of attributes. The object identifier O-ID is underlined to indicate that it is the primary key. We denote this as relation schema type 1. Its advantage is that access to the tuple describing an object fetches the image, too. More than one attribute of type IMAGE can be defined for a relation. Typical examples are:

Employee (EmpNo    INTEGER,
          ....
          Portrait IMAGE)

Inmate   (InmNo        INTEGER,
          ....
          FrontView    IMAGE,
          SideView     IMAGE,
          FingerPrints IMAGE)
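With a domain definition facility of the kind assumed above, the second of these schemas might be declared roughly as follows. This is a hypothetical sketch only: the registration of IMAGE as a domain is assumed to have taken place already, and the attribute Name merely stands in for the elided attributes:

CREATE TABLE Inmate
    (InmNo        INTEGER NOT NULL,
     Name         CHAR (40),
     FrontView    IMAGE,
     SideView     IMAGE,
     FingerPrints IMAGE,
     PRIMARY KEY (InmNo));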

This kind of schema easily models the object-attribute relationship demanded in section 4.1. However, it may often be the case that the number of images per object varies. If first normal form is required, such groups of values can only be modelled by a separate relation. Hence, there is a relation schema type 2:

OBJECT (O-ID, ...)
OBJECT-IMAGE (O-ID, O-IMAGE)

An example could be the management of X-rays in a hospital:

Patient (Name     STRING,
         ....
         Portrait IMAGE)

X-Ray   (PatientName STRING,
         Date        STRING,
         View        STRING,
         BodyPart    STRING,
         Photo       IMAGE)

The fact that an attribute of type IMAGE is part of the primary key might lead to some implementation problems, but we do not consider them here (introducing an internal image identifier may help). Access to an image is no longer as simple as it was with schema type 1, for a natural or outer join must be used. If the tuple of the object has been read already, a selection on the OBJECT-IMAGE relation must be performed, using the given object identifier. Another problem with the two approaches discussed so far is that a picture showing several objects must be stored redundantly. The database system treats the copies as different images. To avoid this, a relation schema type 3 has to be used:


OBJECT (O-ID, ...)
IMAGE-OBJECT (I-ID, I-IMAGE)
IS-SHOWN-ON (O-ID, I-ID, COORDINATES, ...)

The COORDINATES can be used to give the approximate position of the object on the image. An example could be the management of horse races:

Horse       (Name      STRING,
             Age       INTEGER,
             ... )

RacePhoto   (ArchiveNo INTEGER,
             Date      STRING,
             Place     STRING,
             Photo     IMAGE)

Is-shown-on (HorseName STRING,
             ArchiveNo INTEGER,
             Position  STRING,
             .... )

With this type of schema, it becomes even more complex to find the images of a given object:

NATJOIN (SELECT O-ID=object1 (IS-SHOWN-ON), IMAGE-OBJECT)

NATJOIN stands for the natural join of two relations, i.e. the equi-join on the attributes with the same name (IS-SHOWN-ON.I-ID = IMAGE-OBJECT.I-ID). Each image is stored only once, regardless of how many objects it shows. It is possible to start with an image and to retrieve the depicted objects:

NATJOIN (OBJECT, SELECT I-ID=image1 (IS-SHOWN-ON))

One could even define a window on the image, use it to restrict the coordinates, and thus retrieve only the objects shown in the window. Hence, the third type of relation schema is a little unwieldy, but it provides the highest degree of flexibility in modelling and processing. Images with unknown contents can be stored, too.

The three schema types are depicted in Fig. 5. The dotted line indicates a primary-key-foreign-key relationship. A relational database system extended with image attributes offers all of them; the choice is left to the application. If there is at most one image per object, and each image shows only one object (e.g. a database of employees with passport photos), then type 1 is most appropriate.
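In terms of the SQL embedding introduced below, the first of these algebra expressions might be written as follows. This is a hypothetical sketch based on the horse-race example: the relation Is-shown-on is spelled IsShownOn, since hyphens are not legal in SQL identifiers, and :horse is a host variable holding the name of the horse in question:

SELECT P.ArchiveNo, P.Photo
FROM   IsShownOn S, RacePhoto P
WHERE  S.HorseName = :horse
AND    S.ArchiveNo = P.ArchiveNo;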


Figure 5: The Three Relation Schema Types for Storing Images. a) Type 1: OBJECT (O-ID, ..., O-IMAGE); b) Type 2: OBJECT (O-ID, ...) and OBJECT-IMAGE (O-ID, O-IMAGE); c) Type 3: OBJECT (O-ID, ...), IMAGE-OBJECT (I-ID, I-IMAGE), and IS-SHOWN-ON (O-ID, I-ID, COORDINATES, ...).

A preliminary analysis of relational data modelling with media data types (not taking query languages into account) yields the following results:

- All three kinds of relationships between media objects and entities (one-to-one, one-to-many, many-to-many) can be represented, although not with specific semantics (e.g. synchronization).

- Media objects can be attributes as well as objects in their own right (schema types 2 and 3).

- Access can be a little awkward (and slow) if one or two join operations must be used.

There is one problem with schema type 3 that has not been mentioned yet: there may be different types of objects shown on an image, e.g. ships, aircraft, and submarines, each represented by a different relation. In that case, multiple IS-SHOWN-ON relations must be defined, for the domain of O-ID cannot be the union of the domains of all the object identifiers. This makes following the link from a picture to the objects it shows rather awkward. A solution would be the introduction of a generalization hierarchy with a superclass OBJECT, but that goes beyond the relational model.

Query Language

Access to a relational database that also holds media objects has so far been characterized only in vague terms. To give a better impression, a hypothetical integration of the operators defined for the media data types with a standard query language will be shown. We shall use SQL as the query language, since it is in widespread use and has been standardized by ANSI and ISO [ANSI86a]. The example relation that will be used in the queries is even simpler than the ones shown above; it consists of only two attributes:

AerialPhoto (No    INTEGER,
             Photo IMAGE)

To fill tuples into this relation, the normal SQL INSERT statement can be used:

INSERT INTO AerialPhoto
VALUES (:number, create_image (512, 480, 8, ... ));

The leading colon helps the SQL compiler to tell program variable names from relation and attribute names. It is important that type checking can already be done at compile time. The first entry in the value clause must specify a value for the attribute No and must thus be of type INTEGER. Accordingly, the second entry must be of type IMAGE. Since the result type of the operation create_image is in fact IMAGE, the compiler accepts the function call. Hence, this embedding allows for static type checking.

An update operation that includes an IMAGE attribute must also use an operator which returns an IMAGE value. For instance:

UPDATE AerialPhoto
SET Photo = replace_colormap (Photo, RGB, 256, 24, :cm)
WHERE No = 1234;

To select particular images from the database, the usual SQL expressions can be used to identify the tuples. However, the typical comparison of attribute values and constants is only allowed for the standard data types. So again the operators must be used to extract components that have INTEGER or BIT values before a comparison can be performed:

SELECT height (Photo), width (Photo)
INTO :h, :w
FROM AerialPhoto
WHERE encoding (Photo) = RGB_COLORMAP
AND depth (Photo) = 8;

The example also shows how IMAGE values can be retrieved from the database. Since an IMAGE value cannot be assigned to a program variable as a whole, operators are used to select components of basic data types or of structured data types like arrays that can be assigned to program variables. Given a specific environment, additional operators can be defined to fill a complex record structure in a single call (e.g. Ximage).
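Retrieving the raw data could then proceed in a second step. The following is a hypothetical sketch: an operator pixelmatrix, returning the raster as a bit array, is assumed here in analogy to the create_image parameter of the same name:

SELECT pixelmatrix (Photo)
INTO :pm
FROM AerialPhoto
WHERE No = 1234;

As discussed below, issuing two statements against the same tuple in this way is exactly the access pattern that standard SQL supports rather poorly.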


Such a convenient embedding of operators on media data types can be implemented with the ADT facilities offered by relational DBMS such as Postgres [Ston86a]. However, SQL has a few problems with operators like the ones introduced. The two most important deficiencies are:

- Repeated access to the same attribute value is not supported. The need only arises with the new media data types, since their values are large and are often accessed piecemeal. For instance, a program could first retrieve the height and width of an image to allocate sufficient memory before it fetches the pixelmatrix. For the SQL system, these are two different statements that just happen to refer to the same tuple. The problem is worse when a cursor is used to retrieve the tuples: a FETCH operation assigns attribute values to program variables and moves the cursor to the next tuple, so that another access to the previous tuple must be made using the primary key, which of course is much less efficient. A remedy would be to provide two separate operations for fetching and cursor movement.

- The SQL UPDATE statement can only replace the value of an attribute completely. It cannot modify parts of it, which would be appropriate for the update operations on an IMAGE value. Instead, the operations must be implemented to generate a copy of the whole media data object that then replaces an (almost) identical copy. A better version of the UPDATE statement would allow for the application of a procedure to the attribute values of a tuple:

  UPDATE AerialPhoto
  APPLY replace_colormap (Photo, RGB, 256, 24, :cm)
  WHERE No = 1234;

Despite the problems of schema definition and query language embedding, relational database systems still provide a notable potential to work with media data objects. They offer stability and upward compatibility to their users, and the learning effort is incremental. Hence, the relational system is a useful environment to test the media data types and in particular the set of operations provided to work with them.

4.5 Search

Searching a large set of multimedia data objects has been identified as a prime task for a multimedia DBMS. With what has been discussed so far, it can be done using the standard formatted attributes associated with the media objects, or using the result values of operators. This is easily implemented with standard database search techniques (e.g. indexing), but it will not suffice to answer all questions. Certain phenomena shown on images will hardly be represented by attributes or relations (as ships, persons, horses, and cars are), e.g. “storm,” “fog,” “night,” etc. So the media object itself must be used in the qualification. Pattern matching is a technique that can do just that; full-text retrieval is an example. However, little meaning is involved in the comparison of the pattern and the raw data. In the case of image and sound, it can be very hard to find the right pattern. If you are looking for photos of a snowstorm, you must specify something like


“lots of white pixels,” which does not necessarily match only snowstorms, nor does it match all snowstorm images. What we would like to have lies somewhere in between: it does take the media object itself into account, but it compares on a more semantic level. For instance, we would like to find:

- mugshots on the basis of witnesses' reports: beady eyes, long nose;
- aerial photos that show an airport (a river, an industrial plant, ... );
- press photos of Bush and Gorbachev signing a treaty;
- text passages on the subject of urban traffic;
- recordings of interviews on speed limits.

It is obvious that questions like these can hardly be mapped to pattern matching. Run-time analysis of a media object would be appropriate (e.g. running a specific image analysis to find airports), but is simply not possible with today's technology. Text understanding and speech recognition only work in restricted domains and are still rather costly. The only way out is to do the analysis off-line and store the results together with the media objects. As a consequence, we delegate the task of generating the content descriptions to the users of the DBMS. They may run complex image analysis and text understanding software to do so, or may just as well decide to enter the descriptions manually. As of today, the latter will still produce the better descriptions, but it can be time-consuming in some situations. However, authors have been living with the obligation to provide keywords along with their texts for some time now. The task of the DBMS then is to store the descriptions and to use them in the search for the multimedia objects.

The next question is what these content descriptions of media data objects should look like. We have already ruled out formatted data because of their lack of expressiveness, although they could be searched efficiently. At least four other types of content descriptions could be considered:

- keywords,
- knowledge representation,
- natural-language text,
- restricted text (captions).

Keywords have been used in libraries and in information retrieval systems for a long time. Elaborate techniques are available to perform search on the basis of keyword descriptors [Salt83a]. Their expressiveness, however, is still too weak to describe the contents of pictures, sound recordings, and video: interdependencies, complex sequences of actions, and causal connections can hardly be represented. Knowledge representation techniques, on the other hand, are rich enough to model them, and they also lend themselves to efficient search. The only problem with them is that they are rather hard to understand. Unskilled users are not able to work with them directly; they need a “knowledge engineer” who maps their expertise onto knowledge representation structures.


Natural-language text has just the opposite characteristics: it is easy to generate, but hard to search. Its expressiveness is as good as that of knowledge representation, if not better. However, text descriptions are themselves unformatted and thus not very helpful in conducting a search. Full-text search is the only applicable method, and we have already mentioned that it bears some problems. Actually, the scope of content addressability has only been shifted from an arbitrary medium to the single medium “text,” so this is no solution.

As a compromise, we would like to propose restricted text, in particular captions (noun phrases). They are almost as easy to write as free text, but at the same time they can be processed and turned into knowledge representations used internally for search. Hence, we get both: easy input of descriptions, even by laymen, and efficient and well-defined search. It seems appropriate to look into this approach a little further and to enumerate the consequences. First, the embedding with the media data types and the query language will be shown. Since the descriptions are not objects in their own right, but should only exist in connection with a media data object, they are encapsulated together with the media objects. That is, they are created, modified, and used to qualify objects through operators of the media data types. These operators are:

new_descr (i : IMAGE, d : STRING) : IMAGE;
more_descr (i : IMAGE, d : STRING) : IMAGE;
descr_length (i : IMAGE) : INTEGER;
descr (i : IMAGE) : STRING;
contains (i : IMAGE, q : STRING) : BOOLEAN;

Again, IMAGE is used only as an example; the operators are the same for all the media data types. The strings passed as arguments must contain the captions. The operator new_descr replaces the old description (if any) with the captions given in the string, while more_descr appends the new description to the old one. To retrieve a description from the database, descr_length and descr are used. The most important operation, of course, is contains, which tests whether a specific image contains the objects (actions, circumstances, events, etc.) given in the query string q. This comparison operator has a Boolean result value and thus can be used in the WHERE clause of the SQL SELECT statement.

Using the same relation as in the previous examples, we can demonstrate the use of the new operators in statements of the SQL query language. Content descriptions are not mandatory; hence they are not specified as an argument of the create_image operation. Either new_descr is applied to the result of create_image in an INSERT statement, or new_descr is used in an UPDATE statement to add the description later:


UPDATE AerialPhoto
SET Photo = new_descr (Photo, "a road in an area of rocks; three houses on the right")
WHERE No = 1234;

Now a search for images can be specified easily:

SELECT descr (Photo), ...
FROM AerialPhoto
WHERE contains (Photo, "rocky area");

Please note that “rocky area” and “an area with rocks” mean the same thing, but simple string matching would not include the image with number 1234 in the result set of the query. Only if an internal mapping to the semantics takes place before the comparison can a match be found. We shall not go into the details of how the captions can be processed and transformed into knowledge representation structures; the interested reader is referred to [Lum90a]. Some interesting research problems are contained in this approach, and not all of them have been solved yet. However, the perspective is quite promising.
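Content predicates can also be combined with the component operators of section 4.4.1, and a description can be extended incrementally with more_descr. The following statements are hypothetical sketches using only the operators defined above:

UPDATE AerialPhoto
SET Photo = more_descr (Photo, "a gravel pit north of the road")
WHERE No = 1234;

SELECT No
FROM AerialPhoto
WHERE contains (Photo, "houses")
AND width (Photo) >= 512;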

4.6 Architecture

The media data types, their embedding into a data model, and the extension of the query language with the operators define the functionality of a multimedia DBMS. The next question is how such a DBMS could be implemented. A software system of the size and complexity of a DBMS should be decomposed into modules. Traditionally, these modules are organized in layers as described, for instance, in [Härd83a]. However, performance is much more important when real-time requirements for audio and video data must be satisfied, so the layers must not cause significant software overhead. Instead, the goal should be to minimize the number of copying and mapping operations, even if this forces upon us compromises with respect to data independence.

To reduce the amount of copying in the process of capturing or presenting media objects, some proposals integrate control of the I/O devices into the DBMS [Woel87a]. This means the application programs are removed from the chain of transfers and mappings, but the DBMS becomes more complex. Alternatively, the DBMS interface could be extended to support some kind of pipelining that avoids materializing complete media objects in the address space of the application program: while block n-1 is written and block n+1 is read, both asynchronously, block n is processed, i.e. mapped, coded/decoded, or compressed/uncompressed.

The facilities that support content search must also be added to the conventional DBMS architecture. This includes a parser for the captions, a matcher for stored descriptions and queries, and a description manager


that organizes large sets of content descriptions for efficient search. Standard database storage structures cannot be used for the descriptions; new ones must be developed. For some ideas in this respect, see [Lum90a].

The task then is to move from well-established DBMS to new multimedia DBMS. Many concepts have been developed to achieve an “extensible” DBMS, some of which are useful for the handling of multimedia data (e.g. an ADT definition facility), while others are not (access path definition based on total ordering). A thorough evaluation with respect to multimedia data management remains to be done.

Masunaga has developed a framework that helps to classify and compare the architectures defined in the research projects [Masu87a]. He assumes different database systems for the different media. They are integrated using an additional object-oriented DB that refers to them. There is either a single (extensible) DBMS managing all these different databases (“single DBMS architecture”), a “primary” multimedia DBMS that calls the “secondary” media-specific DBMSs as subroutines (“primary-secondary DBMS architecture”), or a collection of cooperating DBMSs accessing each other via Remote Data Access (“federated DBMS architecture”). MINOS [Chri86a] is an example of the single DBMS architecture, while the Starburst project [Yost88a] aims at the primary-secondary DBMS architecture.

A very detailed proposal for a multimedia DBMS architecture is given in [Lock88a]. As in the primary-secondary architecture, media-specific data management systems are integrated into a single system, but here they are all supposed to be based on a common object-oriented system. Only the video management system has its own storage system, since it must use a VCR instead of magnetic and optical disks. In addition to the media-specific systems, the architecture hosts a “mixed-object manager” and an “interrelationship manager.” Their tasks are to provide the integration of single-media objects into multimedia objects (mostly documents) and the management of various types of relationships among the media objects, respectively. The user interacts with the system through a homogeneous interface that hides the media-specific data management systems. The functionality of this interface, however, is still an open issue.

5 Summary and Prospects

In this paper, we have followed the path that leads to the design of multimedia database management systems. It starts with the availability of new and inexpensive I/O devices that allow for the capture and presentation of information directly in media like photography and sound. These devices produce new types of data to be managed in a computer system, or they must be supplied with new types of data for output. Those are the data that we named “media data” or “multimedia data.” Numerous applications that make use of the new I/O facilities can be imagined or are already being developed. The result is not only better user interfaces, but also more information held in the system: a photo can be included to describe a person, a car, a horse, etc. The applications can be classified into four categories using the stability of the data and the initiative in the output of data:


- Archiving;
- Teaching, advertising, and entertainment;
- Design, authoring, and publishing;
- Supervision.

In all of these areas, visions are at hand concerning completely new ways of distributing and accessing information [Bran88a]. The technology to make those visions become reality is here, but the effort involved in creating software to drive the devices is still substantial. Powerful tools are needed. Also, the experimental systems are mostly isolated and “closed” systems; they can neither be integrated with existing systems nor be accessed from other new application software. Virtual machines and abstract data objects must be defined and implemented in the form of programming languages, system software, and library packages.

We have looked into one of these tools: the database management system. We identified the storage and retrieval of multimedia data objects as the prime and sufficiently complex task of such a multimedia DBMS. This excludes complex processing algorithms as well as full editors. It does, however, include a number of goals to be achieved and subtasks to be accomplished:

- Device independence of the application programs;
- Format independence of the application programs;
- Representation of relationships among data objects;
- Support of search (content addressability).

Storing multimedia data objects in files, as is done in most of the experimental systems, does not achieve any of these. To learn more about the types of data to be managed by a multimedia DBMS, we have looked into the specific media one by one, i.e. text, graphics, raster image, sound, and video. It was found that those data objects are not just unformatted, but in fact consist of at least two parts, named raw data and registration data. The raw data are unformatted in the sense that they consist of a large set or sequence of small elements with hardly any structure, but they must always be accompanied by some registration data to be interpreted correctly. True reproduction of the media object with an output device is only possible with the registration data at hand.

The next step was to combine the data objects from different media into multimedia objects. Some prototype systems have been built to handle multimedia documents; we mentioned the MINOS system. While they are all based on files, a multimedia DBMS should be designed to provide sufficient functionality to support them as well. This primarily means modelling concepts for relationships among media data objects, e.g. composition and synchronization. We also argued that multimedia data objects need not only be documents, so the data model should not be restricted to the representation of documents.

Having collected a set of requirements, we proceeded to the design of a multimedia DBMS. This must start with the specification of the functionality, i.e. the data model. Many scientists claim that a multimedia DBMS can only be built using object-oriented technology. Innovative and powerful as it may be, this technology is


not very stable yet, and there are also “manifestos” arguing against it. To avoid a strong dependency on a particular data model, we set out to define abstract data types for the media objects without reference to the data model, and then demonstrated an integration into the standard relational model as an example. While this certainly has its limitations, it provides a stable environment to test the new data types and to use them in existing environments.

Finally, we discussed the different approaches toward search in multimedia DBMS and concluded that neither the use of associated formatted data nor pattern matching would suffice to answer important types of queries. Instead, content descriptions must be stored with the media objects and used in search. Keywords lack expressiveness for the purpose in mind; knowledge representation is sufficiently rich, but hard to generate; and natural-language text does not solve the search problem. As a compromise, we suggested the use of restricted text, i.e. captions, that can be translated internally into knowledge representations. We demonstrated the embedding in a relational query language. Implementation techniques are available, but were not presented due to space limitations. A more detailed discussion of the issues can be found in [Meye91a].

Of course, not all the problems in the area of multimedia DBMS have been solved. The main topics for future research are:

- Applications: There is a continuing development of applications and prototypes that use multimedia data, but store them in files. They must be analysed to scrutinize the requirements on multimedia DBMS. Only if multimedia DBMS offer some benefit to a majority of the applications will they be used.

- Extension of the Data Model: Using a DBMS is still much more costly than using files. This gap should be narrowed by allowing users to start with files when they have only a few multimedia data objects, and to migrate to a DBMS later, when the number grows and storage allocation and search become problems. Recoding the application must be avoided. Schema information could be added without affecting the old applications, but helping new applications to use the data.

- Content Search: The proposal of using captions externally and knowledge representations internally carries us further than other approaches, but it is far from being the perfect solution. The quality of the translation process needs to be improved. In particular, some notion of “common sense” would be useful to derive information from the captions that is not said, but is meant implicitly. To mention a “car” without other attributes implies “four wheels.”

- Operational Aspects: To guarantee consistency of the data, standard DBMS technology offers a rich set of methods that all serve the concept of a transaction. Transactions must cover update operations on multimedia objects, too. However, since the operators of the media data types may cause large modifications inside a single attribute value, their failure should not force a rollback of the complete transaction. The concept of nested transactions [Moss82a] could be useful in this respect.

- Storage Systems: Large media objects must be mapped to blocks on the disk appropriately. While flexibility is needed to allow for updates, access must still be very efficient, especially for timed media (sound, video). Few proposals exist.

- Distribution: Most likely, we shall see the development of multimedia workstations connected to large servers. The multimedia DBMS is supposed to run on the servers, but it could run locally on the workstations, too. Server data can be checked out into a workstation for editing, or it can just be “read” from the server without being stored locally (e.g. to watch a video). The communication requirements for both remain to be identified; certainly broadband communication is required.

Much more needs to be done before multimedia DBMS can be a stable and well-established tool for everyday work. However, the purpose of this paper was to show that it is worth the effort.

References

Aksc88a  Akscyn, R.M., and McCracken, D.L., “KMS: A Distributed Hypermedia System for Managing Knowledge in Organizations,” Communications of the ACM, vol. 31, no. 7, July 1988, pp. 820-835.

ANSI86a  ANSI, The Database Language SQL, Document ANSI X3.135, 1986.

Atki89a  Atkinson, M., Bancilhon, F., DeWitt, D., Dittrich, K., Maier, D., and Zdonik, S., “The Object-Oriented Database System Manifesto,” in Proc. 1st Int. Conf. on Deductive and Object-Oriented Databases (Kyoto, Japan, Dec. 1989), eds. W. Kim, J.-M. Nicolas, and S. Nishio, Elsevier Science Publishers B.V., Amsterdam, 1989, pp. 40-57.

Ball82a  Ballard, D.H., and Brown, C.M., Computer Vision, Prentice-Hall, Englewood Cliffs, 1982.

Bono85a  Bono, P.R., “A Survey of Graphics Standards and Their Role in Information Interchange,” IEEE Computer, vol. 18, no. 10, Oct. 1985, pp. 63-75.

Bran88a  Brand, S., The Media Lab - Inventing the Future at M.I.T., Penguin Books, New York, 1988.

Brew87a  Brewer, B., “Ready When You Are, CD-I,” PC World, April 1987, pp. 252-255.

Burk89a  Burke, J.J., and Ryan, B., “Gigabytes On-line,” Byte, vol. 14, no. 10, Oct. 1989, pp. 259-264.

Chan87a  Chang, S.-K., Shi, Q.-Y., and Yan, C.-W., “Iconic Indexing by 2-D Strings,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-9, no. 3, May 1987, pp. 413-428.

Chri86a  Christodoulakis, S., Ho, F., and Theodoridou, M., “The Multimedia Object Presentation Manager of MINOS: A Symmetric Approach,” in Proc. ACM SIGMOD '86 Int. Conf. on Management of Data (Washington, D.C., May 1986), ACM SIGMOD Record, vol. 15, no. 2, June 1986, pp. 295-310.

Chri86b  Christodoulakis, S., Theodoridou, M., Ho, F., Papa, M., and Pathria, A., “Multimedia Document Presentation, Information Extraction, and Document Formation in MINOS: A Model and a System,” ACM Trans. on Office Information Systems, vol. 4, no. 4, Oct. 1986, pp. 345-383.

Chri86c  Christodoulakis, S., and Faloutsos, C., “Design and Performance Considerations for an Optical-Disk Based, Multimedia Object Server,” IEEE Computer, vol. 19, no. 12, Dec. 1986, pp. 45-56.

Codd70a  Codd, E.F., “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, vol. 13, no. 6, June 1970, pp. 377-387.

Conk87a  Conklin, J., “Hypertext: An Introduction and Survey,” IEEE Computer, vol. 20, no. 9, Sept. 1987, pp. 17-41.

Deli86a  Delisle, N., and Schwartz, M., “Neptune: a Hypertext System for CAD Applications,” in Proc. ACM SIGMOD '86 Int. Conf. on Management of Data (Washington, D.C., May 1986), ed. C. Zaniolo, also ACM SIGMOD Record, vol. 15, no. 2, June 1986, pp. 132-143.

Falo85a  Faloutsos, C., “Access Methods for Text,” ACM Computing Surveys, vol. 17, no. 1, March 1985, pp. 49-74.

Hala88a  Halasz, F.G., “Reflections on NoteCards: Seven Issues for the Next Generation of Hypermedia Systems,” Communications of the ACM, vol. 31, no. 7, July 1988, pp. 836-852.

Hask82a  Haskin, R., and Lorie, R., “On Extending the Functions of a Relational Database System,” in Proc. ACM SIGMOD Conf. (June 1982), pp. 207-212.

Hofm91a  Hofmann, M., and Langendörfer, H. (eds.), “User Interface and Navigation Facilities of the Hypertext System CONCORDE,” Technische Universität Braunschweig, Informatik-Bericht 91-01, Febr. 1991.

Hora85a  Horak, W., “Office Document Architecture and Office Document Interchange Formats: Current Status of International Standardization,” IEEE Computer, vol. 18, no. 10, Oct. 1985, pp. 50-60.

Kamp90a  Kampffmeyer, U., “Kombinierte WORM- und magneto-optische Massenspeicher für vorgangsorientierte Informationsverarbeitungssysteme” (Combined WORM and magneto-optical mass storage for process-oriented information-processing systems), copies of slides (in German), available from: ACS Systemberatung GmbH, Poststraße 33, W-2000 Hamburg 36.

Kuni74a  Kunii, T.L., Weyl, S., and Tenenbaum, J.M., “A Relational Data Base Schema for Describing Complex Pictures with Color and Texture,” in Proc. 2nd Int. Joint Conf. on Pattern Recognition (Lyngby-Copenhagen, Denmark, Aug. 1974), pp. 310-316.

Lanc73a  Lancaster, F.W., and Fayen, E.G., Information Retrieval On-Line, Melville Publ. Comp., Los Angeles, CA, 1973.

Laub86a  Laub, L., “The Evolution of Mass Storage,” Byte, vol. 11, no. 5, May 1986, pp. 161-172.

Lee83a  Lee, D.L., and Lochovsky, F.H., “Voice Response Systems,” ACM Computing Surveys, vol. 15, no. 4, Dec. 1983, pp. 351-374.

Lee84a  Lee, Y.C., and Fu, K.S., “Query Languages for Pictorial Database Systems,” in Natural Language Communication with Pictorial Information Systems, ed. L. Bolc, Springer-Verlag, New York, 1984, pp. 1-142.

Lock88a  Lockemann, P.C., “Multimedia Databases: Paradigm, Architecture, Survey and Issues,” report no. NPS52-88-047, Naval Postgraduate School, Monterey, CA, Sept. 1988.

Lum90a  Lum, V.Y., and Meyer-Wegener, K., “An Architecture for a Multimedia Database Management System Supporting Content Search,” in Proc. Int. Conf. on Computing and Information (ICCI '90, Niagara Falls, Canada, May 1990).

Mack89a  Mackay, W.E., and Davenport, G., “Virtual Video Editing in Interactive Multimedia Applications,” Communications of the ACM, vol. 32, no. 7, July 1989, pp. 802-810.

Masu87a  Masunaga, Y., “Multimedia Databases: A Formal Framework,” in Proc. IEEE CS Office Automation Symp. (Gaithersburg, MD, Apr. 1987), IEEE CS Press, order no. 770, Washington, 1987, pp. 36-45.

Meye91a  Meyer-Wegener, K., Multimedia-Datenbanken (Multimedia Databases), B.G. Teubner, Leitfäden der angewandten Informatik, Stuttgart, 1991 (in German).

Mind89a  Mindell, D.A., “Dealing with a Digital World,” Byte, vol. 14, no. 8, Aug. 1989, pp. 246-256.

Moss82a  Moss, J.E.B., “Nested Transactions and Reliable Distributed Computing,” in Proc. 2nd Conf. on Reliability of Distributed Software and Database Systems (1982), pp. 33-39.

Ong84a  Ong, J., Fogg, D., and Stonebraker, M., “Implementation of Data Abstraction in the Relational Database System INGRES,” ACM SIGMOD Record, vol. 14, no. 1, 1984, pp. 1-14.

Phil88a  Phillips, B., “Multimedia Systems and Text,” in Proc. 4th Int. Conf. on Data Engineering (Los Angeles, Febr. 1988), IEEE Computer Society Press, order no. 827, p. 601.

Ripl89a  Ripley, G.D., “DVI - A Digital Multimedia Technology,” Communications of the ACM, vol. 32, no. 7, July 1989, pp. 811-822.

Salt83a  Salton, G., and McGill, M.J., Introduction to Modern Information Retrieval, McGraw-Hill, New York, 1983.

Shar64a  Sharp, H.S. (ed.), Readings in Information Retrieval, The Scarecrow Press, New York & London, 1964.

Shne87a  Shneiderman, B., “User Interface Design for the Hyperties Electronic Encyclopedia,” in Proc. Hypertext '87 (Chapel Hill, NC, 1987), pp. 189-194.

Shou79a  Shoup, R.G., “Color Table Animation,” Computer Graphics, vol. 13, no. 2, 1979, pp. 8-13.

Stei89a  Steinmetz, R., “Synchronization Properties in Multimedia Systems,” Technical Report No. 43.8906, IBM European Networking Center, Heidelberg, May 1989.

Ston86a  Stonebraker, M., and Rowe, L.A., “The Design of POSTGRES,” in Proc. ACM SIGMOD '86 Int. Conf. on Management of Data (Washington, D.C., May 1986), ed. C. Zaniolo, also ACM SIGMOD Record, vol. 15, no. 2, June 1986, pp. 208-214.

Ston90a  Stonebraker, M., Rowe, L., Lindsay, B., Gray, J., Carey, M., Brodie, M., Bernstein, P., and Beech, D., “Third-Generation Database System Manifesto,” ACM SIGMOD Record, vol. 19, no. 3, Sept. 1990, pp. 31-44.

Tamu84a  Tamura, H., and Yokoya, N., “Image Database Systems: A Survey,” Pattern Recognition, vol. 17, no. 1, 1984, pp. 29-44.

Terr88a  Terry, D.B., and Swinehart, D.C., “Managing Stored Voice in the Etherphone System,” ACM Trans. on Computer Systems, vol. 6, no. 1, 1988, pp. 3-27.

Thom88a  Thompson, T., and Baran, N., “The NeXT Computer,” Byte, vol. 13, no. 12, Nov. 1988, pp. 158-175.

Thom89a  Thompson, T., “Full-Spectrum Scanners,” Byte, vol. 14, no. 4, Apr. 1989, pp. 189-194.

Weye85a  Weyer, S.A., and Borning, A.H., “A Prototype Electronic Encyclopedia,” ACM Trans. on Office Information Systems, vol. 3, no. 1, Jan. 1985, pp. 63-88.

Will87a  Williams, G., “HyperCard,” Byte, Dec. 1987, pp. 109-117.

Wils85a  Wilson, G.B., “Some Aspects of Data Fusion,” in Proc. Int. Conf. on Advances in Command, Control, and Communication Systems: Theory and Applications (Bournemouth, April 1985), IEE Conf. Publication no. 247, London, 1985, pp. 99-105.

Woel85a  Woelk, D., and Luther, W., “Multimedia Database Requirements - Rev. 0,” MCC Technical Report no. DB-042-85, Austin, Texas, 1985.

Woel86a  Woelk, D., Kim, W., and Luther, W., “An Object-Oriented Approach to Multimedia Databases,” in Proc. ACM SIGMOD '86 Int. Conf. on Management of Data (Washington, D.C., May 1986), ed. C. Zaniolo, also ACM SIGMOD Record, vol. 15, no. 2, June 1986, pp. 311-325.

Woel87a  Woelk, D., and Kim, W., “Multimedia Information Management in an Object-Oriented Database System,” in Proc. Int. Conf. on VLDB (Brighton, England, Sept. 1987).

Yank85a  Yankelovich, N., Meyrowitz, N.K., and van Dam, A., “Reading and Writing the Electronic Book,” IEEE Computer, vol. 18, no. 10, Oct. 1985, pp. 15-30.

Yank88a  Yankelovich, N., Haan, B.J., Meyrowitz, N.K., and Drucker, S.M., “Intermedia: The Concept and the Construction of a Seamless Information Environment,” IEEE Computer, vol. 21, no. 1, Jan. 1988, pp. 81-96.

Yost88a  Yost, R.A., “Can image data be integrated with structured data?” in Proc. 4th Int. Conf. on Data Engineering (Los Angeles, Febr. 1988), IEEE Computer Society Press, order no. 827, p. 602.
