Modeling and Querying Primitives for Digital Media

Munish Gandhi



[email protected]

Edward L. Robertson

[email protected]



Dirk Van Gucht



[email protected]

October 29, 1994

Abstract

We present a data model and equivalent rule-based and visual query formalisms for managing digital media data, specifically audio and video data. The query formalisms in the paper are also equivalent to an algebra. The simplicity and equivalence of these formalisms provide evidence that the concepts in this model may define a kernel of capabilities required for handling digital media.

1 Introduction

Digital media such as audio and video are becoming ubiquitous in computing. From a database perspective, digital media data may be considered at three levels of granularity (figure 1). The lowest level consists of media-format-specific media atoms. A sequence of media atoms constitutes a media track. In order to develop useful abstractions for media tracks, it is imperative that the media atoms be considered uninterpretable by the media track modeling layer. The media library may then store information about the track and maintain links between tracks. At this layer, the media track is considered atomic.

As an example, consider how a movie may be stored using this architecture. The lowest layer could consist of JPEG compressed [Wal91] frames from the movie. A media track would then represent a movie using a sequence of these frames. Finally, the media library could store bibliographic information about the movie in addition to referencing the media track which constitutes the movie.

While there is work at the extremes of the granularity spectrum, issues at the track level of granularity have been largely unexplored. For example, at the finest level of granularity, there exist many multimedia standards for interpreting bit streams as audio or video data [LG91, Wal91, Poh92]. At the coarsest level of granularity, current database technology [EN89] or other evolving paradigms [HBvR94, NKN91] may be used to organize a set of tracks. At the track level, the existing work is informal ([BGT92, RBE94]), is targeted toward other domains ([GNU94, RP92, SLR94, CCT94]), or looks at a different modeling problem ([LG93, LAF+94]). We briefly discuss these in section 5.
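The three levels of granularity described above can be illustrated with a small sketch. This is only one possible reading, not an implementation taken from the paper: the class names `MediaAtom`, `MediaTrack`, and `LibraryEntry`, their fields, and the sample values are all hypothetical. The point it shows is that each layer treats the layer below it as opaque.

```python
from dataclasses import dataclass, field
from typing import List

# All names below are hypothetical; the paper does not prescribe an implementation.


@dataclass(frozen=True)
class MediaAtom:
    """Format-specific payload (e.g. one compressed frame); opaque to the track layer."""
    payload: bytes


@dataclass
class MediaTrack:
    """A sequence of media atoms; this layer never inspects an atom's payload."""
    atoms: List[MediaAtom] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.atoms)


@dataclass
class LibraryEntry:
    """Bibliographic record that references a track and treats it as atomic."""
    title: str
    director: str
    track: MediaTrack


# Stand-ins for compressed frames; real payloads would come from a media codec.
frames = [MediaAtom(payload=bytes([i])) for i in range(3)]
movie = MediaTrack(atoms=frames)
entry = LibraryEntry(title="Example Movie", director="Unknown", track=movie)
```

In this sketch the library layer only holds a reference to the track, so the choice of atom granularity (frame, scene, or story) could change without affecting `LibraryEntry`.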

Computer Science Department, Indiana University, Bloomington, IN-47405.


Figure 1: Media architecture (layers along a granularity axis from fine to coarse: Media Atom, Media Track, Media Library; figure labels: format-dependent, format-independent, track-structure independent)

In this paper, we propose a Track Data Model for modeling the central layer of media tracks (section 2) and specify a Track Rule Language (section 3.4) and a Track Visual Language (section 4.3) for managing such media. The rule language and visual language are equivalent to one another and are, in fact, equivalent to a Track Algebraic Language [GR94]. The appendix summarizes the operators in the algebraic language.

We are using the formalisms presented in this paper as the foundation for developing a Media Library Kiosk [ICG94]. The kiosk is a multimedia catalog of films in the Media Library at Indiana University. The architecture of the system illustrates the context in which the formalisms are being used (figure 2).

Figure 2: Architectural context (figure labels: Media Library Kiosk, Query Interface, Track Viewers, Track Players, Query Processor, Other UI elements, Relational Database, Track Database, Application User Interface Toolbox, Databases)

The track database stores media data, such as movie clips, and other track-based annotations about the media, such as shot analysis. This database is based on the Track Data Model. The relational database stores information about the movie, such as the name and director of the movie. It also stores information linking the movie clip tracks with the associated shot analysis tracks.

Track viewers and track players may be used to examine the track database. The query processor is used to query the track database. This processor implements the Track Rule Language. The user, however, interacts through the visual interface, which is based on the Track Visual Language. Since the two languages are equivalent, a query in the Track Visual Language is translated to the Track Rule Language for efficient execution. Other interfaces are used to access the relational database, but they are beyond the scope of this discussion.

2 Track Data Model

One of the objectives in developing this model was that it should be a natural abstraction for media data. To that end, we assume media streams to be sequences of abstract data elements called media atoms. Media atoms may have an internal structure, but that structure is interpretable by specific media format processors and not by the data model. These assumptions make the model independent of the media format and make the application's choice of granularity for the media atom independent of the storage system for the media data. For example, the media atom for a video may be a single frame from a movie, a scene from a movie, or a story from a news broadcast.

Since the media atom is a non-modifiable abstract element, it is sufficient for the data model to operate using surrogates for the media atoms. Thus, the model assumes the existence of a set MID of media atom identifiers (mids), each of which identifies an element from a pool of media atoms.

Whereas the mids constitute the building block for modeling media streams, we use a denumerable set A of annotations as the building block for annotating media streams. For example, annotations could consist of frame numbers for a movie, the length of shots in a movie, or the titles of news stories. Annotation tracks add valuable information about the media data and provide a symbolic basis for accessing and querying media data [DSP91]. For this reason, we believe that they constitute a necessary element of any audio-video model.

Given the above, we now introduce the central notion of a track. A track seeks to characterize either a media stream or the symbolic information that annotates a media stream. It is structured as a sequence of elements called track atoms. A track atom, in turn, is either an mid or an annotation. We define these notions below.

Definition 2.1 We will assume that $N \subseteq A$ and $MID \cap A = \emptyset$. A track is a partial function where the domain is a finite subset of the non-negative integers $N$. The range of a track is the set of track atoms $TA$, where $TA = MID \cup A$. An annotation track is a track with the range restricted to the set of annotations. A media track is a track with the range restricted to the set of media atom identifiers.

We will use the following running example for the rest of the paper. There are four tracks in the database. Tracks PBS and CNBC are video tracks which represent news summaries for a particular day. Each media atom in these tracks represents a single news story. In the following figure, different symbols denote different stories and there is no connection between symbols of the same shape. Tracks PBSSubject and CNBCSubject annotate the news stories with the subject of the news story. Here, P represents the string `Political', H represents `Human Interest', F represents `Finance' and D represents `Disaster'.

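[Figure: running example. The media tracks PBS and CNBC and the annotation tracks PBSSubject and CNBCSubject are drawn over positions 0, 1, 2, 3, 4, ... The story symbols of PBS and CNBC are not recoverable here. The annotation values appear in the following order.]

PBSSubject: D P H F
CNBCSubject: D D F F H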
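Definition 2.1 and the running example can be made concrete with a short sketch. It is illustrative only, under several assumptions that are not in the paper: a track is represented as a Python dictionary from positions to track atoms, the class `Mid` and the helper functions are hypothetical names, annotations are taken to be integers or strings, the five CNBCSubject values are assumed to sit at positions 0 to 4, and the mids for the CNBC stories are invented placeholders.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Mid:
    """Surrogate for a media atom; a separate type keeps MID disjoint from A."""
    key: str


Annotation = Union[int, str]        # assumption: N is a subset of A, plus strings
TrackAtom = Union[Mid, Annotation]  # TA = MID union A
Track = Dict[int, TrackAtom]        # partial function with a finite domain


def is_valid_track(t: Track) -> bool:
    """Every position in the domain must be a non-negative integer."""
    return all(isinstance(p, int) and p >= 0 for p in t)


def is_media_track(t: Track) -> bool:
    """A track whose range is restricted to media atom identifiers."""
    return is_valid_track(t) and all(isinstance(v, Mid) for v in t.values())


def is_annotation_track(t: Track) -> bool:
    """A track whose range is restricted to annotations."""
    return is_valid_track(t) and all(not isinstance(v, Mid) for v in t.values())


SUBJECT = {"P": "Political", "H": "Human Interest", "F": "Finance", "D": "Disaster"}

# Assumed alignment: the five CNBCSubject values occupy positions 0..4.
cnbc_subject: Track = {i: SUBJECT[c] for i, c in enumerate("DDFFH")}

# Hypothetical mids standing in for the CNBC news stories.
cnbc: Track = {i: Mid(f"cnbc-story-{i}") for i in range(5)}

# Positions whose annotation is 'Finance'.
finance_positions = sorted(p for p, v in cnbc_subject.items() if v == "Finance")
```

Because a dictionary may omit positions, this representation also covers tracks with gaps, which is what makes the function partial rather than total.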

3 Track Rule Language

The Track Rule Language provides a mechanism to query tracks or to create new tracks in the database. This section starts with an overview, followed by a bottom-up description of the language and sample queries which illustrate its various features. The section concludes with an algorithm for evaluating a query in the rule language.

3.1 Overview

The top-level structure of the language may be described by the following segment of the BNF in figure 3.

program : [ rule-cluster ]
rule-cluster : cluster-head := rule ; [ rule ; ] .
rule : rule-head