Content Adaptation of Multimedia Delivery and Indexing using MPEG-7

Mulugeta Libsie and Harald Kosch
Department of Information Technology, University of Klagenfurt
A-9020 Klagenfurt, Austria
++43 463 2700 3629, [email protected]
ABSTRACT
This work introduces a framework for adapting MPEG-4 intra- and inter-Elementary Streams and for encoding the results in an MPEG-7 stream to be used for resource adaptation on the delivery path to the user.
Categories and Subject Descriptors
H.5.1 [Multimedia Information Systems]
Keywords
Multimedia Indexing, Resource Adaptation, MPEG-7, MPEG-4.
1. Introduction and Motivation
This research work is part of the CODAC (Modeling and Querying Content Description and Quality Adaptation Capabilities of Audio-Visual Data) project, which is itself part of the larger ADMITS (Adaptation in Distributed Multimedia IT Systems) project bundle. The main objective of the CODAC project is the definition of a common framework for modeling content information and quality adaptation capabilities of audio-visual data, and the implementation of this framework in a query, indexing, and retrieval system. The descriptions, which are indexed, i.e., stored in the meta-database, will be encoded as MPEG-7 descriptions. Such a model enables us to provide active components on the delivery path to the client (network routers and proxy caches) with valuable information to govern or enhance their media scaling, buffering, and caching policies. With the increasing number of multimedia uses and applications, delivery and indexing of multimedia material is crucial. Best-effort scheduling and worst-case reservation of resources are two extreme approaches to delivering multimedia objects. However, neither of them is well suited to cope with large-scale, dynamic multimedia systems. One possible solution is adaptation. Adaptation is the capability of a system to dynamically change its behavior in order to keep the quality of service (QoS) above a certain level [1,4]. In contrast to pure quality reduction, adaptation also includes dynamic improvement of quality when the available resources improve.
MPEG, in its newest MPEG-21 standard, is in the process of defining a multimedia framework to enable transparent and augmented use of multimedia resources across a wide range of networks and devices used by different communities. It has identified various reasons for the need of Digital Item Adaptation in general, some of which are applicable to video resources. In this context, my PhD thesis deals with the generation of adaptation meta-data, encoded in MPEG-7, for MPEG-4 video streams. The originality of my work lies in the development of a framework for adapting MPEG-4 intra- and inter-Elementary Streams (ES), in encoding the results in an MPEG-7 stream, and in delivering them. This paper gives an overview of my PhD thesis and introduces preliminary and important results.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM MM'02, December 1-6, 2002, Juan Les Pins, France. Copyright 2002 ACM 1-58113-000-0/00/0000…$5.00.
2. Methodology
In this section we present the general solution architecture. Sections 2.1 and 2.2 briefly describe the MPEG-4 and MPEG-7 standards from the video adaptation point of view. Section 2.3 presents the solution architecture.
2.1 MPEG-4 and its Scalability Options
In this paper, the interest is in the adaptation capabilities of MPEG-4 encoded videos. An MPEG-4 visual bitstream provides a hierarchical description of a visual scene. An MPEG-4 visual scene may consist of one or more video objects (VOs). A video object may consist of one or more layers to support scalable coding. The scalable syntax allows the reconstruction of a video in a layered fashion, starting from a standalone base layer and adding a number of enhancement layers. A video object layer (VOL) may be composed of groups of video object planes (GOVs), which are in turn composed of video object planes (VOPs). This hierarchy allows us to drop components of a visual bitstream, ranging from a VO down to a VOP, which provides different levels of granularity. An entire VOL can be dropped, thereby scaling down the spatial or temporal resolution. VOPs and GOVs are also useful for temporal adaptation. However, dropping must be done so that the decoding process is not affected. For instance, dropping an I-VOP would make an entire GOV undecodable. A VOP provides a finer adaptation level, whereas a VO provides a coarser level, resulting in a higher loss of video content. The dropping must also be driven by semantics: a component should be dropped if its contribution to content, and hence semantics, is relatively low. This has to be assessed both within (intra) and between (inter) ESs.
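As a toy illustration of dependency-aware dropping at the VOP level, the following sketch keeps I- and P-VOPs and discards only B-VOPs, which predict from their neighbors but are themselves never used as a reference. The VOP-type list is a hypothetical stand-in; a real adapter would parse the coding types from the MPEG-4 bitstream.

```python
# Sketch of dependency-aware temporal adaptation at the VOP level.
# The GOV is modeled as a plain list of VOP coding types ("I", "P", "B");
# this is an illustrative simplification, not a bitstream parser.

def droppable(vop_type):
    """B-VOPs can be dropped freely. Dropping an I-VOP would make the
    rest of its GOV undecodable, and dropping a P-VOP breaks every
    later VOP that predicts from it."""
    return vop_type == "B"

def temporal_scale(gov, keep_every=2):
    """Keep all I- and P-VOPs; drop every `keep_every`-th droppable B-VOP."""
    out, b_seen = [], 0
    for vop in gov:
        if droppable(vop):
            b_seen += 1
            if b_seen % keep_every == 0:
                continue  # drop this B-VOP
        out.append(vop)
    return out

gov = ["I", "B", "B", "P", "B", "B", "P", "B", "B"]
print(temporal_scale(gov))  # -> ['I', 'B', 'P', 'B', 'P', 'B']
```

With `keep_every=1` every B-VOP is dropped, corresponding to the second variation level of the experiments in Section 3.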
2.2 MPEG-7 and Adaptation Meta-data
MPEG-7 is used to generate meta-information both for the source and the variation videos. MPEG-7 specifies a standard set of descriptors that can be used to describe various types of multimedia information [4,5]. In MPEG-7, multimedia data is represented with the help of descriptions in a standardized way. A description consists of a description scheme (DS) and a set of descriptor values (Ds) describing the meta-data. A DS specifies the structure and semantics of the relationships between its components, which can be both DSs and Ds. A D defines the syntax and the semantics of a feature representation. The Variation DS, which is defined in the MDS part of MPEG-7 and is of particular importance to us, is used to represent the associations or relationships between different variations of multimedia resources. It serves an important role in applications such as Universal Multimedia Access (UMA) by allowing the most appropriate variation of a multimedia document to be selected, adapting to the specific capabilities of the terminal devices, network conditions, or user preferences [7,8]. The necessary variation capability information can be extracted from audio-visual streams and stored as an MPEG-7 document in the meta-database so that it can be delivered later to the components on the delivery path to the client.
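To give a flavor of what such a variation description looks like, the following sketch emits a minimal XML fragment in the spirit of the Variation DS. The element layout is a deliberate simplification and is not validated against the MPEG-7 schema; the file names and fidelity value are made up.

```python
# Sketch: generating a simplified MPEG-7-style Variation description.
# Element names follow the spirit of the MPEG-7 MDS Variation DS, but
# the structure is reduced for illustration and not schema-validated.
import xml.etree.ElementTree as ET

def variation_description(source_uri, variation_uri, fidelity, relation):
    root = ET.Element("VariationSet")
    source = ET.SubElement(root, "Source")
    ET.SubElement(source, "MediaUri").text = source_uri
    # fidelity quantifies how faithful the variation is to the source
    var = ET.SubElement(root, "Variation", fidelity=str(fidelity))
    ET.SubElement(var, "MediaUri").text = variation_uri
    ET.SubElement(var, "VariationRelationship").text = relation
    return ET.tostring(root, encoding="unicode")

print(variation_description("akiyo.mp4", "akiyo_tscaled.mp4",
                            0.92, "samplingReduction"))
```

A delivery-path component could read such a document from the meta-database and pick the variation whose fidelity and relationship best match the client's constraints.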
2.3 Solution Architecture
The research project has three major components: extracting meta-data, developing video variation algorithms that will be used for the adaptation process, and describing the source and the variation videos using MPEG-7.

Figure 2.1: The Processing Unit as a Server Component. [Diagram: the Processing Unit attached to the server alongside the MM Media DB and MM Meta DB, serving stationary and mobile MM clients.]

The module that is responsible for carrying out these activities is called the Processing Unit (Figure 2.1). It is a supplemental server component responsible for creating variation videos and extracting meta-information about the variations and the source as MPEG-7 documents. It extracts the necessary variation capability information from the audio-visual streams; this information is stored as an MPEG-7 document in the meta-database. An expanded version of the Processing Unit is shown in Figure 2.2.

Figure 2.2: Architecture of the Processing Unit. [Diagram: an MPEG-4 source video from the MM Media DB is processed by the AnyCom and AnyUnCom components, producing MPEG-4 variation videos and MPEG-7 documents for the source and the variations, which are stored in the MM Media DB and MM Meta DB and delivered to stationary and mobile MM clients via an active cache (proxy).]

An MPEG-4 video is first demultiplexed into its ESs. For each type of ES, a series of analysis tools is defined. Once the analysis of an ES in the compressed domain is done, each ES is decompressed and a chain of analyses is applied in the decompressed domain. This is due to the observation that many variation descriptions may conveniently be extracted only in the uncompressed domain; examples are descriptions for spatial scaling or key frame extraction.

The two components AnyCom and AnyUnCom are responsible for the processing in the compressed and uncompressed domains, respectively. Currently, processing in the uncompressed domain is handled through the Java Media Framework (JMF), an API that supports the integration of audio and video playback into Java applications and applets. When a video is added to the database (on demand or on a regular basis), the Processing Unit applies operations such as transformation to the MPEG-4 encoded video, reports the results to the meta-database, and writes back the adapted video onto the media server.

3. Flavor of Experimental Results

3.1 Temporal Scaling
In video variation, as a means for adaptation, the interest is to generate new videos (variation videos) from the source video, basically with reduced data size and quality. There are many possibilities for video variation. In this paper, results of an experiment in temporal scaling are presented. The other variations that we have explored and achieved results for, namely spatial scaling, color reduction, and key frame extraction, are not covered in this paper. For all the experiments, five selected MPEG-4 videos, supplied mostly with the MPEG-4 Reference Software, were considered. Temporal scaling provides a mechanism to vary a video stream in the time domain, e.g., by dropping B-frames. In a layered coding system such as MPEG-4, this can also be achieved by discarding enhancement layers that contribute to temporal scaling. In temporal scaling, the frame rate of the visual data is reduced. A variation produced by this method corresponds to the SamplingReduction relationship of MPEG-7. Temporal scaling is carried out in the compressed domain. In the uncompressed domain, we propose a modification similar to temporal scaling, where frames are duplicated instead of being dropped.

a) Temporal Scaling in the Compressed Domain: Temporal adaptation in the compressed domain was performed in two stages: (1) dropping every other B-frame and (2) dropping the remaining B-frames. In both cases, the new file sizes were recorded and the resulting file size ratios were computed. The procedure for computing the fidelity of a variation (thus, the resulting quality) is taken from the literature.
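The two-stage analysis chain described above can be sketched as follows. The component names AnyCom and AnyUnCom come from the paper; the handler functions and their outputs are hypothetical placeholders standing in for the real compressed- and uncompressed-domain analysis tools.

```python
# Sketch of the Processing Unit's analysis chain: per-ES analysis in
# the compressed domain (AnyCom), then decompression and further
# analysis in the uncompressed domain (AnyUnCom). All analysis results
# here are placeholder strings, not real descriptors.

def analyze_compressed(es):
    """AnyCom stage: variations extractable without decoding,
    e.g. temporal scaling via B-frame statistics."""
    return {"temporal_scaling": f"B-frame statistics for {es}"}

def analyze_uncompressed(es):
    """AnyUnCom stage: variations that need pixel data,
    e.g. spatial scaling or key frame extraction."""
    return {"spatial_scaling": f"resolution analysis for {es}",
            "key_frames": f"shot-boundary candidates for {es}"}

def process_video(elementary_streams):
    """Demultiplex -> AnyCom chain -> decode -> AnyUnCom chain."""
    meta = {}
    for es in elementary_streams:
        descriptions = analyze_compressed(es)          # compressed domain
        descriptions.update(analyze_uncompressed(es))  # after decoding
        meta[es] = descriptions
    return meta

meta = process_video(["visual_es_0", "visual_es_1"])
```

The resulting per-ES dictionary corresponds to the meta-information that the Processing Unit would encode as MPEG-7 and store in the meta-database.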
Figure 3.1: Effects of dropping B-frames on file size and quality. [Bar chart of file size ratio and fidelity for variation levels (1) and (2) across the test videos, e.g., akiyo and coast-guard.]
A graphical comparison of the effects of dropping frames is presented in Figure 3.1. For the first variation level, the average file size ratio (reduced file size compared to the original file size) is 0.78, which is an average file size reduction of 22%. For the second variation level, the average file size ratio is 0.57, which translates into an average file size reduction of 43%. This is a significant reduction for alternating or full B-frame dropping. Correspondingly, the resulting quality reductions are 8% and 15%, respectively. However, these are quantitative measures and may not capture the real reduction in visual perception quality.

b) Modified Temporal Scaling in the Uncompressed Domain: This was done under JMF. Selected frames were duplicated so that they exhibit temporal redundancy and, therefore, a high compression ratio would be achieved. As the results show, this assumption was valid. Because no frame was physically dropped, the play-back duration was not affected.
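The file size statistics above, and the frame-retention pattern used in the experiments that follow, reduce to simple arithmetic. A small sketch (all byte counts are made-up placeholders, not the paper's measurements):

```python
# Sketch: computing file size ratios and average reductions from raw
# byte counts, plus the retention pattern "maintain the first frame and
# every step-th frame thereafter". Byte counts below are hypothetical.

def file_size_ratio(original_bytes, variation_bytes):
    """Reduced file size compared to the original file size."""
    return variation_bytes / original_bytes

def average_reduction(pairs):
    """Average file size reduction, as a fraction, over
    (original, variation) byte-count pairs."""
    ratios = [file_size_ratio(o, v) for o, v in pairs]
    return 1.0 - sum(ratios) / len(ratios)

def retained_indices(n_frames, step):
    """Indices kept when maintaining the first frame and every
    `step`-th frame thereafter, dropping the rest."""
    return [i for i in range(n_frames) if i % step == 0]

level1 = [(1_000_000, 780_000), (2_000_000, 1_560_000)]  # hypothetical
print(round(average_reduction(level1), 2))  # 0.22, i.e., a 22% reduction
print(retained_indices(10, 3))              # [0, 3, 6, 9]
```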
A series of experiments were conducted by always maintaining the first frame and then maintaining every second, third, fourth, etc. frame from the first and dropping the rest. The results are graphically presented in Figures 3.2 and 3.3. Figure 3.2 shows the trend of file size reduction for each experimental video. Figure 3.3 shows the average trend of file size and quality reductions.

Figure 3.2: File size ratio trends. [Line chart of the file size ratio of each test video, e.g., akiyo, against the dropping method applied: original, every 2nd, every 3rd, ..., every 10th frame maintained.]

There is a substantial decrease of file size until the fourth step (i.e., maintaining every 5th frame). Thereafter, an increase was also observed at some points. The average file size ratios vary between 0.68 and 0.41 (file size reductions of 32% and 59%, respectively). The overall average file size reduction was 49%. The reduction in quality was not significantly different between successive dropping steps (7% on average).

Figure 3.3: Average file size ratio and fidelity trend. [Line chart of the average file size ratio and fidelity against the dropping method applied: original, every 2nd, ..., every 10th frame maintained.]

3.2 MPEG-7 Descriptors Generation
For the variations discussed above, the corresponding MPEG-7 documents were generated both for the source and the variation videos using the standard method. The results cannot be presented here due to space limitations. The VariationSet DS was used to define different bindings to the same source video. However, this concept does not directly support multiple adaptations of media contents, for example spatial scaling followed by color reduction. For this we are defining and implementing a new DS called the VariationSetTree, which is a tree of VariationSet DSs.

4. Conclusion and Future Work
In this paper, we showed the need for and the possibilities of video adaptation of audio-visual materials based on video variation. MPEG-4 and its scalability structure were briefly described. The facilities of MPEG-7 that allow us to describe a video, and their importance in video adaptation, were also discussed. A general solution architecture was presented, its various components were described, and first experimental results were given. In the near future, we will assess the possibility of dropping frames based on their relative contribution to content rather than dropping them arbitrarily. We intend to develop chains of video variations, for example spatial or temporal scaling followed by color reduction, and to develop a new DS that supports such a sequence of variations. Finally, we will explore the possibility of implementing other variation relationships of MPEG-7.

5. References
L. Böszörményi, H. Hellwagner, H. Kosch, M. Libsie, and S. Podlipnig: Comprehensive Treatment of Adaptation in Distributed Multimedia Systems in the ADMITS Project. ACM Multimedia 2002 Conference, Juan Les Pins, France, December 2002.
ISO/IEC JTC1/SC29/WG11. MPEG-21 Requirements on Digital Item Adaptation. Document N4684, March 2002.
Fernando Pereira (Editor). Image Communication Journal, Tutorial Issue on the MPEG-4 Standard. http://leonardo.telecomitalialab.com/icjfiles/MPEG-4_si/.
Harald Kosch: MPEG-7 and Multimedia Database Systems. SIGMOD Record, 31(2), June 2002.
José M. Martínez. Overview of the MPEG-7 Standard. ISO/IEC JTC1/SC29/WG11 N4509, December 2001. http://MPEG.telecomitalialab.com/
ISO/IEC JTC1/SC29/WG11. Information Technology - Multimedia Content Description Interface - Part 5: Multimedia Description Schemes. Document N3966, Approved International Standard 15938-5, March 2001.
ISO/IEC JTC1/SC29/WG11. MPEG-7 Multimedia Description Schemes XM (Version 7.0). Document N3964, March 2001. http://leonardo.telecomitalialab.com/MPEG/public/MPEG7_mds_xm.zip