
Interactive News Documents for Digital Television

Marcelo G. Manzato, Daniel C. Junqueira, Rudinei Goularte
Mathematics and Computing Institute, University of Sao Paulo
Av. Trabalhador Sancarlense, 400, PO Box 668 – 13560-970 Sao Carlos, SP – Brazil

{mmanzato,danielcj,rudinei}@icmc.usp.br

ABSTRACT
As different technologies emerge each day in the context of digital television, interaction functionalities are becoming crucial, since users are already familiar with interaction on the web. Although digital television standards provide ways to interact with the content, more complex manipulations can be accomplished using the MPEG-J specification, which is part of the MPEG-4 standard. In this paper, we present a technique for generating dynamic and interactive content from interactive sources such as the web, so that users, despite their limited interactivity with TV content, have access to fresh dynamic content that is generated in real time during the video’s compilation. Although we have not explored the full potential of MPEG-J, we believe this work is a first step toward further contributions in the area of digital and interactive television.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Communications Applications; I.7.2 [Document Preparation]: Multi/mixed media - XSL transformations, MPEG-4 authoring

General Terms
Design

Keywords
XSLT, XMT, MPEG-J, MPEG-4, interactive digital television

1. INTRODUCTION

Digital video has become popular because it brings notable advantages to users, such as better video and audio quality, portability, and interactivity between multimedia content and users. This last characteristic, in particular, requires new technologies that provide good compression rates and allow users to interact with the content in the same way as on the web.

These requirements have led to the development of standards, techniques and tools to create, deliver and present interactive content, such as MPEG-4 [4], and to provide metadata for this content, such as MPEG-7 [5]. These standards, together with the interest of the communication and entertainment industries, are motivating the development of interactive digital TV as one of the main interactive video applications.

Like the European, Japanese and American standards, the Brazilian system for digital television, called Ginga [7], is responsible for supporting the video delivery chain so as to maximize the advantages of digital television, including user interaction with the content. In Ginga, the interaction can be specified in two different ways: using a declarative language, called NCL [13], which provides ways to synchronize and interact with the content, or using a procedural language, which provides an infrastructure to run programs written in Java.

Although both approaches can be adopted in Ginga to create interactive multimedia content, more complex media processing can be accomplished with access to the low-level structure of the scene. The MPEG-J specification [6], which is part of the MPEG-4 standard, provides access to these low-level structures through programs written in Java. This feature allows the development of complex scene manipulations and interaction with video objects defined inter- or intra-frame. For this reason, we have started developing an extension of the Ginga middleware with support for MPEG-J interaction [1].

In this paper, we present a technique for generating dynamic and interactive content from interactive sources such as the web, so that users, despite their limited interactivity with TV content, have access to fresh dynamic content that is generated in real time during the video’s compilation. To accomplish this objective, several structured-document processing steps are required; they are described in detail in the following sections.

This paper is organized as follows: in Section 2 we present the related work; in Section 3 the concepts related to authoring MPEG-4 video are described; in Section 4 we depict the method to capture fresh news from web sites; in Section 5 the results of our technique are presented; and finally, in Section 6 we present the final remarks.

© 2008 Brazilian Computer Society

2. RELATED WORK
Among the works that explore news captured from the web, Dowman et al. [3] present the Rich News system, which identifies individual stories in news broadcasts and annotates them with related content from the World Wide Web. After this material is semantically analyzed and used to produce summary information for each news story, it can be delivered to users as part of an interactive television broadcast. While they use the captured material to create semantic annotations, we use the pieces of news to deliver additional information to the user, by applying XSL transformations to generate XMT-O documents.

XSL transformations are explored in the literature for different purposes. Pimentel et al. [14] use XSLT to translate XMI documents into XML Schemas. The technique is part of a work that considers the modeling of context information for capture and access applications. Itagaki et al. [10] propose the SAVANT project, which is an interactive digital framework that uses XSL transformations to generate different scalable representations of Internet pages. Together with the transformations, interaction functionalities are also accomplished in this paper.

In the context of interactive digital television, Cesar et al. [2] present a graphics software architecture for next-generation digital television receivers. Interaction can be implemented using a Java-based procedural environment or a declarative environment, both of which support rendering 2D/3D graphics and W3C recommendations such as SMIL and XForms. Although their support for procedural languages allows the development of complex features that may be difficult to implement using only declarative languages, we are able to reach more complex manipulations, since the MPEG-J procedural program has access to the scene structure, which is defined in the MPEG-4 video.

In order to represent the scene structure of MPEG-4 video, Goularte et al. [8] proposed the MediaObject model, with the objective of describing I-TV programs while minimizing some limitations, such as strict hierarchical relationships among media objects and low levels of granularity for metadata. We plan to adopt a model to efficiently represent scene structure in future work.

3. MPEG-4 AUTHORING

The authoring of MPEG-4 video uses open source tools available in the literature. The GPAC project [9] is an open source multimedia framework for research and academic purposes in different aspects of multimedia, with a focus on graphics, animation and interactivity technologies. In previous work [1], the GPAC framework was extended to allow the insertion of MPEG-J streams into MPEG-4 video, in conformance with the MPEG-4 standard. The advantage of this feature is the possibility of implementing high levels of interaction between the user and video objects. Figure 1 outlines the process of authoring MPEG-4 video with support for MPEGLets. Each procedure is described in the following subsections.

Figure 1: Generating MPEG-4 video using GPAC tools.

3.1 XMT-O/A Conversion
XMT [12] stands for Extensible MPEG-4 Textual, a textual format to represent MPEG-4 scene descriptions and media content. The XMT format is divided into two levels: XMT-O, which is similar to SMIL and provides a high-level language to represent media objects and their relationships, and XMT-A, which is based on X3D and provides a low-level language to represent MPEG-4 scene definitions. Although both document formats can be used to describe the scene, XMT-O is preferred because of its simplicity and similarity to SMIL. However, since the MP4Box tool provided by the GPAC framework requires the scene description in a low-level format (XMT-A, for example), a converter is needed to transform the XMT-O document into XMT-A. We have implemented this converter in Java, and we have added functionality to convert the definition of any media type (video, audio, string, MPEG-J, etc.) into MPEG-4 object-tree structure definitions, which are described using the XMT-A format.

3.2 MPJ Generation
An MPEG-J [6] application can be included in a video as an elementary stream, similar to video and audio streams. To include the MPEG-J application as an elementary stream of an MPEG-4 file, it is first necessary to generate the Java application and then to encapsulate it as a stream in the final media file. The main requirement of the Java application is that it implements the MPEGLet Java interface, which is defined in the ISO/IEC 14496-21 standard. In the MPEG-J application created in the experiment reported here, we have used a version of the interface that is distributed with the IM1 reference player from ISO.

To generate the MPJ file, we have used a tool called JavaES, which is distributed by ISO under an open source license. The tool is called from the command line and receives as arguments the file that contains the MPEG-J main class, the MPEGLet API classes, and the classpath needed to generate the MPEG-J application. It then generates the MPJ file, a JAR-like file prepared to be inserted as a stream into the final MP4 file. The MPJ file also includes the news content previously captured from a web site, as described in subsection 4.2. Thus, as the MPEGLet has access to the MP4 terminal, when a user clicks on a news header node, the MPEGLet processes the attributes of the node, which include the news ID, and displays the news content in a Java Swing JFrame.
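The lookup step described above can be illustrated with a minimal sketch. It deliberately leaves out the MPEGLet interface and the Swing window, since both depend on the MPEG-J classes distributed with the IM1 player. The class name NewsStore, the attribute key newsId, and the method names are hypothetical and are not taken from the authors' implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: holds the news content bundled with the MPJ file,
// keyed by the news ID carried in the attributes of each header node.
final class NewsStore {
    private final Map<String, String> contentById = new HashMap<>();

    void add(String newsId, String content) {
        contentById.put(newsId, content);
    }

    // Returns the text to display for a clicked header node, or a fallback
    // message when the node has no ID or the ID is unknown.
    String contentForClick(Map<String, String> nodeAttributes) {
        String newsId = nodeAttributes.get("newsId"); // assumed attribute name
        if (newsId == null) {
            return "No news ID on this node.";
        }
        return contentById.getOrDefault(newsId, "News item not available.");
    }

    public static void main(String[] args) {
        NewsStore store = new NewsStore();
        store.add("42", "Full text of the captured news item.");
        Map<String, String> clicked = new HashMap<>();
        clicked.put("newsId", "42");
        System.out.println(store.contentForClick(clicked));
    }
}
```

In an actual MPEGLet, the returned string would be handed to the window that displays the news; keeping the lookup separate from the display code makes it testable without a player.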

3.3 MP4 Generation
The last step of the MP4 authoring consists of the generation of the MP4 file, which is then streamed to the player. To generate the MP4 file, we have used an extended version of the MP4Box tool, which was built to handle MPEG-J streams. This version of the tool recognizes MPJ objects and other media in the XMT-A file, and includes such objects as elementary streams in the MP4 file, as specified in the ISO/IEC 14496 standard.
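As a rough illustration of this step, the sketch below builds an MP4Box command line from Java. It uses the -mp4 and -out options of the stock GPAC MP4Box tool; the authors' extended build may accept different or additional arguments, and the file names scene.xmt and news.mp4 are placeholders.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class Mp4BoxCommand {
    // Builds the command line that encodes an XMT-A scene into an MP4 file.
    static List<String> build(String xmtAFile, String outputFile) {
        return Arrays.asList("MP4Box", "-mp4", xmtAFile, "-out", outputFile);
    }

    // Runs the command and returns the exit code of the tool.
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> command = build("scene.xmt", "news.mp4");
        System.out.println(String.join(" ", command));
        // Uncomment when MP4Box is installed and scene.xmt exists:
        // System.exit(run(command));
    }
}
```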

4. NEWS CAPTURING

In the last section, we described how an interactive MPEG-4 video is created from XMT-O documents. In this section, we show the process of authoring XMT-O documents in order to create an interface containing fresh pieces of news with which the user can interact. In addition, as the user may not have an Internet connection, we show how the news content is captured on the provider side and delivered to the user together with the video stream.

Figure 3: Video being presented to the user with fresh news captured from the UOL web site.

4.1 Creating the Interface
Nowadays, many web sites make their content available in the RSS format. RSS [15] is a web content syndication format that conforms to the XML 1.0 specification. An RSS document has a root element called <rss>; one level below, a single <channel> element contains metadata about the channel and its content, which is structured in the form of items. Since RSS documents are expected to be updated continuously by web site providers, they can be used as news feeds for news programs in digital television. Thus, if RSS documents can be combined with XMT-O documents, the MPEG-4 authoring tools described previously can be applied to generate MPEG-4 video with fresh news headers.

In this paper, we use XSL transformations [16] to combine both documents. XSLT is a W3C recommendation that defines a language for transforming XML documents into other XML formats. We have implemented a stylesheet that transforms an RSS document into an XMT-O document. Together with the transformation, the stylesheet defines how the content is formatted for presentation in the user interface. It also creates the MPEGLet definitions, which are used for the interaction functionalities.

Figure 2 presents an XMT-O document generated by an XSL transformation from an RSS document captured from the UOL web site (footnote 1). The content of the RSS element is converted to the textLines attribute of the XMT-O element. All formatting is specified in the document, in conformance with the presentation’s interface. In addition, the XMT-O document contains the MPEG-J definition, which specifies the elementary stream for the Java program.
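The transformation can be sketched with the standard javax.xml.transform API. This is a minimal illustration, not the authors' stylesheet: the output element names par and string, the id scheme, and the sample feed are assumptions; only the textLines attribute and the RSS structure (rss, channel, item, title) come from the text and the RSS 2.0 specification.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RssToXmtO {
    // Hypothetical stylesheet: one text object per RSS item, with the item
    // title copied into the textLines attribute. Element names are assumed.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/\">"
      + "<par>"
      + "<xsl:for-each select=\"rss/channel/item\">"
      + "<string id=\"news{position()}\" textLines=\"{title}\"/>"
      + "</xsl:for-each>"
      + "</par>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Made-up feed used only for the demonstration.
    static final String SAMPLE_RSS =
        "<rss version=\"2.0\"><channel><title>Sample feed</title>"
      + "<item><title>Headline A</title><link>http://example.org/a</link></item>"
      + "<item><title>Headline B</title><link>http://example.org/b</link></item>"
      + "</channel></rss>";

    // Applies the stylesheet to an RSS document and returns the result as text.
    static String transform(String rssXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(rssXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_RSS));
    }
}
```

A real stylesheet would also emit layout, formatting and the MPEG-J stream definition; those are omitted here because their XMT-O syntax is not shown in the text.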

1 http://rss.home.uol.com.br/index.xml

4.2 Capturing the News Content
The news content is captured on the provider side, because users may not have an Internet connection available on their set-top boxes. Again, we use XSL transformations to capture the news content from the news web page, whose address is specified in the RSS element. This transformation also eliminates additional information present in the news web page, such as advertisements, banners and menus. As the news content captured from the UOL web site is written in old-style HTML, which is not well-formed, we have to convert it to XHTML before applying the stylesheets. This conversion is done using JavaScript code available at [11]. After the XSL transformation, the created documents are attached to the Java program, which is delivered to the user together with the MPEG-4 video in the form of an MPEG-J stream. Then, on the client side, when the user clicks on a news header, a window is opened and the related news content is displayed.
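The filtering step can be sketched as an XSL transformation over a well-formed XHTML page. The page layout is an assumption made for illustration: the sample markup, the id value article, and the class name NewsBodyExtractor are hypothetical, and a stylesheet for a real site would have to match that site's actual structure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NewsBodyExtractor {
    // Keeps only the paragraphs inside the (assumed) article container and
    // drops everything else, such as banners and menus.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
      + " xmlns:x=\"http://www.w3.org/1999/xhtml\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:for-each select=\"//x:div[@id='article']//x:p\">"
      + "<xsl:value-of select=\"normalize-space(.)\"/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Made-up XHTML page with a banner, a menu and an article body.
    static final String SAMPLE_PAGE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
      + "<div id=\"banner\"><p>Buy now</p></div>"
      + "<div id=\"menu\"><a href=\"/\">Home</a></div>"
      + "<div id=\"article\"><p>First paragraph.</p><p>Second paragraph.</p></div>"
      + "</body></html>";

    // Returns the article text, one paragraph per line.
    static String extract(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(extract(SAMPLE_PAGE));
    }
}
```

The x prefix is bound to the XHTML namespace because XPath 1.0 does not match namespaced elements by unprefixed name; this is also why the input must already be well-formed XHTML rather than raw HTML.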

5. EXPERIMENTAL RESULTS
By combining the MPEG-4 authoring techniques with the XSL transformations, we have created an environment for delivering news for digital television that allows users to interact with the content in different ways. Figure 3 presents the client-side player, where the user is able to watch the television content being transmitted on a channel and, at the same time, read and access fresh news captured from a web site. In the figure, the news is captured from RSS documents provided by the UOL web site; by applying XSL transformations to them, an XMT document is created, which is used by the MP4Box tool to create the MPEG-4 video.

The MPEG-4 video created with the MP4Box tool has a stream that contains a Java program responsible for defining the interactions between video objects and the user. This program is defined using the MPEG-J specification, and it allows the user to interact with the interface by clicking on news of interest. When the user does so, a window is automatically opened and the content corresponding to the clicked news item is displayed. It is also possible to choose different channels to watch. When the user clicks on one of the three channels available on the interface, the MPEGLet handler makes a new request for the related video.

Figure 2: XMT-O document generated from a XSL transformation using an RSS document obtained from the UOL web site.

6. FINAL REMARKS
This paper has presented a technique to generate dynamic and interactive content from interactive sources like the web. Although we have neither explored the full potential of MPEG-J nor integrated the tool into a real digital TV system such as Ginga, we believe that our knowledge and infrastructure may lead to future contributions with high interaction capabilities. As future work, we plan to extend the technique by implementing the dynamic addition of news items at execution time, and to develop new interaction paradigms, such as the use of TV remote controls and personal and portable devices.

7. ACKNOWLEDGMENTS
This work was sponsored by UOL (www.uol.com.br), through its UOL Bolsa Pesquisa program, process number 20080129100700, and by FINEP, through the Ginga Project at ICMC. The authors would like to thank the financial support from UOL and FINEP.

8. REFERENCES
[1] D. F. Carvalho, R. Chies, and R. Goularte. Esteganografia em videos MPEG-4. In Proceedings of XIII Brazilian Symposium on Multimedia Systems and Web (WebMedia 2007), volume 1, pages 5–8, 2007.
[2] P. Cesar, P. Vuorimaa, and J. Vierinen. A graphics architecture for high-end interactive television terminals. ACM Transactions on Multimedia Computing, Communications, and Applications, 2(4), 2006.
[3] M. Dowman, V. Tablan, H. Cunningham, and B. Popov. Content Augmentation for Mixed-Mode News Broadcasts. In Proceedings of the 3rd European Conference on Interactive Television: User Centred ITV Systems, Programmes and Applications, 2005.
[4] International Organisation for Standardisation. MPEG-4 Overview. Available at: http://www.chiariglione.org/mpeg/standards/mpeg4/mpeg-4.htm. Access date: Jun. 2008.
[5] International Organisation for Standardisation. MPEG-7 Overview, 2004. Available at: http://www.chiariglione.org/mpeg/standards/mpeg7/mpeg-7.htm. Access date: Nov. 2007.
[6] International Organisation for Standardisation. MPEG-J White Paper, 2005. Available at: http://www.chiariglione.org/MPEG/technologies/mp04j/index.htm. Access date: Jun. 2008.
[7] Ginga. Middleware Ginga: TV Interativa se faz com Ginga! Available at: http://www.ginga.org.br. Access date: Jun. 2008.
[8] R. Goularte, M. G. C. Pimentel, and E. S. Moreira. Context-aware support in structured documents for interactive-TV. Multimedia Systems, 11(4), 2006.
[9] GPAC. GPAC Project on Advanced Content. Available at: http://gpac.sourceforge.net. Access date: Jun. 2008.
[10] T. Itagaki, J. Cosmas, and M. Haque. An interactive digital television system designed for synchronised and scalable multi-media content over DVB and IP networks. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME’04), 2004.
[11] Jupitermedia. The JavaScript Source: Generators: HTML2XHTML. Available at: http://javascript.internet.com/generators/html2xhtml.html. Access date: Jun. 2008.
[12] M. Kim, S. Wood, and L. T. Cheok. Extensible MPEG-4 textual format. In Proceedings of the 2000 ACM workshops on Multimedia, 2000.
[13] NCL. Middleware Ginga-ncl. Available at: http://www.ncl.org.br. Access date: Jun. 2008.
[14] M. G. C. Pimentel, L. Baldochi, and E. V. Munson. Modeling Context Information for Capture and Access Applications. In Proceedings of the 2006 ACM Symposium on Document Engineering, pages 92–94, 2006.
[15] RSS Board. RSS 2.0 Specification. Available at: http://www.rssboard.org/rss-specification. Access date: Jun. 2008.
[16] W3C. XSL Transformations (XSLT), 1999. Available at: http://www.w3.org/TR/xslt. Access date: Jun. 2008.
