Object-based Navigation: An Intuitive Navigation Style for Content-oriented Integration Environment

Kyoji Hirata*, Sougata Mukherjea*, Yusaku Okamura**, Wen-Syan Li*, Yoshinori Hara*
*C&C Research Laboratories, NEC USA, Inc., 110 Rio Robles, San Jose, California, USA
**Stanford University
Tel: +1 408 943 3002
E-mail: {hirata, sougata, okamura, wen, hara}@ccrl.sj.nec.com

ABSTRACT

In this paper, we present the idea of object-based navigation. Object-based navigation is a navigation style based upon characteristics at the object level, that is, the contents of the objects and the relationships among them. With object-based navigation, users can specify a set of objects and their relationships. The system creates queries from the users' input and determines links dynamically by matching these queries against indices. Various kinds of attributes, including conceptual and media-based characteristics, are integrated at the object level. We introduce this navigation style into the content-oriented integration environment to manage large quantities of multimedia data. COIR (Content-Oriented Information Retrieval tool), an object-based navigation tool for content-oriented integrated hypermedia systems, is introduced, and we show how this tool works in indexing and navigating multimedia data. Using COIR, we have developed directory service systems for the World-Wide Web and have evaluated the navigational capability and extensibility of our tools. Multimedia search engines, including COIR, automatically extract characteristics from the multimedia data at a web site. The extracted characteristics are connected with each other semi-automatically and utilized in the navigation stage. With this system, users can navigate based on the relationships between objects as well as the contents of the objects. In this paper, we present how the COIR tool increases the navigational capabilities of hypermedia systems.

KEYWORDS: Object-based Navigation, Relationship among objects, Object-level Integration, Content-oriented Integration, COIR, World-Wide Web

1 INTRODUCTION

With the explosive growth in the volume and distribution of information, information processing and management have become critical. The World-Wide Web carries enormous amounts of information circulating daily at an uninterrupted pace. Under such conditions, users require a variety of methods to access their intended data. Users want to navigate by specifying multiple objects together with their relationships. For example, users may want to navigate through video scenes by specifying the objects contained in the scene, such as a car, a tree, or a boy. In addition, users may also want to add visual characteristics and use the relationships among the objects, for example, "Navigate to the scenes in which a boy like this one is on the left side of the red car." It is very hard to provide this type of navigational capability with conventional hypermedia models. To provide these capabilities, either the system designer has to define buttons that represent the multiple objects, or users have to specify complicated queries using query languages such as SQL. Neither approach is practical: in the first, the system designer has to define many buttons; in the second, users have to be experts in the query language. However, it is highly desirable that this flexible information-retrieval concept be introduced into the hypermedia navigational framework.

In this paper, we examine a new navigational mechanism, object-based navigation. In this navigational style, users may specify a set of objects and their relationships. The system creates a query from the users' input and creates links dynamically by matching this query against indices. To apply object-based navigation to large-scale hypermedia systems, a well-organized index structure and an integrated matching mechanism are required. We have proposed the concept of content-oriented integration to provide well-organized media contents and their operations [7].

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee. Hypertext 97, Southampton UK © 1997 ACM 0-89791-866-5...$3.50
Content-oriented integration provides an integrated navigational environment that consists of both conceptual-based navigation and media-based navigation. We introduce the object-based navigation style into this content-oriented integration environment. For object-based navigation, rich descriptions of the objects and their relationships are required. In the content-oriented integration environment, various media-dependent characteristics are extracted, and some of these characteristics are connected to the conceptual representation. These characteristics include the relationships among the objects. By associating these characteristics with the objects contained in the data, we can provide object-based navigational capability in the content-oriented integration environment. We discuss the navigational capability of this architecture later.

We have been developing indexing and navigational tools for content-oriented integration, which we call COIR (Content-Oriented Information Retrieval). COIR is designed to support object-based navigation. In COIR, multimedia data are divided into several units with semantic concepts, and several media-specific attribute values are added to each unit. Each multimedia data item is managed as the set of these units. Using COIR, users can navigate through the information space using the relationships among the objects, in addition to navigation based on a single object.

We have developed several directory-service systems on the World-Wide Web to evaluate the capability of COIR. In these systems, text, image, and video search engines automatically extract characteristics from each media item on the web site. We have used COIR for indexing and searching both image and video data, and the Harvest Information Discovery and Access System [1] for text indexing and searching. The characteristics extracted from text, image, and video data are connected to each other semi-automatically and utilized in the navigation steps.

In Section 2, we describe the idea of object-based navigation and its characteristics. Section 3 explains the mechanisms of object-based navigation in depth. Section 4 presents the COIR tool and its indexing and navigation mechanisms, and in Section 5 the directory service system is described, together with the architecture of the system and some navigation examples.
In Section 6, we discuss future work and conclude.

2 CONCEPT OF OBJECT-BASED NAVIGATION

In this section, we present the general idea of object-based navigation in the content-oriented integration environment. Before explaining the concept of object-based navigation, we briefly summarize the concept of content-oriented integration.

2.1 Content-oriented integration [7]

In the content-oriented integration environment, each media representation is integrated using the conceptual representation as the core. Figure 1 shows the concept of content-oriented integration. Each media representation is divided into two parts: a media-dependent part and a media-independent part. The media-independent part is the semantic part of the media and is translated into the conceptual representation. Various types of media representations are integrated using conceptual representations as the core. The media-dependent part, which is difficult to translate into the conceptual representation, is processed directly without any translation.

Figure 1 Content-oriented Integration

Navigation in the content-oriented integration environment consists of two parts: conceptual-based navigation and media-based navigation. Conceptual-based navigation is based on the connection to the conceptual representation, while media-based navigation is based on media-dependent clues such as color, shape, and motion. Users can access multimedia data in a rich way through the complementary use of these mechanisms.

2.2 General idea of object-based navigation

Object-based navigation is navigation based on characteristics at the object level, that is, the contents of the objects and their relationships. In this navigation style, users specify a set of objects and their relationships; the system searches for candidates based on the users' request and displays the results. Figure 2 illustrates examples of object-based navigation. Suppose that the user is currently looking at a photograph in which a Formula-1 driver, Ayrton Senna, is standing in front of his car with champagne, and is trying to access related video scenes just by clicking on this photograph. He/she can retrieve video scenes by clicking on the car only. Sometimes, however, the user would like to navigate to the video scenes in which Senna is celebrating his win with champagne. In this case, by specifying both the champagne and Senna, the user can access the video scenes he/she is interested in. The user can also define a relationship such as "Senna is next to a car." Moreover, the user can add attribute values to the navigation; for example, by specifying Senna and the color values of the picture, the user can navigate to the video scenes in which Senna is wearing white apparel.

Media-based navigation expands the navigational capabilities to include navigation based on media-based characteristics such as shape and color. In object-based navigation, the objects themselves and the relationships between objects in the multimedia data are managed and used for navigation. In media-based navigation, however, the conceptual descriptions are still attached to whole multimedia data files, so the integration level remains limited. (The query "Search for images that look like this one and are drawn by Van Gogh" is an example of this kind of navigation.)

Figure 2 Examples of Object-based Navigation

To provide these object-based navigational capabilities, the contents of the objects and their relationships should be described precisely. Considering the costs of authoring and maintenance, the indices have to be managed in a well-organized way and used effectively in the navigation stage. We introduce the object-based navigation style into the content-oriented integration environment. In this environment, various media-dependent characteristics are extracted, and some of these characteristics are connected to the conceptual representation, so it is natural to provide object-based navigational capability there. In the indexing stage, the system extracts the objects contained in the multimedia data based on media-dependent characteristics. Media-dependent characteristics such as color, shape, and motion are added to these objects as attribute values. The relationships among the objects, such as the composition, are also stored in the indices. Some of the objects are connected to the conceptual representation and integrated with other multimedia data. In this framework, various kinds of navigation are defined; Figure 3 illustrates these navigational capabilities. In conventional hypermedia systems, the system designer defines the links between multimedia files (for example, from image to image) or between a single object contained in a multimedia file and other whole files. Users specify whole multimedia data or a single object contained in it to obtain the data they are interested in. They cannot specify multiple objects, nor can they use the relationships among the objects as a query.
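As a concrete illustration of the indexing stage described above, the following sketch models an object-level index entry: each extracted object carries media-dependent attribute values, an optional link to a conceptual representation, and relationships to other objects. The class and field names (`ObjectEntry`, `MediaIndex`, `relations`) are our own, invented for illustration; they are not COIR's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectEntry:
    """One object extracted from a multimedia item."""
    object_id: str
    # Media-dependent attribute values (color, shape, size, motion, ...)
    attributes: dict = field(default_factory=dict)
    # Optional connection to a conceptual representation ("tree", "man", ...)
    concept: Optional[str] = None
    # Relationships to other objects, e.g. ("left-of", "obj2")
    relations: list = field(default_factory=list)

@dataclass
class MediaIndex:
    """Index for one multimedia item, managed as a set of object units."""
    media_id: str
    objects: list = field(default_factory=list)

# Build a toy index for an image containing a man to the left of a car.
idx = MediaIndex("img001", [
    ObjectEntry("obj1", {"color": (200, 180, 160), "size": 0.12},
                concept="man", relations=[("left-of", "obj2")]),
    ObjectEntry("obj2", {"color": (255, 0, 0), "size": 0.30},
                concept="car"),
])
concepts = [o.concept for o in idx.objects]
print(concepts)  # ['man', 'car']
```

Not every object needs a `concept`: as the paper notes, objects may remain unconnected to the conceptual representation and still participate in media-based matching through their attribute values.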

Object-based navigation expands the navigational capability through the integration of conceptual and media-dependent characteristics of the objects. Users can freely combine the media-dependent characteristics of the objects, the conceptual information, and the relationships between objects. In Figure 2, the query "Navigate to the scenes in which Senna is wearing white apparel" is an example of single-object-based integration, and the query "Navigate to the scenes in which Senna is on the left side of the car" is an example of multiple-object-based integration.

[Figure 3, not reproduced here, arranges navigation styles along two axes: the anchor type (the whole of the multimedia data; a single object; plural objects and their relationship) and the attributes used (conceptual; media-based; integration of conceptual and media-based). Conventional hypermedia navigation, hypermedia with media-based navigation, and object-based navigation cover successively larger regions of this space.]

Figure 3 Object-based Navigation Capabilities

2.3 Data Model

Several node-link models have been proposed and implemented for existing hypertext/hypermedia systems [3][10]. Two basic models are a static link model based on a uni-directional graph structure and a dynamic link model, mainly described by a scripting language. One typical dynamic link model is based on attribute values, as illustrated in Figure 4. Each node is characterized by a set of attribute values, and the links among nodes are defined by comparing one or more attribute values. Users reach the next state by specifying constraints on the nodes; some links are defined by exact matches and others by similarity matches. Many text search engines on the WWW follow this model [14]. Our model is defined as an extension of this dynamic-link model. Our main extensions are as follows:

1) Hierarchically organized attribute values;
2) Structural constraints and attribute constraints;
3) Comparison at different levels of the hierarchical structure.


We introduce the concept of structure into the attribute values, including media-based characteristics. As shown in Figure 5, attribute values are described at different levels, such as the object level, and connected to each other using is-a/is-part-of links. Based on this structure, the system defines similarity at different levels.

[Figure 4, not reproduced here, shows an anchor node O_i with attribute values V_i1, V_i2, V_i3 linked dynamically (by similarity calculation) to destination nodes O_j with attribute values V_j1, V_j2, V_j3; a link exists when, for all k, dist(V_ik, V_jk) < Th_k.]

Figure 4 Dynamic Link Model
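The dynamic-link rule of Figure 4 — link anchor O_i to destination O_j when dist(V_ik, V_jk) < Th_k holds for every attribute k — can be sketched as follows. The function names and the use of absolute difference as the distance are illustrative assumptions, not the paper's actual measures.

```python
def dist(v_i, v_j):
    # Illustrative distance: absolute difference for scalar attributes.
    return abs(v_i - v_j)

def is_linked(anchor, candidate, thresholds):
    """Dynamic link test: every attribute must fall under its threshold Th_k."""
    return all(dist(anchor[k], candidate[k]) < thresholds[k] for k in thresholds)

anchor = {"size": 0.30, "brightness": 0.80}
nodes = {
    "n1": {"size": 0.28, "brightness": 0.75},  # close on both attributes
    "n2": {"size": 0.90, "brightness": 0.78},  # size difference exceeds Th
}
thresholds = {"size": 0.1, "brightness": 0.1}
links = [n for n, attrs in nodes.items() if is_linked(anchor, attrs, thresholds)]
print(links)  # ['n1']
```

Exact-match links in this model are simply the special case where the threshold is zero (or the distance is an equality test).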

[Figure 5, not reproduced here, shows the structure of the attribute values: an MM node (the minimum unit for navigation) carries media-based attributes and is decomposed via is-a/is-part-of links into object-level and sub-object-level units, each carrying its own media-based attributes.]

Figure 5 Structure of the Attribute Values

[Figure 6, not reproduced here, illustrates link specification in object-based navigation: anchor objects O_i1 (with parts O_i11, O_i12), O_i2, and O_i3 are matched against destination objects O_j1, O_j2 (with part O_j21), and O_j3 under a structural constraint and an attribute constraint; the relevance of the attribute values is compared along multiple dimensions to specify the dynamic link.]

Figure 6 Link Specification in Object-based Navigation

Figure 6 illustrates the link-defining mechanism of our model. Users specify the attribute values of the objects and the relationships between the objects; this corresponds to specifying the structural constraint and the attribute constraint.
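A minimal sketch of this two-constraint link specification follows: a candidate satisfies the structural constraint when it contains parts matching the query's parts, and the attribute constraint when those parts' attribute values are close enough. The names (`satisfies`, `close`) and the tolerance value are our own illustrative assumptions.

```python
def close(a, b, tol=0.15):
    # Attribute constraint: illustrative absolute-difference test.
    return abs(a - b) <= tol

def satisfies(query_parts, candidate_parts):
    """Structural + attribute constraint: every query part must find a
    candidate part with the same concept and similar attribute values."""
    for concept, attrs in query_parts:
        hit = False
        for c_concept, c_attrs in candidate_parts:
            if c_concept == concept and all(
                    close(attrs[k], c_attrs.get(k, float("inf")))
                    for k in attrs):
                hit = True
                break
        if not hit:
            return False
    return True

query = [("man", {"size": 0.1}), ("car", {"size": 0.3})]
cand_a = [("man", {"size": 0.12}), ("car", {"size": 0.28}), ("tree", {"size": 0.2})]
cand_b = [("man", {"size": 0.12})]  # no car: fails the structural constraint
print(satisfies(query, cand_a), satisfies(query, cand_b))  # True False
```

In the full model, the part structure is hierarchical (is-a/is-part-of), so this check can be applied recursively at different levels of the hierarchy.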


The system calculates the similarities of the attribute values and stores them in vector form. To define the dynamic links, these similarities are normalized with user-specified weighting values. By specifying a structural constraint, the system can define links based on attribute values at different levels of the hierarchy, as well as links based on multiple objects.

2.4 Related work

There exist several approaches to retrieving image data based on media-dependent clues, some of which are demonstrated on the World-Wide Web. Virage [5] extracts feature vectors from images and executes media-based searches on the WWW. QBIC [4] also extracts visual features and uses them as clues for retrieval, and supports integration with keywords. However, these systems focus mainly on media-based search capabilities and not on the relationship between the extracted characteristics and their semantics.

The Microcosm/MAVIS [9] system attempts to integrate image, video, and audio information based on media-dependent features. By defining source anchors and destinations as sets of media attributes, the system defines generic links using relationships between the anchor's attribute set and the destination's. Although this approach is similar to ours, the system does not focus on the measurement itself, and with this approach it is hard to describe the relationships between objects.

Yahoo's Image Surfer [13][14] has constructed a collection of images available on the Web. The images from each web site are categorized manually, and within those categories users can retrieve images based on color histograms; however, the categorization steps are executed manually. WebSeer [2] connects textual information and images using HTML documents and, by integrating an analysis of the images, provides multimedia search based on image contents. For both systems, however, the integration is at the image-file level.

2.5 Characteristics of object-based navigation

Object-based navigation has the following characteristics:

Integration of conceptual and media-based clues at the object level: Users can integrate conceptual and media-based clues at the object level and dynamically define various types of navigation. For example, users can obtain images by saying, "Navigate to images in which a man like this is at the center of the image," or "Retrieve all images containing a car like this." In this case, the user specifies the objects using visual attribute values such as location, color, size, and motion, in addition to the semantic word "man."

Navigation based on the relationships between objects: Users can specify multiple objects as an anchor, and can also specify the relationships among them. For example, users can obtain photographs by saying, "Navigate to the pictures in which a man is at the left side of the car." In this case, the multiple objects "man" and "car" are specified, together with the relationship "at the left side." It is difficult for current hypermedia systems to execute these kinds of navigation.

Content-oriented integration: Object-based navigation is designed within the framework of content-oriented integration. The objects contained in the multimedia data are connected to the conceptual representation, so the system naturally integrates multimedia objects and has well-organized structures.

Cost-effective indexing: Media processing techniques, such as image and video processing, can be applied for automatic indexing. Based on media-dependent characteristics, the system can extract the objects from the multimedia data, and the calculation of attribute values such as color, size, and texture for each object is also executed automatically. Considering the current level of media understanding technologies, the user has to define the connection between the objects and the conceptual representation; however, these steps can be executed interactively with drag-and-drop interfaces. In addition, by storing information about this connection in a media dictionary, this step can be executed semi-automatically: the system proposes candidate conceptual representations based on the media-dependent characteristics, and the system designer simply selects the intended representation from them.

3 OBJECT-BASED NAVIGATION: INDEXING AND NAVIGATION MECHANISMS

In this section, we describe the indexing and navigational mechanisms of object-based navigation. To provide flexible navigational capabilities in a uniform way, the navigation stage is divided into two parts: the first stage creates a query for matching from the user's wide variety of inputs, and the second stage calculates the similarity between the query and the indices. The users' inputs are translated into a query object and weighted parameters, and using the output of the querying phase, the matching phase determines the links dynamically.

3.1 Indexing Step

To provide object-based navigational capability, the index also has to be described using objects as units. Multimedia data are described using the contents of the objects and the relationships among them, so rich yet compact descriptions of both are required. Following the structure of content-oriented integration, we divide the description of the contents into two parts: conceptual characteristics and media-based characteristics. For the relationships among the objects, we define two types of descriptions: the logical relationship and the physical relationship. The former provides the structure of the objects and also determines the granularity of correspondence; the latter describes the media-dependent relationship. Figure 7 illustrates the index structure for object-based navigation.

[Figure 7, not reproduced here, shows the media index built from an original image: a logical relationship tree connects conceptual characteristics (e.g. "Tree", "Man") with media-dependent characteristics and their attributes (color, size, etc.) and with the physical relationship; the query handler extracts a set of these characteristics (e.g. "Tree", "Man", color and shape attributes) to create the search query.]

Figure 7 Indexing Structure for Object-based Navigation

Conceptual characteristics: In the content-oriented integration environment, the conceptual representation is managed by a conceptual augmenter. An object obtains conceptual characteristics simply by being connected to the conceptual representation; not all objects have to be connected. In Figure 7, for example, the object "horse" is not connected to the conceptual representation.

Media-dependent characteristics: These include color, shape, size, and motion, and are extracted through media processing.

Logical relationship: This is usually described using is-a/is-part-of relationships, and the class structure is described recursively. This relationship corresponds to organizational links in the hypermedia field. In Figure 7, the image consists of six parts: sky, land, road, tree, horse, and person; the person consists of three parts: the upper half of the body, the lower half of the body, and an umbrella.

Physical relationship: This relationship is based on the media-based characteristics. For spatial data such as still images or video, it is mainly based on location; for time-variable data such as audio or video, it is described based on the time line.

In addition to these four elements, we should also consider the relationships among the conceptual representations. To keep our discussion clear, however, we do not address the relationships between conceptual representations in this paper.

3.2 Querying phase for navigation step

The links between multimedia data are determined dynamically based on the similarity between the anchor objects and other multimedia data. In object-based navigation, users can specify any set of elements as an anchor object for the next navigation, and the query handler creates a composite anchor from the set of elements the user specifies. In Figure 7, the user specifies the conceptual words "tree" and "man," the objects that look like a horse (via shape information), the color values of the trees, and the relationship between the trees and the man. In this case, the system does not have to know that the selected object is a horse; it will find the data that has the same shape information as the object "horse." From the media index, the query handler creates the query data, and from the selected elements and user-specified parameters, the system defines the weighting function used in the matching process. When users want to create a query by themselves, they can draw rough sketches and add media-based attribute values and conceptual representations.

In this specification step, users have to provide the input for the next navigation, i.e., select the multiple objects and specify the relationships among them. It is inconvenient for users to perform so many operations for each navigation. However, it is possible to navigate to the next step in one action by specifying default settings. A default setting file describes the attribute values used in the matching step and their weights. Users can then navigate in one action, as in "Navigate to the video scenes in which a man like this is located in this area": all the user has to do is click on the man. The query handler extracts the object, automatically extracts its media-based attribute values and conceptual representation based on the settings, and creates the query for matching.

3.3 Matching phase for navigation step

The query described in the previous sections is compared with the media indices stored in the system. The matching stage is divided into four steps:

1) Establish correspondences between the objects;
2) Evaluate the similarity between each pair of corresponding objects (based on both the conceptual representation and the media-based attributes);
3) Evaluate the similarity of the relationships;
4) From 1)-3), evaluate the similarity between the query and the media index.

Since multimedia data consist of many objects and the objects have many aspects for correspondence, it is very hard to find a precise one-to-one correspondence, especially when using media-dependent queries. Restrictions in matching are required in order to correspond the objects using media-dependent attributes: the system considers as corresponding those objects that have similar values for the specified attributes. Figure 8 illustrates object correspondence for visual data [8]. In this case, the system defines the correspondence based on the position and overlapping area of the objects. Users can specify the weight between conceptual values and media-dependent values.
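The four matching steps above can be sketched end-to-end as follows. This is a simplified illustration under our own assumptions: objects are corresponded by shared concept label (step 1), object similarity is a weighted sum over attributes (step 2), relationship similarity is the fraction of query relationships found in the candidate (step 3), and the two are combined into one score (step 4). None of the function names or weight values come from the paper.

```python
def object_similarity(q, c, weights):
    # Step 2: weighted sum of per-attribute similarities (1 - clipped distance).
    return sum(w * (1.0 - min(abs(q[a] - c[a]), 1.0))
               for a, w in weights.items())

def relation_similarity(q_rels, c_rels):
    # Step 3: fraction of query relationships also present in the candidate.
    if not q_rels:
        return 1.0
    return len(set(q_rels) & set(c_rels)) / len(q_rels)

def match(query, candidate, weights, rel_weight=0.5):
    # Step 1: correspond objects (here, simply by shared concept label).
    pairs = [(q, c) for name, q in query["objects"].items()
             for cname, c in candidate["objects"].items() if name == cname]
    if not pairs:
        return 0.0
    obj_sim = sum(object_similarity(q, c, weights) for q, c in pairs) / len(pairs)
    rel_sim = relation_similarity(query["relations"], candidate["relations"])
    # Step 4: combine object-based and relationship similarity.
    return (1 - rel_weight) * obj_sim + rel_weight * rel_sim

query = {"objects": {"man": {"size": 0.1}, "car": {"size": 0.3}},
         "relations": [("man", "left-of", "car")]}
cand = {"objects": {"man": {"size": 0.1}, "car": {"size": 0.3}},
        "relations": [("man", "left-of", "car")]}
score = match(query, cand, {"size": 1.0})
print(score)  # 1.0 for an exact match
```

In the actual system the correspondence step is harder: as the text notes, it relies on position and overlapping area rather than known concept labels when only media-dependent clues are available.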

[Figure 8, not reproduced here, shows how a picture index and a user's query are matched: regions are put into correspondence and merged, and the attribute values of each object (location, size, motion vector, color, conceptual representation, precise information, etc.) are then evaluated.]

Figure 8 Similarity Calculation

The system evaluates similarity values for each attribute. The similarity of two objects is calculated as the sum of the similarities of each attribute value; each attribute similarity is normalized, and its weight is set according to the user's specification. In COIR, for example, attributes such as color, shape, motion, texture, and the conceptual representation are used in the calculation. The similarity of the relationships is calculated only when multiple objects are specified. For visual data, the system calculates this similarity based on the location of each object and the information described in the media index. For example, the relationship "A is on the left of B" is easily evaluated by comparing the location values, and relationships such as "A is bigger than B" are also easy to evaluate by extracting the media-dependent attribute values and comparing them. The similarity between the query image and the picture index is calculated as the sum of the object-based similarity and the relationship similarity. Since this similarity is computed per object, the system can change the matching weights based on the user's specification.

As mentioned in Section 3.1, the picture index is described using objects as units, and the system can easily extract the attribute values from the index using the extraction functions. Thus, once the similarity criteria are defined, the system can evaluate them at high speed. Using this object-based matching mechanism and the extraction functions, the system can define the links dynamically in a flexible manner. As shown in Figure 3, the concept of object-based navigation includes conventional hypermedia navigation and conventional media-based navigation. When users want to specify only one object, as in conventional hypermedia navigation, they simply specify the object they are interested in: from the media index, the conceptual representation of the specified object is extracted, and conventional hypermedia navigation is executed based on it. Only when users want to specify multiple objects do they have to take additional actions. To reduce the user's effort in specifying these many parameters, the system should keep default settings describing the weights for the calculation, so that the user only has to specify the objects in the data.

4 COIR, CONTENT-ORIENTED INTEGRATION TOOLS

We have developed COIR, a Content-Oriented Information Retrieval tool, for content-oriented integration. COIR provides object-based navigational capabilities for still images and video. Considering the limitations of current image recognition and understanding technology, COIR has several functions to compensate for them. In this section, we explain how COIR works in the indexing and navigation steps.

4.1 COIR indexing steps

Using COIR, the system (in the content-oriented integration environment, we call it the video augmenter and the image augmenter) creates picture indices. The picture indices created by COIR mainly have these structures:

a) Spatial relationships between the objects (corresponding to the physical relationship);
b) Media-dependent characteristics added to each object (corresponding to the media-dependent characteristics);
c) Pointers to user-specified information (conceptual representation, object-index), corresponding to the conceptual characteristics and logical relationships.

Currently, the media-dependent characteristics are color, texture, size, and location for still images; for video data, the motion of the object is also used. In addition to these automatically created characteristics, the picture indices have pointers to semantic information. (This step is currently executed interactively by the system designer, and it is optional.) The system can also store precise information about an object as an object-index, in which visual information such as composition, structure, and color is described. The system designer specifies the region of the object, and COIR then creates the object-index automatically. This information is used for navigation based on detailed object characteristics. An object-index has almost the same format as a picture index, and users can handle them in the same way, for example by applying the matching function to both picture indices and object-indices.

Indexing by still image augmenter [8]

Figure 9 shows the indexing process in the image augmenter.

[Figure 9, not reproduced here, shows the still-image indexing pipeline: the original image is automatically divided into regions (Regions 1-4) and attribute values (color, texture, size, position, etc.) are extracted as region and structural information; semantic words (e.g. "Tree", "Shadow") are connected semi-automatically or interactively; more detailed object information (keyword and object-index) is stored interactively if needed; and the segmentation can be re-divided based on the user's feedback.]

Figure 9 Indexing by Still-Image Augmenter

COIR divides the image into several regions and calculates the attribute values of each region (Color, Texture, Size, Position etc.) This process is done automatically using image processing techniques. The number of the region depends on the image contents. Currently the number of the regions is at most eight. It is hard to extract the small objects automatically and to handle too many objects in one matching phase. The index size has to be small for better performance. For these reasons, we put an upper limit on the number of regions. In Figure 9, the original image is divided into four regions. The region information is stored as the structural information in a matrix format. The size of this matrix is currently 24*24. This matrix takes charge in the composition and shape of the region. The system designer can connect several regions into conceptual representation (In this case, region 3 is connected into the conceptual representation ‘tree’ and region 4 is connected in to the conceptual representation ‘shadow’.) This step is executed interactively using the semantic editor by just pointing to objects on the screen. The system designer can also specify the object-index for each region. Using the region specified by the user as an input, COIR creates the index whose format is almost the same as picture indices and stores it as an object-index. Object-indices are used for searching based on the component. Component-based navigation such as “Navigate to the images which contain the object like this” is executed by matching the query and this object-index. Since users can specify any size of region, COIR has the capability of


handling objects of any size. Object-indices complement the descriptive capability of the picture index. Users can also register the relationship between a conceptual representation and an object-index in the media dictionary; with its help, the process of connecting regions to conceptual representations can be executed semi-automatically. The system designer may also want to specify the region of an object directly. In this case, he/she can define the region interactively; COIR then re-divides the other areas automatically and stores the new index.

Indexing by video augmenter

Figure 10 shows the indexing process in the video augmenter. First, COIR extracts the scenes automatically; color histograms and frame differentials are used in this step. Information such as the scene number, start frame, and end frame is stored as scene information.
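The scene-extraction step just described can be sketched as a simple shot-boundary detector over color-histogram differences. The threshold, quantization, and function names below are illustrative assumptions, not COIR's actual implementation.

```python
# Sketch of scene extraction: detect cuts where the color-histogram
# difference between consecutive frames exceeds a threshold, and store
# (scene number, start frame, end frame) tuples. Illustrative only.

def color_histogram(frame, bins=8):
    """Very coarse histogram over quantized (r, g, b) pixel values."""
    hist = {}
    for (r, g, b) in frame:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist[key] = hist.get(key, 0) + 1
    return hist

def hist_diff(h1, h2):
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)

def extract_scenes(frames, threshold):
    """Return (scene_number, start_frame, end_frame) tuples."""
    scenes, start = [], 0
    hists = [color_histogram(f) for f in frames]
    for i in range(1, len(frames)):
        if hist_diff(hists[i - 1], hists[i]) > threshold:  # cut detected
            scenes.append((len(scenes), start, i - 1))
            start = i
    scenes.append((len(scenes), start, len(frames) - 1))
    return scenes

# Two "shots": three dark frames followed by two bright frames.
dark = [(0, 0, 0)] * 4
bright = [(255, 255, 255)] * 4
scenes = extract_scenes([dark, dark, dark, bright, bright], threshold=4)
# scenes -> [(0, 0, 2), (1, 3, 4)]
```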

Figure 10 Indexing by Video Augmenter

COIR extracts still images as key-frames of each scene. These key-frame images are processed in just the same way as still images, and users can add semantic and object-indices semi-automatically. In addition to this still-image-based approach, COIR also extracts the motion vectors of objects and adds them as region attributes.

4.2 COIR matching steps [8]

COIR provides object-based navigation functions for visual data. As mentioned in Section 3.3, COIR has to process various kinds of inputs, and the COIR matching process is accordingly divided into four steps (see Section 3.3). First, COIR tries to associate objects between the query image and the picture indices. Currently, COIR has three methods of correspondence: 1) based on location and overlapping-area information; 2) based on conceptual representations; 3) based on object-indices. Figure 8 illustrates method 1): COIR considers objects located in the same area to correspond to each other and, using the overlapping information, merges regions until each region has a counterpart. Object-indices are used to search for specified objects in an image (see Section 4.1).

Next, the similarity between each pair of corresponding objects is calculated. Since the picture index keeps shape information in the form of a matrix, shape similarity is measured by comparing regions on the matrix. Color similarity is measured in Luv space, and motion similarity is calculated from the magnitude and direction of the motion vector. Then the similarity of the relationships among objects is measured. When the correspondence is established through conceptual representations or object-indices, the system extracts the location values and evaluates the relationships directly. (Users can therefore create a query such as "Navigate to an image in which a man appears to the left of a car like this.") When the correspondence is established using location and overlapping, the number of corresponding objects represents the similarity of the relationship. These similarities are normalized using the average and standard deviation and combined with weights based on the user's specification. Using the resulting similarity between images, COIR determines the links dynamically.
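The final scoring step described here (normalize each per-feature similarity with its average and standard deviation, then combine with user-specified weights) can be sketched as follows. The feature names and weight values are illustrative assumptions.

```python
# Sketch of combining per-feature similarities into one match rate:
# z-score normalize each feature across the candidate set, then sum
# with user-adjustable weights, as the text describes.
import math

def normalize(scores):
    """Z-score normalization using the average and standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) or 1.0
    return [(s - mean) / std for s in scores]

def rank_candidates(features, weights):
    """`features` maps feature name -> list of raw similarities,
    one per candidate image. Returns candidate indices, best first."""
    n = len(next(iter(features.values())))
    combined = [0.0] * n
    for name, scores in features.items():
        for i, z in enumerate(normalize(scores)):
            combined[i] += weights.get(name, 1.0) * z
    # Higher combined score = better match; links are created in this order.
    return sorted(range(n), key=lambda i: combined[i], reverse=True)

# Three candidates; the user has doubled the weight of the relationship.
order = rank_candidates(
    {"shape": [0.9, 0.2, 0.5],
     "color": [0.8, 0.3, 0.4],
     "relationship": [0.7, 0.6, 0.1]},
    weights={"shape": 1.0, "color": 1.0, "relationship": 2.0},
)
```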

To reduce the user's burden of specifying these many parameters, the system keeps default settings that users can configure beforehand; at navigation time, users then only specify the objects in the data. Currently we are trying to use the user's manipulation history to adjust these settings automatically, as well as developing better graphical interfaces.

5 OBJECT-BASED NAVIGATION ON THE DIRECTORY SERVICE APPLICATION FOR THE WORLD-WIDE WEB

We have developed a directory service system for the World-Wide Web to evaluate the navigational capability and extensibility of COIR. Currently, COIR supports only image and video data; we modified the Harvest Information Discovery and Access System [1] and used it as a text augmenter.

5.1 Architecture

Figure 11 shows the overall architecture of the system. The directory server manages the multimedia indices of the multimedia documents at various data sites. In the indexing phase, the system gathers multimedia data from these data sites, creates indices for them, and stores the indices at the directory server. In the navigation phase, various clients can search for their intended information using these multimedia indices on the directory server. The directory server provides the querying interfaces: users can input many types of queries, retrieve the results, navigate through the information spaces based on "thumbnail" data, and retrieve the data they are interested in.


Figure 11 Directory Service System Architecture

Figure 12 Mechanism in Indexing Step

5.2 Indexing phase

Figure 12 explains the indexing phase. The COIR library handles the indexing of both video and still images, and the Harvest Information Discovery and Access System handles text indexing. First, a Harvest gatherer gathers and indexes textual information from the user-specified sites. The user is allowed to specify one or more sites by listing their URLs; moreover, Harvest allows indexing only a section of a site through various mechanisms (for example, limiting the URLs that are indexed by using regular expressions). We then extract the images and videos referenced in each HTML page; this information is available from Harvest, and the multimedia data themselves are fetched using a Web mirroring package. From this extracted data, COIR automatically creates the picture indices, thumbnail files for browsing, and tables that map back to the original data locations (URLs) [see 4.1]. In this step, the system designer only has to enter some required information, such as the URLs of the site. After that, as described in Section 4.1, the objects contained in the extracted images and videos are connected to conceptual representations semi-automatically.

5.3 Navigation phase
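The step of extracting the images and videos referenced in each HTML page can be sketched as below. The regular-expression approach, extension list, and function names are illustrative assumptions, not the actual Harvest/COIR mechanism.

```python
# Sketch: collect image/video references from an HTML page so they can
# be fetched (mirrored) and indexed. Illustrative, not the real pipeline.
import re
from urllib.parse import urljoin

MEDIA_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".mpg", ".mpeg", ".mov")

def extract_media_refs(html, base_url):
    """Return absolute URLs of media referenced by src or href attributes."""
    refs = re.findall(r'(?:src|href)\s*=\s*["\']([^"\']+)["\']', html,
                      flags=re.IGNORECASE)
    return sorted({urljoin(base_url, r) for r in refs
                   if r.lower().endswith(MEDIA_EXTS)})

page = ('<img src="photos/sunset.jpg"><a href="clips/demo.mpg">demo</a>'
        '<a href="index.html">home</a>')
refs = extract_media_refs(page, "http://www.cc.gatech.edu/")
# refs -> ['http://www.cc.gatech.edu/clips/demo.mpg',
#          'http://www.cc.gatech.edu/photos/sunset.jpg']
```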

Figure 13 explains the navigation phase. Users can access the directory server in various ways, by mouse clicking or by using the editor provided by the directory server. COIR accepts various types of inputs: rough sketches of images, key frames of videos, semantic words for whole images or objects, motion vectors of objects, and the IDs of thumbnail files. By calculating the similarity between the query and the picture indices stored at the directory server, COIR determines the links between them and returns the results with a match rate.
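A minimal sketch of matching such an object-level query (a rough position plus an optional conceptual representation like "person") against indexed regions follows. The scoring function, names, and the concept bonus are all hypothetical.

```python
# Sketch: object-level query matching. Each indexed region has a
# normalized center position and an optional concept label; a query
# object specifies a rough position and, optionally, a concept.
# The scoring here is illustrative only.

def object_score(query, region, concept_bonus=0.5):
    """Score one query object against one region: closer positions score
    higher, and a matching concept label adds a fixed bonus."""
    dx = query["x"] - region["x"]
    dy = query["y"] - region["y"]
    location = max(0.0, 1.0 - (dx * dx + dy * dy) ** 0.5)
    matched = query.get("concept") and query["concept"] == region.get("concept")
    return location + (concept_bonus if matched else 0.0)

def image_score(query_objects, regions):
    """Best-match each query object to a region; sum the scores."""
    return sum(max(object_score(q, r) for r in regions)
               for q in query_objects)

regions_a = [{"x": 0.5, "y": 0.7, "concept": "person"},
             {"x": 0.5, "y": 0.2, "concept": "sky"}]
regions_b = [{"x": 0.1, "y": 0.1, "concept": "car"}]
query = [{"x": 0.5, "y": 0.7, "concept": "person"}]
# Image A contains a "person" region at the queried position, so it
# scores higher than image B.
```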

The Harvest broker is used to retrieve documents that contain the user-specified keywords. The broker provides an indexed interface to the gathered information; to accommodate diverse indexing and searching needs, Harvest defines a general Broker-Indexer interface that can accommodate a variety of search engines. We are using Glimpse as our search engine. Our system allows users to view the results in several ways. Users can display the results using 3D visualization techniques (for example, the right of Figure 14) and walk through the 3D space; VRML is used for the visualization, so any VRML browser can be used to look at the views. Since users can also obtain the URL information of the original multimedia document, they can retrieve the original document from this information. The system designer can easily create or modify the user interfaces using Perl or Java to fit the application.
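Combining the keyword hits from the text search engine with COIR's image match rates, as in the queries described below, can be sketched as a simple merge. The data shapes and ranking rule here are assumptions, not the actual Harvest/COIR interface.

```python
# Sketch: merge keyword hits (from the text search engine) with image
# similarity results (from the image matcher) into one ranked list.

def merge_results(keyword_hits, image_matches):
    """keyword_hits: set of URLs; image_matches: {url: match_rate}.
    Keep documents satisfying both queries, ranked by image match rate."""
    both = [(url, rate) for url, rate in image_matches.items()
            if url in keyword_hits]
    return sorted(both, key=lambda pair: pair[1], reverse=True)

hits = {"http://cc.gatech.edu/a.html", "http://cc.gatech.edu/b.html"}
matches = {"http://cc.gatech.edu/a.html": 0.91,
           "http://cc.gatech.edu/c.html": 0.80,
           "http://cc.gatech.edu/b.html": 0.65}
ranked = merge_results(hits, matches)
# ranked -> [('http://cc.gatech.edu/a.html', 0.91),
#            ('http://cc.gatech.edu/b.html', 0.65)]
```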

Figure 13 Mechanism in Navigation Step

5.4 Applications

We have indexed several Web sites, allowing us to query them based on both images and keywords. In addition to indexing these sites, we have created several applications on the Web; in this section, we present some of our systems.

Georgia Institute of Technology, College of Computing web server

We have indexed the HTML documents at the Georgia Institute of Technology, College of Computing web server, which consists of approximately 2500 HTML documents and 1600 images [15].


Figure 14 shows an example of the navigation process. The left image is one of the querying interfaces of our search engine, the center image shows the results of the search, and the right image is an example of 3D visualization of the results. In this case, the user is trying to obtain a document that contains the keyword "graphics" and an image similar to the image whose URL is specified (a photograph of a professor). The photograph located at the upper left of the right image is used as the query. The retrieval results are sorted by matching score, and the top five candidates are displayed; here, the first candidate contains the same photograph as the query, and all the retrieved documents are related to graphics and contain pictures. Users can continue to navigate from these results as well as visualize them [11]. This overview of the result space allows users to navigate through it: they can easily reach an intended HTML document if they remember its contents, and they can also discover new addresses they are interested in.

Photo stock application

We have implemented an application for stock photography. This site holds approximately 10,000 photographs. Figure 15 shows an example of a media-based search: the user draws a rough sketch in which the sun is setting and retrieves images using the editor provided by the directory service system. (This editor is implemented in Java.) If needed, users can add keyword information about the objects with this editor. Users obtain the navigation results as thumbnail images; they may then obtain related information about the images via the URL addresses, or search again using a retrieved image as the next query. Figure 16 shows a combination search with a conceptual word and media-based queries at the object level. The user just specifies the position of an object and inputs the conceptual representation "person" (the left side shows the query the user specified). Based on these ambiguous media-dependent clues (location and size) and the connection to the conceptual representation "person," COIR searches for similar images and displays the candidates, as shown in the left image of Figure 16. The right image shows the navigation results. Since the default weighting is specified beforehand, users can access similar images, considering both conceptual representations and media-based characteristics at the object level, simply by clicking a button. In this case, all the objects contained in the image are matched based on location and overlapping-area information, and the conceptual-representation factor is added to the object similarity. In this retrieval, therefore, the spatial relationships among the objects carrying conceptual representations are taken into consideration: in Figure 16, the spatial relationships among "snow," "person," and "sky" are evaluated as well as other


media-dependent characteristics.

Video stock application

We have also implemented an application to retrieve video data. Video indices are created automatically, as described in Section 4.1. To navigate through the video information, users can specify the motion of objects. Figure 17 shows an example of this kind of navigation. Currently, 40 video titles with about 500 scenes are stored in this database. The user draws the objects and adds a motion vector (in the left image, the user specified an object at the center of the image and set the motion vector to the right), and the system retrieves the video scenes corresponding to the query. Almost all the retrieved video scenes contain objects moving to the right at the center of the image. The user then navigates to the video scenes (see the right image of Figure 17); in this case, the video data and the thumbnail images of each key frame, created automatically by COIR, are displayed.

6 CONCLUSION

This paper has described a mechanism for object-based navigation defined in the content-oriented integration environment. By integrating conceptual representations and media-based representations at the object level, users can navigate through hypermedia spaces using both the contents of objects and their relationships. We have been developing COIR, an object-based navigation tool for content-oriented integrated hypermedia, and in this paper we also explained how COIR works in the indexing and navigation stages. Using COIR, we implemented a content-oriented integrated hypermedia system, and to evaluate its usefulness, we created directory service systems on the World-Wide Web. We are now working on visualizing the retrieval results in a user-friendly manner; the 3D approach described in Section 5.4 is one example, and clustering technologies are required to provide class-based visualization. To reduce the system designer's work, we are modifying the media dictionary and its human interfaces; user-friendly interactive interfaces for specifying anchor objects are also significant future work. It is possible to support user-customized parameter settings by introducing a human perception model and learning mechanisms [6][12]; we are applying this technology to set matching weights automatically. We believe that the COIR library expands navigational capabilities and can be used for creating large hypermedia systems.

7 ACKNOWLEDGMENTS

We would like to thank the College of Computing, Georgia Institute of Technology for allowing us to use their web site and photographs. We also would like to thank our colleagues in the C&C Research Laboratories, NEC USA, Inc., in particular Kojiro Watanabe for his encouragement and kind comments on this work, and Nancy Fix and Sue Frasula for critically reading the manuscript.

REFERENCES

1. C. Bowman, P. Danzig, et al., "The Harvest Information Discovery and Access System," Proceedings of the Second International World-Wide Web Conference, Oct. 1994.

2. C. Frankel, M. J. Swain, et al., "WebSeer: An Image Search Engine for the World Wide Web," University of Chicago Technical Report TR-96-14, July 1996.

3. F. G. Halasz, "Seven Issues: Revisited," Keynote Talk at ACM Hypertext'91, Dec. 1991.

4. J. Ashley, M. Flickner, et al., "The Query By Image Content (QBIC) System," ACM SIGMOD Conference, May 1996.

5. J. R. Bach, C. Fuller, et al., "The Virage Image Search Engine: An Open Framework for Image Management," Proceedings of the SPIE: Storage and Retrieval for Still Image and Video Databases IV, pp. 76-87, Feb. 1996.

6. I. J. Cox, M. L. Miller, et al., "PicHunter: Bayesian Relevance Feedback for Image Retrieval," 13th International Conference on Pattern Recognition, pp. 361-369, 1996.

7. K. Hirata, Y. Hara, et al., "Content-oriented Integration in Hypermedia Systems," ACM Hypertext'96, pp. 11-21, Mar. 1996.

8. K. Hirata, Y. Hara, et al., "Media-based Navigation for Hypermedia Systems," ACM Hypertext'93, pp. 159-173, Nov. 1993.

9. P. H. Lewis, H. C. Davis, et al., "Media-based Navigation with Generic Links," ACM Hypertext'96, pp. 215-223, Mar. 1996.

10. S. J. DeRose, "Expanding the Notion of Links," ACM Hypertext'89, pp. 249-267, 1989.

11. S. Mukherjea, K. Hirata, et al., "Visualizing the Results of Multimedia Web Search Engines," IEEE Information Visualization Symposium, Oct. 1996.

12. T. P. Minka and R. W. Picard, "Interactive Learning Using a 'Society of Models'," MIT Media Laboratory Perceptual Computing Section Technical Report No. 349, 1995.

13. The Image Surfer. Available at http://isurf.yahoo.com

14. Yahoo. Available at http://www.yahoo.com

15. Available at http://www.cc.gatech.edu

Figure 14 An Example of Homepage Navigation


Figure 15 An Example of the Photograph Stock Application (Similar Search)

Figure 16 An Example of the Photograph Stock Application (Combination Search)

Figure 17 An Example of the Video Stock Application
