NordiCHI 2006, 14-18 October 2006


Interactive 3D Sonification for the Exploration of City Maps

Wilko Heuten OFFIS Escherweg 2, 26121 Oldenburg, Germany [email protected]

Daniel Wichmann OFFIS Escherweg 2, 26121 Oldenburg, Germany [email protected]

Susanne Boll University of Oldenburg Escherweg 2, 26121 Oldenburg, Germany [email protected]

ABSTRACT

Blind and visually impaired people usually do not visit unknown cities or places without assistance. One reason for this is that it is hardly possible for them to gain a non-visual overview of a new place, its landmarks, and its geographic entities in advance at home. Sighted people can use a printed or digital map for this task. Existing haptic and acoustic approaches do not provide an economic way to mediate the understanding of a map and the relations between objects, such as distance, direction, and object size. We provide an interactive three-dimensional sonification interface for exploring city maps. A blind person can build a mental model of an area's structure by virtually exploring an auditory map at home. Geographic objects and landmarks are presented as sound areas placed within a sound room. Each type of object is associated with a different sound and can therefore be identified. By investigating the auditory map, the user gets an idea of the various objects, their directions, and their relative distances. First user tests show that users are able to reproduce a sonified city map that comes close to the original visual city map. Exploring a map through non-speech sound areas is a new user interface metaphor that offers potential not only for blind and visually impaired persons but also for applications for sighted persons.

Author Keywords

sonification, auditory display, 3D sound, exploration, interaction techniques, city maps

ACM Classification Keywords

H.5.1 [Information interfaces and presentation]: Multimedia Information Systems, Audio input/output; H.5.2 [Information interfaces and presentation]: User Interfaces, Auditory (non-speech) feedback

INTRODUCTION

Sighted persons typically use a visual city map to make themselves familiar with an area. Looking at the map, they can identify the location of objects and landmarks as well as the spatial relations between them. When actually walking through the area, users either carry the map along or try to recall their cognitive model of the map to find their way. Most navigation aids available today target the visual sense, which excludes blind and visually impaired persons from this source of information. Yet for a blind or visually impaired person it is very important to have a detailed understanding of the target environment before leaving the house. Consequently, these persons typically do not leave the house alone and do not feel prepared to walk a new route. What is needed is an information presentation that gives a blind or visually impaired user access to the same map information as a sighted person, but through a different sense and channel. For a blind or visually impaired person, hearing is one of the central senses for interacting with the real world. Our objective is to support blind and visually impaired people in preparing a visit to an unknown area; the use of our tool should lead to a cognitive model of the area. We do not aim at a speech-driven exploration of a map, as this has downsides: only one item of information can be presented at a time, and speech-based navigation systems follow a route and read out directions rather than conveying the map, its objects and landmarks, and its spatial layout. Instead, we use the auditory channel to provide information that helps to understand the spatial layout of a city, its landmarks, and relevant objects.

In our auditory city map, each relevant geographic object, such as a park or a public building, is represented as a two-dimensional area that is associated with a distinct sound. Together, the sound areas establish an acoustic representation of the city. However, this alone does not make it easy to explore the map and build a cognitive model of it. As in similar approaches, the user would have to move the mouse or another input device over the map, and a sound object could only be heard when one of its areas is hit. This makes it difficult to understand directions, distances, and the relationships of objects to each other, such as: a railway station is located left of a park, or a lake is located within a park.


To overcome this constraint, our proposed system plays the sounds of certain objects concurrently, so the user can hear all close-by objects. Depending on the distance between the user's position and an object's position, the volume of the object's sound is adjusted: objects close by are louder than objects farther away. Furthermore, our prototype uses a planar three-dimensional acoustic model in order to also mediate the direction of a geographic object relative to the user's orientation. By listening to the sound objects, the user gains an idea of his surroundings. In order to build a cognitive model of the entire presented area, the user can alter his own position and virtually walk through the city while listening to the objects around him. The current system is implemented as a standalone application and as a Web browser plugin. We hope to contribute to visually impaired and blind people's exploration of unknown areas at home, so that they feel better prepared when visiting new places in the real world. With a small set of test users we evaluated our prototypical application and received a first confirmation that our sonification approach helps to understand and reproduce city maps and their layout.

The remainder of this paper is structured as follows: First, the usage of city maps is analyzed in more detail in order to identify typical tasks that we perform with city maps. These tasks are used to derive the requirements for our system and to describe later which tasks it supports. Afterwards, the paper discusses current related work on available navigation systems, the presentation of geographic information for the blind and visually impaired, and recent sonification approaches. The following section, "3D Sonification of Maps", describes our solution for conveying city map information to the blind in detail, followed by an explanation of our current prototype development. The paper closes with the results of our evaluation and a conclusion.

REQUIREMENTS FOR AUDITORY MAP EXPLORATION

Maps can be used in various ways and with different objectives. In order to support blind and visually impaired people in accomplishing these tasks, it is important to understand the usage and properties of city maps. In the following subsections, the tasks and properties are analyzed; the results lead to specific requirements that our final system has to fulfil.

Using city maps

One major task people pursue when using city maps is to explore an unknown area and get an idea of how it is organized. This task is usually conducted by a fast exploration of the map, because it is accomplished by getting an abstract overview of the location, size, and shape of the most important geographic objects. For sighted people, such an overview can be obtained at a first glance at the map; the longer the user investigates the map, the more detailed the resulting overview, constrained by the user's mental capacities. A similar task that maps support is getting an overview of the surroundings of a predefined location: the user would like to know which objects are close to this location and in which direction they can be found. Another task is the familiarization with the placement of specific facilities and destinations the user is interested in. The user wants to know where important places like railway stations, central parks, bus stops, or shopping facilities are located in the city. Maps also often support the planning of a trip; the map is then used to find and prepare a route that leads efficiently and safely from one location to another. We can also utilize maps for measuring distances, either relative to other objects or absolute. Finally, a map can help to orient oneself within an area. In this case the map is often used while on the way, and a typical question the user would like to answer is: Where am I now, and in which direction am I oriented?

In order to fulfil the above-mentioned tasks, typical maps provide us with certain geographic features. Because of the high number of feature types, there are many types of maps, e.g., with a focus on sights, statistics, or elevation information. In this paper we concentrate on city maps, which usually provide the following geographic features: districts and quarters (organizational structures), streets, public transport stations, public buildings, squares, gardens, waters, monuments, hotels, and shops.

Requirements for an acoustic exploration of city maps

Our approach aims to give blind and visually impaired people access to city maps and the ability to build a cognitive model of these maps. In the following, more specific requirements that a system should fulfil to support this process are presented.

• It should be easy to get an overview of the area. That means the user should be able to understand the area which the map presents.

• The users should easily perceive the most relevant details of the maps. With the purpose of getting an overview of the city in mind, we identified the following geographic elements of a city map as most relevant: parks and gardens, lakes, squares and places, public buildings, quarters, sights, and points of interest.

• To let blind and visually impaired people take advantage of the auditory presentation, the application should make it possible for the users to understand the position of specific facilities and destinations.

• It is desirable to provide a representation which allows the user to apprehend the size of specific elements and, even better, to understand the shapes of elements.

• It should be possible to get an understanding of the distances between two objects on the map and, in a next step, of the geographic relations between objects, e.g., one object is left, right, above, or below another, or within its boundaries.

RELATED WORK

Adams described three phases of navigation [2]: preparation, gross navigation, and fine navigation. The first phase includes getting an overview of the destination area and creating a route to get there.


Blind people usually perform this preparation at home in a safe environment. The gross navigation is performed on the way; the main task of this phase is to get from one waypoint of the planned route to the next. The fine navigation process contains tasks like obstacle detection, perceiving the material of the floor, etc. With the advent of speech-based car navigation systems, we find adaptations of these for mobile users, some of them also for blind users, that use speech, sound, and/or vibration or even Braille to support the gross navigation of the mobile person (e.g., [7], [14], [16], [8], [6], [17], [5]). However, these systems are employed when the user is already on his or her tour. With our approach we aim at map presentation and understanding in advance of a walking tour, supporting the preparation phase of navigation.

With regard to map visualization, different approaches for providing blind and visually impaired people with geographic information have been developed in the past. The most common projects focus on tactile exploration of maps. They use raised lines and shapes to present geographic entities on swell paper, or the maps are created by Braille printers (embossed maps). The major disadvantage of these maps is that they are expensive to produce and not very flexible when it comes to displaying certain information on demand. In order to print tactile maps, extra work has to be done so that they meet the requirements of the printing device. Also, the maps have a low resolution so that the user can explore them with her fingers. A multimodal user interface for displaying maps was developed with TACIS (Tactile Acoustic Computer Interaction System) [4]. A raised-dot graphics output is placed on a touchpad. The user can touch it lightly with a single finger to instantly hear a musical tone that indicates the geographic entity; on demand, speech output provides more information about the object. Although the system allows some simple interaction with the map, the above-mentioned disadvantages of tactile exploration remain, and expensive special devices are needed to explore the map. Employing the haptic modality, special force feedback devices have been proposed to convey shapes, streets, and other geographic objects, such as the Grab system [13]. Another project uses the PHANToM device by SensAble Technologies Inc. to display geographic data [15]. Although these devices are in general suitable to provide the necessary information, they are not widely used because of their high costs. Another disadvantage is that haptic exploration cannot present several objects at the same time.

More recently, we find solutions that provide blind and visually impaired users with geographic data through more sophisticated auditory feedback. One of the later developments has been published in [20, 21]. While the user explores a map with a standard keyboard, the system conveys statistical data for geographic areas through artificial sounds; the user can also zoom into a map to get more detailed information. This approach is suited to obtaining, for example, statistical data like the number of inhabitants of a specific area or demographic values. However, it supports neither the search for particular geographic objects on the map nor getting an idea of distances and directions between objects.

The existing solutions and approaches for the non-visual presentation of maps definitely have their benefits for typical tasks with digital maps. However, they suffer from the inability to present more than one geographic object at the same time: objects have to be selected before the user gets information about them. This leads to two problems. First, to explore his near environment, the user has to select all objects in his surroundings, for example by moving his finger over the map. If the user does not select or hit an object, his mental model of the environment will be inaccurate. Second, if only one object can be presented and perceived at a time, it is hard to perceive geographic relations between objects, for example whether an object A is near an object B or located within object B. These relations, however, are essential for building a mental model and getting an overview of a region.

To address these challenges, 3D sound can be used, which combines a concurrent auditory presentation of information objects with their spatial layout. A recent article by McGookin and Brewster [12] shows the advantages as well as the issues of using concurrent audio presentation as part of an auditory display. Several approaches have shown that 3D sound is suitable for conveying spatial information. In [19] a system is proposed which transforms 2D images into auditory images; different sounds corresponding to the brightness level are emitted, and first results show that sound localization can be used to convey visual information for simple shapes and low-resolution patterns. The use of 3D sound to aid navigation in immersive virtual environments has been investigated in [11]; the results show that sound cues can be used for navigation in 3D environments. A recent article by Walker and Lindsay [18] examines non-speech beacon sounds for navigation and path finding; they conclude that a non-speech auditory interface can definitely be used for successful navigation.

3D SONIFICATION OF MAPS

The objective of our work is to provide blind users with an overview of a city. As a basis we use a 2D city map and identify the most important objects and areas on the map, which are described in the requirements section. For these objects we apply the concept of sonification. Sonification is described in [10] as follows:

"Sonification is defined as the use of nonspeech audio to convey information. More specifically, sonification is the transformation of data relations into perceived relations in an acoustic signal for the purposes of facilitating communication or interpretation."

For our application, we transform the geographic objects (e.g., lakes, parks) within the city along with their characteristics into an acoustic signal. These characteristics are in particular the object type, the object's location, and its shape.


Different object types are represented by different sounds. In order to convey information about the location of objects and their shape, we combine the concept of sonification with a 3D virtual sound room. Using 3D sound, the sound sources – each representing a specific geographic object – can be placed within a virtual room at positions equivalent to the objects' spatial positions on the map. The auditory representation of an object can be localized by the user within the sound room, and he or she gets an idea of its position on the map. In order to convey information about the shape and size of an object, we use not a point sound source but two-dimensional sound areas. By exploring the map, the user is able to hear the edges of the objects and can then construct the objects' silhouettes. To improve the perception of object sizes, which also affects the distances between the user and the objects, the distance to a two-dimensional geographic object is always measured from the nearest point of the object to the user, as illustrated in Fig. 1.

Figure 1. Distances from the user to the nearest point of objects

Although the sound objects currently need only two dimensions (they are all placed at the same height in the room and the objects do not have a volume), we use a three-dimensional sound room for several reasons: most newer sound card producers incorporate the sound room metaphor within their drivers, so we can use common application interfaces such as EAX. These interfaces often use the extent of a room, e.g., to calculate reflections, in order to generate a more realistic output sound. Furthermore, we can use the third dimension in future research to display further information about objects, such as size, elevation, and orientation.

By combining sonification with the placement of the auditory objects within a virtual sound room, we are able to convey important information about the geographic layout of the city, and the user is able to perceive a basic overview of the map. Another benefit of an auditory 3D display is that we can present information about more than one object at the same time. The sounds in the near environment of the user are played continuously, so the user can listen to several objects concurrently without specifically selecting them. This has two advantages. First, it is easier for the user to find out spatial relations between objects, which is an important task for getting an overview. Second, the user can listen to objects from a distant location without having to touch them first, and is therefore able to find objects more easily. To clarify this benefit, consider an example: suppose the user is looking for a lake in his near environment. Instead of scanning the environment by swiping over and selecting one object after another in order to identify the object type, with our approach the user can listen to his surroundings and identify the nearest lake without any further interaction (provided there is one).

Building a Mental Model of the City Using Sonification and Exploration

As described above, the concept of sonification is used for the presentation of the elements. Each city map element is associated with and presented by a particular non-speech looped sound. Objects of the same element type that are close to each other can have different nuances of the sound, so that the user is able to distinguish them. There are two possible paradigms for associating an element with a sound: using natural sounds, like the dabbling of water for lakes or singing birds for parks, or using instrumental sounds, such as a particular melody for waters and a different melody for parks. Both paradigms have their advantages and disadvantages. Learning the association of natural sounds with natural entities is very easy; the user can start immediately with the exploration of the map. However, some entities of a city map are not natural but rather organizational, like quarters, public buildings, or sights, and for those it will be hard to find appropriate natural sounds. On the other hand, we could use instrumental sounds, which have to be learned before the exploration of the map can start. Different melodies are associated with different entities of the city map, and there will be no problem finding sounds for all entities. Furthermore, we can easily present even more information with instrumental sounds; for example, we could associate the type of instrument with the size of an object (large lakes are played by a tuba, small lakes by a flute). Future research has to show which paradigm (natural or instrumental sounds) is more appropriate, or whether it would even be useful to mix both paradigms at the same time.

By placing a virtual listener within the virtual sound room, the directions and distances of the various sound sources with regard to the listener's position and orientation can be perceived by the user. Distant objects sound quieter than nearby objects; objects to the left sound on the left-hand side, whereas objects to the right sound on the right-hand side. The user can move the listener through the virtual sound room (indicated by arrows in Fig. 2). The input device to control the listener's movement can be, for example, a mouse or a digitizer tablet. The problem with the mouse, as with every other relative input device, is that such devices are hard for blind people to handle, because the position of the device in the real world gives no reference point for determining the position on the virtual map. Thus, it is recommended to use absolute devices like a digitizer tablet: the pen's position on such a tablet is equivalent to the listener's position on the screen, so the user always knows his absolute position on the auditory map.
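A minimal sketch of the nearest-point distance of Fig. 1 and of the distance-dependent volume described above may make the mechanism concrete. The following code is not taken from our framework; the structure names and the linear falloff are illustrative assumptions only.

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <vector>

    struct Point { double x, y; };

    // Closest point to p on the segment a-b.
    static Point closestOnSegment(Point a, Point b, Point p) {
        double dx = b.x - a.x, dy = b.y - a.y;
        double len2 = dx * dx + dy * dy;
        double t = (len2 == 0.0) ? 0.0
            : std::max(0.0, std::min(1.0, ((p.x - a.x) * dx + (p.y - a.y) * dy) / len2));
        return { a.x + t * dx, a.y + t * dy };
    }

    // Distance from the listener to the nearest point on the polygon's border
    // (the measure illustrated in Fig. 1). Point-in-polygon handling for a
    // listener standing on the object is omitted for brevity.
    static double distanceToPolygon(const std::vector<Point>& poly, Point listener) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < poly.size(); ++i) {
            Point c = closestOnSegment(poly[i], poly[(i + 1) % poly.size()], listener);
            best = std::min(best, std::hypot(c.x - listener.x, c.y - listener.y));
        }
        return best;
    }

    // Map the distance to a playback volume: full volume on the object itself,
    // silent beyond the audible range. A simple linear falloff is assumed here.
    static double volumeForDistance(double distance, double audibleRange) {
        if (distance >= audibleRange) return 0.0;
        return 1.0 - distance / audibleRange;
    }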


Figure 2. Moving the virtual listener over the map

The volume and direction of the objects' sounds change depending on the listener's movement. If she comes closer to an object, the sound volume increases, and vice versa; if she moves around an object, the direction of the sound revolves around the listener's head. Thus, she is focusing on the area of the map around the listener's position, because the objects are displayed with different volume and level of detail according to their distance from the listener.

So the user can explore the city map step by step by moving the virtual listener over the map and investigating the displayed objects and their spatial relations to each other. In this manner the user can build and complete the mental model of the city map by taking advantage of the concept of interactive 3D sonification.

Avoiding Information Overload

Humans are restricted in identifying concurrently played sound objects. Brazil and Fernström investigated the perception of everyday sound scenes with overlapping auditory icons [3]. They report that participants correctly identified three overlapping auditory icons in 84.8% of the cases, while six auditory icons were recognized in 50% of the cases; on average, sound scenes with 3, 4, 5, or 6 concurrent overlapping auditory icons were identified correctly in 74% of the cases. Their experiment also shows that the fewer auditory icons are presented at the same time, the better the identification. For non-overlapping presentation, the identification quality increases to an average of 89.92% for 3, 4, 5, or 6 concurrent auditory icons. A typical city contains many more than six of the above-mentioned entities, and these may also overlap. One challenge of this work is therefore to reduce the number of simultaneously played sounds, so that the user is able to locate, memorize, and interpret them. There are many possibilities to reduce the number of sound sources; some of them are known from computer graphics and are also implemented in GIS (Geographic Information System) viewers:

• Filtering objects: The objects which should be presented are selected, and only these are presented. The filtering can be based on type, size, location, and/or distance of objects.

• Clustering objects: Objects of the same type can be combined if they are very close to each other. The term "close" depends on parameters like the scale or the zoom factor of the map.

• Level of detail and fisheye ("fishear") view: Objects near the user are presented in high resolution; others are presented in less detail.

• Prioritization of objects via volume modification: Some objects or object types are more important than others. These are presented with a higher volume, while others are displayed with a lower volume.

• Prioritization of objects using distance information: Another prioritization mechanism is based on distance. Important objects or types might be presented even if the user is far away; in contrast, less relevant objects are presented only if the user is very close.

• Prioritization based on object size: Small objects are presented later, while large objects are presented earlier, or vice versa, depending on the preferences of the user.

• Zooming and panning: By zooming into the map, the user explores only a part of the map, but this part can then be presented at a higher level of detail. To explore other sections, the user can pan the map.

To realize these reduction methods, we introduced, as a first step of the development, the concept of sound radiation. Each object possesses its own radiation, which can be configured either by the user or automatically. The radiation specifies the distance to the object that the listener has to reach in order to hear the sound source associated with the object; see Figure 3 for an example of the sound radiation of one particular object. An automatic reduction of radiation levels could be performed if the user reaches a location where the sound objects audible at this point exceed a predefined threshold regarding, for example, their number or overall volume.

Figure 3. Sound radiation of an object
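A minimal sketch of how such a reduction could be combined in code: objects are first gated by their radiation radius and then capped at a small number of concurrent sounds, preferring important and nearby objects. The structure names, the priority field, and the cap of six (motivated by the recognition rates cited above) are illustrative assumptions, not our implementation.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    struct AudibleObject {
        int id;
        double distance;         // distance from the listener, measured as in Fig. 1
        double radiationRadius;  // object is silent beyond this distance
        double priority;         // higher = more important (e.g., weight per object type)
    };

    // Select which sounds to play: filter by radiation radius, then keep at most
    // maxConcurrent objects, preferring important ones and, among equals, closer ones.
    std::vector<AudibleObject> selectAudible(std::vector<AudibleObject> objects,
                                             std::size_t maxConcurrent = 6) {
        std::vector<AudibleObject> audible;
        for (const auto& o : objects)
            if (o.distance < o.radiationRadius) audible.push_back(o);

        std::sort(audible.begin(), audible.end(),
                  [](const AudibleObject& a, const AudibleObject& b) {
                      if (a.priority != b.priority) return a.priority > b.priority;
                      return a.distance < b.distance;
                  });

        if (audible.size() > maxConcurrent) audible.resize(maxConcurrent);
        return audible;
    }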

REALIZATION AND PROTOTYPE

In this section, our current prototype for the sonification and interactive exploration of city maps is presented. First, we discuss different methods to obtain from common city maps the semantic information which our prototype requires in order to display the map in an auditory way.


This subsection also includes a description of how the semantic information is stored. Afterwards, our framework for the auditory display is described; it provides an application interface for building three-dimensional auditory worlds and common interaction methods. The framework is substantiated by our auditory map prototype, which is described last in this section.

Annotation of Maps

To be able to present geographic entities like parks, buildings, lakes, and squares with different sounds, we need a map with semantic information about the shape, position, orientation, and type of all geographic elements of the map. In the following, we also call this semantic information an annotation of a map. There are three different possibilities to obtain the annotations, depending on the map type (vector maps or bitmap maps) and the available image recognition strategies:

• Vector maps: If a vector map is available, semantic information is usually stored within the map. This is the best case: it is only necessary to analyse the vector map and to create the map annotations directly from the information in the vector map. This can be done automatically, but manual improvements are possible. An example of this type of map is the ArcView shapefile; ArcView is a Geographic Information System (GIS) and the shapefile is a data type for geographic data.

• Bitmap maps and the use of image analysis methods: If only a bitmap map exists, image analysis methods can be used to add the semantic information based on colours. Maps have unique colour keys, e.g., lakes are blue and parks green, so it is possible to recognize the different geographic elements with appropriate analysis algorithms. This approach is presented in [9]. There are still many difficulties in obtaining all the necessary semantic information; manual rework is usually necessary with this approach.

• Bitmap maps and manual annotation: In the worst case there is only a bitmap map and no image analysis methods can be used. Hence, a manual annotation has to be performed, which can be done with an application for specifying the geographic entities on the bitmap.

In each of the three cases described above, a map in a certain format forms the starting point of the annotation process (see Fig. 4). The relevant geographic entities, their coordinates and type, are extracted using different approaches depending on the map format and the requirements. Finally, the extracted data and the map form an annotated city map, which can be used in applications.

Figure 4. Example of an annotation process for city maps

In our approach, we use the Geography Markup Language (GML) to store the annotations for a map. GML is an initiative by the Open Geospatial Consortium (OGC) to provide an open and standardized format for describing geographic features, based on XML Schema. In its current version 3.1, GML enables the specification of two- and three-dimensional geographic objects. For example, GML provides a general "Polygon" element; in a real-world scenario, however, an application developer would want to specify what that polygon represents (i.e., lake, forest, building, etc.). We therefore specified our own application schema [9], which is based on GML. The following listing is an example of the GML description of a polygon for the geographic element "Building":

619 209 643 125 706 84 716 99 677 228 619 209

The polygon of the element is given by a number of coordinates, which describe the position of the polygon's vertices as a linear ring. In this case the coordinates are pixels on the bitmap map, but GML also provides support for various coordinate systems, for example Gauss-Krüger coordinates, a coordinate system for measuring points on the Earth's surface. The annotations for the city maps are stored in GML files that follow our application schema. Our prototype reads these files and represents the annotated maps acoustically using our framework for auditory display.
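Only the coordinate list of the listing above survives in this text. A plausible form of the full listing, assuming standard GML 3.1 geometry elements around an application-schema element named Building (the exact element and namespace names used in [9] may differ), is:

    <Building>
      <!-- surrounding element names are assumed; coordinates as given above -->
      <gml:Polygon>
        <gml:exterior>
          <gml:LinearRing>
            <gml:posList>619 209 643 125 706 84 716 99 677 228 619 209</gml:posList>
          </gml:LinearRing>
        </gml:exterior>
      </gml:Polygon>
    </Building>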

Framework for Auditory Display

The development of auditory displays involves recurring tasks. We have therefore developed a framework for the creation of auditory displays. It is written in C++ and provides an application interface for creating three-dimensional auditory worlds.


Figure 5. Architecture of the framework for auditory display

To abstract from diverse sound technologies such as EAX, DirectSound, and A3D, as well as from operating systems and devices, the framework is based on the FMOD audio engine [1]. The framework provides a 3D virtual sound room – in the following also called the sound world – which is the container for every element necessary for the auditory display (see Fig. 5). This world object has functions to manage the general audio display and to control the contained objects. An important element is the virtual listener, which represents the user's position and orientation in the sound world. The listener can be moved arbitrarily within the sound world and provides methods for manual and automatic movement and rotation. Like the virtual listener, every sound object can be positioned and oriented within the sound world. At the moment, sound objects can be either one-dimensional or two-dimensional; two-dimensional objects are defined as polygons. The illusion of a two-dimensional area sound is created by moving a one-dimensional sound source to the point on the polygon's border that is closest to the listener. If the listener moves, the sound source's position is updated accordingly to keep the shortest distance to the listener. If the listener is located on an object, the sound object is played at maximum volume; therefore the object sounds the same no matter where the listener is located on the polygon. Every sound object can adapt its presentation through a number of parameters: for example, the volume, the radiation radius within which the object can be perceived by the listener, and the attenuation rate can be configured. It is also possible to loop the sound of an object. Only objects which are active and whose distance to the listener is smaller than their radiation radius are played. The framework also provides functionalities to group sound objects, which is useful for filtering objects and for changing the properties of a set of objects at once. In addition, the framework provides methods and classes with mathematical algorithms, coordinate system converters for the visual and audio presentations of the world, and data structures which are useful for the audio display and the management of the sound world and its objects.
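A compact sketch of the data model just described may help to summarize it. All type and member names below are illustrative only and do not reproduce the framework's actual interface; they merely mirror the elements named in the text: a sound world as container, a movable virtual listener, polygon-based sound objects with volume, radiation radius, attenuation and looping, and groups for bulk property changes.

    #include <string>
    #include <vector>

    struct Position { float x = 0, y = 0, z = 0; };

    // One sound source; two-dimensional area objects carry a polygon outline.
    struct SoundObject {
        std::string soundFile;
        std::vector<Position> outline;   // empty for a point (one-dimensional) source
        float volume = 1.0f;
        float radiationRadius = 100.0f;  // audible only within this distance
        float attenuationRate = 1.0f;    // how quickly the volume falls off
        bool looped = true;
        bool active = true;
        std::string group;               // e.g. "parks", for bulk property changes
    };

    // The user's position and orientation in the sound world.
    struct VirtualListener {
        Position position;
        float orientationDeg = 0.0f;
    };

    // Container for everything the auditory display needs.
    struct SoundWorld {
        VirtualListener listener;
        std::vector<SoundObject> objects;

        void setGroupVolume(const std::string& group, float v) {
            for (auto& o : objects)
                if (o.group == group) o.volume = v;
        }
    };

    int main() {
        SoundWorld world;
        SoundObject park;
        park.soundFile = "birds.wav";    // hypothetical natural sound for parks
        park.outline = { {10, 10, 0}, {40, 10, 0}, {40, 30, 0}, {10, 30, 0} };
        park.group = "parks";
        world.objects.push_back(park);

        world.listener.position = { 25, 50, 0 };  // e.g. driven by the tablet pen
        world.setGroupVolume("parks", 0.8f);      // bulk change for a whole group
    }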


Figure 6. The ActiveX version of the prototype running in a browser

The Map Prototype

The map prototype is developed in the programming language C++. It uses the framework for auditory displays to present an audio representation of the virtual 3D sound world to the user; the visual presentation of the city map is realized with DirectX.

When the application is started, an image of the city map and an associated GML data file with the annotations for the map are loaded. Both can be located on the local machine or accessed via the Internet. Afterwards, the sound world object is created and the virtual listener is positioned. To create the auditory representation of the map, the prototype parses the GML file and creates a sound object for every geographic feature. The objects' positions and extents in the sound world are based on the annotations in the GML file. The sound objects are linked to appropriate sounds, depending on the type of the geographic elements (e.g., park or sight). Additionally, the prototype shows every geographic element graphically as a coloured polygon; the colour of a polygon represents the type of the corresponding element (e.g., green for parks and blue for lakes).

After this initialization phase, the audio playback starts. Each object within a certain distance of the virtual listener plays its looped non-speech sound. The user can move the listener with the digitizer tablet, and as the listener moves over the map the auditory feedback changes in real time. The closer the listener comes to a geographic element, the higher the volume of its associated sound. With stereo headphones or stereo speakers it is possible to determine the direction from which the sound comes, so the user can locate the sound source on the map. In our current prototype we use natural sounds for the auditory presentation. By moving the virtual listener over the map, the user can build a mental model of the city map based on the concept of 3D sonification. Besides the standalone version of the map prototype there is also an ActiveX version, which runs in a Web browser (see Fig. 6).

EVALUATION

To test whether our approach is useful for building a helpful mental model of a city, we conducted a preliminary evaluation. The participants of this early evaluation were four sighted persons between 26 and 29 years of age, one woman and three men. All were experienced computer users but had no special experience with three-dimensional virtual sound rooms or sonification.

Figure 7. Evaluating the prototype

The evaluation setting, illustrated in Figure 7, consists of a notebook with the display deactivated and an external monitor for the evaluator to follow the test users' movement over the map. A digitizer tablet by Wacom (21 cm x 15 cm) was used as the input device to control the virtual listener; in order to improve the perception of the tablet's area, the tablet was equipped with a 3 mm high border. For the sound output, an external USB 3D sound card by Creative with EAX support (Soundblaster Audigy 2 NX) and standard wireless stereo headphones were provided. The participants got a brief explanation of the prototype, how to interact with the application, and how to interpret the auditory signals, but they had never used the prototype before, nor had they seen the map that was used in the evaluation. After this introduction, the participants explored the map. For the evaluation of the basic concept, parks and lakes were considered and represented on the auditory map; the test setting contained two lakes and six parks. There were no time restrictions; on average, the exploration took between five and ten minutes per participant. During the exploration phase the participants were asked to draw their cognitive model of the map. For this, a foil was put on the same tablet on which they explored the map, and the participants drew the perceived geographic entities and their extent on this foil.


Figure 8. Diagram: objects found and differentiated

The results of this preliminary evaluation are very promising. Three of the four test persons found all objects on the map. The white bars in the diagram in Figure 8 represent the percentage of geographic objects on the map whose location was reproduced correctly by the participants. Only one test person did not find a small park, which was somewhat isolated from the other objects; all other locations were identified correctly. The black bars show the percentage of objects which were found and recognized as single objects. Three of the participants perceived all objects correctly as single objects. Person A recognized four of the parks on the map as only two parks; the reason may be that at two locations on the map there are two parks very close to each other, separated only by a street.

The observation during the evaluation showed that the test users were attracted by objects once they came into the radius in which an object was displayed aurally. The participants moved in the direction from which the sound came and found the objects in this manner, without scanning the area randomly and searching for objects.

Figure 9. Real city map and mental model in the evaluation

Figure 9 shows an example of how the participants perceived the map's layout. The upper half of Figure 9 shows the city map which was used in this evaluation, together with the aurally displayed objects. The lower half shows the sketch of one of the test persons, drawn during the evaluation to capture the participant's mental model of the city. In this sketch all objects are found and differentiated correctly, and the spatial relations between the objects are proper. The objects are also assigned to the correct type of entity (W in the sketch marks water/lakes and P stands for parks). The parks on the right side are identified accurately as single objects, although they are close to each other on the real map, and the lake within the park in the lower left part of the map was correctly perceived as "inside of the park". Thus, the building of a mental model, especially with respect to the spatial relations of geographic entities, worked well in this evaluation.

On the other hand, it was hard for the test persons to determine the extent and especially the exact borders of objects. In most cases the perceived extent was smaller than it was on the city map, and the shape of the drawn objects often differs clearly from the real shapes. So improvements are needed in these aspects, but the main goal of getting an overview of the city and the spatial relations of the objects on the map is achieved.

Overall, the evaluation showed that building a mental model of a city map using the concept of sonification and 3D virtual sound rooms works convincingly well. We also discovered necessary improvements, especially in the display and perception of the shape and extent of objects. In the near future, we plan to conduct a larger evaluation with a higher number of participants with different levels of experience and ability and from different groups, to get further and more detailed results. The visual ability of these test persons will range widely, from congenitally blind over adventitiously blind or visually impaired to sighted.

CONCLUSIONS

In this paper, we present a system that allows blind and visually impaired users to explore geographic information provided on city maps. The overall goal – building a cognitive model of the city – is supported by using 3D sound areas, each of them representing a specific geographic object on the map.


By exploring this virtual sound room, the user obtains an idea of the relations between objects, such as distances, directions, and object sizes. With the results from our first tests we can say that this approach can contribute much to the mobility and safety of blind and visually impaired users.

Getting an overview of the destination area is only one task in the whole navigation process. Our concept does not support all navigation tasks, so other aids, e.g., aids that support persons while they are on their way, like [7], are still necessary. Next to improvements of our current prototype, one issue for our future research will be to combine the diverse systems which support certain tasks of the navigation process into a homogeneous non-visual navigation support system.

Acknowledgments

This paper is supported by the European Community's Sixth Framework Programme (FP6-2003-IST-2-004778).

REFERENCES

1. FMOD audio engine. http://www.fmod.org [Online; accessed 01-September-2006].

2. C. Adams. An investigation of navigation processes in human locomotion behavior. Master's thesis, Virginia Polytechnic Institute and State University, 1997.

3. E. Brazil and M. Fernström. Investigating concurrent auditory icon recognition. In Proceedings of the 12th International Conference on Auditory Display, pages 51–58, 2006.

4. B. Gallagher and W. Frasch. Tactile acoustic computer interaction system (TACIS): A new type of graphic access for the blind. In Proceedings of the 3rd TIDE Congress, 1998.

5. Sendero Group. Sendero Group – inventor of the bluenote GPS, 2006. http://www.senderogroup.com [Online; accessed 01-September-2006].

6. G. Harrasser. Design eines für Blinde und Sehbehinderte geeigneten Navigationssystem mit taktiler Ausgabe. Diploma thesis, Technische Universität München, Fakultät für Informatik, 2003. In German.

7. N. Henze, W. Heuten, and S. Boll. Non-intrusive somatosensory navigation support for blind pedestrians. In EuroHaptics 2006, pages 459–464, 2006.

8. S. Holland and D. R. Morse. Audio GPS: Spatial audio in a minimal attention interface. In 3rd International Workshop on HCI with Mobile Devices, 2001.

9. M. Horstmann, W. Heuten, A. Miene, and S. Boll. Automatic annotation of geographic maps. In Computers Helping People with Special Needs, volume 4061/2006, pages 69–76. 10th International Conference, ICCHP 2006, Linz, Austria, July 11–13, 2006, Proceedings. Springer, 2006.

10. G. Kramer, B. Walker, T. Bonebright, P. Cook, J. Flowers, and N. Miner. The sonification report: Status of the field and research agenda. Report prepared for the National Science Foundation by members of the International Community for Auditory Display, 1999.

11. T. Lokki and M. Gröhn. Navigation with auditory cues in a virtual environment. IEEE MultiMedia, 12(2):80–86, 2005.

12. D. K. McGookin and S. A. Brewster. Advantages and issues with concurrent audio presentation as part of an auditory display. In Proceedings of the 12th International Conference on Auditory Display, pages 44–50, 2006.

13. The Grab Project Members. Grab, 2003. http://www.grab-eu.com/index.htm [Online; accessed 01-September-2006].

14. H. Petrie, V. Johnson, T. Strothotte, A. Raab, S. Fritz, and R. Michel. MoBIC: Designing a travel aid for blind and elderly people. Journal of Navigation, Royal Institute of Navigation, 49(1):45–52, 1996.

15. K. Rassmus-Gröhn and C. Sjöström. Using a force feedback device to present graphical information to people with visual disabilities. In Second Swedish Symposium on Multimodal Communication, Lund, Sweden, 1998.

16. D. A. Ross and B. B. Blasch. Wearable interfaces for orientation and wayfinding. In ASSETS '00, pages 193–200, 2000.

17. K. Tsukada and M. Yasumura. ActiveBelt: Belt-type wearable tactile display for directional navigation. Lecture Notes in Computer Science, 3205:384–399, 2004.

18. B. N. Walker and J. Lindsay. Navigation performance with a virtual auditory display: Effects of beacon sound, capture radius, and practice. Human Factors, 48(2):265–278, 2006.

19. Z. Wang and J. Ben-Arie. Conveying visual information with spatial auditory patterns. IEEE Transactions on Speech and Audio Processing, 4(6):446–455, 1996.

20. H. Zhao. Interactive sonification of geo-referenced data. In CHI '05 Extended Abstracts on Human Factors in Computing Systems, pages 1134–1135, New York, NY, USA, 2005. ACM Press.

21. H. Zhao, B. K. Smith, K. Norman, C. Plaisant, and B. Shneiderman. Interactive sonification of choropleth maps. IEEE MultiMedia, 12(2):26–35, 2005.
