Mixed Fantasy: An Integrated System for Delivering MR Experiences

Charles E. Hughes, Computer Science Dept.
Christopher B. Stapleton, Inst. for Simulation & Training and Digital Media Program
J. Michael Moshell, Digital Media Program and Computer Science Dept.
Paulius Micikevicius, Computer Science Dept.
Darin E. Hughes, Computer Science Dept.
Peter Stepniewicz, Digital Media Program

University of Central Florida, Orlando, FL 32816
{[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]}

Abstract

This paper describes the underlying science and technologies employed in a system we have developed for creating and delivering multimodal Mixed Realities. We also present several aspects of a number of experiences we have developed and delivered, where those aspects help explain algorithmic requirements and design decisions. The technical contributions include a unique application of retro-reflective material that results in nearly perfect chroma-key, even in the presence of changing illumination, and a component-based approach to scenario delivery. The user interface contribution is the integration of 3-D audio, hypersonic sound and special effects with the visuals typical of an MR experience.

1. Introduction

Mixed Reality, the landscape between the real and the virtual, is an area of research with challenges in science, technology, systems integration and the application of artistic convention. The latter area is the topic of a companion paper [4]. Here we focus on our scientific, technological and system integration contributions. The scientific results presented here include object selection techniques based on a laser beam emitted from a user-controlled device and a unique application of retro-reflective material to object registration. The technologies include 3D and hypersonic sound, and the show control software systems and interfaces we have developed. The integration contribution is a component-based approach to scenario delivery, the centerpiece of which is a scenario engine that integrates the story components provided by compelling 3-D audio, engaging special effects and the realistic visuals required of a multimodal MR entertainment experience [5].

Section 2 presents the interesting aspects of our MR Graphics Engine, focusing on our contributions to the areas of object selection and registration. Section 3 describes the audio systems we have developed, looking at our 3D audio design and delivery system and our experiments with hypersonic sound. Section 4 describes the technologies of show control devices (called macrostimulators here) and the creative uses we make of them in MR experiences. Section 5 describes our scenario authoring and delivery system. Section 6 ends the paper with a brief discussion of directions that our current research efforts are taking.

2. Rendering and Registration

While most of our work transcends the specific technology used to capture the real scene and deliver the mixed reality, some of our work in rendering and registration is highly dependent upon our use of a video see-through head-mounted display (HMD). The specific display/capture device used in our work is the Canon Coastar video see-through HMD [6]. In our MR Graphics Engine, the images displayed in the HMD are rendered in three stages. Registration is performed during the first and the third stages and includes properly sorting the virtual and real objects in the images displayed to the user, as well as detecting when a user selects a real or virtual object.

2.1. Rendering

Rendering is done in three stages.


In the first stage, images are captured by the HMD cameras, scaled, and processed to correct for lens distortion. Since the cameras provide only RGB components for each pixel, an alpha channel is added by chroma-keying, as described in Section 2.3. Alpha blending and multi-pass rendering are used to properly composite the virtual objects with the captured images. For example, if a blue screen indicates an opening in the wall, virtual characters are rendered "realistically" when they are behind the portal, and an "infrared vision" effect may be used whenever the characters are behind the wall. Detection of a laser dot, described in Section 2.2, is also performed during the first stage. In the second stage, the virtual objects are rendered and combined with the output of the first stage. Depending on the application, multiple passes and alpha blending are utilized as indicated above. In the third stage, the output of the second stage is used to determine whether the laser beam intersects any virtual or real objects. More details on the process are given in the following subsections.

2.2. Object selection

A number of Mixed Reality scenarios require that users have the ability to select virtual objects. A variation of the selection process is registering a hit of an object (real or virtual) by a virtual projectile fired from a real weapon. A number of existing systems place additional tracking devices on the weapons for this purpose. However, such an approach is susceptible to tracking errors (for example, magnetic trackers are generally not accurate enough to consistently hit small objects) and is limited by the maximum number of tracking elements in the system. As an alternative, we utilize a laser emitter combined with laser-dot detection in the images captured by the HMD. Laser detection is performed in real time by our MR software system. In fact, for efficiency reasons, it is part of the chroma-keying process described in the following subsection. Once a laser dot (a predetermined number of pixels in the specified color range) is detected, its 2-D position in the captured image is recorded. The rendering engine then overwrites parts of the captured image with the virtual objects. After rendering, the recorded laser-dot position is inspected for a change in color values. A color change occurs only if the pixel was overwritten during rendering, which indicates object selection. In such a case, the global 3-D position of the laser-object intersection can be computed by unprojecting the vector (xs, ys, zs, 1), where xs and ys are the laser-dot coordinates in 2-D screen space, and zs is the value written to the depth buffer during rendering. Unprojection is a standard functionality in the OpenGL and Direct3D APIs.

Once global coordinates for the intersection are computed, the list of virtual objects is traversed to determine which object has been hit. Note that this approach automatically selects the first object hit by the laser, while tracker-based techniques require an additional step to sort the objects. However, the laser-based approach requires the user to be looking at the intersected object, while tracker-based methods do not.

We extended the above method to also detect intersections between the laser beam and real objects. For example, in several scenarios we developed, a macrostimulator is invoked to flip a real object when it is hit by the laser. To achieve this functionality, a transparent polygon is rendered at the same global 3-D position as the real object. Thus, while the RGB values of the image are not affected, the corresponding positions in the depth buffer are modified. Since the depth of all pixels is set to infinity before every rendering step, a laser-object intersection is detected by a change in the depth value. Global coordinates for the intersection are then computed by unprojection and used to identify the real-world object to be affected by the macrostimulator system.
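A minimal sketch of the unprojection step (the matrix form equivalent to gluUnProject), assuming the modelview and projection matrices and the viewport are available; the function and variable names are ours, not identifiers from the MR Graphics Engine.

```python
import numpy as np

def unproject(xs, ys, zs, modelview, projection, viewport):
    """Map window coordinates (xs, ys, depth zs in [0, 1]) back to world space.

    modelview, projection: 4x4 numpy arrays; viewport: (x, y, width, height).
    """
    x0, y0, w, h = viewport
    # Window coordinates -> normalized device coordinates in [-1, 1].
    ndc = np.array([2.0 * (xs - x0) / w - 1.0,
                    2.0 * (ys - y0) / h - 1.0,
                    2.0 * zs - 1.0,
                    1.0])
    world = np.linalg.inv(projection @ modelview) @ ndc
    return world[:3] / world[3]      # perspective divide
```

The returned world-space point is then tested against the virtual objects (or the transparent stand-ins for real objects) to decide what was hit.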

2.3. Retro-reflection and chroma-key

In some cases, especially when movie clips are inserted into openings such as windows or doors in a real scene, we employ a chroma-keying technique to properly sort the virtual objects with respect to real-world objects. One approach is to use a monochromatic screen, which has become a standard part of television and film studios. Lighting conditions greatly affect the color values recorded by the digital cameras and thus must be tightly controlled. Furthermore, uniform lighting of the screen is necessary to avoid "hot spots." Such control is problematic for a Mixed Reality system, as it often operates in environments with dynamic lighting. As an alternative, we use a combination of a retro-reflective screen and camera-mounted LEDs. A retro-reflective screen reflects light only directly toward the source. Thus, the light of the camera-mounted LEDs is reflected to the cameras with negligible chromatic distortion. At the same time, any light sources not in close proximity to the cameras make a minimal contribution to the captured image. As a result, the same lighting conditions are guaranteed for any venue and environment, dramatically reducing the computational complexity of chroma-keying. Our experiments were carried out with five Color Kinetics LEDs placed near each camera of the head-mounted display unit, with all LEDs aimed in the same direction. This setup was effective for distances from 3 to over 15 meters.


The minimum distance is determined by the vertical and horizontal fields of view of the cameras, as well as the LED light cone, and can be decreased by uniformly distributing the LED directions across the view frustum. The maximum distance is determined by the LED intensity. The four images below were captured using the HMD and our MR software. Figure 1 shows the environment and the retro-reflective screen, while Figure 2 shows the same scene illuminated by the camera-mounted LEDs.

Figure 1. Unlit material

Figure 2. Material lit by HMD-mounted LEDs

The best and worst results after chroma-keying are displayed in Figure 3 and Figure 4, respectively. Each pixel determined to have the color emitted by the LEDs appears as red; all others are unchanged. Chroma-keying is done by our MR software in real time (rather than coloring the pixels red, our software sets the alpha bits for appropriate blending when rendering virtual objects; here the virtual object is a red polygon).

Figure 3. Best case observed

Figure 4. Worst case observed
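As a concrete illustration of the per-pixel test, the sketch below derives the alpha plane from a captured frame by distance to the LED key color. The color value, tolerance and the red diagnostic view are illustrative of the figures above, not the engine's exact computation.

```python
import numpy as np

def key_alpha(frame_rgb, led_color=(0, 80, 255), tol=40.0):
    """Return an alpha plane: 0 where a pixel matches the LED key color, 255 elsewhere.

    frame_rgb: HxWx3 uint8 image from the HMD camera (after undistortion).
    led_color, tol: illustrative key color and distance threshold.
    """
    diff = frame_rgb.astype(np.float32) - np.asarray(led_color, dtype=np.float32)
    keyed = np.linalg.norm(diff, axis=-1) < tol        # True on the retro-reflective screen
    return np.where(keyed, 0, 255).astype(np.uint8)

def visualize(frame_rgb, alpha):
    """Diagnostic view in the style of Figures 3 and 4: keyed pixels painted red."""
    out = frame_rgb.copy()
    out[alpha == 0] = (255, 0, 0)
    return out
```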

3. The MR sound engine

In designing the MR Sound Engine, we required a system that could provide real-time spatialization of sound and support acoustical simulation. Real-time synthesis of sound was not a primary concern. After studying many alternatives, from APIs to DSP libraries, we settled on EAX 2.0 in combination with DirectSound, primarily because this was the most powerful, best supported and most widely used API combination available. We also chose to incorporate the ZoomFX feature from Sensaura to simulate the emitter size of sound sources.

3.1. Sound properties

There are three main properties that affect 3-D sound: directionality, emitter size and movement. The first can be divided into omni-directional versus directional sounds, the second into point sources with emitter size zero versus voluminous sources, and the third into stationary versus moving sounds.


The MR Sound Engine combines these attributes to provide real-time spatialization of both moving and stationary sound sources. For stationary 3-D sounds, the source can be placed anywhere along the x, y, and z axes. Moving sounds must have a start point, an end point, and a timeline for that movement to take place. Simulation of emitter size, sound orientation, and rotating sound sources is supported. Emitter size can only be altered for sounds that are spatialized. Sounds that are not spatialized are called ambient sounds and play at an even volume in all speakers. The orientation of a sound is defined by an inside cone and an outside cone; parameters may be adjusted to change the size and volume of these cones. Rotating sounds, both those that rotate around the user and those that rotate around their own axis in space, are supported. Standard features such as simultaneous playback of multiple sounds (potentially hundreds using software buffers), ramping, looping, and sound scaling are supported.
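These properties map naturally onto a per-source parameter set. The sketch below is our own illustration of such a description; the field names are not the engine's actual API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class SoundSource:
    """Illustrative description of one source (names are ours, not the engine's)."""
    clip: str                          # audio clip to play
    position: Vec3 = (0.0, 0.0, 0.0)   # stationary position, or start of a move
    ambient: bool = False              # ambient sounds play evenly in all speakers
    emitter_size: float = 0.0          # 0 = point source; >0 only if spatialized
    inside_cone_deg: float = 360.0     # directional sources: inner cone angle
    outside_cone_deg: float = 360.0    # outer cone angle, at reduced volume
    move_to: Optional[Vec3] = None     # end point for a moving sound
    move_seconds: float = 0.0          # timeline over which the movement occurs
    looping: bool = False

siren = SoundSource("siren.wav", position=(5, 0, 0), move_to=(-5, 0, 0),
                    move_seconds=4.0, looping=True)
```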

3.2. Sound creation/delivery components

The MR sound engine contains two independent programs, the positioning engine and the sound engine. The positioning engine is used for the creation of the spatial sound world. It provides the ability to position sounds in space with the help of an intuitive user interface and to save the configuration into a text configuration file. The sound engine reads the settings from the configuration file and receives messages from the Scenario Engine (described in Section 5.2). Based on these messages, it renders selected sounds in real time. Timing for the playing of audio segments is defined by neither the positioning engine nor the sound engine, as these events are under the total control of the Scenario Engine. For moving sounds, start and end points, as well as the duration of the movement, are defined within the positioning engine. As most of the movement will be cued by the Scenario Engine, linear interpolation and simple rotation (for sounds like sirens) are supported. In that way the spatial designer, who is creating the spatial sound world, has easy control over moving sounds. In the current clip-based system, predefinition using the positioning engine is available to define all parameters and thus lessen the responsibility of the Scenario Engine. However, when sounds are associated with objects that move in unpredictable or widely varying ways, many parameters need to be changed in real time by the Scenario Engine. Predefinition still helps, as not every parameter will change during runtime.

We provide a flexible approach in which sound types and their parameters are predefined by the positioning engine, but most of the parameters can be changed by the Scenario Engine in real time. (Parameters for which a runtime change does not make sense, such as class type, ID and filename, are excluded.) The MR sound engine provides realistic spatialization of sound suited for Mixed Reality applications. By using a multi-speaker system instead of headphones, real-world audio is mixed with simulated sound. This feature enables a true MR experience without the cumbersome task of head tracking with real-time response, or HRTF calculation.
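A minimal sketch of this division of labor, with an invented configuration format and message shape (the actual file layout and message protocol are not specified here): predefined settings come from the positioning engine's text configuration, and the Scenario Engine overrides selected parameters at run time.

```python
# Hypothetical lines as written by the positioning engine (format is illustrative):
CONFIG = """\
id=amb1 clip=wind.wav ambient=1
id=ric1 clip=ricochet.wav x=1.5 y=0.0 z=-2.0 emitter_size=0.3
"""

def parse_line(line):
    fields = dict(tok.split("=", 1) for tok in line.split())
    return {k: v if k in ("id", "clip") else float(v) for k, v in fields.items()}

sounds = {e["id"]: e for e in (parse_line(l) for l in CONFIG.splitlines() if l.strip())}

def on_scenario_message(msg):
    """Apply a runtime change requested by the Scenario Engine (message shape is ours)."""
    snd = sounds[msg["id"]]
    snd.update({k: v for k, v in msg.items()
                if k not in ("id", "clip")})   # class type, ID and filename never change

on_scenario_message({"id": "ric1", "x": 0.0, "z": -4.0})
```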

3.3. Hypersonic sound

We have incorporated the Hypersonic Sound System (HSS) into our overall sound architecture by using dynamically controlled servos to direct the HSS module at various positions in an MR environment. The HSS, produced by American Technology Corporation of San Diego, CA, uses a beam of ultrasonic energy containing multiple frequencies (48 kHz and up) to produce audible sound by heterodyning (beat frequencies) within the air column occupied by the beam of ultrasonic energy. When the HSS beam is directed against an object (e.g., a wall) and the hearer responds as though the sound source were located on the object, we refer to this experience as a "virtual sound source." When the HSS beam is directed at the hearer, we refer to this experience as a "direct sound source."

Several tests were conducted to determine the effectiveness of the HSS in a variety of environments. The results of these tests are summarized below:

• The HSS can indeed produce highly directional perceived sound, in both the direct and virtual modes. However, the beam is not narrow enough to reliably deliver "private audio" via direct mode to individual hearers while excluding nearby listeners, in the absence of deliberately managed background interference sounds.

• Experimentation indoors is very challenging, since the hypersonic beam reflects off walls – often in unintended ways – and produces confusing and contradictory measurements. This result has substantial implications for possible indoor deployment of HSS systems.

• Outdoor experimentation is more controllable than indoor experimentation, since back reflections can be minimized. However, care must be taken to manage background noise.

• Direct sound sources are very compelling, and produce accurate localization across ranges of up to 200 feet.


• Virtual sound sources are also compelling, but are more difficult to apply in practice. When the hearer is behind the plane of the face of the HSS unit, the hearer almost always correctly localizes the virtual sound source to within ten degrees of its actual azimuth. However, if the hearer is in front of the HSS unit, there is substantial potential for confusion between the direct and virtual sources.

• There is some evidence for differences in the angular dispersion of virtual sound sources, depending on the material against which the beam is directed. This is to be expected, based on the physics of the system.

4. Macrostimulators

Computer-controlled "macrostimulators" are special effects applied in a new way, freed from the linear storylines of most traditional show control systems and applied to the next generation of interactive simulations. Macrostimulators in a mixed reality allow for rich enhancement of the real space in response to actions by either real or virtual characters. When coupled with our Scenario Engine software, this system creates the next level of mixed reality – one in which the real world knows about and is affected by items in the virtual, and vice versa.

4.1. Effects

We apply traditional special effects, the same that might enrich a theme park attraction, theater show, or museum showcase. The following is a breakdown of just a few of the types of effects we use. Figure 5 depicts a scene from one of our scenarios, with labels showing places where we have inserted effects.

Figure 5. Special effects in an MR scene

Lighting

We lace our experiences with lighting – to illuminate the area of the experience and also as localized effects. Area lighting signifies the start of a scenario, changes the mood as it is dimmed, and reveals new parts of the set on demand. Effect lighting allows us to illuminate smaller areas where another macrostimulator event might be happening, or simply acts as an interactive element. Heat lamps let the user feel the temperature of an explosion, and flickering LED lighting can simulate a fire. In one of our simulations, users are instructed to shoot out the lights to provide cover for fellow teammates. A motion-tracked shot allows us to flicker out the light (over a door or window), while adjusting parameters of virtual adversaries to make their shots less effective (this latter adjustment is done within the Scenario Engine).

Compressed air and smoke

We have incorporated pneumatic actuators to move pieces of sets. The actions we use include flinging a door open to reveal a virtual character, toppling a pile of rocks, etc. Possible uses of compressed air include simulating an explosion, a rocket taking off, or (combined with lighting) simulating an underwater environment.

5. The System

The MR Fantasy system encompasses both a scenario authoring and a scenario delivery system. These are integrated in the sense that the authoring system produces an XML document that is imported into the delivery system, resulting in the instantiation of all objects needed to run the scenario.
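As an illustration of this hand-off, the sketch below parses a hypothetical scenario document; the element and attribute names are invented, and only the overall pattern (an XML document instantiated into actor objects by the delivery system) comes from our description.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment of an authored scenario (tag and attribute names are illustrative).
SCENARIO_XML = """
<scenario name="cover-fire">
  <actor id="bottle" kind="real" visible="false"/>
  <actor id="helicopter" kind="virtual" model="heli.obj"/>
  <actor id="director" kind="abstract"/>
</scenario>
"""

def instantiate(xml_text):
    """Turn the authored document into a dictionary of actor records."""
    root = ET.fromstring(xml_text)
    return {a.get("id"): dict(a.attrib) for a in root.findall("actor")}

actors = instantiate(SCENARIO_XML)
print(sorted(actors))   # ['bottle', 'director', 'helicopter']
```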

5.1. Authoring

Scenario authoring is presently done by selecting actors that reflect the virtual and active real objects of your world. For instance, a scenario might contain a virtual helicopter, several virtual people, a number of real lights, a real soda can, an abstract object that is acting as a story director and, of course, the real person who is experiencing the MR world. An actor component, when brought into the authoring system, will have a set of default behaviors, some complex enough that they will have been pre-programmed. However, a finite state machine, consisting of states and transitions, is the primary means of expressing an actor's behavior. Each transition emanates from a state, has a set of trigger mechanisms (events) that enable the transition, a set of actions that are started when the transition is selected, and a new state that is entered as a consequence of effecting the transition. A state can have many transitions, some of which have overlapping conditions. If multiple transitions are simultaneously enabled, one is selected randomly. Weighting can be achieved by repeating a single transition many times (the cost is just one object handle per additional copy).

States and transitions may have associated listeners (attached via hookups in the authoring environment) that cause transitions to be enabled or conditions to be set for other actors. Actors, finite state machines (FSMs) and transitions can be archived so that completely specified actors can be chosen, or new actors can be composed from the FSM behaviors stored with other actors (which serve as prototypes), from behaviors stored independently, or even from transitions stored independently. Totally new behaviors can be created by starting with a blank slate and adding states, transitions, and the actions associated with those transitions. All components are Java beans and can consequently be assembled without the need (in most cases) to look inside their code. The exception is when it is convenient (or perhaps necessary) to add a new Java service that is invoked as part of an action associated with a transition. Our intention is to alleviate this (somewhat) in the future by supporting services written in JPython. At present, our library of actors and behaviors is limited by the scenarios we have already created, but it will grow substantially as we and others develop new scenarios.
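The behavior model can be summarized in a stripped-down sketch (in Python here, rather than the Java beans of the actual system): transitions carry a trigger, actions and a target state; simultaneously enabled transitions are resolved by a uniformly random choice; and weighting falls out of listing a transition more than once.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Transition:
    trigger: str                       # event name that enables this transition
    target: str                        # state entered when the transition fires
    actions: List[Callable[[], None]] = field(default_factory=list)

@dataclass
class Actor:
    state: str
    transitions: Dict[str, List[Transition]] = field(default_factory=dict)

    def handle(self, event: str) -> None:
        enabled = [t for t in self.transitions.get(self.state, []) if t.trigger == event]
        if not enabled:
            return
        chosen = random.choice(enabled)   # repeating a transition in the list weights it
        for action in chosen.actions:
            action()
        self.state = chosen.target

bottle = Actor(state="standing", transitions={
    "standing": [Transition("laser_hit", "jumping",
                            [lambda: print("play ricochet"), lambda: print("fire servo")])],
    "jumping":  [Transition("settled", "standing")],
})
bottle.handle("laser_hit")
```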


5.2. Delivering the experience

Once a world has been authored, the experience it represents needs to be delivered to one or more users. In our case, the user(s) are wearing Canon Coastar video see-through HMDs [6]. The images associated with the user's sight in each eye are fed to separate dual-processor 2+ GHz P4 Linux systems. The left and right eye systems cooperatively analyze the images in order to "understand" what is in the real world and to add virtual objects registered properly with respect to each other and to real objects, applying mutual occlusion as appropriate. One aspect of reasoning about the real world that we have used in several of our scenarios is the ability to determine whether a laser beam intersects a virtual object (sometimes an invisible one). When the laser beam is attached to the barrel of a gun or rifle, the intersection with a virtual object represents a "hit." This can then give rise to a variety of consequences, e.g., a real bottle that is internally represented by an invisible actor jumps in the air and a ricochet sound is heard emanating from where the bottle was and heading back toward the shooter. The above interaction is an example of how the MR Scenario Engine receives events as a consequence of a user's actions. These events can trigger transitions, e.g., in the bottle's actor object, that cause actions to be effected.

For instance, an audio request goes to the MR Sound Engine, requesting it to play a ricochet sound emanating from the spot where the bottle was located; a special effects request goes (via DMX) to the macrostimulators to activate a servomechanism that causes the bottle to jump; and a request goes back to the graphics engine to deactivate the bottle's virtual stand-in.

One critical aspect of the above is that only the scenario engine understands the semantics of events. All other parts of the system are either purely reactive (audio and special effects) or can also perform some specialized action, such as the graphics engine analyzing a scene, but do not determine the consequences of this analysis.
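The bottle interaction above amounts to a one-event, many-requests fan-out, which we can sketch as follows. The helper names and message payloads are ours, standing in for the actual requests sent to the sound engine, the macrostimulators (via DMX) and the graphics engine.

```python
def on_laser_hit_bottle(position, send_audio, send_effects, send_graphics):
    """Consequences of one event, decided only by the Scenario Engine (illustrative)."""
    # Spatialized ricochet from where the bottle was.
    send_audio({"cmd": "play", "clip": "ricochet.wav", "position": position})
    # Macrostimulator request: a servo flips the real bottle (routed to DMX).
    send_effects({"cmd": "trigger", "device": "bottle_servo"})
    # Tell the graphics engine its invisible stand-in is no longer selectable.
    send_graphics({"cmd": "deactivate", "object": "bottle_standin"})

# The other subsystems stay purely reactive; only this code knows *why* the
# requests are being made.
on_laser_hit_bottle((1.5, 0.0, -2.0), print, print, print)
```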

5.3. Network protocols

Our goal is to create a scalable system that allows the easy addition of new actors, even while a world is running (we envision very long-running, theoretically never-ending worlds). At present, communication between the scenario engine and the graphics engine is via TCP/IP, and all other communication is multicast UDP (which, in turn, may result in specialized communication such as that provided by the DMX protocol mentioned in Section 5.2). The choice of multicast UDP for the audio and special effects engines makes it possible to start, stop, restart or even exclude these components during a given experience. In actual practice, these systems run continuously, ready to service a new scenario whenever one is started. An occasional (especially rare in our dedicated local network) loss or incorrect ordering of a packet is rarely noticed (users are usually so involved that they miss some of the audio cues, just as they would in real life). However, the loss or reordering of a request to change the appearance of an object would be very distracting; hence our choice of TCP/IP for communication with graphics.

Our longer-term goal (planned for completion by the end of summer 2003) is to mediate all communication through middleware. In fact, we have already done so in an experimental prototype that uses tuple spaces [2] for the communication/coordination layer.
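For the transport choices described above, here is a minimal sketch of the multicast UDP side in Python; the group address, port and JSON encoding are assumptions of the sketch, not the system's actual wire format.

```python
import json
import socket
import struct

MCAST_GROUP, MCAST_PORT = "239.0.0.1", 5007     # hypothetical group and port

def make_multicast_sender(ttl: int = 1) -> socket.socket:
    """UDP sender used for audio and special-effects cues (occasional loss is tolerated)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("b", ttl))
    return sock

def send_cue(sock: socket.socket, message: dict) -> None:
    sock.sendto(json.dumps(message).encode("utf-8"), (MCAST_GROUP, MCAST_PORT))

# Graphics requests instead use a reliable TCP connection, e.g.
#   graphics = socket.create_connection(("graphics-host", 6000))
# because a lost or reordered appearance change would be very noticeable.
send_cue(make_multicast_sender(), {"cmd": "play", "clip": "ricochet.wav"})
```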

6. Future directions

Recent publications on rendering have demonstrated offline integration of real and virtual objects [1]. However, extending such work to real-time computation does not seem straightforward [3]. Our approach, based on a hardware-assisted algorithm, creates accurate and near-interactive rendering of virtual scenes.


Extending this algorithm to render virtual objects with proper illumination in real scenes is a goal that now appears attainable with the aid of a simple cluster of workstations, each with programmable graphics cards. This same cluster approach, planned for interactive illumination, has already shown great promise in occlusion-based level-of-detail management. Our goal is to apply this research, currently focused on real-time walkthroughs of complex virtual forest scenes, to achieve such performance in mixed realities involving augmentation with complex and evolving vegetation. This is especially desirable in informal science MR exhibits that will allow visitors to change environmental conditions and then experience the resulting augmentation to a real-world environment. Our current delivery system, while effective, will not scale to massively shared MR worlds populated with large numbers of virtual objects. Our design for a future system is being informed by enterprise environments that have made effective use of scalable middleware, managing shared services based on the balance between global efficiency and individual quality-of-service criteria.

7. Acknowledgements

The authors gratefully acknowledge the inspiration and guidance provided by Dr. Hideyuki Tamura, Ritsumeikan University. This work was supported in part by the Canon Mixed Reality Systems Laboratory, the US Army Research, Development & Engineering Command (RDECOM), the Office of Naval Research, and the National Science Foundation (under grant EIA 9986051). We are indebted to Nick Beato, James Burnett, Alex Cook, Daniel Dobler (the creator of the first version of the 3D audio software), Tim Eck, Felix Hamza-Lup, Scott Malo, Chris Murray, Matthew O'Connor, Preet Rawat, Eileen Smith and Ron Weaver for their dedicated work on this project.

8. References

[1] P. Debevec, "Image-Based Lighting," IEEE Computer Graphics and Applications, pp. 26-34, March/April 2002.

[2] D. Gelernter, "Generative Communication in Linda," ACM Transactions on Programming Languages and Systems, 7(1), pp. 80-112, January 1985.

[3] S. Gibson and A. Murta, "Interactive Rendering with Real-World Illumination," Rendering Techniques 2000 (Proceedings of the 11th Eurographics Rendering Workshop), pp. 365-375, June 2000, Brno, Czech Republic.

[4] C. B. Stapleton, C. E. Hughes, E. Smith, S. Malo and C. Murray, "Where Realities Collide and Imagination Soars," submitted to the IEEE and ACM International Symposium on Mixed and Augmented Reality, October 2003, Tokyo, Japan.

[5] C. B. Stapleton, C. E. Hughes, J. M. Moshell, P. Micikevicius and M. Altman, "Applying Mixed Reality to Entertainment," IEEE Computer 35(12), pp. 122-124, December 2002.

[6] S. Uchiyama, K. Takemoto, K. Satoh, H. Yamamoto and H. Tamura, "MR Platform: A Basic Body on Which Mixed Reality Applications Are Built," Proceedings of the IEEE and ACM International Symposium on Mixed and Augmented Reality, September/October 2002, Darmstadt, Germany.
