Shumin Zhai is with the IBM Almaden Research Center, 650 Harry Road, K54/B2, San Jose.
In IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, Vol. 27, No. 4, July 1997, pp. 518-528.

Anisotropic Human Performance in Six Degree-of-Freedom Tracking: An Evaluation of 3D Display and Control Interfaces

Shumin Zhai

Paul Milgram

Anu Rastogi

ABSTRACT

Motivated by the need for human performance evaluations of advanced interface technologies, this paper presents an empirical evaluation of a 3D interface, from the point of view of both display and control, in a pursuit tracking experiment. The paper derives methods for decomposing tracking performance into six dimensions (three in translation and three in rotation). This dimensional decomposition approach has the advantage of revealing overall performance levels in the depth dimension relative to performance in the horizontal and vertical dimensions. With interposition, linear perspective, stereoscopic disparity and partial occlusion cues incorporated into a single 3D display system, subjects' tracking errors in the depth dimension were about 45% (with no practice) to 35% (with practice) larger than those in the horizontal and vertical dimensions. It was also found that subjects initially had larger tracking errors along the vertical axis than along the horizontal axis, likely due to their attention allocation strategy. Analysis of rotation errors generated a similar anisotropic pattern. By applying the dimensional decomposition method, the paper also analyses the issue of coordinated control of 6 degrees of freedom with one hand. It was found that when the subject could not control all 6 degrees of freedom well, translational aspects of the task were given higher priority than the rotational aspects. After 40 minutes of practice, more than 80% of subjects were able to control the translational and rotational aspects together.

INTRODUCTION

Three dimensional (3D) human machine interfaces, including 3D displays and multiple degree-of-freedom (DOF) controllers, are being applied increasingly in areas such as teleoperation, virtual environments, data visualization, and computer aided design. Hence a need arises to empirically evaluate human performance with these interfaces. This paper presents one such endeavour to investigate some of the human factors issues associated with 3D displays and 6 DOF input controls.

With respect to 3D visual displays, a major challenge is to provide sufficient depth information to the user. Human perception is sensitive to a variety of depth cues, including occlusion, binocular disparity, perspective, shadows, texture, motion parallax and active movement [1-3]. Many techniques have been developed for realising such depth cues in computer graphic displays (see for example [4] for a review of implementation methods). Different approaches have been proposed for evaluating the effectiveness of these techniques. One conventional empirical approach is to manipulate the presence or absence of various depth cues and measure task performance as a function of each combination of depth cues (e.g. [5-7]). This approach reveals the strengths of the depth cues relative to each other and potential interaction effects between them; however, it does not provide an estimation of the overall efficiency of particular display cues. An alternative approach is to compare task performance across dimensions, by decomposing behavioural data collected in the presence of all available depth cues into separate horizontal (X), vertical (Y) and depth (Z) components. Using such an approach, Massimino, Sheridan and Roseborough [8] found that, with a conventional 3D display comprising perspective projection cues only, tracking errors in the Z dimension were approximately 400% greater than those in the X and Y dimensions. Performance in the depth dimension can be expected to improve as more depth cues are added, but it is also expected to remain somewhat poorer than performance in the horizontal or vertical dimensions. Few studies, however, have been carried out to determine the extent of that remaining difference.
One of the objectives of the present study was to examine these differences using a display system comprising a collection of readily implementable 3D cues: interposition, perspective, binocular disparity and partial occlusion.

There is also some reason to believe from the general literature on visual perception that performance differences may exist between the horizontal and vertical dimensions. For instance, in a task that required nursery school children to reproduce lines on a circular background, Berman, Cunningham and Harkulich [9] found that reproductions of the vertical lines were significantly more accurate than reproductions of the horizontal and the oblique lines, as measured by orientation differences. Gottsdanker and Tietz [10] found that subjects tended to be more sensitive in judging relative length in the horizontal direction than in the vertical direction. Following up on this issue, another objective of the present study was therefore not only to compare performance in the Z dimension relative to the X and Y dimensions, but also to evaluate performance differences between the X and the Y dimensions. In other words, the generalized objective here was to examine (an)isotropies, or asymmetries, of performance in the X, Y, and Z directions.

There are two approaches to evaluating (an)isotropy in 3D performance. One is to let subjects perform the same task in each of the three dimensions separately and compare the X, Y and Z performances. Two of the foreseeable problems with this approach are: (1) The interactions and integration of the three dimensions are missing. Subjects' behaviour and performance in one dimension at a time may not be the same as those in an integrated 3D environment. (2) The particular order in which the task dimensions are performed may have an effect on the final results, due to asymmetrical skill transfer resulting from learning [11, 12]. The second approach, which was the method used in the present study, is to request subjects to perform an integrated manipulation task in a 3D environment and subsequently decompose the task performance into its dimensional elements for analysis of the (an)isotropies with respect to the 3D displayed information.

No less important than the problem of how to display depth in 3D human-machine interfaces is the issue of how to enable control of systems possessing several (e.g. six or more) degrees of freedom. Much work has been done in designing hand controllers and input techniques to allow efficient translational and rotational manipulation in 3D space (see [13-15] and [16] for reviews).
However, the number of published evaluations carried out to compare human performance with various types of multi-DOF controllers is limited. In fact, even such basic issues as whether and how well humans can handle all six degrees of freedom together have not been satisfactorily resolved. In the teleoperation literature, for example, there have been some concerns about whether human operators are able to control all six degrees of freedom with only one hand. Rice, Yorchak and Hartley [17] observed that controlling 6 DOF with one hand is difficult, due to unwanted cross coupling between axes. Some teleoperation systems, such as the Shuttle Remote Manipulator, also known as the “Canadarm”, explicitly require two-handed operation, with one hand for rotation control and the other for translation control. O’Hara [18] contradicted Rice’s observation, however, and found no differences between two 3 DOF controllers and one 6 DOF controller. McKinnon and King [19] further argued that a single 6 DOF hand controller should be preferable to control distributed among separate controllers. In conclusion, it is only with empirical data that such issues can be resolved. Decomposition of task performance into six components is one way to provide insights into the issue of 6 DOF controllability with one hand.

METHOD

Experimental Task

Pursuit tracking with 6 degrees of freedom was chosen as our evaluation task. The long tradition of using tracking as an experimental paradigm for studying human skills in motor control research (e.g. [20-22]) has more recently taken root in the teleoperation and virtual environment community. For example, Kim, Ellis, Tyler, Hannaford and Stark [23] examined 3D displays with a three axis tracking task; Massimino and colleagues [8] studied 6 DOF tracking with one hand; Ellis, Tyler, Kim and Stark [24] studied the effects of display-control axis misalignment; and Tachi and Yasuda [7] studied tracking in the depth dimension with various displays.

In the present experiment, subjects were asked to continuously control a 3D cursor and align it as closely as possible in both position and orientation with respect to a 3D target that moved unpredictably in 6 DOF within a virtual environment (see Figure 1). Both the tracking target and the controlled cursor were tetrahedral. Each tetrahedron had two adjacent blue edges, and the remaining edges were coloured red. The two blue edges ensured that only one possible correct orientation match existed. The cursor and the target differed in two ways. First, the radius (from the centre to any of the vertices) of the target tetrahedron was 3.55 graphic units¹ while the radius of the cursor was 4.62 graphic units, or 1.3 times as large as that of the target. Second, the cursor had semi-transparent surfaces while the target was rendered as a wireframe model only. These two differences were introduced in order to minimize any potential confusion between the identity of the subject controlled cursor and the independently moving target. As discussed later, the semi-transparent surfaces also served as an important visual cue to locate the cursor relative to the target.

Motion of the target in the experiment was driven by six different independent forcing functions, one for each degree of freedom. Each of the forcing functions consisted of a weighted combination of 20 different sine functions with a random initial phase (similar to the forcing function used in [7]). Such forcing functions are often used in tracking research, since they produce smooth motion which, from the subjects' point of view, is unpredictable, i.e., subjectively perceived low-pass random noise (Poulton, 1974).

¹ In this paper, all lengths are given in terms of graphic units as defined in the display program, where 1 graphic unit = 1.4 cm on the particular display screen used in this experiment.

Figure 1. The 6 DOF tracking task. The tetrahedron with semi-transparent surfaces was the controlled cursor. The tetrahedron without semi-transparent surfaces was the randomly moving target. The subjects’ goal was to align the cursor with the target. Shown in the figure are examples of (a) a very large error between the cursor and the target, (b) a large translation error and a small rotation error, and (c) a small translation error and a large rotation error.
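As an illustration only, the sum-of-sines forcing functions described above can be sketched in Python as follows. The sketch uses the constants reported with equations (1) to (6) (A = 3.5, p = 1.25, f0 = 0.01, 20 sine components per degree of freedom). The rotational amplitude is taken here as π/3 radians, and the function names, the seeding scheme and the use of Python's `random` module are assumptions made for this example rather than details of the original experimental software.

```python
import math
import random

A = 3.5             # translational amplitude, in graphic units (from the text)
B = math.pi / 3.0   # rotational amplitude, taken here as pi/3 radians (assumption)
P = 1.25            # ratio between successive component frequencies (from the text)
F0 = 0.01           # base frequency f0 (from the text)
N_COMPONENTS = 20   # sine components i = 0..19


def make_forcing_function(amplitude, rng):
    """Return f(t) = sum_i amplitude * P**-i * sin(2*pi*F0*P**i*t + phase_i)."""
    # One independent pseudo-random initial phase in [0, 2*pi) per component.
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N_COMPONENTS)]

    def forcing(t):
        return sum(
            amplitude * P ** -i
            * math.sin(2.0 * math.pi * F0 * P ** i * t + phases[i])
            for i in range(N_COMPONENTS)
        )

    return forcing


def make_target_trajectory(seed=0):
    """Build six independent forcing functions, one per degree of freedom."""
    rng = random.Random(seed)  # hypothetical seeding scheme, for repeatability
    names = ["x", "y", "z", "phi", "theta", "psi"]
    amplitudes = [A, A, A, B, B, B]
    return {name: make_forcing_function(amp, rng)
            for name, amp in zip(names, amplitudes)}
```

Because the component amplitudes shrink as the component frequencies grow, each signal is dominated by slow motion with smaller, faster fluctuations superimposed, and its magnitude can never exceed the amplitude multiplied by the sum of p^(-i) over the 20 components.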

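In our experiment, the six degrees of freedom of the target movement, translations along the X, Y and Z axes and rotations about the X, Y and Z axes, were respectively driven by:

$$x(t) = \sum_{i=0}^{19} A\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_x(i)\right) \tag{1}$$

$$y(t) = \sum_{i=0}^{19} A\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_y(i)\right) \tag{2}$$

$$z(t) = \sum_{i=0}^{19} A\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_z(i)\right) \tag{3}$$

$$\varphi(t) = \sum_{i=0}^{19} B\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_\varphi(i)\right) \tag{4}$$

$$\theta(t) = \sum_{i=0}^{19} B\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_\theta(i)\right) \tag{5}$$

$$\psi(t) = \sum_{i=0}^{19} B\,p^{-i} \sin\left(2\pi f_0 p^{i} t + \alpha_\psi(i)\right) \tag{6}$$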

where $t$ is the time elapsed from the beginning of each experimental test and the constants were $A = 3.5$, $B = \pi/3.0$, $p = 1.25$ and $f_0 = 0.01$. These values were determined through pilot testing so that the target remained within the bounds of the display and moved at a challenging but manageable speed. The parameters $\alpha_x(i)$, $\alpha_y(i)$, $\alpha_z(i)$, $\alpha_\varphi(i)$, $\alpha_\theta(i)$ and $\alpha_\psi(i)$ ($i = 0, \ldots, 19$) were independent pseudo-random numbers ranging between 0 and $2\pi$. Since $f_0$ and $p$ were common, each forcing function had identical frequency characteristics.

Performance Measures

At sampling instant $t = i\,\Delta t$ (where $i$ is the step number and $\Delta t$ is the sampling period), the vector from the centre of the cursor to the centre of the target is defined as the translation vector $\mathbf{T}_i$. The norm of $\mathbf{T}_i$, i.e., the Euclidean distance from the centre of the cursor to the centre of the target, is the translational error. For each trial, the translational root-mean-square (RMS) error was defined as

$$T_{rms} = \sqrt{\frac{\sum_{i=0}^{N} |\mathbf{T}_i|^2}{N}} \tag{7}$$

where $N$ was the final step in the trial. The translation vector $\mathbf{T}_i$ consisted of three components along the horizontal ($x_i$), vertical ($y_i$) and depth ($z_i$) dimensions respectively, i.e.:

$$\mathbf{T}_i = (x_i, y_i, z_i) \tag{8}$$

For each trial the decomposed RMS tracking errors in the X, Y and Z dimensions were defined according to

$$X_{rms} = \sqrt{\frac{\sum_{i=0}^{N} x_i^2}{N}}, \qquad Y_{rms} = \sqrt{\frac{\sum_{i=0}^{N} y_i^2}{N}}, \qquad Z_{rms} = \sqrt{\frac{\sum_{i=0}^{N} z_i^2}{N}} \tag{9}$$

respectively. $X_{rms}$, $Y_{rms}$ and $Z_{rms}$ are the decomposed measures used for the subsequent dimensional analysis of translation.

Rotational errors were measured in a similar way. (Parameterization of rotations in 3D space is a relatively complex subject; see Altmann, 1986 or Hughes, 1986 for mathematical treatments of rotational parameterization.) At $t = i\,\Delta t$, the angular error (rotation mismatch) between the cursor and the target can be expressed as ([25], p. 70)

$$R(\phi_i \mathbf{n}_i) \tag{10}$$

where $R(\phi_i \mathbf{n}_i)$ signifies that at tracking step $i$, the cursor and the target are angularly mismatched about an axis $\mathbf{n}_i$ by an amount $\phi_i$. $\mathbf{n}_i = (n_{xi}, n_{yi}, n_{zi})$ is a unit vector specifying the direction of the orientation mismatch and $\phi_i$ is the amount of mismatch. $\phi_i$ and $\mathbf{n}_i$ can be combined as a single rotation vector:

$$\boldsymbol{\phi}_i = \phi_i \mathbf{n}_i = (\phi_i n_{xi}, \phi_i n_{yi}, \phi_i n_{zi}) = (\phi_{xi}, \phi_{yi}, \phi_{zi}) \tag{11}$$

Since $\mathbf{n}_i$ is a unit vector, the length of $\boldsymbol{\phi}_i$ is the amount of rotation mismatch between the cursor and the target. $\phi_{xi}$, $\phi_{yi}$ and $\phi_{zi}$ are the decomposed components of the rotation mismatch in the X, Y and Z dimensions respectively. Note that $\phi_{xi}$, $\phi_{yi}$ and $\phi_{zi}$ are not pitch, yaw and roll angles. They are the projections of the vector $\boldsymbol{\phi}_i$ onto the X, Y and Z axes. The values of $\phi_{xi}$, $\phi_{yi}$ and $\phi_{zi}$ relative to each other collectively reflect the inclination of $\boldsymbol{\phi}_i$ towards the X, Y or Z axis. For instance, the greater $\phi_{xi}$ is relative to the other components, the more biased the rotation vector $\boldsymbol{\phi}_i$ is towards the horizontal axis X (see Figure 2).

Figure 2. Rotation vector and its components in the X, Y and Z dimensions
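A minimal sketch of how such a rotation vector could be computed is given below. It assumes that the cursor and target orientations are available as 3x3 rotation matrices and that the mismatch is expressed in the display (world) frame. Both assumptions, and all function names, are illustrative; the representation used in the experimental software is not specified here.

```python
import math


def mat_mul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def transpose(m):
    """Transpose a 3x3 matrix; for a rotation matrix this is its inverse."""
    return [[m[c][r] for c in range(3)] for r in range(3)]


def rotation_vector(r_cursor, r_target):
    """Return (phi_x, phi_y, phi_z) for the rotation taking cursor to target."""
    # Mismatch rotation in the world frame (assumed convention).
    r_err = mat_mul(r_target, transpose(r_cursor))
    trace = r_err[0][0] + r_err[1][1] + r_err[2][2]
    cos_phi = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    phi = math.acos(cos_phi)  # amount of mismatch, in [0, pi]
    if phi < 1e-9:
        return (0.0, 0.0, 0.0)  # orientations coincide
    sin_phi = math.sin(phi)
    if sin_phi < 1e-9:
        # Near 180 degrees the axis cannot be read from the antisymmetric part.
        raise ValueError("mismatch near 180 degrees: axis needs separate handling")
    # Unit axis n from the antisymmetric part of the mismatch matrix.
    n = ((r_err[2][1] - r_err[1][2]) / (2.0 * sin_phi),
         (r_err[0][2] - r_err[2][0]) / (2.0 * sin_phi),
         (r_err[1][0] - r_err[0][1]) / (2.0 * sin_phi))
    return tuple(phi * component for component in n)
```

For example, if the target is rotated 0.5 rad about the Z axis relative to the cursor, the function returns a vector close to (0, 0, 0.5): the whole mismatch projects onto the depth axis, and the vector's length equals the mismatch angle.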

For each entire trial, the total RMS rotational error is:

$$R_{rms} = \sqrt{\frac{\sum_{i=0}^{N} |\boldsymbol{\phi}_i|^2}{N}} \tag{12}$$

where $N$ is the final step in the trial. The RMS values of the individual components $\phi_{xi}$, $\phi_{yi}$ and $\phi_{zi}$ are:

$$R_{xrms} = \sqrt{\frac{\sum_{i=0}^{N} \phi_{xi}^2}{N}}, \qquad R_{yrms} = \sqrt{\frac{\sum_{i=0}^{N} \phi_{yi}^2}{N}}, \qquad R_{zrms} = \sqrt{\frac{\sum_{i=0}^{N} \phi_{zi}^2}{N}} \tag{13}$$

$R_{xrms}$, $R_{yrms}$ and $R_{zrms}$ reflect the total rotational errors (from $i = 0$ to $i = N$) of the projections of the rotation vector $\boldsymbol{\phi}_i$ onto the X, Y and Z axes respectively. These measures are used as the decomposed rotation measures for analysis of rotation asymmetries.
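The total and decomposed RMS measures of equations (7), (9), (12) and (13) share one form, so a single routine can serve for both translation vectors and rotation vectors. The following Python sketch is illustrative only: the function name and the list-of-tuples input format are assumptions, and the divisor follows the equations literally (the sum over steps 0 to N is divided by N).

```python
import math


def decompose_rms(samples):
    """Return (total_rms, x_rms, y_rms, z_rms) for a trial.

    samples: sequence of (x, y, z) error vectors for steps i = 0..N,
    either translation vectors T_i or rotation vectors phi_i.
    """
    n_final = len(samples) - 1  # N is the index of the final step
    if n_final < 1:
        raise ValueError("need at least two samples")
    # Sum of squares of each component over the whole trial.
    squares = [sum(s[k] ** 2 for s in samples) for k in range(3)]
    x_rms, y_rms, z_rms = (math.sqrt(v / n_final) for v in squares)
    # |v_i|^2 = x_i^2 + y_i^2 + z_i^2, so the total uses the summed squares.
    total_rms = math.sqrt(sum(squares) / n_final)
    return total_rms, x_rms, y_rms, z_rms
```

One consequence of these definitions is that the decomposed measures combine in quadrature: the square of the total RMS error equals the sum of the squares of its X, Y and Z components. The components therefore partition the total squared error among the three dimensions.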

Experimental System

Display. In designing the 3D displays used in the experiment, four types of depth cues were chosen, all of which are both powerful and easily realized with current technology: binocular (stereoscopic) disparity, linear perspective, interposition (edge occlusion), and partial occlusion through semi-transparency. Binocular disparity, linear perspective and interposition are conventionally used as strong depth cues [2]. The use of semi-transparency to create partial occlusion, as shown in Figure 1, is a less prevalent technique, but has been shown to be both effective and easy to implement [26, 27]. During the experiment, subjects sat 60 cm away from a Silicon Graphics colour display (Model No. 2086A3SG) and wore CrystalEyes™ 120 Hz stereoscopic glasses (Model No. CE-1), manufactured by StereoGraphics Corp. The experimental room was darkened throughout the experiment.

Input Controllers. Two 6 DOF input controllers were used in the experiment: a Spaceball™ (Model 2003, Spaceball Technologies Inc.) and an EGG (Elastic General-purpose Grip) controller, an egg shaped 6 DOF device designed by the authors (Figure 3). Whereas the Spaceball™ is an isometric, force sensitive device, the EGG is a suspended elastic resistance device whose displacement is proportional to the force and torque applied by the user. Both devices were operated in rate control mode (see [16] for further details). During the experiment each controller was situated proximal to the subject's dominant hand.

Figure 3. The isometric Spaceball™ (a) and elastic EGG (b) input controllers used in the experiment

One of the fundamental decisions to be made in designing such experiments is the choice of a set of transfer functions between the operator's control output and the control device output which permits fair and valid comparisons across different devices. Our prevailing philosophy in this research has been to determine an optimal transfer function for each device, so that all comparisons are made between the best case designs of each controller configuration. In so doing we maximise the conservativeness of our tests or, alternatively, minimise the extent to which significant effects can be attributed to the design of the controller transfer function rather than to the experimental treatment. (See [16] for further details.) To accommodate both fast/coarse motions and slow/fine motions in this experiment, a nonlinear exponential transformation was applied to all inputs:

$$\Delta x = K_x i_x^{a}, \quad \Delta y = K_y i_y^{a}, \quad \Delta z = K_z i_z^{a}, \quad \Delta\varphi = K_\varphi i_\varphi^{b}, \quad \Delta\theta = K_\theta i_\theta^{b}, \quad \Delta\psi = K_\psi i_\psi^{b} \tag{14}$$

where $K_x$, $K_y$, $K_z$, $K_\varphi$, $K_\theta$ and $K_\psi$ are the control gains (sensitivities) for each DOF, chosen empirically by optimal searching; and $i_x$, $i_y$, $i_z$, $i_\varphi$, $i_\theta$ and $i_\psi$ are the signals generated by each control device (either the Spaceball or the EGG controller) for each of the six degrees of freedom. The coefficients $a$ and $b$ were also determined empirically and set at 2.75 and 2.2 respectively.
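The effect of the transformation in equation (14) can be illustrated with a short Python sketch. Two points are assumptions made for this example and are not stated in the text: the device signal is treated as normalised to the range -1 to 1, and the sign of the signal is preserved by applying the power to its magnitude (a non-integer power of a negative number is otherwise undefined in the reals). The gain values used below are hypothetical, since the empirically chosen gains are not reported here.

```python
import math

A_EXP = 2.75  # exponent a for the translational DOF (from the text)
B_EXP = 2.2   # exponent b for the rotational DOF (from the text)


def rate_command(signal, gain, exponent):
    """Sign-preserving power mapping: gain * |signal| ** exponent.

    signal: device output for one DOF, assumed normalised to [-1, 1].
    gain:   control gain K for that DOF (hypothetical value in this sketch).
    """
    return math.copysign(gain * abs(signal) ** exponent, signal)


# Small deflections are strongly attenuated (fine control), while
# full deflection still yields the full gain (coarse, fast motion).
fine = rate_command(0.1, 1.0, A_EXP)
coarse = rate_command(1.0, 1.0, A_EXP)
```

With exponents greater than 1, a tenth of full deflection produces well under a tenth of the maximum rate, which is what allows the same device to support both slow, fine adjustments and fast, coarse movements.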

Workstation. The experiment was conducted using the MITS (Manipulation in Three Space) system developed by the authors. MITS is a desk-top stereoscopic virtual environment developed for the purpose of investigating 6 DOF motor control performance. In this experiment MITS was run on an SGI Iris 4D/310 GTX graphics workstation, with a graphics update rate of 15 Hz.

Subjects. Thirty paid volunteers were recruited by advertising. All subjects were screened using a Bausch & Lomb Orthorater. Three subjects were rejected for having weak stereoscopic acuity, and one was rejected for having poor corrected near-vision acuity. The ages of the 26 accepted subjects ranged from 18 to 37, with 3 subjects under 20, 17 subjects between 20 and 30, and 6 subjects over 30 years of age. Except for one school teacher and one high school student, the subjects were university students studying Engineering, Science, or Humanities. None of the subjects had prior experience with any 6 DOF manipulation devices. Half of the 26 accepted subjects were assigned to the isometric rate controller (Spaceball) and the other 13 to the elastic rate controller (EGG).

Procedure. Each experimental session was preceded by a 15 minute vision screening test and a handedness check. The data gathering was divided into five phases, as illustrated in Figure 4. Each phase consisted of a practice session, followed by 4 trials of tracking. Each trial lasted 40 seconds. Practice in Phase 0 proceeded as follows: The subject was first shown how to use the assigned input device to control the cursor for each of the six degrees of freedom, as well as translations and rotations along/about arbitrary axes. After that, the subject performed one trial of tracking. The total duration of Phase 0 practice was about 3 minutes (Figure 4). Practice sessions in Phases 1, 2, 3 and 4 lasted 7 minutes each, and consisted of demonstrations and coaching by the experimenter, together with actual practice trials.

Figure 4. Experimental procedure: each phase consisted of practice followed by a test consisting of 4 trials of tracking. [Timeline of Phases 0 to 4, with time marks at 0, 3, 13, 23, 33 and 43 min.]

Each of the four trials in any test had a distinct target trajectory. Each trial began with the cursor coincident with the target (zero error). During the experiment, subjects were instructed to track the target as closely as possible in both translation and rotation.

RESULTS AND DISCUSSIONS

RMS tracking error scores, as defined earlier in equations (7), (9), (12) and (13), were collected for 2 (controller types) x 13 (subjects) x 5 (phases) x 4 (paths) = 520 trials. Nonlinear (logarithmic) transformations were applied to the data in order to meet the model residual distribution requirement for ANOVA [28]. In the following analysis, results are organized according to translation in 3D, rotation in 3D and the controllability of all 6 DOF with one hand.

Anisotropic Performance in Translation

A repeated measures analysis of variance on the translational RMS errors $X_{rms}$, $Y_{rms}$ and $Z_{rms}$, as defined in equation (9), was conducted with one between-subject factor (controller type) and three within-subject factors (dimension, experimental phase and tracking path). The significant main effects found included dimension (X, Y, Z) (F(2, 48) = 59.03, p