Comparing Information Graphics: A Critical Look at Eye Tracking

Joseph H. Goldberg

Jonathan I. Helfman

Oracle 500 Oracle Parkway MS 2op2 Redwood Shores, CA 94065 +1.650.607.6020

Oracle 500 Oracle Parkway MS 2op2 Redwood Shores, CA 94065 +1.650.506.3661


ABSTRACT
Effective graphics are essential for understanding complex information and completing tasks. To assess graphic effectiveness, eye tracking methods can help provide a deeper understanding of the scanning strategies that underlie more traditional, high-level accuracy and task completion time results. Eye tracking methods entail many challenges, such as defining fixations, assigning fixations to areas of interest, choosing appropriate metrics, addressing potential errors in gaze location, and handling scanning interruptions. Special considerations are also required for designing, preparing, and conducting eye tracking studies. An illustrative eye tracking study was conducted to assess the differences in scanning within and between bar, line, and spider graphs, to determine which graphs best support relative comparisons along several dimensions. There was excessive scanning to locate the correct bar graph in easier tasks. Scanning across bar and line graph dimensions before comparing across graphs was evident in harder tasks. There was repeated scanning between the same dimension of two spider graphs, implying a greater cognitive demand from scanning in a circle that contains multiple linear dimensions than from scanning the linear axes of bar and line graphs. With appropriate task design and targeted analysis metrics, eye tracking techniques can illuminate visual scanning patterns hidden by more traditional time and accuracy results.

Categories and Subject Descriptors H.5.2 [User Interfaces], H.1.2 [User/Machine Systems]

General Terms Measurement, Performance, Design, Experimentation, Human Factors.

Keywords Evaluation, Visualization, Eye Tracking

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. BELIV’10, April 10–11, 2010, Atlanta, GA, USA. Copyright 2010 ACM 978-1-4503-0007-0…$5.00.

1. INTRODUCTION
1.1 Comparing Information Graphics
Information graphics (i.e., ‘graphs’) are visualizations for conveying information about data trends and distributions. Two popular graph types include clustered bar graphs (Figure 1, top right) and line graphs (Figure 1, lower left). Both clustered bar and line graphs convey quantitative information along the y-axis and can convey either quantitative or categorical information (categories a-h, here) along the x-axis. Although conveying categorical information on a line graph is hardly ever recommended, because it may be misinterpreted as showing a non-existent quantitative trend, we use categorical line graphs for comparison with bar graphs and spider graphs. A spider (or radar) graph (Figure 1, upper left) is a circular graph that encodes quantitative values along axes that radiate from the center outward. Spider graphs are especially effective for assessing the symmetry of values, rather than comparing their magnitudes [3].

Figure 1. Spider, bar, and line graph visualizations, each displaying two data sets.

Although used in business domains such as HR management and power generation monitoring, spider graphs are generally not thought to be as effective as bar graphs, because values arranged in a circle can be difficult to read [4]. Other factors, such as individual differences (notably perceptual speed), can also influence an individual’s ability to read complex spider graphs [2]. Spider graphs are used primarily for relative comparisons across multiple quantitative data dimensions, and can highlight extreme values along these dimensions [8]. Data can convey a distinct emergent shape when plotted across dimensions, such as the two data sets shown in Figure 1.

Evaluating the effectiveness of graph types can be a challenge for usability professionals. While completion time and accuracy on specific tasks may indicate that usability problems exist, a deeper understanding of visual scanning strategies on information graphics may be necessary to improve designs. One way to study visual scanning strategies is through eye tracking.

1.2 Eye Tracking Challenges
Eye tracking methods can help a usability specialist understand the scanning strategies involved in comparing data within and between various graphs. Eye tracking has been helpful for understanding how people explore node-arc diagrams, through qualitative analysis of scanpaths [10] and quantitative analysis of delays due to additional fixations [9]. Eye tracking methods can also determine whether people read common graphs differently. Despite many challenges for the objective analysis and interpretation of its results, eye tracking continues to grow in popularity as a usability evaluation method.

1.2.1 Defining Fixations
The human eye scans visual scenes (e.g., computer interfaces) by making a series of rapid eye movements (saccades) separated by longer dwells (fixations). Eye tracking systems gather an observer’s gazepoints at 60-120+ samples per second, then algorithmically translate these into fixations, typically about 3-4 fixations per second. These algorithms use various criteria to define fixations, such as the number of gazepoints within a defined radius, or the velocity of eye movement [12]. Whatever the system’s fixation algorithm, it is imperative that the same algorithm, with the same parameters, be used throughout a study. Altering fixation parameters can change the positions of fixations and whether they fall within defined areas on a page.

1.2.2 Defining Areas of Interest
Areas of Interest (AOIs) are regions on a stimulus image that may be used to tally fixations and to define scanning sequences and transitions. Setting up appropriate AOIs is often an important first step in analyzing eye tracking data, yet there are no rules or best practice guidelines to aid this effort. Typical questions include: (1) which features should receive an AOI, (2) how much padding should surround a visual feature or target, and (3) should the padding be consistent among the various AOIs that are defined? To illustrate some of these AOI definition issues, Figure 2 contains a wireframe of a software application, broken into typical regions. Five AOIs have been defined, in yellow, to correspond to specific page features. AOI1 covers an oval-shaped branding icon; the rectangular AOI covers more space than necessary, in order to capture fixations that are close to the branding. AOI2 captures fixations on sub-navigation objects such as tabs; ample vertical padding has been provided to ensure that fixations are captured for most of these objects. AOI3 captures a selection of text in the first content column. Smaller padding has been defined around this area, because reading yields scanpaths that are much tighter than those for general scanning. AOI4 is intended to capture only fixations that compare the heights of two of the data bars. AOI5 is intended to capture fixations that are on or near the horizontal axis labels of the graph.

Figure 2. Wireframe of software application page, with example AOI definitions for several features.

General guidelines for defining AOIs include:
• Padding around a visual target should be consistent with questions and tasks.
• AOIs do not have to fill the entire user interface; they should be defined only for objects and task features of interest.
• The amount of padding around an object should depend on (1) the importance of capturing every fixation on that object, (2) the amount of white space surrounding the object, and (3) expected variance in fixation positions across participants.
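The pipeline described in Sections 1.2.1 and 1.2.2, from raw gazepoints to fixations to padded AOIs, can be sketched as follows. This is a minimal illustration, not the algorithm of any particular eye tracker. It assumes a dispersion-threshold (I-DT) fixation filter, which is one of the "gazepoints within a defined radius" style algorithms mentioned above, together with rectangular AOIs that each carry their own padding. The threshold values (`max_dispersion`, `min_samples`) and all names are hypothetical. As noted above, whatever values are chosen must stay fixed for the whole study.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Gaze = Tuple[float, float]  # raw gazepoint (x, y) in screen pixels


@dataclass
class Fixation:
    x: float     # centroid of the gazepoints in the fixation (px)
    y: float
    start: int   # index of the first gazepoint
    end: int     # index one past the last gazepoint


@dataclass(frozen=True)
class AOI:
    name: str
    left: float   # bounding box of the visual feature (px)
    top: float
    right: float
    bottom: float
    padding: float = 0.0  # margin added on every side (px)

    def contains(self, x: float, y: float) -> bool:
        return (self.left - self.padding <= x <= self.right + self.padding
                and self.top - self.padding <= y <= self.bottom + self.padding)


def dispersion(points: Sequence[Gaze]) -> float:
    """I-DT dispersion: horizontal range plus vertical range."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(gaze: Sequence[Gaze], max_dispersion: float = 35.0,
                     min_samples: int = 6) -> List[Fixation]:
    """Dispersion-threshold fixation identification (I-DT)."""
    fixations: List[Fixation] = []
    i, n = 0, len(gaze)
    while i + min_samples <= n:
        j = i + min_samples
        if dispersion(gaze[i:j]) <= max_dispersion:
            # Grow the window while it stays under the dispersion threshold.
            while j < n and dispersion(gaze[i:j + 1]) <= max_dispersion:
                j += 1
            window = gaze[i:j]
            fixations.append(Fixation(sum(p[0] for p in window) / len(window),
                                      sum(p[1] for p in window) / len(window),
                                      i, j))
            i = j
        else:
            i += 1  # first sample is likely part of a saccade; slide forward
    return fixations


def assign(fix: Fixation, aois: Sequence[AOI]) -> Optional[str]:
    """Name of the first AOI that contains the fixation, or None."""
    for aoi in aois:
        if aoi.contains(fix.x, fix.y):
            return aoi.name
    return None


if __name__ == "__main__":
    # Two tight clusters of 10 gazepoints each (e.g., ~167 ms at 60 Hz).
    gaze = ([(100 + k % 2, 100) for k in range(10)]
            + [(300, 300 + k % 2) for k in range(10)])
    aois = [AOI("AOI1", 80, 80, 120, 120, padding=15),
            AOI("AOI3", 250, 250, 280, 280, padding=5)]
    for f in detect_fixations(gaze):
        print(round(f.x, 1), round(f.y, 1), assign(f, aois))
```

In the demo, the second fixation falls 15 px outside the padded AOI3 box and is left unassigned. The padding, rather than the fixation itself, decides the hit, which is why the guidelines above tie padding to white space and expected fixation variance.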

Dynamic content can make AOI definition extremely difficult. Visual stimuli containing animated elements, resizable panes, dropdowns, and popup areas may make it impossible to work with a static set of AOIs. Eye tracking data for dynamic content is typically recorded in video mode, then fixations are mapped to AOIs for each time period by stepping through video frames. Fixations may have to be counted manually within dynamically appearing AOIs. One near-term solution to this problem is to define sets of AOIs that can be turned on or off by the usability analyst, given changes within the stimuli. For web applications, methods that read a page’s DOM to determine when page elements are visible have also been used, with moderate success. Advanced video research is still necessary to track when defined AOIs are active, and when they have moved to another location.
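One way to implement the on/off AOI sets suggested above is to attach activity intervals to each AOI. Assignment then consults only the AOIs that are active when a fixation begins. The sketch below assumes those intervals are already known, for example logged from the stimulus timeline. `TimedAOI` and `assign_dynamic` are hypothetical names and are not part of any eye tracking package.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class TimedAOI:
    name: str
    box: Tuple[float, float, float, float]  # (left, top, right, bottom) in px
    # Intervals [start, end) in ms during which the AOI's content is on screen.
    active: List[Tuple[float, float]] = field(default_factory=list)

    def is_active(self, t_ms: float) -> bool:
        return any(start <= t_ms < end for start, end in self.active)

    def contains(self, x: float, y: float) -> bool:
        left, top, right, bottom = self.box
        return left <= x <= right and top <= y <= bottom


def assign_dynamic(x: float, y: float, t_ms: float,
                   aois: Sequence[TimedAOI]) -> Optional[str]:
    """Assign a fixation starting at t_ms to the first active AOI containing it."""
    for aoi in aois:
        if aoi.is_active(t_ms) and aoi.contains(x, y):
            return aoi.name
    return None


if __name__ == "__main__":
    # A hypothetical dropdown, visible from 2 s to 5 s, covering part of a panel.
    menu = TimedAOI("dropdown", (0, 0, 100, 200), [(2000, 5000)])
    panel = TimedAOI("panel", (0, 0, 400, 400), [(0, 60000)])
    for t in (1000, 3000, 5000):
        print(t, assign_dynamic(50, 50, t, [menu, panel]))
```

Listing the dropdown before the panel makes the transient element take precedence while it is visible. This is a simple design choice that stands in for the overlap rules discussed in Section 1.2.4.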

1.2.3 Assigning Fixations to AOIs
Eye tracking software assigns fixations to AOIs based solely upon spatial position, a potentially flawed approach. Characteristics of the scanpath can provide additional contextual information about what an observer is really viewing. Consider Figure 3, showing a bar graph with a single rectangular AOI in yellow. Two scanpaths are illustrated: one with dark red fixations and a dashed one with white fixations. Based upon location alone, only one fixation falls within the AOI in each case. In the case of the red scanpath, a spurious fixation may have been defined within the AOI due to characteristics of the fixation algorithm; the eye may simply have been passing through the region on the way to another AOI. In the case of the dashed scanpath, frequent visits around the edge of the AOI may signal that the observer was in fact studying the area within the AOI more carefully than in the red scanpath. The notion of considering a scanpath as part of AOI assignment is controversial. If implemented, it is likely that many fewer fixations would be assigned to AOIs; however, metrics based upon fixations in AOIs may better reflect visual attention on a display.

Figure 3. Example scanpaths on a bar graph, with a single rectangular AOI.
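Scanpath-aware assignment is, as noted, controversial, and the text does not specify a rule. The sketch below shows one hypothetical heuristic that matches the two cases in Figure 3. A spatial hit is kept only if the preceding or following fixation lies inside the AOI or within `margin` pixels of it. Under this rule, an isolated pass-through fixation (red scanpath) is dropped, while a hit surrounded by near-edge fixations (dashed scanpath) is kept. The margin value is an assumption.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in px


def distance_to_box(x: float, y: float, box: Box) -> float:
    """Euclidean distance from a point to a rectangle (0 if inside)."""
    left, top, right, bottom = box
    dx = max(left - x, 0.0, x - right)
    dy = max(top - y, 0.0, y - bottom)
    return (dx * dx + dy * dy) ** 0.5


def context_hits(fixations: Sequence[Tuple[float, float]], box: Box,
                 margin: float = 30.0) -> List[bool]:
    """Flag fixations inside `box` whose scanpath neighbours support the hit.

    A spatial hit is kept only if the previous or next fixation lies within
    `margin` px of the box; an isolated hit is treated as pass-through.
    """
    hits = []
    for i, (x, y) in enumerate(fixations):
        inside = distance_to_box(x, y, box) == 0.0
        neighbours = [fixations[k] for k in (i - 1, i + 1)
                      if 0 <= k < len(fixations)]
        supported = any(distance_to_box(nx, ny, box) <= margin
                        for nx, ny in neighbours)
        hits.append(inside and supported)
    return hits


if __name__ == "__main__":
    aoi = (100, 100, 200, 200)
    passing = [(0, 150), (150, 150), (400, 150)]    # like the red scanpath
    circling = [(90, 150), (150, 150), (210, 150)]  # like the dashed scanpath
    print(context_hits(passing, aoi), context_hits(circling, aoi))
```

Compared with purely spatial assignment, this rule can only remove hits, never add them. That is consistent with the expectation above that fewer fixations would be assigned to AOIs.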

1.2.4 Overlapping AOIs
Software for eye tracking analysis may allow AOIs to overlap, or to be nested hierarchically. Assigning fixations to the proper level of the AOI hierarchy can affect the precision of the analysis. In this case, options for assigning a fixation to an AOI could include:
• Use the smallest or largest AOI containing the fixation.
• Use the highest or lowest hierarchy level containing the fixation.
• Assign the fixation to all AOIs in which it is contained, then allow the analyst to choose a hierarchical level of analysis.

1.2.5 Defining Eye Tracking Metrics
Usability evaluations of software often use task completion time and accuracy as proxies for the effectiveness and efficiency of a design, and for the clarity and quality of its layout and navigation. Consistent with the objectives of a study, many different metrics may be defined from eye tracking parameters [6]:
• Time to first fixation on intended target. Each viewed AOI has an initial fixation time for each participant. The mean or median of these first fixation times can define the approximate time and order in which AOIs were initially viewed. Because the distribution of first fixation times can be heavily skewed by a few very late first fixations, the median is often more appropriate than the mean.
• Time to first fixation using multiple targets. When multiple visual targets are needed to complete a task, the elapsed mean or median time to first fixate the final target in the set (after fixating all other targets in the set) can indicate the first time enough visual information is gathered to successfully complete a task. If data must be visually integrated from three graphs, for example, the visual information gathering time is complete upon fixating the correct data element in the third graph that has been viewed.
• Order of AOIs based on initial fixations. Studying the rank order of AOIs, based upon mean or median first fixation times, can amplify small time differences and reduce large time differences between AOIs.
• Time between first target fixation and first target click. For tasks requiring a specific action (e.g., a mouse click) on a target AOI, the elapsed time between first viewing the AOI and clicking on it is a measure of uncertainty. While the uncertainty could lie in the task instructions, page element design, or page element layout, this metric can help separate the initial perception of the object from its eventual comprehension.
• Number/Percentage of fixations within each AOI. This metric provides an overall summary of visual attention within spatial areas.
• Scanpath length. In the context of a specific task, the total length of the scanpath indicates task clarity or difficulty, as well as page layout clarity. The length could be measured in pixels, time (e.g., msec), or number of fixations.
• Scanpath cumulative angle. Following each fixation, each successive eye movement forms an angle relative to the preceding eye movement. The angle may be expressed as a direction relative to the prior eye movement, or as an absolute direction relative to the page. Summing the absolute values of relative angles within a scanpath or task indicates the directness of scanning, and therefore the complexity or uncertainty of the task and page layout.
• Sequential transitions among AOIs. Understanding where the eyes look immediately before and after viewing each AOI reveals common scanning sequences. Frequent AOI sequences may point to potential design improvements, such as moving certain features into closer proximity.

1.2.6 Gaze Location Error
There can be substantial error between an eye tracker’s returned gaze location and an observer’s perceived gaze location. Eye tracking system manufacturers claim gaze accuracy of 0.5 arc degrees or less, which, at a 50 cm viewing distance, equates to a possible error of 50 tan(0.5°) ≈ 0.44 cm. The actual error can be substantially greater than this, for several reasons:



• Calibration Error. An eye tracking system must be calibrated to an observer’s eyes, using known points on a display or visual scene. Following a typical 5-9 point calibration, the eye tracker maps gaze to a display location based on a corneal reflection and a modeled pupil center. Many eye trackers now calibrate both eyes and average the display locations to increase accuracy. Calibration error can still be substantial (greater than 1 arc degree), especially when eye movements during calibration have high location variance or when visual targets are toward the edge of the calibrated area on the display.





• Foveal Area and Resolution. In detailed observation tasks, the brain’s ocular control system moves the eyes so that a target of interest is fixated on the fovea of the retinas [Duchowski 2007]. Vision is sharpest at the center of the foveas, with resolution declining toward their edges, at a diameter of approximately 5 arc degrees. Nonetheless, foveal acuity varies substantially across individuals, and many people can perceive targets without looking at them directly. Fixations that are close to a target may therefore still indicate that the target was perceived. The practical maximum extent of this error could be estimated as 2.5 arc degrees, which at a 50 cm viewing distance corresponds to 50 tan(2.5°) ≈ 2.2 cm.
• Attentional Dissociation. Under the “Eye-Mind Hypothesis,” the gazepoint location of the eye is normally assumed to indicate what is on the top of one’s mental stack of operations [11]. It is normal to have periods of dissociation, however, in which the current gazepoint may lead or lag one’s current cognitive processing. These periods may be more frequent and/or longer during extremely fatigued, bored, or aroused states. Eye tracking studies should therefore usually present discrete, short tasks to participants, and should provide frequent short breaks [7].
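The angular-to-linear conversions used above follow from simple trigonometry: an angular error θ at viewing distance d spans roughly d·tan θ on the screen. The small helper below converts such figures to centimeters and pixels. It assumes the error is measured about the line of sight near the screen center. The 98 dpi value in the example is taken from the display described in Section 2.1.2, and the function names are hypothetical.

```python
import math


def angular_error_to_cm(error_deg: float, viewing_distance_cm: float = 50.0) -> float:
    """On-screen extent (cm) of an angular gaze error: d * tan(theta)."""
    return viewing_distance_cm * math.tan(math.radians(error_deg))


def cm_to_pixels(length_cm: float, dpi: float) -> float:
    """Convert a screen length in cm to pixels at the given resolution."""
    return length_cm / 2.54 * dpi


if __name__ == "__main__":
    # Manufacturer accuracy, a poor calibration, and the foveal half-width.
    for label, deg in [("accuracy", 0.5), ("calibration", 1.0), ("fovea", 2.5)]:
        cm = angular_error_to_cm(deg)
        print(f"{label}: {deg} deg -> {cm:.2f} cm, {cm_to_pixels(cm, 98):.0f} px")
```

At 50 cm and 98 dpi, 0.5° corresponds to roughly 0.44 cm (about 17 px) and 2.5° to roughly 2.2 cm (about 84 px). This gives a rough sense of how much AOI padding such errors might call for.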

1.2.7 Managing Scanpath Interruptions
Both short- and long-term interruptions in eye tracking occur frequently during a typical study. Long-term interruptions of several seconds or more are apparent to an experimenter who is observing either a real-time scanpath or some other display that indicates current tracking status. Short-term interruptions of a few milliseconds are frequent, due to blinking, head translation, loss of a corneal reflection, or other temporary tracking issues. Tracking is usually regained quickly by the eye tracker’s hardware and/or software algorithms. The handling of scanpath interruptions can have a major impact on analysis. If a scanpath is broken by interruptions into sub-scanpaths, metrics such as scanpath length will be affected. If fixations on either side of an interruption period are connected, which is the typical case, then metrics based upon eye movement distance and dispersion among fixations will be artificially inflated.

1.2.8 Eye Tracking Study Practice
As with any other usability evaluation, an eye tracking study must be designed, participants must be recruited, and the study must be prepared, conducted, and analyzed. Some of these activities may be more resource intensive for eye tracking than for other usability evaluations:
• Study design. Tasks should be designed to address specific questions, such as icon clarity or layout effectiveness. Because of fast data sampling rates, tasks should be of relatively short duration (e.g., less than one minute) and should require discrete responses (e.g., a verbal response or mouse click). Total study time, including breaks, should not exceed about 60 minutes, due to potential eye fatigue from infrared illumination. As in other evaluations, testing more factor conditions requires more participants.
• Participant recruitment. Some eye tracking systems may not calibrate well to certain eyes or eyeglass prescriptions, so recruitment of 10%-20% additional participants is usually recommended. Participants should always be told that eye tracking will take place, as some may feel that eye tracking is too invasive (even if nothing is attached to them).
• Study preparation. Eye tracking for visualization evaluation can be conducted on live prototypes or static images. Live prototypes can place greater demands on the system, especially if there is dynamic content or if the user is allowed to explore additional pages. Testing with static interface images may be less realistic, but provides better experimental control over stimuli. Fixation cues may be provided prior to each static image to control starting fixation locations.
• Conducting studies. Calibration and practice tasks can take anywhere from a few seconds to many minutes. Eye tracking status should be monitored throughout a study, as tracking can be lost for a variety of reasons. The experimenter should be located outside of the participant’s direct view; otherwise, the participant may tend to turn towards the experimenter when completing each task, causing the eye tracker to lose calibration. Allowing the use of a mouse for selection during eye tracking is somewhat controversial, in that participants may view and track the cursor. Proper study design requires a balance between realism and experimental control, and mouse usage should follow from this balance.
• Analyzing data. Data analysis can be confusing and time consuming, because so much data is generated and so many metrics can be computed. It is best to focus analysis on specific task questions and to generate metrics appropriate to each question. When specific factors have been defined and controlled, univariate or multivariate statistical analysis can be conducted on the computed metrics. Analysis may take longer if these metrics are not immediately available from the eye tracker software.
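When metrics such as scanpath length or cumulative angle are not available from the eye tracker software, they are straightforward to compute from fixation centroids. The sketch below implements scanpath length and cumulative absolute turning angle as defined in Section 1.2.5, using hypothetical function names and pixel coordinates. It also shows how the interruption handling discussed in Section 1.2.7 changes the length: bridging across a tracking gap adds a saccade that was never actually observed.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # fixation centroid (px)


def scanpath_length(path: Sequence[Point]) -> float:
    """Sum of Euclidean distances between successive fixations (px)."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def cumulative_angle(path: Sequence[Point]) -> float:
    """Sum of absolute turning angles (degrees) between successive saccades."""
    total = 0.0
    for a, b, c in zip(path, path[1:], path[2:]):
        h1 = math.atan2(b[1] - a[1], b[0] - a[0])  # heading of saccade a->b
        h2 = math.atan2(c[1] - b[1], c[0] - b[0])  # heading of saccade b->c
        turn = math.degrees(h2 - h1)
        turn = (turn + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
        total += abs(turn)
    return total


def length_with_interruptions(segments: Sequence[Sequence[Point]],
                              bridge: bool) -> float:
    """Scanpath length when tracking loss splits the path into segments.

    bridge=True connects the last fixation before each gap to the first one
    after it (the typical case); bridge=False sums the segments separately.
    """
    if bridge:
        joined: List[Point] = [p for seg in segments for p in seg]
        return scanpath_length(joined)
    return sum(scanpath_length(seg) for seg in segments)


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(scanpath_length(square), cumulative_angle(square))
    gap = [[(0, 0), (10, 0)], [(100, 0), (110, 0)]]
    print(length_with_interruptions(gap, True), length_with_interruptions(gap, False))
```

The second demo line illustrates the inflation described in Section 1.2.7. The bridged path is more than five times longer than the sum of the two observed segments, solely because of the unobserved jump across the gap.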

2. EVALUATION STUDY: COMPARISON OF INFORMATION GRAPHICS
Providing effective graph visualization for a dataset and task is critical to understanding complex information across many domains. Despite its potential shortcomings, eye tracking can provide useful information to a UI designer incorporating information graphics into a user interface. To illustrate the opportunities and challenges of eye tracking in the evaluation of visualizations, a study was conducted in which multiple bar, line, or spider graphs were used to compare hypothetical job candidates to job requirements. The goal of this study was to determine which graph type(s) best support relative comparisons along several dimensions, both within and between graphs.

2.1 Methods
2.1.1 Participants
Five colleagues from the Applications User Experience organization at Oracle were recruited for this exploratory study. Each was calibrated to the eye tracking system, then provided an orientation to the graph types and tasks used in the study. Each completed the study in 10-15 minutes. No remuneration was provided.

2.1.2 Eye Tracking Apparatus
A Tobii T60 eye tracker, running Tobii Studio 1.5.13 software, collected eye tracking data (Figure 3). This system provides a 17” (diagonal) 1280 x 1024 display (approx. 98 dpi) located above two embedded infrared cameras. Analysis used the Studio software in conjunction with Microsoft Excel. Areas of Interest (AOIs) were created for the critical areas required by each task, and various metrics were exported to Excel for further analysis and plotting.

Figure 3. Tobii T60 eye tracker with integrated display.

2.1.3 Graph Stimuli
Three graph types were compared in this study: clustered bar, line, and spider. Examples of each graph type are shown in Figure 4. All three used identical colors, fonts, screen spacing, categories, and data.

Figure 4. Examples of graph stimuli. (A) Clustered bar graphs, (B) Line graphs, and (C) Spider graphs.

Participants viewed graphs from four hypothetical job candidates at once, allowing us to compare between-graph and within-graph scanning. These were displayed within a 15 x 15 cm area at the center of the screen. Each individual graph was displayed within a 7.6 x 7.6 cm (~294 x 294 pixels) square, subtending 8.7 arc degrees in height and width at a 50 cm viewing distance. Each graph contained two data sets: the dark blue series represents hypothetical job requirements along an unspecified quantitative dimension. The lighter green series represents values assigned to hypothetical job candidates, plotted on categorical axes, where a-h represent job dimensions such as communication, leadership, teamwork, etc. Due to the nature of the graphs, the size of the data-representing elements differed somewhat: the bars in the bar graphs were about 3 mm wide, while the data lines in the other two graph types were about 1.5 mm wide. A more effective visualization for understanding differences between job requirements and job candidate abilities would plot the differences between these two variables, rather than requiring the graph user to compute these differences visually. However, the present approach was used to better control the comparison between the three graph types.

2.1.4 Tasks
Participants completed six short tasks that were presented by the experimenter. They initially viewed slides with dummy data to familiarize them with each task type. This was followed by a sample task, then the experimental tasks. On each task, the participant viewed the task’s question, then was provided an opportunity to ask any questions for clarification. This was followed by the stimulus image for that task, containing four graphs of the same type. Once the participant verbally responded with the task’s answer, the slide was blanked, and the next task started. Eye movements were only collected while the participant scanned the graph slide for each task.

Each participant received the same six tasks, presented in the same order, as shown in Table 1. Tasks were categorized into two difficulty levels, based upon the number of dimensions to be searched and the number of candidates to consider. ‘Easy’ tasks considered 1 job candidate on 2 job dimensions; the appropriate graph had to be located first, then the data dimensions compared on the graph. ‘Hard’ tasks required a choice between 2 job candidates across 8 job dimensions; the two appropriate graphs had to be located first, then compared across each dimension. This categorization of difficulty is mainly attributable to memory load.

Table 1. Experimental Tasks.

#   Task Type   Graph Type   Task Question
1   Easy        Bar          Should Candidate 2 be hired, considering only job dimensions a and d?
2   Hard        Line         Based upon all job dimensions, which candidate should be hired: 1 or 2?
3   Easy        Spider       On which job dimension was candidate 4 stronger: a or b?
4   Hard        Bar          Which candidate exceeded the greatest number of job requirements: 2 or 4?
5   Hard        Spider       Which candidate had the fewest job dimensions that were below requirements: 2 or 4?
6   Easy        Line         On which job dimension was candidate 3 weaker: a or b?

2.1.5 Areas of Interest (AOIs)
Areas of Interest (AOIs) were developed for each task and graph type using Tobii Studio software. AOIs were defined for each required dimension that had to be compared. Easy tasks required two AOIs in a single graph, whereas hard tasks required eight AOIs in each of two graphs. AOIs for hard tasks are shown for line and spider graphs as colored regions in Figure 5. Per Section 1.2.2, padding was defined around each AOI to capture fixations that were close to each dimension, while leaving adequate white space between the AOIs. The area of each AOI in the line and bar graphs was about 5.5 cm², compared to 2.5 cm² in the spider graphs. While AOI size differed between the graph types, AOI sizes were consistent within each graph type. Note that in both bar and line graphs, AOIs were coded a-h from left to right. In spider graphs, AOIs were coded a-h clockwise, starting at 9:00.
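The stimulus geometry in Sections 2.1.2-2.1.3 can be checked with the standard visual-angle formula, angle = 2·atan(size / (2·distance)). The sketch below assumes objects centered on the line of sight and uses hypothetical function names. Treating each AOI as a square of the stated area is an additional simplifying assumption.

```python
import math


def visual_angle_deg(size_cm: float, viewing_distance_cm: float = 50.0) -> float:
    """Full visual angle subtended by an object centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * viewing_distance_cm)))


def cm_to_pixels(size_cm: float, dpi: float) -> float:
    """Convert a screen length in cm to pixels at the given resolution."""
    return size_cm / 2.54 * dpi


if __name__ == "__main__":
    print(f"graph: {visual_angle_deg(7.6):.2f} deg, {cm_to_pixels(7.6, 98):.0f} px")
    # Treat each AOI as a square of the reported area (a simplification).
    for kind, area_cm2 in [("line/bar AOI", 5.5), ("spider AOI", 2.5)]:
        print(f"{kind}: {visual_angle_deg(math.sqrt(area_cm2)):.1f} deg")
```

Under these assumptions, a 7.6 cm graph subtends about 8.69°, matching the 8.7° reported, and spans about 293 px at 98 dpi, close to the ~294 px stated. The line/bar and spider AOIs correspond to squares of roughly 2.7° and 1.8°. This leaves limited room relative to the gaze location errors discussed in Section 1.2.6.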

[Figure 6 plot: median completion time (sec, axis 0-30) by trial order 1-6 (bar/easy, line/hard, spider/easy, bar/hard, spider/hard, line/easy); one value annotated "(56.5)".]

2.1.6 Hypotheses
A few hypotheses can help guide the analyses. These are based upon our expectations for searching the graphs:
• Line and bar graphs should be equally effective for reading individual values, but both should be superior to spider graphs. Spider graph axes are at many orientations, which makes it harder to read values across dimensions.
• Spider graphs should support better comparison across multiple dimensions than line and bar graphs, due to the emergent shape formed by the spider graph lines.
• No difference is expected between the graph types for finding the correct graph on a page, as each page contained four graphs of the same type marked with large gray job candidate numbers.
• Line crossings that do not represent meaningful data in line and spider graphs may cause visual confusion, contributing to scanning error.

2.2 Results
Many approaches can be taken in the analysis of eye tracking data, from supporting qualitative observations to mathematical modeling and prediction of transitions among defined AOIs. Here, we supplement task completion time data with eye tracking data to obtain evidence relevant to the preceding hypotheses.

2.2.1 Completion Times
Task completion time was measured from the initial display of the graph image to the participant’s response. Figure 6 shows median completion times across participants as a function of trial order. Graph type and condition difficulty are also indicated, along with error bars. Single-factor ANOVAs on median completion times revealed that Trial Order was significant (F(5,24) = 2.9, p.1). Tukey’s pairwise comparison test revealed that the significant trial order effect was due to significantly faster completion time in Trial 6 (line graphs, easy tasks) than in Trial 2 (line graphs, hard tasks). Although initial tasks were completed more slowly than later tasks, there was a marginal tendency for faster completion time in the easier tasks. Note that participants (eventually) provided correct responses on all tasks, so speed-accuracy tradeoffs were not investigated.
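The reported degrees of freedom can be reproduced from the design. With five participants each contributing one completion time per trial, a single-factor ANOVA that treats the six trial-order levels as independent groups has 6 − 1 = 5 and 30 − 6 = 24 degrees of freedom, consistent with F(5,24). A repeated-measures analysis would instead have 20 error degrees of freedom. The sketch below computes F for such a between-groups design from first principles. `one_way_anova` is a hypothetical name; SciPy's `scipy.stats.f_oneway` offers an equivalent library routine. The sketch does not reproduce the study's data, which are not given here.

```python
from typing import Sequence, Tuple


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a single-factor ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-groups sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

For example, `one_way_anova([[1, 2, 3], [4, 5, 6]])` yields an F of 13.5 with 1 and 4 degrees of freedom. Passing six groups of five values, as in this study's trial-order analysis, yields 5 and 24.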

Figure 6. Median completion times, as a function of trial order. Additional labels show graph type and condition difficulty. Error bars indicate ±1 SD, and hatching indicates the hard tasks.

2.2.2 Scanning in Easy Tasks
Two AOIs had to be visually compared to complete the easy tasks. Figure 7 shows the median time to first fixation on each of the two critical AOIs. Bar graphs required several more seconds than the other two graph types. Also, once the first critical region was fixated, the second was usually fixated within 4 seconds.

Figure 5. AOIs defined for line (left) and spider (right) graphs, for hard tasks.

[Figure 7 plot: median first fixation time (sec, axis 0-12) for the first and second critical regions, by graph type (Bars, Lines, Spider).]

Figure 7. Median first fixation times in the two critical AOIs from easy tasks. Bar graphs required several additional seconds to initially scan both critical regions.

The uncertainty time between the second AOI’s first fixation time and task completion is shown in Figure 8. The bar graphs exhibited twice as much uncertainty as the line graphs, while spider graphs fell between these. The high degree of uncertainty with the bar graphs could have been due to their early position in the trial order. Example scanpaths can illustrate differences in scanning strategies between the three graph types for this task. Figure 9 illustrates participant 4’s scanpaths in the easy tasks, for each graph type. Although each task only required a single graph, all four presented bar graphs were viewed, whereas only a single spider graph and a single line graph were viewed. The bar graphs seemed to promote greater confusion as to the location of the candidate’s graph (Candidate 2, here). It is possible that the greater data ink required for the bar chart increased this confusion and uncertainty, particularly since the ink is highly organized into shapes with multiple parallel edges. Both spider and line graphs showed substantial rescanning of the critical regions, but both exhibited much more compact and productive scanpaths than the bar graphs.

[Figure 8 plot: median uncertainty time (sec, axis 0-14) by graph type (Bars, Lines, Spider).]

Figure 8. Median uncertainty times from easy tasks, defined by the difference between completion and second target first fixation times. Error bars indicate ±1 SD.

Figure 9. Scanpaths from participant 4, easy tasks. (A) Bar graphs, (B) Line graphs, and (C) Spider graphs.

Figure 10. Median First Fixation Times in hard tasks. Graph 1 was always to the left of, and scanned prior to, Graph 2.
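The two easy-task metrics used above can be computed per trial from an AOI-labeled fixation list. In this sketch, each fixation is a hypothetical `(aoi_name, start_time_s)` pair, with None for fixations outside every AOI. Uncertainty time follows the Figure 8 definition: completion time minus the first fixation on the later of the two critical AOIs. Medians are used, per Section 1.2.5. Skipping participants who never fixated an AOI is an assumption that the text does not address.

```python
from statistics import median
from typing import List, Optional, Sequence, Tuple

Fix = Tuple[Optional[str], float]  # (AOI name or None, fixation start time in s)


def first_fixation_time(fixations: Sequence[Fix], aoi: str) -> Optional[float]:
    """Start time of the first fixation in `aoi`, or None if never fixated."""
    for name, t in fixations:
        if name == aoi:
            return t
    return None


def median_first_fixation(trials: Sequence[Sequence[Fix]], aoi: str) -> Optional[float]:
    """Median first-fixation time across participants who fixated `aoi`."""
    times = [t for t in (first_fixation_time(f, aoi) for f in trials) if t is not None]
    return median(times) if times else None


def uncertainty_time(fixations: Sequence[Fix], completion: float,
                     critical: Sequence[str]) -> Optional[float]:
    """Completion time minus the first fixation on the last critical AOI reached."""
    firsts: List[Optional[float]] = [first_fixation_time(fixations, a) for a in critical]
    if any(t is None for t in firsts):
        return None  # a critical AOI was never fixated
    return completion - max(firsts)


if __name__ == "__main__":
    trial = [(None, 0.2), ("a", 1.0), ("d", 3.5), ("a", 4.0)]
    print(uncertainty_time(trial, 10.0, ["a", "d"]))
    print(median_first_fixation([trial, [("a", 2.0)], [("d", 1.0)]], "a"))
```

Rank-ordering AOIs by the values from `median_first_fixation` gives the "order of AOIs" metric from Section 1.2.5, with the amplifying and compressing effects noted there.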

2.2.3 Scanning in Hard Tasks

The hard tasks required direct comparison of eight corresponding regions between two different graphs. To show aggregate trends among participants, the median time to first fixation across participants is plotted in Figure 10 for each AOI. Because two graphs had to be fixated in each task, the graphs are labeled by position: Graph 1 was always to the left of (and fixated earlier than) Graph 2. Several observations are apparent from this figure:

• The second viewed bar graph was generally scanned left to right (except for the first AOI), but the first viewed bar graph was not consistently scanned in any particular direction.

• Line graphs did not promote a left-to-right scanning strategy, and there was much more back-and-forth scanning between the two graphs’ lines than between the two bar charts.

• Spider graphs promoted much more back-and-forth comparison between the two graphs than the other two graph types did. This was clearly evident in the scanpaths of participants 1 and 4 (Figure 11), which show much more cross-graph scanning with the spider graphs than with the bar graphs.
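Two of the observations above, left-to-right scanning within a graph and back-and-forth scanning between graphs, can also be summarized per scanpath rather than read from median first fixation times. The Python sketch below is one possible way to do this, not the analysis used in the study. It assumes each AOI is labeled with a hypothetical (graph, dimension index) pair whose dimension index increases from left to right; it counts transitions that jump between graphs, and it scores the order of first fixations within one graph as the fraction of dimension pairs visited left to right. The concordance score is only meaningful for bar and line graphs, since a spider graph has no left-most dimension.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

# Hypothetical AOI label: (graph_id, dimension_index); for bar and line graphs the
# dimension index is assumed to increase from left to right along the axis.
AOI = Tuple[int, int]


def collapse_repeats(scanpath: Sequence[Optional[AOI]]) -> List[AOI]:
    """Drop fixations outside any AOI and merge consecutive fixations in the same AOI."""
    out: List[AOI] = []
    for aoi in scanpath:
        if aoi is not None and (not out or out[-1] != aoi):
            out.append(aoi)
    return out


def cross_graph_transitions(scanpath: Sequence[Optional[AOI]]) -> int:
    """Number of AOI-to-AOI transitions that jump from one graph to the other."""
    path = collapse_repeats(scanpath)
    return sum(1 for a, b in zip(path, path[1:]) if a[0] != b[0])


def left_to_right_concordance(scanpath: Sequence[Optional[AOI]],
                              graph_id: int) -> Optional[float]:
    """Fraction of dimension pairs in one graph first fixated in left-to-right order.

    1.0 means strictly left to right, 0.0 strictly right to left; None if fewer
    than two dimensions of that graph were fixated.
    """
    first_order: List[int] = []
    for graph, dim in collapse_repeats(scanpath):
        if graph == graph_id and dim not in first_order:
            first_order.append(dim)
    # combinations() keeps input order, so each pair is (visited earlier, visited later).
    pairs = list(combinations(first_order, 2))
    if not pairs:
        return None
    return sum(1 for earlier, later in pairs if earlier < later) / len(pairs)


if __name__ == "__main__":
    path = [(1, 0), (1, 1), (2, 0), (1, 2), (2, 1), (2, 2)]
    print(cross_graph_transitions(path))
    print(left_to_right_concordance(path, graph_id=1))
```

For example, the scanpath [(1, 0), (1, 1), (2, 0), (1, 2), (2, 1), (2, 2)] has three cross-graph transitions and a concordance of 1.0 for both graphs, whereas reading Graph 1 from right to left would give 0.0. Merging repeated fixations first keeps a long dwell on one AOI from inflating the transition counts.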

Figure 11. Scanpaths from participants 1 (green) and 4 (red) in hard tasks, using (A) Spider graphs, and (B) Bar graphs.

3. DISCUSSION

The choice of the most appropriate information graphic for a dataset is a complex topic, requiring an understanding of users, tasks, and myriad other factors [1, 8]. In the present study, we narrowed the research focus to three graph types (bar, line, and spider) to better understand which graph type supports more efficient comparison. Eye tracking can provide valuable insight into the evaluation of visualizations, but it should not be applied without a thorough understanding of its limitations. Many of these limitations, such as gaze location error, fixation definition, and metric definition, were presented here. Despite these challenges, careful interpretation of eye tracking results can supplement usability evaluations. As a strategy, traditional task completion times from usability evaluations can be used to pinpoint areas for more detailed eye tracking analysis. While the present evaluation included task completion time, it added eye tracking-based metrics to provide additional insight into user strategies: median time to first fixate each AOI, and uncertainty time between initially viewing an AOI and completing a task. Eye tracking can also provide many other metrics to supplement a detailed analysis of user strategies while completing tasks.

Qualitative scanpath analysis provided additional insight into, and verification of, user comparison strategies. Studying each scanpath can be time consuming, especially in studies with many factors and/or participants. However, new software tools that enable automated comparison of scanpaths to find matching scanning strategies are on the horizon [5].

Bar graphs showed some disadvantage here, relative to line and spider graphs, for comparisons between dimensions within a graph. This was due both to (1) finding the correct graph, and (2) uncertainty after fixating both required dimensions. Although participants received a practice trial, the fact that bar graphs were always presented in the first experimental trial may have hindered performance. Another explanation is that the bar graphs contained more data ink than the other graph types, causing participants to confuse graph labels, category labels, and data values [13]. Even after fixating both required areas, participants continued scanning the bars for a significant time before responding, perhaps verifying the differences between the candidate and the job requirements. The additional data ink in the bars may also have made it more difficult to relocate the dimensions after scanning other areas than in the other graph types. A further possibility is that a bar graph presents a large mass of parallel lines, which may overload the visual system, making it more difficult to associate a label with its data value. Further visual confusion may also have resulted from the gap between bar clusters being the same size as the bar widths.
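The automated scanpath comparison mentioned in this discussion can be illustrated with string edit distance: each AOI is coded as a letter, and two scanpaths are compared by the minimum number of insertions, deletions, and substitutions needed to turn one letter string into the other. This generic technique is swapped in here only as an example; it is not necessarily the clustering and aggregation method of [5], and it ignores fixation durations and the spatial distance between AOIs. The function names below are hypothetical.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two AOI sequences (unit cost for every edit)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if AOIs match)
        prev = curr
    return prev[-1]


def scanpath_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - distance / length of the longer scanpath; 1.0 means identical AOI sequences."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Each letter is a hypothetical AOI code; each string is one participant's scanpath.
    print(edit_distance("ABCD", "ABDC"))
    print(scanpath_similarity("ABCD", "ABDC"))
```

With AOIs coded as letters, scanpath_similarity("ABCD", "ABDC") is 0.5, because two substitutions are needed across four fixated AOIs. Pairwise similarities of this kind could then be used to group participants whose scanning strategies match.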
The bar graph disadvantage did not hold for the harder comparison tasks across two graphs and multiple dimensions. Tasks with spider and bar charts were completed in similar times, with line graphs associated with longer times for two participants. In these tasks, the re-verification scanning seen with bar graphs in the easier conditions was required for all graph types. Median first fixation times on AOIs indicate that the strategy for completing the harder task differed between spider graphs and the other graph types. With line or bar graphs, participants tended to read all dimensions within one graph before reading the dimension values in the compared graph. With spider graphs, there was much more back-and-forth comparison between the same dimensions of the two graphs. First fixations on spider graph dimensions also varied from one spider graph to the other. A likely explanation is that, compared with the linear axes of bar and line graphs, it was harder to scan in a circle on the spider graphs while also comparing data differences. Circular scanning, compared with linear scanning, requires more coordination of fine ocular muscles, and therefore may require a greater number of time-consuming corrective eye movements. A related possibility is that it may be difficult to remember which dimensions have already been compared on a spider graph, because there is no visual cue for a left-most or right-most data value as there is with a bar or line graph.

As a follow-up to the present exploratory study, a large-scale eye tracking study of scanning between information graphics could provide valuable design guidelines. In addition to graph type, such a study could manipulate the distance between data points and axis labels, the width of bars/lines, the space between bars/lines, the number of dimensions, the number of data sets, and quantitative versus categorical dimensions. Unlike in the present study, presentation order should be carefully counterbalanced to eliminate any learning bias.

The present work points to the value of eye tracking-derived metrics when evaluating visualizations. Eye tracking provides information about scanning among defined AOIs, although significant effort may be required to tally the appropriate metrics to answer task-related questions. We do not recommend eye tracking as a universal addition to usability evaluations. Rather, with appropriate task design and targeted analysis metrics, it can provide a more focused tool to investigate micro-level design issues such as element visibility, clarity, and navigation.

4. REFERENCES

[1] Bertin, J. 1983. Semiology of Graphics. Madison, WI: U. Wisconsin Press.
[2] Conati, C., and Maclaren, H. 2008. Exploring the role of individual differences in information visualization. AVI’08, ACM Press, pp. 199-206.
[3] Few, S. 2005. Keep Radar Graphs Below the Radar – Far Below. http://www.perceptualedge.com/articles/dmreview/radar_graphs.pdf
[4] Few, S. 2006. Information Dashboard Design. O’Reilly Press.
[5] Goldberg, J.H., and Helfman, J. 2010 (in press). Scanpath clustering and aggregation. ACM/ETRA 2010, ACM Press.
[6] Goldberg, J.H., and Kotval, X.P. 1999. Computer interface evaluation using eye movements: Methods and constructs. International Journal of Industrial Ergonomics, 24: 631-645.
[7] Goldberg, J.H., and Wichansky, A.M. 2003. Eye tracking in usability evaluation: A practitioner’s guide. In Hyönä, J., Radach, R., and Deubel, H. (Eds.), The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research. Elsevier Science Publishers, pp. 493-516.
[8] Harris, R.L. 1996. Information Graphics: A Comprehensive Illustrated Reference. New York: Oxford Univ. Press, pp. 320-321.
[9] Huang, W. 2007. Using eye tracking to investigate graph layout effects. APVIS 2007: Int. Asia-Pacific Symp. on Visualization, pp. 97-100.
[10] Huang, W., and Eades, P. 2005. How people read graphs. APVIS 2005: Int. Asia-Pacific Symp. on Visualization, pp. 51-58.
[11] Just, M.A., and Carpenter, P.A. 1976. Eye fixations and cognitive processes. Cognitive Psychology, 8: 441-480.
[12] Salvucci, D.D., and Goldberg, J.H. 2000. Identifying fixations and saccades in eye-tracking protocols. ACM/ETRA 2000, ACM Press, pp. 71-78.
[13] Tufte, E.R. 1983. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.