Improving the Accuracy of Gaze Input

Manu Kumar, Jeff Klingner, Rohan Puranik, Terry Winograd, Andreas Paepcke
Stanford University, HCI Group
Gates Building, Room 382, 353 Serra Mall, Stanford, CA 94305, USA
{sneaker, klingner, rpuranik, winograd, paepcke}@stanford.edu

ABSTRACT

Using gaze information as a form of input poses challenges based on the nature of eye movements and how we humans use our eyes in conjunction with other motor actions. In this paper, we present three techniques for improving the feasibility of using gaze as a form of input. We first present a saccade detection and smoothing algorithm that works on real-time streaming gaze information. We then present a study which explores some of the timing issues of using gaze in conjunction with a trigger (key press or other motor action) and propose a solution for resolving these issues. Finally, we present the concept of Focus Points, which makes it easier for users to focus their gaze when using gaze-based interaction techniques. Though these techniques were developed for improving the performance of gaze-based pointing, their use is applicable in general to using gaze as a practical form of input.

ACM Classification: H5.2 [Information interfaces and presentation]: User Interfaces. – Input devices and strategies.

General terms: Human Factors, Algorithms, Performance, Design

Keywords: Eye Tracking, Gaze Input, Gaze-enhanced User Interface Design, GUIDe, Fixation Smoothing, Eye-hand coordination, Focus Points.

INTRODUCTION

The eyes are a rich source of information for gathering context in our everyday lives. A user's gaze is postulated to be the best proxy for attention or intention [14]. Using eye-gaze information as a form of input can enable a computer system to gain more contextual information about the user's task, which in turn can be leveraged to design interfaces which are more intuitive and intelligent. Our research explores the use of gaze information as a practical form of input by augmenting rather than replacing existing interaction techniques. We have developed gaze-enhanced interaction techniques for pointing and selection [5], scrolling [6], password entry [4] and other everyday computing tasks.


In implementing EyePoint [5], we found that while the speed of the gaze-based pointing technique was comparable to the mouse, the error rates were significantly higher. To address this problem we conducted a series of studies to better understand the source of these errors and identify ways to improve the accuracy of gaze-based pointing. In this paper we present the three most important methods for improving the accuracy and user experience of gaze-based pointing: an algorithm for real-time saccade detection and fixation smoothing, an algorithm for improving eye-hand coordination, and the use of focus points. These methods boost the basic performance of using gaze information in interactive applications and, in our application, made the difference between prohibitively high error rates and practical usefulness of gaze-based interaction.

SACCADE DETECTION AND FIXATION SMOOTHING

The challenge: Gaze data is noisy

Basic eye movements can be broken down into two types: fixations and saccades. A fixation occurs when the gaze rests steadily on a single point. A saccade is a fast movement of the eye between two fixations. However, even fixations are not perfectly stable: the eye jitters during fixations due to drift, tremor and involuntary micro-saccades [13]. This gaze jitter, together with the limited accuracy of eye trackers, results in a noisy gaze signal.

The prior work on algorithms for identifying fixations and saccades [9-11, 13] has dealt mainly with post-processing previously captured gaze information. For using gaze information as a form of input, it is necessary to analyze eye-movement data in real-time.

Saccade Detection and Fixation Smoothing

To smooth the data from the eye tracker in real-time, it is necessary to determine whether the most recent data point is the beginning of a saccade, a continuation of the current fixation, or an outlier relative to the current fixation. We use a gaze movement threshold, in which two gaze points separated by a Euclidean distance of more than a given saccade threshold are labeled as a saccade. This is similar to the velocity threshold technique described in [11], with two modifications to make it more robust to noise. First, we measure the displacement of each eye movement relative to the current estimate of the fixation location rather than to the previous measurement. Second, we look ahead one measurement and reject movements over the saccade threshold that immediately return to the current fixation. This prevents single outliers of the current fixation from being mislabeled as saccades. It should be noted that this look-ahead introduces a one-measurement latency (20ms for the Tobii 1750 eye tracker [12]) at saccade thresholds into the gaze data provided to the application.

The algorithm maintains two sets of points: the current fixation window and a potential fixation window. If a point is close to (within a saccade threshold of) the current fixation, then it is added to the current fixation window. The new current fixation is calculated by a weighted mean which favors more recent points (described below). If the point differs from the current fixation by more than a saccade threshold, then it is added to the potential fixation window and the current fixation is returned. When the next data point is available, if it was closer to the current fixation, then we add it to the current fixation and throw away the potential fixation as an outlier. If the data point is closer to the potential fixation, then we add the point to the potential fixation window and make this the new current fixation.

The fixation point is calculated as a weighted mean (a one-sided triangular filter) of the set of points in the fixation window. The weight assigned to each point is based on its position in the window. For a window with $n$ points ($P_0, P_1, \dots, P_{n-1}$) the mean fixation would be calculated by the formula:

$$P_{fixation} = \frac{1 \cdot P_0 + 2 \cdot P_1 + \dots + n \cdot P_{n-1}}{1 + 2 + \dots + n}$$

The size of the fixation window (n) is capped to include only data points that occurred within a dwell duration [7, 8] of 400-500ms (20 data points for our eye tracker). We do this to allow the fixation point to adjust more rapidly to slight drift in the gaze data.

Figure 1 shows the output from the smoothing algorithm for the x-coordinate of the eye-tracking data. We also show a Kalman filter applied to the entire raw gaze data and a Kalman filter applied in parts to the fixations only. A Kalman filter applied over the entirety of the raw gaze data smoothes over saccade intervals. The nature of eye movements, in particular the existence of saccades, necessitates that the smoothing function only be applied to fixations, i.e. within saccade boundaries. Applying the Kalman filter in parts to the fixations only does yield results comparable to the one-sided triangular filter discussed above. It is possible that applying a non-linear variant of the Kalman filter [1], or a better process model of eye movements for the Kalman filter, may yield better smoothing results. The advantage of our approach is that the algorithm is very simple and, most of all, it is tailored to account for the different forms of eye movements and also tolerates the noise in eye-tracking data.

While there is still room for improvement in the algorithm above by taking into account the directionality of the incoming data points, we found that our saccade detection and smoothing algorithm significantly improved the reliability of the results for applications which rely on the real-time use of eye-tracking data. A more detailed description of our algorithm, including pseudocode, is available in [3].
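For concreteness, the following Python sketch illustrates the two-window scheme and one-sided triangular filter described above. The class name, the 30-pixel saccade threshold, and the 50Hz sampling assumption are ours, not the authors'; their actual pseudocode is in [3].

```python
from collections import deque

class FixationSmoother:
    """Sketch of the two-window saccade detection and fixation smoothing
    scheme described above. Parameter values are illustrative assumptions."""

    def __init__(self, saccade_threshold_px=30.0, max_window=20):
        self.threshold = saccade_threshold_px      # assumed pixel threshold
        # maxlen caps each window at ~400-500ms of data (20 points at 50Hz)
        self.current = deque(maxlen=max_window)    # current fixation window
        self.potential = deque(maxlen=max_window)  # potential fixation window

    @staticmethod
    def _weighted_mean(window):
        """One-sided triangular filter: the i-th point gets weight i+1,
        so more recent points dominate the fixation estimate."""
        total = sum(range(1, len(window) + 1))
        x = sum((i + 1) * p[0] for i, p in enumerate(window)) / total
        y = sum((i + 1) * p[1] for i, p in enumerate(window)) / total
        return (x, y)

    @staticmethod
    def _dist(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    def update(self, point):
        """Feed one raw gaze sample (x, y); return the smoothed fixation."""
        if not self.current:
            self.current.append(point)
        elif self.potential:
            # Look-ahead: the previous sample exceeded the threshold, so
            # decide whether it was a lone outlier or a genuine saccade.
            if (self._dist(point, self._weighted_mean(self.current))
                    <= self._dist(point, self._weighted_mean(self.potential))):
                self.potential.clear()             # single outlier: discard
                self.current.append(point)
            else:
                self.potential.append(point)       # saccade confirmed:
                self.current = self.potential      # potential becomes current
                self.potential = deque(maxlen=self.current.maxlen)
        elif self._dist(point, self._weighted_mean(self.current)) <= self.threshold:
            self.current.append(point)             # continuation of fixation
        else:
            self.potential.append(point)           # over threshold: hold one
                                                   # sample before deciding
        return self._weighted_mean(self.current)
```

Feeding each raw (x, y) sample through update() would yield a smoothed stream like the one plotted in Figure 1, one sample behind the raw data at saccade boundaries.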

Figure 1. Results of our real-time saccade detection and smoothing algorithm. Note that the one-measurement look-ahead prevents outliers in the raw gaze data from being mistaken for saccades, but introduces a 20ms latency on saccade thresholds.

EYE-HAND COORDINATION

Our research on using a combination of gaze and keyboard for performing a pointing task [5] showed that error rates were very high. Additional work on using gaze-based password entry [4] showed that the high error rates existed only when the subjects used a combination of gaze plus a keyboard trigger. Using a dwell-based trigger exhibited minimal errors. These observations led us to hypothesize that the errors may be caused by a failure of synchronization between gaze and triggers.

To determine the cause and the number of errors we conducted two user studies with 15 subjects (11 male, 4 female, average age 26 years). In the first study, subjects were presented with a red balloon. Each time they looked at the balloon and pressed the trigger key, the red balloon popped and moved to a new location (Moving Target Study). In the second study, subjects were presented with 20 numbered balloons on the screen and asked to look at each balloon in order and press the trigger key (Stationary Targets Study). Subjects repeated each study twice, once optimizing for speed and trying to perform the task as quickly as possible, and the second time optimizing for accuracy and trying to perform the study as accurately as possible. The order of the studies was varied for counterbalancing and trials were repeated in case of an error.

An in-depth analysis of the data and classification of the errors from the two studies revealed multiple sources of error:

Tracking errors: caused by the eye tracker accuracy. These include cases in which the gaze data from the eye tracker is biased or when the location of the target closer to the periphery of the screen results in lower accuracy from the eye tracker [2].

Early-Trigger errors: caused because the trigger happened before the user's gaze was in the target area. Early triggers can happen because a) the eye tracker introduces a sensor lag of about 33ms in processing the user's eye gaze, b) the smoothing algorithm introduces an additional latency of 20ms at saccade thresholds, c) in some cases (as in the Moving Target study) the users may have only looked at the target in their peripheral vision and pressed the trigger before they actually focused on the target.

Late-Trigger errors: caused because the user had already moved their gaze on to the next target before they pressed the trigger. Late triggers can happen only in cases where multiple targets are visible on the screen, as in the Stationary Targets study or in gaze-based typing.

Other errors: these include a) smoothing errors caused when the smoothed data happened to be outside the target boundary, but the raw data point would have, by chance, resulted in a hit, b) human errors where the subject just was not looking at the right thing or the subject looked down at the keyboard before pressing the trigger.

Figure 2 illustrates the different error types. Figure 3 shows how often each type of error occurred in the two studies.

Figure 2. Sources of error in gaze input. Shaded areas show the target region. Example triggers are indicated by red arrows. The triggers shown are all different attempts to click on the upper target region. The trigger points correspond to: a) early trigger error, b) raw hit and smooth hit, c) raw miss and smooth hit, and d) late trigger error.
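The paper does not publish its analysis code, but a post-hoc classification along the lines of Figure 2 might look like the following Python sketch. The Rect helper, the decision order, and the before/after windows are our assumptions.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Axis-aligned target region in screen pixels (illustrative helper)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, p):
        return self.left <= p[0] <= self.right and self.top <= p[1] <= self.bottom

def classify_trigger(raw, smoothed, target, before, after):
    """Classify one trigger event against a known target region (sketch).
    `raw`/`smoothed` are (x, y) gaze points at trigger time; `before` and
    `after` are short lists of smoothed fixation points around the trigger."""
    if target.contains(smoothed):
        return "hit"                        # cases b) and c) in Figure 2
    if target.contains(raw):
        return "smoothing error"            # raw data would have hit by chance
    if any(target.contains(p) for p in after):
        return "early trigger"              # gaze reached the target after the press
    if any(target.contains(p) for p in before):
        return "late trigger"               # gaze had already left the target
    return "tracking or human error"        # e.g., eye tracker bias
```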

Figure 4. Simulation of smoothing and early trigger correction (ETC) on the speed task for the Moving Target Study shows that the percentage error of the speed task decreases significantly and is comparable to the error rate of the accuracy task.

To improve the accuracy of gaze-based pointing in the case of the speed task, we implemented an Early Trigger Correction (ETC) algorithm which delays trigger points by 80ms to account for the sensor lag, smoothing latency and peripheral vision effects. We simulated this algorithm over the data from the Moving Target study. Figure 4 shows the outcome from the simulated results. It should be noted that both smoothing and early-trigger correction by themselves actually increased the error rate, because smoothing introduces a latency that the early trigger would be correcting. The error rate in the speed task when using a combination of smoothing and early trigger correction approaches the error rate of the accuracy task, without compromising the speed of the task.

While we were able to identify late-trigger errors in the analysis of the data, it is difficult to distinguish a late trigger from an early trigger or even an on-time trigger without using semantic information about the location of the targets. Since our approach has focused on providing generally applicable techniques for gaze input which do not rely on application or operating system specific information, we did not attempt to correct for late triggers. We note that the use of semantic information about target locations has the potential to significantly improve the accuracy of gaze-based input by allowing the current fixation to be applied to the closest target.
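A simulation of ETC over logged data can be as simple as re-pairing each trigger with the fixation estimate 80ms later. The Python sketch below shows the idea; the function name and data layout are our assumptions, not the authors' implementation.

```python
def early_trigger_correction(triggers_ms, fixations, delay_ms=80):
    """Re-pair each trigger timestamp with the smoothed fixation ~80ms
    after the trigger, compensating for sensor lag (~33ms), smoothing
    latency (~20ms) and peripheral-vision effects. `fixations` is a
    time-sorted list of (timestamp_ms, (x, y)) smoothed samples."""
    corrected = []
    for t in triggers_ms:
        # Use the first fixation sample at or after the delayed timestamp;
        # fall back to the last sample if the log ends first.
        sample = next((p for ts, p in fixations if ts >= t + delay_ms),
                      fixations[-1][1])
        corrected.append((t, sample))
    return corrected
```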

Figure 3. Analysis of errors in the two studies shows that a large number of errors in the Speed Task happen due to early triggers and late triggers: errors in synchronization between the gaze and trigger events.

FOCUS POINTS

In our paper on gaze-based pointing [5], we introduced the use of Focus Points: a grid pattern of dots overlaid on the magnified view that contained the targets (see Figure 5). We hypothesized that focus points assist the user in making a more fine-grained selection by focusing the user's gaze, thereby improving the accuracy of the eye tracking. However, the studies in the EyePoint paper showed no conclusive effect of an improvement in tracking accuracy when using focus points.

To test this hypothesis further, we conducted a user study with 17 subjects (11 male, 6 female, average age 22). In the first part of the study subjects were shown a red balloon and asked to look at the center of the balloon. Once they had looked at the balloon for a dwell duration (450ms) the balloon automatically moved to a new location. In the second part, subjects repeated the study, but with the center point of the balloon clearly marked with a focus point. The order was varied and each subject was shown 40 balloons. The user's raw and smoothed gaze positions were logged for each balloon. At the end of the study users were presented with a 7-point Likert scale questionnaire which asked them which condition was easier and whether they found the focus point at the center of the balloon useful.

We computed the standard deviation of the Euclidean distance of each gaze point from the center point of the target. The results from the study show that within the bounds of the measurable accuracy of the eye tracker¹, the use of focus points did not have a significant impact in concentrating the user's gaze on the center of the target. The questionnaire results, however, indicate that subjects found the condition with the focus point easier and found the focus point to be useful when trying to look at the center of the target. These results are consistent with our findings in the original EyePoint paper [5].

We conclude that while the use of focus points may not measurably improve the accuracy of the raw gaze data from the eye tracker, they do indeed make pointing easier and provide a better user experience. This is illustrated by Figure 5, which shows two views of the magnified view from EyePoint. If the subject intends on clicking in the text area in the bottom right of the magnified view, the task is easier for the subject in the condition with focus points, since the focus points provide a visual anchor for the subject to focus upon.

Figure 5. Magnified view for gaze-based pointing technique with and without focus points. Using focus points provides a visual anchor for subjects to focus their gaze on, making it easier for them to click in the text box.
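The spread measure described above reduces to a few lines; this Python sketch (the function name is ours) computes the standard deviation of each gaze point's Euclidean distance from the target center.

```python
import math
from statistics import mean, stdev

def gaze_spread(gaze_points, center):
    """Spread of gaze around a target: mean and standard deviation of the
    Euclidean distance of each (x, y) gaze sample from the target center,
    in pixels. Requires at least two samples for stdev."""
    dists = [math.hypot(x - center[0], y - center[1]) for x, y in gaze_points]
    return mean(dists), stdev(dists)
```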

CONCLUSION

The techniques presented above improve the use of gaze for input by addressing changes in how we interpret the gaze data from eye trackers (saccade detection and smoothing), how to match gaze input with an external trigger (eye-hand coordination) and by introducing features which make it easier for the user to look at the desired target (focus points). The above techniques describe software changes that can be made at an application layer to improve the use of gaze as a form of input, and are orthogonal to any improvement in the underlying tracking technology that provides more accuracy and better tolerance of head movement from the eye tracker.

¹ The accuracy of the eye tracker is approximately 1˚ of visual angle, which provides a spread of 33 pixels in any direction (diameter of spread ~66 pixels).
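As a rough check of the footnote's numbers, the display geometry and viewing distance below are our assumptions (the Tobii 1750 uses a 17-inch 1280x1024 screen; ~50cm is a plausible seated viewing distance):

```python
import math

px_per_cm = math.hypot(1280, 1024) / (17 * 2.54)  # ~38 px/cm (assumed display)
distance_cm = 50                                   # assumed viewing distance
radius_px = distance_cm * math.tan(math.radians(1.0)) * px_per_cm
print(round(radius_px))  # ~33 px radius, i.e. ~66 px spread diameter
```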

REFERENCES

1. Arulampalam, M. S., S. Maskell, N. Gordon, and T. Clapp. A Tutorial on Particle Filters for Online Nonlinear/Non-Gaussian Bayesian Tracking. IEEE Transactions on Signal Processing 50(2), pp. 174-188, 2002.
2. Beinhauer, W. A Widget Library for Gaze-based Interaction Elements. In Proceedings of ETRA: Eye Tracking Research and Applications Symposium. San Diego, California, USA: ACM Press. pp. 53-53, 2006.
3. Kumar, M., GUIDe Saccade Detection and Smoothing Algorithm. Technical Report CSTR 2007-03, Stanford University, Stanford 2007. http://hci.stanford.edu/cstr/reports/2007-03.pdf
4. Kumar, M., T. Garfinkel, D. Boneh, and T. Winograd, Reducing Shoulder-surfing by Using Gaze-based Password Entry. Technical Report CSTR 2007-05, Stanford University, Stanford 2007. http://hci.stanford.edu/cstr/reports/2007-05.pdf
5. Kumar, M., A. Paepcke, and T. Winograd. EyePoint: Practical Pointing and Selection Using Gaze and Keyboard. In Proceedings of CHI. San Jose, California, USA: ACM Press, 2007.
6. Kumar, M., T. Winograd, and A. Paepcke. Gaze-enhanced Scrolling Techniques. In Proceedings of CHI. San Jose, California, USA: ACM Press, 2007.
7. Majaranta, P., A. Aula, and K.-J. Räihä. Effects of Feedback on Eye Typing with a Short Dwell Time. In Proceedings of ETRA: Eye Tracking Research & Applications Symposium. San Antonio, Texas, USA: ACM Press. pp. 139-146, 2004.
8. Majaranta, P., I. S. MacKenzie, A. Aula, and K.-J. Räihä. Auditory and Visual Feedback During Eye Typing. In Proceedings of CHI. Ft. Lauderdale, Florida, USA: ACM Press. pp. 766-767, 2003.
9. Monty, R. A., J. W. Senders, and D. F. Fisher, Eye Movements and the Higher Psychological Functions. Hillsdale, New Jersey, USA: Erlbaum, 1978.
10. Salvucci, D. D. Inferring Intent in Eye-Based Interfaces: Tracing Eye Movements with Process Models. In Proceedings of CHI. Pittsburgh, Pennsylvania, USA: ACM Press. pp. 254-261, 1999.
11. Salvucci, D. D. and J. H. Goldberg. Identifying Fixations and Saccades in Eye-Tracking Protocols. In Proceedings of ETRA: Eye Tracking Research & Applications Symposium. Palm Beach Gardens, Florida, USA: ACM Press. pp. 71-78, 2000.
12. Tobii Technology, AB, Tobii 1750 Eye Tracker, 2006. Sweden. http://www.tobii.com
13. Yarbus, A. L., Eye Movements and Vision. New York: Plenum Press, 1967.
14. Zhai, S. What's in the Eyes for Attentive Input. Communications of the ACM, vol. 46(3): pp. 34-39, March 2003.