Using Eye Region Biometrics to Reveal Affective and Cognitive States

Ric Heishman, Zoran Duric, Harry Wechsler
George Mason University, Department of Computer Science, Fairfax, VA 22030
{rheishman,zduric,wechsler}@cs.gmu.edu

Abstract

Various facial region biometrics have been used extensively in the areas of recognition and authentication. However, some regions of the face provide more information than is currently being fully utilized in these capacities. Biometrics associated exclusively with the eye region hold a key to identifying and classifying particular affective and cognitive states. This paper focuses on 1) methods for identifying and deriving the eye region biometric data that are most useful in specific HCI scenarios, and 2) outlining a framework for classifying these biometric data into affective and cognitive states relative to a particular HCI context.

1. Introduction

Various biometrics have been used extensively in the recognition and authentication of individuals. Features from the entire face region are collected and matched against patterns in existing data repositories to recognize individual subjects, and information from iris and pupil patterns can be used to further authenticate those individuals. However, features in particular areas of the human face provide more information than is currently being fully utilized in these capacities. Biometrics associated with the eye region hold the key to identifying particular affective and cognitive states of an individual subject within specific HCI contexts.

This paper focuses on 1) identifying and deriving the eye region biometric data that are most useful in specific HCI scenarios, and 2) outlining a framework for classifying these biometric data into affective and cognitive states relative to a particular HCI context. Identifying the appropriate biometric data requires an understanding of the known characteristics of the eye and eye region, and of the potential problems involved in interpreting features in this area of the human face. Deriving data from this region requires both existing and novel feature extraction techniques for acquiring information and observing patterns and feature interactions. Finally, the derived information can be mapped to specific affective and cognitive states using a rule-based multi-dimensional matrix. Our specific focus at this juncture is the identification and classification of cognitive engagement and fatigue of computer users involved in direct and indirect cognitive activities in a controlled lab environment.

The remainder of the paper is organized as follows: Section 2 briefly discusses existing efforts in this area of research; Section 3 presents the requisite set of biometrics; Section 4 describes the biometric processing; Section 5 provides an overview of the general framework for the analytical engine; and Section 6 presents our conclusions and plans for extended work in the area.

2. Related Work

Most computer vision work focused exclusively on the eye region centers on matters peripheral to our area of interest. Mainstream efforts typically involve gaze tracking, for determining the location of the subject's attention, or gaze control, for visual GUI interaction; the use of individual eye region biometrics in the identification and authentication of individual subjects; and the monitoring of pupillary response to various stimuli.

There are also efforts that more closely parallel ours in deriving cognitive and emotional states. A person's affective state can be correlated with visual features derived from images of the mouth, eye, and eye region [4, 7]. Some efforts that focus specifically on the recognition of affective states use Paul Ekman's Facial Action Coding System (FACS) and facial features to assemble facial expressions from component parts; others analyze neutral-content speech to recognize emotional expression, or employ a combination of similar strategies. Still other efforts examine certain aspects of facial biometrics to gauge driver fatigue [2] and basic cognitive engagement [6].

Our effort departs from those mentioned here in that it focuses exclusively on eye region biometrics to identify and characterize specific affective and cognitive states relative to a particular HCI context, without the use of formal coding mechanisms. It accomplishes this using a collection of specific eye region biometrics, novel applications, and a unique multi-dimensional framework.

3. Biometric Identification

The eye region contains many potential features of interest to the biometric researcher. The specific biometrics central to our effort are the irises, pupils, eyelids, eye folds, eyebrows, blink characteristics, and other parameters derived from these features. Figure 1 depicts the basic eye region biometrics relevant to our effort.

Figure 1. Basic Eye Region Biometric Diagram

These basic biometrics can then be used in certain combinations to provide additional derived biometrics, for example (a sketch of how such an area might be computed follows this list):

• Upper / Lower Eyelid Area (ULEA): The polygonal area between lines representing the upper and lower eyelids. This essentially represents the combined visible area of the iris, pupil, and sclera.

• Upper Fold / Lower Eyelid Area (UFLEA): The polygonal area between lines representing the upper eye fold and the lower eyelid. This provides an alternative perspective that affords additional insight into the state of the eye region.
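To make the derivation concrete, the following is a minimal sketch of how a derived area biometric such as ULEA might be computed from manually marked landmark points. The point format, function names, and normalization by the squared iris radius (which keeps the area dimensionless; the paper states only that derived biometrics are normalized by iris size) are our illustrative assumptions, not the authors' ERBPS implementation:

    import numpy as np

    def polygon_area(points: np.ndarray) -> float:
        """Area of a simple polygon via the shoelace formula.

        points: (N, 2) array of (x, y) vertices in order around the boundary.
        """
        x, y = points[:, 0], points[:, 1]
        return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

    def ulea(upper_lid: np.ndarray, lower_lid: np.ndarray, iris_radius: float) -> float:
        """Upper / Lower Eyelid Area (ULEA), normalized by the squared iris radius.

        upper_lid: marked points along the upper eyelid, ordered left to right.
        lower_lid: marked points along the lower eyelid, ordered left to right.
        The iris radius is a stable biometric, so dividing by it makes the
        area comparable across frames and subjects.
        """
        # Close the polygon: traverse the upper lid, then the lower lid reversed.
        boundary = np.vstack([upper_lid, lower_lid[::-1]])
        return polygon_area(boundary) / iris_radius**2

UFLEA would follow the same pattern, with the upper eye fold contour substituted for the upper eyelid.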

Additional insight into the state of the eye region can be obtained by examining different combinations of these basic biometrics in the manner described above.

The eye region features in Figure 1 can be split into two basic categories. The more novel of these consists of the pupils, which are uniquely characterized by the branch of psychology termed pupillometry. Pupillometry, the study of the pupillary response, has long focused on the information that can be derived from the pupil within certain contexts. Within this field, psychologists generally agree on a few observations relevant to the focus of this paper [3]. First, pupils dilate during increased cognitive activity and constrict, or return to some baseline, when the activity decreases. This pupil size variation is considered involuntary and therefore constitutes a valid index of autonomic nervous system activity. Additionally, there is evidence that the sporadic and somewhat constant motion of the pupil (referred to as pupillary unrest, or hippus) is more accentuated under conditions of fatigue or drowsiness. Various environmental conditions can also affect pupil size and response, including the amount and intensity of light in the subject's viewing area (particularly when the target of interest is itself a light source), the changing proximity of the subject to the point of focus, and effects from various emotional and audio input. The potential impact of these factors is controlled to the extent possible in our current effort and is therefore ignored.

A number of pupil biometrics may be employed in characterizing pupil response. In addition to basic pupil diameter, peak and minimum size are important in discerning the degree of the pupil response within a given context. Information can also be derived from the latency to peak size and from the pupil size variance (potentially the best measure of concentration). Our current effort focuses strictly on relative pupil size; a sketch of these measures follows this section's feature list.

The second category of eye region biometrics comprises all of the other listed features. These features have garnered somewhat more attention from researchers in our area of interest and are generally endorsed by the psychological literature as useful in this capacity [4]. Pertinent biometrics in this category, in addition to the derived biometrics described above, include:

• eyelid position relative to the iris
• eyebrow position relative to the iris
• occlusion of the iris by the upper and lower eyelids
• eye blink frequency and patterns
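As an illustration of the pupillometric measures above, the following is a minimal sketch of how peak size, minimum size, latency to peak, and size variance might be extracted from a per-frame pupil diameter series. The input format, baseline definition, and function name are assumptions for illustration; the current effort uses only relative pupil size:

    import numpy as np

    def pupil_metrics(diameters: np.ndarray, fps: float, baseline_frames: int = 30) -> dict:
        """Summarize a pupil-diameter time series (one sample per video frame).

        diameters: 1-D array of pupil diameters (pixels or mm).
        fps: video frame rate, used to convert frame indices to seconds.
        baseline_frames: initial frames averaged to form a resting baseline.
        """
        baseline = diameters[:baseline_frames].mean()
        relative = diameters / baseline           # relative pupil size, as used here
        peak_idx = int(np.argmax(diameters))
        return {
            "baseline": float(baseline),
            "peak_size": float(diameters.max()),
            "min_size": float(diameters.min()),
            "latency_to_peak_s": peak_idx / fps,      # latency to peak size
            "size_variance": float(relative.var()),   # candidate concentration index
            "relative_series": relative,
        }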

4. Biometric Processing

The first step was to identify those biometrics that produced meaningful and measurable responses within the prescribed HCI scenarios. To accomplish this, five subjects (Data Set 0) were digitally videotaped while actively engaged in various activities on a laptop computer in a controlled laboratory environment designed for video and image collection. These activities included reading, artistic challenges, puzzle solving, and daydreaming. A Canon ZR-20 digital video camcorder was used for color video collection, and reflective halogen lighting was used to maximize ambient light while minimizing glare and hotspots. The videotaping was restricted to the eye region of the subjects, both to maximize the resolution of the captured eye region biometrics and to ensure a degree of anonymity for the volunteer subjects. Each subject was maintained at a fixed distance from the video camera, and horizontal and vertical movement of the subject's eye region was voluntarily minimized. Normalization is provided by correlation between the right and left eyes and by iris size, which constitutes a constant (i.e., unchanging) biometric.

Adobe Premiere was used to capture the digital video to disk and to extract specific video frames for subsequent processing and analysis by the Eye Region Biometric Processing System (ERBPS). ERBPS, designed and written in Java, provides manual processing and analysis of the extracted color video frames: the user visually marks designated eye region biometrics, and the system captures the coordinates of the individual biometrics along with the marked image files. In subsequent real-time systems, ERBPS may be used for base-lining and manual system initialization.

The preliminary tests with Data Set 0 were used to determine whether the postulated biometrics would be useful in the identification and classification procedures, and they supported that assertion. They also demonstrated, however, that certain biometrics of individual subjects would not be as useful as anticipated; for example, very light eyebrows or underdefined lower eye folds could render the related derived biometrics unusable. Given this initial assessment of the proposed set of biometrics, five additional subjects were selected for the next stage of the process.

The five subjects of Data Set 1 each participated in four experimental sessions, designated FD (Fatigued/Disengaged), FE (Fatigued/Engaged), ND (Non-Fatigued/Disengaged), and NE (Non-Fatigued/Engaged). The fatigued sessions were held late in the evening after a full continuous day; the non-fatigued sessions were held early in the morning after a full night's rest (minimum of 8 hours). In the disengaged sessions, the subjects maintained continuous eye contact with the computer screen and were instructed to clear their minds (defocus) or daydream; these sessions averaged 15-20 minutes each. The engaged sessions involved the subjects exercising problem-solving skills on a specific graphical puzzle activity; these sessions averaged 20-30 minutes each. Figure 2 presents sample frames from each of the classification categories for a single subject. These frames were processed using ERBPS.

Figure 2. Sample frames from each of the classification categories for a single subject

Figure 2 supports the following observations:

• the disengaged frames (FD, ND) reflect smaller pupil diameters than the engaged frames (FE, NE)
• the non-fatigued frames (ND, NE) reflect increased iris occlusion relative to the fatigued frames (FD, FE)
• the non-fatigued frames (ND, NE) reflect a higher Eyebrow / Upper Fold area than the fatigued frames (FD, FE)

One way the iris occlusion used in these comparisons could be quantified is sketched below.
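The text does not define iris occlusion operationally. As a hedged sketch, one plausible geometric formulation treats each eyelid boundary as a chord across the iris circle and measures the fraction of the iris disc cut off; the chord approximation and the functions below are our illustration, not the authors' method:

    import math

    def segment_fraction(iris_radius: float, chord_distance: float) -> float:
        """Fraction of a circle's area cut off beyond a chord.

        chord_distance: signed distance from the iris center to the eyelid
        chord; positive means the chord lies beyond the center (less occlusion).
        """
        d = max(-iris_radius, min(iris_radius, chord_distance))
        segment = iris_radius**2 * math.acos(d / iris_radius) - d * math.sqrt(iris_radius**2 - d**2)
        return segment / (math.pi * iris_radius**2)

    def iris_occlusion(iris_radius: float, upper_lid_y: float, lower_lid_y: float,
                       iris_center_y: float) -> float:
        """Estimate the fraction of the iris occluded by the two eyelids.

        Assumes image y grows downward, with the upper lid above the center.
        """
        top = segment_fraction(iris_radius, iris_center_y - upper_lid_y)
        bottom = segment_fraction(iris_radius, lower_lid_y - iris_center_y)
        return min(1.0, top + bottom)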

Figures 3 and 4 present eye blink data for the five Data Set 1 subjects. Each figure depicts the four classification categories (FD, FE, ND, NE) for a single subject.

Figure 3. Eye blink data for Subjects 1 and 2 (blink rate, 0-25, across the fatigue/engagement categories ND, NE, FD, FE)

Figure 4. Eye blink data for Subjects 3-5 (blink rate, 0-25, across the fatigue/engagement categories ND, NE, FD, FE)

Figures 3 and 4 support the following observations:

• Subjects 1, 3, and 4 show a significant correlation in the patterns across the four categories
• Subject 2 exhibits blink patterns that are similar to those of Subjects 1, 3, and 4, although less exaggerated
• Subject 5 represents an individual with relatively consistent blink rates regardless of scenario
• baseline blink rates can vary significantly between subjects

The classification process can be automated using the following technique. Figure 5 depicts a 7x4 matrix representing the frames at which some change occurs (e.g., blink, look down, engage); we refer to these as key frames. The images were obtained by selecting a region of interest in the first frame and comparing that region to subsequent frames. A change was considered significant, and the frame tagged as a key frame, if the sum of squared differences (SSD) between the region of interest in the previous frame and the current frame was larger than an empirically chosen threshold T. The key frames can then be clustered to reduce their number and labeled so that changes in the images can be recognized as events. A minimal sketch of this key-frame detection step follows.
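The sketch below implements the SSD-based key-frame tagging just described, assuming grayscale frames as NumPy arrays and a fixed region of interest. The threshold T and the comparison against the previous frame follow the text; the function name and I/O conventions are illustrative assumptions:

    import numpy as np

    def key_frames(frames, roi, threshold_T):
        """Tag key frames where the region of interest changes significantly.

        frames: iterable of 2-D grayscale images (NumPy arrays) in temporal order.
        roi: (top, bottom, left, right) bounds of the region of interest,
             selected in the first frame.
        threshold_T: empirically chosen SSD threshold.

        Returns the indices of frames whose ROI differs from the previous
        frame's ROI by more than threshold_T (candidate blink/engage events).
        """
        t, b, l, r = roi
        tagged = []
        prev = None
        for i, frame in enumerate(frames):
            region = frame[t:b, l:r].astype(np.float64)
            if prev is not None:
                ssd = np.sum((region - prev) ** 2)  # sum of squared differences
                if ssd > threshold_T:
                    tagged.append(i)  # significant change: tag as key frame
            prev = region
        return tagged

The tagged indices could then be clustered (for example, by merging runs of consecutive indices) and labeled as events such as blinks, from which per-session blink rates like those in Figures 3 and 4 can be computed.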

Figure 5. Key Frame matrix for eye blink

5. Classification Framework

Fatigue and engagement are the affective and cognitive states of interest in the classification phase. As demonstrated in the previous section, numerous metrics can be utilized in the biometric analysis and classification of these states, some more useful than others for a given subject. At this stage of our work, the available biometric data sets are manually analyzed and the significant biometrics are used as input into the Fatigue Engagement (FE) Matrix. Figure 6 depicts individual subject FE matrices for the following derived biometrics:

• Upper / Lower Eyelid Area (ULEA)
• Upper Fold / Upper Eyelid Area (UFUEA)
• Iris Occlusion (IO)

Figure 6. Sample individual subject FE matrices: ULEA (top two rows), UFUEA (middle two rows), IO (bottom two rows); each panel plots normalized area or iris occlusion for the left and right eye in one of the four states (nonfatigued/engaged, fatigued/engaged, nonfatigued/disengaged, fatigued/disengaged)

These derived biometrics are normalized by the iris radius, which constitutes a stable biometric. Figure 6 suggests a few observations for this particular subject:

• the ULEA matrix shows that the ND state pattern is unique in range, and that the right eye tracks consistently lower in the NE and ND states
• the UFUEA matrix shows that the ND state pattern is also unique in range
• the IO matrix shows that the FE and ND state patterns track higher and lower (respectively) than the more consistent NE and FD state patterns

Finally, Figure 7 provides a sample FE matrix depicting the combined blink data for all test subjects of Data Set 1.

Figure 7. Fatigue Engagement (FE) Matrix populated with eye blink data from 5 subjects (blink rate per subject in each of the four session categories)
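The paper describes the FE Matrix as a rule-based, multi-dimensional classification structure but does not state its rules explicitly. The following is a hedged sketch of what one cell lookup over two of the observed trends (relative pupil size for engagement, blink rate for fatigue) might look like; the thresholds, per-subject baselines, blink-rate direction, and function are illustrative assumptions, not the authors' implementation:

    def classify_fe(pupil_rel: float, blink_rate: float,
                    pupil_baseline: float = 1.0, blink_baseline: float = 12.0) -> str:
        """Map two normalized biometrics to an FE Matrix quadrant.

        pupil_rel: pupil diameter relative to the subject's resting baseline
                   (engaged frames showed larger pupils than disengaged ones).
        blink_rate: blinks per minute; baseline rates vary between subjects,
                    so the comparison is against a per-subject baseline.
        Returns one of the four session categories: NE, ND, FE, FD.
        """
        engaged = pupil_rel > pupil_baseline      # larger pupil -> engaged
        fatigued = blink_rate > blink_baseline    # elevated blinking -> fatigued (assumed rule)
        if engaged:
            return "FE" if fatigued else "NE"
        return "FD" if fatigued else "ND"

In practice the paper's matrix combines more dimensions (ULEA, UFUEA, IO, and blink patterns), with the significant biometrics chosen per subject, so any deployed rule set would need per-subject calibration, as Subject 5's flat blink profile illustrates.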

6. Conclusions and Future Work

In this paper we have identified eye region biometrics that can be uniquely and collectively employed to characterize a single subject's specific affective and cognitive state within a particular HCI context (e.g., video security system monitoring). At this juncture, the results obtained via manual processing of static data sets (over 15 hours of color video across 10 subjects) indicate that the process is potentially useful within defined HCI contexts and thus provides a basis for expanded research in the area. Further, we have demonstrated that the classification process can be automated using the key frame technique.

Future work includes extended experimentation across additional subjects to further refine and stabilize the biometric behavior feedback using the key frame technique. Another interesting direction is the potential generalization of identified trends and characteristics across multiple subjects, which will require identifying and understanding anomalies across individuals and specific subgroups of subjects. There is also an opportunity to investigate the aspects of pupillometry that are currently unused, such as latency trends and peak minimum/maximum size, as they relate to and enhance the understanding of cognitive engagement.

7. References

[1] K. Grauman, M. Betke, J. Gips, and G. Bradski. Communication via Eye Blinks: Detection and Duration Analysis in Real Time. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, pages 1010-1017, 2001.

[2] H. Gu, Q. Ji, and Z. Zhu. Active Facial Tracking for Fatigue Detection. In Proc. Sixth IEEE Workshop on Applications of Computer Vision, pages 137-142, 2002.

[3] M. P. Janisse. Pupillometry. Hemisphere Publishing Company, Washington, 1977.

[4] M. L. Knapp and J. A. Hall. Nonverbal Communication in Human Interaction. 4th ed., Harcourt Brace College Publishers, Ft. Worth, Texas, 1997.

[5] T. Moriyama, T. Kanade, J. Cohn, J. Xiao, Z. Ambadar, J. Gao, and H. Imamura. Automatic Recognition of Eye Blinking in Spontaneously Occurring Behavior. In Proc. 16th International Conference on Pattern Recognition, volume 4, pages 78-81, 2002.

[6] M. Nakayama, K. Takashi, and Y. Shimizu. The Act of Task Difficulty and Eye-Movement Frequency for the Oculo-Motor Indices. In Proc. Eye Tracking Research and Applications Symposium, pages 37-42, 2002.

[7] R. Picard, E. Vyzas, and J. Healey. Toward Machine Emotional Intelligence: Analysis of Affective Physiological State. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(10):1175-1191, 2001.