
2013 Humaine Association Conference on Affective Computing and Intelligent Interaction

Measuring Emotional Arousal for Online Applications: Evaluation of Ultra-Short Term Heart Rate Variability Measures

Kristina Schaaff

Marc T. P. Adam

FZI Research Center for Information Technology Karlsruhe, Germany Email: [email protected]

Karlsruhe Institute of Technology Karlsruhe, Germany Email: [email protected]

Abstract—The objective of this paper is to examine the possibilities and limitations of heart rate variability (HRV) as an indicator of emotional arousal for mobile applications which require online biofeedback. In contrast to offline classification, feature extraction for online applications sets other requirements to the window size in which data is analyzed as the delay between a change of a person’s arousal level and the reaction of an application should be as short as possible. For this purpose we compare various HRV features in order to evaluate how far window size can be decreased to enable online arousal recognition. Using data from a study where high and low arousal were induced in a game scenario, HRV features are analyzed for their discriminatory power depending on the window size using Fisher’s discriminant analysis. Moreover, we use these features to train an SVM based classifier. Results indicate that for some features it is possible to use ultra-short term window sizes, i.e. window sizes shorter than the 5 minute window which has traditionally been used for short term HRV analysis.

I. INTRODUCTION

Emotions have an important influence on our behavior in everyday life. They facilitate our interactions in social environments, shape our expectations and influence our performance in various kinds of tasks. For this reason, more and more human computer interfaces aim at integrating information about the emotional state of the user. In learning environments, too, information about the emotional state can help to improve the learning outcome. The concept of emotion aware learning environments was first introduced by Rosalind Picard in 1995 [1]. Especially when learning is done in a mobile environment, information about the emotional state of the user can help to incorporate changing environmental conditions. For instance, task difficulty can be adjusted to the current emotional state of the user or results can be presented in relation to results when a user has been in a similar state [2].

The circumplex model of affect [3] describes emotions in a two-dimensional space which is spanned by the two axes valence and arousal. In this paper we will focus on the recognition of emotional arousal. Previous research found that emotional arousal is strongly related to learning performance. According to the Yerkes-Dodson-Law there exists an optimal arousal level for a high learning performance. There are different interpretations of the original publication. One way to interpret the findings is depicted in Figure 1. It illustrates that while a very high arousal level can be advantageous in simple tasks, for difficult tasks it can cause a decrease in performance [4]. To achieve the best learning performance, it is therefore crucial to detect the point of optimal arousal level during a task, and provide online biofeedback based on ultra-short time windows.

Traditional methods like questionnaires and interviews have often been used to collect information about the emotional state. However, these methods can only be used to assess the conscious level of emotional processing and are difficult to apply in dynamic environments. In particular, when using self-report questionnaires and interviews, the subjects (i) need to be interrupted during their task or (ii) need to report their experience after the task is over and (iii) answers can be influenced by subjective factors. In contrast, physiological measurements can be assessed continuously in the very moment of emotional processing. Moreover, they are less prone to intentional manipulation. For these reasons, it is important to complement existing methods for assessing subjects’ emotional states with physiological measurements which also allow insight into the subconscious facets of human emotional experience [5]. With the technical advances in sensor technology, more and more physiological parameters can be assessed in an unobtrusive way and in field environments.

978-0-7695-5048-0/13 $26.00 © 2013 IEEE DOI 10.1109/ACII.2013.66

Fig. 1. The Yerkes-Dodson-Law [4]

During the last years, a lot of research has been done towards systems that are able to recognize arousal from physiological data. For instance, in [6] a recognition rate of 82% could be achieved on a 5 point arousal-scale using ECG, EDA, breathing rate, temperature and electromyographic signals. In [7] even higher recognition rates were achieved (89.73% on a continuous arousal scale with a bandwidth of 10%) using a neural network classifier on features extracted from EMG, EDA, skin temperature, blood volume pulse, electromyography and respiration. Building on these results, a lot of research has been done towards learning systems which consider affective information in a stationary learning environment (e.g. [8]–[10]). Other application areas such as games (e.g. [11]–[13]) have also been subject of investigation.

When assessing physiological signals in a mobile scenario, it is important that the sensors used for signal recording are as unobtrusive as possible. Therefore, multimodal approaches may be inappropriate as they require multiple sensors to capture all signals. To the best of our knowledge, so far there exist no recording devices which allow collecting multimodal physiological signals in an unobtrusive way over a longer period of time. For this reason, in the present study we focus solely on arousal recognition from ECG signals as there exist sensor solutions like the ekgMove ECG belt (see footnote 1) which can be worn throughout the day without disturbing a person.

Emotion recognition based on the ECG signal has so far mainly been done using features extracted directly from heart rate (HR). Heart rate variability (HRV), which refers to the beat-to-beat variations of HR, has often been disregarded since traditionally HRV analysis has usually been done using 5 minute or 24 hour ECG recordings [14]. Clearly, this duration is inappropriate for applications which require online feedback about the emotional arousal level of a user. However, analyzing HRV can help to obtain more detailed information about the interplay between the sympathetic and the parasympathetic nervous system. Therefore, researchers have started to evaluate even shorter time frames. First studies dealing with so-called ultra-short term HRV analysis indicate that window length can be decreased to shorter window sizes ranging from 10 to 60 seconds depending on the analyzed feature. While most of these studies dealt with the accuracy and reliability of ultra-short term HRV measures [15]–[17], other studies investigated how far ultra-short term measures can be used to detect mental stress [18], [19].

In the current paper we use ECG data from a study where high and low arousal were induced to find out to what extent different ultra-short term HRV features can be used to discriminate between emotional arousal levels of a person. Differences in discriminatory power will be analyzed and compared to the recommended window size of 5 minutes using Fisher’s discriminant ratio. In contrast to previous studies on ultra-short term HRV features, we will also evaluate how far the features are appropriate candidate features for classification based on support vector machines (SVM).

II. THEORETICAL BACKGROUND

HRV reflects the regulatory capacity of the autonomic nervous system (ANS). The ANS can be subdivided into two branches: the inhibitory parasympathetic nervous system and the excitatory sympathetic nervous system. Both systems interact antagonistically which causes the variations in interbeat intervals of subsequent heart beats. During relaxing periods, the parasympathetic nervous system is more active which decreases HR while arousal increases sympathetic activity and leads to an acceleration of HR [20]. A comprehensive overview of HRV and the relationship to other physiological processes can be found in [20].

With respect to HRV, a large number of different features can be analyzed. Analysis can either be done in the time or in the frequency domain. The most common features which are used for HRV analysis are described in the following. In the time domain, HRV features can be analyzed using statistical, geometrical or Poincaré plot-based methods. The following statistical features are commonly used to describe HRV:

• SDNN: Standard deviation of the NN intervals
• RMSSD: Square root of the mean of the squared differences between adjacent NN intervals
• pNNx: Number of pairs of adjacent NN intervals that differ by at least x ms, divided by the number of NN intervals in the corresponding time frame

For pNNx a value of 50 ms is most commonly used. However, there has been evidence that smaller values of x can help to improve analysis of HRV changes [21].

Geometrical measures refer to the distribution of the NN intervals in a histogram. The HRV triangular index is the integral of the density distribution divided by the maximum of the density distribution while TINN is the triangular interpolation of the NN interval histogram. Compared to other HRV features, geometrical measures have the advantage that they are relatively insensitive to artifacts [22]. HRV features in the time domain can also be computed based on the geometry of the Poincaré plot. Poincaré plots are used to illustrate the differences between adjacent NN intervals. The scatter plot provides information about the distribution of the NN intervals. The plot geometry is described by the standard deviations SD1 and SD2 illustrated in Figure 2. While SD1 reflects the instantaneous beat-to-beat variability, SD2 gives information about the standard deviation of continuous long-term NN intervals. In other words, higher HRV causes a decrease in SD1 and an increase in SD2. From the standard deviations, the SD1/SD2-ratio can be computed [23].

In the frequency domain, three main spectral components are commonly used for HRV analysis [14]:

• very low frequency (VLF): 0.003 - 0.04 Hz
• low frequency (LF): 0.04 - 0.15 Hz
• high frequency (HF): 0.15 - 0.4 Hz

While high power in the LF spectrum reflects mainly sympathetic activity, high power in the HF spectrum can be seen as an indicator for high parasympathetic activity. The LF/HF-ratio is a commonly used indicator which gives information about the interplay between the sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS).

Most of the features described above are usually computed for a window size of at least 5 minutes [14]. The only exceptions are the geometrical measures. For these features, the recording length of the signal used for analysis should be at least 20 minutes [14]. In the following, we will examine to what extent these features can be used as a meaningful indicator to discriminate emotional arousal for smaller window sizes.
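The statistical and Poincaré features described in this section can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the study: the function names are hypothetical, the sample standard deviation (n - 1) is an assumption, pNNx follows the wording above (differences of at least x ms, divided by the number of NN intervals; other definitions use a strict inequality or the number of differences), and SD1 and SD2 are computed with the common rotated-coordinate formulation of the Poincaré plot, which the paper does not spell out.

```python
import math


def _sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def hrv_time_domain(nn_ms, x_ms=50.0):
    """Time-domain and Poincaré HRV features for a list of NN intervals in ms."""
    if len(nn_ms) < 3:
        raise ValueError("need at least three NN intervals")
    pairs = list(zip(nn_ms, nn_ms[1:]))          # adjacent NN intervals
    diffs = [b - a for a, b in pairs]            # successive differences

    sdnn = _sample_sd(nn_ms)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    # Assumption: ">= x" and division by the number of NN intervals,
    # following the wording of the feature list above.
    pnnx = sum(1 for d in diffs if abs(d) >= x_ms) / len(nn_ms)

    # Poincaré plot: spread perpendicular to (SD1) and along (SD2)
    # the line of identity, obtained by rotating the points by 45 degrees.
    sd1 = _sample_sd([(b - a) / math.sqrt(2) for a, b in pairs])
    sd2 = _sample_sd([(b + a) / math.sqrt(2) for a, b in pairs])
    ratio = sd1 / sd2 if sd2 > 0 else None

    return {"meanNN": sum(nn_ms) / len(nn_ms), "SDNN": sdnn, "RMSSD": rmssd,
            "pNNx": pnnx, "SD1": sd1, "SD2": sd2, "SD1/SD2": ratio}


if __name__ == "__main__":
    example = [800, 810, 790, 805, 795]  # made-up NN intervals in ms
    for name, value in hrv_time_domain(example, x_ms=12).items():
        print(name, value)
```

Calling the function once per window (for example 15, 30, 60 or 300 seconds of NN intervals) yields one feature vector per window size, which is the quantity the remainder of the paper compares.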

1 http://www.movisens.de


Fig. 2. Poincaré plot for 5 minutes of ECG data during baseline

III. DATA COLLECTION

For this study, 144 subjects were recruited from a pool of undergraduate students using the ORSEE software environment [24]. 113 subjects were male and 31 female with a mean age of 22.86 years. Due to technical problems during the recording only 117 data sets could be used for analysis of ECG data.

A. Equipment

In the following we give an overview of the sensors and the software which were used to conduct the study and to analyze the data.

1) Hardware: To capture the ECG data for HRV analysis two different devices were used:

• The laboratory measurement system [25] which was developed at Karlsruhe Institute of Technology is a stationary system which can record several physiological signals simultaneously at a high signal quality.
• The varioport-e (see footnote 2) is a mobile recording system for ECG, photoplethysmographic (PPG) and electrodermal activity (EDA) data.

Neither system supports wireless real-time transmission of the recorded data, so they are not suitable for mobile biofeedback scenarios. Nevertheless, we decided to use these devices for the current study as we also wanted to capture PPG and EDA. Additionally, we recorded the force applied to a mouse using a force sensitive mouse. The results for mouse button force are published in [26]. In the present article, we focus specifically on the unpublished results regarding heart rate and heart rate variability.

2) Software: During the experiment, the participants faced an arousing game environment. The game environment aims at inducing different levels of arousal (see experimental procedure) and was developed in Java. The physiological data was recorded using the xAffect software framework for biosignal processing [27] (see footnote 3). The software has been designed for online affect recognition and biofeedback applications and can also be used to record and store multisensor data. The modular design of the xAffect software allows the easy integration of the required sensors and algorithms depending on the study requirements. Ex post data processing and classification for our analyses was done using the MATLAB software.

B. Game Design

The aim of the current study is to examine physiological changes in relation to different arousal levels. For this purpose, a game was designed to induce two different arousal levels. First, in the high arousal (HA) condition, the player faces a highly arousing game environment. Second, in the low arousal (LA) treatment, all arousing game elements are disabled.

The major task of the game is to continuously find a sequence of 5 symbols on the screen until the game ends after 10 minutes. The sequence to find is displayed in the center of the screen. On the lower screen there is a list of symbol sequences the player has to choose from. The number of symbol sequences to choose from depends on the treatment condition. In the LA treatment, there are 5 different symbol sequences the player has to choose from and relaxing music is played in the background. In the HA treatment, there are 20 different symbol sequences the player has to choose from and fast paced music is played in the background which increases during the game. The competitive scenario in the HA treatment aims at increasing the subjects’ arousal. The players get 20 points for a correct decision while they lose 10 points for an incorrect decision. If they do not decide for a symbol sequence for 7 seconds they automatically lose 30 points. Thus, taking no decision is worse than taking an incorrect decision. Finally, correct and incorrect decisions are followed by intense sounds. All these game features are deactivated in the LA treatment.

C. Experimental Procedure

The experiment is based on a between-subjects design, i.e., the subjects are randomly assigned to either play the HA or the LA game, but not both. For participating in the experiment, the players receive a monetary compensation. In the LA treatment, the subjects have to roll a die at the experiment. Depending on what number the die shows, the subject receives a monetary payoff between 0 EUR (number 6) and 15 EUR (number 1). In particular, the subjects cannot collect points in the game and their game performance has no influence on their individual payoff. In the HA treatment, however, a subject’s individual payoff depends on how many points he or she collected in the game in comparison to the other subjects in the same session of the experiment. If a subject gained more points than all other players of his or her session, the player receives a payoff of 15 EUR. If a subject gained fewer points than all others, the player receives a payoff of 0 EUR.

2 http://www.becker-meditec.de

3 http://www.xaffect.org

TABLE I. MEAN VALUES AND STANDARD DEVIATIONS FOR THE RESPECTIVE WINDOW SIZES AND ANOVA RESULTS FOR LA TREATMENT

Values per window size (in seconds) are given as mean (standard deviation).

| Feature | 15 | 30 | 60 | 300 | F-Value | SE | HSD95% |
|---|---|---|---|---|---|---|---|
| meanNN | 776.600 (127.745) | 777.137 (127.241) | 776.689 (127.145) | 774.924 (125.073) | 0.003 | 23.547 | 60.942 |
| SDNN | 39.840 (20.152) | 47.041 (23.506) | 48.984 (22.293) | 51.282 (17.577) | 3.207* | 3.905 | 10.108 |
| RMSSD | 34.403 (20.022) | 36.521 (19.877) | 36.682 (18.228) | 35.824 (17.358) | 0.175 | 3.511 | 9.086 |
| pNN12 | 0.619 (0.221) | 0.627 (0.205) | 0.629 (0.192) | 0.623 (0.187) | 0.029 | 0.037 | 0.097 |
| pNN20 | 0.476 (0.238) | 0.477 (0.224) | 0.471 (0.215) | 0.463 (0.209) | 0.049 | 0.041 | 0.101 |
| pNN50 | 0.147 (0.158) | 0.159 (0.162) | 0.160 (0.154) | 0.156 (0.140) | 0.083 | 0.029 | 0.074 |
| SD1 | 25.942 (13.970) | 27.194 (14.140) | 26.352 (12.875) | 25.429 (12.313) | 0.181 | 2.478 | 6.414 |
| SD2 | 47.517 (25.146) | 58.937 (31.186) | 62.881 (29.995) | 67.444 (22.704) | 5.584* | 5.102 | 13.206 |
| SD1/SD2 | 0.578 (0.218) | 0.493 (0.197) | 0.439 (0.173) | 0.370 (0.117) | 13.697*** | 0.033 | 0.087 |
| LF | - | 0.321 (0.141) | 0.305 (0.129) | 0.274 (0.103) | 2.114 | 0.023 | 0.055 |
| HF | - | 0.186 (0.104) | 0.156 (0.085) | 0.119 (0.065) | 8.595*** | 0.016 | 0.038 |
| LF/HF | - | 3.030 (3.209) | 2.997 (2.655) | 3.083 (2.159) | 0.015 | 0.503 | 1.189 |

* p < .05; ** p < .01; *** p < .001
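As a small worked check of how the HSD95% column of Table I can be read, the sketch below compares the HF means of the LA treatment pairwise against the listed threshold of 0.038. The values are copied from the table; the variable and function names are illustrative only.

```python
from itertools import combinations

# HF means for the LA treatment by window size in seconds (Table I).
hf_mean_la = {30: 0.186, 60: 0.156, 300: 0.119}
hsd_95 = 0.038  # HSD95% for the HF feature (Table I)


def significant_pairs(means, threshold):
    """Map each pair of window sizes to True if the means differ by more than the threshold."""
    return {
        (a, b): abs(means[a] - means[b]) > threshold
        for a, b in combinations(sorted(means), 2)
    }


pairs = significant_pairs(hf_mean_la, hsd_95)
for (a, b), is_significant in pairs.items():
    diff = abs(hf_mean_la[a] - hf_mean_la[b])
    print(f"{a} s vs {b} s: |difference| = {diff:.3f}, significant: {is_significant}")
```

Only the 30 s versus 300 s comparison exceeds the threshold for this feature; the 30 s versus 60 s and 60 s versus 300 s differences stay below it.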

IV. METHOD

This section describes the methodology which was used in this study to analyze the data. We start with a validation as to whether our study design did adequately induce high arousal in the participants. After some details about feature extraction, specific features are analyzed and rated for their discriminatory power using Fisher’s discriminant analysis. In the last step, a subject independent classifier is developed using a support vector machine (SVM).

A. System Validation

As a first step, it has to be validated whether the game is an appropriate method to induce emotional arousal. For this purpose we used the Self-Assessment Manikin (SAM) ratings which were filled in (i) after the baseline period and (ii) after playing the game. In order to test whether the two treatments in fact induced different levels of arousal, we analyze the reported arousal and valence scores after the baseline period and after playing the game. As expected, a set of Wilcoxon rank-sum tests reveals no significant difference in arousal after the baseline period (W = 5138.5, Z = −.342, p = .733) while there is a significant difference in arousal between the LA and the HA treatment after playing the game (W = 3952.0, Z = −5.126, p < .001). In other words, the subjects’ reported arousal levels after playing the game are significantly higher for the HA treatment than they are for the LA treatment. By contrast, and as intended by our design, we observe no significant difference in valence between the two treatments, neither after the baseline period (W = 5214.5, Z = −.023, p = .982) nor after playing the game (W = 4989.0, Z = −.953, p = .341). Thus, in terms of SAM self-reports, the game manipulation in fact induces a significant increase in arousal with comparable levels of valence.

Another indicator that emotional arousal has been induced can be seen in the characteristics of the heart rate curve during the game. Figure 3 shows the mean relative changes of heart rate during the game compared to the baseline period averaged over all subjects for the respective treatment. For the HA treatment, there is a strong increase at the beginning of the game while for the LA treatment HR remains constantly at a low level during the gameplay. A t-test confirms significant differences between both treatments for relative HR changes during the game (t_{75.309;95%} = 9.624, p < .001).

Fig. 3. Heart rate characteristics during the game

B. Feature Extraction

For both treatments, features are extracted from the middle of the game. Before computing HRV features for the respective window sizes, R-peaks have to be detected in the raw ECG signal. For this purpose, we use the OSEA algorithm [28]. Artifact correction is done following the recommendations by [29]. For further computation of the HRV parameters we use only valid normal beats. Other beats like premature ventricular contractions are excluded from further analysis. The time intervals between two successive normal beats are referred to as NN-intervals.
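The NN-interval definition given under Feature Extraction can be illustrated with a short sketch. It assumes that R-peak detection and beat classification have already been done, so that each beat has a time stamp in seconds and a label ('N' for normal, anything else for excluded beats such as premature ventricular contractions). The label scheme, the function names and the choice of a window centered on the midpoint of the game are assumptions made for illustration; the paper only states that features are taken from the middle of the game.

```python
def nn_intervals(r_peak_s, labels):
    """Return (end_time_s, interval_ms) for successive beats that are both normal."""
    result = []
    beats = list(zip(r_peak_s, labels))
    for (t0, l0), (t1, l1) in zip(beats, beats[1:]):
        if l0 == "N" and l1 == "N":
            result.append((t1, (t1 - t0) * 1000.0))
    return result


def centered_window(intervals, start_s, end_s, window_s):
    """Select the NN intervals whose end time falls into a window centered between start_s and end_s."""
    mid = (start_s + end_s) / 2.0
    lo, hi = mid - window_s / 2.0, mid + window_s / 2.0
    return [nn for t, nn in intervals if lo <= t < hi]


if __name__ == "__main__":
    # Made-up example: the fourth beat is labelled as a premature ventricular contraction.
    peaks = [0.0, 0.8, 1.6, 2.0, 3.2, 4.0]
    beat_labels = ["N", "N", "N", "V", "N", "N"]
    nn = nn_intervals(peaks, beat_labels)
    print(nn)
    print(centered_window(nn, start_s=0.0, end_s=4.0, window_s=2.0))
```

Both intervals adjacent to the excluded beat are dropped, so an ectopic beat removes two intervals from the series rather than one.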

Time domain features can be computed directly from the NN-intervals. For our analysis, we compute SDNN, RMSSD, pNN12, pNN20 and pNN50. Additionally, SD1, SD2 and the SD1/SD2-ratio describing the Poincaré geometry are computed. Geometrical measures are excluded from our analysis as the minimum recommended duration of 20 minutes makes them inappropriate for short term analysis [14]. This is also confirmed by the results presented in [18], [19]. Time domain features are computed for the following window sizes: 15, 30, 60 and 300 seconds. To be able to analyze data in the frequency domain, data has to be transformed to the frequency domain. For this purpose, the Lomb transform [30], [31] is used as this method can handle unevenly sampled data and therefore, no resampling of the HR data is required. In the frequency domain we analyze 30, 60 and 300 seconds as the lowest frequency which can be resolved for a window size of 15 seconds is 0.067 Hz (1/15 s), which is above the lower threshold of LF (0.04 Hz). VLF is excluded from the analysis as the required spectrum cannot be fully resolved when analyzing ultra-short time windows. Additionally, we compute meanNN as the mean of the NN-intervals within the respective window. This is the reciprocal of HR and will be used as a reference for the HRV features.

There are large interpersonal differences within healthy adults for HRV which are influenced by many factors such as age, sex or physiological condition [32]. As the study utilizes a between-subject design, for further processing we use the normalized features $f_{\mathrm{norm}}$ which are obtained by normalizing the feature from the game $f_{\mathrm{game}}$ using the corresponding features from the five minute baseline period $f_{\mathrm{base}}$ for every subject:

$$ f_{\mathrm{norm}} = \frac{f_{\mathrm{game}} - f_{\mathrm{base}}}{f_{\mathrm{base}}} \tag{1} $$

For some of the pNN20 and pNN50 features there occurred zero values during the baseline period. These features were excluded from further analysis. Therefore, in the following for pNN20 and pNN50 $N_{HA} = 58$ and for pNN50 $N_{LA} = 53$ while for all other features $N_{HA} = 59$ and $N_{LA} = 58$.

C. Evaluation of Features

Table I and II summarize the mean values and corresponding standard deviations of the extracted features for the different window sizes in the LA and HA treatment, respectively. Moreover, the tables list the standard errors (SE) of the mean values as well as ANOVA results testing for the influence of window size on the different features. Finally, HSD95% indicates Tukey’s honest significant difference (HSD) for an alpha level of 5%. This value indicates the threshold above which the difference between two means is significant at the 5% level (see footnote 4). For RMSSD, pNNx, SD1 and LF/HF-ratio there are only small variations between different window sizes. However, for other features like SDNN, SD2, SD1/SD2, LF and HF there are large differences between the computed values for the different window sizes. Moreover, mean values of the features from the frequency domain indicate that these features might not be appropriate to discriminate between different arousal levels as the mean values of the respective frequency domain features are quite similar. In general, differences between window sizes are larger for the HA treatment than for the LA treatment.

TABLE II. MEAN VALUES AND STANDARD DEVIATIONS FOR THE RESPECTIVE WINDOW SIZES AND ANOVA RESULTS FOR HA TREATMENT

Values per window size (in seconds) are given as mean (standard deviation).

| Feature | 15 | 30 | 60 | 300 | F-Value | SE | HSD95% |
|---|---|---|---|---|---|---|---|
| meanNN | 636.361 (124.810) | 638.082 (124.265) | 640.425 (125.079) | 639.639 (121.034) | 0.012 | 22.795 | 58.987 |
| SDNN | 26.805 (16.356) | 30.455 (15.765) | 34.013 (16.347) | 40.218 (16.962) | 7.192*** | 3.013 | 7.796 |
| RMSSD | 21.900 (16.143) | 23.034 (15.870) | 23.203 (15.093) | 24.493 (15.514) | 0.271 | 2.883 | 7.461 |
| pNN12 | 0.405 (0.258) | 0.417 (0.280) | 0.426 (0.270) | 0.427 (0.258) | 0.077 | 0.051 | 0.131 |
| pNN20 | 0.264 (0.231) | 0.271 (0.245) | 0.281 (0.240) | 0.281 (0.231) | 0.068 | 0.045 | 0.116 |
| pNN50 | 0.078 (0.108) | 0.077 (0.108) | 0.075 (0.111) | 0.077 (0.108) | 0.006 | 0.021 | 0.054 |
| SD1 | 16.644 (11.980) | 17.023 (11.243) | 16.846 (10.721) | 17.430 (10.956) | 0.052 | 2.069 | 5.353 |
| SD2 | 32.380 (20.117) | 38.506 (19.912) | 44.255 (21.393) | 53.606 (22.470) | 10.901*** | 3.866 | 10.005 |
| SD1/SD2 | 0.550 (0.301) | 0.456 (0.224) | 0.390 (0.214) | 0.321 (0.116) | 10.504*** | 0.043 | 0.110 |
| LF | - | 0.373 (0.170) | 0.326 (0.151) | 0.234 (0.115) | 13.514*** | 0.027 | 0.064 |
| HF | - | 0.190 (0.097) | 0.156 (0.090) | 0.115 (0.073) | 10.941*** | 0.016 | 0.038 |
| LF/HF | - | 3.077 (3.974) | 3.601 (5.076) | 3.208 (2.893) | 0.264 | 0.751 | 1.776 |

* p < .05; ** p < .01; *** p < .001

4 For instance, the HSD95% value is .038 for the HF feature in the LA treatment. This means that a difference between two means is significant if it is greater than .038. In this example, the HF value for 30 seconds is significantly different from the HF value for 300 seconds (.186 − .119 = .067 > .038). However, the HF value for 60 seconds is not significantly different from the HF value for 300 seconds (.156 − .119 = .037 < .038).

To gather further insights about the ability of the HRV features to separate between the two arousal classes we compute Fisher’s discriminant ratio (FDR) for every feature. FDR describes the ratio of the between-class variance to the within-class variance. It is defined as follows:

$$ \mathrm{FDR} = \frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2} \tag{2} $$

where $\mu_1$ and $\mu_2$ are the means and $\sigma_1^2$ and $\sigma_2^2$ the variances of classes 1 and 2.

TABLE III. FISHER’S DISCRIMINANT RATIO FOR ALL FEATURES

| Feature | FDR15 | FDR30 | FDR60 | FDR300 |
|---|---|---|---|---|
| meanNN | 1.494 | 1.500 | 1.412 | 1.525 |
| SDNN | 0.337 | 0.443 | 0.404 | 0.318 |
| RMSSD | 0.376 | 0.376 | 0.419 | 0.321 |
| pNN12 | 0.290 | 0.318 | 0.313 | 0.377 |
| pNN20 | 0.098 | 0.101 | 0.061 | 0.097 |
| pNN50 | 0.025 | 0.013 | 0.014 | 0.017 |
| SD1 | 0.373 | 0.424 | 0.405 | 0.323 |
| SD2 | 0.298 | 0.402 | 0.366 | 0.303 |
| SD1/SD2 | 0.000 | 0.002 | 0.011 | 0.040 |
| HF | - | 0.020 | 0.004 | 0.007 |
| LF | - | 0.008 | 0.003 | 0.163 |
| LF/HF | - | 0.000 | 0.000 | 0.007 |

Table III shows the FDR values for all features and the respective window sizes. The results show that for most features, class separability does not decrease when the window of the analyzed data is shortened. Moreover, it can be seen that not all features show good class separability for the two classes. Especially for features from the frequency domain, class separability is very low compared to the other HRV features. According to the FDR, the best class separability is given by the mean NN interval.
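The two formulas used for feature evaluation, the baseline normalization (1) and Fisher’s discriminant ratio (2), can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the sample variance is used for the class variances (the paper does not say whether sample or population variance was used), and a zero baseline is rejected because the normalization is undefined in that case.

```python
from statistics import mean, variance


def normalize_feature(f_game, f_base):
    """Relative change of a feature with respect to its baseline value, as in Eq. (1)."""
    if f_base == 0:
        raise ValueError("baseline value is zero; normalization is undefined")
    return (f_game - f_base) / f_base


def fisher_discriminant_ratio(class_1, class_2):
    """Squared difference of class means over the sum of class variances, as in Eq. (2)."""
    mu_1, mu_2 = mean(class_1), mean(class_2)
    var_1, var_2 = variance(class_1), variance(class_2)  # assumption: sample variance
    return (mu_1 - mu_2) ** 2 / (var_1 + var_2)


if __name__ == "__main__":
    # Made-up (game, baseline) meanNN values in ms for a few subjects per treatment.
    la = [normalize_feature(g, b) for g, b in [(780, 800), (760, 770), (810, 820)]]
    ha = [normalize_feature(g, b) for g, b in [(640, 800), (650, 780), (630, 810)]]
    print(fisher_discriminant_ratio(la, ha))
```

A feature with well separated class means and small within-class spread gets a large ratio, which is how the values in Table III are read.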

results obtained from an SVM indicate that a shorter window size does not decrease recognition rates for most of the HRV features. However, recognition rates for the data we used for our investigations are quite low ranging from chance level for frequency domain features to 80% for meanNN. A possible explanation is that classification is performed subject independent, i.e. the classification generalizes over all subjects. HRV parameters are influenced by a large variety of different factors such as age and gender [35] or expertise [36]. Therefore, using HRV parameters for analysis of different arousal states might be better suited in a scenario where a classifier is trained in a subject-dependent way. Besides meanNN—which is more a measure of HR than of HRV—pNN12, pNN20 and RMSSD seem to be very promising features for arousal classification based on HRV as classification results and class separability are pretty stable for different window sizes.

D. Classification For classification we use a support vector machine (SVM) with a linear kernel. For this purpose, libSVM [33] was integrated into MATLAB. Before classification, all features are normalized such that values range from 0 to 1. Due to the amount of available data sets, training and classification are performed using a 10-fold cross-validation. The penalty parameter C is optimized separately for each feature. Based on the recommendations from [34] we use the following values for optimization: C = 2−5 , 2−4 , ..., 215 . In table IV classification results for the ultra-short window size and the window size of 300 seconds are illustrated. As a small window size is very important for online classification of psychophysiological data we use a window size of 15 seconds for time domain features. Window size for the frequency domain features is set to 30 seconds due to the limitations from the frequency transformation explained above. TABLE IV. Feature meanNN SDNN RMSSD pNN12 pNN20 pNN50 SD1 SD2 SD1/SD2 HF LF LF/HF

VI.

In the current study, we investigated to what extent features computed from HRV can be used for online classification of emotional arousal. While HR is a well known indicator for emotional arousal and has already been used as an onlinemeasure of arousal (e.g. [13]), to the best of our knowledge no studies have been conducted using ultra-short term HRV features for arousal feedback, although it is a wellknown indicator for the interplay between the sympathetic and parasympathetic nervous system. To evaluate to what extent shorter durations than the typically used window of 5 minutes can be used for classification, we collected data from two treatment groups: one for high and one for low arousal. Results show, that especially pNN12, pNN20, RMSSD and SD1 are good candidate features for classification for ultra-short window sizes such as 15 seconds. In our study we used a subject independent classifier. We suppose that by using a subject dependent classifier, i.e. personalizing the arousal recognition to one user, a higher accuracy can be achieved. Therefore, it seems promising for future research to include building subject dependent classifiers which are able to distinguish more than two arousal levels. Combining the results with HR based arousal recognition, robust classifiers can be designed which might even be robust enough to be used in a mobile environment.

C LASSIFICATION RESULTS FOR SINGLE FEATURES window size [sec]

ultra-short

reference (300 sec)

15 15 15 15 15 15 15 15 15 30 30 30

0.838 0.675 0.667 0.709 0.718 0.496 0.658 0.667 0.487 0.496 0.487 0.487

0.829 0.624 0.709 0.701 0.744 0.496 0.709 0.624 0.624 0.496 0.607 0.538

As indicated by the FDR, the frequency domain features do not show good class separability and therefore, classification results for both window sizes are around chance level. Also pNN50 and SD1/SD2-ratio show bad classification results close to chance level. RMSSD, pNN12 and pNN20 show comparatively good results. However, they are still outperformed by meanNN which only reflects the mean HR but not HRV. Combination of multiple features does not help to further improve recognition rates. V.

C ONCLUSIONS AND F UTURE W ORK

Using HRV for classifying emotional arousal opens new possibilities for biofeedback applications which are designed for mobile environments, because there already exist unobtrusive ECG sensors which can be worn throughout the day. As described above, using online feedback about the emotional arousal can help to increase learning performance by adapting the task difficulty to the current arousal level of a user. Besides that, there exists a large number of other applications for online arousal recognition such as trading scenarios as emotions have also been found to influence our economic decisions [37], [38]. Future research can therefore also investigate to which extent ultra-short term HRV measures and live biofeedback can help users with taking “better” decisions.

D ISCUSSION

For online biofeedback applications it is important that feedback is presented without long delay. Therefore, algorithms are required which are able to extract the required information from very short time windows. In the current study, we evaluate how far features based on HRV can be reliably used for classification based on shorter time windows than the 5 minute window recommended in [14]. Our findings are in accordance with previous findings where RMSSD turned out to be more appropriate for ultra-short term HRV analysis than frequency based features (e.g. [19]). So far, pNN12 and pNN20 have not been analyzed for this purpose but they turned out to be very well suited for ultra-short term analysis. Classification

As a lot of research is currently also directed towards even less obtrusive sensor devices that can be integrated into everyday items, solutions for recording ECG signals that do not even require an additional sensor might soon become available. This again illustrates the need for robust algorithms to extract information about the users’ emotional arousal level from physiological signals.
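To illustrate the kind of ultra-short term feature extraction discussed in this paper, the following is a minimal sketch of how meanNN, RMSSD and pNNx could be computed from the most recent NN intervals. It is not the implementation used in the study. The function names `latest_window` and `time_domain_features` are hypothetical, the input is assumed to be artifact-free NN intervals in milliseconds, and pNNx is assumed to be normalized by the number of successive differences.

```python
import math


def latest_window(nn_ms, window_s):
    """Return the most recent NN intervals (ms) that fit into window_s seconds.

    Hypothetical helper: the window is filled backwards from the newest beat
    until adding one more interval would exceed the window length.
    """
    limit_ms = window_s * 1000.0
    total_ms = 0.0
    selected = []
    for nn in reversed(nn_ms):
        if total_ms + nn > limit_ms:
            break
        total_ms += nn
        selected.append(nn)
    selected.reverse()  # restore chronological order
    return selected


def time_domain_features(nn_ms, pnn_thresholds=(12, 20, 50)):
    """Compute meanNN, RMSSD and pNNx from a sequence of NN intervals (ms)."""
    if len(nn_ms) < 2:
        raise ValueError("need at least two NN intervals")

    # Differences between successive NN intervals
    diffs = [b - a for a, b in zip(nn_ms, nn_ms[1:])]

    features = {
        # Mean NN interval: reflects mean heart rate, not variability
        "meanNN": sum(nn_ms) / len(nn_ms),
        # Root mean square of successive differences
        "RMSSD": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

    # pNNx: percentage of successive differences larger than x ms
    # (assumption: normalized by the number of differences)
    for x in pnn_thresholds:
        count = sum(abs(d) > x for d in diffs)
        features[f"pNN{x}"] = 100.0 * count / len(diffs)

    return features
```

In an online setting, one would call `time_domain_features(latest_window(nn_ms, window_s))` each time a new beat is detected, with `window_s` set to the ultra-short window length under evaluation.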

REFERENCES

[1] R. Picard, “Affective computing,” MIT Media Laboratory, Perceptual Computing Section, Tech. Rep. 321, November 1995.
[2] K. Schaaff, “Enhancing mobile working memory training by using affective feedback,” in IADIS International Conference on Mobile Learning, 14-16 March 2013, Lisbon, Portugal, 2013.
[3] J. A. Russell, “A circumplex model of affect,” Journal of Personality and Social Psychology, vol. 39, pp. 1161–1178, 1980.
[4] D. M. Diamond, A. M. Campbell, C. R. Park, J. Halonen, and P. R. Zoladz, “The temporal dynamics model of emotional memory processing: a synthesis on the neurobiological basis of stress-induced amnesia, flashbulb and traumatic memories, and the Yerkes-Dodson law,” Neural Plasticity, vol. 2007, pp. 1–33, 2007.
[5] M. T. P. Adam, M. Gamer, J. Krämer, and C. Weinhardt, “Measuring emotions in electronic markets,” ICIS 2011 Proceedings, Shanghai, China, 2011.
[6] A. Lichtenstein, A. Oehme, S. Kupschick, and T. Jürgensohn, “Comparing two emotion models for deriving affective states from physiological data,” in Affect and Emotion in Human-Computer Interaction, C. Peter and R. Beale, Eds. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 35–50.
[7] A. Haag, S. Goronzy, P. Schaich, and J. Williams, “Emotion recognition using bio-sensors: First steps towards an automatic system,” Lecture Notes in Computer Science, vol. 3068, pp. 33–48, 2004.
[8] S. K. D’Mello, R. W. Picard, and A. C. Graesser, “Towards an affect-sensitive autotutor,” Special issue on Intelligent Educational Systems – IEEE Intelligent Systems, vol. 22, no. 4, pp. 53–61, 2007.
[9] A. Sarrafzadeh, S. Alexander, F. Dadgostar, C. Fan, and A. Bigdeli, “‘How do you know that I don’t understand?’ A look at the future of intelligent tutoring systems,” Comput. Hum. Behav., vol. 24, no. 4, pp. 1342–1363, Jul. 2008.
[10] B. Woolf, W. Burleson, I. Arroyo, T. Dragon, D. Cooper, and R. Picard, “Affect-aware tutors: recognising and responding to student affect,” Int. J. Learn. Technol., vol. 4, no. 3/4, pp. 129–164, Oct. 2009.
[11] G. N. Yannakakis and J. Hallam, “Entertainment modeling through physiology in physical play,” International Journal of Human-Computer Studies, vol. 66, no. 10, pp. 741–755, 2008.
[12] G. N. Yannakakis, H. P. Martínez, and A. Jhala, “Towards affective camera control in games,” User Modeling and User-Adapted Interaction, vol. 20, no. 4, pp. 313–340, Oct. 2010.
[13] P. Jerčić, P. J. Astor, M. T. P. Adam, O. Hilborn, K. Schaaff, C. A. Lindley, C. Sennersten, and J. Eriksson, “A serious game using physiological interfaces for emotion regulation training in the context of financial decision-making,” ECIS 2012 Proceedings, vol. 20, 2012.
[14] Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology, “Heart rate variability: Standards of measurement, physiological interpretation, and clinical use,” Circulation, vol. 93, no. 5, pp. 1043–1065, 1996.
[15] U. Nussinovitch, K. P. Elishkevitz, K. Katz, M. Nussinovitch, S. Segev, B. Volovitz, and N. Nussinovitch, “Reliability of ultra-short ECG indices for heart rate variability,” Ann Noninvasive Electrocardiol, vol. 16, no. 2, pp. 117–122, Apr. 2011.
[16] E. B. Schroeder, E. A. Whitsel, G. W. Evans, R. J. Prineas, L. E. Chambless, and G. Heiss, “Repeatability of heart rate variability measures,” Journal of Electrocardiology, vol. 37, no. 3, pp. 163–172, Jul. 2004.
[17] T. Thong, K. Li, J. McNames, M. Aboy, and B. Goldstein, “Accuracy of ultra-short heart rate variability measures,” in Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE, vol. 3, 2003, pp. 2424–2427.
[18] D. Kim, Y. Seo, S.-h. Kim, and S. Jung, “Short term analysis of long term patterns of heart rate variability in subjects under mental stress,” in Proceedings of the 2008 International Conference on BioMedical Engineering and Informatics - Volume 02, ser. BMEI ’08. Washington, DC, USA: IEEE Computer Society, 2008, pp. 487–491.
[19] L. Salahuddin, J. Cho, M. G. Jeong, and D. Kim, “Ultra short term analysis of heart rate variability for monitoring mental stress in mobile settings,” Conf Proc IEEE Eng Med Biol Soc, vol. 2007, pp. 4656–4659, 2007.
[20] U. Rajendra Acharya, K. Paul Joseph, N. Kannathal, C. M. Lim, and J. S. Suri, “Heart rate variability: a review,” Med Biol Eng Comput, vol. 44, no. 12, pp. 1031–1051, December 2006.
[21] J. Mietus, C.-K. Peng, and I. Henry, “The pNNx-files: Reexamining a widely used heart rate variability measure,” Heart, vol. 88, pp. 378–380, 2002.
[22] M. Malik, R. Xia, O. Odemuyiwa, A. Staunton, J. Poloniecki, and A. J. Camm, “Influence of the recognition artefact in automatic analysis of long-term electrocardiograms on time-domain measurement of heart rate variability,” Med Biol Eng Comput, vol. 31, no. 5, pp. 539–544, Sep. 1993.
[23] M. Tulppo, T. Makikallio, T. Takala, T. Seppanen, and H. Huikuri, “Quantitative beat-to-beat analysis of heart rate dynamics during exercise,” American Journal of Physiology, vol. 271, pp. H244–H252, 1996.
[24] B. Greiner, “An Online Recruitment System for Economic Experiments,” in Forschung und wissenschaftliches Rechnen, K. Kremer and V. Macho, Eds., 2004, pp. 79–93.
[25] A. Gharbi, S. Hey, L. Jatoba, U. Grossmann, J. Ottenbacher, C. Kuncoro, W. Stork, and K. Muller-Glaser, “System for body and mind monitoring in coaching process,” in Medical Devices and Biosensors, 2008. ISSS-MDBS 2008. 5th International Summer School and Symposium on, 2008, pp. 89–91.
[26] K. Schaaff, R. Degen, N. Adler, and M. T. P. Adam, “Measuring affect using a standard mouse device,” Biomed Tech, vol. 57, p. 1, 2012.
[27] K. Schaaff, L. Müller, M. Kirst, and S. Heuer, “xAffect - a modular framework for online affect recognition and biofeedback applications,” in 7th European Conference on Technology Enhanced Learning (ECTEL 2012), MATEL Workshop, Saarbrücken, Germany, 2012.
[28] P. Hamilton, “Open source ECG analysis,” in Computers in Cardiology, 2002, Sept. 2002, pp. 101–104.
[29] G. Clifford, Advanced Methods and Tools for ECG Data Analysis. Artech House, Boston, 2007, ch. ECG Statistics, Noise, Artifacts, and Missing Data, pp. 55–100.
[30] N. R. Lomb, “Least-squares frequency analysis of unequally spaced data,” Astrophysics and Space Science, vol. 39, no. 2, pp. 447–462, 1976.
[31] J. D. Scargle, “Study in astronomical time series analysis. II. Statistical aspects of spectral analysis of unevenly spaced data,” The Astrophysical Journal, vol. 263, pp. 835–853, 1982.
[32] D. Nunan, G. R. H. Sandercock, and D. A. Brodie, “A quantitative systematic review of normal values for short-term heart rate variability in healthy adults,” Pacing and Clinical Electrophysiology, vol. 33, no. 11, pp. 1407–1417, Nov. 2010.
[33] C.-C. Chang and C.-J. Lin, LIBSVM: a Library for Support Vector Machines, 2008, last updated: May 13, 2008.
[34] C. W. Hsu, C. C. Chang, and C. J. Lin, “A practical guide to support vector classification,” Department of Computer Science, Taipei, Tech. Rep., 2003, last updated: May 21, 2008.
[35] M. W. Agelink, R. Malessa, B. Baumann, T. Majewski, F. Akila, T. Zeit, and D. Ziegler, “Standardized tests of heart rate variability: normal ranges obtained from 309 healthy humans, and effects of age, gender, and heart rate,” Clinical Autonomic Research, vol. 11, no. 2, pp. 99–108, April 2001.
[36] M. Fenton-O’Creevy, J. T. Lins, S. Vohra, D. W. Richards, G. Davies, and K. Schaaff, “Emotion regulation and trader expertise: Heart rate variability on the trading floor,” Journal of Neuroscience, Psychology, and Economics, vol. 5, no. 4, pp. 227–237, 2012.
[37] M. T. P. Adam, J. Krämer, and C. Weinhardt, “Excitement up! Price down! Measuring emotions in Dutch auctions,” International Journal of Electronic Commerce, vol. 17, no. 2, pp. 7–39, 2012.
[38] M. T. P. Adam and E. Kroll, “Physiological evidence of attraction to chance,” Journal of Neuroscience, Psychology and Economics, vol. 5, no. 3, pp. 152–165, 2012.