Measuring response times to auditory stimuli during an audiometry

A. Fernandez, M. Ortega, B. Cancela, M.G. Penedo
VARPA Group, Department of Computer Science, University of A Coruña, A Coruña, Spain
[email protected]

C. Vazquez, LM Gigirey
Audiology Unit, University School of Optics and Optometry, University of Santiago de Compostela (USC), Santiago de Compostela, Spain
[email protected]

ABSTRACT

This paper provides a methodology for measuring a patient's reaction times to auditory stimuli during the performance of an audiometry. An audiometry is the standard way of analyzing the hearing of a patient in order to diagnose hearing loss. From a video sequence recorded during this test, the method is able to detect the instants when the expert sends the auditory stimulus and when the patient responds consciously to it by raising his hand, thus measuring the reaction time. The proposed method was tested on several video sequences from different individuals, yielding highly accurate results. The possibility of quantitatively measuring the reaction times will allow the experts to conduct several studies and to further complete the evaluation of their patients.

Categories and Subject Descriptors
I.4.9 [Image Processing and Computer Vision]: Applications

General Terms
Algorithms

Keywords
Audiogram, hearing responses, auditory stimuli, hand detection, skin color, Viola-Jones

1. INTRODUCTION

Population aging represents the most important demographic phenomenon of the twentieth century and is, nowadays, a global process [2, 4, 5]. Spain occupies an important place internationally in the foresight studies and projections made by the United Nations (UN), which places Spain as the world's oldest population by 2050, with nearly 40% of the population over the age of 60; by that date, the "old-age dependency ratio" will reach nearly 58% [1]. The greater longevity of the population implies an increase in the years lived with incapacity and disability. The latest available data show that in Spain there are 3,787,447 people older than 4 years with a disability or a limitation for the Activities of Daily Living (ADL) [3]. Of these, 58.8% are 65 years old or older. Furthermore, 290,530 people live in residential or gerontology centers or in other centers for people with disabilities; 87% of them are dependent (requiring the assistance of another person because they cannot fend for themselves), and two out of three Spanish dependents are 65 years old or older [7]. The latest report on aging by the General Foundation of the CSIC (October 2010) [5] states that the most common disabilities (and those which start at an earlier age) are the declines in the sensory abilities (hearing and vision), while those related to cognitive functions, communication and connection are the ones that grow fastest with age [7]. In turn, the studies of Mulrow et al. [13] and A. Davis [7] point out hearing loss as the disability most closely related to aging, and recent research (Arthur Wingfield et al., Brandeis University) reveals that hearing loss may compromise the cognitive ability of our elders. About 71 million European adults between 18 and 80 years old suffer a hearing loss above 25 dB HL (the hearing loss criterion recognized by the WHO), representing a total of more than 55 million people; in North America, this figure is around 35 million [1].


The Liminar Tonal Audiometry represents the "gold standard" in the evaluation of hearing capacity and of the prevalence of hearing problems [6]. However, the widespread use of audiometric tests involves some operational constraints, especially among population groups with "special needs" or disabilities. In addition, in recent years, data from self-assessment questionnaires (HHIE-S, SAC, QDS) have been used to determine the prevalence of certain chronic conditions such as hearing damage or hearing loss [12]. However, the results of these tests may be limited in certain populations by factors such as cultural level, lifestyle or cognitive state.

In this kind of test, the typical behavior is that the patient raises his hand to inform the expert that he has heard the auditory stimulus. Considering this, our proposal is based on motion analysis, focusing on the detection of the hand raising. Color is a useful feature for detecting body parts such as the face or the hands, because it provides information that is robust against rotation, scaling and partial occlusions [11]. Prior studies pointed out that the skin colors of different races fall in a compact region of the color spaces [16]. Partially related to the hand detection task, [9] details a methodology for locating the face and arms in color images, combining color information with region information and domain knowledge.

The aim of this work is to develop an objective screening system that allows the experts to evaluate the patient's response times. The quantification of these times will let them numerically corroborate the subjective impressions gained through their experience, carry out various studies, and evaluate their patients more fully and accurately.

This paper is organized as follows. Section 2 introduces the basics of the traditional protocol performed by the experts. Section 3 explains the steps of the methodology. Section 4 includes representative examples that show the capabilities of the method. Finally, Section 5 expounds the conclusions and intended future work.

2. THE TRADITIONAL PROTOCOL

Hearing loss is usually analyzed by the experts by performing a Liminar Tonal Audiometry (LTA) (Fig. 1). Audiograms are set out with frequency in hertz (Hz) on the horizontal axis and a linear dB scale on the vertical axis. Normal hearing is classified as being between -10 dB and 15 dB. The test involves different tones being presented at specific frequencies (pitch) and intensities (loudness).

Figure 1: Sample image of a clean audiogram.

Prior to any exploration, the volunteers are informed about the type of study that will be carried out, as well as the test to be performed and the potential benefits derived therefrom. When they agree to participate in the study, they declare their consent according to Spanish Law 15/1999 of 13 December on the Protection of Personal Data. The audiological protocol consists of an otoscopy followed by an LTA (air and bone conduction, bottom-up approach). The device used for the evaluation of the hearing thresholds was a Beltone Electronics audiometer with 510-CO-17 headphones. The auditory stimuli used were pure tones, and the frequency range tested was 0.25 kHz to 6 kHz. Since this audiometer (shown in Fig. 2) is an analog machine, there is no automatic way of knowing when an auditory stimulus has been sent, or the frequency and intensity of that stimulus; to automatically analyze the responses of the patients to the auditory stimuli, we need to know this information. To select the desired frequency and intensity, the expert manipulates the two symmetrical wheels set in the bottom part of the device; when he is ready to send the stimulus, he presses the blue touch-pad located between these two wheels. While the stimulus is being sent, the red light just above the touch-pad is on. In order to analyze the responses of the patients, it is very important to know the precise moment when the stimulus is being sent. Moreover, the patient is asked to raise his hand when he perceives the stimulus (the left hand if he perceives the sound in the left ear, and the right hand in the opposite case); this information, along with the moment when the stimulus is sent, will allow us to characterize the response of the patient.



Figure 2: Beltone Electronics audiometer.

As previously mentioned, the typical behavior is that the patient raises his hand when he hears the stimulus; this is also the most evident behavior and can be interpreted even by a non-expert (Figs. 3(a) and 3(b) show this typical scenario). Other unconscious reactions, such as gaze direction, head movements or frowning, require more experience to be interpreted and will be studied in future works. In this first approach, we start the research in this domain by automating part of the analysis of this process, which was until now totally manual, focusing on the most common and representative reaction: the hand raising. From this analysis, we will provide the experts with information that may be relevant for the evaluation of the patient.

Figure 3: Screenshots during the audiometries.

3. METHODOLOGY

A schematic representation of the methodology can be seen in Figure 4. The video sequences recorded during the audiometry are analyzed in two different directions. On the one hand, it is necessary to determine the moments when the auditory stimuli are sent. On the other hand, we need to detect when the patient raises his hand and, to this end, we use information about the location of the head. Finally, by combining the stimuli and reaction moments, we are able to analyze the patient's behavior on the issues that are relevant to the experts.

Figure 4: Schematic representation of the methodology.

3.1 Stimuli detection

As mentioned in Section 2, the fact that the audiometer is an analog device makes it crucial to automate the detection of the moments in which the auditory stimuli are being sent. Without this initial step, it would not be possible to correlate stimuli and responses, so any study of the patient's response times would be impossible. The way this step is carried out is further detailed in [10]. In summary, knowing that the red light just above the touch-pad turns on while a stimulus is being sent (Fig. 2), this step consists in locating this light by defining areas of interest and looking for image templates using normalized cross-correlation (represented in Fig. 5). These considerations are possible because we are working in a specific domain and we know that the experiments are recorded following a layout similar to the one shown in Fig. 5(a).
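As an illustration of this step, the sketch below locates the indicator in a frame with OpenCV's normalized cross-correlation. The file names and the choice of the lower third of the image as the area of interest are assumptions for illustration; the actual templates and regions are those described in [10].

    import cv2

    # Load a frame and a template of the red stimuli indicator
    # ("indicator_template.png" is a hypothetical file name).
    frame = cv2.imread("frame.png")
    template = cv2.imread("indicator_template.png")

    # Restrict the search to an area of interest; here, as an assumption,
    # the lower third of the image, where the audiometer lies.
    h = frame.shape[0]
    roi = frame[2 * h // 3:, :]

    # Normalized cross-correlation between the area and the template.
    scores = cv2.matchTemplate(roi, template, cv2.TM_CCORR_NORMED)
    _, best, _, loc = cv2.minMaxLoc(scores)

    # Location of the indicator in full-frame coordinates.
    x, y = loc[0], loc[1] + 2 * h // 3
    print("Indicator at (%d, %d), score %.3f" % (x, y, best))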

Figure 5: Steps towards the location of the stimuli indicator.

Once the stimuli indicator is located, a relative threshold is established to distinguish its status (on/off). By applying this criterion to all the frames in the sequence, we can determine the precise moments when the stimuli are sent to the patient, obtaining a stimuli signal.
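A minimal sketch of this thresholding is shown below: the mean brightness of the red channel inside the indicator region is compared, frame by frame, against a threshold relative to a baseline (off) value. The baseline estimate and the 1.5 factor are illustrative assumptions, not values taken from the paper.

    import cv2
    import numpy as np

    def stimuli_signal(video_path, x, y, w, h, factor=1.5):
        """Binary on/off signal for the indicator region (x, y, w, h)."""
        cap = cv2.VideoCapture(video_path)
        means = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Mean of the red channel (OpenCV frames are BGR).
            means.append(frame[y:y + h, x:x + w, 2].mean())
        cap.release()
        means = np.asarray(means)
        # Relative threshold: "on" when the brightness clearly exceeds
        # the baseline, estimated here as the lowest decile.
        baseline = np.percentile(means, 10)
        return (means > factor * baseline).astype(int)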


3.2 Face detection

The position of the face and the unconscious expressions that the patient shows will be of interest in future studies, so locating it is a relevant step. Besides, in this work the location of the head is going to facilitate the location of the hands when they are raised. Since we are working in a stable domain in which the conditions under which the test is performed are known in advance, we know that the faces will always be in a frontal position, so a Viola-Jones [15] approach can be applied. A classifier for the detection of frontal faces, already trained and optimized, is available in the OpenCV repository.
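Using the pre-trained frontal-face cascade shipped with OpenCV, this step can be sketched as follows; the cascade file path depends on the installation, and the detection parameters shown are common defaults rather than values from the paper.

    import cv2

    # Pre-trained frontal face classifier from the OpenCV repository.
    cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

    frame = cv2.imread("frame.png")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Multi-scale Viola-Jones detection.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # (x, y) is the upper-left corner; in the notation used later,
        # head.y = y and head.height = h.
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)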

3.3 Hand detection



The automatic detection of the hand raising will allow the experts to evaluate the tests after they have been conducted, and also to extract quantitative information about the patient's responses, making new studies possible. The hand detection task is accomplished using skin color information. The method for skin color detection is further detailed in [8], where a survey on the suitability of different color spaces for the skin color detection task was conducted. The results of this survey indicated that the TSL color space [14] was the best suited for this task, discarding the brightness component L and working only with the T and S components in order to minimize lighting effects. The first step towards the hand detection is the detection of skin-like regions. Since the domain is known and the conditions under which the test is performed are quite stable, we can define two search areas where the hands can be located. As can be seen in Fig. 6, the audiometer is always located in the lower third of the image, so this is not a search area. Furthermore, since we know where the head is located, we can deduce that the hands are always raised to one side or the other of the head, but never over or under it. This way, we can define the highlighted search areas a and b (always relative to the face location, as in Fig. 6), reducing the computational cost and avoiding incorrect detections in other areas.

Figure 6: Search regions for skin detection: a and b.
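The sketch below illustrates the skin classification just described: pixels are converted to the TSL space of Terrillon and Akamatsu [14] and thresholded on their T and S components only. The conversion follows the usual TSL definition (in a vectorized form with T mapped to [0, 1]); the threshold values are placeholders, since the actual skin model is the one evaluated in [8].

    import numpy as np

    def tsl_skin_mask(rgb, t_range=(0.55, 0.75), s_max=0.25):
        """Skin mask from the T and S components (L is discarded).
        The T and S thresholds are illustrative placeholders."""
        rgb = rgb.astype(np.float64)
        total = rgb.sum(axis=2) + 1e-9
        r = rgb[..., 0] / total - 1.0 / 3.0
        g = rgb[..., 1] / total - 1.0 / 3.0
        # Saturation-like component.
        S = np.sqrt(9.0 / 5.0 * (r ** 2 + g ** 2))
        # Tint component, mapped to [0, 1].
        T = np.arctan2(r, g) / (2.0 * np.pi) + 0.5
        return (T >= t_range[0]) & (T <= t_range[1]) & (S <= s_max)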

Once the skin pixels are detected, they are grouped into skin regions based on 8-connectivity. Regions of negligible size, caused by slight inaccuracies, are discarded in order to avoid unnecessary processing. The domain knowledge allows the establishment of a set of conditions that a raised hand should fulfill. The first one is related to the number of skin pixels required inside a region for it to be considered a hand candidate: more than 40% of the pixels contained must be classified as skin pixels. Furthermore, two vertical limits are set:

limSup = head.y - (0.25 * head.y)
limInf = head.y + head.height + (0.25 * head.y)    (1)

where head.y represents the y coordinate of the upper end of the head and head.height represents the height of the head. A region is discarded if

(skinRegion.y < limSup) or (skinRegion.y > limInf)    (2)

where skinRegion.y represents the y coordinate of the upper end of the skin region under evaluation. In other words, if the upper end of the skin region lies well above or well below the location of the face, it most likely does not correspond to a hand. The size of the skin region is also taken into account: a region is also discarded if any of the following conditions applies:

(a) skinRegion.height < 0.42 * head.height
(b) skinRegion.width < 0.3 * head.width
(c) skinRegion.width > head.width + 0.25 * head.width    (3)

meaning that if the skin region is very small with respect to the size of the head, or much wider than the head, it does not correspond to a hand. Finally, the remaining regions are classified as hands, and we can proceed to analyze the responses.
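Assuming the skin mask and the face box from the previous steps, the candidate filtering can be sketched as follows. The 8-connected grouping uses OpenCV's connectedComponentsWithStats; the minimum-area value and the reading of the 40% rule against the region's bounding box are assumptions.

    import cv2
    import numpy as np

    def detect_hands(skin_mask, head):
        """Filter 8-connected skin regions with conditions (1)-(3).
        `head` is the (x, y, w, h) box given by the face detector."""
        hx, hy, hw, hh = head
        lim_sup = hy - 0.25 * hy            # equation (1)
        lim_inf = hy + hh + 0.25 * hy
        n, labels, stats, _ = cv2.connectedComponentsWithStats(
            skin_mask.astype(np.uint8), connectivity=8)
        hands = []
        for i in range(1, n):               # label 0 is the background
            x, y, w, h, area = stats[i]
            if area < 50:                   # negligible region (assumed)
                continue
            if area <= 0.4 * w * h:         # 40% skin-pixel rule (assumed
                continue                    # to be over the bounding box)
            if y < lim_sup or y > lim_inf:  # condition (2)
                continue
            if (h < 0.42 * hh or w < 0.3 * hw
                    or w > 1.25 * hw):      # conditions (3a)-(3c)
                continue
            hands.append((x, y, w, h))
        return hands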

4. RESULTS

This section presents some representative results obtained with the developed methodology. To test the method, a dataset composed of 28 video sequences from 9 different patients was considered. All of them gave their informed consent prior to being recorded and included in this study. Each video sequence lasts between two and four minutes, which means processing between 1500 and 3000 frames. In each sequence, between 30 and 50 auditory stimuli are sent to the patient, who perceives between 15 and 35 of them and responds to them by raising his hand. The 9 patients had previously been classified by the experts according to their response times as follows: 7 patients with normal response times and 2 slow patients.

4.1 Accuracy

Firstly, we tested the accuracy of the methodology for both the detection of the light stimuli indicator and the detection of the hand raising. These results are shown in Table 1.

Table 1: Accuracy of the methodology

    Stimuli detection    Hand detection
    100%                 98.76%

4.2 Analysis of responses to stimuli

Experts reported that the most relevant parameter of this test is a quantitative measure of the reaction times of the patient. According to the experts, this time should be measured from the moment the auditory stimulus starts until the moment in which the patient starts his conscious reaction to it. Consequently, by combining the stimuli signal (Fig. 7(a)) and the sequence that represents the hand raising (Fig. 7(b)), we can compute the reaction times d, as shown in Fig. 7(c). This way, a patient is characterized by a sequence of n d's. Based on these reaction times, patients can be classified the same way the experts do (but now with quantitative information) as "patients with a normal response" and "patients with a slow response". The experts consulted reported that they are only interested in identifying patients with reaction times significantly slower than the average, since this could be a symptom of other pathologies; for that reason, they do not need a more precise classification with a higher number of categories.
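Given the binary stimuli signal and an analogous binary hand-raising signal, one sample per frame, the reaction times d can be sketched as the delay between each stimulus onset and the next hand-raising onset, converted to milliseconds through the frame rate. The pairing rule, the frame rate and the maximum admissible delay are assumptions for illustration; unanswered stimuli simply produce no d.

    import numpy as np

    def reaction_times(stimuli, hand, fps=25.0, max_delay_s=4.0):
        """Delays d (ms) between stimulus onsets and hand raisings.
        `stimuli` and `hand` are binary per-frame signals."""
        s_on = np.flatnonzero(np.diff(stimuli, prepend=0) == 1)
        h_on = np.flatnonzero(np.diff(hand, prepend=0) == 1)
        ds = []
        for s in s_on:
            later = h_on[h_on >= s]
            if later.size and (later[0] - s) / fps <= max_delay_s:
                ds.append(1000.0 * (later[0] - s) / fps)
        return ds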

Figure 7: (a) Stimuli signal. (b) Hand raising. (c) Reaction times d.

The reaction times for the 28 video sequences were computed: on the one hand, the reaction times for the air conduction test (Table 2) and, on the other, the reaction times for the bone conduction test (Table 3). Results are divided according to the conduction medium since the physiognomy of the patient can make them non-comparable. Three measures were considered: the mean, the median and the mean of the elements between the first and the third quartile. For each of them, we studied the reaction times (in milliseconds) of the "normal" patients (first row of the tables) and of the "slow" patients (second row); the third row shows the difference between both. In view of these results, we can see that it is possible to separate patients based on their reaction times.
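For reference, the three measures can be computed from a patient's sequence of d's as sketched below; the interquartile mean keeps only the elements between the first and third quartiles.

    import numpy as np

    def response_measures(ds):
        """Mean, median and interquartile mean of reaction times (ms)."""
        ds = np.asarray(ds, dtype=float)
        q1, q3 = np.percentile(ds, [25, 75])
        inner = ds[(ds >= q1) & (ds <= q3)]
        return ds.mean(), np.median(ds), inner.mean()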

Table 2: Reaction time measures for air conduction

                 Mean         Median       Interquartile mean
    Normal       1030.9 ms    833.3 ms     862.3 ms
    Slow         2010.6 ms    1666.7 ms    1744.3 ms
    Difference   979.7 ms     833.4 ms     882.0 ms

Table 3: Reaction time measures for bone conduction

                 Mean         Median       Interquartile mean
    Normal       1068.4 ms    1000.0 ms    1025.1 ms
    Slow         1751.9 ms    1666.7 ms    1679.4 ms
    Difference   683.5 ms     666.7 ms     654.3 ms

Figure 8: Hand raising detections.

5. CONCLUSIONS

In this paper, we presented an approach that enables the measurement of the patient's response times to auditory stimuli. The proposed method detects the patient's conscious responses, expressed by the hand raising, and allows the experts to obtain quantitative information about the speed of the reaction, information that until now was only a subjective perception. As future work, more video sequences of this test need to be obtained in order to extend and reinforce the results. In addition, the following steps will be aimed at studying and detecting other, more subtle and unconscious responses to the stimuli.

6. ACKNOWLEDGMENTS

This paper has been funded by the Regional Industrial Ministry of the Xunta de Galicia (projects 10/CSA918054PR and 10TIC009CT).

7. REFERENCES

[1] World Population Aging 1950-2050. Population Division, DESA, United Nations.
[2] Las personas mayores en España. Instituto de Mayores y Servicios Sociales (IMSERSO), 2008.
[3] Encuesta de discapacidad, autonomía personal y situaciones de dependencia. INEbase, 2010.
[4] INEbase. Instituto Nacional de Estadística, 2010.
[5] Libro blanco del envejecimiento activo. IMSERSO, Oct. 2010.
[6] A. Davis. The prevalence of hearing impairment and reported hearing disability among adults in Great Britain. Int J Epidemiol, 18:911-917, Dec. 1989.
[7] A. Davis. Prevalence of hearing impairment. In Hearing in Adults, chapter 3. London: Whurr Ltd, 1995.
[8] A. Fernandez, N. Barreira, L. Lado, and M. Penedo. Evaluation of the color space influence in face detection. In Signal Processing, Pattern Recognition and Applications (SPPRA), pages 241-247, Innsbruck, Austria, 2010.
[9] A. Fernandez, M. Ortega, B. Cancela, and M. Penedo. Contextual and skin color region information for face and arms location. In Eurocast 2011, pages 139-140, Las Palmas, Spain, 2011.
[10] A. Fernandez, M. Penedo, M. Ortega, B. Cancela, C. Vazquez, and L. Gigirey. Automatic analysis of the patient's conscious responses to the emission of an auditory stimuli during the performance of an audiometry. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2011. Accepted, to appear.
[11] P. Kakumanu, S. Makrogiannis, and N. Bourbakis. A survey of skin-color modeling and detection methods. Pattern Recognition, 40:1106-1122, 2007.
[12] J. López-Torres, C. Boix, J. Téllez, M. López, J. del Campo, and F. Escobar. Functional status of elderly people with hearing loss. Arch Gerontol Geriatr, 49:88-92, 2009.
[13] C. Mulrow, C. Aguilar, J. Endicott, R. Velez, M. Tuley, W. Charlip, and J. Hill. Association between hearing impairment and the quality of life of elderly individuals. J Am Geriatr Soc, 38:45-50, 1990.
[14] J.-C. Terrillon and S. Akamatsu. Comparative performance of different chrominance spaces for color segmentation and detection of human faces in complex scene images. In Proc. of the 12th Conf. on Vision Interface (VI '99), pages 180-187, 1999.
[15] P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, 2001.
[16] T.-W. Yoo and I.-S. Oh. A fast algorithm for tracking human faces based on chromatic histograms. Pattern Recognition Letters, 20:967-978, 1999.