Sleep Stage Classification Based on Respiratory Signal - IEEE Xplore

Alexander Tataraidze, Graduate Student Member, IEEE, Lesya Anishchenko, Lyudmila Korostovtseva, Bert Jan Kooij, Mikhail Bochkarev and Yurii Sviryaev

Abstract— One of the research tasks that must be solved to develop a sleep monitor is sleep stage classification. This paper presents an algorithm for the detection of wakefulness, rapid eye movement sleep (REM) and non-REM sleep (NREM) based on a set of 33 features extracted from a respiratory inductive plethysmography signal and on a bagging classifier. Furthermore, a few heuristics based on knowledge about normal sleep structure are suggested. We used data from 29 subjects without sleep-related breathing disorders who underwent a PSG study at a sleep laboratory. The subjects were referred for the PSG study due to suspected sleep disorders. A leave-one-subject-out cross-validation procedure was used for testing the classification performance. An accuracy of 77.85 ± 6.63% and Cohen's kappa of 0.59 ± 0.11 were achieved with the classifier alone. Using the heuristics, we increased the accuracy to 80.38 ± 8.32% and the kappa to 0.65 ± 0.13. We conclude that heuristics may improve automated sleep structure detection based on the analysis of indirect information, such as a respiration signal, and are useful for the development of a home sleep monitoring system.

Research supported by the Russian Foundation for Basic Research (14-07-31151 mol_a) and the grant of the President of the Russian Federation (МК-7812.2015.7). A. Tataraidze and L. Anishchenko are with Bauman Moscow State Technical University, Moscow, 105005, Russian Federation (phone/fax: +7 (495) 632-22-19, e-mail: [email protected]). L. Korostovtseva, M. Bochkarev and Y. Sviryaev are with Federal North-West Medical Research Centre, St. Petersburg, 197341, Russian Federation. B.J. Kooij is with Delft University of Technology, Delft, 2628 CD, the Netherlands.

I. INTRODUCTION

Sleep disorders are highly prevalent in the general population and often go undiagnosed. For example, insomnia symptoms are present in about 32% of adults [1]. According to various studies, 6-24% of adults have sleep-related breathing disorders (SBD) [1, 2]. Furthermore, more than 80% of the cases of obstructive sleep apnea go undiagnosed [3]. Sleep disorders are associated with an increased risk of accidents, psychiatric disorders, cardiovascular pathology and metabolic disturbances [1].

Long-term ambulatory sleep monitoring can enable early diagnosis of sleep disorders and their control, improve sleep quality and determine the most appropriate time for waking up. Two research tasks should be solved for the development of a sleep monitor: sleep stage classification and detection of breathing pauses. This study is devoted to the first one.

Sleep is classified into two different phases: rapid eye movement sleep (REM) and non-REM sleep (NREM), which alternate cyclically throughout the night. The gold standard for the evaluation of sleep stages is polysomnography (PSG). Based on PSG data, namely the electroencephalogram, electrooculogram and electromyogram, an expert manually

978-1-4244-9270-1/15/$31.00 ©2015 IEEE

TABLE I. SUBJECTS CHARACTERISTICS (N = 29)

Male:Female                             9:20
Age (years)                             45.4 ± 15.7 (22 – 67)
Body Mass Index (kg/m2)                 27.3 ± 6.1 (17 – 48)
Apnea Hypopnea Index (episodes/hour)    2.3 ± 1.4 (0.0 – 4.9)

Values are mean ± SD (range).

TABLE II. SLEEP STAGE CHARACTERISTICS

Total number of epochs                  31260
Wakefulness (%)                         24.0 ± 12.3 (5.8 – 52.9)
REM (%)                                 17.7 ± 5.6 (9.3 – 29.3)
NREM (%)                                58.2 ± 9.6 (33.9 – 73.9)
Sleep Efficiency (%)                    77.9 ± 11.9 (53.4 – 94.4)
Sleep Onset Latency (min)               35.6 ± 40.0 (2.0 – 201.5)
REM Latency from Sleep (min)            107.2 ± 65.2 (42.0 – 319.5)

Values are mean ± SD (range).

scores each epoch (30-second interval) as wakefulness (W), REM or NREM. Furthermore, NREM is categorized into three levels. Being rather complicated and time-consuming, PSG is inconvenient for long-term ambulatory sleep monitoring.

Heart rate variability (HRV) is a commonly used alternative method for sleep structure detection [4-6]. Because breathing varies across sleep stages and wakefulness [1, 7], Redmond et al. suggested combining HRV analysis with the analysis of respiration signals [8]. Respiration signals alone have been used to assess sleep structure both in infants [9] and adults [10, 11]. However, such studies remain scarce, although respiration analysis is of great interest because breathing is easier to detect by non-contact methods, such as bioradiolocation [12], ballistocardiography [13], or video monitoring [14].

In this paper, we introduce a method for W/REM/NREM classification based on a set of 33 features extracted from a respiratory inductive plethysmography (RIP) signal and on a bagging classifier. Furthermore, a few heuristics which increase the classification performance are suggested. The purpose of the research is to validate the possibility of using heuristics for increasing classification performance.

II. MATERIALS AND METHODS

A. Clinical Protocol

We analyzed data from 29 subjects who were referred for a PSG study at the Sleep Medicine Laboratory, Federal North-West Medical Research Centre (St. Petersburg, Russia) due to suspected sleep disorders. Full-night PSG (Embla N7000, Natus, USA), including registration of respiratory movements by RIP, was performed. Only subjects without sleep-related breathing disorders (Table I) were enrolled in the analysis. PSG records were scored by an expert physician according to the American Academy of Sleep Medicine Scoring Rules [15].


C. Feature extraction

A set of 33 features extracted from a RIP signal was used. Cycle-based features were extracted from breathing cycles whose peaks were located in the analyzable part (an epoch or a window) of the signal. Thus, the interval used to extract these features might be slightly shorter or longer than the analyzable part, depending on the left trough of the first breathing cycle and the right trough of the last breathing cycle (fig. 2). Other features were extracted directly from the analyzable part.

Sample entropy [11], motion and spectral features were extracted from an epoch. Spectral features were estimated using a Fourier transform with a Hamming window. The following frequency-domain features were extracted: the logarithm of the power in the very low frequency range (VLF) between 0.01 and 0.05 Hz; the logarithm of the power in the low frequency range (LF) between 0.05 and 0.15 Hz; the logarithm of the power in the high frequency range (HF) between 0.15 and 0.50 Hz; the ratio between LF and HF (LF/HF); the peak frequency and its power in HF. These

Figure 1. An example of a RIP signal during one epoch (amplitude [a.u.] versus time [sec.]). The peaks and troughs are represented by filled triangles and squares, respectively.

Motion artifacts were detected by entropy analysis. A moving window of 5 seconds with a step of 2 seconds was used for Shannon entropy calculation. Signal intervals with entropy levels more than three times the mean value for a subject were identified as artifacts. Periods when the signal was absent were also identified as artifacts; these occurred when a subject got out of bed and unplugged the polysomnography system from a base station during the night. During artifact intervals the signal was replaced with zeros. Z-normalization of the signal was performed for each inter-artifact period by subtracting the mean value and dividing by the standard deviation.
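The artifact rejection and normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper does not say how the Shannon entropy was estimated, so the histogram estimator, the number of bins (`n_bins`), the use of bin edges shared across the whole record, the sampling rate and all function names are assumptions. Only the 5-second window, the 2-second step and the "three times the mean" threshold come from the text.

```python
import numpy as np

def shannon_entropy(window, bin_edges):
    """Histogram-based Shannon entropy (bits); the estimator is an assumption."""
    counts, _ = np.histogram(window, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def artifact_mask(signal, fs, win_s=5.0, step_s=2.0, factor=3.0, n_bins=64):
    """Flag windows whose entropy exceeds `factor` times the record's mean entropy."""
    win, step = int(win_s * fs), int(step_s * fs)
    # Shared bin edges (assumption) so that large excursions occupy more bins.
    edges = np.linspace(signal.min(), signal.max(), n_bins + 1)
    starts = np.arange(0, len(signal) - win + 1, step)
    entropy = np.array([shannon_entropy(signal[s:s + win], edges) for s in starts])
    mask = np.zeros(len(signal), dtype=bool)
    for s in starts[entropy > factor * entropy.mean()]:
        mask[s:s + win] = True
    return mask

def zero_and_normalize(signal, mask):
    """Zero the artifact samples and z-normalize each inter-artifact period."""
    out = np.zeros(len(signal))
    change = np.diff(np.r_[0, (~mask).astype(int), 0])
    for a, b in zip(np.flatnonzero(change == 1), np.flatnonzero(change == -1)):
        seg = signal[a:b]
        sd = seg.std()
        if sd > 0:
            out[a:b] = (seg - seg.mean()) / sd
    return out

# Synthetic record: 0.25 Hz "breathing" with a 10-second high-amplitude burst.
fs = 20.0  # hypothetical sampling rate
t = np.arange(int(180 * fs)) / fs
rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 0.25 * t)
sig[int(60 * fs):int(70 * fs)] += rng.uniform(-200, 200, size=int(10 * fs))
mask = artifact_mask(sig, fs)
clean = zero_and_normalize(sig, mask)
```

Because the threshold is relative to the subject's mean entropy, the result depends strongly on how the entropy is estimated; with a different estimator the factor of three would behave differently.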

The original signal without artifact rejection and normalization was used for motion-based feature calculation. Other features were extracted from the signal which passed all preprocessing stages.


B. Data preprocessing

In a PSG study, RIP includes recordings from abdominal and thoracic belts. Only the signal from the thoracic belt was analyzed in this research, except for one record (subject #9), in which the thoracic RIP signal was absent for technical reasons, so the abdominal signal was used for that record. The signal was filtered with a Butterworth low-pass filter with a cut-off frequency of 0.6 Hz, and the baseline was removed.
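As an illustration of the filtering step, the sketch below applies a Butterworth low-pass filter with the 0.6 Hz cut-off stated above, using SciPy. The sampling rate, the filter order, the zero-phase (forward-backward) application and the baseline removal by subtracting the mean are all assumptions, since the paper does not specify them; `preprocess_rip` is a hypothetical name.

```python
import numpy as np
from scipy.signal import butter, detrend, sosfiltfilt

def preprocess_rip(signal, fs, cutoff_hz=0.6, order=4):
    """Low-pass filter a RIP signal and remove a constant baseline.

    Only cutoff_hz comes from the paper; fs, order, zero-phase filtering
    and the constant-baseline model are assumptions of this sketch.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, signal)
    return detrend(filtered, type="constant")

# Synthetic test signal: 0.25 Hz breathing, 2 Hz interference, constant offset.
fs = 10.0  # hypothetical sampling rate
t = np.arange(1200) / fs
breathing = np.sin(2 * np.pi * 0.25 * t)
raw = breathing + 0.5 * np.sin(2 * np.pi * 2.0 * t) + 3.0
clean = preprocess_rip(raw, fs)
```

Forward-backward filtering is chosen here so that peaks and troughs are not shifted in time, which matters for the cycle-based features; a causal filter would be an equally valid reading of the text.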

Peaks and troughs were detected by searching for turning points (fig. 1). Each breathing cycle was described by its peak, left trough, right trough, width and amplitude, where the width is the distance between the left and right troughs, and the amplitude is the height from the nearest trough to the peak. Breathing cycles with an amplitude or width less than half of the average for a subject were removed as false.
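A minimal sketch of the cycle detection described above is given below. It assumes a smooth, already filtered signal without flat runs, and it interprets "nearest trough" as the trough closest in time to the peak; both points, as well as the function names, are assumptions rather than details given in the paper.

```python
import numpy as np

def turning_points(x):
    """Indices of local maxima and minima from sign changes of the first difference."""
    slope = np.sign(np.diff(x))
    change = np.diff(slope)
    peaks = np.flatnonzero(change < 0) + 1
    troughs = np.flatnonzero(change > 0) + 1
    return peaks, troughs

def breathing_cycles(x, fs):
    """Rows of (peak, left trough, right trough, width [s], amplitude)."""
    peaks, troughs = turning_points(x)
    cycles = []
    for p in peaks:
        left, right = troughs[troughs < p], troughs[troughs > p]
        if left.size == 0 or right.size == 0:
            continue  # incomplete cycle at the edge of the signal
        l, r = left[-1], right[0]
        nearest = l if (p - l) <= (r - p) else r  # nearest in time (assumption)
        cycles.append((p, l, r, (r - l) / fs, x[p] - x[nearest]))
    return np.array(cycles, dtype=float)

def drop_false_cycles(cycles):
    """Remove cycles whose width or amplitude is below half of the average."""
    width, amplitude = cycles[:, 3], cycles[:, 4]
    keep = (width >= 0.5 * width.mean()) & (amplitude >= 0.5 * amplitude.mean())
    return cycles[keep]

# Synthetic example: one minute of regular breathing at 0.25 Hz.
fs = 10.0  # hypothetical sampling rate
t = np.arange(600) / fs
x = np.sin(2 * np.pi * 0.25 * t)
cycles = breathing_cycles(x, fs)
valid = drop_false_cycles(cycles)
```

On a real record the averages would be taken over the whole night for a subject, so `drop_false_cycles` would be applied once per subject rather than per epoch.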

Figure 2. An epoch and an interval for extraction of cycle-based features are represented by dotted and dashed lines, respectively (amplitude [a.u.] versus time [sec.]).

Figure 3. The areas between the curves and the baseline are filled in light and dark gray for inhalation and exhalation periods, respectively (amplitude [a.u.] versus time [sec.]).

motion features were included: the length of a motion period and the sum of absolute values of the signal during a motion period.

The following cycle-based features were extracted from the cycles related to an epoch: the median and interquartile range (IQR) of breathing cycle amplitudes, the median and IQR of breathing cycle widths, the median and IQR of peaks, and the median and IQR of troughs. Furthermore, the following features were extracted: the median of areas between the signal and baseline during inhalation (MAI), the median of areas between the signal and baseline during exhalation (MAE), and the ratio between MAI and MAE (MAI/MAE) (fig. 3).

The standard deviation of breathing frequency [5] and dynamic time and frequency warping [11] were extracted using a moving window of 5 epochs. The median of peaks divided by the IQR of peaks, the median of troughs divided by the IQR of troughs, the median of cycle amplitudes, and 7 volume-based features [11] were extracted using a moving window of 25 epochs. Z-normalization was performed on each feature per subject in order to remove subject-to-subject variations.

D. Classification

A bagging classifier was used in the experiments. Moreover, a few simple heuristics based on knowledge of normal sleep structure were used to improve classification performance:


1. The first 20 minutes were scored as wakefulness;
2. If an epoch did not belong to one of the nearest stages, it was scored as the previous stage;
3. All REM epochs during the first 60 minutes of a record were scored as the previous stage;
4. If an interval between REM epochs was less than 15 minutes, all epochs included in the interval were scored as REM.
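The four rules above can be sketched as a post-processing pass over the sequence of predicted stages. The sketch makes several assumptions that the paper does not state: the rules are applied in the listed order, rule 2 is read as "an epoch that differs from both of its neighbors takes the stage of the preceding epoch", replacements are made sequentially from left to right, and the interval in rule 4 is measured as the number of non-REM epochs between two REM epochs. The function name and the single-letter labels are illustrative.

```python
import numpy as np

EPOCH_S = 30  # epoch length in seconds, as used for PSG scoring

def apply_heuristics(stages):
    """Post-process a sequence of 'W' / 'R' / 'N' labels, one per epoch."""
    s = np.array(stages, dtype="<U1")
    n = len(s)
    # Rule 1: the first 20 minutes are wakefulness.
    s[:min(n, 20 * 60 // EPOCH_S)] = "W"
    # Rule 2: an isolated epoch takes the previous stage (interpretation).
    for i in range(1, n - 1):
        if s[i] != s[i - 1] and s[i] != s[i + 1]:
            s[i] = s[i - 1]
    # Rule 3: no REM during the first 60 minutes of the record.
    for i in range(1, min(n, 60 * 60 // EPOCH_S)):
        if s[i] == "R":
            s[i] = s[i - 1]
    # Rule 4: fill gaps shorter than 15 minutes between REM epochs.
    rem = np.flatnonzero(s == "R")
    for a, b in zip(rem[:-1], rem[1:]):
        if b - a > 1 and (b - a - 1) * EPOCH_S < 15 * 60:
            s[a:b] = "R"
    return s

# Hypothetical classifier output for 300 epochs (2.5 hours).
predicted = ["N"] * 300
predicted[100:110] = ["R"] * 10   # REM within the first hour
predicted[150] = "W"              # isolated one-epoch awakening
predicted[200:210] = ["R"] * 10   # two REM bouts separated by 5 minutes
predicted[220:230] = ["R"] * 10
smoothed = apply_heuristics(predicted)
```

In this example the early REM bout and the isolated awakening are removed and the two late REM bouts are merged, which also shows how a genuine one-epoch awakening would be lost.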

E. Experiments

A leave-one-subject-out cross-validation (LOSOCV) procedure was used for testing the classification performance. A training set was formed from the features of 28 subjects, and the data of the remaining subject were used as a testing set. That was repeated 29 times, changing the subjects included in the training and testing sets. For the evaluation of the classification performance, classification accuracy and Cohen's kappa coefficient (k) were computed for the test subject on each LOSOCV iteration. Sleep stage classification is an imbalanced task because NREM takes about 75-80% of total sleep time in a healthy subject. In that situation, Cohen's kappa coefficient of inter-rater agreement, being less sensitive to class imbalance, is a more informative metric than accuracy. Mean, standard deviation and range were calculated for the accuracy and k.

III. RESULTS

An accuracy of 77.85 ± 6.63%, 65.42 – 90.73% (mean ± SD, range) and k of 0.59 ± 0.11, 0.40 – 0.80 were achieved with the classifier, while usage of the classifier with heuristics resulted in an accuracy of 80.38 ± 8.32%, 60.18 – 93.73% and k of 0.65 ± 0.13, 0.37 – 0.87. Fig. 4 presents hypnograms for subject #11 plotted by a physician, the classifier and the classifier with the heuristics. Table III shows the confusion matrices for the classifier and the classifier with the heuristics. Table IV shows the obtained k and accuracy for each subject. Table V gives the comparison between the performance of our methods and those reported in the literature.

TABLE III. CONFUSION MATRIX

                     Algorithm
Manual               Wakefulness      REM              NREM
Wakefulness          5105 (5271)      470 (571)        1985 (1718)
REM                  581 (301)        2591 (3962)      2095 (1350)
NREM                 1077 (1030)      760 (1181)       16236 (15862)

Classifier (classifier + heuristics)

Figure 4. Hypnograms of subject #11 (k of 0.70 and 0.86 for the classifier and classifier + heuristics, respectively). Panels: Classifier, Classifier + heuristics, Physician; horizontal axis: time [min.]. W – wakefulness, R – REM, N – NREM.

TABLE IV. OBTAINED RESULTS FOR THE SUBJECTS

             Bagging                        Bagging + Heuristics
Subject      k              Accuracy (%)    k              Accuracy (%)
1            0.51           74.33           0.61           78.99
2            0.58           76.41           0.63           78.15
3            0.65           82.09           0.70           83.90
4            0.61           77.29           0.67           80.17
5            0.70           84.47           0.74           85.87
6            0.64           77.52           0.77           85.87
7            0.50           76.62           0.62           81.06
8            0.47           70.47           0.51           72.60
9            0.46           69.09           0.51           61.05
10           0.46           67.12           0.51           69.54
11           0.70           85.64           0.86           93.17
12           0.68           81.83           0.70           82.78
13           0.47           75.89           0.38           64.80
14           0.45           72.70           0.61           79.33
15           0.71           84.03           0.75           85.84
16           0.71           85.95           0.78           89.01
17           0.68           83.01           0.72           84.61
18           0.66           82.93           0.78           88.55
19           0.71           85.19           0.78           88.81
20           0.49           73.60           0.55           76.12
21           0.40           65.96           0.37           60.18
22           0.65           84.20           0.70           85.87
23           0.54           75.34           0.62           79.58
24           0.57           77.03           0.65           80.84
25           0.80           90.73           0.87           93.73
26           0.67           80.36           0.78           86.59
27           0.48           71.20           0.60           75.56
28           0.40           65.42           0.43           65.67
29           0.64           81.30           0.66           82.70
Mean ± SD    0.59 ± 0.11    77.85 ± 6.63    0.65 ± 0.13    80.38 ± 8.32

k – Cohen's kappa coefficient.
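The evaluation protocol behind the per-subject values in Table IV (leave-one-subject-out cross-validation with accuracy and Cohen's kappa computed for each held-out subject) can be sketched with scikit-learn as shown below. The data are synthetic, and the base learner (scikit-learn's default decision tree), the number of estimators, the random seed and the use of the sample standard deviation are assumptions; the paper does not report these settings.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import LeaveOneGroupOut

def losocv(X, y, subjects, n_estimators=20, seed=0):
    """Return per-subject accuracy (%) and Cohen's kappa under LOSOCV."""
    accuracy, kappa = [], []
    for train, test in LeaveOneGroupOut().split(X, y, groups=subjects):
        # Hyperparameters are placeholders, not the authors' configuration.
        clf = BaggingClassifier(n_estimators=n_estimators, random_state=seed)
        clf.fit(X[train], y[train])
        predicted = clf.predict(X[test])
        accuracy.append(100.0 * accuracy_score(y[test], predicted))
        kappa.append(cohen_kappa_score(y[test], predicted))
    return np.array(accuracy), np.array(kappa)

# Synthetic stand-in for the feature matrix: 5 subjects, 200 epochs each,
# 33 features, with well-separated classes so that the run is quick.
rng = np.random.default_rng(0)
n_subjects, n_epochs, n_features = 5, 200, 33
subjects = np.repeat(np.arange(n_subjects), n_epochs)
y = rng.choice(["W", "R", "N"], size=subjects.size, p=[0.24, 0.18, 0.58])
shift = np.array([{"W": -2.0, "R": 0.0, "N": 2.0}[label] for label in y])
X = rng.normal(size=(subjects.size, n_features)) + shift[:, None]

acc, kappa = losocv(X, y, subjects)
print(f"accuracy: {acc.mean():.2f} ± {acc.std(ddof=1):.2f} %")
print(f"kappa:    {kappa.mean():.2f} ± {kappa.std(ddof=1):.2f}")
```

Grouping the folds by subject keeps all epochs of a test subject out of the training set, which is what distinguishes LOSOCV from an epoch-level 10-fold split.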

TABLE V. COMPARISON OF W/REM/NREM CLASSIFICATION PERFORMANCE

First author/year    Signals     Number of subjects   Estimation method   Classifier   Accuracy (%)    Cohen's kappa
Xiao, 2013 [5]       HRV         45                   LOSOCV              RF           72.6 ± 6.7      0.46 ± 0.09
Redmond, 2007 [8]    HRV, RIP    31                   LOSOCV              LD           76.1 ± 5.9      0.46 ± 0.10
Long, 2014 [10]      RIP         48                   10-fold CV          LD           77.1 ± 7.6      0.48 ± 0.17
Long, 2014 [11]      RIP         48                   10-fold CV          LD           76.2 ± 7.9      0.45 ± 0.15
This paper           RIP         29                   LOSOCV              BAG          77.85 ± 6.63    0.59 ± 0.11
This paper           RIP         29                   LOSOCV              BAG+H        80.38 ± 8.32    0.65 ± 0.13

RIP – Respiratory Inductance Plethysmography, HRV – Heart Rate Variability, LD – Linear Discriminant, RF – Random Forest, BAG – Bagging, HMM – Hidden Markov Model, kNN – k-Nearest Neighbor, H – heuristics. 10-fold CV – subjects are divided into 10 subsets; during each iteration, data from 9 subsets are used for training and the remaining one is used for testing.


IV. DISCUSSION

Our results are better than those reported in previous studies (Table V) wherein contact sensors were used and k for 3-stage (W/REM/NREM) classification was presented. However, the dataset used in our study is small, some subjects were not healthy (having concomitant diseases or sleep disorders other than SBD), and the male/female ratio is imbalanced, with a predominance of women. Thus, the results should be interpreted with caution.

The study of Long et al. [10] is the closest to ours. Our results are better even without heuristics (k of 0.59 vs. 0.48). In contrast to that work, we used a stronger classifier (bagging vs. linear discriminant), a different method of performance testing (LOSOCV vs. 10-fold cross-validation), and a slightly different feature set, although the two sets overlap significantly. Moreover, particulars of our dataset might also affect the results, e.g. the percentage of wakefulness, 20.5% vs. 12.9% in our and their datasets, respectively.

It should be noted that all included subjects were free from SBD. Although Redmond et al. [16] used a combination of respiratory and HRV analysis for sleep structure detection in subjects with SBD, the possibility of sleep stage classification in such subjects based on the analysis of respiratory activity has been poorly investigated. This is a topic for further research.

The applied heuristics improved the results: mean k increased from 0.59 to 0.65. We have not tried to generate optimal heuristics, because the main purpose of the research was to quickly validate the feasibility of using heuristics. They are quite simple and based on general knowledge of normal sleep. Using the heuristics, we obtained worse results for two records, #13 (-0.09 of k) and #21 (-0.03 of k), while k increased by 0.01 to 0.16 for the other records. Meanwhile, some potentially important information might be lost, e.g. short awakenings localized in one epoch.

Nonetheless, we think that using additional information, such as knowledge of normal human sleep structure, is particularly important for automated sleep staging based on indirect data. Thus, we suggest more careful generation of heuristics in future studies, so that they lead neither to decreased classification performance in an individual subject nor to the loss of important information. The assignment of an epoch to a sleep stage is not an independent event; it depends on previous epochs. We hypothesize that sequential supervised learning algorithms and a heuristic model of sleep structure should provide better sleep staging performance. The model has to consider information from physiological signals, sleep domain knowledge, individual information about subjects (age, sex, BMI, etc.) and environmental conditions (light on/off, noise level, etc.). These challenges should be addressed in further research.

V. CONCLUSION

An algorithm for sleep structure detection based on the analysis of a respiratory signal and a bagging classifier is described. A set of 33 features, consisting of motion, spectral and time-domain features, was applied. A kappa of 0.59 was obtained for the 3-stage (W/REM/NREM) classification. We suggest a few empirical rules which increase the kappa to 0.65. The results may contribute to the development of home sleep monitoring systems.

REFERENCES

[1] T. Lee-Chiong, Sleep medicine: essentials and review. New York: Oxford University Press, 2008.
[2] P. Jennum, and L. R. Renata, “Epidemiology of sleep apnoea/hypopnoea syndrome and sleep-disordered breathing,” European Respiratory J., vol. 33, pp. 907–914, Apr. 2009.
[3] T. Young, L. Evans, L. Finn, and M. Palta, “Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women,” Sleep, vol. 20, pp. 705–706, Sep. 1997.
[4] B. Yılmaz, M. H. Asyali, E. Arıkan, S. Yetkin, and F. Ozgen, “Sleep stage and obstructive apneaic epoch classification using single-lead ECG,” Biomed. Eng. Online, vol. 39, Sep. 2010.
[5] M. Xiao, H. Yan, J. Song, Y. Yang, and X. Yang, “Sleep stages classification based on heart rate variability and random forest,” Biomed. Signal Process. Control, vol. 8, pp. 624–633, 2013.
[6] M. O. Mendez, M. Matteucci, V. Castronovo, L. Ferini-Strambi, S. Cerutti, and A. Bianchi, “Sleep staging from Heart Rate Variability: time-varying spectral features and Hidden Markov Models,” Int. J. Biomed. Eng. Technol., vol. 3, pp. 246–263, Apr. 2010.
[7] N. J. Douglas, D. P. White, C. K. Pickett, J. V. Weil, and C. W. Zwillich, “Respiration during sleep in normal man,” Thorax, vol. 37, pp. 840–844, Nov. 1982.
[8] S. Redmond, P. de Chazal, C. O’Brien, S. Ryan, W. T. McNicholas, and C. Heneghan, “Sleep staging using cardiorespiratory signal,” Somnologie, vol. 11, pp. 245–256, Oct. 2007.
[9] P. Terrill, S. Wilson, S. Suresh, D. Cooper, and C. Dakin, “Application of recurrence quantification analysis to automatically estimate infant sleep states using a single channel of respiratory data,” Med. Biol. Eng. Comput., vol. 50, pp. 851–865, Aug. 2012.
[10] X. Long, J. Yang, T. Weysen, R. Haakma, J. Foussier, P. Foneseca, and R. M. Aarts, “Measuring dissimilarity between respiratory effort signals based on uniform scaling for sleep staging,” Physiol. Meas., vol. 35, pp. 2529–2542, Nov. 2014.
[11] X. Long, J. Foussier, P. Foneseca, R. Haakma, and R. M. Aarts, “Analyzing respiratory effort amplitude for automated sleep stage classification,” Biomed. Signal Process. Control, vol. 14, pp. 197–205, 2014.
[12] L. Anishchenko, M. Alekhin, A. Tataraidze, S. Ivashov, F. Soldovieri, and A. Bugaev, “Application of step-frequency radars in medicine,” in Proc. SPIE Symp. on Defense and Security, Radar Sensor Technol. XVIII Conf., Baltimore, 2014, pp. 90771N-1…N-7.
[13] J. M. Kortelainen, M. O. Mendez, A. M. Bianchi, M. Matteucci, and S. Cerutti, “Sleep staging based on signals acquired through bed sensor,” IEEE Trans. Inf. Technol. Biomed., vol. 14, pp. 776–785, May 2010.
[14] A. Heinrich, X. Aubert, and G. de Haan, “Body movement analysis during sleep based on video motion estimation,” in Proc. 15th Int. Conf. on e-Health Networking, Applications and Services, Lisbon, Portugal, pp. 539–543, 2013.
[15] C. Iber, S. Ancoli-Israel, A. L. Chesson, and S. F. Quan, The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specification. Westchester, IL: American Academy of Sleep Medicine, 2007.
[16] S. J. Redmond, and C. Heneghan, “Cardiorespiratory-based sleep staging in subjects with obstructive sleep apnea,” IEEE Trans. Biomed. Eng., vol. 53, Mar. 2006.