Multivariate Statistical Monitoring of Sensor Faults of a Multivariable Artificial Pancreas

Kamuran Turksoy ∗, Iman Hajizadeh ∗∗, Elizabeth Littlejohn ∗∗∗, Ali Cinar ∗∗∗∗

∗ Department of Biomedical Engineering, Illinois Institute of Technology, Chicago, IL, USA (e-mail: [email protected])
∗∗ Department of Chemical and Biological Engineering, Illinois Institute of Technology, Chicago, IL, USA (e-mail: [email protected])
∗∗∗ Department of Pediatrics and Medicine, Section of Endocrinology, Kovler Diabetes Center, University of Chicago, Chicago, IL, USA (e-mail: [email protected])
∗∗∗∗ Department of Chemical and Biological Engineering, Illinois Institute of Technology, Chicago, IL, USA (e-mail: [email protected])

Abstract: Sensor faults in an artificial pancreas (AP) system for people with type 1 diabetes (T1D) can yield insulin infusion rates that may cause hypoglycemia or hyperglycemia. New statistical process monitoring methods are proposed for real-time detection of sensor-related faults in AP systems to mitigate their effects. Remodulated dynamic time warping for synchronization of signal trajectories and the Savitzky-Golay filter for calculation of real-time numerical derivatives are integrated with multiway principal component analysis. Data from 14 subjects who participated in 60 hours of closed-loop AP experiments with variations in meals and in physical activity levels and times are used. Glucose measurements from a continuous glucose monitoring sensor are monitored for fault detection. The results illustrate that the proposed method is able to detect various types of unexpected dynamical changes or faults and label them correctly. There were no missed faults in any tested case. The algorithm can inform AP users about sensor faults and unexpected changes, and provides valuable information to an AP for prevention of hypoglycemia or hyperglycemia that may be caused by sensor faults.

Keywords: Statistical Process Monitoring, Fault Detection, Sensor Faults, Artificial Pancreas

1. INTRODUCTION

Our artificial pancreas (AP) system consists of a glucose sensor, a wearable device that measures biometric variables, an insulin pump, and a computing device such as a smartphone that executes the AP software (Cinar et al. (2016)). Failures in glucose sensors may cause undesirable and life-threatening outcomes. A sudden increase in continuous glucose monitoring (CGM) measurements due to sensor error may cause an AP to overdose insulin and induce hypoglycemia.

Sensor fault detection in AP systems has been addressed in a number of publications. Detection of pressure-induced sensor attenuation (PISA) based on the rate of change in CGM readings was proposed, with PISA defined as a negatively biased reading (Baysal et al. (2014)). A Kalman-filter-based method for detecting sudden spikes and loss of sensitivity in CGM was reported (Facchinetti et al. (2013)). A kernel-based stochastic modeling technique was used for detection of CGM spikes in data collected from critically ill patients (Signal et al. (2012)). A discrete wavelet-transform-based online CGM dropout detector was developed (Shen et al. (2010)). A method

that compared different statistical monitoring charts was proposed for detecting unexpected changes in CGM readings (Zhao and Fu (2015)). Various studies focused on assessing the correctness of CGM sensor readings rather than detection of sensor failures (Bondia et al. (2008); Leal et al. (2013b,a)).

A glucose sensor (CGM) fault detection method that combines model- and process-history-based algorithms was developed by our group (Turksoy et al. (2016)). The model-based component uses an extension of Bergman's model (Bergman et al. (1981); Roy and Parker (2007)). Due to the complexity of the human body, some states defined in first-principles models are not measurable, and they are usually estimated from available measurements (Roy and Parker (2007)). Model parameters are also identified from available measurements. Because intra- and inter-subject variability over time must be captured by the algorithm, time-invariant model parameters are not appropriate for the minimal model used. Model states and parameters are estimated simultaneously by defining the uncertain model parameters as augmented states. An unscented Kalman filter (UKF) (Julier and Uhlmann (2004); Wan and Van Der Merwe (2000)) is implemented for state estimation of the nonlinear system. The data-driven portion of the fault detection method is based on principal component analysis (PCA). PCA is used to develop a model that describes the expected variation under normal operation (NO). Results based on data from 51 subjects indicate successful detection with 84.2% sensitivity: overall, 155 of the 184 CGM failures are detected, with an average detection time of 2.8 minutes.

Statistical process monitoring (SPM) is used extensively in various industries (Cinar et al. (2007)). Multivariate SPM (MSPM) techniques developed for continuous and batch processes have gained importance in recent years (MacGregor and Cinar (2012); Ündey et al. (2003); Negiz and Cinar (1998); Raich and Cinar (1997); Raich and Cinar (1996)). MSPM provides more accurate information about the system state, gives warnings earlier than univariate SPM, and is easy to compute and interpret (Cinar et al. (2007)). The method reported here focuses on MSPM and sensor fault detection techniques for multivariable AP systems (Turksoy et al. (2014); Cinar et al. (2016)). It integrates multiway principal component analysis (MPCA), dynamic time warping (DTW), and the Savitzky-Golay filter (SGF) to identify time periods where unexpected changes in data values occur and to detect sensor faults.

2. METHODS

The proposed method is data-driven, and it requires training data rich in the normal variation of the glucose-insulin dynamics of the body to develop a model of error-free operation. Otherwise, new dynamic changes that are not captured in the model may be interpreted as faults and increase the number of false positive detections during use of the system for monitoring. Also, the training data should not include episodes with faults.
If such episodes are not removed before building the model of normal operation (NO), the algorithm may treat them as normal dynamic changes and fail to detect faults with similar patterns in real time, which would increase the missed alarms and decrease the sensitivity of the algorithm.

Multivariate SPM subspace techniques use latent variables such as principal components (PCs) to determine the statistical distance of the current state of the system from a reference state. They indicate the status of the system (normal or abnormal operation) by using two statistics: Hotelling's T² and the squared prediction error (SPE). The T² statistic indicates the distance of the current operation from the reference state as captured by the PCs included in the development of the model. Since only the first few PCs that capture most of the variation in the data are used to build the model, the model is a somewhat accurate but incomplete description of the process. The SPE chart captures the magnitude of the error caused by deviations resulting from events that are not described by the model. The T² and SPE statistics are used as a pair: if either chart indicates a significant deviation from the reference state, the presence of an abnormal situation is declared (Cinar et al. (2007)). The method has an offline stage to develop the models that represent NO and an online stage for online monitoring and fault detection (Figure 1).
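As a concrete illustration of the two statistics, the sketch below fits a PCA model to hypothetical normal-operation data and computes T² and SPE for a new observation vector. The function names and data are illustrative assumptions, SVD is used in place of NIPALS (both yield the same subspace), and the control limits a real monitoring chart needs are omitted.

```python
import numpy as np

def fit_pca_model(X, n_pcs):
    """Fit an illustrative PCA model of normal operation.

    X: data matrix (observations x variables).
    Returns the column means, loadings P (variables x n_pcs), and the
    variance of each retained score (used to scale T^2).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD-based PCA; rows of Vt are the principal directions.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T                      # loadings
    T = Xc @ P                            # scores of the training data
    score_var = T.var(axis=0, ddof=1)     # per-PC score variance
    return mean, P, score_var

def monitoring_statistics(x_new, mean, P, score_var):
    """Hotelling's T^2 and SPE (Q) for a new observation vector."""
    xc = x_new - mean
    t = P.T @ xc                          # projection onto the model plane
    T2 = np.sum(t**2 / score_var)         # distance within the model
    residual = xc - P @ t                 # part not explained by the model
    SPE = residual @ residual             # squared prediction error
    return T2, SPE
```

A fault the model cannot explain (e.g., a spike in one sensor channel) mainly inflates SPE, while unusual but model-consistent variation inflates T².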

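The Savitzky-Golay filter named in the methods overview estimates smoothed signals and their derivatives by fitting a low-degree polynomial to a moving window with linear least squares. A minimal causal sketch that differentiates at the newest sample is shown below; the window half-width m, polynomial degree, and 5-minute sampling period are illustrative assumptions, not the authors' tuned values.

```python
import numpy as np

def sg_causal_derivative(y, m=3, poly=2, dt=5.0, order=1):
    """Causal Savitzky-Golay-style derivative estimate at the newest sample.

    Fits a polynomial of degree `poly` to the trailing 2m + 1 samples by
    linear least squares and differentiates the fit at the last point, so
    only past data are used (suitable for real-time use).
    """
    window = np.asarray(y[-(2 * m + 1):], dtype=float)
    x = np.arange(-(2 * m), 1) * dt          # time axis ending at 0 (now)
    coeffs = np.polyfit(x, window, poly)     # least-squares polynomial fit
    deriv = np.polyder(np.poly1d(coeffs), order)
    return deriv(0.0)                        # derivative at the newest point
```

For a noise-free linear trend the estimate recovers the slope exactly; on noisy CGM data the polynomial fit suppresses measurement noise that plain finite differences would amplify.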
Fig. 1. Flowchart of the proposed algorithm

Monitoring and sensor fault detection in AP operation can be cast in the framework of batch process operation, where the length of a batch run is set as one day (24 hours). The behavior of the body and glucose dynamics in response to daily events that are repeated can be considered as the operation of a batch process. These events include meals and exercise. Glucose levels increase after a meal and are reduced when insulin is infused. This pattern is repeated for every meal. Exercise causes variation patterns in glucose levels by affecting the sensitivity to insulin and insulin-independent glucose uptake. These sequences of events are repeated during a day and over time.

2.1 Multiway Principal Component Analysis

Batch process data can be structured as a three-way matrix. During the progress of a batch, j = 1, 2, ..., J variables are measured at k = 1, 2, ..., K time instants. Similar data are collected for i = 1, 2, ..., I repeated batches (Nomikos and MacGregor (1995)). Batch data are represented as a three-dimensional (3D) matrix X(I × J × K) (Figure 2). The convention is to arrange different batches along the vertical axis of the 3D data array, measured variables along the horizontal axis, and their observations over time along the third dimension (Figure 2).

Fig. 2. Three-way array formation and unfolding

Multiway PCA (MPCA) has been used in batch process monitoring to decompose the 3D data matrix (Wold et al. (1987b); Nomikos and MacGregor (1994, 1995); Ündey et al. (2003)). MPCA is equivalent to performing PCA on a 2D matrix X constructed by unfolding the 3D matrix X. Batch-wise MPCA (BMPCA) unfolds X into X(I × KJ) by adding the vertical (I × J) slices side by side to the right (Figure 2), enabling the analysis of data variability among the batches in X with respect to variables and their observation times. This is the most suitable unfolding for SPM (Nomikos and MacGregor (1995); Wold et al. (1987b)), but it requires all batches to be of equal length and completed. Variable-wise MPCA (VMPCA) unfolding is performed by stacking the vertical (I × J) slices one beneath the other, which provides the variance information along all batches and time. There is no need for batches to be completed or to be of equal length. However, the mean centering step leaves the nonlinear time-varying trajectories in the data matrix because it simply subtracts a constant, the grand mean of each variable over all batches and time, from the trajectory of each variable. Hence, the results may be weak for small disturbances when the goal is to check deviations from the mean trajectory (Ündey et al. (2003)). The X matrix can be decomposed into the sum of products of score vectors t_r (I × 1) and loading matrices P_r (J × K) plus a residual matrix E:

X = Σ_{r=1}^{R} t_r p_r^T + E = X̂ + E    (1)

where R is the number of PCs. In BMPCA, t_r (I × 1) and p_r (KJ × 1) are the score and loading vectors of the unfolded X(I × KJ) matrix. The PCs can be computed by spectral decomposition or singular value decomposition (Cinar et al. (2007)). In this work, the nonlinear iterative PLS (NIPALS) method is used for calculating the PCs (Wold et al. (1987a)).

2.2 Dynamic Time Warping

Several disturbances to glucose homeostasis may occur during different times of a day. Meals cause sharp rises in blood glucose concentration (BGC) that taper off slowly to a steady value. Meal times and compositions vary. Physical activity and stress (both affect the sensitivity to insulin) may also occur at different times from batch to batch. Mismatch in events and event times may cause erroneous interpretations, so it is necessary to align the batches used for developing the reference that describes the normal operation of the body.

Dynamic time warping (DTW) is a deterministic pattern matching method that works with pairs of patterns to locally translate, compress, and expand the patterns so that similar features in the patterns are matched (Berndt and Clifford (1994)). DTW nonlinearly warps two trajectories such that similar events are aligned and a minimum distance between them is obtained. DTW requires a reference trajectory to be used for synchronization of the other trajectories. Denote the reference and new trajectories by x_R and x_T, with data lengths R and T, respectively. DTW finds a sequence F* of L points on an R × T grid:

F* = [f(1), f(2), ..., f(κ), ..., f(L)],  max(R, T) ≤ L ≤ R + T    (2)

where f(κ) = [i(κ), j(κ)] is an ordered pair indicating a position in the grid, κ indexes the grid points along a path between the two trajectories, and i, j are the sample points. DTW defines the Euclidean distance d between each pair of points on the two trajectories:

d(i(κ), j(κ)) = [x_R(i(κ)) − x_T(j(κ))]²    (3)

The total distance between two batch trajectories is:

D(R, T) = Σ_{κ=1}^{L} d(i(κ), j(κ))    (4)

The optimal path and minimum total distance are the solution of the optimization problem:

D*(R, T) = min_F D(R, T)  or  F* = arg min_F D(R, T)    (5)

Dynamic programming is used to solve this optimization:

D_F(i, j) = d(i, j) + min{ D_F(i − 1, j), D_F(i − 1, j − 1), D_F(i, j − 1) }    (6)

while satisfying the local and global constraints:

D_F(1, 1) = d(1, 1),  D_F(R, T) = d(R, T)
i(κ + 1) ≥ i(κ),  j(κ + 1) ≥ j(κ)    (7)

2.3 Derivative Dynamic Time Warping (DDTW)

DTW may suggest aligning a single point on one trajectory to many points of the other trajectory. The peaks or valleys in trajectories might be slightly lower (or higher) from one trajectory to another. Use of the first-order derivative (d_1) instead of the Euclidean distance has been suggested as a measure of the difference between two trajectories to overcome this singularity problem (Keogh and Pazzani (2001)). The derivative DTW (DDTW) can be stated as:

d_1(i(κ), j(κ)) = [x_R^(1)(i(κ)) − x_T^(1)(j(κ))]²    (8)

where

x_R^(1)(i) = (x_R(i) − x_R(i − 1)) / 2    (9)

The first derivative of a function provides information about the slope of the function (increasing or decreasing), and the second derivative indicates whether the function is convex or concave at that time point. Since the datasets considered may not follow a normal distribution, higher-order moments are also included in the proposed method. The third moment measures skewness, and the fourth moment measures kurtosis. The proposed high-order DDTW (HODDTW) captures more statistical information by using higher-order derivatives. The distance for HODDTW is:

d(i(κ), j(κ)) = Σ_{z=0}^{4} a_z d_z(i(κ), j(κ))    (10)

where d_z is the Euclidean distance between each point of the z-th derivatives of the two trajectories, and a_z denotes the coefficients that give different weights to the different derivative orders:

a_z = e^{−z} / Σ_{j=0}^{4} e^{−j}  for z = 0, ..., 4    (11)

[Figure: Subject 1, R = 0.919]
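Equations (6), (10), and (11) can be combined into a short sketch of the HODDTW distance between two trajectories. The function name is hypothetical, the backward-difference derivative is a simplification of Eq. (9), and no step-size or band constraints beyond the monotonicity of Eq. (7) are imposed.

```python
import numpy as np

def hoddtw_distance(xr, xt, n_orders=4):
    """Minimum total HODDTW distance between two trajectories (sketch)."""
    # Weights a_z = e^{-z} / sum_j e^{-j}, z = 0..n_orders  (Eq. 11)
    z = np.arange(n_orders + 1)
    a = np.exp(-z) / np.exp(-z).sum()

    # Derivative sequences of a trajectory; order 0 is the signal itself.
    def derivs(x):
        out = [np.asarray(x, dtype=float)]
        for _ in range(n_orders):
            # Backward difference, padded to keep the original length
            # (a simplification of the derivative estimate in Eq. (9)).
            out.append(np.diff(out[-1], prepend=out[-1][0]))
        return out

    dr, dt_ = derivs(xr), derivs(xt)
    R, T = len(xr), len(xt)

    # Pointwise weighted distance d(i, j)  (Eq. 10)
    d = np.zeros((R, T))
    for az, xr_z, xt_z in zip(a, dr, dt_):
        d += az * (xr_z[:, None] - xt_z[None, :]) ** 2

    # DTW dynamic program (Eq. 6), initialized with D(1, 1) = d(1, 1)
    D = np.full((R, T), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(R):
        for j in range(T):
            if i == 0 and j == 0:
                continue
            best = min(D[i - 1, j] if i > 0 else np.inf,
                       D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
                       D[i, j - 1] if j > 0 else np.inf)
            D[i, j] = d[i, j] + best
    return D[-1, -1]
```

Identical trajectories give a distance of zero, and a large distance signals trajectories whose values or derivative patterns (slope, curvature, and higher orders) disagree even after warping.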