sensors Article

A Novel Unsupervised Adaptive Learning Method for Long-Term Electromyography (EMG) Pattern Recognition

Qi Huang 1,*, Dapeng Yang 1, Li Jiang 1,*, Huajie Zhang 1, Hong Liu 1 and Kiyoshi Kotani 2

1 State Key Laboratory of Robotics and System, School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China; [email protected] (D.Y.); [email protected] (H.Z.); [email protected] (H.L.)
2 Research Center for Advanced Science and Technology, the University of Tokyo and PRESTO/JST, Tokyo 153-8904, Japan; [email protected]
* Correspondence: [email protected] (Q.H.); [email protected] (L.J.); Tel.: +86-451-8640-2330 (L.J.)

Received: 15 April 2017; Accepted: 8 June 2017; Published: 13 June 2017

Abstract: In the long term, a variety of interfering factors cause performance degradation for pattern recognition-based myoelectric control methods. This paper proposes an adaptive learning method with low computational cost to mitigate this degradation in unsupervised adaptive learning scenarios. We present a particle adaptive classifier (PAC), built by constructing a particle adaptive learning strategy and a universal incremental least square support vector classifier (LS-SVC). We compared PAC performance with the incremental support vector classifier (ISVC) and the non-adapting SVC (NSVC) in a long-term pattern recognition task in both unsupervised and supervised adaptive learning scenarios. Retraining time cost and recognition accuracy were compared by validating the classification performance on both simulated and realistic long-term EMG data. The classification results on realistic long-term EMG data showed that PAC significantly decreased the performance degradation in unsupervised adaptive learning scenarios compared with NSVC (9.03% ± 2.23%, p < 0.05) and ISVC (13.38% ± 2.62%, p = 0.001), and reduced the retraining time cost compared with ISVC (2 ms per updating cycle vs. 50 ms per updating cycle).

Keywords: long-term EMG pattern recognition; adaptive learning; concept drift; particle adaption; support vector classifier

1. Introduction

According to a survey on the usage of prostheses [1], 28% of users are categorized as "prosthesis rejecters", who use their prostheses no more than once a year, mainly because of the clumsy control of commercial prostheses. Compared with conventional control methods, control schemes based on pattern recognition (PR), employing advanced feature extraction and classification technology, show the potential to improve the intuitiveness and functionality of myoelectric control [2,3]. However, when employing PR-based methods to realize myoelectric control, interfering factors such as temperature and humidity changes, skin impedance variation, muscular fatigue, electrode shifting and limb position changes cause classification degradation [4–6], hindering the clinical application and commercialization of the PR-based EMG control scheme. After analyzing both industrial and academic demands, Farina et al. [7] divided the demand for reliability of upper-limb prosthesis control systems into two parts: (a) robustness to instantaneous changes, such as electrode shifting when donning and doffing, and arm posture variation; and (b) adaptability to slow changes, such as muscular fatigue and skin impedance variation. The robustness is usually related


to research on advanced signal recording methods, such as optimizing the size and layout of EMG electrodes [8,9], employing high-density electrodes to obtain more information [10,11], and finding invariant characteristics in EMG signals [12], whereas the adaptability is usually related to adaptive learning methods [13–17]. The objective of this paper is to employ an adaptive learning method based on the theory of concept drift to endow the PR classifier with adaptability to slow changes of EMG signals. Related work is described in the following section.

1.1. Related Work

1.1.1. Adaptive Learning on Surface EMG Data

The phenomenon whereby the probability distribution of target data changes over time is called concept drift [18,19], where the term concept refers to the probability distribution in feature space. Hence, adaptive learning is defined as a learning method that updates its predictive model online to track the concept drift. According to whether the learning method needs labeled samples to retrain the predictive model or not, adaptive learning methods can be categorized into supervised adaptions [15,20,21] and unsupervised adaptions [16,22]. Supervised adaption is able to achieve high recognition accuracy, but at the cost of a cumbersome training process to acquire labeled samples repeatedly, whereas unsupervised adaption is user-friendly but at the cost of reduced accuracy [13]. Therefore, the crux of unsupervised adaption is to choose exemplars with high confidence. Researchers have tried to evaluate the confidence of the classification of unlabeled samples by using the likelihood [14], the entropy [23], the consistency of classification decisions [24], or simply rejecting unknown data patterns [25]. However, such evaluations introduce extra computational cost during retraining and result in data jamming during online control. Regarding the predictive model, most researchers [13,14,16,20–22,25] chose linear discriminant analysis (LDA) because of the simplicity of retraining the model [26], whereas only a few researchers [27,28] chose the support vector machine (SVM) [29,30], a stronger classifier with a solid mathematical foundation and good performance on small training sets. The SVM has been used for non-adapting myoelectric control [31,32]. Designing an adaptive classifier based on the SVM is challenging because: (a) the computational cost of the SVM training process is unacceptable for real-time updating; and (b) there is no direct mapping from the output of the SVM decision function to the classification confidence [28], resulting in difficulties in choosing reliable exemplars. This paper aims to design an adaptive classifier based on the SVM by overcoming these adverse conditions.

1.1.2. Adaptive Learning Based on SVM

Based on the standard SVM, two modifications have been proposed to reduce the computational cost of the training process: (a) the incremental SVM (ISVM) [33], which reduces the cost through an incremental learning method, and (b) the least square support vector machine (LS-SVM) [34], which reduces the cost by eliminating the heuristic nature of the support vector training process. Similar to the SVM, the incremental learning method can be applied to LS-SVM to further reduce the retraining cost [35,36]. To update the predictive model of LS-SVM with fewer labeled samples and lower computational cost, Tommasi et al. proposed a model adaption method [37], which computes model coefficients from models prepared in advance. On the other hand, unsupervised adaptive learning methods based on the SVM suffer from the accumulation of misclassification risk of unlabeled samples. The transductive learning method based on the SVM [38], which can relabel a sample after more samples have been processed, was expected to suppress this accumulation. However, the transductive learning method is computationally costly and assumes the same probability distribution for the source and target datasets, which is inapplicable for adaptive learning. Another unsupervised adaptive learning method is to estimate the posterior probability of the output of the SVM decision function and choose reliable samples accordingly [28]. This method is also computationally costly, since the posterior probability is estimated repeatedly.


1.1.3. Validation Methods for Adaptive Learning

The major concern in validating adaptive learning is how to build a picture of accuracy changes over time. The conventional cross-validation method is inapplicable [17], because it assumes that the same distribution is shared by the training data and the verification data. Instead, a commonly used method is computing prequential errors on sequential data, whether the data were acquired from a realistic environment or generated with controlled permutations. Unfortunately, specific to myoelectric control, the validation data sequences of most studies to date were limited to either four to six sessions continually acquired within 2 to 3 h [13–15,17,27], or numerous sessions acquired at two time points [22] (one for training, the other for validating), which does not agree with the reality of using prostheses and thus cannot fully reflect the adaptability of the learning method. Some researchers have argued that a closed-loop (or human-in-the-loop) learning and validation scheme could reveal the co-adaption between the user and the adaptive system [17]. However, open-loop validation is essential to check the inherent adaptability of the adaptive learning method, and this paper employs open-loop validation.

1.2. Significance of This Paper

According to the related work, we can conclude that unsupervised adaptive learning remains an open problem. Existing methods based on LDA have the problem of high misclassification risk because of their unsophisticated predictive model, whereas methods based on the SVM have the problems of low computational efficiency and rapid accumulation of misclassification risk during updating. The particle adaptive classifier (PAC), proposed in this paper, addresses the problems of SVM-based unsupervised adaptive learning methods. It improves computational efficiency by constructing a universal incremental LS-SVM and a representative-particles-based sparse LS-SVM. It reduces the accumulation of misclassification risk by employing neighborhood updating based on the smoothness assumption and the cluster assumption [39], principles widely used in semi-supervised learning. To analyze the performance of adaptive learners, we design a validating paradigm that measures prequential errors on two kinds of data: (a) simulated data with controlled permutations, to analyze the adaptability of the adaptive learner, and (b) acquired data with a time span of one day, to check the actual applicability of the adaptive learner. To explore the feasibility in various working situations, we compare the performance of adaptive learners in both unsupervised and supervised retraining scenarios.

2. Materials and Methods

2.1. Principles of Adaptive Learning

Compared with non-adapting methods, adaptive learning methods are able to track the concept drift to compensate for long-term performance degradation, but they introduce retraining/updating risk. The retraining/updating risk is caused by adopting wrong samples to retrain/update the predictive model. Wrong samples are samples whose labels conflict with their ground-truth labels under the current concept. In supervised adaptive learning scenarios, old samples are likely to be wrong samples because they may be outdated after the concept has drifted. In unsupervised adaptive learning scenarios, both outdated old samples and misclassified new samples are wrong samples, and such misclassification results in the accumulation of misclassification risk. Therefore, when designing an adaptive learning method, we should take the following facts into consideration:

Since we don’t know when the concept drift happens, information from old data may on one hand be useful to the classification generalization, and on the other hand be harmful to the representability of the classifier [40]. Therefore, a good adaptive classifier should balance the benefit and the risk of using old data.

(b) Drifting concepts are learnable only when the rate or the extent of the drift is limited in particular ways [41]. In many studies, the extent of concept drift is defined as the probability that two concepts disagree on randomly drawn samples [19], whereas the rate of concept drift is defined as the extent of the drift between two adjacent updating processes [42]. Considering that the sample distribution and the recognition accuracy are immeasurable for a single sample, we define a dataset of samples with the same distribution as a session. In this way, the drift extent (DE) is defined as the difference in sample distribution between two sessions, and the drift rate (DR) is defined as the drift extent between two adjacent sessions in a validating session sequence.

2.2. Incremental SVC

2.2.1. Nonadapting SVC

To solve the binary classification problem with a set of training pairs {(xi, yi)}, i = 1, ..., l, xi ∈ R^n, yi ∈ {−1, +1}, a standard C-SVC [29] formulates the indicator function as f(x) = sgn(w^T φ(x) + b) and the optimization primal problem as follows:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} J_{SVC} = \frac{1}{2}\mathbf{w}^T\mathbf{w} + C\sum_{i=1}^{l}\xi_i, \quad \text{s.t.}\quad y_i - [\mathbf{w}^T\varphi(\mathbf{x}_i) + b] \le \xi_i,\ i = 1, 2, \ldots, l \tag{1}$$

where ξi is the training error of vector xi, φ(·) is the mapping from the original feature space to the reproducing kernel Hilbert space, and C > 0 is the regularization parameter. The dual problem of Equation (1) is:

$$\min_{\boldsymbol{\alpha}} W_{SVC} = \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i Q_{i,j}\alpha_j - \sum_{i=1}^{l}\alpha_i + b\sum_{i=1}^{l}y_i\alpha_i, \quad \text{s.t.}\quad \sum_{i=1}^{l}y_i\alpha_i = 0,\ 0 \le \alpha_i \le C,\ i = 1, 2, \ldots, l \tag{2}$$

where Qi,j = yi yj k(xi, xj), and k(xi, xj) is a kernel function satisfying the Mercer theorem [29]. A commonly used kernel function is the RBF kernel [43], formulated as:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma\|\mathbf{x}_i - \mathbf{x}_j\|^2) \tag{3}$$
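As an aside, a minimal sketch of Equation (3) and the resulting Gram matrix, which all of the classifiers below rely on, is given here; the function name and vectorized form are our own and purely illustrative.

```python
# Illustrative sketch (not the authors' code): RBF kernel of Eq. (3) and the
# Gram matrix K[i, j] = k(X[i], Z[j]) computed with plain numpy.
import numpy as np

def rbf_kernel_matrix(X, Z, gamma):
    """K[i, j] = exp(-gamma * ||X[i] - Z[j]||^2) for row-wise sample matrices X, Z."""
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Z ** 2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clamp tiny negatives from round-off
```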

After optimizing W over α and b, the indicator function of the SVC can be rewritten as $f(\mathbf{x}) = \mathrm{sgn}\left(\sum_{i=1}^{l}\alpha_i y_i k(\mathbf{x}_i, \mathbf{x}) + b\right)$. During the optimization of W, samples lying far away from the class boundary are pruned by the Karush-Kuhn-Tucker (KKT) conditions and their corresponding coefficients are set to zero [29]. Specifically, for a training sample (xk, yk), the following condition holds:

$$y_k f(\mathbf{x}_k) - 1 \begin{cases} \ge 0; & \alpha_k = 0 \\ = 0; & 0 < \alpha_k < C \\ \le 0; & \alpha_k = C \end{cases} \tag{4}$$

Samples with non-zero coefficients are named support vectors (SVs). The dataset of support vectors (noted as SV) is a subset of the original training dataset; the predictive model of the standard C-SVC is therefore sparse, and the indicator function can be simplified as:

$$f(\mathbf{x}) = \mathrm{sgn}\left(\sum_{\mathbf{x}_i \in SV}\alpha_i y_i k(\mathbf{x}_i, \mathbf{x}) + b\right)$$

The standard C-SVC cannot change its model when concept drift happens; it is therefore nonadapting and exhibits performance degradation over time. To avoid ambiguity, we refer to the standard C-SVC as the nonadapting SVC (NSVC) in this paper.

2.2.2. Incremental SVM

To change the SVM into an adaptive classifier, a typical strategy is to update the predictive model with new samples, namely, to learn incrementally. When the data come in the form of a stream, they are segmented into different batches {Π1, Π2, ...}. The newly labeled data batch Πt and the old support vector set SVt−1 compose a new training set to retrain the predictive model and obtain the new support vector set SVt. Figure 1 shows the schema of the typical incremental SVC (ISVC) [33].

Figure 1. Generic schema of Incremental SVC. Support vectors (SVs) from the previous predictive model are combined with newly coming samples to train a new predictive model and select new support vectors based on KKT conditions.
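A minimal sketch of this update cycle is given below; it is not the paper's libSVM-based implementation, and scikit-learn's libsvm-backed SVC is used as a stand-in, with illustrative function and parameter names.

```python
# Sketch of one generic ISVC update step (Figure 1): pool old SVs with the new
# labelled batch, retrain, and keep the new support vectors.
import numpy as np
from sklearn.svm import SVC

def isvc_update(sv_X, sv_y, batch_X, batch_y, C=1.0, gamma=1.0):
    """Retrain on old SVs + new batch; return the new SVs and the new model."""
    X = np.vstack([sv_X, batch_X])
    y = np.concatenate([sv_y, batch_y])
    model = SVC(C=C, gamma=gamma, kernel='rbf').fit(X, y)
    sv_idx = model.support_               # indices of the new support vectors
    return X[sv_idx], y[sv_idx], model
```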

Since the predictive model of ISVC is composed of support vectors (SVs), the penalizing mechanism for redundant information is based on the KKT conditions [29]. However, the KKT conditions can neither reduce the complexity of the predictive model in the long term, nor reduce the misclassification risk of new samples. Figure 2 shows the influence of the KKT conditions on the adaption of ISVC. According to whether the boundary between classes changes or not, concept drift can be categorized into real concept drift and virtual drift [19]. Only real concept drift influences performance degradation. After real concept drift between two adjacent concepts, samples lying close to the old class boundary are more likely to be outdated according to the position of the new class boundary. Unfortunately, these samples would be kept as support vectors after the penalizing of the KKT conditions. Those outdated samples would increase the complexity and the misclassification risk of the predictive models in both supervised and unsupervised adaptive learning scenarios.

Figure 2. Schematic drawing to show the influence of KKT-conditions-based sample pruning on adaption. Old support vectors, which are close to the old class boundary, are more likely to be outdated according to the new class boundary, and to be added into the new predictive model, and thus to increase the misclassification risk and complexity of the new predictive model.

2.2.3. Program Codes and Parameters of NSVC and ISVC

The codes of ISVC and NSVC were modified from libSVM [44]. To realize multi-class classifying, the 1-against-1 strategy [45] was employed to get higher classification accuracy. Parameters C (relaxation parameter of the SVM) and γ (RBF kernel parameter) were determined by n-fold cross validation [5] during the training process.
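The following is one possible, hedged sketch of such an n-fold cross-validated parameter search; scikit-learn's libsvm-backed SVC (which trains one-against-one internally for multi-class problems) is used as a stand-in for the modified libSVM code, and the grid values are illustrative rather than those used in the paper.

```python
# Sketch of n-fold cross validation to pick C and gamma for the RBF kernel.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [2.0 ** k for k in range(-5, 11)],
              'gamma': [2.0 ** k for k in range(-10, 4)]}
search = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)   # 5-fold CV as an example
# search.fit(X_train, y_train); search.best_params_ then gives the chosen C and gamma.
```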

2.3. The Particle Adaptive Classifier

2.3.1. General Structure of the PAC Classifier

According to the above descriptions, ISVC has two shortcomings for concept adaption: (1) the rapid accumulation of the misclassification risk, and (2) the increasing complexity of the predictive model.


In order to amend the shortcomings of ISVC, we propose a novel adaptive learning structure for the SVM, the particle adaptive classifier (PAC). Figure 3 illustrates the working principles of PAC. The core concept of PAC is its replaceable support vectors, named representative particles (RPs). Compared with the SVs of NSVC or ISVC, the RPs of PAC have the following characteristics:

(1) The distribution of RPs is sparse. Different from traditional KKT-conditions-based pruning, the sparsity of RPs originates from down-sampling the original training dataset. The down-sampling process keeps the distribution characteristics of the whole feature space, rather than the distribution characteristics of the class boundary. Specifically, each RP can be regarded as a center point of a subspace of the feature space, and thus the distribution density of the RPs largely approximates the probability density function (PDF) of the feature space. The predictive model trained with RPs will have similar classification performance to the predictive model trained with the original training dataset, but higher computational efficiency.

(2) With numerous replacements of RPs, the general predictive model is able to track the concept drift. Since the distribution density of the RPs approximates the PDF of the feature space, replacing an RP with an appropriate new sample is equivalent to updating the PDF estimation of a subspace of the feature space. In this way, the classifier is able to track the concept drift.

(3) Based on RPs, it is possible to evaluate the misclassification risk of a new sample. To evaluate the misclassification risk of a new sample in unsupervised adaptive learning scenarios, we have to establish some basic assumptions. Considering the properties of EMG signals [5], at least two widely used assumptions on the class boundary are valid for the classification problem of EMG signals: (a) the cluster assumption [39], and (b) the smoothness assumption [39]. The cluster assumption assumes that two samples near enough in the feature space are likely to share the same label. The smoothness assumption assumes that the class boundary is smooth, namely, a sample far away from the class boundary is more likely to maintain its label after concept drifting than samples near the class boundary. If a sample is close enough to an RP and has the same predicted label as the RP, it is likely to be rightly classified. By pruning newly coming samples according to their distances to the RPs, it is possible to suppress the accumulation of misclassification risk. The area within which a new sample is adopted to replace an old RP is the attractive zone of the RP.

Figure 3. Schematic drawing to show the updating mechanism of PAC. Different from ISVC, the PAC strategy updates the predictive model by replacing old representative particles (RPs), rather than adding new ones, to maintain the complexity of the predictive model. On the other hand, new samples close enough to one RP will replace that RP. The area within which a sample can replace the RP is the attractive zone of the RP. In this way, we can choose samples with high confidence to reduce the retraining risk.

The size of the attractive zone is crucial, because it determines the tracking speed of PAC, the reliability of unlabeled new samples, and the distribution of the new group of RPs. A larger attractive zone is able to track concept drift with a larger drift rate, but also results in higher updating risk and worse representability of the RPs. Furthermore, old RPs should have larger attractive zones than new RPs, because they are more likely to be outdated. Besides, the initialization of the RPs and the choice of the predictive model are also essential for PAC. The initialization of the RPs affects their representability, which will be discussed in detail in Section 2.3.3. Since the RPs are sparse, it is better to choose a full-dense predictive model to maintain the sparsity as well as the representability of the RPs. The full-dense property means that there is no further pruning mechanism when training the predictive model; in the case of PAC, it means that all RPs act as SVs of the predictive model. We chose LS-SVM as the full-dense predictive model [46] in this paper. To retrain the LS-SVM efficiently, we propose, for the first time, the universal incremental LS-SVM, which can conveniently insert, delete and replace support vectors in the predictive model. Figure 4 shows the general learning structure of the proposed PAC.

Figure 4. The learning structure of PAC. Different from ordinary pattern recognition methods, PAC chooses RPs before training the LS-SVM during the initializing process, and adds an updating process after labeling unlabeled samples. The updating process is composed of the attractive zone decision (to select unlabeled samples with high confidence), updating the RPs with the selected samples, and universal incremental LS-SVM with the updated RPs.

2.3.2. Universal Incremental LS-SVM

To solve the binary classification problem with a set of training pairs {(xi, yi)}, i = 1, ..., l, xi ∈ R^n, yi ∈ {−1, +1}, LS-SVM [34] formulates the optimization primal problem as follows:

$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} J_{LS} = \frac{1}{2}\mathbf{w}^T\mathbf{w} + \frac{C}{2}\sum_{i=1}^{l}\xi_i^2, \quad \text{s.t.}\quad y_i - [\mathbf{w}^T\varphi(\mathbf{x}_i) + b] = \xi_i,\ i = 1, 2, \ldots, l \tag{5}$$

where ξi is the training error of vector xi, φ(·) is the mapping from the original feature space to the reproducing kernel Hilbert space, and C > 0 is the regularization parameter. The Lagrangian for the above optimization problem is:

$$L(\mathbf{w}, b, \boldsymbol{\xi}) = J_{LS} - \sum_{i=1}^{l}\alpha_i\left\{[\mathbf{w}^T\varphi(\mathbf{x}_i) + b] + \xi_i - y_i\right\} \tag{6}$$

where αi is the Lagrange multiplier of xi. The conditions for the optimality of L(w, b, ξ) are:

$$\begin{cases} \partial L(\mathbf{w},b,\boldsymbol{\xi})/\partial \mathbf{w} = 0 \;\rightarrow\; \mathbf{w} = \sum_{i=1}^{l}\alpha_i\varphi(\mathbf{x}_i) \\ \partial L(\mathbf{w},b,\boldsymbol{\xi})/\partial b = 0 \;\rightarrow\; \sum_{i=1}^{l}\alpha_i = 0 \\ \partial L(\mathbf{w},b,\boldsymbol{\xi})/\partial \xi_i = 0 \;\rightarrow\; C\xi_i = \alpha_i \\ \partial L(\mathbf{w},b,\boldsymbol{\xi})/\partial \alpha_i = 0 \;\rightarrow\; \sum_{j=1}^{l}\alpha_j k(\mathbf{x}_i, \mathbf{x}_j) + \alpha_i/C + b = y_i \end{cases} \tag{7}$$
where k(xi, xj) = φ(xi)^T φ(xj) is the kernel function, similar to the standard C-SVC. Equation (7) can be rewritten in the form of linear equations:

$$\begin{bmatrix} 0 & \mathbf{E}_V^T \\ \mathbf{E}_V & \mathbf{K} + C^{-1}\mathbf{I} \end{bmatrix}\begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{Y} \end{bmatrix} \tag{8}$$

where EV = [1, 1, ..., 1]^T, α = [α1, α2, ..., αl]^T, Y = [y1, y2, ..., yl]^T ∈ R^(l×1), I is the l-rank identity matrix, and K is the l-rank kernel matrix with entries Ki,j = k(xi, xj). Let H = K + C^(−1)I; the solution of Equation (8) is:

$$\begin{cases} b = \dfrac{\mathbf{E}_V^T\mathbf{H}^{-1}\mathbf{Y}}{\mathbf{E}_V^T\mathbf{H}^{-1}\mathbf{E}_V} \\[2mm] \boldsymbol{\alpha} = \mathbf{H}^{-1}(\mathbf{Y} - b\,\mathbf{E}_V) \end{cases} \tag{9}$$

The dual model of the indicator function can be written as $f(\mathbf{x}) = \mathrm{sgn}\{[\sum_{i=1}^{l}\alpha_i k(\mathbf{x}_i, \mathbf{x})] + b\}$.
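For concreteness, a minimal sketch of the batch (non-incremental) solution of Equation (9) follows; it solves the full linear system directly, which is the O(l^3) cost that the incremental algorithm below avoids. The function names are illustrative and this is not the authors' implementation.

```python
# Sketch: batch LS-SVM training by solving Eq. (9) directly with numpy/scipy.
import numpy as np
from scipy.spatial.distance import cdist

def train_ls_svm(X, y, C, gamma):
    """Return (alpha, b) of the LS-SVM dual model for RBF kernel parameter gamma."""
    K = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))      # kernel matrix of Eq. (3)
    H = K + np.eye(len(y)) / C                           # H = K + C^-1 I
    e = np.ones(len(y))
    b = (e @ np.linalg.solve(H, y)) / (e @ np.linalg.solve(H, e))   # Eq. (9)
    alpha = np.linalg.solve(H, y - b * e)
    return alpha, b

def ls_svm_decision(alpha, b, X_train, x, gamma):
    k_vec = np.exp(-gamma * cdist(X_train, x[None, :], 'sqeuclidean')).ravel()
    return np.sign(alpha @ k_vec + b)                    # dual indicator function
```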

The computational complexity of solving the linear equations in Equation (9) is of order O(l^3) [47]. The major cost lies in computing H^(−1), the inverse of an l-rank symmetric positive definite matrix [47]. With the universal incremental learning method, the computational complexity of retraining the predictive model is reduced to O(l^2). The universal incremental LS-SVM is able to insert, delete and replace a sample at any position in the RP dataset. Before the detailed description, we formulate the operations of inserting, deleting and replacing. Given an RP dataset S(l) with l samples, and the pth sample xp (1 ≤ p ≤ l) of S(l), we define the dataset obtained by deleting xp from S(l) as S(l − 1). The operations and their corresponding learning processes are defined as:

• Inserting: adding xp at the pth position into S(l − 1) is equivalent to training from S(l − 1) to S(l);
• Deleting: deleting xp at the pth position from S(l) is equivalent to training from S(l) to S(l − 1);
• Replacing: deleting xp at the pth position from S(l) and adding xp* at the pth position into S(l − 1) is equivalent to the combination of training from S(l) to S(l − 1) and training from S(l − 1) to S*(l).

Since the training process from S(l − 1) to S*(l) is similar to the process from S(l − 1) to S(l), we only need to work out an algorithm to train from S(l) to S(l − 1) and from S(l − 1) to S(l). We note the H matrices for training classifiers with S(l) and S(l − 1) as H(l) and H(l − 1), respectively. H(l) and H(l − 1) can be divided as:

$$\mathbf{H}(l-1) = \begin{bmatrix} \mathbf{H}_1 & \mathbf{H}_2 \\ \mathbf{H}_2^T & \mathbf{H}_3 \end{bmatrix}, \qquad \mathbf{H}(l) = \begin{bmatrix} \mathbf{H}_1 & \mathbf{h}_1 & \mathbf{H}_2 \\ \mathbf{h}_1^T & h_{pp} & \mathbf{h}_2^T \\ \mathbf{H}_2^T & \mathbf{h}_2 & \mathbf{H}_3 \end{bmatrix} \tag{10}$$

where H1 ∈ R^((p−1)×(p−1)), H2 ∈ R^((p−1)×(l−p)), H3 ∈ R^((l−p)×(l−p)) are divisions of H(l − 1) at the pth row and pth column, h1 = [k(x1, xp), ..., k(xp−1, xp)]^T, h2 = [k(xp+1, xp), ..., k(xl, xp)]^T, and hpp = k(xp, xp) + 1/C. According to the properties of Cholesky factorization [48], H(l) and H(l − 1) are symmetric positive definite and hence can be Cholesky factorized uniquely. We note the upper triangular matrices after Cholesky factorization as U and W, so that H(l − 1) = W^T W and H(l) = U^T U. Similar to H(l) and H(l − 1), U and W can be written as:

$$\mathbf{W} = \begin{bmatrix} \mathbf{W}_1 & \mathbf{W}_2 \\ \mathbf{O} & \mathbf{W}_3 \end{bmatrix}, \qquad \mathbf{U} = \begin{bmatrix} \mathbf{U}_1 & \mathbf{u}_1 & \mathbf{U}_2 \\ \mathbf{0}^T & u_{pp} & \mathbf{u}_2^T \\ \mathbf{O} & \mathbf{0} & \mathbf{U}_3 \end{bmatrix} \tag{11}$$

where O ∈ R^((p−1)×(p−1)) is a matrix of zeros and 0 ∈ R^((p−1)×1) is a vector of zeros. Because of the uniqueness of the Cholesky factorization, we can compute U from W as:

$$\mathbf{U}_1 = \mathbf{W}_1, \quad \mathbf{U}_2 = \mathbf{W}_2, \quad \mathbf{u}_1 = (\mathbf{U}_1^T)^{-1}\mathbf{h}_1, \quad u_{pp} = \sqrt{h_{pp} - \mathbf{u}_1^T\mathbf{u}_1}, \quad \mathbf{u}_2 = (\mathbf{h}_2 - \mathbf{U}_2^T\mathbf{u}_1)/u_{pp}, \quad \mathbf{U}_3^T\mathbf{U}_3 = \mathbf{W}_3^T\mathbf{W}_3 - \mathbf{u}_2\mathbf{u}_2^T \tag{12}$$

On the other hand, we can compute W from U as:

$$\mathbf{W}_1 = \mathbf{U}_1, \quad \mathbf{W}_2 = \mathbf{U}_2, \quad \mathbf{W}_3^T\mathbf{W}_3 = \mathbf{U}_3^T\mathbf{U}_3 + \mathbf{u}_2\mathbf{u}_2^T \tag{13}$$
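The last terms of Equations (12) and (13) are rank-one Cholesky downdates/updates of W3 and U3. One possible realization is the standard LINPACK-style routine sketched below, written for upper triangular factors; the routine is generic numerical linear algebra, not code from the paper, and the function name is illustrative.

```python
# Sketch: rank-one Cholesky update/downdate of an upper triangular factor R,
# i.e., return R' with R'^T R' = R^T R + x x^T (update) or - x x^T (downdate).
import numpy as np

def chol_rank1(R, x, downdate=False):
    R, x = R.astype(float).copy(), x.astype(float).copy()
    sign = -1.0 if downdate else 1.0
    n = x.size
    for k in range(n):
        r = np.sqrt(R[k, k] ** 2 + sign * x[k] ** 2)   # downdate requires positive definiteness
        c, s = r / R[k, k], x[k] / R[k, k]
        R[k, k] = r
        if k + 1 < n:
            R[k, k + 1:] = (R[k, k + 1:] + sign * s * x[k + 1:]) / c
            x[k + 1:] = c * x[k + 1:] - s * R[k, k + 1:]
    return R
```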

The detailed algorithms for the inserting and deleting processes are shown in Algorithm 1. In the algorithm, we retrain the predictive model without computing the inverses of large matrices, but with forward substitutions of triangular matrices and low-rank Cholesky downdating/updating of symmetric positive definite matrices. The computational complexity of forward substitution is of order O(l^2), while the computational complexity of Cholesky updating/downdating is of order O[(l − p)^2]. The replacing process is realized by one-step deleting and one-step inserting. Therefore, the overall computational complexity of the universal incremental LS-SVM is of order O(l^2), much lower than that of the standard LS-SVM, which is of order O(l^3).

Algorithm 1. Algorithm for the inserting and deleting processes of universal incremental LS-SVM.

Inserting
Input: S(l − 1) = {(xi, yi)}, i = 1, ..., l, i ≠ p; R(l − 1); (xp, yp); 1 ≤ p ≤ l; C
Output: R(l), α(l), b(l)
1:  W ← R(l − 1)
2:  [W1 W2; O W3] ← W                              // divide at the pth row and column
3:  h1 ← [k(x1, xp), ..., k(xp−1, xp)]^T
4:  h2 ← [k(xp+1, xp), ..., k(xl, xp)]^T
5:  hpp ← k(xp, xp) + 1/C
6:  U1 ← W1, U2 ← W2
7:  u1 ← (U1^T)^(−1) h1                            // forward substitution
8:  upp ← (hpp − u1^T u1)^(1/2), u2 ← (h2 − U2^T u1)/upp
9:  U3^T U3 ← W3^T W3 − u2 u2^T                    // low-rank downdate
10: U ← [U1 u1 U2; 0^T upp u2^T; O 0 U3]
11: b(l) ← [EV^T U^(−1) (U^T)^(−1) Y] / [EV^T U^(−1) (U^T)^(−1) EV]   // forward substitution
12: α(l) ← U^(−1) (U^T)^(−1) (Y − b EV)                               // forward substitution
13: R(l) ← U

Deleting
Input: S(l) = {(xi, yi)}, i = 1, ..., l; R(l); 1 ≤ p ≤ l
Output: R(l − 1), α(l − 1), b(l − 1)
1: U ← R(l)
2: [U1 u1 U2; 0^T upp u2^T; O 0 U3] ← U            // divide at the pth and (p + 1)st rows and columns
3: W1 ← U1, W2 ← U2
4: W3^T W3 ← U3^T U3 + u2 u2^T                     // low-rank update
5: W ← [W1 W2; O W3]
6: b(l − 1) ← [EV^T W^(−1) (W^T)^(−1) Y] / [EV^T W^(−1) (W^T)^(−1) EV]
7: α(l − 1) ← W^(−1) (W^T)^(−1) (Y − b EV)
8: R(l − 1) ← W
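A hedged Python sketch of Algorithm 1 follows. It assumes an RBF kernel and that R stores the upper triangular Cholesky factor of H = K + I/C; all function names are illustrative and this is not the authors' code. For brevity the U3/W3 block is re-factorized with np.linalg.cholesky, whereas a rank-one up/downdate (such as the chol_rank1 sketch above) would preserve the O(l^2) cost claimed in the paper.

```python
import numpy as np
from scipy.linalg import solve_triangular

def rbf(a, b, gamma):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def solve_bias_and_alpha(R, y):
    """Eq. (9) through the factor R (H = R^T R): two triangular solves per right-hand side."""
    h_solve = lambda v: solve_triangular(R, solve_triangular(R, v, trans='T'))
    ones = np.ones(len(y))
    b = (ones @ h_solve(y)) / (ones @ h_solve(ones))
    return b, h_solve(y - b * ones)

def insert_rp(R, X, x_new, p, gamma, C):
    """Insert x_new as the p-th sample; X holds the current l-1 RPs (rows), R their factor."""
    W1, W2, W3 = R[:p, :p], R[:p, p:], R[p:, p:]
    h1 = np.array([rbf(xi, x_new, gamma) for xi in X[:p]])
    h2 = np.array([rbf(xi, x_new, gamma) for xi in X[p:]])
    hpp = rbf(x_new, x_new, gamma) + 1.0 / C
    u1 = solve_triangular(W1, h1, trans='T')                     # forward substitution
    upp = np.sqrt(hpp - u1 @ u1)
    u2 = (h2 - W2.T @ u1) / upp
    U3 = np.linalg.cholesky(W3.T @ W3 - np.outer(u2, u2)).T      # low-rank downdate (re-factorized here)
    l = R.shape[0] + 1
    U = np.zeros((l, l))
    U[:p, :p], U[:p, p], U[:p, p + 1:] = W1, u1, W2
    U[p, p], U[p, p + 1:] = upp, u2
    U[p + 1:, p + 1:] = U3
    return U

def delete_rp(R, p):
    """Delete the p-th sample from the factor R of H(l)."""
    U1, u2, U3 = R[:p, :p], R[p, p + 1:], R[p + 1:, p + 1:]
    W3 = np.linalg.cholesky(U3.T @ U3 + np.outer(u2, u2)).T      # low-rank update (re-factorized here)
    l = R.shape[0] - 1
    W = np.zeros((l, l))
    W[:p, :p], W[:p, p:] = U1, R[:p, p + 1:]
    W[p:, p:] = W3
    return W
```

After an insert or delete, solve_bias_and_alpha(new_R, new_y) returns the updated b and α for the current RP set, so replacing an RP amounts to one delete_rp, one insert_rp, and one coefficient solve.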


2.3.3. Initialization and Updating of RPs

The initialization of the RPs affects their representability as well as their sparsity. Though many methods could initialize RPs appropriately, this paper initializes them with the following two steps (a sketch follows the list):

(a) Dividing the training dataset into m clusters based on the kernel space distance and the k-medoids clustering method [49];
(b) Randomly and proportionally picking samples from each cluster as RPs with a percentage p.
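A minimal sketch of steps (a)-(b) under stated assumptions is given below: a plain Voronoi-iteration k-medoids on the RBF kernel-space distance of Equation (15), followed by proportional random sampling of a fraction p per cluster. The pairwise distance matrix is materialized for clarity (memory-heavy for large datasets), and all names are illustrative.

```python
import numpy as np

def kernel_distances(X, gamma):
    """Pairwise kernel-space distances of Eq. (15) for an RBF kernel."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.sqrt(np.maximum(2.0 - 2.0 * np.exp(-gamma * sq), 0.0))

def kmedoids(D, m, n_iter=50, seed=0):
    """Simple k-medoids (Voronoi iteration) on a precomputed distance matrix D."""
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(D), size=m, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for k in range(m):
            members = np.flatnonzero(labels == k)
            if members.size:                       # keep the old medoid if a cluster empties
                new_medoids[k] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return np.argmin(D[:, medoids], axis=1)

def init_rps(X, gamma, m=10, p=0.10, seed=0):
    """Steps (a)-(b): cluster in kernel space, then sample a fraction p from each cluster."""
    rng = np.random.default_rng(seed)
    labels = kmedoids(kernel_distances(X, gamma), m, seed=seed)
    picked = []
    for k in range(m):
        idx = np.flatnonzero(labels == k)
        if idx.size == 0:
            continue
        n_pick = max(1, int(round(p * idx.size)))
        picked.extend(rng.choice(idx, size=n_pick, replace=False))
    return np.sort(np.array(picked))               # indices of the chosen RPs
```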

Here, choosing the kernel space distance to cluster samples, rather than the Euclidean metric, reduces the time cost of computing the attractive zone and is consistent with the kernel trick in the predictive model. The kernel space distance between two vectors xi and xj is defined as:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\|\varphi(\mathbf{x}_i) - \varphi(\mathbf{x}_j)\|^2} = \sqrt{k(\mathbf{x}_i, \mathbf{x}_i) + k(\mathbf{x}_j, \mathbf{x}_j) - 2k(\mathbf{x}_i, \mathbf{x}_j)} \tag{14}$$

If we choose the RBF kernel, namely $k(\mathbf{x}_i, \mathbf{x}_j) = \exp(-\gamma\|\mathbf{x}_i - \mathbf{x}_j\|^2)$, where γ is the kernel parameter, the kernel space distance can be simplified as:

$$d(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{2 - 2k(\mathbf{x}_i, \mathbf{x}_j)} \tag{15}$$

When determining the attractive zone of the RPs, the following principles are taken into consideration:

(a) The new sample is close enough to its nearest RP;
(b) An older RP is more likely to be replaced;
(c) The new sample is in the same class as its nearest RP (for supervised adaption only).

Therefore, for a new sample xN, we first find the nearest RP to xN according to the kernel space distance and the timing coefficient:

$$I = \arg\min_i\left[\exp(t_i/\lambda)\, d(\mathbf{x}_N, \mathbf{x}_i)\right] \tag{16}$$

where I is the index of the nearest RP to xN, xi is the ith RP, ti is the unchanging time since xi was last replaced, and λ is the factor controlling the influence of time. Then we use the threshold dTh to determine whether xN is close enough to its nearest RP xI or not:

$$D(\mathbf{x}_N, \mathbf{x}_I) = d_{Th} - \exp(t_I/\lambda)\, d(\mathbf{x}_N, \mathbf{x}_I) \begin{cases} > 0 & \text{replace } \mathbf{x}_I \text{ with } \mathbf{x}_N,\ t_I = 0 \\ \le 0 & \text{ignore } \mathbf{x}_N,\ \text{all } t_i = t_i + 1 \end{cases} \tag{17}$$

Considering that the kernel products k(xN, xi) are already computed for making the prediction, not much extra computational burden is added by the decision function. The replacement of the representative particle is computed with the universal incremental LS-SVM algorithm. The detailed algorithm for uPAC is shown in Algorithm 2.
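A hedged sketch of this attractive-zone decision (Equations (16)-(17), i.e., one update step of Algorithm 2) is shown below; the container and parameter names are illustrative, and the caller is assumed to update the predictive model via the incremental LS-SVM when a replacement happens.

```python
import numpy as np

def kernel_distance(x, z, gamma):
    """RBF kernel-space distance of Eq. (15)."""
    return np.sqrt(2.0 - 2.0 * np.exp(-gamma * np.sum((x - z) ** 2)))

def try_replace_rp(x_new, rps, t, gamma, d_th=0.99, lam=1e5):
    """Return the index of the replaced RP, or None if x_new is ignored."""
    t += 1                                                    # all t_i <- t_i + 1 (in place)
    scores = np.array([np.exp(ti / lam) * kernel_distance(x_new, rp, gamma)
                       for ti, rp in zip(t, rps)])
    I = int(np.argmin(scores))                                # Eq. (16)
    if d_th - scores[I] > 0:                                  # Eq. (17)
        t[I] = 0
        rps[I] = x_new
        return I                                              # caller then updates the LS-SVM model
    return None
```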


Algorithm 2. Initializing and updating algorithm for uPAC.

Require: m > 0, p > 0, dTh > 0, λ > 0
1:  Cluster the training dataset into m clusters            // k-medoids clustering
2:  Extract representative particles with percentage p into dataset RP
3:  Train the LS-SVM predictive model PM with RP
4:  while xN is valid do
5:      yN ← f(PM, xN)                                       // predict label
6:      all ti ← ti + 1                                      // update unchanging time
7:      I ← arg min_i [exp(ti/λ) d(xN, xi)]
8:      D ← dTh − exp(tI/λ) d(xN, xI)
9:      if D > 0
10:         tI ← 0                                           // clear unchanging time
11:         xI ← xN                                          // update RP
12:         Update PM                                        // replacing with universal incremental LS-SVM
13:     end if
14: end while

2.3.4. Program Codes and Parameters of PAC

To realize multi-class classifying, the one-against-one strategy is employed to get higher classification accuracy [45]. Parameters C (relaxation parameter of LS-SVM) and γ (RBF kernel parameter), which control the generalization of LS-SVM, are determined by n-fold cross validation during the training process of LS-SVM. Parameters m and p control the sparsity and the representability of the RPs. Parameters dTh and λ balance the ability to track concept drifts with a large drift rate against the ability to choose samples with high confidence. When determining the parameters m, p, dTh, and λ, we use one trial of the data described in Section 2.5.4, based on the measure of AER described in Section 3.3.1. Figure 5 illustrates the classification performance with varying classifier parameters. For the different parameters, the performance shows the following changing tendencies:

• Parameter p: With increasing p, the ending accuracy AER of the data sequence increases, while the slope of the increase decreases. This tendency indicates that a PAC with redundant RPs is not only inefficient but also unnecessary. The recommended interval for p is between 10% and 20%.
• Parameter m: The adjustment of m changes the performance only slightly. With increasing m, the performance improves slightly at first and then deteriorates slightly. Its optimal interval is between 9 and 28.
• Parameter λ: The choices of dTh and λ are highly related. There is an optimal interval for λ between 10^4 and 10^6 when dTh is chosen as 0.99. On one hand, choices of λ below the lower boundary of the optimal interval will result in indiscriminate replacement of RPs and complete failure of the classifier. On the other hand, choices of λ above the upper boundary of the optimal interval will weaken the influence of time and result in slight deterioration.
• Parameter dTh: Similar to λ, there is an optimal interval for dTh between 0.9 and 1.1 when λ is chosen as 10^5. It is remarkable that, because we choose the nearest RP of the new sample to be replaced, the classifier does not completely fail even when dTh is set to 0.

Figure 5. Classification performance with varying classifier parameters. (a) Varying p with m = 10, λ = 10^5, and dTh = 0.99; (b) varying m with p = 10%, λ = 10^5, and dTh = 0.99; (c) varying λ with p = 10%, m = 10, and dTh = 0.99; (d) varying dTh with p = 10%, m = 10, and λ = 10^5.

As shown in Figure 5, inappropriate p and λ will result in complete failure of the classifier. Therefore, when choosing the classifier parameters, we should choose p and λ first, and then choose the other two parameters accordingly. However, it remains an open problem to avoid overfitting and local optima during the optimization of the parameters. Since this paper is to prove the potential of the PAC classifier, after trying several groups of PAC parameters, we chose one group with moderate performance and compared its performance with ISVC and NSVC. The chosen group of PAC parameters is m = 10, p = 10%, dTh = 0.99, and λ = 10^5.

Furthermore, to illustrate the distribution of the RPs, we measure the proportion of boundary RPs to all RPs by training the NSVC with the RPs. After training, RPs with non-zero coefficients are regarded as boundary RPs. Figure 6 shows the changing tendency of the proportion of boundary RPs. Session 1 is the initializing session for PAC; Sessions 2 to 24 are the validating sessions. As shown in the figure, the proportion of boundary RPs remains less than 40% of all RPs, which satisfies the requirement of representability.

Figure 6. Proportion of boundary RPs to all RPs during the adaptive learning process.


2.4. Performance Validation for Adaptive Learning

2.4.1. Supervised and Unsupervised Adaptive Learning Scenarios

Label information is overwhelmingly important for adaptive learning [19]. For an adaptive learner, adapting with complete and timely supervised label information represents the ideal case, in which the learner can achieve its best performance, whereas adapting with no supervised label information represents the realistic case, in which the learner shows its basic adaptability. These two kinds of adapting processes are defined as supervised adaptive learning scenarios and unsupervised adaptive learning scenarios, respectively.

PAC working in unsupervised adaptive learning scenarios is the original algorithm shown in Algorithm 2, whereas PAC working in supervised adaptive learning scenarios introduces a pruning strategy that ignores misclassified samples when choosing and updating RPs, to maintain high confidence of the updating. ISVC working in unsupervised adaptive learning scenarios retrains the predictive model with the output predictions of the old predictive model, whereas ISVC working in supervised adaptive learning scenarios retrains the predictive model with supervised labels. PAC working in supervised and unsupervised adaptive learning scenarios is named sPAC and uPAC respectively, whereas ISVC working in supervised and unsupervised adaptive learning scenarios is named sISVC and uISVC respectively.

2.4.2. Data Organization for Validation

In this paper, we define a session as a set of data samples with the same concept, and thus a session sequence as a series of sessions with varying concepts. The minimum unit for measuring the recognition accuracy and the concept drift is a session, so it is essential to examine the variation of recognition accuracy over a session sequence. The actual verification process is illustrated in Figure 7. For a session sequence with n sessions, samples are aligned into a line with the session number and fed to the updating cycle one sample at a time. Misclassified samples are counted in units of sessions. The rate of correctly labeled samples to all samples in one session is defined as the value of recognition accuracy.

Figure 7. The verification process for adaptive learning classifiers. Samples with the same concept compose a session. Sessions with different concepts compose a session sequence. Samples in the session sequence are fed to the updating cycle one by one, and recognition accuracy (RAi) is computed per session. During an updating cycle, we first use the predictive model (PM) to classify the sample and make the prediction, then use the sample and its corresponding supervised label information (in supervised adaptive learning scenarios) or prediction result (in unsupervised adaptive learning scenarios) to retrain the predictive model.
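A minimal sketch of this prequential, session-wise verification loop is given below; `model` is assumed to expose predict(x) and update(x, y) methods (e.g., a uPAC or ISVC wrapper), `stream` yields (session_id, x, true_label) tuples in acquisition order, and the names are illustrative rather than the authors' code.

```python
from collections import defaultdict

def prequential_accuracy(model, stream, supervised=False):
    correct, total = defaultdict(int), defaultdict(int)
    for session_id, x, y_true in stream:
        y_pred = model.predict(x)                       # predict first ("test then train")
        correct[session_id] += int(y_pred == y_true)
        total[session_id] += 1
        # retrain with the supervised label, or with the prediction itself (unsupervised case)
        model.update(x, y_true if supervised else y_pred)
    return {s: correct[s] / total[s] for s in sorted(total)}   # RA_i per session
```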

To thoroughly compare the performance of PAC with ISVC and NSVC, we conducted experiments based on both simulated data with controlled concept drift and continuously acquired realistic realistic one-day-long data to explore both the theoretical difference in adaptability and the empirical one-day-long data to explore both the theoretical difference in adaptability and the empirical difference difference in clinical use. in clinical use.

2.5. Experimental Setup

2.5.1. EMG Data Recording and Processing

EMG signals were acquired by seven commercial electrodes (Otto Bock, Vienna, Austria [50]), which output regularized (0–5 V), smoothed root mean square (RMS) features of the EMG signals [6] rather than raw EMG signals. We employed the output of the electrodes directly as EMG signal features. Since the bandwidth of the output signal feature was in the range of 0–25 Hz [51], the features were acquired at a rate of 100 Hz, as in the studies in [6,51,52]. The electrodes were placed around the forearm, at the apex of the forearm muscle bulge, with a uniform distribution. The layout of the electrodes is shown in Figure 8.


Figure 8. Layout of EMG electrodes. (a) Axial layout of EMG electrodes; (b) radial layout of EMG electrodes.

During the experiment, the eight typical hand and wrist movements shown in Figure 9 were measured. The eight motions could be categorized into two classes: motions a–d were four typical wrist motions, whereas motions e–h were four typical hand motions. Discriminating the eight motions properly would prove the applicability of the adaptive learners to a prosthetic hand-wrist system. Furthermore, the four hand motions could be regarded as two grasp/open pairs, powerful grasp and precise grasp [53], which are essential hand movements in activities of daily living (ADLs).

Since transient EMG signals contain more useful information than stable EMG signals [54], we trained and verified the classifiers with the acquired transient EMG signals. To eliminate the interference of weak contractions, only contractions larger than 20% of the maximum voluntary contraction (MVC) were recorded. Specifically, before the principal experiment, we acquired 100 samples for each kind of motion while the subjects conducted maximum voluntary contractions, and set an averaged value for each subject (MVCi, with i corresponding to the serial number of the subject) over all samples, all motions and all channels. During the principal experiment, the value of a sample averaged over all channels was checked against its corresponding MVCi, and the sample was recorded only if it was larger than 20% of MVCi. During the recording process, a progress bar was shown to the subject to indicate how many valid samples had been recorded.

The data were acquired in units of sessions. Every session included 4000 samples: 500 samples for each of the eight motions. Acquiring the data of one session took around 3 min. After acquisition, all the data were processed by PAC, ISVC and NSVC on a PC with a 3.2 GHz CPU and 8 GB of memory.

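The recording criterion above can be made concrete with a small sketch; the helper names and the 20% threshold constant below simply restate the description, and the functions are a hypothetical reconstruction rather than the original acquisition code.

```python
import numpy as np

MVC_THRESHOLD = 0.20  # only contractions above 20% of MVC_i are recorded

def compute_mvc(mvc_samples):
    """Average the MVC calibration samples of one subject over all samples,
    motions and channels to obtain a single value MVC_i."""
    return float(np.mean(mvc_samples))

def is_valid_sample(sample, mvc_i):
    """Keep a sample (one RMS value per channel) only if its channel-averaged
    amplitude exceeds 20% of the subject's MVC_i."""
    return float(np.mean(sample)) > MVC_THRESHOLD * mvc_i
```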


Figure 9. Motions during the EMG data acquiring experiments. (a) Wrist flexion (WF); (b) wrist extension (WE); (c) ulnar deviation (UD); (d) radial deviation (RD); (e) all-finger extension (AE); (f) all-finger flexion (AF); (g) thumb-index extension (TE); (h) thumb-index flexion (TF). Motions (a–d) are four typical wrist movements, whereas motions (e–h) are four typical hand movements.

2.5.2. Subjects

Eight able-bodied subjects (five men and three women, all right-handed) participated in the experiments. Their average age was 24.25 ± 2.12 years and their average body mass index (BMI, weight/height²) was 21.84 ± 2.99 kg/m². All the experimental protocols of this study were approved by the Ethical Committee of Harbin Institute of Technology and conformed to the Declaration of Helsinki. All subjects were fully informed about the procedures, risks, and benefits of the study, and written informed consent was obtained from all subjects before the study.

2.5.3. Experiment I: Simulated Data Validation

One important measure of the performance of an adaptive learning method is its adaptability, referring to the tracking ability of the algorithm when concept drifts of different rates and extents happen. Without loss of generality, linear time-varying shift functions, which could tune both the extent and the rate of the concept drift, were added to raw EMG signals to simulate the possible concept drift of EMG signals. The dataset of one session was chosen as the original data, noted as $E_{(0)} = \{x_{(0)i}\}_{i=1}^{4000}$, whereas the session with the maximum simulated concept drift extent was noted as $E_{(1)} = \{x_{(1)i}\}_{i=1}^{4000}$. A simulated session sequence, within which each following session was generated by adding a portion of the maximum drift extent to its preceding session, is therefore noted as $Q_n = \{E_{(k/n)}\}_{k=0}^{n}$, where n is the total number of divisions of the maximum drift extent and k is the serial number of the session in the session sequence. Within a session sequence, a larger k corresponded to a larger drift extent from session E(0). Two sessions in different session sequences sharing the same value of k/n had the same drift extent from session E(0).

Specifically, to simulate the actual concept drift of EMG signals, the simulated shifts were composed of two steps: (a) decreasing the signal-to-noise ratio (SNR) of every channel of the EMG sensors, to simulate muscular fatigue or the change of skin-electrode impedance; and (b) fusing the signals of adjacent channels of the EMG sensors, to simulate a minor shift of the electrodes. For the data session $E_{(k/n)}$ in the session sequence $Q_n$, the signals of each channel were computed as:

$$x_{(k/n)j} = \begin{cases} \tilde{x}_{(k/n)j}\,[1 - 0.5(k/n)] + 0.5(k/n)\,\tilde{x}_{(k/n)(j+1)}, & j \neq 7 \\ \tilde{x}_{(k/n)j}\,[1 - 0.5(k/n)] + 0.5(k/n)\,\tilde{x}_{(k/n)1}, & j = 7 \end{cases} \quad (18)$$

where $\tilde{x}_{(k/n)j} = \dfrac{x_{(0)j} + 0.25(k/n)}{1 + 0.25(k/n)}$ is the signal with direct-current noise. Therefore, for the data session with maximum concept drift, the signals were generated as:

$$x_{(1)j} = \begin{cases} 0.4x_{(0)j} + 0.4x_{(0)(j+1)} + 0.2, & j \neq 7 \\ 0.4x_{(0)j} + 0.4x_{(0)1} + 0.2, & j = 7 \end{cases} \quad (19)$$

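A minimal Python sketch of this drift simulation, assuming each session is an array of 7-channel feature vectors normalized to [0, 1]; the function names are illustrative, and the channel wrap-around at j = 7 follows Equations (18) and (19).

```python
import numpy as np

def add_dc_noise(session, k_over_n):
    """Step (a): lower the SNR of every channel by injecting a DC offset,
    corresponding to x~ in Equation (18). `session` has shape (samples, 7)."""
    return (session + 0.25 * k_over_n) / (1.0 + 0.25 * k_over_n)

def simulate_drift(session, k_over_n):
    """Step (b): fuse each channel with its neighbour (channel 7 wraps back to
    channel 1) to mimic a minor electrode shift, as in Equation (18)."""
    noisy = add_dc_noise(session, k_over_n)
    shifted = np.roll(noisy, -1, axis=1)   # channel j takes channel j+1; channel 7 takes channel 1
    return noisy * (1.0 - 0.5 * k_over_n) + 0.5 * k_over_n * shifted

def build_session_sequence(original_session, n):
    """Generate the simulated session sequence Q_n = {E_(k/n)}, k = 0, ..., n."""
    return [simulate_drift(original_session, k / n) for k in range(n + 1)]
```

With k/n = 1, simulate_drift reduces to Equation (19), i.e., 0.4x(0)j + 0.4x(0)(j+1) + 0.2.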

During the experiment, session sequences with n = 16, 18, 20, 22, 27, 32, 64, and 128 were generated. When verifying the adaptive classifiers with a session sequence, we used the original session E(0) to initialize the classifiers and used the whole session sequence for performance validation, as described in Figure 7.

2.5.4. Experiment II: One-Day-Long Data Validation

To verify the performance of the adaptive learning methods in clinical use, eight subjects participated in a one-day-long data acquiring experiment. Each subject completed one trial, so eight trials of one-day-long EMG data acquisition were obtained in total. During one trial, the subject donned the electrodes before 9:30 (24 h clock, hereinafter) and doffed them after 22:00, and no extra donning and doffing was conducted during the trial. After the electrodes were prepared, data sessions were recorded every half an hour from 10:00 to 21:30. Hence, after a one-day-long data acquiring trial, we obtained a total of 24 sessions, numbered from Session 1 to Session 24. In the intervals between recordings, the subjects could move their arms freely without detaching the electrodes.

Compared with the total acquiring time, the time elapsed within one session was negligible. Therefore, the data in one session were regarded as data with the same concept, and in this way the one-day-long data composed a session sequence. In order to reduce the influence of accidental factors on the initializing session, as well as to maintain the sequence length and the maximum drift extent from the beginning of the sequence to its end, we used two validating orders for every session sequence. Figure 10 illustrates the validating orders of an acquired data sequence.


Figure 10. Validation method for acquired session sequences. For each sequence acquired in the one-day-long experiment, the sessions were organized in both the natural order (following the acquiring time) and the reverse of the natural order.

In the natural order, we used Session 1 to initialize the classifiers and Sessions 2 to 24, arranged according to the acquiring time, to compose the validating session sequence. In the reversed order, the session order was reversed, namely, Session 24 acted as the initializing session and Sessions 23 to 1 composed the validating session sequence. To avoid ambiguity, we endowed the recognition accuracy RA of a session in the validating sequence with a serial number as its subscript, RAi; the serial number corresponded to the position of the session in the validating sequence. For example, the recognition accuracy of Session 2 in the natural order was noted as RA2, whereas in the reversed order it was noted as RA23.
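The two validating orders can be expressed compactly; the helper below (hypothetical names) builds both orderings from the 24 acquired sessions and can be fed to the session-sequence validation routine sketched earlier.

```python
def natural_and_reversed_orders(sessions):
    """Given the 24 half-hourly sessions of one trial (Session 1, ..., Session 24),
    return (initializing_session, validating_sequence) pairs for both orders."""
    natural = (sessions[0], sessions[1:])               # initialize: Session 1; validate: Sessions 2-24
    reversed_order = (sessions[-1], sessions[-2::-1])   # initialize: Session 24; validate: Sessions 23-1
    return natural, reversed_order
```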

3. Results

3.1. Support Vectors and Time Cost

When validating the performance of the adaptive classifiers, we recorded the amount of support vectors and the time cost per updating cycle along with the number of retraining times. Figure 11 shows the average variations during the classification tasks of the one-day-long EMG data, obtained by averaging over the eight subjects and over both the natural and reversed validating sequences. As shown in Figure 11a,b, both sISVC and uISVC increased their amounts of SVs, from less than the amounts of uPAC and sPAC to more than two times those amounts after 92,000 times of retraining. Furthermore, the increment of sISVC was less than that of uISVC.


As the amount of SVs increased, the time cost per updating cycle of sISVC and uISVC increased accordingly, from 10 ms to more than 50 ms. At the same time, the time cost per updating cycle of uPAC and sPAC remained less than 2 ms. Figure 11 illustrates that the proposed PAC has higher computational efficiency than ISVC and meets the requirement of online learning.


Figure 11. Amount of support vectors and time costs of updating cycles for the adaptive classifiers. (a) Amounts of support vectors of uPAC, uISVC and NSVC; (b) amounts of support vectors of sPAC, sISVC and NSVC; (c) time cost per updating cycle of uPAC; (d) time cost per updating cycle of sPAC; (e) time cost per updating cycle of uISVC; (f) time cost per updating cycle of sISVC. An updating cycle included predicting the sample label and retraining the predictive model. The time cost per updating cycle of sPAC and uPAC remained less than 2 ms, whereas that of sISVC and uISVC increased from 10 to 50 ms along with the retraining times.

3.2. Classification Performance for Simulated Data

Figure 12a–d shows the validation results with the simulated data. Within each panel, the solid lines illustrate the recognition accuracy of the validating sequences classified by the specified adaptive classifier, whereas the dotted line marks the recognition accuracy of the validating sequence with n = 128 classified by NSVC. From Figure 12a–d, the adaptive classifiers are uPAC, sPAC, uISVC, and sISVC, respectively.



Figure 12. Recognition accuracy of the simulated session sequences. (a) Recognition accuracy of uPAC (compared with NSVC); (b) recognition accuracy of sPAC (compared with NSVC); (c) recognition accuracy of uISVC (compared with NSVC); (d) recognition accuracy of sISVC (compared with NSVC). Classifier NS is short for NSVC. Each line represents a session sequence. All the simulated session sequences share the same maximum drift extent (DEmax), whereas the amounts of sessions in the session sequences are different and noted as n. Since DEmax is fixed, for a session sequence, a larger n corresponds to a smaller drift rate of the session sequence. Parameter k is the serial number of a session in the session sequence. Within a session sequence, a larger k corresponds to a larger drift extent from session E(0). If two sessions in different session sequences share the same value of k/n, they have the same drift extent from session E(0).

As shown in Figure 12, when the drift extent increased from 0 to its maximum, the recognition accuracy of NSVC decreased from nearly 100% for E(0) to less than 20% for E(1). The recognition accuracy of the adaptive classifiers either remained stable or decreased only a little, keeping higher recognition accuracy than NSVC. To get more details about the accuracy variation with different drift rates, we focused on the recognition accuracy degradation from E(0) to E(1), as shown in Figure 13.


Figure 13. The recognition accuracy degradation of the adaptive classifiers changing along with the ratio of drift rate (DR) to maximum drift extent (DEmax). In this figure, each line represents the accuracy degradation of one type of adaptive classifier, and each point on a line represents the accuracy degradation of one session sequence. A session sequence is determined by the ratio of DR to DEmax, which equals 1/n. Considering that all session sequences share the same DEmax, this figure shows the changing tendency of the performance degradation along with the drift rate. The recognition accuracy degradation is computed as the error rate of E(1) in the session sequence.


As shown in Figure 13, the recognition accuracy degradation was related to the drift rate of a session sequence: a larger drift rate resulted in a larger recognition accuracy degradation. Moreover, for one validating sequence, the recognition accuracy degradation of the different adaptive classifiers increased in the order of sISVC, sPAC, uPAC, and uISVC.

3.3. Classification Performance for One-Day-Long Data

3.3.1. Overall Performance

Figure 14 shows the accuracy variations for the one-day-long data, obtained by averaging the classification results over all validating session sequences and all motion classes. As shown in the figure, at the beginning of the day, all classifiers with the exception of sISVC shared a similar performance of around 85%. Specifically, the classification accuracy of the first session in a validating sequence was 83.48% ± 11.08% (mean ± SD, similarly hereinafter) for uPAC, 82.94% ± 11.08% for sPAC, 84.04% ± 14.73% for uISVC, 85.12% ± 13.40% for NSVC, and 96.02% ± 5.30% for sISVC. As time went by, the difference between the classifiers increased. By the end of the day, uPAC and sISVC almost maintained their performance, NSVC and uISVC experienced performance degradation to less than 75%, whereas sPAC experienced a performance improvement to more than 90%.

Figure 14. Recognition accuracy of the realistic data from the one-day-long EMG acquiring trials. The recognition accuracies in the validating session sequence are arranged according to the temporal distances from their corresponding sessions to the training session.

To reduce the randomized error, we averaged the classification accuracy of the last five sessions to describe the performance of a classifier at the end of the day, which was noted as the AER (average ending recognition accuracy). Given the recognition accuracy of the ith session in a validating sequence, RAi, the AER of a validating sequence was computed as:

$$\mathrm{AER} = \frac{1}{5}\sum_{i=20}^{24} RA_i \times 100\%. \quad (20)$$
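As a quick illustration of Equation (20), the following hypothetical helper computes the AER from the per-session accuracies of one validating sequence, assuming they are stored in a mapping from the serial number i to RAi.

```python
import numpy as np

def average_ending_recognition_accuracy(ra):
    """Equation (20): the AER is the mean recognition accuracy of the last five
    sessions (i = 20, ..., 24) of a validating sequence, as a percentage.
    `ra` maps the serial number i to RA_i in [0, 1]."""
    return 100.0 * np.mean([ra[i] for i in range(20, 25)])
```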

According to the AER, the performances of the classifiers at the end of the day were 83.93% ± 8.79% for uPAC, 91.23% ± 5.84% for sPAC, 70.55% ± 14.22% for uISVC, 92.67% ± 5.71% for sISVC, and 74.90% ± 11.84% for NSVC, respectively. A repeated-measures ANOVA together with a post hoc test with Bonferroni correction was employed to examine the overall and pairwise significance of all classifiers in AER. Based on the repeated-measures ANOVA test with Greenhouse-Geisser correction, the overall difference among all classifiers was significant (p < 0.001).

Based on post hoc multiple comparisons with Bonferroni correction over the $\binom{5}{2} = 10$ comparing pairs, the significance of the pairwise differences is shown in Figure 15.
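One plausible way to reproduce this analysis with common Python tooling is sketched below; the column names are hypothetical, the Greenhouse-Geisser correction step is not included, and the Bonferroni adjustment is applied by scaling the paired t-test p-values over the 10 pairs.

```python
from itertools import combinations
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

def compare_classifiers(aer):
    """`aer` is a pandas DataFrame with columns ['sequence', 'classifier', 'AER'],
    one row per validating sequence and classifier (uPAC, sPAC, uISVC, sISVC, NSVC),
    sorted consistently by sequence within each classifier."""
    # Repeated-measures ANOVA: each validating sequence is one repeated measure.
    anova = AnovaRM(aer, depvar='AER', subject='sequence', within=['classifier']).fit()
    print(anova)

    # Post hoc pairwise paired t-tests with Bonferroni correction over the 10 pairs.
    classifiers = list(aer['classifier'].unique())
    n_pairs = len(classifiers) * (len(classifiers) - 1) // 2
    for a, b in combinations(classifiers, 2):
        x = aer.loc[aer['classifier'] == a, 'AER'].to_numpy()
        y = aer.loc[aer['classifier'] == b, 'AER'].to_numpy()
        _, p = ttest_rel(x, y)
        print(f'{a} vs {b}: Bonferroni-corrected p = {min(p * n_pairs, 1.0):.4f}')
```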

Figure 15. Statistics of all classifiers with the measure of AER. Each validating sequence was regarded as a repeated measure. Repeated-measures ANOVA and post hoc multiple comparisons were applied to examine the significance of the differences. (***, p < 0.001; **, p