Gaussian Process Regression for Hand Gesture Recognition and Neuroprosthetic Applications

November 11, 2014

Author: Michele Xiloyannis ([email protected])
Supervisor: Dr Aldo Faisal ([email protected])

A thesis submitted in conformity with the requirements for the degree of Master of Science, Department of Bioengineering, Imperial College London


Abstract

Every year approximately 5000 to 6000 amputations are carried out in the UK alone, a quarter of which involve the upper limb. Although major advances have been achieved in the last 50 years in the attempt to implement the functionality of a natural limb in a prosthetic device, currently commercialised systems can restore only a very limited degree of dexterity, offering up to 5 grasp types and thus often causing frustration, social withdrawal and rejection of the prosthetic replacement. We propose a new framework for extracting information from extrinsic muscles in the forearm that enables continuous, natural and intuitive control of a robotic hand, by looking for a continuous mapping between muscle activity and joint angles rather than discretising hand gestures. This is achieved by introducing Regression into the field of human-machine interfaces to learn the relation between muscle-related signals and joint states. We recorded the Electromyography (EMG) and Mechanomyography (MMG) activity of 5 extrinsic muscles in the forearm of 6 healthy subjects performing everyday grasp tasks, while monitoring 18 Degrees Of Freedom (DOF) of their hand using a CyberGlove. We then used this data to train a Gaussian Process (GP) and a Vector AutoRegressive model with Exogenous inputs (VARMAX) to learn the mapping from muscle activity and previous joint velocity to current joint velocity. We found high prediction accuracy in terms of correlation between the predicted and actual velocities for the test data. We investigated the performance of both models across tasks, subjects and different joints for varying time-lags, finding that both models generalise well and retain high correlation even for time-lags in the order of hundreds of milliseconds. Our results suggest that Regression is a very appealing tool for natural, intuitive and continuous control of robotic devices, with particular focus on prosthetic replacements, where high dexterity is required for complex movements.


The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind. James Clerk Maxwell - 1850

Acknowledgements





First and foremost, I would like to say thank you to my Mum, who learned how to use Whatsapp just to send me pictures of her holidays in NY and Australia while I was working, and to my Dad, to whom I should also apologise because this report has nothing to do with yellow kiwis.

I would like to thank my supervisor, Dr Aldo Faisal, for supporting me throughout my MSc course, giving me valuable advice for the project and helping me in the difficult decisions about my future. This project would not have achieved much without the continuous support of Constantinos Gavriel, who invested time and effort and was always ready to give me valuable and constructive feedback. I would also like to thank Andreas Thomik, for sharing with me his vast knowledge about hand movements and for helping me with the installation and usage of LibHand, and Dr Marc Deisenroth, for having guided me in the usage of GPs. Finally, I would like to say thank you to all my friends, for their continuous encouragement and support, and to all those who “lent a hand” in the accomplishment of this project.


Contents

Abstract
Acknowledgements
1 Introduction
2 Background Theory
  2.1 Neuromuscular Signal Acquisition for Control
  2.2 Motion Data Acquisition
  2.3 State of the Art Prosthetic Hands
  2.4 VARMAX Models for Time Series Analysis
  2.5 Gaussian Processes for Regression
    2.5.1 The Standard Linear Model
    2.5.2 Feature Space
3 Materials & Methods
  3.1 Instrumentation and Experimental Paradigm
    3.1.1 EMG Acquisition
    3.1.2 MMG Acquisition
    3.1.3 Experimental Task
    3.1.4 LibHand: Data Visualisation and Animation
    3.1.5 Glove Calibration
  3.2 Preprocessing
  3.3 The GPML Toolbox
    3.3.1 Model Selection and Adaptation of Hyperparameters
  3.4 Missing Fingers Predictability
  3.5 State-ahead Inference of Joints Velocity
    3.5.1 VARMAX
    3.5.2 GP Model
4 Results
  4.1 Visualisation of Experimental Data
  4.2 Missing Fingers Predictability
  4.3 State-Ahead Predictions
5 Discussion
  5.1 Summary of Thesis Achievements and Future Work
    5.1.1 Missing Finger Predictability
    5.1.2 State-Ahead Predictions
  5.2 Conclusion
A
  A.1 K-means clustering
B
  B.1 Sparse Approximation Methods

Chapter 1

Introduction

Motivation and Objectives

From insect-inspired robots to computational methods that emulate neural processes, synthesising fundamental science and technological capabilities has proven to be an extremely powerful tool for scientific progress. Indeed, we live in an era where nature is driving design. Major advances have been achieved through this approach in bionics, a field that investigates the interplay between nature and design to bridge the gap between disability and ability, to overcome human limitations and extend human potential. State-of-the-art bionic designs are capable of restoring hearing to the deaf and enabling leg amputees to walk naturally, but still very little has been achieved for upper-limb bionic replacements.

My vision is that of designing a hand prosthesis so profound in its functioning that it not only moves like a natural limb, but also feels like one. Imagine wearing an electromechanical hand that does not merely respond to neural signals to trigger a motor command, but that senses how you move and predicts your intentions, that learns from you and optimises its behaviour to create a natural synergy between its movements and your body. Advances in neuroscience and computing are unveiling powerful algorithmic principles that our brain uses to obtain its remarkable plasticity and adaptability; using such principles in a data-driven quantitative framework, we can embed intelligence into the design of a bionic hand so that it restores not only the functionality of a natural limb, but also the expressiveness and humanity that a natural limb would have.

The Humble Sea Squirt

In his TED talk in July 2011, neuroscientist Daniel Wolpert, trying to answer one of the most obvious and yet underestimated questions in neuroscience, i.e. “why do we have a brain?”, states: “...it’s blindingly obvious why we have a brain. We have a brain for one reason and one reason only, and that’s to produce adaptable and complex movements. There is no other reason to have a brain”. If we actually think about it, there appear to be no moving living organisms on the planet without a brain, and plants don’t seem to require the luxury of a nervous system; the clinching evidence is an animal called the sea squirt, which has a nervous system and swims around during its juvenile life, but at one point settles on a rock that it never leaves, and the first thing it does is digest its own brain. This example is empirical support for a key idea that we need to keep in mind when trying to understand how the brain works and what its final function is. It turns out that, although it might not seem intuitive, movement is one of the most complex tasks that our central nervous system has to deal with.

In the last decades, research in neuroscience and advances in computing have been unveiling powerful algorithmic principles hypothesised to underlie the way our brain achieves its remarkable adaptability and efficacy in motor tasks; a popular framework that has been around for the last 50 years is Bayesian decision theory. It is a principled and now unifying way of thinking about how the central nervous system deals with uncertainty, and it is widely used in statistics and machine learning to make inference. The key idea in Bayesian inference is that of having two sources of information, one being data or sensory input and the other being prior knowledge or experience. Bayesian decision theory gives us the mathematics of the optimal way to combine these two sources of information to generate new beliefs, i.e. a posteriori knowledge about what we observe.

A very recent anecdote highlights why we are in desperate need of a way to emulate the control performance that our brain achieves when driving our body. On May 11, 1997, IBM’s Deep Blue computer won the second six-game match against world chess champion Garry Kasparov; this event set an important target for artificial intelligence research and was a fundamental proof of the increasing capability of machines to emulate human intelligence. On the 3rd of January 2009, Steven Purugganan (USA) set the record for cup stacking at 5.93 seconds, surpassing Emily Fox’s previous record of 7.43 seconds; Daniel Wolpert himself refers to this comparison to emphasise one of the main gaps in state-of-the-art technology: we can make a machine that nearly thinks like a human being, but we are not even close to emulating the way we move. Movement is, in fact, not at all a trivial task, especially for a system with multiple DOFs such as our body, where moving efficiently means continuously integrating sensory information and solving optimisation problems to deal with dynamic, trajectory and muscle redundancy; if we then consider that both our motor and sensory systems are very noisy [1], we can get an idea of the extraordinary complexity of movement and of the efficiency of our own nervous system. Given the ability of our brain to deal with motor tasks, it seems reasonable to seek a solution in a mathematical framework that resembles, in a principled manner, the way we ourselves deal with movements.

A Mathematical Framework to Express Uncertainty

In 1988 E.T. Jaynes argued the existence of “consistent rules for carrying out plausible reasoning, in terms of operations so definite that can be programmed on the computing machine which is the human brain” [2]. Indeed, using a qualitative methodology that very much resembles the one used by Claude Shannon to set the basis of Information Theory [3], and by merging the mathematical framework provided by the past work of Abel and Cox [4] with the principles of common sense and consistency, Jaynes derived a set of rules for plausible reasoning that are equivalent to the sum and product rules of probability theory. In order to distinguish it from the frequentist view, the use of probability to express beliefs is usually referred to as the Bayesian interpretation, recalling one of the most simple and beautiful equations known, formulated by T. Bayes and published post mortem in 1764 in An Essay towards solving a Problem in the Doctrine of Chances [5]:

P(A|B) = \frac{P(B|A)\, P(A)}{\sum_A P(B|A)\, P(A)}    (1.1)

which can also be read as: Posterior ∝ Likelihood × Prior. The extraordinary and intuitive principle underlying Bayes’ theory is that of making optimal decisions by combining the observed data with a prior belief about the observations, which comes from experience and prior knowledge. This not only allows Bayesian decision theory to deal with noise and uncertainty in a principled manner, but also has the advantage, at the modelling stage, of making inference easily manipulable through a supervised choice of the priors. More recently, a novel experimental paradigm from Wolpert and Kording [6] has demonstrated that the human brain indeed uses something conceptually very similar to Bayesian decision theory when dealing with sensorimotor tasks, that is to say when dealing with noise and uncertainty.
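As a toy numerical illustration of Equation 1.1 (a minimal sketch; all numbers are invented for illustration only), consider inferring which of two grasp types generated an observed muscle-signal feature:

    % Toy illustration of Bayes' rule (Eq. 1.1): two grasp-type hypotheses
    % and one observed muscle-signal feature. All numbers are invented.
    prior      = [0.7 0.3];           % P(A): prior belief over the two grasps
    likelihood = [0.2 0.6];           % P(B|A): probability of the observation
                                      % under each grasp hypothesis
    unnorm     = likelihood .* prior; % numerator of Eq. 1.1
    posterior  = unnorm / sum(unnorm) % normalise -> [0.4375 0.5625]

Even with a prior favouring the first grasp, the likelihood of the observation shifts the posterior belief towards the second, which is exactly the combination of experience and sensory evidence described above.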

Given its ability to deal with the uncertainty and noise typical of physiological motor signals, we propose a novel approach to information extraction for hand neuroprosthesis control, based on Bayesian decision theory. Specifically, our model differs from current approaches in that it treats the hand gesture as a continuous variable, learning not to discriminate between a finite set of classes but to find a continuous mapping between residual muscle activity and joint angles. In particular, we propose a state-ahead predictor that uses the current state (hand configuration) and external time series (muscle activity) to learn the most likely next state. This approach is common in financial time-series forecasting, where predictions are made by looking at the past behaviour of the time series and at external, correlated events. The performance of this commonly used linear model will be compared to the predictions given by a more recent Bayesian, probabilistic and principled approach to non-linear regression, i.e. the GP. The prediction stage of the project is preceded by a statistical analysis of hand movements, using the same framework and exploiting the renowned interpretability of GPs, whose optimised “parameters” are informative of the relations in the analysed data.

Chapter 2 gives an overview of the background knowledge about the techniques used in the data-acquisition and analysis phases, such as the nature of EMG and MMG signals, a literature review of their application in robotic control, state-of-the-art motion-capture techniques, an overview of currently commercialised robotic hands and an introduction to AutoRegressive models and GPs.


Chapter 3 presents the hardware and signal-acquisition instrumentation, the experimental paradigm used to collect data and the steps used to smooth the data in a pre-processing stage, followed by an explanation of GPs for Regression and VARMAX models, focusing on the advantages of the chosen features and detailing the optimisation techniques used. Chapter 4 shows the results of the models and is divided into two main parts: first we exploit the power of GPs, combined with the correlations and recurring patterns in natural hand movements, to predict the velocity of missing joints and obtain a statistical analysis of finger independence from their predictability. We then present the results of the state-ahead predictor, showing detailed figures of the performance of the models for different subjects, time-lags and tasks. Chapter 5 contains a discussion of the advantages and limitations of the model, a qualitative evaluation of the results and suggestions for the future work that could follow from these results.


Chapter 2

Background Theory

2.1 Neuromuscular Signal Acquisition for Control

The last 25 years have seen an exciting growth in neuromuscular research as new technologies for monitoring neuromuscular activity have emerged. Human-machine interfaces have increased the range of patterns that can be extracted from neuromuscular signals, finding application in fields ranging from diagnosis and rehabilitation to robot control. Patterns of muscle activation have proven to be useful in understanding motor strategies in upper limbs [7], finger movements [8] [9] and hand grasp [10]. Surface EMG is the oldest method of neuromuscular monitoring; it is based on the detection of the compound action potential generated by the motor units during muscle contraction [11], filtered and attenuated by a volume conductor [12]. F. Tenore et al. have used surface EMG signals to successfully classify individual finger movements [9]; they proposed a 32-channel surface-EMG system to acquire the signal, followed by a classification of the finger movements in a feature space. The classification task was carried out by a Neural Network (NN) that achieved a very high (> 98%) classification rate. Support vector machines (SVM) are used by B. Crawford et al. for real-time classification of EMG patterns for robotic control [13]; high classification accuracy is achieved, but the network is tested only on a 4-DOF robotic arm. Recent studies have investigated and proven the efficacy of the mechanical counterpart of muscle electrical activity, known as MMG. MMG is a mechanical signal generated by gross lateral movements of the muscle fibres at the initiation of contraction, followed by smaller oscillations at the resonant frequency of the muscle [14, 15], with approximately 90% of its power spectrum represented by frequencies below 50 Hz [16]. Surface MMG has been proven to successfully reflect motor unit recruitment [17] and global motor unit firing rate [18]. N. Alves et al. have successfully extracted patterns of forearm muscle contractions from multi-channel MMG [19], demonstrating that MMG signals are valuable candidates for neuroprosthesis control. These previous studies suggest that MMG signals contain at least as much information about muscle-contraction patterns as the long-used EMG. An ultra-low-cost system developed in FaisaLab at Imperial College London [20] shall be used for MMG signal acquisition. This will significantly reduce the cost of the whole system while retaining the same qualitative and quantitative standards as a commercial EMG apparatus.

Figure 2.1: CyberGlove and position of the stretch sensors. The 18 sensors are positioned at critical points to accurately span all the DOFs of the hand. Image adopted from [27].

In order to learn the continuous mapping between physiological signals related to muscle contraction and hand gesture, we need a way to “label” instantaneous EMG and MMG data with the related hand state. Motion-capture technologies offer a very appealing option for doing so.

2.2 Motion Data Acquisition

In 1973 Johansson conducted the very first experiment in human motion tracking, using reflective markers placed on the subject’s joints in order to render articulated human movements [21]. These tracking methods have found vast application in athlete performance monitoring [22], diagnosis of motor disorders [23], and animation and anthropomorphic movement reconstruction for entertainment [24]. Although they are very accurate, these technologies suffer from the occlusion problem, since the human body is opaque to light. Alternative means of acquiring human motion data include inertial sensors that recover motion from linear accelerations or rotation rates [25]. Although these sensors do not require a source of light and are usually cheap and lightweight, their use in real-world applications is severely hampered by their poor accuracy, due to drift effects and measurement noise. In the context of hand motion tracking, a successful device for monitoring the critical joint angles of the hand, which has been thoroughly used and tested [26] since its appearance in 1992, is the CyberGlove, produced by Virtual Technologies (Palo Alto, California; http://www.cyberglovesystems.com/products/cyberglove-II/overview). The glove relies on 18 sensors placed at critical points on the hand (as shown in Figure 2.1) that measure the joint angles through the change in resistance to an electric current caused by the bending of thin strips sewn into the fabric. The system can sample at a frame rate of up to 138 Hz with a resolution of 8 bits per sensor, thus returning an 18-dimensional time series of values in the range 0-255. The system was used in the project to record the hand gesture of subjects performing everyday grasp types, as detailed in Chapter 3.


Figure 2.2: Cortical motor homunculus. The cortical motor homunculus is a pictorial representation of the anatomical subdivision of Brodmann area 4 of the human brain; its grotesquely disfigured appearance being the result of an unequal employment of cortical brain tissue for the control of different muscles. Notice the predominance of hands, lips and face, which require fine motor control. Image adopted from http://www.intropsych.com/ch02_human_nervous_system/homunculus.html

2.3 State of the Art Prosthetic Hands

Each human hand contains about 29 bones, 34 muscles and at least 123 named ligaments, forming one of the most complex and fascinating bio-mechanical structures known. The cortical motor homunculus (Figure 2.2) has huge hands, suggesting the employment of great computational power in the control of such a complex system. However, recent studies have suggested the contrary, hypothesising a lack of individuation [28] in finger movements and demonstrating that most hand gestures can be explained by a low-dimensional manifold [29], [30]. These findings suggest that our brain uses postural synergies to simplify the control of such a complex system, and that a similar approach could find successful application in the control of neuroprosthetic devices. State-of-the-art active prosthetic hands such as the i-limb ultra revolution from Touch Bionics (http://www.touchbionics.com/products/active-prostheses/i-limb-ultra-revolution) and the bebionic3 (http://bebionic.com/the_hand), shown in Figure 2.3, are just two examples of currently available hardware technologies. Although designing a hand prosthesis that resembles a natural one poses many challenges, such as weight and cost constraints, this is not the reason why these technologies perform very poorly in replacing the lost limb. The bottleneck still lies in the human-machine interface, i.e. the lack of a robust method for the acquisition and processing of a physiological signal that can be reliably translated into a motor command. This limitation often leads to frustration in the subject and, subsequently, to rejection of the prosthetic device [31]. Recent research in the use of Machine Learning techniques to analyse physiological data has expanded the range of meaningful information that can be extracted from EMG and MMG signals; this area of research has especially focused on supervised classification methods [32], giving successful results but constraining the predictions to a predefined finite number of clusters. A very recent application of these techniques is the Myo gesture control armband from Thalmic Labs (https://www.thalmic.com/en/myo/), which integrates the signal from a novel acquisition interface for myoelectrical activity detection with a 9-axis miniaturised Inertial Measurement Unit (IMU) to discriminate between 5 predefined hand gestures. Although the classification approach has great potential, it still treats joint movements as discrete variables, whereas they are, by nature, continuous. We overcome this approximation by focusing on Regression models.

2.4 VARMAX Models for Time Series Analysis

We shall first look at commonly used models that learn from past states of a time series and from the dynamics of an external variable to forecast new states. They are known as AutoRegressive (AR) models and are of particular importance for our application if we think of how we might control a prosthetic hand. Ideally, we would like to infer a function that, given the current state (hand gesture) and an external event (muscle activity), predicts the most likely next state. AR models with exogenous inputs do exactly that, by making a strong assumption about the relation between the input and output space, namely linearity. We shall see that this assumption is adequate for short time steps (e.g. predicting in the order of ≈ 10 ms ahead) but fails when trying to predict further ahead, where a non-linear process such as a GP might perform better.

AutoRegressive Moving Average (ARMA) and Vector ARMA (VARMA) models are statistical tools for time-series analysis of weakly stationary stochastic processes. They were first theorised in 1952 by Peter Whittle [33] and have found increasing application in financial modelling and forecasting [34]. The main idea behind an ARMA model is that of understanding and forecasting a given time series by modelling its dynamical behaviour in two parts: an autoregressive (AR) one and a moving average (MA) one. Suppose we have a time series Y; an ARMA(p,q) model estimates the value of the series at time t, Y_t, as a linear combination of the p previous values of the time series and a moving average of size q over white-noise error terms \varepsilon_{t-j}. Using a formal notation:

Y_t = c + \varepsilon_t + \sum_{i=1}^{p} \varphi_i Y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},    (2.1)

where c is a constant offset, \varphi_i are the coefficients of the linear model for the AR component and \theta_j the coefficients for the MA one. If we now assume that the generating process is not only determined by endogenous variables, but is correlated with variables outside of the considered system, we can include such time series in the forecasting of the process of interest. Including exogenous variables in an ARMA model naturally generates what is known as a VARMAX model, whose formulation differs from Equation 2.1 only in the presence of an external time series X_t:

Y_t = c + b X_t + \varepsilon_t + \sum_{i=1}^{p} \varphi_i Y_{t-i} + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j}.    (2.2)

Fitting a VARMAX model involves estimating the parameters defining the linear relation between the vector Y_t and the inputs Y_{t-i}, X_t, \varepsilon_{t-i}, which can be done using Ordinary Least Squares (OLS) for a simple VARMA model or by Maximum Likelihood (ML) for the more complex VARMAX case. As anticipated, we used VARMAX models as a means of comparing the predictions of a non-linear model such as the GP with those of a commonly used linear predictor. The time series describing the velocities of the joints were treated as endogenous variables and inferred from past values and from the exogenous muscle activity.
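As a minimal sketch of the estimation step (not the code used in the thesis), the constant, AR and exogenous coefficients of a single-output ARX-style model can be obtained by OLS, stacking lagged outputs and the exogenous input into a regressor matrix; Y, X and p are illustrative names:

    % Minimal sketch: fit Y_t = c + b*X_t + sum_i phi_i*Y_(t-i) by OLS.
    % Y: T x 1 joint velocity, X: T x 1 muscle envelope, p: AR order.
    p = 3;
    T = numel(Y);
    Z = ones(T - p, 1);                  % constant term c
    for i = 1:p
        Z = [Z, Y(p - i + 1 : T - i)];   % lagged outputs Y_(t-i)
    end
    Z = [Z, X(p + 1 : T)];               % exogenous input X_t
    theta = Z \ Y(p + 1 : T);            % OLS estimate [c, phi_1..phi_p, b]
    Yhat  = Z * theta;                   % in-sample one-step predictions

A full VARMAX fit additionally estimates the MA terms, which requires iterative ML rather than a single least-squares solve.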

2.5 Gaussian Processes for Regression

GPs are principled, practical, probabilistic models for learning in kernel machines. They provide multiple advantages over more complex models in terms of interpretation of the predictions, model selection and learning. A stochastic process is a generalisation of a probability distribution to functions; whereas probability distributions describe random variables (RV) that are scalars or vectors, processes govern the properties of functions. If we then focus on processes which are Gaussian, it turns out that the inference step has a relatively easy solution and the computations required to find the posterior and the predictive distributions have a closed-form analytical solution. GPs were first introduced in 1995 by R. M. Neal [35] in the context of Bayesian learning in neural networks (NN), when NNs were becoming mature and researchers and statisticians started realising that their application to practical problems was not ideal due to: (1) the complexity of their structure, (2) the decisions to be made regarding their architecture and (3) the lack of a principled way of answering these questions. In the context of complex Markov Chain methods for inference in large NNs, Neal demonstrated that these networks become Gaussian processes in the limit of infinite size, suggesting a simpler way of making inference. The state-of-the-art framework for making inference in Bayesian GPs was thoroughly studied by Rasmussen and Williams in [36] and [37], where the analytical properties of GPs are combined with the beauty and simplicity of the Bayesian formalism, which also helps avoid the overfitting problem. An introduction to Bayesian linear regression precedes a brief explanation of the use of GPs for regression in Chapter 3.

Figure 2.3: Bebionic3 active prosthesis. The robotic prosthesis is able to apply a force of up to 140 N and perform 36 different hand grasps, ranging from precision to power grips. The myoelectrical control, however, cannot span more than 4 grip patterns. Image adopted from http://bebionic.com/the_hand

2.5.1 The Standard Linear Model

Regression is concerned with the estimation of a continuous quantity and is, together with Classification, one of the two main approaches in Supervised Learning. Regression with GPs has two equivalent views: the first, more intuitive one is known as the weight-space view and explicitly expresses the contribution of the basis functions; the second, less intuitive interpretation is known as the function-space view. The latter can be easily derived from the former using an alternative formulation of the posterior distribution that explicitly expresses its covariance in terms of the kernel or covariance function k(·, ·), whose role shall be explained in Chapter 3.

Suppose we want to regress a data set D = \{x_n, y_n\}_{n=1}^{N}, where x_n \in R^D is a D-dimensional input vector and y_n is the target. Let us for now assume that there is a linear relation between the inputs and the targets, corrupted by Gaussian noise, of the form:

y = f(x) + \varepsilon, \qquad f(x) = x^T w    (2.3)

where w is a vector of weights (parameters), f is the function value we are looking for and \varepsilon \sim N(0, \sigma_n^2) is the Gaussian noise. This form of the noise, together with the assumption of independence between observations, automatically generates the likelihood, i.e. the probability of the observations given the parameters and the inputs, which can be written as:

p(y|X, w) = \prod_{i=1}^{N} p(y_i|x_i, w) = N(X^T w, \sigma_n^2 I).    (2.4)

In the Bayesian formalism we need to specify a prior over the parameters, which expresses our beliefs about w before we make observations. Let us assume a multivariate Gaussian distribution over the weight vector with zero mean and covariance matrix \Sigma_p, of the form:

p(w) \sim N(0, \Sigma_p).    (2.5)

Inference is carried out by combining these two sources of information according to Bayes’ rule (Equation 1.1), thus finding a posterior distribution over the unknown parameters w:

p(w|y, X) = \frac{p(y|X, w)\, p(w)}{p(y|X)}.    (2.6)

The denominator of the formula above plays the role of a normalising term and can be derived by marginalising the numerator over the parameters:

p(y|X) = \int p(y|X, w)\, p(w)\, dw.    (2.7)

It can be seen from Equation 2.6 that Bayesian inference combines observations with prior knowledge, while capturing all we know about the parameters in the normalising coefficient expressed by Equation 2.7. It can be shown that the posterior distribution is itself a Gaussian with mean \bar{w} and precision A:

p(w|X, y) \sim N\!\left(\bar{w} = \frac{A^{-1} X y}{\sigma_n^2},\; A^{-1}\right),    (2.8)

where the precision matrix A = \sigma_n^{-2} X X^T + \Sigma_p^{-1}. By maximising the posterior distribution with respect to the parameters one could find what is known as the Maximum A Posteriori (MAP) estimate of w. A fully Bayesian approach, on the other hand, abstracts from point estimates and seeks a distribution over the latent functions f. This is achieved by averaging the probability of observing a new function value f^*, given test inputs x^* and parameters w, weighted by the posterior distribution over all possible parameter values w:

p(f^*|x^*, X, y) = \int p(f^*|x^*, w)\, p(w|X, y)\, dw    (2.9)
                 = N\!\left(\frac{x^{*T} A^{-1} X y}{\sigma_n^2},\; x^{*T} A^{-1} x^*\right).    (2.10)

The predictive distribution is again a Gaussian, with mean given by the mean of the posterior distribution in Equation 2.8 multiplied by the test input and proportional to the observations y, as one would expect from a linear model; the variance of the predictive distribution is a quadratic form of the variance of the posterior distribution, showing that the uncertainty grows for large inputs.
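A minimal numerical sketch of Equations 2.3-2.10 on synthetic data (all names and values are illustrative, not the thesis code):

    % Minimal sketch of Bayesian linear regression (Eqs. 2.3-2.10).
    % Rows of X are input dimensions, columns the N observations,
    % matching the X*X' convention used in the text.
    N  = 50;  sn2 = 0.1^2;                  % noise variance sigma_n^2
    X  = [ones(1, N); linspace(-1, 1, N)];  % bias + 1-D input, D = 2
    w  = [0.5; -1.2];                       % "true" weights (synthetic)
    y  = X' * w + sqrt(sn2) * randn(N, 1);  % noisy targets (Eq. 2.3)
    Sp = eye(2);                            % prior covariance Sigma_p
    A  = X * X' / sn2 + inv(Sp);            % precision matrix A
    wbar = (A \ (X * y)) / sn2;             % posterior mean (Eq. 2.8)
    xs = [1; 0.3];                          % a test input x*
    mu = xs' * wbar;                        % predictive mean (Eq. 2.10)
    s2 = xs' * (A \ xs);                    % predictive variance (Eq. 2.10)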

2.5.2 Feature Space

A step towards the function-space view of regression can be taken by projecting the input data x into a higher-dimensional feature space and applying the same linear model in this new, multidimensional space. To do so, we define a set of basis functions (e.g. powers of x, radial basis functions, logistic functions) that are used to project the input data into a feature space \Phi(x). As long as the basis functions are independent of the parameters w, the model is still linear in the parameters and the same formalism used in the previous section can be applied. This idea is common in the Classification literature, where projecting the inputs into a higher-dimensional space is used to make the data linearly separable. Substituting X with \Phi(X), Equation 2.10 becomes:

f^*|x^*, X, y \sim N\!\left(\frac{\Phi(x^*)^T A^{-1} \Phi(X) y}{\sigma_n^2},\; \Phi(x^*)^T A^{-1} \Phi(x^*)\right)    (2.11)

with A = \sigma_n^{-2} \Phi(X)\Phi(X)^T + \Sigma_p^{-1}. Notice that, if M is the dimensionality of the feature space, computing the predictive distribution involves inverting an M × M matrix, which may not be convenient. By simply rewriting Equation 2.11 using the matrix inversion lemma, we can express the mean and the variance of the predictive distribution as functions of \Phi(X)^T \Sigma_p \Phi(X), \Phi(x^*)^T \Sigma_p \Phi(X) and \Phi(x^*)^T \Sigma_p \Phi(x^*). Using this new formulation, computing the predictive distribution involves inverting an N × N matrix, where N is the number of data points; notice that the computations are no longer a function of the dimensionality of the feature space. This makes it possible to use an ideally infinite-dimensional feature space, that is to say an infinite number of basis functions, which has enormous advantages. For any pair of given data points x and x', we can define

k(x, x') = \Phi(x)^T \Sigma_p \Phi(x'),    (2.12)

which is known as the kernel or covariance function. Lifting the data into a higher-dimensional feature space by replacing inner products in the input space with kernels is known as the kernel trick, and it is the reason why in GPs the covariance function is of primary interest.

One of the very few applications of Bayesian regression for post-processing of physiological data is described by Stegle et al. in [38]. The proposed approach shows an exhaustive example of the capability of Bayesian regression to deal with the uncertainty of measurements taken outside laboratory conditions. The model is used to detect outliers and noise bursts and to infer latent heart rates. It relies on a hierarchical clustering technique followed by a GP regression to infer missing data. This nested structure is clearly able to achieve better predictions, as the clustering provides some prior knowledge to the regression phase. In addition to being a convincing proof of the efficiency of regression methods for the analysis of physiological data, it clearly shows how GPs enable us to include prior knowledge in the model. Stegle et al. show how, by using a kernel that accounts for the periodicity of the recorded signal, the performance of the model can be massively improved. These considerations will be very useful when applying a similar technique to the prediction of hand postures from MMG and EMG signals.
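To make Equation 2.12 concrete, here is a toy sketch (assuming \Sigma_p = I and a quadratic feature map, purely for illustration) in which the kernel is evaluated as an inner product of explicit features:

    % Toy check of Eq. 2.12 with Sigma_p = I: the kernel equals the inner
    % product of the explicit features Phi(x) = [1; x; x^2].
    phi = @(x) [1; x; x.^2];
    k   = @(x, xp) phi(x)' * phi(xp);   % k(x,x') = Phi(x)' * Phi(x')
    k(0.5, 2.0)                         % = 1 + 0.5*2 + 0.25*4 = 3

For richer kernels such as the SE (Equation 3.9), the implicit feature space is infinite-dimensional, so only the kernel form on the right is ever computed.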


Chapter 3

Materials & Methods

The experimental setup involved simultaneous acquisition of EMG and MMG signals at specific sites on the subject’s forearm while performing everyday grasp types. The hand gesture was acquired using the CyberGlove’s sensors and aligned with the neuromuscular signals before further analysis. The following section provides basic technical information about the instrumentation used for the EMG signal acquisition, a presentation of the printed circuit board (PCB) used for the amplification of the MMG signal, the CyberGlove calibration procedure and details about the experimental setup. Section 3.2 details the preprocessing techniques applied before the GP model, showing examples of their effect on the acquired data. Section 3.3 deals with some of the more technical aspects of GPs for Regression, introduced in Section 2.5 (for a more detailed explanation of the mathematical formalism refer to [36]), and their application in the Matlab gpml toolbox (http://www.gaussianprocess.org/gpml/code/Matlab/doc/). Finally, we discuss the way we used GP Regression for a statistical analysis of natural hand movements (Section 3.4) and to construct a state-ahead probabilistic predictor.

3.1 Instrumentation and Experimental Paradigm

3.1.1 EMG Acquisition

The surface EMG signal was acquired using the BrainVision ActiChamp amplifier (http://www.brainvision.com/actichamp.html), which offers the possibility of acquiring and amplifying a wide range of electrophysiological signals such as the Electroencephalogram (EEG), Electrooculogram (EOG), Electrocardiogram (ECG) and EMG. The main purpose of the system is the acquisition of EEG activity from an ActiCAP (http://www.brainproducts.com/productdetails.php?id=4), but it is also equipped with 8 AUX channels which can be used with a full range of biosignal sensors. With its 24-bit resolution, high sampling frequency (up to 100 kHz) and high bandwidth (DC to 20 kHz), the ActiChamp is ideal for our application.


We interfaced the ActiChamp with the open-source acquisition software offered by BrainProducts, PyCorder, based on the Python programming language. PyCorder offers an intuitive and efficient interface to acquire and store data with user-defined modules and configurations. The software can also be used to send/receive an 8-bit trigger input/output to the ActiChamp via a serial port; this makes it ideal for the synchronisation of the multiple sensory acquisition modalities used throughout the experiment. The EMG channels were interfaced with the skin through disposable Ag/AgCl electrodes with integrated conductive gel. Each channel comprised two electrodes in a bipolar configuration, with the ground (GND) electrode placed on an electrically inactive region, e.g. the elbow.

3.1.2 MMG Acquisition

We acquired the MMG muscle signal, generated by gross lateral movements of the muscle fibres during contraction, using the Kingstate KEEG1542CBL-A electret condenser microphone capsule (datasheet: http://www.farnell.com/datasheets/97502.pdf), chosen because of its size, ideal frequency sensitivity and high signal-to-noise ratio (SNR), as detailed in [39]. The condenser microphone is enclosed in a 3D-printed capsule which ensures an optimal mechanical coupling of the sensor with the skin and reduces external noise in the measurements. The conditioning phase was performed by a PCB specifically designed for neuroprosthetic applications, as detailed in [20], which has proven to maintain the same quantitative and qualitative standards as commercially available technologies at only a fraction of their cost. The system designed by S. Fara and C. Sen Vikram for the conditioning and amplification of the transducer output signal comprises:

• A biasing stage to ensure that the transducer output signal is not half-wave rectified by the operational amplifiers.
• A measurement circuit comprising a pull-up resistor and a high-value capacitor to prevent any potential offsets from reaching the amplification stage.
• A pre-amplification step implemented using 1 operational amplifier (OpAmp) in inverting configuration, followed by 5 OpAmps embedded in the STMicroelectronics TS925 (datasheet: http://www.farnell.com/datasheets/1449868.pdf).
• A 5th-order Butterworth analogue low-pass filter with a cut-off frequency of 200 Hz.

We interfaced the conditioning PCB with the computer using an Arduino Uno micro-controller (http://arduino.cc/en/Main/arduinoBoardUno), uploaded with a data-streaming script via the open-source Arduino IDE software. We read and displayed the signal in real time via a Graphical User Interface (GUI) in Matlab. A detailed description of the Matlab and Arduino code can be found in [39]. The sampling frequency was 360 Hz, enough to avoid aliasing of MMG signals, whose fastest components do not exceed 50 Hz. The system implemented by S. Fara and C. Sen Vikram had a fixed gain in the pre-amplification stage, so we modified it to allow tuning of the gain of the amplification stage.


Figure 3.1: Top (a) and bottom (b) layout of the modified PCB. It differs from the design proposed by S. Fara and C. Sen Vikram in the addition of a digital potentiometer to tune the gain of the amplification stage and of a decoupling capacitor on the added Integrated Circuit (IC).

The gain can thus be adapted to the application and to the size of the monitored muscle. This was done by introducing a digital potentiometer, the MCP4152-104E/SN (datasheet: http://www.farnell.com/datasheets/630435.pdf), which replaced a fixed resistance on the inverting input of the pre-amplification stage. The output of the pre-amplification stage becomes:

V_{out} = V_{in} \left(1 + \frac{R_{potentiometer}}{R_{in}}\right).    (3.1)

The digital potentiometer can be programmed by the micro-controller via a Serial Peripheral Interface (SPI), and its resistance value adapted to the specific gain needed according to Equation 3.1, in order to extract the desired information from the myomechanical channel. The layout of the top and bottom layers of the redesigned board is shown in Figure 3.1. The schematics and the PCB layout were developed using the open-source CAD software Eagle, provided by CadSoft (http://www.cadsoftusa.com/).
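A small sketch of how a target gain translates into a wiper setting via Equation 3.1, assuming a 100 kOhm full-scale, 8-bit-step device and a 10 kOhm input resistance (both values are assumptions to be checked against the datasheet and schematic):

    % Sketch: wiper code for a target pre-amplification gain (Eq. 3.1).
    % Assumes a 100 kOhm, 8-bit digital potentiometer and Rin = 10 kOhm
    % (illustrative values; verify against the actual board).
    Rin   = 10e3;                       % assumed input resistance (ohms)
    Rfull = 100e3;                      % assumed full-scale pot resistance
    gain  = 5;                          % desired gain V_out / V_in
    Rpot  = (gain - 1) * Rin;           % required resistance, here 40 kOhm
    code  = round(Rpot / Rfull * 255)   % wiper code sent over SPI, here 102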

3.1.3 Experimental Task

Six healthy male subjects, aged 23-28, gave informed consent and participated in the study. One of the subjects was the author and the other five were volunteers. We carefully designed the experimental protocol in order to span the known and classified hand grips, based on previous studies.

Table 3.1: List of objects used in the task

Precision Grip      Power Grip
Pen                 Ball
Book                Bottle
Computer Mouse      Handle
Mobile Phone

Although many schemes have been elaborated to explain grip types, such as “prismatic”, “circular”, “tripod” or “tip prehensile” grips (as detailed in [40, 41]), the major subdivision still remains the one theorised in [42], which defines two distinct patterns of movement: the “precision grip” and the “power grip”; the former involves the fingers, possibly in opposition with the thumb, which make contact with the object and exert pressure on it; the latter involves the palm as a way to apply force on the object. To allow comparison with other studies, we used objects that have also been used by other authors in the formulation of taxonomies of hand postures [43, 40]. Participants were seated on a chair fitted with a custom elbow-rest support: the elbow rested on a flat surface, the forearm was horizontal and the hand was positioned so that the axis of abduction/adduction of the wrist was parallel to the surface; this enabled us to ignore the contribution of gravity to the activation of the muscles in the forearm. To prevent activation of the extensor carpi radialis longus (ECR) through abduction of the wrist during the experiment, the wrist rested on a dedicated support. Subjects were instructed to grasp 7 objects (listed in Table 3.1); each grasp type was performed 12 times by each subject, in two sessions of 6 trials each. An object was named at the beginning of each trial and marked using the “My Button” on the ActiChamp, which sends a time stamp to the serial port that is stored as an “event” in the acquired data structure of .eeg format. We used the time stamps for data segmentation (as detailed in Section 3.2). The movement, from the initiation of the grip to release, took ≈ 2.5 s, and the variation between subjects and trials was not significant for the subsequent data analysis. Subjects were also asked to exert maximum extension and flexion of their fingers at the beginning and at the end of the experiment, in order to extract the Maximum Voluntary Contraction (MVC) necessary for the pre-processing phase. The synchronisation of the three sensory modalities involved in the experiment (MMG, EMG, CyberGlove) was achieved by initiating the recording from the CyberGlove and the Arduino simultaneously from Matlab and by sending a marker on the serial port that streamed the EMG signal. We monitored muscle activity with 5 EMG channels and 4 MMG microphones positioned at the bellies of the muscles listed below (∗ denotes both EMG and MMG monitoring).


Figure 3.2: Muscles of the anterior (a) and posterior (b) compartments of the forearm, positioning of the electrodes and arm-positioning setup. The channels are enumerated according to the listing in the text. The white circles indicate the bellies of the muscles, where the microphones of the MMG channels were positioned, between the two electrodes of the EMG channel in bipolar configuration. Modified image from [45]. (c) The elbow rested on a custom support and the wrist was prevented from moving using a soft support. Notice the position of the MMG microphones, between the two electrodes of each of the bipolar EMG channels. The channels, on the posterior compartment of the forearm, are labelled as in the text. The electrode on the elbow is the GND channel.

• Anterior forearm compartment:
  1. Flexor Digitorum Superficialis∗ (FDS)
  2. Palmaris Longus∗ (PL)
• Posterior forearm compartment:
  1. Extensor Digitorum Communis∗ (EDC)
  2. Extensor Digiti Minimi∗ (EDM)
  3. Extensor Carpi Radialis (ECR)

The channel sites were located by palpating the respective muscles, as shown in Figure 3.2. The MMG microphones were positioned between the two electrodes of the same EMG channel in the bipolar configuration, in order to capture complementary information about the contraction of the same muscle [44]. Movements of the left hand were measured using the resistive sensors embedded in the CyberGlove. The sensors are positioned so as to capture all of the DOFs of the human hand except the distal interphalangeal (DIP) joints of each finger, for a total of 18 sensors including: the metacarpal-phalangeal (MCP) and proximal interphalangeal (IP) joints of the four fingers (Index: I, Middle: M, Ring: R, Little: L), three relative abduction/adduction angles between the four fingers, the carpo-metacarpal (T-CMC), metacarpal-phalangeal (T-MCP) and interphalangeal (T-IP) joints of the Thumb (T), the abduction angle (T-ABD) between the thumb and the palm of the hand, and 2 sensors measuring the yaw and pitch of the palm. We discarded the relative abduction/adduction sensors and the palm sensors since, for the aim of the study, we were only interested in flexion/extension of the fingers. Table 3.2 gives details of the used/discarded sensors and the abbreviations used to indicate each joint.

Table 3.2: List of joint abbreviations. The Used column follows the selection described in the text (relative abduction/adduction and palm sensors were discarded).

Finger   Name                    Abbreviation   Used
Thumb    Carpo-MetaCarpal        T-CMC          Yes
Thumb    MetaCarpal-Phalangeal   T-MCP          Yes
Thumb    InterPhalangeal         T-IP           Yes
Thumb    Abduction/Adduction     T-ABD          Yes
Index    MetaCarpal-Phalangeal   I-MCP          Yes
Index    InterPhalangeal         I-IP           Yes
Index    Abduction/Adduction     I-ABD          No
Middle   MetaCarpal-Phalangeal   M-MCP          Yes
Middle   InterPhalangeal         M-IP           Yes
Middle   Abduction/Adduction     M-ABD          No
Ring     MetaCarpal-Phalangeal   R-MCP          Yes
Ring     InterPhalangeal         R-IP           Yes
Ring     Abduction/Adduction     R-ABD          No
Little   MetaCarpal-Phalangeal   L-MCP          Yes
Little   InterPhalangeal         L-IP           Yes
Wrist    Yaw                     W-Y            No
Wrist    Pitch                   W-P            No

3.1.4 LibHand: Data Visualisation and Animation

The high-dimensional time series returned by the CyberGlove is difficult to interpret visually, so we used LibHand (http://www.LibHand.org/) [46], an open-source library for rendering and recognising human hand articulation. LibHand is developed in C++ but equipped with an intuitive Matlab interface. It was used to animate the data acquired from the experiment and to compare it with the predictions of the GP model. An example of the grasp types performed during the experiment, displayed using LibHand, is shown in Figure 3.3. Animations of the hand gestures recorded during the experiment and the predictions of both the VARMAX and GP models can be visualised at https://www.dropbox.com/sh/mlw3witpg6ouz08/AABSMmtPs74__NQV_K00eKOia?dl=0.

3.1.5 Glove Calibration

The raw, 8-bit-resolution data returned by the CyberGlove needs converting into joint angles; we did so using a calibration procedure comprising two steps. The procedure was performed once per subject and provided an approximate linear calibration between the glove output and the joint angles. The first step was performed automatically by the CyberGlove System Device Configuration Utility (http://www.cyberglovesystems.com/products/virtual-hand-sdk/specifications) by asking the subject first to place the hand palm-down against a flat surface with the four fingers parallel and the thumb aligned on the side of the palm, and then to form a circle with the index and the thumb. The Configuration Utility automatically uses these two postures to extract a gain and an offset.


Figure 3.3: Renderings of hand positions during the grasping tasks defined in Table 3.1. From top left to bottom right: (a) Ball grasp, (b) Bottle grasp, (c) Book grasp, (d) Handle grasp, (e) Pen grasp, (f) Mouse grasp, (g) Telephone grasp, (h) Flat hand position. Hand-tuned renderings for visualisation purposes.

We subsequently used this gain and offset for the conversion of raw signals to angles, β, according to the linear relation:

β = gain × (raw − offset).    (3.2)

A fine tuning of the parameters was subsequently done using an online visualisation system that created a realistic rendering of a virtual hand. The virtual hand’s posture was manually matched to the subject’s own hand. For details on the calibration refer to [27].
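In Matlab, the per-sensor conversion of Equation 3.2 is a one-liner (a sketch with illustrative names; raw is a T × 18 matrix of samples, gain and offset are 1 × 18 calibration vectors):

    % Sketch of Eq. 3.2: convert raw 8-bit glove samples to joint angles
    % using the per-sensor gain and offset from the calibration.
    beta = bsxfun(@times, gain, bsxfun(@minus, raw, offset));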

3.2 Preprocessing

Physiological data is rarely informative in its “raw” state; we used a set of preprocessing tools to reduce the effect of noise and to smooth the EMG and MMG data, making it suitable for our needs. The pre-processing stage comprised the following steps:

• Segmentation of the data of the different tasks using the time stamps produced as described in Section 3.1.3. The resulting segments contained data from different trials of the same task.

• Band-pass filtering of the signal between 10 and 500 Hz (8 and 150 Hz for MMG). This removes high-frequency components not belonging to the EMG/MMG signal and the low-frequency components caused by movement artefacts. The filtering was done using a 5th-order Butterworth filter designed in Matlab.
• Rectification.
• Low-pass filtering of the signal with cut-off frequency at 4 Hz. The removal of high-frequency components smooths the signal in order to extract its envelope. The filtering was done using a 5th-order Butterworth filter designed in Matlab.
• Normalisation by MVC. Subjects were asked to exert the maximum possible force in extension and flexion for 5 s before and after the experiment. The MVC was extracted as the mean over the two 5 s periods for each channel, using the flexion trials for the channels on the anterior compartment of the forearm and the extension trials for the ones on the posterior compartment.
• Gaussian convolution mask. The signal was further smoothed by a Gaussian mask of size 1 × 300 samples and standard deviation σ = 200 samples. The specifics of the mask were chosen empirically so as to sufficiently smooth the resulting signal without losing relevant information.
• Downsampling at the CyberGlove rate, i.e. 138 Hz. This procedure was necessary to obtain a precise alignment of the data, required for the Regression analysis. We chose the CyberGlove fs, the lowest sampling frequency of the three devices (EMG, MMG and CyberGlove), to reduce the computational cost of the GP Regression, which grows with the cube of the number of points.

A sketch of the EMG branch of this pipeline is shown after this list. The results of the processing procedure are shown for an EMG and an MMG signal in Figures 3.4 and 3.5 respectively, where the raw signal is compared to the processed one in the time and Fourier domains. The data acquired from the CyberGlove was much less noisy; we only used a first-order Savitzky-Golay filter with a running window of 23 points to remove discontinuities introduced by the A/D converter.
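A minimal sketch of the EMG conditioning chain described above (not the thesis code; fsEMG, raw_emg and mvc are illustrative names, and the EMG sampling rate of 2 kHz and the zero-phase filtering are assumptions):

    % Minimal sketch of the EMG conditioning chain (illustrative values).
    fsEMG = 2000;                                   % assumed EMG rate (Hz)
    [b1, a1] = butter(5, [10 500] / (fsEMG/2));     % 5th-order band-pass
    emg = filtfilt(b1, a1, raw_emg);                % band-pass 10-500 Hz
    emg = abs(emg);                                 % rectification
    [b2, a2] = butter(5, 4 / (fsEMG/2), 'low');     % 4 Hz envelope filter
    emg = filtfilt(b2, a2, emg);                    % low-pass smoothing
    emg = emg / mvc;                                % MVC normalisation
    g   = exp(-0.5 * ((-149:150) / 200).^2);        % 1 x 300 Gaussian mask
    emg = conv(emg, g / sum(g), 'same');            % convolution smoothing
    emg = resample(emg, 138, fsEMG);                % downsample to 138 Hz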

3.3 The GPML Toolbox

We provide here a detailed description of the mathematical framework, introduced in Section 2.5.1, used for inference in GPs for Regression: namely, the function-space view of regression, which uses a different formulation to reach the same results presented previously. In particular, we focus on the Model Selection problem and the Adaptation of Hyperparameters; sparse approximation methods are covered in Appendix B.1. The framework that we present was thoroughly studied by Rasmussen and Williams in [36], freely accessible online (http://www.gaussianprocess.org/gpml/code/Matlab/doc/) and implemented in the Matlab-compatible software package that we used.


Figure 3.4: Effect of the pre-processing stage on the raw EMG signal in the time and frequency domains. Raw signal in the time domain (a) and Fourier domain (c), recorded from the FDS muscle while performing a ball grasp. Notice the strong presence of noise in the time domain and the contribution of an Alternating Current (AC) component at 50 Hz; bursts in power show during muscle contraction. As expected, most of the power in the spectrum is carried by components under 50 Hz. Conditioned signal in the time domain (b) and Fourier domain (d). Notice that the resulting signal comprises only positive values as a result of the rectification, the AC component has been filtered out and the high-frequency components have been strongly attenuated. As a result, the bursts in amplitude tied to muscle contraction have been emphasised. The oscillations in the Fourier domain are typical of the ripple introduced by a Butterworth filter.


Figure 3.5: Effect of the pre-processing stage on the raw MMG signal in the time and frequency domains. Raw signal in the time domain (a) and Fourier domain (c), recorded from the FDS muscle while performing random finger movements. Notice the strong presence of noise in the time domain and the contribution of an Alternating Current (AC) component at 50 Hz; bursts in power show during muscle contraction. As expected, most of the power in the spectrum is carried by components under 50 Hz. Conditioned signal in the time domain (b) and Fourier domain (d). Notice that the resulting signal comprises only positive values as a result of the rectification, the AC component has been filtered out and the high-frequency components have been strongly attenuated. As a result, the bursts in amplitude tied to muscle contraction have been emphasised. The oscillations in the Fourier domain are typical of the ripple introduced by a Butterworth filter.



Figure 3.6: Chain graph of a GP for regression. Filled circles indicate observed variables and empty ones represent unknowns. The thick horizontal bar represents a set of fully connected nodes. Note that each observation y_i is, given the latent value f_i, independent of the other observations, thus fulfilling the marginalisation property of GPs.

Giving a formal definition: Definition 3.1. A Gaussian Process is a collection of random variables, any finite number of which have a joint Gaussian distribution. A GP is fully defined by a mean m(x) and a covariance function k(x, x0 ) of a real process f (x), defined as: m(x) = E[f (x)]

(3.3)

0

0

0

k(x, x ) = E[f (x) − m(x))(f (x ) − m(x ))]

(3.4)

so that a GPcan be written as: f (x) ∼ GP(m(x), k(x, x0 ))

(3.5)

where x ∈ X, the input domain, and the random variables represent the values of the function f(x). One of the key advantages of Gaussian Processes comes from the consistency property of marginalisation: if a GP specifies (y1, y2) ∼ N(µ, Σ), then it must also specify y1 ∼ N(µ1, Σ11), where Σ11 is a submatrix of Σ. In practice, this means that by analysing a bigger set of variables we also define the distribution over any smaller subset. It can be shown, for example, that a Bayesian regression model linear in the weights w, of the form:

f(x) = Φ(x)ᵀ w,    (3.6)

with a Gaussian prior over the weights, w ∼ N(0, Σ_p), and a feature space defined by Φ(x), is a Gaussian Process with mean and covariance functions of the form:

E[f(x)] = Φ(x)ᵀ E[w] = 0    (3.7)

E[f(x)f(x′)] = Φ(x)ᵀ E[wwᵀ] Φ(x′) = Φ(x)ᵀ Σ_p Φ(x′).    (3.8)

which means that f(x) and f(x′) are jointly Gaussian with zero mean and covariance given by the above formulation.

Figure 3.7: Samples drawn from GPs with zero mean and different kernels. Notice how the kernel that defines the covariance function encodes assumptions about the structure of the data. The black line shows samples (improperly connected by a continuous line) drawn from a GP with a classic SE covariance function, defined by Equation 3.9; as expected, the generated data is very smooth. The red line shows samples drawn from a GP with a Matérn-class kernel, which assumes a very rough dynamic of the target data as the input data changes. The blue line shows samples drawn from an SE covariance function with an added NN kernel, an adapted logistic function typically used in the NN literature. Notice the similarity in smoothness between the blue and the black line far from zero.

It is important to notice that the covariance function plays a key role in a GP, as it encodes some of the assumptions about the function that we are trying to learn and implicitly defines the set of basis functions Φ(x) of the feature space; more intuitively, it defines a property of similarity between points, that is to say it encodes 'how similar' two outputs y with close inputs x are. We will later show how varying the form of the kernel k(x, x′) can significantly alter the quality of a prediction and how it enables us to include a priori knowledge about the data in the model to optimise the results. The most commonly used kernel to express the covariance function is the stationary squared exponential (SE), having the form:

k(x_i, x_j) = exp(−|x_i − x_j|² / ℓ²).    (3.9)

This shows how the covariance between the outputs is a function of the distance between the inputs. Many different kernels can be combined to reflect the behaviour of the data we want to fit; a detailed explanation can be found in [36]. Figure 3.7 shows samples drawn from a GP with an SE covariance function with length-scale ℓ = 0.25, from a GP with a Matérn kernel (refer to the work of Matérn for details [47]) and from a GP with a kernel obtained by summing an SE with a Neural Network (NN) function. Using the definition of kernel from Equation 2.12 in Equation 2.11, and keeping in mind that x are training inputs and x∗ the test inputs, we get the mean and variance of the predictive

distribution for a GP using a covariance function k(x, x′):

f̄∗ = k(x∗)ᵀ (k(X, X) + σ_n² I)⁻¹ y    (3.10)

V[f∗] = k(x∗, x∗) − k(x∗)ᵀ (k(X, X) + σ_n² I)⁻¹ k(x∗)    (3.11)

where, as defined in the linear model of Equation 2.3, f is the latent value function underlying the noisy observations y.
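For concreteness, the following is a minimal Matlab sketch of exact GP prediction as given by Equations 3.10 and 3.11 with the SE kernel of Equation 3.9; the variables X, y, Xs, ell and sn2 (training inputs, noisy targets, test inputs, length-scale and noise variance) are assumed to be given, and sqdist and kSE are our own helpers, not toolbox functions.

% Exact GP prediction (Equations 3.10-3.11) with the SE kernel (Equation 3.9).
% Assumed given: X (n x d) training inputs, y (n x 1) noisy targets,
% Xs (m x d) test inputs, ell (length-scale) and sn2 (noise variance).
sqdist = @(A, B) bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2)') - 2 * (A * B');
kSE    = @(A, B) exp(-sqdist(A, B) / ell^2);

K     = kSE(X, X) + sn2 * eye(size(X, 1));    % noisy training covariance
Ks    = kSE(X, Xs);                           % train-test covariance
L     = chol(K, 'lower');                     % Cholesky factor for stable solves
alpha = L' \ (L \ y);
fmean = Ks' * alpha;                          % predictive mean, Equation 3.10
v     = L \ Ks;
fvar  = diag(kSE(Xs, Xs)) - sum(v.^2, 1)';    % predictive variance, Equation 3.11

The Cholesky solve replaces the explicit matrix inverse of Equations 3.10 and 3.11 for numerical stability.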

3.3.1 Model Selection and Adaptation of Hyperparameters

We have detailed in the previous section the importance of the covariance function k(x, x′) in defining the properties of similarity between data points and encoding our a priori knowledge about the data. Figure 3.7 shows a comparison between samples from GPs with different kernels. A wealth of families of covariance functions exists and they can be combined in multiple ways, as long as the resulting covariance matrix K is positive semidefinite. Each family of kernels has a number of parameters that, due to the theoretically nonparametric nature of GPs, we shall call hyperparameters. Hyperparameters define the properties of one particular kernel, such as the characteristic length-scale ℓ of the SE kernel, as shown in Equation 3.9. The problem of choosing the most suitable covariance function and the optimal hyperparameters is commonly known as the Model Selection problem. Going back to the definition of the SE covariance function, it can be parametrised as follows:

k(x_p, x_q) = σ_f² exp(−(x_p − x_q)ᵀ M (x_p − x_q) / 2) + σ_n² δ_pq    (3.12)

where the matrix M defines the length-scales in the different dimensions of the input space defined by x, and δ_pq is a Kronecker delta that adds the noise σ_n² to the kernel for x_p = x_q. We can then define a vector of hyperparameters Θ = ({M}, σ_f², σ_n²). If we were to stick to a fully Bayesian approach to inference, finding the optimal hyperparameters would involve defining a prior over Θ and applying Bayes' rule as we have done in Equation 2.8 for the weight-space view. Unfortunately, computing the integral defined by Equation 2.7 at higher levels can be computationally infeasible, so we abstract from a fully Bayesian approach and use a crude maximisation. What we are interested in is maximising the probability of the targets y being explained by the model with input data x and hyperparameters Θ; if we take Equation 2.7 and add the dependency on the hyperparameters, we see clearly that the quantity of interest is the evidence function, or marginal likelihood. Usually, for convenience, the optimisation is done by minimising the negative log marginal likelihood with respect to Θ. This is not a trivial task; it is usually done using numerical methods such as conjugate gradients and it poses many challenges. By looking at the form of the log marginal likelihood:

log p(y|X, Θ) = −(1/2) yᵀ K_y⁻¹ y − (1/2) log|K_y| − (n/2) log 2π    (3.13)

we see that it is formed of a quadratic term in the observed targets, also known as the data-fit term, and a penalty term ∝ log|K_y| that is typical of Bayesian frameworks and causes the marginal likelihood to be peaked at values of the hyperparameters that assure a balance between complexity and generalisation. This characteristic behaviour avoids the overfitting problem by adapting the model to the training data while allowing it to maintain the right amount of generalisation; this effect is also known as Occam's razor because it resembles the lex parsimoniae introduced in the XIII century by William of Ockham: "Numquam ponenda est pluralitas sine necessitate" ("Plurality is never to be posited without necessity").
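As an illustration, the sketch below performs this crude maximisation in Matlab over the SE length-scale alone, minimising the negative log marginal likelihood of Equation 3.13 with the base fminsearch routine in place of conjugate gradients; X, y, sn2 and the sqdist helper from the earlier sketch are assumed given.

% Crude hyperparameter adaptation: minimise the negative log marginal
% likelihood (Equation 3.13) with respect to the log length-scale only.
% Assumed given: X, y, sn2 and the sqdist helper from the earlier sketch.
n    = numel(y);
Ky   = @(logell) exp(-sqdist(X, X) / exp(logell)^2) + sn2 * eye(n);
nlml = @(logell) 0.5 * y' * (Ky(logell) \ y) ...     % data-fit term
     + sum(log(diag(chol(Ky(logell))))) ...          % (1/2) log|Ky| penalty
     + 0.5 * n * log(2 * pi);                        % normalisation constant
logell = fminsearch(nlml, log(0.25));                % start from ell = 0.25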

3.4 Missing Fingers Predictability

Before implementing the state-ahead regressor for prosthesis control, we analysed the predictability of the motion of each finger in velocity space from the gesture of the rest of the hand. We started this part of the project because of a broken sensor on the CyberGlove, the ring-finger IP-joint, which was giving a continuously flat signal. We thus used a GP to try to compensate for the broken hardware by training a model to infer the velocity of the missing ring joint from other joints; we then extended the same procedure to all fingers. This work was encouraged by a recent study [30] which assessed the independence of each finger in terms of the linear predictability of its velocity from the remaining ones, showing that the ring finger was the most correlated one and thus the easiest to predict. For this purpose we used data, acquired through the CyberGlove, of 7 different subjects performing 17 different tasks, recorded prior to the damage of the CyberGlove, as described in [48]. We removed from the dataset the keyboard-typing trials, where, due to the nature of the task, it would be pointless to look for correlations. We then computed the first difference of the smoothed glove data and divided it by the sampling period (12.5 ms) to obtain velocity, which is known to be closely related to the motor commands [49], and computed the absolute value of the Pearson product-moment coefficient between each pair of the 11 selected time-series (from the total 18 available from the glove):

ρ_{X,Y} = corr(X, Y) = E[(X − µ_X)(Y − µ_Y)] / (σ_X σ_Y)    (3.14)

where σ and µ are, respectively, the standard deviation and the mean value of the referred variables. We then used these values to generate an 11×11 symmetric matrix of pairwise correlations between the 11 joint velocities of interest for each subject and computed the "average correlation matrix" over subjects for visual inspection. For each joint we found the 3 most correlated joints in the average correlation matrix, not belonging to the same finger, and used them as the input space to train and test a GP regressor. We then constructed a GP model with an SE covariance function, which encodes the assumption of smoothness in human movements [50], and zero mean, to predict the velocity of each joint from the remaining ones.
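A minimal Matlab sketch of this correlation analysis follows, assuming V is the N × 11 matrix of smoothed joint velocities for one subject; the exclusion of joints belonging to the same finger is omitted for brevity.

% Pairwise correlation analysis (Equation 3.14) over the 11 joint velocities.
% Assumed given: V, an N x 11 matrix of joint angular velocities.
R = abs(corrcoef(V));                 % |Pearson| between all pairs of joints
R(logical(eye(size(R)))) = 0;         % discard the trivial self-correlations
[~, idx] = sort(R, 2, 'descend');     % rank candidate partners for each joint
top3 = idx(:, 1:3);                   % 3 most correlated joints per joint
                                      % (before the same-finger exclusion)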


Figure 3.8: Autocorrelation function of the angular velocity of the Index MCP-joint. (a) The surface area in grey shows the autocorrelation function between 0 and a time-lag τ of 4.7 s; the area highlighted in blue shows the region of time-lags where the autocorrelation function is higher than half its maximum value. (b) The autocorrelation function of the velocity of the MCP-joint shows a width of half-maximum of 362 ms. These results were used to choose the range of time-lags τ used to train the step-ahead model.

Due to the high computational demand of GPs and the large number of training points (the analysed data consisted of 11 time-series of ≈30 min of recordings at a sampling frequency of 80 Hz), we used the FITC sparse approximation method to solve the inference step (refer to B.1), optimising the positions of the pseudo-inputs and the values of the hyperparameters by minimising the negative log marginal likelihood, as described in Subsection 3.3.1. The values of the marginal likelihood are highly informative of the goodness of the fit, and the optimised hyperparameters contain information about the relation between the input and output spaces. A minimal sketch of this pipeline for one joint is given below.
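In the sketch, exact inference stands in for FITC for brevity; gpPredict is an assumed helper wrapping the exact-inference sketch of Equations 3.10-3.11, and j, trainIdx and testIdx are hypothetical indices.

% Predict joint j from its 3 most correlated partners (exact GP shown in
% place of FITC for brevity). gpPredict is an assumed helper wrapping
% Equations 3.10-3.11; j, trainIdx and testIdx are hypothetical.
Xin = V(:, top3(j, :));                          % input space: 3 correlated joints
[mu, s2] = gpPredict(Xin(trainIdx, :), V(trainIdx, j), ...
                     Xin(testIdx, :), ell, sn2); % predictive mean and variance
C   = corrcoef(mu, V(testIdx, j));               % accuracy as in Equation 3.14
rho = abs(C(1, 2));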

3.5 State-ahead Inference of Joint Velocities

The main aim of our work was to test to what degree a principled framework like GPs could be successful in continuously predicting the state of the hand from muscle activity. In order to make the control phase as intuitive as possible, we would like to find the mapping between muscle activities and hand state; because of the nonlinearity of muscle biomechanics, however, a direct mapping between muscle activation and hand state is not feasible: the end-position of the hand depends not only on the degree of muscle contraction, but also on its current state. We therefore used a more realistic approach, i.e. predicting the state (angular velocity) at time t, which we shall call α̇_t, using the state at time t − τ, α̇_{t−τ}, and the current muscle activation signal, which we shall call u_t:

α̇_t = f(α̇_{t−τ}, u_t) + ε    (3.15)

where our aim is to learn the function f under the assumption of a Gaussian-distributed noise term ε ∼ N(0, σ_n). The time-lag τ was chosen by analysing the autocorrelation function of the α̇ vector for each joint. It is reasonable to state that, due to the natural smoothness of human movements, values that are very close in the velocity vector will be highly correlated and thus easier to predict, while values further apart will be less correlated and thus more difficult to predict. To investigate this assumption we analysed the autocorrelation function of the velocity vector of each joint and found τ_WHM, i.e. the width of half-maximum of the autocorrelation function around zero (shown in Figure 3.8 for one joint). The average across joints was τ̄_WHM = 362 ms, i.e. 50 samples at 138 Hz. We then trained a GP model and a VARMAX using increasingly higher time-lags between 87 ms and 500 ms and evaluated their prediction accuracy in terms of Correlation coefficient ρ and Root Mean Square Error (RMSE). The models were trained, for each subject, on 6 trials of each task performed during the experiment and tested on the remaining 6. A cross-validation procedure would have been more appropriate but was unfeasible due to the computational demand of GP Regression, which grows as O(n³), where n is the number of points.
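The width-of-half-maximum analysis reduces to a few lines of Matlab, sketched below under the assumption that v is one joint's velocity time-series and fs the sampling frequency; xcorr requires the Signal Processing Toolbox.

% Width of half-maximum of the autocorrelation function, used to choose tau.
% Assumed given: v (one joint's velocity time-series) and fs (sampling freq.).
[acf, lags] = xcorr(v - mean(v), 'coeff');       % normalised autocorrelation
acf = acf(lags >= 0);                            % keep non-negative lags only
tauWHM = (find(acf < 0.5 * acf(1), 1) - 1) / fs; % first drop below half-max,
                                                 % one-sided (the ACF is symmetric)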

3.5.1 VARMAX

We used a Vector AutoRegressive model with Exogenous inputs to learn the function f in Equation 3.15. The VARMAX model makes the assumption of linearity between the AR terms, the MA terms describing the contribution of the innovations ε_i, and the exogenous inputs:

α̇_t = c + b u_t + ε_t + Σ_{i=1..p} φ_i α̇_{t−i} + Σ_{j=1..q} θ_j ε_{t−j}    (3.16)

where b u_t is the exogenous-inputs term, the φ_i terms give the AR contribution and the θ_j terms describe the innovations. We changed p and q in order to investigate how far ahead the model can predict with acceptable accuracy, using time-lags between 87 ms and 500 ms. We used a Dickey-Fuller test to test for stationarity of the time series; we verified that the MA and AR parts of the model were invertible and stable, respectively, while there is no well-defined notion of stability or invertibility for the exogenous-inputs part. The parameters Θ = (c, b, φ, θ) were fitted to the training data using Maximum Likelihood (ML) estimation. We performed the forecasting stage by iteratively predicting the state at a time-lag τ ahead and using it as a new input for the next iteration. The starting point was set to α̇|_{t=0} = 0 for each joint.
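For concreteness, the sketch below implements the iterative forecasting idea in Matlab with a simplified least-squares ARX model (a single lagged state and no MA terms) standing in for the full ML-fitted VARMAX of Equation 3.16; U, v and lag are assumptions about how the data are stored.

% Iterative state-ahead forecasting with a simplified linear ARX model
% (least-squares fit; the MA/innovation terms of Equation 3.16 are omitted).
% Assumed given: U (N x 9 muscle channels), v (N x 1 joint velocity) and
% lag (the time-lag tau expressed in samples).
N   = numel(v);
Phi = [ones(N - lag, 1), v(1:N-lag), U(lag+1:N, :)];   % [c, AR term, exogenous]
w   = Phi \ v(lag+1:N);                                % least-squares estimate

pred = zeros(N, 1);                            % starting point: velocity 0
for t = lag+1:N
    pred(t) = [1, pred(t-lag), U(t, :)] * w;   % feed predictions back in
end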

3.5.2 GP Model

The same approach was used to learn the function f in Equation 3.15 using a GP model, again under the assumption of normally distributed noise ε. In this case the function f is treated as a latent variable, inferred from the training data but not explicitly expressed in the predictive distribution, and the model returns a process, i.e. not point-wise estimates but distributions over functions. This has the advantage of telling us "how certain" the model is of its predictions. We encoded the assumption of linearity in the covariance function, using a composite kernel, the sum of a linear and an SE term (with additive observation noise in the likelihood), of the form:

k(x, x′) = x · x′ + exp(−|x − x′|² / ℓ²)    (3.17)

and a constant zero mean function. We then constructed the input space by using the 9 channels (5 EMG, 4 MMG) of muscle activity at time t and the state in velocity space at time t − τ; the target, i.e. the quantity to be predicted, was the state in velocity space at time t.

Algorithm PredictAhead: implements state-ahead predictions using GPs

for all S ∈ {Subjects} do
    for all F ∈ {Fingers} do
        u_t ← EMG and MMG channels
        α̇_{t−τ} ← velocity of joint at time t − τ
        α̇_t ← velocity of joint at time t                  ▷ target space
        Input ← [u_t, α̇_{t−τ}]                              ▷ input space
        Covf ← k(x, x′); Θ_cov ← Θ⁰_cov                      ▷ initialise kernel hyperparameters
        Meanf ← 0
        Likf ← N(0, σ_n); Θ_lik ← σ_n⁰                       ▷ initialise noise level
        InferenceMethod ← FITC
        p ← p⁰ (k-means)                                     ▷ initialise pseudo-inputs
        Θ ← {Θ⁰_cov; Θ⁰_lik}
        Θ ← argmin_Θ {−log P(α̇ | Θ, p)}                     ▷ optimise hyperparameters
        p ← argmin_p {−log P(α̇ | Θ, p)}                     ▷ optimise pseudo-inputs
        Pred|_{t=0} ← 0                                      ▷ initialise state at time 0
        for all t ∈ {0 : τ : T} do
            In|_t ← [u_t, Pred_{t−τ}]                        ▷ update inputs
            [µ, σ] ← GP(Likf, Meanf, Covf, FITC{p}, In|_t)   ▷ predictive distribution
            Pred|_t ← sample from N(µ, σ)                    ▷ sample from predictive distribution
            Var|_t ← σ                                       ▷ store variance at time t
        end for
    end for
end for

The forecasting stage was done by initialising the state at time zero, α̇|_{t=0} = 0, and iteratively predicting the state at a time-lag τ ahead, using a value drawn from the returned Gaussian distribution as an input for the next iteration.¹³

¹³ Using GPs in this configuration implies assuming that the sampled predictions at each iteration are noiseless, which means that we are not taking into account the uncertainty of the targets when using them as inputs for the next iteration. A theoretically correct approach should consider a forward propagation of the uncertainty; an analytical solution for this case is given in [51] and is known as error-in-variables regression. Our approach is simpler but not rigorous.
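In Matlab terms the forecasting loop reduces to something like the sketch below, with exact inference standing in for FITC and gpPredict the assumed helper from the earlier sketches; drawing mu + sqrt(s2)*randn implements the sampling step, whose feedback is treated as noiseless, as the footnote discusses.

% State-ahead GP forecasting: sample from the predictive distribution and
% feed the sample back as the next lagged-state input.
% Assumed given: Xtr, ytr (training set), U (muscle channels), lag, T and
% the gpPredict helper wrapping Equations 3.10-3.11 (exact inference).
pred = zeros(T, 1);                          % state initialised to 0
vr   = zeros(T, 1);                          % predictive variances
for t = lag+1:T
    xs = [U(t, :), pred(t-lag)];             % update inputs
    [mu, s2] = gpPredict(Xtr, ytr, xs, ell, sn2);
    pred(t)  = mu + sqrt(s2) * randn;        % sample from N(mu, s2)
    vr(t)    = s2;                           % store variance at time t
end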


As we did for the VARMAX model, we varied the value of τ between 87 ms and 500 ms and estimated the goodness of the fit in terms of Correlation Coefficient and RMSE. Because of the high computational demand of GPs, we used the sparse approximation FITC (refer to B.1) to solve the inference phase. We chose the number of pseudo-inputs to be n/150, where n is the number of training inputs (which was empirically found to be a good compromise between computational cost and fitting performance), initialised their positions with a K-means algorithm (refer to A.1) and optimised their positions by minimising the negative log marginal likelihood, whose derivatives with respect to the hyperparameters were also used to find the hyperparameters that guaranteed the best fit. The performances of both the VARMAX and the GP model were evaluated in terms of Pearson's Correlation Coefficient ρ (as defined in Equation 3.14), Coefficient of Determination r², and Root Mean Square Error (RMSE) between the actual data α̇ and the predictions Pred:

r²_{Pred,α̇} = 1 − Σ_i (Pred_i − α̇_i)² / Σ_i (α̇_i − ᾱ̇)²    (3.18)

RMSE_{Pred,α̇} = √( (1/N) Σ_i (Pred_i − α̇_i)² ).    (3.19)

The ρ value is indicative of the correlation between the predictions and the actual data, but insensitive to differences in scale; the Coefficient of Determination is influenced by both correlation and magnitude; and the RMSE gives an intuitive indication of the amplitude difference between the two quantities, in their own units (°/s in our case). A pseudocode of the algorithm used to implement state-ahead predictions using GPs is shown in Algorithm PredictAhead.
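The three measures reduce to a few lines of Matlab, sketched below under the assumption that pred and actual are column vectors of equal length:

% Evaluation metrics: Pearson's rho (Equation 3.14), Coefficient of
% Determination (Equation 3.18) and RMSE (Equation 3.19).
% Assumed given: pred and actual, column vectors of equal length.
C    = corrcoef(pred, actual);
rho  = C(1, 2);
r2   = 1 - sum((pred - actual).^2) / sum((actual - mean(actual)).^2);
rmse = sqrt(mean((pred - actual).^2));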



Chapter 4

Results

In this Chapter we present the results of the project. First we show detailed figures of the data acquired during the experiment, regarding muscle activity and joint velocities, comparing the patterns of muscle activation of each subject for different tasks and, for each task, averaging across subjects. Section 4.2 shows the performance of the GP in predicting the velocity of different joints from the remaining ones, focusing on the Ring IP-joint; the performance of the model is evaluated across 7 subjects. We then deal with the results of the main body of the project in Section 4.3 by visualising the output of the state-ahead regressor, comparing the VARMAX and the GP model performances for different tasks and subjects.

4.1 Visualisation of Experimental Data

We preprocessed the data acquired during the experiment to smooth the muscle activity and reject the discontinuities introduced by the A/D converter in the glove data. The activity of the 5 monitored muscles is shown in star plots for two tasks, in a time-dependent colour encoding from 1 s before contact with the object to 0. The activation is shown as a fraction of the maximum muscle activation in that window, for visualisation purposes. We have not filtered out the effect of cross-talk because, assuming that it is consistent among trials, it should not affect the performance of the Regressor.

4.2 Missing Fingers Predictability

We have tried to infer the instantaneous velocity of "missing" fingers from the velocity of other joints, exploiting the correlations between joints in natural hand movements. We used a dataset of everyday hand movements, acquired through the CyberGlove from 7 right-handed subjects, as described in [27], to study the existing patterns in the movement of 11 joints of the hand. We first plotted the velocity of each joint against that of one of the other joints and, doing so for each of the 11 time-series, obtained the symmetric plot-matrix shown in Figure 4.2.


Figure 4.1: Spider plots of the muscle activation averaged across trials for the Ball (a) and Cylinder (b) grasp types. The activation of the 5 monitored muscles is shown in fractional multiples of the maximum activation in a 1 s window prior to contact with the object. Time is encoded in different colours, as shown in the colorbar. Notice how different tasks have different patterns of muscle activation, which the Regressor could learn in order to distinguish between hand gestures.

Computing the Correlation coefficient, as defined in Equation 3.14, for each pair of joints, we obtain a symmetric Correlation matrix, showing the pairwise correlations between joint velocities. Figure 4.3 shows the "average" Correlation matrix, obtained by averaging across the 7 subjects. The velocity of the missing sensor on the CyberGlove, the Ring IP-joint, was predicted from the 3 most correlated joints not belonging to the Ring finger, found using the Correlation matrix. The results of the Regression, using a GP trained on everyday hand movements, are shown in Figure 4.4 for all subjects, in terms of Correlation coefficient, Coefficient of Determination and RMSE; all performances are found by comparing the actual values with the mean of the predictive distribution. The model was trained on Subject S0 and tested on the other 6 subjects¹. The performances for each subject and the mean performance are summarised in Table 4.1. Figure 4.5 shows the accuracy of the GP Regressor for subject S0 (training data), emphasising the linear relation between the actual data and the predictions with a linear fit; we also display 18 s of the velocity of the missing joint as predicted by the model and as recorded during the experiment. Figure 4.6 shows the same for one of the test datasets (subject S1); notice how the linear relation is maintained, showing a good generalisation of the predictions across subjects and confirming the results shown in Figure 4.4. We used the same approach for all the other joints of each subject and trained a GP Regressor to predict the velocity of each joint from the remaining ones not belonging to the same finger. Again, because of the computational cost of GPs, we used as inputs only the 3 joints most correlated to the one we wanted to predict, not belonging to the same finger. The Regressor was trained, using an SE covariance function, a zero mean and the FITC inference method, on one subject and tested on all the others.

¹ We relied on the automatic Occam's razor of GPs to avoid overfitting. Even if a k-fold cross-validation would have been more accurate, the computational demand of GPs makes it unfeasible.


Figure 4.2: Plot-matrix of pairwise joint velocities in natural hand movements. Plot (i, j), with i, j ∈ {1, 2, ..., 11}, displays the velocity of joint i versus the velocity of joint j during natural hand movements. Notice the presence of strong linear relations between joints, especially in the Middle-Ring-Little (M-R-L) group, while the Thumb (T) looks very poorly correlated. The Index (I) also shows some degree of linearity. Data from [27], for one subject only. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.3: Average Correlation matrix between joint velocities in natural hand movements. Cell (i, j), with i, j ∈ {1, 2, ..., 11}, shows the average over subjects of the absolute value of the Correlation coefficient |ρ̄| between the velocity of joint i and the velocity of joint j during natural hand movements. Notice the increasing correlation from the T to the M-R-L group. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.4: Bar plot of the across-subject goodness of the missing-joint predictions. Performance of the GP model across subjects evaluated in terms of ρ (red), r² (blue) and RMSE (green); the mean value of the predictive distribution was used for all calculations. The model was trained on subject S0 and tested on the others. The mean and one standard error across the 6 test subjects are presented in the last bar-group on the right, showing ρ̄ = 0.824, r̄² = 0.642 and RMSE = 0.235 °/s.

Table 4.1: Missing Joint Prediction Accuracy

              S0      S1      S2      S3      S4      S5      S6      µ ± SE
ρ             0.912   0.841   0.812   0.827   0.826   0.740   0.899   0.824 ± 0.052
r²            0.804   0.632   0.575   0.638   0.645   0.533   0.798   0.642 ± 0.091
RMSE [°/s]    0.182   0.295   0.289   0.162   0.203   0.308   0.203   0.235 ± 0.061

We then tested the predictions on each subject, evaluating the generalisation ability of the model and the variability of the performance across joints. Figure 4.7 shows the evaluation, in terms of RMSE, Correlation coefficient ρ and Coefficient of Determination r², of the prediction of the velocity of each joint from the velocity of the 3 most correlated joints, not belonging to the same finger, for the 6 test subjects. The M-R-L group is consistently predicted with higher accuracy, as shown by the higher values of ρ and r² and by the low RMSE. The mean across test subjects for each joint is shown in Table 4.1 and plotted in Figure 4.8 (b) with error bars of one standard error, displaying ρ, r² and RMSE, which give complementary information about the fitting. We also display the optimised Log Marginal Likelihood (a), i.e. p(α̇ | Θ, M) (refer to Subsection 3.3.1 for details), on the training data. The Marginal Likelihood is indicative of the goodness of the fit, and its value for each joint is consistent with the information given by ρ, r² and RMSE.

Figure 4.5: Evaluation of the performance of the GP model on subject S0. (a) Predicted joint velocity versus actual; a linear fit (y = 0.93x + 0.15) shows the strong similarity between the two compared time-series (actual and mean of the predictive distribution), with an angular coefficient of 0.93. (b) Mean of the predictive distribution and actual velocity of the missing joint, plotted as time-series for visual inspection in a window of 18 s. The whole dataset comprised ≈28 min of recording per subject.

Figure 4.6: Evaluation of the performance of the GP model on subject S1. (a) Predicted joint velocity versus actual; a linear fit (y = 0.84x + 0.79) shows the strong similarity between the two compared time-series (actual and mean of the predictive distribution), with an angular coefficient of 0.84. (b) Mean of the predictive distribution and actual velocity of the missing joint, plotted as time-series for visual inspection in a window of 18 s. The whole dataset comprised ≈28 min of recording per subject.


Figure 4.7: Generalisation of the GP predictions for all fingers on test subjects. The performance of the regressor is expressed in terms of ρ, r² and RMSE, shown in red, blue and green respectively. Each plot shows the results for one of the 6 test subjects. Notice the consistency in the accurate predictability of the M-R-L group. Refer to Table 3.2 for an explanation of the joint abbreviations.

Figure 4.8: Evaluation of the accuracy of predictions across test subjects and on training data. Plot (a) shows the Log Marginal Likelihood, p(α̇ | Θ, M), i.e. the probability of the training data being explained by the defined model M with optimised hyperparameters Θ. Plot (b) displays the average performance across test subjects, i.e. the average across all the plots in Figure 4.7, with one-standard-error error bars. Table 4.1 displays the same results in numerical form. Refer to Table 3.2 for an explanation of the joint abbreviations.

4.3 State-Ahead Predictions

We then used the muscle activity and the data acquired from the CyberGlove to train a model that could learn the relation between the velocity of the joints at time t and the velocity at time t + τ. We compared the predictions of a linear autoregressive model (VARMAX), trained using an ML approach, with the predictions of a principled framework such as GPs, trained using a Bayesian formalism, for different time-steps τ. We here show the results achieved in predicting the hand state in velocity space, using bar plots and plotting some of the results as time-series, for a visual comparison between the actual and the predicted velocities of the eleven considered joints. Figure 4.9 shows the performance of the two models in terms of RMSE and ρ for each joint and different τ, for one subject only. The two models are labelled by two different colours, light blue for the GP and dark blue for the VARMAX. In the plot matrix, the first and second columns display ρ and RMSE respectively, and different rows correspond to different time-steps τ. In Figure 4.10 we display the predicted trajectories ((b), (c), (d), (e)) and the actual one (a) for a visual comparison of the models' performance on one subject only. Plots (b) and (c) show the results of the VARMAX model for τ = 87 ms and 175 ms respectively, while plots (d) and (e) show the same for the GP model. Different colours correspond to different joints, as labelled in the legend. The same is shown in Figure 4.11 for τ = 350 ms and 500 ms. We then computed the grand average performance mean and standard deviation by averaging

over tasks and subjects, to obtain a more valuable measure of the performance of the two models for each joint and τ. Figure 4.19 and Table 4.2 show the result of this marginalisation, where the Grand Mean µ̄ and the Grand Variance σ̄ of the ρ and RMSE were obtained using the following equations:

µ̄ = (1/S) Σ_i (1/T) Σ_j x_ij    (4.1)

σ̄ = (1/S) Σ_i Σ_j (x_ij − µ̄)²    (4.2)

where S is the number of subjects and T the number of tasks. Errorbars show the standard error, i.e. σ̄/√S. In order to investigate the performance as the task varies, we averaged over subjects and trials for each task and obtained a measure of how well both the VARMAX and the GP predict the gesture of the hand for each trial. Figures 4.12-4.18 show the performance of the two models for each of the tasks defined in Table 3.1. The performances are displayed using error bars of the mean ± the standard error; the VARMAX and GP models are identified by dark and light blue respectively, and different rows in the plot matrix correspond to predictions with different time-steps τ. Finally, Figure 4.20 shows the behaviour of the grand average RMSE and ρ as a function of the time-lag τ: the former decreases and the latter increases for increasing τ. An investigation of this behaviour is shown in Figure 5.1 and explained in the next section. For a pictorial representation of the predicted trajectories for each task of both the GP and the VARMAX model, see https://www.dropbox.com/sh/mlw3witpg6ouz08/AABSMmtPs74__NQV_K00eKOia?dl=0, where animations of the hand gesture (using LibHand) during the experiments and as predicted by the models can be found for comparison.
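In Matlab the marginalisation reduces to the sketch below, where X is an S × T matrix of per-subject, per-task scores (an assumption about how the results are stored):

% Grand Mean and Grand Variance (Equations 4.1-4.2) over subjects and tasks.
% Assumed given: X, an S x T matrix of per-subject, per-task scores.
[S, T]   = size(X);
grandMu  = mean(X(:));                       % Equation 4.1
grandVar = sum((X(:) - grandMu).^2) / S;     % Equation 4.2, as defined above
se       = grandVar / sqrt(S);               % standard error in the errorbars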


Figure 4.9: Average prediction performance over tasks for Subject S1 for τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Both models, under the assumption of linearity, perform better for bigger τ, as shown by higher ρ values and lower RMSE. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.10: Velocity predicted by the GP and the VARMAX model for subject S1 during the Ball grasp. The plots show the comparison between the ground truth of recorded data (a) and the predictions of the VARMAX model ((b), (c)) and the GP model ((d), (e)), for τ = 87 ms ((b), (d)) and τ = 175 ms ((c), (e)). Each colour represents a different joint, as defined in the legend. Notice how the predictions are closer to the actual data for τ = 175 ms and become even better for higher values of τ (Figure 4.11); this observation is consistent across subjects and tasks, as better shown in Figure 4.19. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.11: Velocity predicted by the GP and the VARMAX model for subject S1 during the Ball grasp. The plots show the comparison between the ground truth of recorded data (a) and the predictions of the VARMAX model ((b), (c)) and the GP model ((d), (e)), for τ = 350 ms ((b), (d)) and τ = 500 ms ((c), (e)). Each colour represents a different joint, as defined in the legend. Notice how the predictions are closer to the actual data for τ = 500 ms; this observation is consistent across subjects and tasks, as better shown in Figure 4.19. Refer to Table 3.2 for an explanation of the joint abbreviations.

Figure 4.11: Velocity predicted by the GP and the V ARM AX model for subject S1 during Ball grasp. The plots show the comparison between a ground truth of recorded data (a) and the predictions of the GP model ((d), (e)) and the V ARM AX model for τ = 350ms ((a), (d)) and τ = 500ms ((b), (c)). Each colour represents a different joint, as defined in the legends (refer to Table 3.2 for an explanation of the joints abbreviations). Notice how the predictions are closer to the actual data for τ = 500ms; this consideration is consistent across subjects and tasks, as better defined in Figure 4.19. Refer to Table 3.2 for an explanation of the joints abbreviations.

42

Figure 4.12: Comparison of the average performance across subjects for the Ball-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.13: Comparison of the average performance across subjects for the Cylinder-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.14: Comparison of the average performance across subjects for the Book-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.15: Comparison of the average performance across subjects for the Handle-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.16: Comparison of the average performance across subjects for the Pen-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.17: Comparison of the average performance across subjects for the Mouse-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.18: Comparison of the average performance across subjects for the Telephone-grasp task of the VARMAX and GP models. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Refer to Table 3.2 for an explanation of the joint abbreviations.


Table 4.2: Grand Average Performance (ρ ± SE)

GP
         τ = 78 ms         τ = 174 ms        τ = 350 ms        τ = 500 ms
TCMC     0.286 ± 0.067     0.321 ± 0.060     0.374 ± 0.045     0.435 ± 0.057
TMCP     0.228 ± 0.056     0.236 ± 0.061     0.329 ± 0.088     0.343 ± 0.094
TIP      0.300 ± 0.072     0.295 ± 0.074     0.387 ± 0.116     0.452 ± 0.134
IMCP     0.336 ± 0.045     0.372 ± 0.044     0.516 ± 0.048     0.540 ± 0.061
IIP      0.177 ± 0.050     0.385 ± 0.080     0.475 ± 0.114     0.501 ± 0.131
MMCP     0.299 ± 0.054     0.413 ± 0.066     0.516 ± 0.083     0.540 ± 0.097
MIP      0.243 ± 0.084     0.310 ± 0.095     0.460 ± 0.130     0.524 ± 0.150
RMCP     0.288 ± 0.046     0.410 ± 0.071     0.537 ± 0.099     0.569 ± 0.114
RIP      0 ± 0             0 ± 0             0 ± 0             0 ± 0
LMCP     0.316 ± 0.037     0.402 ± 0.042     0.505 ± 0.043     0.521 ± 0.041
LIP      0.177 ± 0.066     0.391 ± 0.102     0.492 ± 0.140     0.515 ± 0.050

VARMAX
         τ = 78 ms         τ = 174 ms        τ = 350 ms        τ = 500 ms
TCMC     0.313 ± 0.0741    0.326 ± 0.0525    0.378 ± 0.054     0.413 ± 0.061
TMCP     0.308 ± 0.074     0.307 ± 0.079     0.307 ± 0.083     0.321 ± 0.086
TIP      0.290 ± 0.065     0.386 ± 0.070     0.400 ± 0.115     0.464 ± 0.137
IMCP     0.352 ± 0.043     0.364 ± 0.044     0.496 ± 0.046     0.545 ± 0.059
IIP      0.297 ± 0.062     0.420 ± 0.081     0.467 ± 0.116     0.510 ± 0.128
MMCP     0.362 ± 0.058     0.353 ± 0.069     0.506 ± 0.086     0.550 ± 0.100
MIP      0.276 ± 0.077     0.415 ± 0.097     0.484 ± 0.135     0.519 ± 0.139
RMCP     0.331 ± 0.048     0.425 ± 0.074     0.530 ± 0.098     0.562 ± 0.112
RIP      0 ± 0             0 ± 0             0 ± 0             0 ± 0
LMCP     0.354 ± 0.045     0.413 ± 0.041     0.506 ± 0.043     0.523 ± 0.044
LIP      0.336 ± 0.096     0.368 ± 0.106     0.497 ± 0.143     0.513 ± 0.143


Figure 4.19: Grand average of the performance of the VARMAX and GP models across subjects and tasks. Error bars show the performance, in terms of ρ ((a), (c), (e), (g)) and RMSE ((b), (d), (f), (h)), of the VARMAX (dark blue) and GP (light blue) models for increasing steps-ahead, τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Dashed lines show the mean over joints; colours are consistent with those of the bars. Error bars show the standard error. Notice the increasing goodness of the predictions as the step-size τ increases. Refer to Table 3.2 for an explanation of the joint abbreviations.


Figure 4.20: Grand average RMSE and Correlation Coefficient for the GP and the VARMAX model as a function of the time-lag τ. The performance of the GP ((a), (c)) and the VARMAX ((b), (d)) is shown in terms of Correlation Coefficient ρ ((a), (b)) and RMSE ((c), (d)) for τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. Notice how the RMSE decreases and ρ increases for both models as τ increases. An explanation of this behaviour is given in Figure 5.1.


Chapter 5

Discussion

We now discuss the results presented in the previous section, giving a qualitative interpretation of their meaning, summarising the achievements of our work and suggesting future research that could improve the proposed framework.

5.1 Summary of Thesis Achievements and Future Work

5.1.1 Missing Finger Predictability

We have used GPs to predict the velocity of "missing fingers" from the velocity of the 3 most correlated joints. The predicted trajectories showed very high accuracy for highly correlated joints (see [30] for a study of digit independence), such as the M-R-L group. Table 4.1 summarises the accuracy of the predictions for the RIP joint, on which we focused because it was the missing sensor on the used CyberGlove, showing an average correlation coefficient ρ = 0.824 and an RMSE = 0.235 °/s. These very accurate and consistent predictions could be used as a control command for a robotic finger prosthetic replacement. They are a small step towards moving beyond the approaches currently used in the literature, which treat hand movements as discrete variables using classification, and towards using regression to achieve a more natural and intuitive effect. The development of an online prediction system would be the ultimate test for such a framework.

5.1.2 State-Ahead Predictions

We investigated the use of GPs for state-ahead predictions of joint velocities from muscle activity and compared their performance with that of a linear VARMAX model. This was done for multiple steps-ahead, to investigate how far ahead we can predict the joint velocity and with what accuracy. The results surprisingly showed an increasing accuracy for increasing time-lags, which we did not expect, since natural human movements are very smooth and the assumption of linearity

Figure 5.1: Value of the weights learned by the VARMAX model for varying time-lag τ. The value of the AR coefficient φ (as defined in Equation 3.16) is shown in blue, with its scale on the left-side y axis; notice how its value decreases for large τ, in accordance with the autocorrelation analysis shown in Figure 3.8. The value of the b coefficients (as defined in Equation 3.16), showing the contribution of the exogenous inputs, on the other hand, increases for all exogenous inputs with increasing τ, showing that the predictions become more and more dependent on the muscle activity as we try to predict further in time. The scale of the exogenous inputs is shown on the right-side y axis. The coefficients b have units of °/(s × q × MVC), where q ∈ ℚ are rational multiples of the Maximum Voluntary Contraction.

should hold for small rather than large time-lags τ. The VARMAX and the GP model did not show significant differences for a fixed time-lag; the predictions of both were consistent and stable across tasks and subjects. There are two main considerations about the approach used for training the GP that I would like to emphasise. First of all, the training of the model for larger values of τ was done by downsampling the data with a ratio r:

r = τ × fs

where fs is the sampling frequency and τ ∈ {87 ms, 175 ms, 350 ms, 500 ms}. The downsampling reduces the number of points on which we train the model, to the point where the sparse approximation method FITC (refer to B.1) is not required and the predictive distribution can be found using exact inference (Equation 3.10), which gives more accurate predictions than FITC. While this could explain the better predictions of the GP for larger τ, it does not justify the performance of the VARMAX model. We thus analysed the weights of the optimised VARMAX model to seek an explanation for this behaviour. Figure 5.1 shows the learned values of the coefficients of the exogenous inputs (b in Equation 3.16) and of the AR term (φ_i in Equation 3.16) for the investigated time-lags τ. Notice how, for small time-lags, the model relies almost exclusively on the previous state, whilst giving very little importance to the muscle activity; the opposite is true for bigger values of τ, where the muscle activity gives a bigger contribution to the prediction, since the velocity at the previous state is no longer correlated

The second consideration about the developed framework is that GPs assume that the input space is noiseless. While we have appropriately pre-processed and averaged the EMG and MMG data, the noiseless assumption is not satisfied for the iterative use of the joint velocity. At each step in the prediction we sample from the predictive distribution of the previous iteration and use this sample as a noiseless input for the next iteration; by doing so we do not take into account the propagation of uncertainty in time (a minimal sketch of this scheme is given at the end of this subsection). This approach is simpler than a principled analytical solution, but it is not rigorous. An analytical solution for this problem has been proposed in [51] and used for step-ahead forecasting in [52]. On the other hand, a more complex analytical solution has its drawbacks, such as computational cost: when using GPs for online control of robotic platforms, one also has to deal with the computational requirements of the regressor, since the computation of the predictive distribution requires the inversion of an n × n matrix, which scales as O(n³). This is also why the good results that we obtained for larger time-lags are encouraging. At the same time, downsampling below the Nyquist frequency 2 × fmax would introduce aliasing in the resulting signal, so one would have to constrain the velocity of hand movements to find a compromise between good prediction and aliasing avoidance.

Finally, I think that other regression methods should be investigated, especially Support Vector Machines, which have a lot in common with GPs (as suggested in [36]) but outperform them in computational speed and memory requirements. Having shown that regression is a valuable candidate for a Human-Machine Interface, I think that the next step should be its application to a robotic platform, such as the pneumatic robotic arm presented in [53] and developed in FaisalLab. The anthropomorphic nonlinearity of its pneumatic actuators would make it a perfect candidate to test the potential of the developed framework.
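The iterative scheme discussed above can be sketched on toy data as follows; the dimensions, kernel and synthetic training set are illustrative assumptions. The sketch makes the approximation explicit: at every step the sample drawn from the predictive distribution re-enters the input space as if it were noise-free.

```python
# Sketch of iterative state-ahead prediction: sample from the predictive
# distribution, then feed the sample back as a noiseless input, discarding
# the propagated uncertainty. Toy data and kernel are assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)

# Toy training set: input = [previous velocity, 5 muscle channels].
Z = rng.standard_normal((300, 6))
y = 0.8 * Z[:, 0] + 0.1 * Z[:, 1:].sum(axis=1) + 0.05 * rng.standard_normal(300)
gp = GaussianProcessRegressor(1.0 * RBF(np.ones(6)) + WhiteKernel(0.01)).fit(Z, y)

def rollout(gp, y0, muscle_seq):
    """Roll the model forward over a sequence of muscle observations."""
    v, traj = y0, []
    for x_t in muscle_seq:
        z = np.concatenate(([v], x_t))[None, :]
        mean, std = gp.predict(z, return_std=True)
        v = rng.normal(mean[0], std[0])   # sample; its uncertainty is then ignored
        traj.append(v)
    return np.array(traj)

trajectory = rollout(gp, 0.0, rng.standard_normal((10, 5)))
```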


5.2 Conclusion

We have developed a stochastic and principled framework for fine gesture control of robotic hands, avoiding the constraint introduced by classic classification approaches and treating hand gestures as continuous variables. By comparing a principled Bayesian approach using GPs with a linear model typically used for financial forecasting (VARMAX), we obtained good correlations between the predictions and the actual data. We obtained very good generalisation across tasks and subjects, suggesting that further study and application of this framework to robotic prosthetic control could be successful. We believe that this is the right path to follow to significantly improve the user's experience in the control of an upper-limb neuroprosthetic device. Such a device could offer a continuous, natural and intuitive interface and avoid the frustration that leads to rejection of currently used systems.


Appendix A

A.1 K-means clustering

K-means is a non-probabilistic technique for unsupervised clustering of data, widely used in the machine learning literature, especially for image segmentation and compression. Suppose we want to identify K groups, or clusters, of data points in a data set {x1, ..., xN} consisting of N observations in a D-dimensional Euclidean space. Intuitively, a cluster can be thought of as comprising a group of data points whose inter-point distances are small compared to the distances to points outside the cluster; we can formalise this notion by introducing vectors µk, with k = 1, ..., K, that represent the cluster centres. Clustering the data points can then be achieved by iteratively finding the vectors µk and assigning each point to a cluster using a distance-based criterion, typically the minimisation of the sum of squared distances of each data point to its closest vector µk. To formalise this concept we introduce a binary indicator variable rnk ∈ {0, 1}, with k = 1, ..., K, that indicates which of the K clusters point n belongs to (this is a 'hard' assignment version of the mixing coefficients πnk typical of Mixture of Gaussians models). We can then define an objective function, or distortion function, in the following way:

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2 \qquad \text{(A.1)}$$

which in this case represents the sum of squared distances of each data point to the centroid of the cluster it is currently assigned to. The goal of finding the optimal clustering can then be achieved by finding the {µk} and the variables rnk that minimise J. This can be done through an iterative procedure consisting of two successive steps, corresponding to optimisation with respect to rnk and µk. First of all we choose some initial values for µk; in the first step we minimise J with respect to rnk while keeping µk fixed, thus assigning each point to the 'closest' cluster centre. In the second step we minimise J with respect to µk while keeping rnk fixed, thus moving the cluster centres according to the new assignments. This procedure is repeated until convergence.
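A minimal NumPy sketch of this two-step procedure, on synthetic 2-D data, is:

```python
# Minimal K-means sketch: alternately minimise J of Equation A.1 with
# respect to the assignments r_nk and the centres mu_k.
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centres
    for _ in range(n_iter):
        # Step 1: optimise r_nk with mu fixed — assign to the closest centre.
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 2: optimise mu with r_nk fixed — move centres to cluster means.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                           else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):                     # converged
            break
        mu = new_mu
    return mu, labels

X = np.vstack([np.random.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
centres, labels = kmeans(X, K=3)
```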



Appendix B

B.1 Sparse Approximation Methods


Figure B.1: Graphical model of the relation between the pseudo-inputs u, the training latent function values f = [f1, ..., fn]ᵀ and the test function value f∗. A fully connected graph, like the one above, corresponds to the case where no approximation is made to the full joint Gaussian process distribution between these variables. The inducing variables u are superfluous in this case, since all latent function values can communicate with all others. The approximation defined by Equation B.1 corresponds to ignoring the blue thick line between the training and test cases; notice how, by doing so, the latent variables u become the only means of communication between the two. FITC introduces a further approximation, that of full independence of the latent function values f; graphically, this corresponds to removing all the thick black and blue lines connecting the nodes f1, ..., fn, f∗. The relation between the latent functions f and the observations remains as described in Figure 3.6.

The main limitation of GPs is that their computational demand and memory requirements grow as O(n³) and O(n²) respectively, where n is the number of training cases, making their implementation feasible only for small data sets. To overcome these computational limitations, numerous authors have recently suggested a wealth of sparse approximations, a unifying view of which is given by Quiñonero-Candela and Rasmussen in [54]. Common to all these methods is the use of a set of latent variables u = [u1, ..., um] called inducing variables or pseudo-inputs. Being latent variables, the pseudo-inputs do not appear in the predictive distribution, but they do significantly affect the goodness of the fit. The main assumption made by these methods regards the relation between the training latent function values f and the test cases f∗; assuming conditional independence between training and test cases given u, the joint distribution between the two becomes:

$$p(f, f^*) = \int p(f, f^* \mid u)\, p(u)\, du \approx \int p(f \mid u)\, p(f^* \mid u)\, p(u)\, du. \qquad \text{(B.1)}$$

Under this approximation the computational demand of the model scales as O(nm²). This equation also justifies the name inducing inputs: the only channel of communication between the now independent training and test cases is through the latent variables u, as intuitively represented in the graphical model in Figure B.1. We will only be concerned with a formulation of sparse approximation introduced in 2006 by Snelson and Ghahramani [55], and reinterpreted in [54], known as Fully Independent Training Conditional (FITC). The position of the pseudo-inputs significantly affects the quality of the fit, and multiple methods exist in the literature for learning these latent variables in an optimal way [56]. A common approach, and the one used in this study, is to infer the pseudo-inputs u by minimising the negative log marginal likelihood, i.e. using the same procedure described in Subsection 3.3.1 to infer the hyperparameters Θ. A K-means clustering technique was used to initialise their positions in the input space (details in Appendix A.1).
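For illustration, the FITC predictive mean can be sketched in a few lines of NumPy following the equations in [54]. The SE kernel, the fixed hyperparameters and the grid-placed inducing inputs below are simplifying assumptions; in this study the inducing inputs were initialised with K-means and optimised on the marginal likelihood together with Θ.

```python
# Sketch of the FITC predictive mean: cost O(n m^2) instead of O(n^3).
import numpy as np

def se_kernel(A, B, ell=1.0, sf=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

def fitc_mean(X, y, Xu, Xs, sn=0.1):
    Kuu = se_kernel(Xu, Xu) + 1e-8 * np.eye(len(Xu))
    Kuf = se_kernel(Xu, X)
    Ksu = se_kernel(Xs, Xu)
    # Only the diagonal of Q_ff = K_fu Kuu^-1 K_uf is needed by FITC.
    L = np.linalg.cholesky(Kuu)
    V = np.linalg.solve(L, Kuf)                         # m x n
    q_diag = (V ** 2).sum(axis=0)
    lam = se_kernel(X, X).diagonal() - q_diag + sn**2   # diag(Kff - Qff) + noise
    # Predictive mean: K*u (Kuu + Kuf Lam^-1 Kfu)^-1 Kuf Lam^-1 y
    Sigma_inv = Kuu + (Kuf / lam) @ Kuf.T
    alpha = np.linalg.solve(Sigma_inv, (Kuf / lam) @ y)
    return Ksu @ alpha

X = np.linspace(0, 10, 2000)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.randn(2000)
Xu = np.linspace(0, 10, 20)[:, None]    # inducing inputs (grid-placed here)
Xs = np.linspace(0, 10, 5)[:, None]     # test inputs
mean_star = fitc_mean(X, y, Xu, Xs)
```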


Figure B.2: Samples drawn from a 2-D GP with zero mean and SE kernel with varying hyperparameter ℓ, as defined in Equation 3.9. The top figure was generated by sampling from a noise-free (σn = 0) GP with a 2-D input space and an SE kernel with hyperparameters Θ = [ℓ1 = 0.3; ℓ2 = 0.3; σs = 0.5], i.e. an isotropic bell-shaped function. The resulting function is smooth and isotropic. By changing the length-scales ℓ1 and ℓ2 to 0.1 and 0.3 respectively we obtain the samples shown in the bottom-left plot, where the samples vary more rapidly in the x1 direction; in this sense x1 has “more importance” in determining the target y, as it has a shorter length-scale than x2. By fitting Θ as described in Subsection 3.3.1 one can infer these relations; this feature makes GPs very attractive when it comes to interpreting the model. The bottom-right plot was generated by setting Θ = [ℓ1 = 0.1; ℓ2 = 0.1; σs = 0.5]; notice how the process becomes rougher in both input dimensions. Surface plots were generated by (improperly) interpolating between 1000 × 1000 points drawn from a GP.
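For reference, a sample like those in Figure B.2 can be drawn as follows, at a much coarser grid than the figure's 1000 × 1000, since an exact draw requires the Cholesky factor of the full covariance matrix; the grid size and length-scales here are illustrative.

```python
# Sketch: one draw from a zero-mean GP prior with an ARD SE kernel on a
# 2-D grid. Coarse 30 x 30 grid; length-scales match the bottom-left panel.
import numpy as np

def se_ard(A, B, ells, sf=0.5):
    d2 = (((A[:, None, :] - B[None, :, :]) / ells) ** 2).sum(-1)
    return sf**2 * np.exp(-0.5 * d2)

g = np.linspace(0, 1, 30)
X = np.stack(np.meshgrid(g, g), axis=-1).reshape(-1, 2)   # 2-D input grid

K = se_ard(X, X, ells=np.array([0.1, 0.3])) + 1e-8 * np.eye(len(X))
sample = np.linalg.cholesky(K) @ np.random.randn(len(X))  # prior draw
surface = sample.reshape(30, 30)    # varies faster along the ell = 0.1 axis
```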



References

[1] A. A. Faisal, L. P. Selen, and D. M. Wolpert, “Noise in the nervous system,” Nature Reviews Neuroscience, vol. 9, no. 4, pp. 292–303, 2008. (cited in p. 2)

[2] E. Jaynes, How does the brain do plausible reasoning? Springer, 1988. (cited in p. 2)

[3] C. E. Shannon, “A mathematical theory of communication,” ACM SIGMOBILE Mobile Computing and Communications Review, vol. 5, no. 1, pp. 3–55, 2001. (cited in p. 2)

[4] R. Cox, “Probability, frequency and reasonable expectation,” in Readings in Uncertain Reasoning, pp. 353–365, Morgan Kaufmann Publishers Inc., 1990. (cited in p. 3)

[5] M. Bayes and M. Price, “An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, F.R.S., communicated by Mr. Price, in a letter to John Canton, A.M.F.R.S.,” Philosophical Transactions (1683–1775), pp. 370–418, 1763. (cited in p. 3)

[6] K. P. Körding and D. M. Wolpert, “Bayesian integration in sensorimotor learning,” Nature, vol. 427, no. 6971, pp. 244–247, 2004. (cited in p. 3)

[7] N. Takatoku and M. Fujiwara, “Muscle activity patterns during quick increase of movement amplitude in rapid elbow extensions,” Journal of Electromyography and Kinesiology, vol. 20, no. 2, pp. 290–297, 2010. (cited in p. 5)

[8] W. G. Darling and K. J. Cole, “Muscle activation patterns and kinetics of human index finger movements,” Journal of Neurophysiology, vol. 63, no. 5, pp. 1098–1108, 1990. (cited in p. 5)

[9] F. Tenore, A. Ramos, A. Fahmy, S. Acharya, R. Etienne-Cummings, and N. V. Thakor, “Towards the control of individual fingers of a prosthetic hand using surface EMG signals,” in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, pp. 6145–6148, IEEE, 2007. (cited in p. 5)

[10] T. Brochier, R. L. Spinks, M. A. Umilta, and R. N. Lemon, “Patterns of muscle activity underlying object-specific grasp by the macaque monkey,” Journal of Neurophysiology, vol. 92, no. 3, pp. 1770–1782, 2004. (cited in p. 5)

[11] A. L. Hodgkin, “The ionic basis of electrical activity in nerve and muscle,” Biological Reviews, vol. 26, no. 4, pp. 339–409, 1951. (cited in p. 5)

[12] D. Farina, L. Mesin, S. Martina, and R. Merletti, “A surface EMG generation model with multilayer cylindrical description of the volume conductor,” Biomedical Engineering, IEEE Transactions on, vol. 51, no. 3, pp. 415–426, 2004. (cited in p. 5)

[13] B. Crawford, K. Miller, P. Shenoy, and R. Rao, “Real-time classification of electromyographic signals for robotic control,” in AAAI, pp. 523–528, 2005. (cited in p. 5)

[14] D. T. Barry and N. M. Cole, “Muscle sounds are emitted at the resonant frequencies of skeletal muscle,” Biomedical Engineering, IEEE Transactions on, vol. 37, no. 5, pp. 525–531, 1990. (cited in p. 5)

[15] D. Barry et al., “Vibrations and sounds from evoked muscle twitches,” Electromyography and Clinical Neurophysiology, vol. 32, no. 1-2, p. 35, 1992. (cited in p. 5)

[16] T. Hemmerling, F. Donati, P. Beaulieu, and D. Babin, “Phonomyography of the corrugator supercilii muscle: signal characteristics, best recording site and comparison with acceleromyography,” British Journal of Anaesthesia, vol. 88, no. 3, pp. 389–393, 2002. (cited in p. 5)

[17] K. Akataki, K. Mita, and M. Watakabe, “Electromyographic and mechanomyographic estimation of motor unit activation strategy in voluntary force production,” Electromyography and Clinical Neurophysiology, vol. 44, no. 8, pp. 489–496, 2004. (cited in p. 5)

[18] T. W. Beck, T. J. Housh, G. O. Johnson, J. T. Cramer, J. P. Weir, J. W. Coburn, and M. H. Malek, “Does the frequency content of the surface mechanomyographic signal reflect motor unit firing rates? A brief review,” Journal of Electromyography and Kinesiology, vol. 17, no. 1, pp. 1–13, 2007. (cited in p. 5)

[19] N. Alves and T. Chau, “Uncovering patterns of forearm muscle activity using multi-channel mechanomyography,” Journal of Electromyography and Kinesiology, vol. 20, no. 5, pp. 777–786, 2010. (cited in p. 5)

[20] S. Fara, C. S. Vikram, C. Gavriel, and A. A. Faisal, “Robust, ultra low-cost MMG system with brain-machine-interface applications,” in Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on, pp. 723–726, IEEE, 2013. (cited in p. 5), (cited in p. 14)

[21] G. Johansson, “Visual perception of biological motion and a model for its analysis,” Perception & Psychophysics, vol. 14, no. 2, pp. 201–211, 1973. (cited in p. 6)

[22] D. Fitzgerald, J. Foody, D. Kelly, T. Ward, C. Markham, J. McDonald, and B. Caulfield, “Development of a wearable motion capture suit and virtual reality biofeedback system for the instruction and analysis of sports rehabilitation exercises,” in Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE, pp. 4870–4874, IEEE, 2007. (cited in p. 6)

[23] M. D. McPartland, D. E. Krebs, and C. Wall, “Quantifying ataxia: ideal trajectory analysis — a technical note,” Journal of Rehabilitation Research and Development, vol. 37, no. 4, pp. 445–454, 2000. (cited in p. 6)

[24] C. Bregler, “Motion capture technology for entertainment [in the spotlight],” Signal Processing Magazine, IEEE, vol. 24, no. 6, pp. 160–158, 2007. (cited in p. 6)

[25] Y. Fujimori, Y. Ohmura, T. Harada, and Y. Kuniyoshi, “Wearable motion capture suit with full-body tactile sensors,” in Robotics and Automation, 2009. ICRA '09. IEEE International Conference on, pp. 3186–3193, IEEE, 2009. (cited in p. 6)

[26] G. D. Kessler, L. F. Hodges, and N. Walker, “Evaluation of the CyberGlove as a whole-hand input device,” ACM Transactions on Computer-Human Interaction (TOCHI), vol. 2, no. 4, pp. 263–283, 1995. (cited in p. 6)

[27] J. Belic, Classification and Reconstruction of Manipulative Hand Movements Using the CyberGlove. PhD thesis, Imperial College London, July 2010. (cited in p. 6), (cited in p. 19), (cited in p. 31), (cited in p. 33)

[28] M. H. Schieber, “Individuated finger movements of rhesus monkeys: a means of quantifying the independence of the digits,” Journal of Neurophysiology, vol. 65, no. 6, pp. 1381–1391, 1991. (cited in p. 7)

[29] M. Santello, M. Flanders, and J. F. Soechting, “Postural hand synergies for tool use,” The Journal of Neuroscience, vol. 18, no. 23, pp. 10105–10115, 1998. (cited in p. 7)

[30] J. N. Ingram, K. P. Körding, I. S. Howard, and D. M. Wolpert, “The statistics of natural hand movements,” Experimental Brain Research, vol. 188, no. 2, pp. 223–236, 2008. (cited in p. 7), (cited in p. 26), (cited in p. 53)

[31] C. Cipriani, F. Zaccone, S. Micera, and M. C. Carrozza, “On the shared control of an EMG-controlled prosthetic hand: analysis of user–prosthesis interaction,” Robotics, IEEE Transactions on, vol. 24, no. 1, pp. 170–184, 2008. (cited in p. 8)

[32] N. Hogan, “A review of the methods of processing EMG for use as a proportional control signal,” Biomedical Engineering, vol. 11, no. 3, pp. 81–86, 1976. (cited in p. 8)

[33] P. Whittle, Hypothesis Testing in Time Series Analysis, vol. 4. Almqvist & Wiksells, 1951. (cited in p. 8)

[34] H. Lütkepohl, “Forecasting with VARMA models,” Handbook of Economic Forecasting, vol. 1, pp. 287–325, 2006. (cited in p. 8)

[35] R. M. Neal, Bayesian Learning for Neural Networks. PhD thesis, University of Toronto, 1995. (cited in p. 9)

[36] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006. (cited in p. 9), (cited in p. 13), (cited in p. 20), (cited in p. 24), (cited in p. 55)

[37] C. E. Rasmussen, Evaluation of Gaussian Processes and Other Methods for Making Inference. PhD thesis, University of Toronto, 1996. (cited in p. 9)

[38] O. Stegle, S. V. Fallert, D. J. MacKay, and S. Brage, “Gaussian process robust regression for noisy heart rate data,” Biomedical Engineering, IEEE Transactions on, vol. 55, no. 9, pp. 2143–2151, 2008. (cited in p. 12)

[39] S. Fara and C. S. Vikram, Robust, Ultra Low-Cost MMG System with Brain-Machine-Interface Applications. PhD thesis, Imperial College London, 2013. (cited in p. 14)

[40] N. Kamakura, M. Matsuo, H. Ishii, F. Mitsuboshi, and Y. Miura, “Patterns of static prehension in normal hands,” The American Journal of Occupational Therapy, vol. 34, no. 7, pp. 437–445, 1980. (cited in p. 16)

[41] J. M. Elliott and K. Connolly, “A classification of manipulative hand movements,” Developmental Medicine & Child Neurology, vol. 26, no. 3, pp. 283–296, 1984. (cited in p. 16)

[42] J. R. Napier, “The prehensile movements of the human hand,” Journal of Bone and Joint Surgery, vol. 38, no. 4, pp. 902–913, 1956. (cited in p. 16)

[43] R. L. Klatzky, B. McCloskey, S. Doherty, J. Pellegrino, and T. Smith, “Knowledge about hand shaping and knowledge about objects,” Journal of Motor Behavior, vol. 19, no. 2, pp. 187–213, 1987. (cited in p. 16)

[44] Y. Hong-liu, Z. Sheng-nan, and H. Jia-hua, “MMG signal and its applications in prosthesis control,” in Proceedings of the 4th International Convention on Rehabilitation Engineering & Assistive Technology, p. 58, Singapore Therapeutic, Assistive & Rehabilitative Technologies (START) Centre, 2010. (cited in p. 17)

[45] S. Standring, H. Ellis, J. Healy, D. Johnson, A. Williams, and P. Collins, “Gray's anatomy: the anatomical basis of clinical practice,” American Journal of Neuroradiology, vol. 26, no. 10, p. 2703, 2005. (cited in p. 17)

[46] M. Šarić, “LibHand: a library for hand articulation,” 2011. Version 0.9. (cited in p. 18)

[47] B. Matérn et al., “Spatial variation. Stochastic models and their application to some problems in forest surveys and other sampling investigations,” Meddelanden från Statens Skogsforskningsinstitut, vol. 49, no. 5, 1960. (cited in p. 24)

[48] J. J. Belić and A. A. Faisal, “The structured variability of finger coordination in daily tasks,” BMC Neuroscience, vol. 12, no. Suppl 1, p. P102, 2011. (cited in p. 26)

[49] E. Todorov and Z. Ghahramani, “Analysis of the synergies underlying complex hand manipulation,” in Engineering in Medicine and Biology Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE, vol. 2, pp. 4637–4640, IEEE, 2004. (cited in p. 26)

[50] E. Todorov and M. I. Jordan, “Smoothness maximization along a predefined path accurately predicts the speed profiles of complex arm movements,” Journal of Neurophysiology, vol. 80, no. 2, pp. 696–714, 1998. (cited in p. 26)

[51] P. Dellaportas and D. A. Stephens, “Bayesian analysis of errors-in-variables regression models,” Biometrics, pp. 1085–1095, 1995. (cited in p. 30), (cited in p. 55)

[52] A. Girard, C. E. Rasmussen, J. Quiñonero-Candela, and R. Murray-Smith, “Gaussian process priors with uncertain inputs — application to multiple-step ahead time series forecasting,” 2003. (cited in p. 55)

[53] D. Büchler, Reinforcement Learning for Artificial Muscle Limbs. PhD thesis, Imperial College London, 2013. (cited in p. 55)

[54] J. Quiñonero-Candela and C. E. Rasmussen, “A unifying view of sparse approximate Gaussian process regression,” The Journal of Machine Learning Research, vol. 6, pp. 1939–1959, 2005. (cited in p. 59), (cited in p. 60)

[55] E. Snelson and Z. Ghahramani, “Sparse Gaussian processes using pseudo-inputs,” 2006. (cited in p. 60)

[56] M. Titsias, “Variational learning of inducing variables in sparse Gaussian processes,” 2009. (cited in p. 60)
