Hand-Gesture Recognition Using Two-Antenna Doppler Radar with Deep Convolutional Neural Networks

Sruthy Skaria, Student Member, IEEE, Akram Al-Hourani, Senior Member, IEEE, Margaret Lech, Senior Member, IEEE, Robin J. Evans, Life Fellow, IEEE.

Abstract—Low-cost consumer radar chips, combined with recent advances in machine learning, have opened up a range of new possibilities in smart sensing. In this paper, we use a miniature radar to capture Doppler signatures of 14 hand-gestures to train a deep convolutional neural network (DCNN) to classify the captured gestures. We utilize the two receiving-antennas of a continuous-wave Doppler radar capable of producing the in-phase and quadrature components of the beat signals. We map these two beat signals into the three input channels of a DCNN as two spectrograms and an angle of arrival (AoA) matrix. Classification results of the trained DCNN show a gesture classification accuracy exceeding 92% and very low confusion between gestures. This is more than a 6.5% improvement over the single-channel Doppler methods reported in the literature.

I. INTRODUCTION

Device control based on hand-gestures is rapidly becoming an important method of human-machine interfacing, with wide applications in various fields of consumer electronics such as wearable devices, mobile phones, vehicle control, and medical devices. Hand-gesture recognition eliminates the need for physical contact with the device. It simplifies access and enhances convenience, for example, by reducing the risk of viral infection or bacterial contamination in clinical applications. One of the most common gesture acquisition approaches is based on image recognition and tracking by a camera sensor [1]–[3]. In this method, the target object (hand, fingers, eye, etc.) is captured and decoded into representative features using an image/video processing unit. The resulting feature parameters are fed into a machine learning unit which is trained to identify the unique set of signatures of each gesture and classify it accordingly. For these methods to work properly, the classifier usually requires high-quality images, which can be a drawback, especially when image acquisition takes place in a noisy background environment [4]. On the other hand, when using miniature radars for gesture sensing, the negative effects of ambient environment noise can be minimized while maintaining high gesture recognition accuracy [5], with relatively lower processing cost than image/video-based methods. A miniature radar uses electromagnetic waves in the microwave and millimeter spectrum bands, which are unaffected by the visible ambient light in the environment. Furthermore, a radar is capable of instantaneously capturing the rich Doppler information associated with the movements of hands and fingers, whereas in the case of camera images the task of velocity estimation requires a significant processing effort and produces low-accuracy results [6].

Manuscript received XX-April-2018, revised XX-XXXX-2018. S. Skaria, A. Al-Hourani and M. Lech are with the School of Engineering, RMIT University, Melbourne, Australia. E-mail: {sruthy.skaria, akram.hourani, margaret.lech}@rmit.edu.au. R. J. Evans is with the Department of Electrical and Electronic Engineering at The University of Melbourne, Melbourne, Australia. E-mail: [email protected]

Low-cost Doppler radar devices are becoming widely available, thanks to recent advancements in RF microelectronic technology [7]. Doppler radars detect micro-Doppler signatures caused by electromagnetic signals reflected from moving non-rigid objects or body parts, such as hands, fingers, wrists, ankles, etc. In general, micro-Doppler signatures consist of the frequency components (or side-lobes) that appear around the main Doppler shift when a target exhibits vibration or rotation of its non-rigid parts on top of the main translational motion of the body [8]. Doppler radar is being increasingly studied for gesture recognition due to its high sensitivity to small movements and its excellent ability to distinguish non-stationary objects from a stationary background [6], [9]–[13]. In order to recognize and classify signatures of hand gestures, numerous techniques such as machine learning [6], [11], principal component analysis [14], [15], and differentiate and cross-multiply algorithms [16], [17] have been applied. Conventional supervised machine learning extracts and classifies gestures using pre-defined characteristic parameters (features) [18], [19]. However, the optimal features are in many cases unknown, and therefore the performance of the classifier varies significantly depending on the selected features.
On the other hand, deep learning algorithms which use multiple layers of filters, such as deep convolutional neural networks (DCNNs), do not require pre-defined features; rather, the network self-learns the features from an input signal during the training process [4]. Training and classification using a DCNN is a promising approach in gesture recognition problems as it eliminates the need for predetermining the set of features. To the best of the authors' knowledge, existing work on gesture recognition using DCNN methods uses a single receiving-antenna radar. However, when more than one receiving-antenna is used, the additional information results in significantly enhanced performance, as demonstrated in this paper. In this paper, we explore the use of radars with two receiving-antennas and analyze the resulting spectrogram sets


Fig. 1. The proposed method for mapping two receiving-antenna Doppler radar signals into a three input-channel DCNN.

to train a DCNN for gesture recognition, where the main advantage is the ability to extract the phase-difference between the two antennas to infer the angle of arrival (AoA). We define a pool of 14 different hand-gestures to train a DCNN with three input channels; two of these input channels are fed with spectrograms (one from each antenna), and the third input channel is fed with the AoA obtained from the phase-difference between the two receiving-antennas. The contributions of this paper are summarized as follows:
• Development of a novel framework for using Doppler radar with two receiving-antennas for hand-gesture recognition.
• Development of a new method to map the output from a two receiving-antenna Doppler radar system into a three input-channel DCNN.
• Development of a DCNN architecture to obtain a high-performance gesture recognition system.
For convenience, all symbols, notations and training parameters are listed in Table II.
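As a shape-level sketch of the proposed mapping (the arrays below are random stand-ins, not real radar data; the actual pre-processing that produces them is described in Section II), the three DCNN input channels are simply the two spectrograms and the AoA matrix stacked along a channel axis:

```python
import numpy as np

# Placeholder pre-processed arrays with the shapes produced in Section II:
# 64 retained frequency bins x 186 time bins (values here are random).
rng = np.random.default_rng(0)
spec_rx1 = rng.random((64, 186))                      # |X1|^2, first receiving-antenna
spec_rx2 = rng.random((64, 186))                      # |X2|^2, second receiving-antenna
aoa = rng.uniform(-np.pi / 2, np.pi / 2, (64, 186))   # AoA per time-frequency bin

# Stack the two spectrograms and the AoA matrix as the three DCNN input
# channels, cropping 64 x 186 down to the 60 x 180 input size of Section III.
x = np.stack([spec_rx1, spec_rx2, aoa], axis=-1)[:60, :180, :]
print(x.shape)   # (60, 180, 3)
```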

Fig. 2. A block diagram of a typical Doppler radar system with two receiving-antennas and I/Q outputs.

II. APPROACH AND METHODOLOGY

In order to achieve high classification accuracy from the DCNN, the signals collected by the Doppler radar need to be appropriately pre-processed. The reflected signal from a target is collected using two receiving-antennas, where a coherent mixer is utilized to obtain both the in-phase component (I) and the quadrature component (Q) of the beat signal (the beat signal is the result of mixing the transmitted and received signals). As a result, four time-domain output signals are generated. These signals include the two outputs from the first antenna, denoted by x1I and x1Q, and the two outputs from the second antenna, denoted by x2I and x2Q. These raw time-domain signals are converted to spectrograms using a short-time discrete Fourier transform (STDFT). By combining both time and frequency domain information, the resulting spectrograms provide a significantly more meaningful representation of the hand gestures than the raw time-domain signals. The spectrograms of the two beat signals, along with the matrix of AoA for each Doppler component, are fed into the three input channels of the DCNN. Fig. 1 shows a functional block-diagram of the proposed method, where a 3-layer DCNN is applied to extract and classify the features representing different hand gestures. In the following subsections, a detailed description of each functional block is provided, starting from the gesture signal collection and ending with the gesture classification output.

A. Collection of Radar Signals

Consider a typical Doppler radar transmitting a continuous-wave signal sourced from its local oscillator, having an analytical representation of e^{j2πfo t}. A set of L non-stationary targets moving with radial velocities {v1(t), v2(t), ..., vL(t)} will cause different frequency shifts to the incident signal due to the Doppler effect. These frequencies, as seen from the radar perspective, are given by {fd1(t), fd2(t), ..., fdL(t)}, where fdl(t) = (2fo/c) vl(t). The beat signal, resulting from mixing the transmitted and received signals, is given by,

x(t) = exp(−j2πfo t) × Σ_{l=1}^{L} exp(j2π[fo + fdl(t)]t) = Σ_{l=1}^{L} exp(j2πfdl(t)t),   (1)

where the first factor is the LO signal and the summation is the return signal.
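As a minimal numpy sketch of the beat signal in (1), consider two illustrative targets with constant radial velocities (the velocity values are examples, not data from the experiment); mixing the returns with the LO leaves only the complex Doppler exponentials, whose spectrum peaks at the expected fd = 2 fo v / c shifts:

```python
import numpy as np

fo = 24e9    # carrier frequency of the 24 GHz board
c = 3e8      # speed of light, m/s
fs = 8000    # sampling rate used in the paper
t = np.arange(fs) / fs              # 1 s observation window

# Two example targets with constant radial velocities (illustrative values)
v = [0.5, -1.2]                     # m/s: approaching (+) and receding (-)
fd = [2 * fo * vl / c for vl in v]  # Doppler shifts, fd = 2*fo*v/c

# Eq. (1): conjugate LO times the sum of returns leaves only Doppler terms
lo = np.exp(-2j * np.pi * fo * t)
returns = sum(np.exp(2j * np.pi * (fo + f) * t) for f in fd)
x = lo * returns

# The complex (I/Q) beat signal resolves the sign of each Doppler shift
spectrum = np.abs(np.fft.fft(x))
freqs = np.fft.fftfreq(len(x), 1 / fs)
peaks = sorted(float(freqs[i]) for i in np.argsort(spectrum)[-2:])
print(peaks)   # two Doppler peaks, near -192 Hz and +80 Hz
```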

The signal x(t) contains the Doppler signatures corresponding to the unique movements of the targets. A typical Doppler radar layout is depicted in Fig. 2; this architecture makes it possible to obtain both the I and Q components, therefore we can differentiate positive and negative frequencies, corresponding to approaching and receding targets respectively. In our experiment we used the Infineon radar development board BGT24MTR12 operating at 24 GHz [20], shown in Fig. 3, which follows the typical architecture of Fig. 2. The development board consists of three integrated antenna arrays: one transmitting antenna-array and two receiving antenna-arrays. Each receiver set produces I and Q components of the beat signal. These two components are acquired from the radar at a sampling rate of fs = 8000 Hz. Based on the sampling rate, the resulting spectrum would have a maximum Doppler frequency component of ±fs/2 = ±4 kHz. This range is much larger than the observed Doppler frequencies for the performed gestures, which are within


Fig. 3. The 24 GHz radar development kit used for data collection in the experiment.

fl = ±1 kHz. Therefore, in order to reduce the computational load on the neural network, the frequency range is restricted to retain only the significant signal components within ±1 kHz.
We selected a set of 14 different hand-gestures, performed using the right hand, for this study. These gestures capture typical hand movements that can be easily performed by an operator. The selected gestures are as follows:
1) Opening then closing of fingers together (single blinking)
2) Double blinking
3) Moving open hand towards and away from the radar (single to-and-fro)
4) Double to-and-fro
5) Rotating hand in an anti-clockwise direction (single round)
6) Double round
7) Swiping using two fingers (single swiping)
8) Double swiping
9) Moving hand towards and away with thumbs-up (single thumbs-up)
10) Double thumbs-up
11) Fingers waving while the palm stays still (single waving)
12) Double waving
13) Sliding the hand from left to right (single sliding)
14) Double sliding
Photos of the gestures are depicted in Fig. 4, where each gesture is captured over a fixed period of Tg = 3 s.

B. Preparing the Signal for the DCNN

In order to capture the transient behavior of the Doppler components, the joint time-frequency information is extracted

Fig. 4. Pictures of the employed gestures for the experiment, along with the experiment setup in the top left of the figure.

from the received signals using the STDFT, given as [21],

X[k, m] = STDFT{x[n]}[k, m] = Σ_{n=−∞}^{∞} x[n] w[n − m] e^{−j(2π/N)kn},   (2)

where x[n] are the time samples of the collected radar signal, w[n − m] is a windowing function that restricts the signal to a short-time frame of a fixed duration of N samples, n is the index of the time-domain signal samples, k is the index of the frequency-domain component, and m is the window stride index. We utilize a general-purpose Hamming window function with soft edges in order to reduce the spectral leakage caused by the sharp edges of the windowing process. The spectrogram is estimated by striding the frame over the entire duration of the captured signal and calculating the squared amplitude spectrum |X[k, m]|² for each frame. The selected Hamming window width is Tw = 32 ms, which gives N = fs × Tw = 256 samples, chosen as a trade-off between resolution in the time and frequency domains. The overlap between subsequent window frames is selected as α = 50%, which is equal to 128 sample points. Accordingly, the number of time bins in the spectrogram frame is calculated as,

M = (fs × Tg)/(N/2) − 1 ≈ 186 time bins.   (3)

Each STDFT window produces N frequency points; however, as discussed earlier, only a portion of these frequencies is useful due to the limited extent of the Doppler signature of hand-gestures. Accordingly, the number of useful frequency points is N × fl/(fs/2) = 64, where fl = 1 kHz. Examples of the collected gesture spectrograms are shown in Fig. 5, where only the results of the first receiving-antenna are depicted.

Fig. 5. Sample spectrograms of the gestures collected from the first receiving-antenna.

The two receiving-antennas, which are physically spaced a small distance d = 1.7 cm apart in our setup, observe a phase-difference between the incident waves. This phase-difference is related to the difference in path-length between the two received waves, which in turn can provide an estimate of the AoA. Thus, the phase-difference can be viewed as an additional source of information that enhances the gesture classification accuracy. The phase-difference between each STDFT element is calculated as follows,

∆ϕ[k, m] = ∠X1[k, m] − ∠X2[k, m].   (4)

Fig. 6. Illustration of the phase-difference method for determining the angle of arrival using two closely spaced receivers.

Since phase information is confined to the circular range between 0 and 2π, a linear transformation of the phase-difference ∆ϕ would lead to severe errors at the edge points. For this reason, a classical non-linear transformation is applied to find the AoA of a narrow-band signal at each Doppler frequency component, given as,

θ[k, m] = sin^{−1}( λo ∆ϕ[k, m] / (2πd) ),   (5)

where λo is the wavelength of the carrier and d is the distance between the two receiving-antennas. This can be explained as follows: consider Fig. 6, where the reflected signal from the target is assumed to have a planar wavefront. If the received signal at the first antenna is given by x1(t) = cos(2πft), then the received signal at the second antenna will have a time delay τ = d′/c, corresponding to the path difference d′, and will be given by x2(t) = cos(2πf(t − τ)). The path difference is d′ = d sin θ, so the corresponding phase-difference is given by,

∆ϕ = 2πfτ = 2πf d′/c = 2πd sin θ / λo,   (6)

and accordingly the AoA is obtained as [22],

θ = sin^{−1}( λo ∆ϕ / (2πd) ).   (7)

The result in (5) is a 2D matrix of the same size as the spectrogram; each element of this matrix represents the relative AoA at a given time and frequency point. An example of the AoA matrix is depicted in Fig. 7 for the 14 hand-gestures. The signal amplitudes received by the two antennas are slightly different due to the difference in distances to the target, which gives additional information about the angle of the target. Therefore, the spectrograms of the beat signals from the first and the second receiving-antennas are used as inputs to the first and the second input channels of the DCNN, respectively.
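The spectrogram and AoA computations of (2), (4) and (5) can be sketched in numpy as follows. The antenna spacing d = 0.017 m and wavelength λo = 0.0125 m are the paper's values; the single-tone input, its 250 Hz Doppler frequency and the 20° arrival angle are illustrative, and the wrapped phase-difference is taken via the conjugate product rather than subtracting raw angles:

```python
import numpy as np

def stdft(x, n_win=256, hop=128):
    """Short-time DFT with a Hamming window, per Eq. (2)."""
    w = np.hamming(n_win)
    frames = [x[i:i + n_win] * w for i in range(0, len(x) - n_win + 1, hop)]
    return np.fft.fft(frames, axis=1).T      # shape: (freq bins k, time bins m)

def aoa_matrix(x1, x2, lam=0.0125, d=0.017):
    """AoA per time-frequency bin from the two receiving-antennas, Eq. (5)."""
    X1, X2 = stdft(x1), stdft(x2)
    dphi = np.angle(X1 * np.conj(X2))        # phase-difference wrapped to (-pi, pi]
    s = np.clip(lam * dphi / (2 * np.pi * d), -1.0, 1.0)
    return np.arcsin(s)                      # Eq. (7) applied per bin

# Toy check: a single Doppler tone arriving from theta = 20 degrees
fs, fd, theta = 8000, 250.0, np.deg2rad(20.0)
t = np.arange(fs * 3) / fs                   # one 3 s gesture recording
dphi_true = 2 * np.pi * 0.017 * np.sin(theta) / 0.0125   # Eq. (6)
x1 = np.exp(2j * np.pi * fd * t)
x2 = x1 * np.exp(-1j * dphi_true)            # second antenna lags by dphi
theta_est = aoa_matrix(x1, x2)
k = int(round(fd / fs * 256))                # frequency bin of the tone
print(np.degrees(theta_est[k, 0]))           # ~ 20 degrees recovered
```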


Fig. 7. Sample AoA matrix corresponding to each Doppler component of the gestures.

Fig. 8. Illustration of the deep convolutional neural network with three hidden layers.
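The layer-size arithmetic of the DCNN in Fig. 8 (three 5 × 5 convolutional layers with 6, 4 and 2 filters and 2 × 2 max-pooling, as detailed in Section III) can be sketched as follows. The paper does not state the convolution padding, so unpadded ("valid") convolutions and non-overlapping pooling are assumed here, making the resulting shapes illustrative only:

```python
def conv_out(h, w, k=5):
    """Output size of an unpadded (valid) k x k convolution."""
    return h - k + 1, w - k + 1

def pool_out(h, w, p=2):
    """Output size of non-overlapping p x p max-pooling."""
    return h // p, w // p

h, w = 60, 180                    # per-channel input size (Section III)
for n_filters in (6, 4, 2):       # filters in CNN layers 1..3
    h, w = pool_out(*conv_out(h, w))
    print(n_filters, (h, w))      # feature-map size after each CNN layer

print(2 * h * w)                  # features flattened into the 14-class MLP
```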

III. DCNN ARCHITECTURE

In order to detect the captured gestures, a DCNN is applied to extract and then classify the features provided by the input channels. The main advantage of using a DCNN for this purpose is that it does not require any pre-defined feature estimation; instead, the network itself is capable of learning the features during the training process by optimizing the network weights [23]. Due to their complex multi-layer structure, DCNNs require a very large number of training samples and lengthy training procedures. However, with the recent development of high-performance graphics processing units (GPUs), practical implementation of DCNNs has become feasible. It is widely documented in the research literature that DCNNs are particularly efficient in image classification [9]. This has been attributed to their multiple local connections, shared weights and feature pooling mechanisms which, to a certain extent, mimic the human visual cortex by performing feature extraction.

A typical DCNN architecture consists of several convolutional layers which extract features through sliding filter banks, each followed by a non-linear activation function. The activation function is then followed by a pooling layer which merges similar features in order to reduce the size of, and the noise in, the input data. A convolutional layer, along with the activation function and the pooling layer, constitutes a single CNN layer in a DCNN, which is then repeated multiple times according to the requirements of the training process. The final layer of a DCNN is a fully connected neural network which classifies the input to its corresponding class.

Fig. 8 shows the utilized DCNN architecture in the proposed framework, having three input channels. The data dimension is fixed to 60 × 180 (by rounding the 64 × 186 useful frequency points and time bins) in order to reduce the computation time. We use three CNN layers, with six filters in the first layer, four in the second and two in the third, each with a filter size of 5 × 5. The filter size is empirically determined; sizes from 3 × 3 to 20 × 20 are common in spectrogram classification problems [4]. The training process is carried out using the backpropagation algorithm [24] with stochastic gradient descent (SGD) and batch normalization. This algorithm is selected to address the over-fitting issue and to speed up the training process. After each convolutional layer, we use a rectified linear unit (ReLU) [25] as the activation function, which is computationally faster and more efficient than conventional activation functions such as the sigmoid and the hyperbolic tangent. ReLU applies f(x) = max(0, x) to every element in the input, which eliminates negative values from the input data. After the ReLU comes the pooling layer, which reduces the data dimension by selecting the mean or maximum value from an n × n data space; we use 2 × 2 max-pooling. Finally, a general fully connected multilayer perceptron (MLP), which connects all input nodes to all output nodes, is used for classification. We use 5 epochs for the training with a mini-batch size of 10. An epoch is a single pass through the entire training data set, and each pass through a mini-batch is known as an iteration; dividing the training set into mini-batches aids fast updating of the weights. Therefore, the product of the mini-batch size and the number of iterations gives the total training data set size.

IV. EXPERIMENTAL VERIFICATION

The experimental verification is performed using 250 recorded samples for each of the 14 gestures. This set is divided into 80% for the DCNN training with 5-fold validation and 20% for testing. The training is repeated 5 times by rotating the testing set and the training set, and the average classification accuracy is calculated. The samples were taken in different environments (radar locations) with varying gesture positions, speeds and starting times. The Matlab Neural Network Toolbox is used as the template to train the DCNN.

The estimated success rate (classification accuracy) β̂ is obtained by simply dividing the number of successes by the total number of test samples. The error margin ε can be calculated based on the sample size n, such that the actual value of the success probability β is bounded within β̂ ± ε with a given probability p (confidence). To obtain the estimation error, we rely on the close approximation of the binomial distribution by the Gaussian distribution as the sample size increases, where the standard deviation of the binomial process can be approximated as,

σβ ≈ √( β̂(1 − β̂)/n ).   (8)

Thus, the confidence interval can be approximated by β̂ ± zp σβ, where zp denotes the inverse-CDF of a standard normal distribution (quantile function) calculated at the probability 1 − (1 − p)/2. This is because 1 − p is the out-of-confidence-bounds probability, (1 − p)/2 is the upper (or lower) out-of-confidence probability, and thus 1 − (1 − p)/2 is the corresponding upper probability. Accordingly, the error bound is calculated from,

ε = zp σβ = zp √( β̂(1 − β̂)/n ).   (9)

Given our sample size of n = 250, and taking the confidence level as p = 90%, the error bounds are calculated based on the obtained success probability estimate β̂ and are indicated in Fig. 9. In order to compare the performance of the proposed two-antenna radar system against the performance of a single antenna without changing the DCNN architecture, we send three identical copies of the spectrogram from a single receiving-antenna to the 3 input channels of the DCNN. Fig. 9 shows the

Fig. 9. Performance comparison between the proposed approach with a single-antenna and with two-antennas, obtained from the testing; also showing the performance of the single-antenna system described in the literature [4]. (Test classification accuracy: two-antenna 92.13 ± 2.96%, single-antenna 89.67 ± 3.47%, literature 85.6%.)

TABLE I
CONFUSION MATRIX

Class

      1      2      3      4      5      6      7      8      9      10     11     12     13     14
1     81.96  7.85   0.15   0      2.08   0      0.17   0.09   1.77   0.505  5.07   0.3    0.068  0
2     6.13   85.92  0.38   0.07   0.59   0.57   0.63   1.58   0.23   0.59   0.56   1.91   0.39   0.42
3     0.18   0.15   98.22  0.64   0.30   0.07   0      0.0    0.08   0      0.26   0.08   0      0
4     0      0      0.57   97.38  0      0.96   0      0.08   0.09   0.50   0      0.20   0      0.13
5     0.47   0.82   0.24   0.24   92.11  3.49   0.91   0.16   0.24   0.73   0.23   0.25   0.08   0
6     0.08   0.62   0      0.4    4.24   92.83  0.07   0.90   0      0.08   0      0.60   0      0.17
7     0.64   1.45   0      0      1.42   0.09   88.16  1.30   0.41   0.25   4.74   0.33   1.20   0
8     0.98   2.11   0      0.17   0.22   0.28   2.01   89.13  0      0.11   0.72   2.66   0.15   1.42
9     0.80   0.09   0.09   0      0      0      0.09   0      95.71  2.32   0      0      0.90   0
10    0.16   0.32   0      0.40   0      0.15   0      0.07   3.74   92.81  0      0      2.21   0.14
11    2.93   0.91   0.07   0      0.20   0      3.20   1.03   0.27   0      90.97  0.42   0      0
12    0.61   3.16   0      0      0.23   1.29   0.07   2.09   0      0      0.89   91.62  0      0.15
13    0.09   0.09   0      0      0.16   0      0.50   0.09   0.72   2.75   0      0      94.45  1.14
14    0.07   0      0      0      0      0      0      0.08   0      0.08   0      0      1.57   98.21

average classification accuracy obtained using the proposed method for both the single-antenna and two-antenna scenarios, as well as a comparison with another single-antenna approach described in [4]. The classification accuracy is shown along with the error bounds in each case, calculated as per (9). The estimated classification accuracy based on two antennas outperforms the single antenna by 2.5%, and outperforms the single-antenna system described in [4] by 6.5%.

Table I shows the average confusion matrix between gestures obtained from the training results. It can be seen that the successful classification rate ranges from 82% to 98% depending on the gesture, and the confusion between different gestures is relatively low, providing very good differentiation between the 14 gestures. The largest confusion is observed between gestures 1 and 2, 1 and 11, and 5 and 6. Re-defining these gestures in a way that makes them more distinctive from each other could lead to further improvement in the overall classification accuracy.

V. CONCLUSION

This paper investigated the feasibility of using signals obtained from a Doppler radar with two receiving-antennas to train a deep convolutional neural network (DCNN) to classify hand-gestures, where the beat signals from the two receiving-antennas were used to generate feature arrays representing 14 different hand gestures. The paper also demonstrated how to map the two-antenna time-domain signals into three input arrays of a DCNN, whereby two of the arrays were given as spectrograms of the received signals, and one was the angle-of-arrival of each Doppler component. The testing results showed


TABLE II
NOTATIONS AND SYMBOLS

Parameter                        Symbol   Value
Carrier frequency                fo       24 GHz
Sampling rate                    fs       8000 samples/s
Gesture period                   Tg       3 s
Time window                      Tw       32 ms
Doppler frequency                fd       -
Target radial velocity           v        -
Angle of arrival                 θ        -
STDFT window                     w        Hamming with 256 samples
STDFT window overlap             α        50%
STDFT frequency points           N        256 points
STDFT time bins                  M        186
STDFT phase difference           ∆ϕ       -
Path difference                  d′       -
Antenna distance                 d        1.7 cm
Wavelength                       λo       1.25 cm
Input size                       -        60 × 180 × 3
Convolutional layers             -        3
Convolutional layer 1            -        6 filters 5 × 5
Convolutional layer 2            -        4 filters 5 × 5
Convolutional layer 3            -        2 filters 5 × 5
Pooling layer                    -        2 × 2 max-pooling
Connected layer elements         -        14
Training epochs                  -        5
Training iterations per epoch    -        224
Training mini-batch size         -        10
Total training iterations        -        1120

that the proposed two-antenna method outperforms previously used single-antenna approaches by 6.5% at a similar DCNN complexity.

REFERENCES

[1] H.-J. Kim, J. S. Lee, and J.-H. Park, "Dynamic hand gesture recognition using a CNN model with 3D receptive fields," in International Conference on Neural Networks and Signal Processing. IEEE, Jun. 2008, pp. 14–19.
[2] E. Ohn-Bar, A. Tawari, S. Martin, and M. M. Trivedi, "Predicting driver maneuvers by learning holistic features," in IEEE Intelligent Vehicles Symposium Proceedings, Jun. 2014, pp. 719–724.
[3] S. S. Edris, M. Zarka, W. Ouarda, and A. M. Alimi, "A fuzzy ontology driven context classification system using large-scale image recognition based on deep CNN," in Sudan Conference on Computer Science and Information Technology, Nov. 2017, pp. 1–9.
[4] Y. Kim and B. Toomajian, "Hand gesture recognition using micro-Doppler signatures with convolutional neural network," IEEE Access, vol. 4, pp. 7125–7130, 2016.
[5] Y. S. Lee, P. N. Pathirana, C. L. Steinfort, and T. Caelli, "Monitoring and analysis of respiratory patterns using microwave Doppler radar," IEEE Journal of Translational Engineering in Health and Medicine, vol. 2, pp. 1–12, 2014.
[6] O. H. Y. Lam, R. Kulke, M. Hagelen, and G. Mollenbeck, "Classification of moving targets using micro-Doppler radar," in 17th International Radar Symposium (IRS), May 2016, pp. 1–6.
[7] A. Al-Hourani, R. Evans, P. M. Farrell, B. Moran, S. Kandeepan, Skafidas, and U. Parampalli, "Millimeter-wave integrated radar systems and techniques," in Academic Press Library in Signal Processing Volume 7 (SIGP): Array, Radar and Communications Engineering, S. Theodoridis and R. Chellappa, Eds. Elsevier, 2016, ch. 7.
[8] B. Cagliyan and S. Z. Gurbuz, "Micro-Doppler-based human activity classification using the mote-scale BumbleBee radar," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 10, pp. 2135–2139, Oct. 2015.
[9] Y. Kim and T. Moon, "Human detection and activity classification based on micro-Doppler signatures using deep convolutional neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 13, no. 1, pp. 8–12, Jan. 2016.
[10] M. S. Seyfioglu, A. M. Ozbayoglu, and S. Z. Gurbuz, "Deep convolutional autoencoder for radar-based classification of similar aided and unaided human activities," IEEE Transactions on Aerospace and Electronic Systems, pp. 1–23, 2018.
[11] L. Zhang, J. Xiong, H. Zhao, H. Hong, X. Zhu, and C. Li, "Sleep stages classification by CW Doppler radar using bagged trees algorithm," in IEEE Radar Conference (RadarConf), May 2017, pp. 788–791.
[12] L. R. Rivera, E. Ulmer, Y. D. Zhang, W. Tao, and M. G. Amin, "Radar-based fall detection exploiting time-frequency features," in IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP), Jul. 2014, pp. 713–717.
[13] Y. S. Lee, P. N. Pathirana, R. J. Evans, and C. L. Steinfort, "Noncontact detection and analysis of respiratory function using microwave Doppler radar," Journal of Sensors, vol. 2015, pp. 1–13, 2015.
[14] M. Mustafa, M. N. Taib, S. Lias, Z. H. Murat, and N. Sulaiman, "EEG spectrogram classification employing ANN for IQ application," in International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE), May 2013, pp. 199–203.
[15] M. Liang, Y. Li, H. Meng, M. A. Neifeld, and H. Xin, "Reconfigurable array design to realize principal component analysis (PCA)-based microwave compressive sensing imaging system," IEEE Antennas and Wireless Propagation Letters, vol. 14, pp. 1039–1042, 2015.
[16] C. Zheng, T. Hu, S. Qiao, Y. Sun, J. Huangfu, and L. Ran, "Doppler bio-signal detection based time-domain hand gesture recognition," in IEEE MTT-S International Microwave Workshop Series on RF and Wireless Technologies for Biomedical and Healthcare Applications (IMWS-BIO), Dec. 2013, pp. 1–3.
[17] J. Wang, X. Wang, L. Chen, J. Huangfu, C. Li, and L. Ran, "Noncontact distance and amplitude-independent vibration measurement based on an extended DACM algorithm," IEEE Transactions on Instrumentation and Measurement, vol. 63, no. 1, pp. 145–153, Jan. 2014.
[18] D. Miao, H. Zhao, H. Hong, X. Zhu, and C. Li, "Doppler radar-based human breathing patterns classification using support vector machine," in IEEE Radar Conference (RadarConf), May 2017, pp. 456–459.
[19] Y. Kim and H. Ling, "Human activity classification based on micro-Doppler signatures using a support vector machine," IEEE Transactions on Geoscience and Remote Sensing, vol. 47, no. 5, pp. 1328–1337, May 2009.
[20] BGT24MTR12 Silicon Germanium 24 GHz Transceiver MMIC, 3rd ed., Infineon Technologies, Jul. 2014.
[21] N. A. Khan, M. N. Jafri, and S. A. Qazi, "Improved resolution short-time Fourier transform," in 7th International Conference on Emerging Technologies, Sep. 2011, pp. 1–3.
[22] K. Pasala and R. Penno, "Accurate determination of frequency and angle of arrival from undersampled signals," in Proceedings of IEEE Antennas and Propagation Society International Symposium, pp. 686–689.
[23] Y. Kim and T. Moon, "Classification of human activity on water through micro-Dopplers using deep convolutional neural networks," in Radar Sensor Technology, K. I. Ranney and A. Doerry, Eds. SPIE, May 2016, pp. 1–7.
[24] U. Cote-Allard, C. L. Fall, A. Campeau-Lecours, C. Gosselin, F. Laviolette, and B. Gosselin, "Transfer learning for sEMG hand gestures recognition using convolutional neural networks," in IEEE International Conference on Systems, Man, and Cybernetics (SMC), Oct. 2017, pp. 1663–1668.
[25] H. Ide and T. Kurita, "Improvement of learning for CNN with ReLU activation by sparse regularization," in International Joint Conference on Neural Networks (IJCNN), May 2017, pp. 2684–2691.