This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT


Design of an Acoustic Target Classification System Based on Small-Aperture Microphone Array

Jingchang Huang, Student Member, IEEE, Xin Zhang, Feng Guo, Qianwei Zhou, Huawei Liu, and Baoqing Li

Abstract— The acoustic recognition module of the unattended ground sensor (UGS) system applied in wild environments faces the challenge of complicated noise interference. In this paper, a small-aperture microphone array (MA)-based acoustic target classification system, including the system hardware architecture and the classification algorithm scheme, is designed as a node-level sensor for the application of UGS in noisy situations. Starting from an analysis of the signature of the acoustic signal in wild environments and the merits of the small-aperture array in noise reduction, a closely arranged microelectromechanical systems (MEMS) MA is designed to improve the signal quality. Considering the similarities between speaker discrimination and acoustic target recognition, a classification algorithm scheme, consisting of simplified Mel-frequency cepstrum coefficients and a Gaussian mixture model, is developed to distinguish acoustic targets' patterns. The proposed classification algorithm has been implemented on an embedded system after being tested on training datasets. By combining the small-aperture array and a low-complexity classification algorithm, the presented acoustic classification prototype system is portable and efficient. To demonstrate the efficiency of the design, the prototype system is verified in a practical situation with the employment of wheeled and tracked vehicles. Evaluation of the system's performance in comparison with other state-of-the-art methods indicates that the proposed design is practical for acoustic target classification and may be widely adopted by UGS.

Index Terms— Acoustic classification system, Gaussian mixture model, Mel-frequency cepstrum coefficients, microelectromechanical systems, noise, small-aperture array.

I. INTRODUCTION

THE unattended ground sensor (UGS) system, consisting of many wireless sensor nodes, is usually employed in wild environments to acquire military intelligence about intruding targets by detecting and processing their image, acoustic, and infrared signals [1]–[3]. Target classification is one of the most important and demanding technologies for UGS [2]. Compared with seismic, image, and infrared signals, a moving target classification approach based on acoustic signals provides a simple, portable, easily implemented, and

Manuscript received April 26, 2014; revised August 9, 2014; accepted October 6, 2014. This work was supported in part by the Research Fund under Grant CXJJ-14-S77 and in part by the Research Foundation under Grant 9140C18010213ZK34001. The Associate Editor coordinating the review process was Dr. Antonios Tsourdos. J. Huang, X. Zhang, F. Guo, and Q. Zhou are with the Shanghai Institute of Microsystems and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China, and also with the University of Chinese Academy of Sciences, Beijing 100190, China (e-mail: [email protected]). H. Liu and B. Li are with the Shanghai Institute of Microsystems and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIM.2014.2366979

biologically safe scheme [4]–[6]. Therefore, acoustic target classification has received considerable attention [2], [7]–[9]. Previous classification algorithms for acoustic signals can achieve satisfactory performance on clean signals; nevertheless, the results degrade when the signal is contaminated by noise [10]. Moreover, in the wild environments where UGS is deployed, the target signals are usually corrupted by strong wind noise. In addition, the wind noise is unavoidable, since it cannot be totally removed by a windshield. Thus, an acoustic target classification scheme that is robust to noise interference is urgently demanded by practical applications. Beamforming technology, based on a microphone array (MA), can effectively improve the quality of the acoustic signal by emphasizing the desired component and suppressing the interfering noise [11]. Thus, in the field of speech signal processing, MAs are usually employed to obtain high-quality signals for speech and speaker recognition [12]–[15]. Specifically, in most cases the positions of the speech source and the observer (acoustic sensor) are fixed, which means the time delay between an acoustic source and a microphone is comparatively constant, so adaptive enhancement algorithms are likely to converge quickly when processing the speech signal. That is why it is easy for an MA to achieve speech enhancement with current algorithms [16]. However, for the acoustic signals of moving targets, the time delay between source and sensor changes quickly, so conventional array-based signal enhancement algorithms can hardly work well on an embedded system. Moreover, electret condenser microphones (ECMs) are traditionally adopted to construct acoustic arrays, but the weaknesses of ECM arrays, such as large aperture and high cost, constrain their application in UGS.
Fortunately, the emergence of small-aperture MAs enabled by microelectromechanical systems (MEMS) technology brings new possibilities for acoustic target classification. Advances in MEMS have enabled the development of MEMS microphones that integrate an acoustic transducer, a preamplifier, and even an analog-to-digital converter (ADC) into a single chip. The advantages of MEMS microphones, including high value (low cost and high quality) and small package size, allow the aperture of an acoustic array to be designed much smaller than before, making MEMS microphone-based arrays more portable and useful. Although MAs have been widely used in speech and speaker recognition [12], [13], [17], few have been considered for acoustic target classification in UGS. Considering the prospects of the MEMS MA, in this

0018-9456 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


paper, an acoustic target classification system based on a small-aperture MEMS MA is designed for the requirements of UGS. The main idea is first to obtain a high-quality acoustic signal using the small-aperture array and then to classify the target's pattern by an efficient scheme inspired by speaker recognition, because the principle of acoustic target classification is similar to that of speaker recognition. In view of this, first, the merits of the small-aperture array in signal enhancement are discussed on the basis of the characteristics of the MA and environmental noise. Second, a classification scheme consisting of a simplified Mel-frequency cepstrum coefficients (SMFCCs) feature and a Gaussian mixture model (GMM) classifier is proposed for acoustic target classification. Finally, the proposed prototype system is verified experimentally and its performance is discussed in detail. The main contribution of this paper is the design and implementation of an array-based acoustic identification system that is robust to noise interference. This paper is organized into five sections, including the present one. Section II discusses the characteristics of the acoustic signal obtained in wild environments and then analyzes the merits of the small-aperture array in signal enhancement. Section III illustrates the design of the acoustic target classification system, including the system hardware architecture and the classification algorithm scheme. In Section IV, the performance of the proposed acoustic classification system is verified experimentally and compared with other conventional methods. Finally, Section V highlights some of the distinctive features of this investigation.

important to mention that the pitch (or fundamental frequency) corresponds to the special harmonic whose q value equals 1. N_p is the total number of pitches, and K_h is the number of harmonics of the hth pitch. The normalized angular frequency ω_{h,q} of the qth harmonic is related to the normalized angular pitch frequency ω_{h,0} by

ω_{h,q} = q ω_{h,0}    (2)

where ω_{h,0} = 2π F_{h,0}/F_s, and F_{h,0} and F_s are the hth pitch frequency and the sampling frequency of x(n) in Hz, respectively. The harmonics of the acoustic signal provide a fingerprint and thus can be used for moving target classification [18]. For example, in [2], harmonic amplitudes approximating the harmonic signature of the time-domain acoustic signal captured by a microphone node are estimated for vehicle discrimination. The method in [2] is energy efficient and adequate for low-power unattended sensors, which perform sensing, feature extraction, and classification in a standalone scenario. However, in practical environments, the target signal is contaminated by complicated acoustic noise, especially wind noise [10], [19], which results in the loss of harmonic information. In the presence of additive wind noise v(n), the pure target acoustic signal x(n) is contaminated, producing the noisy target acoustic observations y(n):

y(n) = x(n) + v(n) = Σ_{h=1}^{N_p} Σ_{q=1}^{K_h} A_{h,q} cos(n ω_{h,q} + φ_{h,q}) + v(n).    (3)
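As a concrete illustration of the harmonic model in (1) and the noisy observation in (3), the following sketch synthesizes a single-pitch signal (N_p = 1) with a few harmonics and checks that its spectrum peaks at the fundamental. All parameter values (pitch frequency, amplitudes, noise level) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# x(n) = sum_q A_q cos(n*q*w0 + phi_q), eq. (1) with a single pitch,
# then y(n) = x(n) + v(n) as in eq. (3).
Fs = 2048                 # sampling rate in Hz (same rate as the ADC used later)
F0 = 60.0                 # assumed pitch (fundamental) frequency in Hz
K = 5                     # number of harmonics for this pitch
L = 4096                  # number of samples
n = np.arange(L)
w0 = 2 * np.pi * F0 / Fs  # normalized angular pitch frequency

rng = np.random.default_rng(0)
A = 1.0 / np.arange(1, K + 1)          # assumed decaying harmonic amplitudes
phi = rng.uniform(0, 2 * np.pi, K)     # random phases
x = sum(A[q] * np.cos(n * (q + 1) * w0 + phi[q]) for q in range(K))

v = 0.1 * rng.standard_normal(L)       # additive noise v(n)
y = x + v

# The spectrum of y should show lines at integer multiples of F0.
spec = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(L, d=1.0 / Fs)
peak = freqs[np.argmax(spec)]
print(peak)   # the strongest spectral line sits at the pitch F0
```

At moderate noise levels the harmonic lines remain visible; raising the noise amplitude buries the higher harmonics first, which is exactly the degradation discussed around Fig. 1.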

II. SIGNAL PRESENTATION MODEL BASED ON SMALL-APERTURE MA

A. Acoustic Signature of Moving Targets in Wild Environments

Although a moving target's acoustic signal is a combination of various sounds generated by the engine, propulsion system, exhaust system, aerodynamic effects, and mechanical effects, its spectral content is approximately regular and mainly dominated by the engine, propulsion system, and exhaust system [18]. Since the engine, propulsion, and exhaust systems originate in rotating mechanical mechanisms, all of them produce periodic sounds; but because these pressure waveforms are not purely sinusoidal, their spectra contain fundamental frequencies and corresponding harmonics [2]. Therefore, the spectral content of interest consists of a limited number of pitches, harmonics, and some high-frequency components [18]. Typically, a target's acoustic signal x(n) consisting of N_p pitches and Σ_{h=1}^{N_p} K_h harmonics can be expressed as a sum of its pitches and harmonics as follows:

x(n) = Σ_{h=1}^{N_p} Σ_{q=1}^{K_h} A_{h,q} cos(n ω_{h,q} + φ_{h,q})    (1)

where the parameters A_{h,q}, φ_{h,q}, and ω_{h,q} represent the amplitude, phase, and normalized angular frequency of the hth pitch's qth harmonic of signal x(n), respectively. It is

The power spectra of the acoustic signals generated by the same tracked vehicle under different wind noise conditions differ markedly in the low-frequency band, as shown in Fig. 1: the left panel is the power spectrum of signals collected under two-level wind power and the right one under four-level wind power. The colorbar of Fig. 1 shows the value of the power spectrum; the x- and y-axes represent time and frequency, respectively. As shown in Fig. 1, the harmonics are distinct when the wind noise is weak; however, they degrade and can hardly be separated from noise when the wind level is as high as 4. According to (1), the target acoustic signal is mainly characterized by its pitches ω_{h,0} and harmonics ω_{h,q}, that is, by the distribution of harmonics. Therefore, an unknown target can hardly be recognized once its harmonics are confused with noise. In the following sections, we attempt to classify unknown targets using the noisy observations y(n). B. Improvement of Small-Aperture MA in Signal Quality In [20], an array of many ADMP411 MEMS microphones (produced by Analog Devices) is closely spaced on a circuit to improve the overall signal-to-noise ratio (SNR) of the system to the point that it can be used for very-low-noise recording studio applications. In addition, a distributed MEMS MA-based system for sound source localization is proposed in [21]. In [22], a small MEMS MA system is designed for the direction finding of outdoor moving vehicles. Inspired


Fig. 1. Tracked vehicle's power spectrum obtained at different wind power level conditions. (a) Two-level wind power. (b) Four-level wind power.

by the application of MEMS microphones in small-aperture arrays [20]–[22], in this paper an acoustic target classification system based on a small-aperture MA is considered for the requirements of UGS. N microphones are closely placed in a small-aperture array whose maximal inter-microphone distance is no more than 10 cm, for the sake of portability. The observed acoustic signal of the ith microphone is

y_i(n) = x_i(n) + v_i(n),  i = 1, 2, ..., N    (4)

where x_i(n) and v_i(n) represent the ith microphone's received versions of the acoustic source and the noise signal, respectively. Because all microphones are tightly arranged and the bandwidth of the acoustic signal is narrow, the phases of the received signals are highly consistent, so the differences between their received versions of the acoustic source are nearly negligible, which means

x_i(n) ≈ x_j(n),  i ≠ j.    (5)

Let s(n) be the average value of all microphone outputs, so

s(n) = (y_1(n) + y_2(n) + ··· + y_N(n)) / N
     = (x_1(n) + v_1(n) + ··· + x_N(n) + v_N(n)) / N
     ≈ x_1(n) + (v_1(n) + ··· + v_N(n)) / N.    (6)

Since the received noise v_i(n) consists of wind and circuit noise, and these noises are comparatively incoherent across microphones [19], [22], the SNR of s(n) is

SNR(s(n)) = 10 lg [Power(signal)/Power(noise)] = 10 lg [Power(x_1(n)) / (Power(v_1(n))/N)] = 10 lg [Power(x_1(n))/Power(v_1(n))] + 10 lg N.    (7)

According to (7), every time the number of microphones in the array is doubled, the overall SNR increases by about 3 dB (if N = 2, 10 lg 2 ≈ 3); that is, the averaging method of a small-aperture array can effectively improve the signal quality [20]. Based on this superiority of the small-aperture array in improving SNR, this paper employs the average output of the MA for acoustic target classification.
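The ~10 lg N gain predicted by (6) and (7) can be checked numerically. In the sketch below, a common sinusoidal signal with independent unit-variance noise per channel stands in for the microphone outputs; both are illustrative assumptions.

```python
import numpy as np

# Averaging N channels with a common signal and incoherent noise, eq. (6):
# the residual noise power drops to ~1/N, i.e. an SNR gain of ~10*lg(N) dB.
rng = np.random.default_rng(1)
L = 100_000
n = np.arange(L)
x = np.cos(2 * np.pi * 0.05 * n)       # x(n): identical at every microphone

gains = []
for N in (1, 2, 4, 8):
    v = rng.standard_normal((N, L))    # v_i(n): independent per microphone
    s = x + v.mean(axis=0)             # s(n): the array average, eq. (6)
    noise_power = np.mean((s - x) ** 2)
    # SNR gain relative to a single microphone (whose noise power is ~1)
    gains.append(10 * np.log10(1.0 / noise_power))
    print(N, round(gains[-1], 1))      # ~0, 3, 6, 9 dB: about +3 dB per doubling
```

The gain holds only while the per-channel noises stay incoherent; correlated noise (e.g., wind gusts hitting all microphones identically) would not be averaged away.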

III. DESIGN OF THE ACOUSTIC TARGET CLASSIFICATION SYSTEM

In this section, we first elaborate our choice of the system hardware architecture and then describe the design of the classification scheme.

A. System Hardware Design

1) MA Geometry: In general, a uniform array provides balanced space for circuit design, and uniform circular arrays have the same resolution in all directions [16]. Therefore, a uniform circular geometry is employed to deploy the microphones. According to Section II-B, the more microphones employed in the array, the better the obtained signal. However, increasing the number of microphones consumes too much printed circuit board (PCB) space and makes the circuit too complex. Here, we fix the array aperture at 4 cm and uniformly deploy eight MEMS microphones on the PCB; the 4 cm aperture is chosen for the sake of portability. The employed MEMS microphone is the ADMP504, a high-SNR microphone produced by Analog Devices and honored as the best microphone of 2012. To compare the performance of the MEMS array with ECMs, an ECM array with the same geometry as the MEMS array is designed and used in the experiment. Here, the ECM microphone CHZ-213, manufactured by BAST Corporation, is adopted to construct the ECM microphone array. Since a single ECM microphone is bulky, the aperture of the ECM array can hardly be made as small as the MEMS array's; therefore, an ECM MA whose maximal aperture is 15 cm is designed for comparison. The superiority of the ADMP504 is its high SNR; the advantage of the CHZ-213 is its outstanding low-frequency response, which outperforms the ADMP504. The appearances of the MEMS and ECM arrays are shown in Fig. 2.

2) System Hardware Architecture: The block diagram of the prototype system is shown in Fig. 3.
The system is divided into four modules according to their functions: the MA (Module 1), the preprocessing and sampling (P&S) module (Module 2), the real-time digital signal processing (DSP) module (Module 3), and the radio frequency (RF) module (Module 4). There is a connector between Modules 1 and 2 for swapping the MEMS and ECM arrays in experiments. The MA is employed to capture the


Fig. 2. MA geometry. (a) MEMS array. (b) ECM array.

Fig. 3. Diagram of system hardware architecture.

B. Classification Scheme Design

Fig. 4. System hardware circuits.

acoustic signal; after preprocessing by synchronized filters and amplifiers, simultaneous-sampling ADCs are used to quantize the signals from the microphones. "Synchronized" means that a comparatively strict demand is placed on the consistency of all channels. The classification algorithm is implemented in the DSP chip of Module 3, and the RF module then transmits the classification results to the command center of the UGS. The layout of the PCB is shown in Fig. 4. The specific devices used in Modules 2–4 can be found in [22].

It is known that the MFCC feature and the GMM algorithm are popular in the field of speaker recognition [23]. In this paper, according to the acoustic characteristics of the targets, a simplified version of MFCC and the GMM are employed for acoustic target classification. In the training stage, the acoustic signals of known targets are used to train the parameters of the GMM. In the test stage, the output probabilities are calculated using the classification coefficients obtained in the training stage, and the decision is then made on the basis of the maximal output probability. The classification scheme is shown in Fig. 5. a) Feature Extraction Using SMFCC: MFCC is the most widely used feature in speech and speaker recognition [23]. In practice, we found it easy to distinguish different moving targets by listening to their sound, which suggests the problem is somewhat similar to speaker recognition. According to Section II-A, in the spectrum of the acoustic signal, the harmonics induced by different components of the target (e.g., engine and propeller) are concurrent in the instantaneous frequency. In particular, the low-frequency harmonics of the observed signal have larger power than the high-frequency harmonics, since high-frequency components attenuate more quickly than low-frequency ones [24] as the sound propagates through the medium. In addition, the human ear is not equally sensitive to all frequency bands: it is more sensitive at lower frequencies and less sensitive at higher frequencies. That is why researchers employ MFCC for moving target classification and achieve satisfying accuracy [25]. However, there are some differences between speech and the acoustic signals of moving targets. On the one hand, the


where f(q) is the center frequency of the qth triangular filter on the Mel scale.
4) Take the logarithm of the power at each triangular filter:

E(q) = ln Mel(q),  1 ≤ q ≤ Nfilter.    (13)

5) Conduct the discrete cosine transform of the list of Mel logarithm powers:

SMFCC(i) = Σ_{q=1}^{L} E(q) cos(π(q − 0.5)i / L).    (14)
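Assembling steps 1)-5), a minimal sketch of the SMFCC pipeline might look as follows. The filter-bank construction details and the number of retained coefficients (24 filters, 13 coefficients) follow common practice and are assumptions where the paper is not explicit; the DCT here runs over the number of filters.

```python
import numpy as np

def smfcc(x, fs, n_filter=24, n_coeff=13):
    """SMFCC of one frame: standard MFCC steps, no delta coefficients."""
    L = len(x)
    # Step 1: pre-emphasis (8) and Hamming window (9)
    s = x - 0.9 * np.concatenate(([0.0], x[:-1]))
    y = s * np.hamming(L)
    # Step 2: power spectrum of the DFT (10)
    power = np.abs(np.fft.rfft(y)) ** 2
    # Step 3: Mel-spaced triangular filter bank (11)-(12)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(fs / 2), n_filter + 2)
    bins = np.floor((L + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filter, len(power)))
    for q in range(1, n_filter + 1):
        lo, c, hi = bins[q - 1], bins[q], bins[q + 1]
        for k in range(lo, c):                      # rising edge
            fbank[q - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                      # falling edge
            fbank[q - 1, k] = (hi - k) / max(hi - c, 1)
    mel_energy = fbank @ power
    # Step 4: log filter-bank energies (13)
    E = np.log(np.maximum(mel_energy, 1e-12))
    # Step 5: DCT of the log energies (14); keep the first n_coeff terms
    q = np.arange(1, n_filter + 1)
    return np.array([np.sum(E * np.cos(np.pi * (q - 0.5) * i / n_filter))
                     for i in range(n_coeff)])

# Example: a 1-s frame of a synthetic 60 Hz harmonic signal at 2048 Hz
fs = 2048
n = np.arange(fs)
x = sum(np.cos(2 * np.pi * 60 * (h + 1) * n / fs) / (h + 1) for h in range(5))
feat = smfcc(x, fs)
print(feat.shape)
```

In the real system one such vector would be computed per frame and fed to the GMM classifier described next.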

Fig. 5. Classification scheme.

frequency of the acoustic signal ranges from 0 to 600 Hz, which is narrower than that of the speech signal [26]. On the other hand, the acoustic signal of a moving target is mainly produced by periodic mechanical movement, so its periodicity is more obvious than that of speech [18], [27]. Therefore, an SMFCC method is employed on the basis of the characteristics of the acoustic signal. The only difference between the SMFCC and the standard MFCC is whether the delta operation is conducted: because the acoustic signal is more stationary and narrower in band than speech, the delta operation adds little and can be omitted. Specifically, the SMFCC can be extracted through the following steps.
1) Preprocess the acoustic signal by pre-emphasizing and windowing:

Pre-emphasis: s(n) = x(n) − 0.9x(n − 1)    (8)

Windowing: y(n) = s(n)[0.54 − 0.46 cos(2πn/(L − 1))],  0 ≤ n ≤ L − 1    (9)

where x(n) is the original signal and L is the frame length.
2) Take the Fourier transform of y(n):

Y(K) = Σ_{n=0}^{L−1} y(n) e^{−j2πKn/L},  0 ≤ K ≤ L − 1.    (10)

3) Map the powers of the spectrum obtained above onto the Mel scale using a triangular overlapping filter bank:

Mel(q) = Σ_{K=0}^{L−1} H_q(K)|Y(K)|²,  1 ≤ q ≤ Nfilter    (11)

where Nfilter is the number of filters in the bank, usually configured as 24, and

H_q(K) = { 0,  K < f(q−1)
           2(K − f(q−1)) / [(f(q+1) − f(q−1))(f(q) − f(q−1))],  f(q−1) ≤ K ≤ f(q)
           2(f(q+1) − K) / [(f(q+1) − f(q−1))(f(q+1) − f(q))],  f(q) ≤ K ≤ f(q+1)
           0,  K > f(q+1)    (12)

Finally, the obtained amplitudes of the resulting spectrum SMFCC(i) are regarded as the SMFCCs.
b) Target Classification Using the GMM Algorithm: The GMM algorithm has been successfully applied to text classification, image information retrieval, and speech signal processing, especially speaker recognition [28], [29]. In UGS, GMM-based classifiers can be employed at an acoustic array node in the same way as in the applications mentioned above. Let X be a T-dimensional vector and P(X|w, μ, Σ) a GMM with M component densities:

P(X|w, μ, Σ) = Σ_{m=1}^{M} w_m N(X|μ_m, Σ_m)    (15)

where w_m > 0, m = 1, ..., M are the mixture weights, which sum to 1, and N(X|μ_m, Σ_m), m = 1, ..., M are the T-variate Gaussian densities with mean vector μ_m and covariance matrix Σ_m. The GMM can assume several different forms, depending on the type of covariance matrices. The parameters λ = {w_m, μ_m, Σ_m}, m = 1, ..., M of the GMM should be trained before the classification procedure can be applied; this is commonly performed by the expectation-maximization (EM) algorithm using training data [30]. Let q(i|X_j, λ) be the class-conditional probability with which the jth sample X_j is generated by the ith Gaussian component. Estimates of the mixture weight w_i, mean vector μ_i, and covariance matrix Σ_i are obtained through iterations of the sequential parameter updates (16)–(19) for each component i; through these operations, the likelihood of the model converges to a local maximum:

q(i|X_j, λ) = ŵ_i N(X_j|μ̂_i, Σ̂_i) / Σ_{m=1}^{M} ŵ_m N(X_j|μ̂_m, Σ̂_m)    (16)

ŵ_i = (1/T) Σ_{j=1}^{T} q(i|X_j, λ)    (17)

μ̂_i = Σ_{j=1}^{T} q(i|X_j, λ) X_j / Σ_{j=1}^{T} q(i|X_j, λ)    (18)

Σ̂_i = Σ_{j=1}^{T} q(i|X_j, λ)(X_j − μ̂_i)ᵀ(X_j − μ̂_i) / Σ_{j=1}^{T} q(i|X_j, λ).    (19)

The GMM method can be conducted by the following steps. 1) Determine the number of component densities of GMM, namely, choose a value for M, for example, 4, 8, 16,


TABLE I. DIFFERENT TARGET SPECIFICATIONS

Fig. 8. Classification rate varies with the number of Gaussian components.

B. Efficiency of Small-Aperture Array in Signal Enhancement

Fig. 6. Situation of experiment.

or 32. The optimal value of M is difficult to derive in theory; therefore, it is commonly decided through experiments.
2) Set the convergence threshold and the maximal number of iterations. Specifically, the training procedure can be stopped when |λ_i − λ_{i+1}| is less than 0.0001 or the number of iterations exceeds 100 (|·| denotes the Euclidean distance).
3) Initialize the GMM by the k-means algorithm and iterate the EM updates, as in (16)–(19).

IV. EXPERIMENT AND RESULTS

Experimental studies were performed in some suburban districts around Shanghai and Nanjing to demonstrate the feasibility of the system and the performance of the classification scheme proposed in this paper.
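A minimal one-dimensional sketch of the EM updates (16)-(19) on synthetic data is given below. A quantile-based guess stands in for the k-means initialization, and all data values (component means, sample counts, thresholds) are illustrative assumptions.

```python
import numpy as np

# Synthetic 1-D data from two Gaussian components
rng = np.random.default_rng(2)
X = np.concatenate([rng.normal(-3.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])
M, T = 2, len(X)

w = np.full(M, 1.0 / M)               # mixture weights
mu = np.quantile(X, [0.25, 0.75])     # crude init in place of k-means
var = np.full(M, X.var())             # component variances (1-D covariances)

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(100):                  # maximal number of iterations
    # E-step, eq. (16): responsibilities q(i|X_j)
    dens = np.array([w[i] * gauss(X, mu[i], var[i]) for i in range(M)])
    q = dens / dens.sum(axis=0)
    # M-step, eqs. (17)-(19): re-estimate weights, means, variances
    Nk = q.sum(axis=1)
    w_new = Nk / T
    mu_new = (q @ X) / Nk
    var_new = np.array([(q[i] * (X - mu_new[i]) ** 2).sum() / Nk[i]
                        for i in range(M)])
    converged = np.abs(mu_new - mu).max() < 1e-4   # convergence threshold
    w, mu, var = w_new, mu_new, var_new
    if converged:
        break

print(np.sort(np.round(mu, 1)))       # recovered component means, near [-3, 3]
```

In the actual system, one such model would be trained per target class on SMFCC vectors, and a test recording would be assigned to the class whose GMM yields the maximal output probability.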

According to Section II-B, the small-aperture array is capable of improving the quality of the acoustic signal; here, comparisons of the output signal of a single microphone and that of the array after averaging are shown in Fig. 7. The output signal of a single ECM microphone for a wheeled vehicle, sampled when the wind power is at level three, is displayed in Fig. 7(a), and that for a tracked vehicle is shown in Fig. 7(b). Correspondingly, Fig. 7(c) and (d) show the output signals of the ECM array after averaging, and Fig. 7(e) and (f) show those of the MEMS array. Fig. 7 shows that the signal quality of the averaged array output is better than that of a single microphone. In addition, the contour of the harmonics in Fig. 7(e) is more obvious than that in Fig. 7(c), and likewise Fig. 7(f) is better than Fig. 7(d), which means the MEMS array's signal enhancement is superior to the ECM array's. As discussed in Section II-B, the efficiency of the array in signal enhancement depends on the coherence of the received signals, which favors a small aperture; that is why the MEMS array is superior to the ECM array.

C. Classifier Parameters Optimization

A. Datasets and Other Details

In the experiment, four different targets are employed to generate acoustic signals and build the datasets. Table I lists some properties of these targets and divides them into two categories, wheeled and tracked vehicles. The small-aperture MEMS array (maximal aperture 4 cm) and the ECM array (maximal aperture 15 cm) are applied simultaneously to capture the acoustic signals of all moving targets. The signals are sampled by the MAXIM MAX11043, a 16-bit simultaneous-sampling ADC (Maxim Integrated Products, Sunnyvale, CA, USA), at a 2048 Hz sampling rate. The length of the road is 700 m, 350 m on each side of the acoustic array. Fig. 6 shows the experimental situation, where the wind power is usually at levels three to four, sometimes even five to six. Only one target at a time is driven past the acoustic sensor, at variable speeds and in different directions, while the data are collected. A total of 600 runs are sampled, the corresponding wind power is recorded by an anemometer, and the wind statistics for the whole experiment stage are shown in Table II.

As discussed in Section III-B, the GMM classifier involves several parameters: the number of Gaussian components, the maximal number of iterations, and the convergence threshold. To improve the classifier's performance, the first two are usually selected to be sufficiently large and the last one very small. The last two parameters are only used in the training stage and place no burden on the embedded implementation of the algorithm; therefore, the maximal number of iterations and the convergence threshold are configured as 100 and 0.0001, respectively. However, increasing the number of Gaussian components requires storing more classification coefficients and performing more calculations, so a moderate value should be chosen. The classification results vary with the number of Gaussian components, as shown in Fig. 8: as M increases, the classification ratio becomes higher, and it becomes relatively stable when M exceeds 10. To preserve moderate computational overhead, the number of Gaussian components is configured as 10.


TABLE II. STATISTICAL WIND POWER INFORMATION OF THE WHOLE EXPERIMENT STAGE

Fig. 7. Spectrum of the acoustic signal. (a) Tracked vehicle acoustic signal sampled by a single ECM microphone. (b) Wheeled vehicle acoustic signal sampled by a single ECM microphone. (c) Tracked vehicle acoustic signal sampled by the ECM array and processed by the averaging method. (d) Wheeled vehicle acoustic signal sampled by the ECM array and processed by the averaging method. (e) Tracked vehicle acoustic signal sampled by the MEMS array and processed by the averaging method. (f) Wheeled vehicle acoustic signal sampled by the MEMS array and processed by the averaging method.

D. Results and Discussions

1) Classification Accuracy: In the classification experiment, the datasets are equally divided into two subsets, each containing 300 recordings. One subset is used to train the classifier, and the acquired classifier coefficients are then applied to recognize the other subset. To obtain a convincing conclusion, the dataset division and classification experiment are repeated 100 times, and the average results are

shown in Table III, where "single MEMS" denotes the results of a method that utilizes the SMFCC feature and the GMM classifier with only one microphone's data, and "single ECM" has the analogous meaning. Moreover, the state-of-the-art Time-Domain Harmonics' Amplitudes (TDHA) method, which also employs the acoustic signal for vehicle classification [2], is adopted for performance comparison. The resolution is varied by dividing a recording into frames of different lengths.
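The repeated random-split evaluation protocol can be sketched as follows. A nearest-centroid stand-in replaces the SMFCC+GMM pipeline, and the synthetic two-class features are purely illustrative; only the 50/50 split-and-repeat structure mirrors the experiment.

```python
import numpy as np

# Synthetic stand-in for the 600-recording dataset: 300 "recordings" per class
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (300, 4)),
               rng.normal(2.0, 1.0, (300, 4))])
y = np.repeat([0, 1], 300)

def run_once():
    """One trial: random 50/50 split, train on one half, test on the other."""
    idx = rng.permutation(len(X))
    train, test = idx[:300], idx[300:]
    # "Training": one centroid per class; "testing": nearest-centroid decision
    cents = np.array([X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X[test][:, None, :] - cents[None, :, :], axis=2)
    pred = d.argmin(axis=1)
    return (pred == y[test]).mean()

accs = [run_once() for _ in range(100)]
print(round(float(np.mean(accs)), 3))   # accuracy averaged over 100 splits
```

Averaging over many random splits, as done for Table III, reduces the variance introduced by any single lucky or unlucky division of the recordings.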


TABLE III. CLASSIFICATION ACCURACY

TABLE IV. CLASSIFICATION RATIO VARIES WITH WIND POWER (RESOLUTION IS FIXED AT 2 Hz)

corresponding sub-dataset. The classification results are listed in Table IV. According to Table IV, under every wind power condition the classification accuracy of SAAC is higher than that of the other methods. In addition, the variance of SAAC is the smallest among all classification schemes. Considering these aspects, we believe that SAAC is more robust in the presence of wind noise.
3) Complexity Discussion: The proposed SAAC is a spectrum-based algorithm whose complexity is O(n log₂ n). The TDHA, consisting of the time-domain harmonics' amplitudes feature and a feedforward neural network classifier, provides a simple and energy-efficient method whose computational complexity is O(h·nt), where h is the number of harmonics and nt is the template width [2]. Although the TDHA is much simpler than SAAC, its single-fundamental-frequency signal presentation model is too simple to fit the actual acoustic signals of most vehicles [18]; thus, even in ideal conditions, its best classification ratio is just 88.21%. In addition, the TDHA is very sensitive to wind interference; its performance degrades more severely than the others' when contaminated by noise. Therefore, SAAC can be regarded as an alternative scheme for the demands of practical applications.

V. CONCLUSION

According to Table III, the classification ratio of the proposed small-aperture array-based acoustic classification (SAAC) is up to 98.38%, which is superior to other methods conducted in the experiment. The phenomenon that SAAC performance is higher than that of single MEMS shows that the improvement of quality of waveform is benefit to improve the recognition result. Due to the averaging processing, the signal quality of the output of MEMS array is better than the signal of single MEMS member, and thus the performance of SAAC is advantageous than that of single MEMS. Nevertheless, the results of ECM array are nearly just the same as that of single ECM, because the averaging method degrades due to the incoherent among ECM array’s member, which are resulted by its comparative large aperture. 2) Robust Performance: To explore the robust performances of the above-mentioned classification schemes at different wind power conditions, sequential recognition experiments are conducted using sub-datasets collected at different wind power. Since the size of signals sampled at wind power level 0, 1, 5, and 6 are little (as displayed in Table II), those data collected at adjacent wind power level are converged. Therefore, the datasets are divided into five sub-datasets, the wind power of each sub-dataset is 0–1, 2, 3, 4, and 5–6. In each recognition classification, a half of a sub-dataset is used to train the classifier, then the acquired classifier coefficients are applied to recognize the other part of the

A practical prototype system for acoustic target classification based on a small-aperture MEMS MA is designed for the application of UGS. In this paper, first, the signature of the acoustic signal in wild environments and the merits of the small-aperture array in noise reduction are discussed; then, a closely arranged MEMS MA is designed to improve the signal quality. Second, considering the similarities between speaker identification and acoustic target recognition, a classification algorithm scheme consisting of SMFCCs and the GMM is developed to distinguish acoustic targets' patterns. By combining the small-aperture array and the low-complexity classification algorithm, the presented acoustic classification prototype system is portable and efficient. To demonstrate the efficiency of the design, the prototype system is verified in a practical situation with the employment of wheeled and tracked vehicles. Evaluation of the system performance in comparison with other state-of-the-art methods indicates that the proposed design is practical for acoustic target classification. Note that in the proposed method, only one target at a time is assumed to pass the acoustic sensor, which is a simplification of the practical situation; the ability to classify mixed targets would further improve the applicability of the acoustic classification system. Therefore, in the future, we will focus on mixed-target classification; in addition, more target types, MA geometries, and noisy environments will be studied to further explore the performance of SAAC.

ACKNOWLEDGMENT

The authors would like to thank J. Chen for providing technological support on the MEMS microphone, as well as the associate editor and anonymous reviewers for their valuable comments and suggestions to improve this paper.


REFERENCES

[1] J. Huang, Q. Zhou, X. Zhang, E. Song, B. Li, and X. Yuan, "Seismic target classification using a wavelet packet manifold in unattended ground sensors systems," Sensors, vol. 13, no. 7, pp. 8534–8550, 2013.
[2] P. E. William and M. W. Hoffman, "Classification of military ground vehicles using time domain harmonics' amplitudes," IEEE Trans. Instrum. Meas., vol. 60, no. 11, pp. 3720–3731, Nov. 2011.
[3] T. Damarla, "Azimuth & elevation estimation using acoustic array," in Proc. 13th IEEE Conf. Inf. Fusion, Edinburgh, U.K., Jul. 2010, pp. 1–7.
[4] G. Cauwenberghs et al., "A miniature low-power intelligent sensor node for persistent acoustic surveillance," Proc. SPIE, vol. 5796, pp. 294–305, Aug. 2005.
[5] J. Lan, Y. Xiang, L. Wang, and Y. Shi, "Vehicle detection and classification by measuring and processing magnetic signal," Measurement, vol. 44, no. 1, pp. 174–180, 2011.
[6] J. Lan, S. Nahavandi, T. Lan, and Y. Yin, "Recognition of moving ground targets by measuring and processing seismic signal," Measurement, vol. 37, no. 2, pp. 189–199, Mar. 2005.
[7] H. Wu and J. M. Mendel, "Classification of battlefield ground vehicles using acoustic features and fuzzy logic rule-based classifiers," IEEE Trans. Fuzzy Syst., vol. 15, no. 1, pp. 56–72, Feb. 2007.
[8] D. Li, K. D. Wong, Y. H. Hu, and A. M. Sayeed, "Detection, classification, and tracking of targets," IEEE Signal Process. Mag., vol. 19, no. 2, pp. 17–29, Mar. 2002.
[9] X. Jin, K. Mukherjee, S. Gupta, A. Ray, S. Phoha, and T. Damarla, "Asynchronous data-driven classification of weapon systems," Meas. Sci. Technol., vol. 20, no. 12, pp. 1–12, 2009.
[10] Y. Xu, Z. Xue-Yuan, X. Dong-Feng, and L. Bao-Qing, "A novel denoising method for acoustic target classification in wild environment," in Proc. IEEE Int. Conf. Comput. Sci. Netw. Technol. (ICCSNT), Harbin, China, Dec. 2011, pp. 1398–1402.
[11] M. Brandstein and D. Ward, Microphone Arrays: Signal Processing Techniques and Applications. New York, NY, USA: Springer-Verlag, 2001.
[12] M. L. Seltzer, "Microphone array processing for robust speech recognition," Ph.D. dissertation, Dept. Elect. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, USA, 2003.
[13] D. C. Moore and I. A. McCowan, "Microphone array speech recognition: Experiments on overlapping speech in meetings," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hong Kong, Apr. 2003, pp. 497–500.
[14] M. Omologo, M. Matassoni, P. Svaizer, and D. Giuliani, "Microphone array based speech recognition with different talker-array positions," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Munich, Germany, Apr. 1997, pp. 227–230.
[15] A. R. Abu-El-Quran, R. A. Goubran, and A. D. C. Chan, "Security monitoring using microphone arrays and audio classification," IEEE Trans. Instrum. Meas., vol. 55, no. 4, pp. 1025–1032, Aug. 2006.
[16] I. A. McCowan, J. Pelecanos, and S. Sridharan, "Robust speaker recognition using microphone arrays," in Proc. Speaker Odyssey Speaker Recognit. Workshop, Crete, Greece, Jun. 2001, pp. 101–106.
[17] D. Sun and J. Canny, "A high accuracy, low-latency, scalable microphone-array system for conversation analysis," in Proc. ACM Conf. Ubiquitous Comput., Harrisburg, PA, USA, Sep. 2012, pp. 290–300.
[18] V. Cevher, R. Chellappa, and J. H. McClellan, "Vehicle speed estimation using acoustic wave patterns," IEEE Trans. Signal Process., vol. 57, no. 1, pp. 30–47, Jan. 2009.
[19] D. K. Wilson, R. J. Greenfield, and M. J. White, "Spatial structure of low-frequency wind noise," J. Acoust. Soc. Amer., vol. 122, no. 6, pp. 223–228, 2007.
[20] Analog Devices, "High performance, low-noise studio microphone with MEMS microphones, analog beamforming, and power management," Tech. Rep. CN-0284, 2013.
[21] J. Tiete, F. Domínguez, B. D. Silva, L. Segers, K. Steenhaut, and A. Touhafi, "SoundCompass: A distributed MEMS microphone array-based sensor for sound source localization," Sensors, vol. 14, no. 2, pp. 1918–1949, 2014.
[22] X. Zhang, J. Huang, E. Song, H. Liu, B. Li, and X. Yuan, "Design of small MEMS microphone array systems for direction finding of outdoors moving vehicles," Sensors, vol. 14, no. 3, pp. 4384–4398, 2014.


[23] D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition, 2nd ed. Cambridge, MA, USA: MIT Press, 2000.
[24] G. P. Succi, T. K. Pedersen, R. Gampert, and G. Prado, "Acoustic target tracking and target identification: Recent results," Proc. SPIE, vol. 3713, pp. 10–21, Jul. 1999.
[25] Y. Kim, S. Jeong, D. Kim, and T. S. López, "An efficient scheme of target classification and information fusion in wireless sensor networks," Pers. Ubiquitous Comput., vol. 13, no. 7, pp. 499–508, 2009.
[26] J. Huang, X. Zhang, Q. Zhou, E. Song, and B. Li, "A practical fundamental frequency extraction algorithm for motion parameters estimation of moving targets," IEEE Trans. Instrum. Meas., vol. 63, no. 3, pp. 267–276, Feb. 2014.
[27] B. G. Ferguson, "A ground-based narrow-band passive acoustic technique for estimating the altitude and speed of a propeller-driven aircraft," J. Acoust. Soc. Amer., vol. 92, no. 3, pp. 1403–1407, 1992.
[28] D. A. Reynolds, "A Gaussian mixture modeling approach to text-independent speaker identification," Ph.D. dissertation, Dept. Elect. Eng., Georgia Inst. Technol., Atlanta, GA, USA, 1992.
[29] T. H. Falk and W.-Y. Chan, "A sequential feature selection algorithm for GMM-based speech quality estimation," in Proc. Eur. Signal Process. Conf., Antalya, Turkey, Sep. 2005, pp. 1–4.
[30] J. A. Bilmes, "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models," Int. Comput. Sci. Inst., vol. 4, no. 510, pp. 1–13, 1998.

Jingchang Huang received the B.S. degree in communication engineering from Yunnan University, Kunming, China, in 2010, and the degree in signal processing from the University of Science and Technology of China, Hefei, China, in 2011. He is currently pursuing the Ph.D. degree with the Science and Technology on Microsystem Laboratory, Shanghai Institute of Microsystems and Information Technology (SIMIT), Chinese Academy of Sciences (CAS), Shanghai, China. He has been involved in developing algorithms of target detection and classification for unattended ground sensor systems. His current research interests include pattern recognition, array signal processing, and wireless sensor networks.

Xin Zhang received the B.S. degrees in electronic engineering and information technology from the University of Science and Technology of China, Hefei, China. He is currently pursuing the Ph.D. degree with the SIMIT, CAS, Shanghai, China. His current research interests include array signal processing, and seismic and acoustic sensors.

Feng Guo received the B.S. degree in communication engineering from Sichuan University, Chengdu, China, in 2012. He is currently pursuing the Ph.D. degree with the SIMIT, CAS, Shanghai, China. His current research interests include array signal processing, pattern recognition, and wireless sensor networks.

Qianwei Zhou received the B.S. degree in communication engineering from Hangzhou Dianzi University, Hangzhou, China, in 2009. He is currently pursuing the Ph.D. degree with the SIMIT, CAS, Shanghai, China. His current research interests include signal processing of active and passive sensors, and pattern recognition.

Huawei Liu received the B.S. degree in electronic and information engineering and the M.S. degree in underwater acoustics engineering from Harbin Engineering University, Harbin, China, in 2005 and 2008, respectively. He is currently pursuing the Ph.D. degree with the SIMIT, CAS, Shanghai, China. He is currently a Research Assistant with the Wireless Sensor Networks Laboratory, SIMIT, CAS. His current research interests include array signal processing and pattern recognition.

Baoqing Li received the Ph.D. degree from the State Key Laboratory of Transducer Technology, Shanghai Institute of Metallurgy, CAS, Shanghai, China, in 1999. He is currently a Professor with the SIMIT, CAS. His current research interests include the application of wireless sensor networks.