
15th World Conference on Non-Destructive Testing

Rome (Italy) – 15-21 October 2000

Method for Determining Classification Significant Features from Acoustic Signature of Mine-like Buried Objects

D. Antonic
Ministry of the Interior, Zagreb, Croatia
[email protected]

M. Zagar
Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
[email protected]

Keywords: feature extraction, signal analysis, classification, landmine detection

Abstract

A good feature selection method is an essential step in a classification system. This is especially true for detection systems that have to deal with a low signal-to-noise ratio and varying background conditions, as is the case for landmine detection systems. The proposed method analyzes the spectrum of a signal collected from a microphone placed inside the deminer's prodder and extracts the set of features with the best discrimination ability. Feature selection is performed in two stages. First, a huge initial set of nearly 2×10^4 features is reduced to approximately 100 features, and then the best feature subset is selected from the reduced set. The algorithm was successfully applied to a set of standardized samples of different materials, as well as to real landmines and harmless objects.

1. INTRODUCTION

Landmines present a threat to the population in many countries. Classical demining technologies have a number of drawbacks, including risk to the deminer, low speed and high unit cost. Introducing detection systems capable of detecting buried landmines accurately and quickly is the only way to significantly improve the demining process. Due to the low signal-to-noise ratio, changing environmental conditions that influence measurements (humidity, temperature, soil composition, etc.), and the presence of other natural or man-made objects that give sensor readings similar to those of a landmine, interpreting sensor data for landmine detection is a complicated task. An essential stage in any classification system is determining the optimal set of features from the sensor readings, since the useful information is hidden in the raw signal. The proposed method finds representative features in the spectra of acoustic signals. As the sensor we used a simple prodder equipped with a microphone, shown in Figure 1 [2,3,7].

Figure 1. Prodder

A microphone placed inside the prodder handle registers the vibrations generated when the prodder tip touches an object. The generated audio signals are collected using a PC sound card. All processing is performed on the PC, within the MATLAB™ environment. We assumed that the generated signal carries enough information to make recognition of the examined objects possible.



To test the relationship between the material of a sample and the generated signal, we built a stand, shown in Figure 2.

Figure 2. Stand with prodder, samples and microphone in front

The prodder is mounted on the stand, and its tip passes through a piece of clay. The function of the clay is to simulate the soil that normally surrounds the tip during prodding. Experiments showed that clay (and soil) actually enhances the usable signal, because it attenuates the tip vibrations. Samples are mounted on a pendulum, which is used to move the sample against the tip. Experiments were restricted to four different materials: wood, plastic, iron, and stone. The described setup allows controlled and repeatable experiments, excluding all variables except the material of the sample.

Feature analysis is performed on signal spectra, by analysing the average signal energy across different frequency windows. This analysis gives an initial feature set of almost 2×10^4 features, which, due to combinatorial explosion, prevents direct application of common feature selection algorithms. For that reason we used a hybrid approach: first extract a set of non-overlapping features with the Best feature selection algorithm, then apply three different feature selection algorithms to this reduced set: FSS (forward sequential selection), BSS (backward sequential selection), and Complete search [5]. We tried to apply the same principle to some real-world samples, shown in Figures 3 and 4: PMA-1, PMA-2, PROM-1 and VS-50 landmines [4], and various pieces of iron, stone, and wood, buried at a depth of 5 cm.



Figure 3. Landmine samples

Figure 4. Other samples

As expected, the method is sensitive not only to the material of an object but also to its shape, composition, roughness, etc., so classification by material gives poor results. We therefore had to use a different approach: extracting the features of a particular object of interest (typically one of the landmines) against all other, irrelevant objects. This is the approach that should be, and is, used in landmine detection systems, whose purpose is to distinguish one or a few classes of mine-like objects from all other harmless objects present in the environment. Determining the optimal set of “mine-likeness” attributes (features) for a given detection system ensures that its detection ability is limited only by the physical limitations of the sensor.

2. SIGNAL ANALYSIS

To make energy comparisons possible, the collected signals are normalised by amplitude. Signals from wood and plastic samples are shown in Figure 5, and signals from stone and iron samples in Figure 6.

[Figure 5 plot: normalised amplitude (-0.8 to 0.8) versus t [s] (0 to 9×10^-3 s); curves: wood, plastic]
Figure 5. Signals from wood and plastic samples



[Figure 6 plot: normalised amplitude (-0.8 to 0.8) versus t [s] (0 to 9×10^-3 s); curves: stone, iron]

Figure 6. Signals from stone and iron samples

Signals are recorded using a PC sound card at a sample rate of 48 kHz. For each signal we collected 8192 samples, corresponding to an interval of about 170 ms (8192 / 48000 Hz ≈ 0.171 s). Looking at the signals in Figures 5 and 6, it is relatively easy to distinguish between “soft” and “hard” materials, but discriminating between, e.g., stone and iron is not trivial. From the given samples it is evident that the time-domain representation of the signals is not appropriate. The same holds for a combined time-frequency representation, such as wavelets. We therefore decided to use the frequency-domain representation of the signals.

We apply an 8192-point FFT (Fast Fourier Transform) to the recorded signals and square the magnitude of the result to obtain energy. Because the signals are short transients rather than periodic signals, no windowing was applied. The frequency resolution is about 5.9 Hz (48000/8192 ≈ 5.86 Hz). In further analysis we used the first 1024 channels, covering the frequency range from 0 to 6 kHz. Figure 7 presents the energy spectra for the given samples; for each sample, spectra from 20 experiments are overlaid on the same plot. Parts of the spectrum that differ between samples but are similar for the same sample are good feature candidates.
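The spectral step can be sketched in Python with NumPy as below. This is an illustration rather than the authors' MATLAB code: the function name `energy_spectrum` is hypothetical, and reading “normalised by amplitude” as scaling each recording to unit peak amplitude is an assumption.

```python
import numpy as np

FS = 48_000        # sample rate [Hz]
N_FFT = 8192       # samples per recording, also the FFT length
N_CHANNELS = 1024  # channels kept for analysis (about 0 to 6 kHz)

def energy_spectrum(signal) -> np.ndarray:
    """Energy in the first N_CHANNELS FFT bins of one recording."""
    x = np.asarray(signal, dtype=float)[:N_FFT]
    # Assumption: amplitude normalisation means scaling to unit peak.
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak
    # No window function, as in the text; rfft zero-pads short inputs to N_FFT.
    spectrum = np.fft.rfft(x, n=N_FFT)
    return (np.abs(spectrum) ** 2)[:N_CHANNELS]

resolution = FS / N_FFT               # Hz per channel (5.859375)
upper_edge = N_CHANNELS * resolution  # top of the analysed band (6000.0 Hz)

# Usage: a synthetic decaying 1 kHz tone should peak near channel 170 or 171
# (1000 Hz / 5.86 Hz ≈ 170.7).
t = np.arange(N_FFT) / FS
tone = np.exp(-t / 0.002) * np.sin(2 * np.pi * 1000 * t)
energy = energy_spectrum(tone)
```

Using `rfft` keeps only the non-negative frequencies, which is all that is needed for a real-valued microphone signal.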

[Figure 7 plot: four panels (Wood, Plastic, Rock, Iron); energy (0 to 100) versus f [Hz] (0 to 3000 Hz)]
Figure 7. Spectra



3. FEATURE EXTRACTION

A feature is the average energy in a frequency window [f0, f0+Δf], where f0 takes values from the interval [fmin, fmax−Δf], with fmin = 5.9 Hz and fmax = 6 kHz. Δf is the window width and takes discrete values from the set {8, 12, ..., 80} channels, corresponding to approximately 47 Hz to 470 Hz. This gives an initial set of 19247 features. Since the complexity of feature selection algorithms usually ranges from exponential, O(2^d), for complete search to polynomial, O(d^2), for sequential search (d is the dimension of the feature set), it is not feasible to search any significant part of this feature space.

3.1 Selection of non-overlapping feature subset

A closer look at the feature set shows that the initial set is highly redundant, because many features contain information from the same channels. The minimal width of the frequency window is about 47 Hz (8 channels), which implies that a reduced feature set can contain at most 128 non-overlapping features (1024 channels / 8 channels = 128). We decided to use the Best feature search to reduce the original feature set. The Best feature search algorithm evaluates the selection criterion on each feature individually and chooses the best f features as the feature subset. Because it does not take interactions between features into account, it usually gives poor results when applied on its own.

The first step is to calculate the fitness function for every individual feature. We chose the fitness function F = D_O / D_I, the ratio of the average Euclidean distance between instances from different classes, D_O, to the average distance between instances belonging to the same class, D_I (Equations 1). Features with better discrimination ability are therefore ranked higher.

$$D_I = \frac{\sum_{i=1}^{C}\sum_{j=1}^{n_i-1}\sum_{k=j+1}^{n_i} d(x_{ij}, x_{ik})}{\sum_{i=1}^{C}\binom{n_i}{2}}, \qquad D_O = \frac{\sum_{i=1}^{C-1}\sum_{j=i+1}^{C}\sum_{k=1}^{n_i}\sum_{l=1}^{n_j} d(x_{ik}, x_{jl})}{\binom{n}{2} - \sum_{i=1}^{C}\binom{n_i}{2}} \qquad (1)$$

where
C - number of classes
n - total number of instances
n_i - number of instances in the i-th class
x_ij - j-th instance from the i-th class
d(x_ij, x_kl) - Euclidean distance between x_ij and x_kl
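A minimal NumPy sketch of this first stage follows: it builds the window-average features from a matrix of energy spectra and scores a feature with D_O / D_I from Equations (1). The function names are hypothetical, and the convention that f0 advances one channel at a time starting at channel 1 (fmin) is our assumption; under it the code produces 18620 windows, so the count need not match the 19247 reported above.

```python
import numpy as np

WIDTHS = range(8, 81, 4)  # window widths in channels: 8, 12, ..., 80

def window_features(energy, first=1, last=1024):
    """Average energy over every window [c0, c0 + w) with first <= c0 and c0 + w <= last.

    energy: array of shape (n_instances, n_channels).
    Returns (features, windows); windows[k] = (c0, w) describes column k.
    Assumption: c0 advances one channel at a time.
    """
    energy = np.asarray(energy, dtype=float)
    # Prefix sums turn each window average into a single difference.
    csum = np.concatenate([np.zeros((energy.shape[0], 1)), np.cumsum(energy, axis=1)], axis=1)
    windows = [(c0, w) for w in WIDTHS for c0 in range(first, last - w + 1)]
    features = np.column_stack([(csum[:, c0 + w] - csum[:, c0]) / w for c0, w in windows])
    return features, windows

def fitness_ratio(X, y):
    """D_O / D_I: mean between-class distance over mean within-class distance."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # a single feature
        X = X[:, None]
    y = np.asarray(y)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(y), k=1)    # every unordered pair once
    same = (y[:, None] == y[None, :])[iu]
    d = dist[iu]
    # Mean over within-class pairs divides by sum C(n_i, 2); mean over the
    # remaining pairs divides by C(n, 2) - sum C(n_i, 2), as in Equations (1).
    return float(d[~same].mean() / d[same].mean())

# Scoring every individual feature (hypothetical arrays `spectra`, `labels`):
# features, windows = window_features(spectra)
# scores = np.array([fitness_ratio(features[:, k], labels) for k in range(features.shape[1])])
```

For a single feature the Euclidean distance reduces to an absolute difference, so the same function serves both individual features and feature subsets.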

Values of the fitness function for all features are shown in Figure 8.

[Figure 8 plot: fitness function (0 to 18) versus f [Hz] (0 to 6000 Hz)]
Figure 8. Fitness function



Searching for the best features includes an additional step that removes redundant features. When the feature with the highest value of the fitness function is found, it is moved to the selected subset, and all features that overlap with it are removed from the candidate set. Figure 9 presents the fitness function values of the consecutively extracted features.
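The redundancy-removal loop can be sketched as follows, with each feature described by its start channel c0 and width w. The function name and the `max_features` cap (set to the theoretical maximum of 128) are assumptions; the text does not state what stopping rule produced the 70 features reported below, and the loop also ends naturally once no non-overlapping candidates remain.

```python
import numpy as np

def select_non_overlapping(scores, windows, max_features=128):
    """Greedy Best-feature search with overlap removal.

    scores[k]  : individual fitness of feature k
    windows[k] : (c0, w), i.e. feature k covers channels c0 .. c0 + w - 1
    Taking features in descending fitness order and skipping any window that
    overlaps an already selected one is equivalent to deleting the selected
    feature and all of its overlapping features from the candidate set.
    """
    selected, intervals = [], []
    for k in np.argsort(scores, kind="stable")[::-1]:  # best first
        c0, w = windows[k]
        start, end = c0, c0 + w                         # half-open [start, end)
        if all(end <= s or start >= e for s, e in intervals):
            selected.append(int(k))
            intervals.append((start, end))
            if len(selected) == max_features:
                break
    return selected

# Usage with four toy windows: feature 1 wins, features 0 and 2 overlap it.
chosen = select_non_overlapping(np.array([3.0, 5.0, 1.0, 2.0]),
                                [(1, 8), (5, 8), (9, 8), (20, 12)])
```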

[Figure 9 plot: fitness function (0 to 20) versus feature number (0 to 70)]

Figure 9. Fitness function for consecutive features

In this case the algorithm selected a set of 70 non-overlapping features. The first few clearly have significantly higher discrimination ability than the others.

3.2 Finding the best feature subset

We compared three feature selection algorithms applied to the reduced feature set. Complete search [5] examines all feature subsets. It always finds an optimal solution, but at the cost of computational time. Forward and backward sequential selection [1,5] are the most common sequential search algorithms. FSS begins with zero features, evaluates all subsets with exactly one feature and selects the one with the largest fitness function. It then evaluates all subsets consisting of the previously selected features plus one of the remaining features, and again selects the one with the largest fitness function. This cycle repeats as long as the improvement gained by adding a new feature is above a predefined level. BSS instead begins with all features and repeatedly removes the feature whose removal causes the smallest decrease of the fitness function value.

In this stage we used a different fitness function. Since the features in the reduced set were selected according to their discrimination ability, they already have the intrinsic property of grouping together instances from the same class. As the fitness function we used the minimal distance between any two elements from neighboring classes. This way, a feature subset that leads to a balanced distribution of classes will prevail over a subset that separates one class well but leaves the others close together. Figure 10 compares the FSS, BSS and Complete search algorithms. The algorithms are tested on the subset of the best ten features, and their fitness functions are plotted. The plot shows that BSS outperformed FSS and is close to the Complete search.
The reason for BSS outperforming FSS is that BSS evaluates the contribution of a given feature in the context of all other features while FSS can evaluate the utility of a single feature only in the limited context of previously selected features.
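The three search strategies can be sketched in Python as below. Two points are assumptions, not statements from the paper: “minimal distance between any two elements from neighboring classes” is read here as the smallest Euclidean distance between two instances of different classes, and the features are assumed to be already scaled to comparable ranges. FSS here runs to a fixed subset size instead of using an improvement threshold, so that the fitness can be recorded for every subset size. All function names are hypothetical.

```python
import itertools
import numpy as np

def min_between_class_distance(X, y):
    """Assumed fitness: smallest Euclidean distance between instances of different classes."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return float(dist[y[:, None] != y[None, :]].min())

def score(X, y, subset):
    """Fitness of the feature subset given as column indices of X."""
    return min_between_class_distance(X[:, list(subset)], y)

def forward_selection(X, y, max_size):
    """FSS: start empty, add the feature giving the largest fitness at each step."""
    selected, remaining, history = [], list(range(X.shape[1])), []
    while remaining and len(selected) < max_size:
        best = max(remaining, key=lambda f: score(X, y, selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), score(X, y, selected)))
    return history

def backward_selection(X, y):
    """BSS: start with all features, drop the one whose removal costs least."""
    selected = list(range(X.shape[1]))
    history = [(list(selected), score(X, y, selected))]
    while len(selected) > 1:
        drop = max(selected, key=lambda f: score(X, y, [g for g in selected if g != f]))
        selected.remove(drop)
        history.append((list(selected), score(X, y, selected)))
    return history

def complete_search(X, y, size):
    """Exhaustive search over all subsets of one size (2^d - 1 subsets over all sizes)."""
    best = max(itertools.combinations(range(X.shape[1]), size), key=lambda s: score(X, y, s))
    return list(best), score(X, y, best)

# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.1], [1.0, 0.0, 0.5], [1.0, 1.0, 0.6]])
y = np.array([0, 0, 1, 1])
```

Because this fitness is a minimum over cross-class pairs, a single pair of close instances from different classes dominates it, which matches the stated goal of favouring subsets that keep every pair of classes apart.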



[Figure 10 plot: fitness (0 to 1) versus number of features in a set (1 to 10); curves: FSS, BSS, Complete]
Figure 10. Comparison of FSS, BSS and Complete search

4. CLASSIFICATION

For the real-world samples shown in Figures 3 and 4, we extracted features of the PMA-1 mine against all the irrelevant objects from Figure 4. In this case, the fitness function is the ratio between the average distance from the class center to instances not belonging to the class of interest and the average distance between members of the class. Figure 12 presents the classification of the PMA-1 mine against other, non-mine objects.
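One reading of this one-class fitness function is sketched below for a feature matrix X and a boolean mask marking the PMA-1 instances. The name `one_vs_rest_fitness` is hypothetical, and we assume Euclidean distance and the arithmetic mean as the class center, since the text states neither.

```python
import numpy as np

def one_vs_rest_fitness(X, is_target):
    """Ratio of (mean distance from the target-class center to non-target instances)
    to (mean pairwise distance between target instances)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                           # a single feature
        X = X[:, None]
    mask = np.asarray(is_target, dtype=bool)
    target, rest = X[mask], X[~mask]
    center = target.mean(axis=0)              # assumed: arithmetic mean as class center
    outside = np.linalg.norm(rest - center, axis=1).mean()
    pair = np.linalg.norm(target[:, None, :] - target[None, :, :], axis=-1)
    iu = np.triu_indices(len(target), k=1)    # each target pair once
    inside = pair[iu].mean()
    return float(outside / inside)

# Usage: two target instances at 0 and 2 (center 1), non-targets at 10 and 12.
ratio = one_vs_rest_fitness(np.array([0.0, 2.0, 10.0, 12.0]), [True, True, False, False])
```

Unlike D_O / D_I, this criterion does not reward separating the non-mine objects from one another, which is irrelevant when the only question is mine versus non-mine.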

The selected set of features may be used directly by various classification algorithms, such as k-nearest neighbors, Bayesian, neural network or fuzzy classifiers. Choosing the dimension of the feature vector is usually a compromise between classification performance and computational time. Figure 11 presents the classification of materials using the best 2-dimensional feature subset. The same plot shows 15 training and 5 test samples of each material; test samples are represented by filled symbols.

[Figure 11 plot: wood, plastic, rock and iron samples; x-axis: feature 1, 844-1242 Hz; y-axis: feature 10, 240-311 Hz; both axes 0 to 1]
Figure 11. Classification: materials

[Figure 12 plot: PMA-1 and other samples; x-axis: feature 1, 158-533 Hz; y-axis: feature 2, 580-955 Hz; both axes 0 to 1]
Figure 12. Classification: PMA-1 against other objects
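As an example of using the selected features directly, a plain k-nearest-neighbor classifier is sketched below. The paper does not say which classifier or scaling produced Figures 11 and 12; the min-max scaling to [0, 1] (suggested only by the plot axes), k = 3, and both function names are assumptions.

```python
import numpy as np

def minmax_scale(train, test):
    """Scale both sets to [0, 1] using the training set's per-feature range."""
    train = np.asarray(train, dtype=float)
    test = np.asarray(test, dtype=float)
    lo, hi = train.min(axis=0), train.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return (train - lo) / span, (test - lo) / span

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training instances (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    predictions = []
    for neighbor_labels in y_train[nearest]:
        labels, counts = np.unique(neighbor_labels, return_counts=True)
        predictions.append(labels[np.argmax(counts)])  # ties go to the first label in sorted order
    return np.array(predictions)

# Toy 2-D example with two classes (one training cluster each).
X_train = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
y_train = np.array(["PMA-1"] * 3 + ["Other"] * 3)
train_s, test_s = minmax_scale(X_train, np.array([[0.5, 0.5], [10.5, 10.5]]))
predicted = knn_predict(train_s, y_train, test_s)
```

Scaling with the training range only keeps the test samples out of the fitting step, mirroring the separate training and test samples shown in the figures.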



5. CONCLUSION

This paper proposes a hybrid feature extraction method applied to the buried landmine detection problem. The method is based on the Best individual feature selection algorithm, which is used to reduce the complexity of the large initial feature set, followed by Complete and Sequential search algorithms that determine the feature subset with the highest discrimination ability. This approach gives good results on test samples of different materials, as well as on real-world samples. It could be applied to various sensor configurations and is not restricted to landmine detection.

The described feature selection approach performs well even for the sensor configuration used here, which is far from optimal. Along the signal path, from the tip of the prodder through the handle and finally through the air inside the handle to the microphone, significant information is lost. There is a lot of room for improvement in the sensor configuration. Keeping to a passive approach, where no energy is transmitted from the prodder after contact with the object, direct measurement of the tip vibration using a strain gauge or piezo force sensor would be more appropriate. With an active approach, it is possible to use either an ultrasonic transducer or a vibration mechanism that repeatedly makes contact with the examined object. A sensor modification may require slight changes to the part of the algorithm responsible for generating the initial feature set, while the rest of the algorithm would remain unaffected.

References

[1] Aha, D.W., Bankert, R.L., A Comparative Evaluation of Sequential Feature Selection Algorithms, In: Fisher, D., Lenz, J.H. (Eds.), Artificial Intelligence and Statistics, Springer-Verlag, New York, 1996

[2] Antonic, D., Improving the Process of Manual Probing, SUSDEM'97, Zagreb, 1997

[3] Antonic, D., Ratkovic, I., Ground Probing Sensor for Automated Mine Detection, KoREMA'96 - 41st Annual Conference, Opatija, Croatia, Sept. 1996, pp. 137-140

[4] Banks, E., Anti-Personnel Landmines: Recognising & Disarming, Brassey's Inc., 1998

[5] Dash, M., Liu, H., Feature Selection Methods for Classification, Intelligent Data Analysis: An International Journal, Vol. 1, No. 3, 1997

[6] Dash, M., Liu, H., Hybrid Search of Feature Subsets, PRICAI'98, Singapore, Springer-Verlag, Nov. 1998, pp. 238-249

[7] Dawson-Howe, K.M., Williams, T.G., Automating the Probing Process, SUSDEM'97, Zagreb, 1997

[8] Scherf, M., Brauer, W., Feature Selection By Means of a Feature Weighting Approach, Forschungsberichte Künstliche Intelligenz FKI-221-97, Technische Universität München, 1997 (ISSN 0941-6358)