
Neurocomputing (2014)

Seismic detection using support vector machines

A.E. Ruano a,b,*, G. Madureira c, O. Barros b, H.R. Khosravani b, M.G. Ruano b,d, P.M. Ferreira e

a Centre for Intelligent Systems, IDMEC, IST, Portugal
b University of Algarve, Portugal
c Instituto Português do Mar e da Atmosfera, I.P., Centro Geofísico de S. Teotónio, Portugal
d CISUC, University of Coimbra, Portugal
e University of Lisbon, Faculty of Sciences, Large-scale Informatics Systems Lab. (LaSIGE), Portugal

Article info

Article history: Received 29 April 2013; received in revised form 21 October 2013; accepted 12 December 2013. Communicated by G. Thimm.

Abstract

This study describes research to design a seismic detection system that acts at the level of a seismic station, playing a role similar to that of STA/LTA ratio-based detection algorithms. In a first step, Multi-Layer Perceptrons (MLPs) and Support Vector Machines (SVMs), trained in supervised mode, were tested. The sample data consisted of 2903 patterns extracted from records of the PVAQ station, one of the stations of the seismographic network of the Institute of Meteorology of Portugal (IM). The spectral characteristics of the records, and their variation in time, were reflected in the input patterns of the networks as a set of power spectral density values at selected frequencies. To ensure that all patterns of the sample data were within the range of variation of the training set, we used an algorithm that separates the universe of data by hyper-convex polyhedrons, thereby determining a set of patterns that must be part of the training set. Additionally, an active learning strategy was applied, iteratively incorporating poorly classified cases into the training set. The best results of the proposed system, in terms of sensitivity and selectivity on the whole data, ranged between 98% and 100%. These results compare very favourably with those obtained by the existing detection system (50%) and with other approaches found in the literature. Subsequently, the system was tested in continuous operation on unseen (out-of-sample) data, and the SVM detector obtained 97.7% sensitivity and 98.7% selectivity. The classifier achieved 88.4% sensitivity and 99.4% selectivity when applied to data from a different IM seismic station. Due to the input features used, the average detection time with this approach is of the order of 100 s, which is too long for an early-warning system. In order to decrease this time, an alternative set of input features was tested. A similar performance was obtained, with a significant reduction in the average detection time (around 1.3 s). Additionally, it was shown experimentally that, whether off-line or in continuous operation, the best results are obtained when the SVM detector is trained with data originating from the respective seismic station. © 2014 Elsevier B.V. All rights reserved.

Keywords: Seismic detection; Neural networks; Support vector machines; Early warning systems

1. Introduction

In the past decade, Computational Intelligence (CI) techniques have been applied in seismology to several classes of problems: earthquake magnitude prediction [1,2], control and monitoring of civil engineering structures [3,4], discrimination between event types (earthquakes, explosions, volcanic, and underwater) [5,6], phase determination [7–9] and seismic imaging [10].

* Corresponding author at: University of Algarve, Electronic Engineering and Informatics Dept., Faculty of Science & Technology, DEEI-FCT, Campus de Gambelas, 8005-117 Faro, Portugal. Tel.: +351 289800912.

Although a significant amount of research has been devoted to automatic seismic detection algorithms, most of the systems employed in seismic centres are based on the short-term average (STA)/long-term average (LTA) ratio and its variants [11]. These algorithms produce a significant number of false alarms and missed detections, and therefore need human supervision at all times. Continued research is thus required to develop highly reliable, real-time seismic event detectors applicable to continuous seismic data. A short summary of existing approaches to detect the P-phase onset is given below, highlighting the reported accuracy figures to enable a comparison with the proposed methodology, described later. Among off-line approaches, i.e., methods applied to a set of specific segments of seismic signals containing either earthquakes or just background noise, Dai and MacBeth [12] proposed a back-propagation neural network (BPNN) to identify P and S arrivals from three-component recordings of local earthquake data. The BPNN

0925-2312/$ - see front matter © 2014 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.neucom.2013.12.020

Please cite this article as: A.E. Ruano, et al., Seismic detection using support vector machines, Neurocomputing (2014), http://dx.doi.org/10.1016/j.neucom.2013.12.020


was trained on selected trace segments of P and S waves and noise bursts, converted into an attribute space based on the degree of polarization (DOP). A total of 1363 seismic records were used for training and validation. Compared with a manual analysis, the trained system correctly identified between 76.6% and 82.3% of the P arrivals, and between 60.5% and 62.6% of the S arrivals. Gentili et al. [13] proposed a neural network system for P- and S-phase picking and earthquake location. Their approach was applied to 7108 seismograms corresponding to 1147 earthquakes that occurred in Northeastern Italy in the period 2000–2003, with magnitudes ranging from 0.6 to 5.6. Its results were compared with two sets of manual picks and with the picks produced by the existing seismic alert system. The P-detection recall values for the two systems (the neural system and the existing alert system) are 0.93 and 0.80, respectively, with respect to the first database of manual picks, and 0.80 and 0.62 with respect to the second. Riggelsen and Ohrnberger [14] applied a machine learning approach based on supervised learning and Dynamic Bayesian Networks (DBN). The methodology, introduced in [15], was applied to the off-line detection of seismic events recorded at two stations, BOSA and LPAZ, belonging to the International Monitoring System (IMS). A time–frequency decomposition provided the signal characteristics from which the features defining typical 'signal' and 'noise' patterns were derived. Each pattern class is modelled by a DBN, specifying the interrelationships of the derived features in the time–frequency plane. The DBNs are then trained on previously labelled segments of seismic data using Generalized Expectation Maximization. The classifiers for BOSA and LPAZ were trained with 1 week of IMS data from July 2008. A separate test set (disjoint from the training set) was compiled from the data of the same week of July 2008.
Sensitivity values in the range 0.80–0.86 and specificity values in the range 0.84–0.97 were obtained. The magnitude range of the events considered was not specified. Other approaches have been applied to continuous seismograms of various durations. Tiira [16] used artificial neural networks (MLPs) to detect the P-phase. The inputs were 3 STA values and 1 LTA value computed in seven different frequency bands, from 0.5 Hz to 3.4890 Hz. Separate detectors were trained for each of three seismic stations in Finland. The training database was obtained from P-wave signals of 193 teleseismic events. The detection capability of the neural detector was tested using a voting system that combined the results from all three stations. Testing was performed by passing 10 consecutive days of data (1–10 March 1996) through the detectors. The number of seismic events marked by the International Data Center was 657 (only events at distances greater than 20° from the stations and with magnitude greater than 3.5 were used). The STA/LTA detector found 144 events, and the total number of detections was 941. The best neural network system found 25% more events than the STA/LTA detector and produced 50% fewer detections, indicating a lower false alarm rate. Botella et al. [17] implemented a new earthquake detector, based on STA/LTA, applied to seismic signals pre-filtered with the discrete wavelet transform. They compared the performance of this algorithm with two well-known detection algorithms, XRTP [18] and XDetect [19], using seismic data from the Local Seismic Network in the Province of Alicante (LSNPA) in Spain. The performance of their algorithm was found to depend on the tuning parameters. Using seismic data from March 2001, with the detector tuned for high sensitivity, the detection rate was 97.4% (in contrast with 74.8% for XRTP), but at the expense of a high false alarm ratio (72.8%). This value could be reduced to 40.6%, but with a detection rate of only 85.2%. Beyreuther and Wassermann [20] proposed the use of Hidden Markov Models (HMM) for the detection of small to medium size earthquakes. The seismic signals were recorded at three stations of the Bavarian Earthquake Service. The performance of their algorithm was compared with that of a recursive STA/LTA detector over a continuous one-month period. The detection rate was 81%, compared with 90% for the STA/LTA detector, in a universe of 69 earthquakes. This approach was further developed in [21], and applied to a data set from the Swiss Seismological Survey in [22]. Although the performance of this approach cannot be directly compared with the results presented here, as only events close to the seismic station were considered and only short sections of the continuous data set were tested, the HMM detector was able, after re-training, to achieve 97% correct detections in a universe of 206 seismic events comprising earthquakes, blasts and rockfalls.

Real-time seismic monitoring and earthquake early-warning systems (EWS) must be capable not only of detecting a seismic event, but also of producing estimates (with their associated uncertainties) of the location and size of an earthquake starting a few seconds after the event is first detected. Thus, one key parameter of an EWS is time: the longer the time available before the catastrophic phenomenon hits the target, the more effective the countermeasures that can be taken [23]. The lead time for EWS applications ranges from a few seconds to a few tens of seconds, depending on the hypocentral distance of the target. There is always a trade-off between the warning time and the reliability of the earthquake information. For instance, the approach detailed in [24] is able to detect an earthquake within 0.2 s. However, for a set of 301 events inside the Irpinia Seismic Network, 104 outside it, and 49 false events, the approach failed to detect 19 and 28 of the events inside and outside the network, respectively, and declared 10 of the 49 false events as earthquakes.
In the work presented here, we propose a seismic detection system, to be implemented at the seismic station, based on computational intelligence models. The system should be able to distinguish segments of seismic records containing signals caused by local and regional earthquakes and explosions from all other situations. The aim is to build classifiers that assign one of two classes to periods of the seismic record of pre-determined, fixed duration: Class 1, local and regional earthquakes and explosions; and Class 2, all other possibilities. The data were collected from two seismic stations located in the south/centre of Portugal: PVAQ,¹ located in Vaqueiros, Algarve (37°24.22′N, 07°43.04′W), and PESTR, located in Estremoz, Central Alentejo (38°52.03′N, 07°35.41′W).

The structure of the paper is as follows. Section 2 describes the procedures used for data collection and feature extraction, as well as the training methods. In Section 3 the training experiments are described and their results analysed. The performance of the classifier in continuous operation is discussed in Section 4. Section 5 deals with the time taken to detect an event; it is shown that, using an alternative set of windows, similar accuracy can be obtained with a significant reduction in detection time. Conclusions and future work are presented in Section 6.

¹ In general, Portuguese seismic station codes begin with 'P', which stands for Portugal, followed by an abbreviation of the location name; in this case, 'VAQ' stands for Vaqueiros.

2. Data and training methods

2.1. Input data

Non-stationary signals occur naturally in many real-world applications: examples include music, biomedical signals, radar, sonar and seismic waves. Time–frequency representations such as spectrograms are important tools for processing such time-varying signals. In this work, the spectrogram is used as the first stage of earthquake detection. The power spectral density (PSD) is estimated by periodogram averaging [25]. Only positive frequencies are taken into account (the so-called one-sided PSD). The PSD values are slightly smoothed by averaging them over a constant relative bandwidth of 1/10 of a decade, as follows. Let P(f) be the PSD values at a set of discrete frequencies F. Starting with the lowest frequency of F, f_min, we create a sequence of frequencies separated by 1/10 of a decade:

$$ f_k = f_{\min}\, 10^{(k-1)/10}, \qquad k = 1, 2, \ldots \tag{1} $$

We then split F into disjoint subsets D_k:

$$ D_k = \{\, f \in F : f_k \le f < f_{k+1} \,\}, \qquad k = 1, 2, \ldots \tag{2} $$

Each set D_k is associated with the frequency f_k defined above. The smoothed PSD, P_S(f_k), is given by

$$ P_S(f_k) = \frac{1}{\#D_k} \sum_{f \in D_k} P(f) \tag{3} $$

where #D_k denotes the number of elements of D_k.
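A minimal NumPy sketch of the smoothing in Eqs. (1)–(3) follows. It assumes the PSD is supplied as two arrays (frequencies and one-sided PSD values), drops the 0 Hz bin because the logarithmic grid of Eq. (1) needs f > 0, and uses half-open bands [f_k, f_{k+1}) so that the subsets D_k are disjoint. The function name `smooth_psd` is hypothetical; the original work used standard Matlab functions.

```python
import numpy as np


def smooth_psd(freqs, psd, frac_decade=0.1):
    """Average a one-sided PSD over bands 1/10 of a decade wide (Eqs. 1-3).

    Returns the band frequencies f_k of the non-empty bands and P_s(f_k).
    """
    freqs = np.asarray(freqs, dtype=float)
    psd = np.asarray(psd, dtype=float)
    keep = freqs > 0  # the logarithmic grid needs f > 0, so the DC bin is dropped
    freqs, psd = freqs[keep], psd[keep]
    f_min, f_max = freqs.min(), freqs.max()
    # Eq. (1): f_k = f_min * 10**((k - 1) / 10); ceil(...) + 1 bands always cover f_max
    n_bands = int(np.ceil(np.log10(f_max / f_min) / frac_decade)) + 1
    edges = f_min * 10.0 ** (frac_decade * np.arange(n_bands + 1))
    centres, smoothed = [], []
    for k in range(n_bands):
        # Eq. (2): D_k = {f in F : f_k <= f < f_{k+1}}
        in_band = (freqs >= edges[k]) & (freqs < edges[k + 1])
        if in_band.any():  # bands containing no PSD bin are skipped
            centres.append(edges[k])
            # Eq. (3): P_s(f_k) = (sum of P(f) over D_k) / #D_k
            smoothed.append(psd[in_band].mean())
    return np.array(centres), np.array(smoothed)


# Usage: the bins at 1.0, 1.1 and 1.2 Hz share the first band, 1.3 Hz the second
centres, values = smooth_psd([0.0, 1.0, 1.1, 1.2, 1.3, 2.0], [5, 1, 3, 5, 7, 9])
```

At low frequencies, where the PSD bins are sparse relative to a 1/10-decade band, many bands contain a single bin or none; the averaging therefore mostly affects the higher frequencies, which is consistent with the description of the smoothing as "slight".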

We divided segments of 120 s into 5 non-overlapping intervals of equal duration (24 s each) and computed the PSD of each interval using standard Matlab functions. We then picked the power at 6 frequencies: 1, 2, 4, 8, 10 and 15 Hz. This means that 30 different features (6 frequencies for each of the 5 intervals) are used by the classifier; this constraint was imposed to limit the complexity of the classifiers. Fig. 1 illustrates a seismic record and the spectral content of each of the 5 intervals considered, with the selected frequencies highlighted. Before being used in training, each of the 30 features is scaled to the range between −1 and +1. In most experiments, a fifth-order Butterworth digital high-pass filter with a cut-off frequency of 0.5 Hz (values selected in previous experiments) was applied to the signal prior to PSD computation. The low-frequency content was removed because only local and regional seismic events were of interest.
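The feature extraction just described can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' Matlab code: the sampling rate of 100 Hz is inferred from the 50-sample (0.5 s) step quoted in Section 4; the PSD of each 24 s interval is a simple Welch-type estimate (Hann window, 500-sample segments, 50% overlap, giving a 0.2 Hz grid on which the six frequencies fall exactly) without the 1/10-decade smoothing; and the [−1, +1] scaling uses the per-feature minimum and maximum of the training patterns. The 0.5 Hz, fifth-order Butterworth high-pass filter is omitted here; with SciPy it could be applied beforehand with `scipy.signal.butter(5, 0.5, btype="highpass", fs=100, output="sos")` followed by `scipy.signal.sosfilt`. All function names are hypothetical.

```python
import numpy as np

FS = 100.0  # Hz; assumed from the 50-sample = 0.5 s step quoted in Section 4
FEATURE_FREQS = (1.0, 2.0, 4.0, 8.0, 10.0, 15.0)  # Hz, as listed above


def welch_psd(x, fs, nperseg):
    """One-sided PSD by periodogram averaging (Hann window, 50% overlap)."""
    step = nperseg // 2
    window = np.hanning(nperseg)
    scale = 1.0 / (fs * np.sum(window ** 2))  # density scaling (units^2 / Hz)
    starts = range(0, len(x) - nperseg + 1, step)
    segs = np.array([x[s:s + nperseg] for s in starts])
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove each segment's mean
    psd = (np.abs(np.fft.rfft(segs * window, axis=1)) ** 2 * scale).mean(axis=0)
    psd[1:] *= 2.0  # fold negative frequencies into the one-sided estimate
    if nperseg % 2 == 0:
        psd[-1] /= 2.0  # the Nyquist bin has no negative-frequency twin
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd


def extract_features(segment, fs=FS, n_intervals=5, nperseg=500):
    """30 features: PSD at 6 frequencies in each of 5 equal, non-overlapping intervals."""
    x = np.asarray(segment, dtype=float)
    features = []
    for part in np.array_split(x, n_intervals):  # 5 x 24 s for a 120 s segment
        freqs, psd = welch_psd(part, fs, nperseg)
        for f0 in FEATURE_FREQS:
            features.append(psd[np.argmin(np.abs(freqs - f0))])  # nearest PSD bin
    return np.array(features)


def fit_minmax(X):
    """Per-feature minimum and maximum over the training patterns (rows of X)."""
    return X.min(axis=0), X.max(axis=0)


def scale_to_unit_range(X, lo, hi):
    """Map each feature linearly onto [-1, +1] using the training min/max."""
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant features
    return 2.0 * (X - lo) / span - 1.0
```

For a 120 s segment at 100 Hz (12,000 samples), `extract_features` returns a vector of length 30, ordered interval by interval. Fitting the scaling on the training set only, and reusing `lo` and `hi` for the other sets, keeps the validation data out of the design.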


2.2. Target data

Previously classified seismic data were collected from the PVAQ station of the IM seismic monitoring system. The data were classified by seismologists of the National Data Center (NDC) at IM. The seismic detector used at station level (e.g., at PVAQ) is a standard STA/LTA ratio-based detector, whose operation is outlined in Fig. 2. The input data are band-pass filtered to maximise sensitivity within a specific frequency band of interest and to reject noise outside this band. Averages of the absolute signal amplitude are computed over two user-defined time periods, a short-term average (STA) and a long-term average (LTA), and their ratio (STA/LTA) is computed at each sample point. If this ratio exceeds a user-defined threshold, a trigger is declared, and the system remains in the triggered state until the ratio falls below the threshold. These STA/LTA detectors (in use at several seismic stations) generally show very modest performance, i.e., they miss large numbers of seismic events and also produce several false alarms. However, a seismic network can drastically improve the overall performance by considering clusters of stations. The likelihood of noise events occurring in a given time interval at several stations is very small, which reduces the likelihood of false alarms. In addition, an event that is not detected by a particular station is likely to be detected by other stations of the group, which greatly increases the detection capability. Nevertheless, the automatic system at the NDC is always supervised by seismologists.
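The trigger logic described above can be sketched as follows. The sketch assumes trailing (causal) windows, plain arithmetic means of the absolute amplitude, and a single on/off threshold; the band-pass pre-filter is omitted, and the window lengths and threshold in the usage example are hypothetical values, not those of the PVAQ detector.

```python
import numpy as np


def sta_lta(x, fs, sta_s, lta_s):
    """STA/LTA ratio of |x| over trailing windows; 0 where the LTA window is not yet full."""
    a = np.abs(np.asarray(x, dtype=float))
    n_sta, n_lta = int(round(sta_s * fs)), int(round(lta_s * fs))
    c = np.concatenate(([0.0], np.cumsum(a)))  # c[i] = sum of a[:i]
    idx = np.arange(n_lta, len(a) + 1)         # exclusive end of each full window
    sta = (c[idx] - c[idx - n_sta]) / n_sta
    lta = (c[idx] - c[idx - n_lta]) / n_lta
    ratio = np.zeros(len(a))
    ratio[idx - 1] = np.divide(sta, lta, out=np.zeros_like(sta), where=lta > 0)
    return ratio


def triggers(ratio, threshold):
    """(on, off) sample pairs: triggered while the ratio stays above the threshold."""
    events, on = [], None
    for i, r in enumerate(ratio):
        if on is None and r > threshold:
            on = i                      # ratio exceeds threshold: declare a trigger
        elif on is not None and r < threshold:
            events.append((on, i))      # ratio fell below threshold: detrigger
            on = None
    if on is not None:                  # still triggered at the end of the record
        events.append((on, len(ratio)))
    return events


# Usage: constant background with a 2 s burst ten times stronger, sampled at 100 Hz
x = np.ones(6000)
x[4000:4200] = 10.0
events = triggers(sta_lta(x, fs=100, sta_s=1.0, lta_s=30.0), threshold=3.0)
```

Because the burst also raises the LTA, the ratio falls back below the threshold shortly after the burst ends, which closes the trigger. This self-normalisation is what makes the detector adapt to slowly varying background noise, and also why a single fixed threshold trades missed events against false alarms.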

2.3. Collected data

From the year 2007, 2903 examples were collected: 502 representing Class 1 (all the seismic events classified by the seismologists at the NDC for which seismic phases were identified in the PVAQ records), and 2401 classified as non-seismic (background noise). For the positive class, the PVAQ detection system misclassified 50% of the events, i.e., the number of true positives (TP) was 251 and the number of false negatives (FN) was also 251. Within the non-seismic class, 50% of the examples were randomly selected among events that triggered the detection system but were not classified as seismic by the NDC (i.e., the number of false positives, FP, was 1200), while the remaining examples were selected randomly among periods that neither coincided with events detected by the system nor were classified as earthquakes by the NDC (i.e., the number of true negatives, TN, was 1201). In this way, the automatic detection system of the station achieved sensitivity and specificity values (measures defined later) of 50% on the collected data. We assign to each Class 1 example the value +1, and to each negative example the value −1.

Fig. 1. (a) 120 s of seismic record and (b) spectrogram.

Fig. 2. Block diagram of a typical STA/LTA detector.
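As a quick check of the baseline figures, the sensitivity and specificity of Eqs. (8) and (9) in Section 3.1 can be computed directly from the confusion counts quoted above. The helper names are hypothetical.

```python
def recall(tp, fn):
    """Sensitivity (recall), Eq. (8): R = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Specificity, Eq. (9): S = TN / (FP + TN)."""
    return tn / (fp + tn)


# Confusion counts of the existing PVAQ STA/LTA detector on the 2007 sample
TP, FN, FP, TN = 251, 251, 1200, 1201

R = recall(TP, FN)        # 251 / 502
S = specificity(TN, FP)   # 1201 / 2401
errors = FN + FP          # total misclassifications on the 2903 examples

print(f"R = {R:.1%}, S = {S:.1%}, misclassifications = {errors}")
# prints: R = 50.0%, S = 50.0%, misclassifications = 1451
```

The 1451 misclassifications are the baseline against which the roughly 50 errors of the best MLPs in Section 3.1 are compared.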

2.4. Training methods

In this work, MLPs and SVMs were used as classifiers. The training method employed for MLPs is briefly described below; for more information, the reader is referred to, for instance, [26]. Input data are scaled, and the nonlinear parameters of the classifier are initialized with a stochastic procedure that does not exacerbate the condition number of the Jacobian matrix of the model. MLP parameter estimation is achieved by applying the Levenberg–Marquardt algorithm [27] to minimize a criterion that exploits the separability of the classifier parameters, since linear parameters are used in the output layer. This process is applied to the training data and terminates when a local minimum is found, or when the performance on another data set, denoted here as the test set, deteriorates. This is the well-known method of early stopping [28]. Because the test set is thus also used (indirectly, in the termination criterion) in the identification of the classifiers, the classifier performance is further assessed on a third data set, not involved in the design, denoted here as the validation set.

Support vector machines implement complex decision surfaces in terms of hyperplanes in high-dimensional spaces, and were originally introduced by Vapnik and co-workers [29]. Conceptually, SVMs can be seen as first mapping the training points, by a nonlinear function ϕ, to a high-dimensional space in which the patterns can be linearly separable. A separating hyperplane is then determined that maximizes its distance (typically denoted as the margin) from the projections of the training patterns. In SVMs, the large-margin hyperplane is determined by solving a constrained quadratic programming problem. Making use of Kuhn–Tucker theory, the Lagrangian

$$ L = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j d_i d_j K(\mathbf{x}_i, \mathbf{x}_j) \tag{4} $$

must be maximized with respect to the α_i, subject to the constraints

$$ \alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i d_i = 0 \tag{5} $$

In (4), the α_i are the Lagrange multipliers, N is the number of training patterns, x and d denote the input and target patterns, and K(x_i, x_j) are the inner-product kernels:

$$ K(\mathbf{x}_i, \mathbf{x}_j) = \sum_{z=1}^{m} \phi_z(\mathbf{x}_i)\, \phi_z(\mathbf{x}_j) \tag{6} $$

where m is the dimension of the feature space. Only the points that lie closest to the separating hyperplane have α_i > 0; these are called the support vectors. All other points have α_i = 0. The decision function can be written as

$$ f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i \in SV} d_i \alpha_i^{*} K(\mathbf{x}, \mathbf{x}_i) - \theta \right) \tag{7} $$

where the α_i^* are the solution of the constrained maximization problem, SV denotes the set of indices of the support vectors, and θ is the bias term. More detailed descriptions of SVMs can be found in, for instance, [28]. In this application, a Gaussian kernel was used, and the implementation described in [30] was employed. In the SVM training procedure, the data used as test and validation sets for the MLPs were merged into a single validation set.
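For illustration, Eqs. (4) and (7) can be evaluated with NumPy as below. The Gaussian kernel is written as K(x, y) = exp(−‖x − y‖²/(2σ²)); this parameterisation is an assumption, since the text does not state how the spread value used with the implementation in [30] enters the kernel. The sketch only evaluates the dual objective and the decision function for given multipliers; solving the constrained quadratic problem itself is left to a QP solver or an SVM library. Function names are hypothetical.

```python
import numpy as np


def gaussian_gram(X, Y, sigma):
    """Kernel matrix K[i, j] = exp(-||X_i - Y_j||^2 / (2 sigma^2)) (assumed form)."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))  # clip rounding below 0


def dual_objective(alpha, d, K):
    """Eq. (4): sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j d_i d_j K_ij."""
    ad = alpha * d
    return alpha.sum() - 0.5 * ad @ K @ ad


def svm_decide(X, sv, d_sv, alpha_sv, theta, sigma):
    """Eq. (7) for each row of X; a score of exactly 0 is mapped to +1 here."""
    scores = gaussian_gram(X, sv, sigma) @ (d_sv * alpha_sv) - theta
    return np.where(scores >= 0.0, 1, -1)


# Usage: two support vectors with targets +1 and -1 (the multipliers satisfy Eq. (5))
alpha = np.array([1.0, 1.0])
d = np.array([1.0, -1.0])
sv = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = svm_decide(np.array([[0.0, 0.1], [1.0, 0.9]]), sv, d, alpha, theta=0.0, sigma=0.5)
```

Only the support vectors enter Eq. (7), so the cost of classifying one 30-feature window grows with the number of support vectors (583 to 640 in Tables 4 and 5), not with the size of the training set.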

3. Training results

3.1. First experiment

The first experiment, with MLPs, was conducted by randomly assigning 60% of the data to the training set, 20% to the test set and 20% to the validation set, ensuring only that a similar percentage of positive cases was assigned to each set. The training set consisted of 1744 examples, with 307 positive cases; the test set had 582 examples, with 99 positive cases; and the validation set consisted of 577 examples, with 96 positive cases. In this first experiment, 20 different MLP topologies were tried, each with 20 different parameter initializations. Moreover, the use (or not) of the high-pass filter described before was tested, resulting in 800 different classifiers. The results are presented in terms of the sensitivity, or recall (R), criterion, defined as

$$ R = \frac{TP}{TP + FN} \tag{8} $$

and in terms of the specificity (S) criterion, defined as

$$ S = \frac{TN}{FP + TN} \tag{9} $$

where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively. The aim is to design classifiers that simultaneously achieve the best possible R and S values, so the design is cast as a multi-objective problem. In this case there is no single optimum; instead, a set of non-dominated (ND) solutions is obtained, i.e., solutions such that no other solution is better (larger, in this case) in all objectives. Moreover, as the division of the data into training, test and validation sets is random, we are looking for a classifier that performs well on each of the three sets separately and on the whole data. Tables 1–3 show the ND solutions found, considering R and S as criteria, computed for each individual data set.

Table 1
Training set.
Topology   R (%)    S (%)    R (all) (%)   S (all) (%)
[72]       91.86    99.23    92.43         99.25
[57]       96.74    96.66    96.81         97.17
[416]      96.09    97.56    94.82         97.88
[511]      93.49    98.96    93.43         98.96
[72]       94.79    98.12    95.62         98.25
[65]       94.46    98.75    94.62         98.88
[63]       92.18    99.10    93.23         99.13
[57]       95.44    98.05    95.82         98.50
[72]       96.42    97.49    96.61         98.00
[415]      92.51    99.03    93.23         99.04

Table 2
Test set.
Topology   R (%)    S (%)    R (all) (%)   S (all) (%)
[72]       96.97    99.38    94.22         98.71
[58]       98.99    98.96    94.82         97.33
[419]      95.96    99.79    93.82         98.04
[62]       94.95    100.00   94.82         98.79

Table 3
Validation set.
Topology   R (%)    S (%)    R (all) (%)   S (all) (%)
[62]       97.92    99.17    94.82         98.79
[72]       98.96    98.13    95.62         98.25
[63]       95.83    99.38    94.82         98.96
[72]       90.63    99.79    90.44         98.67

The first column of these tables denotes the number of neurons in the first and second hidden layers of the networks (note that the number of inputs equals the number of features, 30, and there is a single output). As 20 different trainings were conducted for each topology, with different initial parameters, equal entries in the topology column do not necessarily correspond to the same classifier (different initial weights typically result, after training, in networks with different final parameters). Only two classifiers are common to two tables: the 4th row of Table 2 and the 1st row of Table 3 correspond to the same classifier, as do the 5th row of Table 1 and the 2nd row of Table 3. Four of the classifiers in Table 1 and two of those in Table 3 were trained with filtered input data. The columns labelled R (all) and S (all) show the recall and specificity computed for the whole data (the union of the training, test and validation sets). If the same analysis is performed for the three data sets together (i.e., taking as criteria the sensitivity and the specificity for the training, test and validation sets, and determining the ND solutions with respect to these six criteria), we obtain the union of the ND solutions for the three data sets considered separately, plus a significant number of additional Pareto solutions; in total, 51 ND solutions were obtained. If, among these 51 solutions, the classifier is selected by the total number of misclassifications (both positive and negative) on the whole data, 3 models achieve the smallest number, 51, out of the 2903 examples. One of these three solutions is shown in the 3rd row of Table 3; the other two belong to the additional ND solutions.

The results can also be presented as Receiver Operating Characteristic (ROC) curves [31]. Figs. 3–5 present these results where, in every case, the ND solutions obtained for the corresponding data set are shown as red circles, and the ND solutions for the 6 criteria are shown as blue diamonds. We were therefore able to obtain classifiers with recall and specificity values above 95% (compared with the 50% obtained by the existing detection system), and with a total number of misclassifications of the order of 50, compared with about 1450 for the existing system. These results also show that the use (or not) of the high-pass filter did not produce any significant difference. Filtered data are used from now on.

Fig. 3. ROC for the training set (sensitivity vs. specificity; legend: solutions, ND Sol (Training), ND Sol (All)).

Fig. 4. ROC for the test set (same axes and legend as Fig. 3).

Fig. 5. ROC for the validation set (same axes and legend as Fig. 3).

3.2. Support vector machines

The design of SVMs does not need a test set (which was used for MLPs to perform early stopping). For this reason, for SVMs, 60% of the data were used for training and 40% as a validation set. Additionally, an approximate convex hull of the input data was obtained, and the examples that lie on the hull were integrated into the training set. To maintain an approximate 60%/40% split of the data between the training and validation sets, examples of the original training set were moved to the validation set. With SVMs, using a spread value of 0.237, the results presented in Table 4 were obtained. In the table, SVs denotes the number of support vectors, R and S the recall and specificity values for the training set, and R (all) and S (all) the same criteria applied to the whole data.

Table 4
SVM performance.
SVs   R (%)    S (%)    R (all) (%)   S (all) (%)
583   100.00   100.00   99.62         99.35

Subsequently, a form of active learning [32] was applied: the misclassified examples of the validation set were incorporated into the training set, and the same number of randomly chosen examples, provided they were not on the approximate convex hull previously determined, were moved from the training set to the validation set. This procedure was repeated three times, with the results presented in Table 5. This represents an almost perfect performance (only 2 misclassifications in the whole data). Compared with the off-line approaches presented in Section 1, the proposed approach clearly achieves the best performance, on a significantly large data set.

Table 5
SVM performance with active learning.
SVs   R (%)    S (%)    R (all) (%)   S (all) (%)
609   100.00   100.00   99.72         99.66
626   100.00   100.00   99.76         99.72
640   100.00   100.00   100.00        99.93

Fig. 6. Histogram of the seismic magnitudes for PVAQ (number of events vs. magnitude).
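Two steps of Section 3 can be sketched in plain Python: extracting the non-dominated set from a list of criterion vectors (all criteria maximised, as for R and S; the same function handles the six-criterion case), and one round of the active-learning exchange. For the latter, the sketch assumes, consistently with keeping the 60%/40% split, that the examples moved out are taken from the training set, and that examples on the approximate convex hull (passed in as `protected`; computing the hull itself is not shown) are never moved. All names are hypothetical.

```python
import random


def non_dominated(points):
    """Indices of criterion vectors not dominated by any other (all maximised)."""
    nd = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))      # q at least as good everywhere
            and any(qk > pk for qk, pk in zip(q, p))   # and strictly better somewhere
            for j, q in enumerate(points) if j != i
        )
        if not dominated:
            nd.append(i)
    return nd


def active_learning_swap(train, val, misclassified, protected, rng):
    """One round: move misclassified validation examples into the training set and
    move the same number of random, unprotected training examples to validation.
    Raises ValueError if there are too few unprotected training examples."""
    train, val = set(train), set(val)
    moved_in = set(misclassified) & val
    candidates = sorted(train - set(protected))  # hull examples must stay in training
    moved_out = set(rng.sample(candidates, len(moved_in)))
    return (train | moved_in) - moved_out, (val - moved_in) | moved_out


# Usage: (R, S) pairs of four candidate classifiers; the third is dominated by the second
nd = non_dominated([(0.90, 0.99), (0.95, 0.97), (0.93, 0.96), (0.97, 0.95)])
tr, va = active_learning_swap(range(6), range(6, 10), [7, 9], protected=[0, 1],
                              rng=random.Random(0))
```

The pairwise check is quadratic in the number of candidates, which is unproblematic for the 800 MLPs of Section 3.1; selecting among the ND set by total misclassifications, as done there, is then a simple minimum over that subset.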

4. Continuous operation

Owing to its superior performance, the SVM classifier was chosen as the seismic detector. It was subsequently applied in continuous operation to the seismic record for the year 2008. The 120 s window slides over the whole seismic record in steps of 50 samples (0.5 s). Each 120 s window is applied to the classifier, and its output is compared with the seismic catalogue, resulting in a label of TP, FP, TN or FN for that segment.

Fig. 7. Magnitude vs. decimal logarithm of the distance, for events not detected in PVAQ (+: explosions; •: earthquakes).
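The windowing used in continuous operation can be sketched as follows, assuming 100 Hz sampling (50 samples = 0.5 s). How a window is matched against the catalogue is not specified in the text; the rule below, which marks a window as positive when a catalogued P onset falls inside it, is a hypothetical choice made only for illustration.

```python
FS = 100          # samples per second (assumed: 50 samples = 0.5 s)
WIN = 120 * FS    # 120 s classification window, in samples
STEP = 50         # hop between consecutive windows (0.5 s)


def window_ends(n_samples, win=WIN, step=STEP):
    """Exclusive end index of every full window that fits in the record."""
    return range(win, n_samples + 1, step)


def catalogue_positive(end, onsets, win=WIN):
    """Hypothetical rule: positive if a catalogued P onset (sample index) lies in [end - win, end)."""
    return any(end - win <= t < end for t in onsets)


def outcome(predicted, actual):
    """Label one window as TP, FP, TN or FN."""
    if predicted:
        return "TP" if actual else "FP"
    return "FN" if actual else "TN"


# Usage: number of windows classified for one full day of continuous record
n_day = len(window_ends(86400 * FS))
```

Under these assumptions, a full day produces 172,561 overlapping windows, so even a small per-window false-positive probability matters; this is why the per-day specificity values in Figs. 8 and 10 are computed over window counts rather than event counts.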

4.1. Vaqueiros station

The SVM classifier was applied to the first 256 days of 2008. As the station was not working correctly on Julian days 3, 200 and 201, these days were not considered. During the resulting period of 253 days, 1545 seismic events related to this station were found in the Portuguese Catalogue. Of these, 964 were classified as local and 581 as regional events; 638 were tectonic events and the remaining 907 were explosions. Fig. 6 shows the histogram of the magnitudes of the seismic events within the period considered; the minimum magnitude is 0.1 and the maximum is 4.7. Of the 1545 events, the classifier failed to detect 11. Fig. 7 shows, for those events, the magnitude vs. the decimal logarithm of the distance (in km) to the hypocenter. The red '+' symbol denotes explosions, while the black dot denotes earthquakes. As can be seen, the undetected events with large magnitude have hypocenters far from the station. Fig. 8 shows the recall (R) and specificity (S) computed for each day within the 253 days considered. On Julian day 125 no seismic events occurred and no false negatives were obtained; for this reason, this day is not considered. Computing R and S over the whole period gives values of 97.7% and 98.7%, respectively. Compared with the performance of the approaches applied to continuous operation described in Section 1, the proposed approach clearly achieves the best results, over a much longer period and a much wider magnitude range of seismic events.

Fig. 8. Values of (a) recall and (b) specificity for PVAQ, per Julian day.

Fig. 9. Histogram of the seismic magnitudes for PESTR (number of events vs. magnitude).

4.2. Estremoz station
The SVM classifier trained with the 2007 data from the Vaqueiros station was applied in continuous operation to the first 185 days of 2008 of the Estremoz station (PESTR). Fig. 9 shows the histogram of the magnitudes of the seismic events in this period. The minimum magnitude was 0.5, and the maximum 4.7.

During this period there were 535 events, 272 local and 263 regional; 226 were earthquakes and 309 were explosions. The SVM classifier failed to detect 71 of the 535 events. Fig. 10 shows the recall and specificity values computed for each day. On Julian days 3, 7, 14, 17, 20, 26, 43, 47, 48, 76, 96, 118, 122, 125, 132 and 181 there were neither recorded seismic events nor classifier false negatives, and on Julian days 50, 89, 155 and 159 no seismic events were recorded; these days are therefore not represented in the graph. Fig. 11 illustrates the characteristics of the events not detected by the classifier, using the same conventions as Fig. 7. The R and S values for the whole period are 88.4% and 99.4%, respectively. Given the nearly perfect S values at both stations, it can be concluded that the SVM classifier learnt to separate seismic events from background noise. The lower R value for PESTR is not surprising, as the classifier was trained with positive examples from PVAQ. The geophysical properties of the soil at the two stations should also be very different, since PVAQ is near the coast and PESTR is inland; the distance between the two stations is nearly 200 km.
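Continuous operation, as used in this section, amounts to sliding the classifier's input window along the record. The minimal sketch below assumes a 100 Hz sampling rate (implied by the 50-sample = 0.5 s step mentioned in the conclusions), the 120 s segment length used for training, and a generic `classify` callable standing in for the trained SVM plus its feature extraction. It also measures detection time the way Section 5 defines it: the last instant of the first detecting window minus the catalogue P-phase onset. `detection_time` and `toy_svm` are hypothetical, and restricting the search to windows that contain the onset is an assumption of the sketch.

```python
from typing import Callable, Optional, Sequence

FS_HZ = 100         # assumed sampling rate: 50 samples correspond to 0.5 s
WINDOW_S = 120      # segment length used to train the classifier
STEP_SAMPLES = 50   # sliding step (0.5 s at 100 Hz)


def detection_time(
    signal: Sequence[float],
    p_onset_s: float,
    classify: Callable[[Sequence[float]], bool],
    fs: int = FS_HZ,
    window_s: float = WINDOW_S,
    step: int = STEP_SAMPLES,
) -> Optional[float]:
    """Seconds from the P onset to the last sample of the first window that
    contains the onset and is classified as an event; None if none fires."""
    n = int(round(window_s * fs))
    onset = int(round(p_onset_s * fs))
    # Earliest start on the step grid whose window still reaches the onset.
    first = max(0, onset - n + 1)
    start = -(-first // step) * step  # round up to a multiple of step
    while start <= onset and start + n <= len(signal):
        if classify(signal[start:start + n]):
            last_sample = start + n - 1
            return (last_sample - onset) / fs
        start += step
    return None


def toy_svm(window: Sequence[float]) -> bool:
    """Stand-in for feature extraction + SVM: fires on any large amplitude."""
    return max(abs(x) for x in window) > 0.5


# Toy record: 150 s of silence followed by 50 s of "event" samples.
record = [0.0] * 15000 + [1.0] * 5000
print(detection_time(record, 150.0, toy_svm))  # 0.49
```

The toy classifier fires as soon as one event sample enters the window, so the delay reflects only the 0.5 s grid; a real SVM needs enough of the event inside its 120 s window before it responds, which is why the measured averages in Section 5 are much longer.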

Fig. 10. (a) Recall and (b) specificity values for PESTR (x-axis: Julian days).

Fig. 11. Magnitude vs. decimal logarithm of the distance, for events not detected in PESTR (+: explosions; dot: earthquakes).

5. Detection time
The classifier was trained with 120 s segments in which, for the positive cases, the P phase was at the beginning of the segment. In continuous operation, the classifier should therefore be expected to behave in the same way, i.e., a pick-up time of the order of 120 s should be expected. Fig. 12 shows the histogram of the time (in s) from the P-phase onset marked in the Portuguese catalogue to the instant when the seismic event is detected by the SVM classifier, for the whole year of 2008 for the PVAQ station and for the first 185 days of 2008 for PESTR. Note that the detection time is computed only as the difference between the last time instant of the sliding window in which the event is detected and the P-phase onset marked in the catalogue; it does not include pre-processing (i.e., feature computation) or SVM computation. These computation times are, however, negligible, as their sum is of the order of milliseconds. The average detection times for PVAQ and PESTR were 88 s and 110 s, respectively. These times are too long for use in an EWS. To decrease them, less information about the actual seismic event should be used, and more information about the transition between background noise and the event should be employed. The feature extraction method was therefore changed to overlapping windows of small and medium duration. Fig. 13 illustrates the implemented scheme. As explained in Section 2.1, five windows are used, and in each window the PSD is computed and smoothed, and 6 power values are taken at the same frequencies. The only difference lies in the windows considered. The smallest window has a duration of 3 s and encompasses the P-phase onset. The other windows have durations of 5, 10, 20, 40 and 100 s, all of them terminating at the instant where the smallest window ends.

5.1. Results with PVAQ detector
As the input features changed, the classifier had to be retrained. After repeating the training process described in Section 3.2, with the same training and validation data, the R (all) and S (all) values in Table 4 were 97.8% and 98.6%, respectively. These values are slightly worse than those obtained with the original set of features, but still very good, demonstrating that the information contained in the new set of features is sufficient to design a reliable detector. The active learning procedure was then applied, yielding a perfect classifier for the complete data set. Subsequently, the detector was applied in continuous operation to the whole 2008 year of the Vaqueiros station. The R and S values obtained for this period were 95.3% and 98.4%, respectively. These should be compared with the 97.7% and 98.7% obtained by the first detector, with the original set of features: the recall is slightly worse, while the specificity is essentially the same. The main difference, however, lies in the detection time, shown in Fig. 14: the average detection time was reduced from 88 s to just 1.3 s. An equivalent reduction is found when applying this SVM classifier to the Estremoz 2008 data, where the average detection time drops from 110 s to 1.8 s. The R and S values obtained for the PESTR station were 89.7% and 91.2%, respectively, whereas with the first set of features the corresponding values were 88.4% and 99.4%. This means that the rate of false alarms increased (1 − S rose from 0.6% to 8.8%), while an equivalent number of correct detections was obtained.

5.2. Results with PESTR detector

In the previous sections, it was shown that, with the detectors trained with PVAQ data, the off-line and on-line application to PESTR data produced excellent results, although worse than those obtained with PVAQ data. This occurred with both sets of features. We hypothesised that results equivalent to those obtained for PVAQ could be achieved if a detector were trained for PESTR with Estremoz data. To test this hypothesis, we collected 3606 seismic segments from the 2009 PESTR data, 1012 representing seismic events and 2594 containing background noise. Among the latter, around half correspond to FP and the others to TN. For the former, the whole set of station detections (TP and FN) was considered. With these data, we trained a classifier using the scheme presented in Section 3.2 and the EW input features. The R (all) and S (all) values in the first line of Table 4 were 94.4% and 99.4%, respectively. At the end of the third iteration of the active learning procedure, a perfect classifier for the complete data set was obtained. Subsequently, the detector was applied in continuous operation to the first two months of 2008 of the Estremoz station. The R and S values obtained for this period were 91.7% and 97.1%, respectively. The specificity is similar to the one obtained for PVAQ, and the recall is slightly worse. Of the 210 seismic events that occurred during this period, the SVM classifier failed to detect only 3. These are events with small magnitude (1.3, 1.4 and 2.0) and/or large hypocentral distance (242, 164 and 297 km, respectively). Comparing Figs. 14b and 15, it is clear that the detection time was reduced: the average detection time is 1.3 s, the same value obtained for PVAQ data with the PVAQ detector.

Fig. 12. Detection time for (a) PVAQ and (b) PESTR (x-axis: time in s; y-axis: number of events).

Fig. 13. Windows scheme for EW feature extraction.

Fig. 14. Detection time for (a) PVAQ and (b) PESTR – EW approach (x-axis: time in s; y-axis: number of events).
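The EW feature windows of Fig. 13 are nested and all end at the same instant, so they are simple to index. The sketch below assumes 100 Hz sampling and uses the window durations listed in Section 5 (3, 5, 10, 20, 40 and 100 s). It returns the sample ranges of the windows and builds a feature vector from a caller-supplied `psd_features` function, which should return the 6 smoothed PSD values of a segment; that function, like `ew_windows` and `ew_feature_vector`, is hypothetical and stands in for the PSD step of Section 2.1, which is not reproduced here. Since the text only says the 3 s window encompasses the P onset, the sketch takes the common end sample as its input.

```python
from typing import Callable, List, Sequence, Tuple

FS_HZ = 100                                 # assumed sampling rate
EW_DURATIONS_S = (3, 5, 10, 20, 40, 100)    # window lengths listed in the text


def ew_windows(end: int, fs: int = FS_HZ,
               durations_s: Sequence[float] = EW_DURATIONS_S) -> List[Tuple[int, int]]:
    """Half-open sample ranges [start, end) of nested windows that all end at `end`."""
    ranges = []
    for d in sorted(durations_s):
        n = int(round(d * fs))
        if n > end:
            raise ValueError(f"a {d} s window would start before the record")
        ranges.append((end - n, end))
    return ranges


def ew_feature_vector(signal: Sequence[float], end: int,
                      psd_features: Callable[[Sequence[float]], Sequence[float]],
                      fs: int = FS_HZ) -> List[float]:
    """Concatenate the per-window PSD values, shortest window first."""
    features: List[float] = []
    for start, stop in ew_windows(end, fs):
        features.extend(psd_features(signal[start:stop]))
    return features


print(ew_windows(10000))
# [(9700, 10000), (9500, 10000), (9000, 10000), (8000, 10000), (6000, 10000), (0, 10000)]
```

Because the short windows share their end point with the long ones, the most recent few seconds of the record appear in every window, which is how the scheme emphasises the noise-to-event transition rather than the body of the event.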

Fig. 15. Detection time for PESTR – EW approach (PESTR detector) (x-axis: time in s; y-axis: number of events).

6. Conclusions
We have shown that an SVM detector, trained with data in which the convex hull samples are guaranteed to lie in the training set, together with an active learning scheme, achieves perfect off-line performance. In continuous operation, this detector achieves the best performance among the different approaches found in the literature. With a different window scheme, the SVM detector maintains its excellent performance while significantly reducing the time taken for seismic detection. Note that the actual average detection time will be less than 1.3 s (the value obtained for PVAQ and PESTR) because, to save computational time, the window was slid along the seismic record in steps of 50 samples (0.5 s). We expect that, if the SVM detector were applied at every sample of the seismic record, the average detection time would be around 1 s, a value small enough for use in an early-warning system. We are currently training SVM detectors for the other seismic stations of the Southern Seismic Network of Portugal. As in the automatic detection system in place at IM, an event will only be declared when a certain number of consistent detections is achieved. We expect the superior performance of each station detector to translate into an improved performance at the NDC level.
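The expectation of roughly 1 s can be checked with a short calculation. Assume (this is an assumption, not stated in the text) that the first sample at which a per-sample detector would fire is equally likely to fall at any of the 50 offsets within a 0.5 s step. A scan on the 50-sample grid then fires (50 − k) mod 50 samples later, so the mean extra delay is 24.5 samples, i.e. 0.245 s at 100 Hz, which brings the 1.3 s average down to about 1.05 s. `mean_grid_lag_s` is a hypothetical helper written for this check.

```python
def mean_grid_lag_s(step_samples: int = 50, fs_hz: float = 100.0) -> float:
    """Mean extra delay (s) of evaluating the detector every `step_samples`
    samples instead of every sample, assuming the per-sample firing point
    is uniformly distributed over the offsets within one step."""
    # Firing k samples after a grid point (k = 0 .. step-1) is only seen at
    # the next grid point, i.e. (step - k) % step samples later.
    lags = [(step_samples - k) % step_samples for k in range(step_samples)]
    return sum(lags) / len(lags) / fs_hz


lag = mean_grid_lag_s()
print(round(lag, 3))          # 0.245
print(round(1.3 - lag, 3))    # 1.055
```

The estimate only removes the grid quantisation; it says nothing about how early the per-sample detector itself would respond, so it supports, rather than proves, the expected figure of about 1 s.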


A.E. Ruano was born in 1959 in Espinho, Portugal. He received the first degree in electronic and telecommunications engineering from the University of Aveiro, Portugal, in 1982, the M.Sc. in electrotechnical engineering from the University of Coimbra, Portugal, in 1989, the Ph.D. degree in electronic engineering from the University of Wales, UK, in 1992, and the aggregation degree in electronic engineering and computing from the University of Algarve, Portugal, in 2004. In 1992 he joined the Department of Electronic Engineering and Computing of the Faculty of Sciences and Technology of the University of Algarve, where he is currently an associate professor with aggregation in Automatic Control. His research interests lie in computational intelligence, intelligent control, intelligent signal processing and wireless sensor networks. He has over 220 research publications, is an associate editor and reviewer for several journals, and is the Chair of the IFAC Technical Committee of Computational Intelligence in Control Systems.

G. Madureira is a senior technician of the Portuguese Institute of Ocean and Atmosphere. He is at the moment working in the field of artificial neural networks applied to seismology, namely, in automatic detection of seismic events and phase identification using artificial neural networks. His research interests lie in digital signal processing, electronics, telecommunications and neural networks.


O. Barros is an M.Sc. student in electronic and telecommunications engineering at the University of Algarve. He currently works under the supervision of Prof. A.E. Ruano and Eng. Guilherme Madureira, continuing the development of the computational-intelligence-based approach to seismic event detection. His research interests include machine learning, neural networks, computational intelligence and decision-support systems.

H.R. Khosravani is a Ph.D. student of artificial intelligence at the University of Algarve. He is currently working on ‘Artificial Neural Network Models: Data Selection and Model Adaptation’. He received his B.Sc. in software engineering from the Islamic Azad University, Shiraz, Iran, in 2002 and his M.Sc. in software engineering from the Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2010. His research interests lie in data selection, model adaptation, model predictive control, neural networks and data quality mining.


P.M. Ferreira received a five-year engineering degree in systems engineering and computing (1996) and a Ph.D. degree in electronics engineering and computing (2008) from the University of Algarve, Faro, Portugal. From 1997 to 08/2012 he was a student assistant, teaching assistant, assistant professor and research fellow in different schools and research units of the University of Algarve. From 04/2007 to 10/2008 he was a Marie Curie Research Fellow at Unilever Research and Development Port Sunlight, United Kingdom. He is currently an assistant professor at the Department of Informatics of the Faculty of Sciences of the University of Lisbon, where he is a member of the LaSIGE research centre in the research line on Timeliness and Adaptation in Dependable Systems. His main research interests are computational/artificial intelligence methods, artificial neural networks, process modelling, complex event processing, control systems, and prediction/forecasting methods. He is a member of the International Federation of Automatic Control (IFAC) Technical Committee for Computational Intelligence in Control.

M.G. Ruano received her Ph.D. in electronic engineering from the University of Wales, United Kingdom, in 1992. She started her professional activity in 1982 at the University of Aveiro and joined the Electronics Engineering and Informatics Department of the University of Algarve, Portugal, in 1992, where she currently holds the position of associate professor with aggregation. Her research interests are in modelling and processing nonlinear systems, particularly bio-signals and systems, with emphasis on ultrasound diagnostic and therapeutic applications. She is a reviewer for several scientific journals.

