Dipl.-Ing. Ahmad Haj Mosa

Cellular Neural Networks (CNN) Based Robust Classification Under Difficult Conditions With Selected Applications In Intelligent Transportation Systems

Doctoral Dissertation (Technische Wissenschaften)
Alpen-Adria-Universität Klagenfurt
Fakultät für Technische Wissenschaften

Mentor & 1st Evaluator: Univ.-Prof. Dr.-Ing. Kyandoghere Kyamakya, Alpen-Adria-Universität Klagenfurt (Klagenfurt, Austria), Fakultät für Technische Wissenschaften

2nd Evaluator: Apl. Prof. Dr. habil. Zhong Li, FernUniversität in Hagen (Hagen, Germany), Fakultät für Mathematik und Informatik

Affidavit

I hereby declare in lieu of an oath that

- the submitted academic paper is entirely my own work and that no auxiliary materials have been used other than those indicated,

- I have fully disclosed all assistance received from third parties during the process of writing the paper, including any significant advice from supervisors,

- any contents taken from the works of third parties or my own works that have been included either literally or in spirit have been appropriately marked and the respective source of the information has been clearly identified with precise bibliographical references (e.g. in footnotes),

- to date, I have not submitted this paper to an examining authority either in Austria or abroad and that

- the digital version of the paper submitted for the purpose of plagiarism assessment is fully consistent with the printed version.

I am aware that a declaration contrary to the facts will have legal consequences.

(Signature)

(Place, date) Klagenfurt, 27.06.2016

© Alpen-Adria-Universität Klagenfurt, Studien- und Prüfungsabteilung, version 2015-02-20

This thesis work is dedicated to my parents for their endless love and support. To the soul of my mother: her words of encouragement and love still ring in my ears even though she is gone, giving me reassurance and peace. To my father, the greatest man I have known, whose words of motivation carry me forward all the time to achieve my goals; I always look up to you, you are my inspiration. To my wife, who has been so supportive during this journey; her love and patience enabled me to carry out the research and complete the thesis successfully. And to our son, who brings joy and love to my life. To my sweet, loving sisters, who have always been supportive and encouraged me to follow my dreams and goals. To my exceptional and beloved family-in-law.

Acknowledgements

The research conducted for this thesis is the essence of my scientific career. It marks a major milestone for me as a student, and the comprehensive research conducted for the thesis has permitted me to absorb, study and understand various theories, concepts and techniques which I would not have understood or explored if this study had not taken place. I am grateful to the several individuals who have supported me throughout the research process and without whom I would not have been able to complete the thesis. The first and most important person I would like to thank is my supervisor Univ.-Prof. Dr.-Ing. Kyandoghere Kyamakya, who proved to be an inspiration and an excellent guide throughout my research and thesis. The motivation, support, and guidance he provided during the various stages of research and the different parts of the thesis were invaluable; I literally could not have completed this thesis without his support. I am grateful to my research teammates Fadi Al Machot and Mouhannad Ali for their guidance and support throughout the research process. I am also thankful to all my friends and colleagues (Hasan Yacoub, Alireza Fasih, Saeed Yahyanejad, Mouhannad Ali, Patrik Grausberg and Annemarie Korenjak) who have supported and motivated me in all aspects of my life, including the research and thesis-writing process.

Abstract

The most prominent challenges (abbreviation for Challenge x: Chl.x) for prediction in general, and particularly for classification endeavors in the frame of data mining and machine learning, are related to the six following contexts, which may be encountered either separately or partially or fully combined: Chl.CI: strong class imbalance; Chl.CO: strong class overlap (resulting in high nonlinearity in prediction); Chl.ON: outliers and noise; Chl.LC: learning complexity; Chl.SP: the presence of some (external/internal) coupling/modulating stochastic process; and Chl.LG: low out-of-sample performance (low generalization). It is well known that all these classification challenges constitute serious obstacles to applying machine learning in real-world domains. In this doctoral thesis, the classification problem under difficult conditions is addressed in a series of practical situations of real-world data processing in intelligent transportation systems: (A) Case 1: car/truck detection using a single presence sensor - here, we have a classification endeavor where challenges Chl.CI, Chl.CO, Chl.LC and Chl.LG are faced simultaneously; (B) Case 2: drop detection - here, we have a classification task embedded in a complex low-level image processing context for visual advanced driver assistance systems (ADAS), where challenges Chl.CO, Chl.ON, Chl.LC and Chl.LG are the most prominent; and (C) two cases (3 and 4) where classification is used in the context of a prediction problem (related either to a time-series forecast or to a transfer-function-like discrete-time prediction problem), namely, for Case 3, a traffic flow prediction case at a road traffic junction, which is needed in the frame of a local traffic light control, and, for Case 4, a business time-series prediction case. In Cases 3 and 4, challenges Chl.CO, Chl.SP and Chl.LG are faced.
For all the considered complex and extremely challenging practical situations, a novel Cellular Neural Network (CNN) based comprehensive classification and/or prediction concept has been developed and validated. Hereby, the appropriate features selection has been efficiently conceptually integrated. The obtained performance results are excellent regarding both classification performance and the practical computational requirements. The superiority of the CNN based classifier/predictor concept is strongly demonstrated, as it clearly outperforms all known competing related approaches from the literature (e.g. RBF-SVM, decision trees, artificial neural networks, naive Bayes, some kernel approaches, some hybrid concepts, etc.). The results obtained during the research of this doctoral thesis have already been partially published, or related submissions are under review, in the following publication outlets: one already granted EU patent (2014); one submitted German patent (under review since 2015); one journal paper submission (under review) in the highly ranked (Class A+) journal Transportation Research Part C; one published journal paper in the Real-time Image Processing Journal; one journal paper submission (under review) in the highly ranked (Class A+) journal IEEE Transactions on Neural Networks and Learning Systems; and one book chapter (published).

Contents

List of Figures
List of Tables

1 Introduction
   1.1 General thesis objectives and research questions
   1.2 Overall research methodology and brief presentation of the novel neuro-computing (CNN) based concepts
   1.3 Short presentation of the main case studies
   1.4 A comprehensive summary of the quintessence of the results obtained in this thesis
   1.5 Quality and significance of the results, dissemination status
   1.6 Thesis outlook: short descriptions of the remaining thesis chapters
       1.6.1 Consider Chapter 2 (State-of-the-art review)
       1.6.2 Consider Chapter 3 (Case Study 1: Truck detection involving a single presence sensor)
       1.6.3 Consider Chapter 4 (Case Study 2: Raindrop detection for visual sensors in ADAS)
       1.6.4 Consider Chapter 5 (Case Studies 3 and 4: Time-series forecast)
       1.6.5 Consider Chapter 6 (Conclusion and outlook)

2 Some fundamentals and a review of the state-of-the-art
   2.1 Classification under difficult conditions: state-of-the-art review
   2.2 Critical summary of the state-of-the-art represented by the above provided representative sample of the related literature
   2.3 Cellular neural networks basics
       2.3.1 CNN based classification in the current literature: principles, pros and cons
       2.3.2 CNN use for a black-box systems science
   2.4 Echo state network
   2.5 Support vector machine with radial basis function

3 Case Study 1: Truck detection
   3.1 Background and motivation
       3.1.1 Presence detectors
       3.1.2 Truck detection
       3.1.3 Classification in presence of imbalanced and overlapping classes
   3.2 Related works for truck detection
   3.3 The proposed truck detection process
   3.4 Features selection and extraction
       3.4.1 Vehicle occupancy time occtar
       3.4.2 First derivatives fd and second derivatives sd
       3.4.3 Linear divergence ld
   3.5 The novel cellular neural networks based classification concept
       3.5.1 SRB-CNN templates design
       3.5.2 Our novel soft radial basis cellular neural network based concept
   3.6 Other classification methodologies and competing concepts
       3.6.1 Artificial neural network
       3.6.2 Naive Bayes
       3.6.3 Decision tree
   3.7 Data collection and preparation, training and classifiers implementation
       3.7.1 Data collection
       3.7.2 Features evaluation
       3.7.3 Appropriate training and implementation for the SRB-CNN classifier
       3.7.4 Appropriate training and implementation for the other classifiers besides SRB-CNN
   3.8 Comparative performance evaluation of all considered classifiers
       3.8.1 Evaluation of all considered classifiers
   3.9 Chapter summary

4 Case Study 2: Drop detection
   4.1 The modification of CNN through using RBF-SVM (radial basis function support vector machines)
       4.1.1 Use of support vectors as the CNN control templates
       4.1.2 The use-cases for using support vectors as CNN templates
       4.1.3 General description of the proposed approach for rain-drop detection
           4.1.3.1 Pre-processing phase
           4.1.3.2 SVM training in the off-line phase
           4.1.3.3 Online rain-drop detection
   4.2 Experimental setup and results obtained
       4.2.1 Performance and evaluation
   4.3 Chapter summary

5 Case Studies 3 & 4: Time-series forecast
   5.1 Background and motivation
   5.2 Related works
   5.3 Background knowledge
       5.3.1 The two-systems model of cognitive processes
       5.3.2 Model reference neural network adaptive control (MRNNAC)
   5.4 OSA-CNN
       5.4.1 OSA-CNN data preparation
       5.4.2 OSA-CNN model
   5.5 OSA-CNN learning
       5.5.1 Echo state cellular neural network
       5.5.2 I-LR learning
       5.5.3 OSA-CNN recursive learning
   5.6 Evaluation results
       5.6.1 Traffic flow time series
           5.6.1.1 Simulation and training setup
           5.6.1.2 Performance metrics
           5.6.1.3 Results
       5.6.2 NN3 time series
           5.6.2.1 Simulation and training setup
           5.6.2.2 Performance metrics
           5.6.2.3 Results
   5.7 Chapter summary

6 Conclusions

Bibliography

List of Figures

2.1 The cellular neural network architecture as provided in Chua and Yang [1988b]. The state model of a single cell is also given.

2.2 The echo state network architecture. Input, reservoir and output layers are presented. The input layer feeds the reservoir, which has random internal connections between its inner cells. The reservoir feeds the output layer, which also feeds back into the reservoir.

3.1 A scatter plot of truck-like vs. passenger-car samples at red light. x1, x2, x3, x4 represent the four proposed features; see Section 3.4.

3.2 Class imbalance between car at green (GC), truck at green (GT), truck at red (RT) and car at red (RC).

3.3 The three-classifiers scheme.

3.4 Figures (a) and (b) present the first two classes: a passenger car and a truck-like vehicle, respectively, standing still in front of a stop sign and over an inductive loop (red traffic-light phase). Figures (c) and (d) present the third and fourth classes: a passenger car and a truck-like vehicle passing over an inductive loop without any hindrance at their free-flow speed (green traffic-light phase).

3.5 Figure (a) shows the occupancy difference pattern of a car-in-the-middle pattern. Figure (b) shows the occupancy difference pattern of a truck-in-the-middle pattern. a1 and a2 indicate the related forward and backward slopes, which are correlated to the first and second derivatives.

3.6 Figure (a) shows the occupancy time orthogonal divergence ld of a car-in-the-middle pattern. Figure (b) shows the ld of a truck-in-the-middle pattern.

3.7 Linear divergence scenario of a truck-like vehicle (bigger ld values) versus a passenger car (smaller ld values).

3.8 The novel soft radial basis cellular neural network (SRB-CNN) architecture suggested in this case study.

3.9 The SRB-CNN template optimization process: general principle. u is the input, ν is the actual output, ν̂(T) is the SRB-CNN output at the equilibrium.

3.10 The SRB-CNN template optimization process (PSO based): related block diagram.

3.11 Figure (a) shows the state oscillation of the 25 inner cells and their convergence at time t=5. Figure (b) shows the state output oscillation of the 25 inner cells and their convergence at time t=5. Figure (c) shows the corresponding oscillation of the global output cells (in this example).

3.12 The video-signal synchronization tool.

4.1 The RBF-SVM-CNN architecture as suggested in this case study. In the image plane, each pixel contains the extracted feature from the corresponding original image pixel. The red pixel is the tested pixel; the green pixels are the 3×3 neighbors. The CNN plane contains the CNN states/pixels. As shown, the tested pixel in the CNN plane (in red) is connected to all 3×3 neighbors in the image plane. A graphical representation of the RBF-SVM-CNN is also presented, in which u1…un represent the pixels on the image feature plane and y1…yn represent the corresponding CNN outputs.

4.2 The raindrop detection results of various experiments, where our SVM-CNN method has been using different sets of features.

5.1 The OSA-CNN framework. The Intuitive-System takes the historical values to predict the future value of a time series. The Controller-System reads and compares the historical values with the corresponding prediction values and manipulates the Intuitive-System output to improve the future performance.

5.2 (a) An example of a time series vk0(t) with two marked records vk0(t) and vk0(t−1). (b) The transformation of vk0(t) into square pulses (red solid) and a smooth reference (brown dotted) series constructed by solving (5.1) using the Matlab ode113 solver. It also shows the corresponding new positions of vk0(t) and vk0(t−1) from (a), represented by vrk0(ti) and vrk0(ti−∗) respectively, where ∗ is the interpolation size, i.e. the number of interpolation points.

5.3 The OSA-CNN four-layer model. Two modes are presented: the online operating mode (solid-line connections) and the offline training mode (dotted lines). In the operating mode, the first layer is the Controller CNN (C-CNN), connected to the controller linear layer C-LR; the third layer is the Intuitive CNN (I-CNN), connected to the intuitive linear layer I-LR. The controller output u(ti) adapts the I-LR biases with respect to the difference between the real historical values vsk1(ti)..vskh(ti) and the corresponding predicted values v̂rk1(ti)..v̂rkh(ti). In addition, four blocks visualize the models of the C-CNN cell state, the I-CNN cell state, C-LR and I-LR. In the offline training mode, PSO reads the prediction error and adapts the C-LR weights accordingly.

5.4 (a) An example of OSA-CNN in operation mode. The signal in red is the ground-truth square-pulse series; the signal in blue is the OSA-CNN output, which includes multiple prediction values; we read out the last predicted value and consider it the best one. (b) The corresponding OSA-CNN controller output, showing the interference of the Controller-System to improve the prediction performance.

5.5 The OSA-CNN traffic flow prediction using detector 1006210 from the PeMS dataset. It presents the OSA-CNN predicted values (in blue) compared with the ground-truth traffic flow (in red) of the first two days of November 2009 (the 1st is a Sunday (weekend) and the 2nd is a Monday (working day)) using different aggregation levels.

5.6 The performance of OSA-CNN versus I-CNN for different aggregation levels.

5.7 The performance propagation of OSA-CNN for up to 6 future steps.

5.8 The OSA-CNN vs. I-CNN performance on the 111 NN3 time series for the 18-step prediction case. (a) The SMAPE with respect to the time-series index, showing how OSA-CNN improved the results of the I-CNN over all time series. (b) The box distribution of SMAPE over the 111 series.

5.9 The OSA-CNN vs. I-CNN performance on the 111 NN3 time series for the one-step prediction case: (a) the SMAPE with respect to the time-series index, showing how OSA-CNN improved the result of the I-CNN over all time series; (b) the box distribution of SMAPE over the 111 series.

5.10 The comparison of OSA-CNN against the related state-of-the-art methods over the 111 time series of the NN3 time-series competition. OSA-CNN proved to be a robust TSF algorithm with 13.18 % compared to the other methods.

List of Tables

3.1 The different relative loop positions.

3.2 The selected data distribution.

3.3 The features evaluation. The range is between 0 (bad) and 1 (perfect).

3.4 The K-Means configuration parameters.

3.5 The PSO configuration parameters.

3.6 The Weka classifiers configuration parameters.

3.7 Scenario 1 evaluation.

3.8 Scenario 2 evaluation.

3.9 Scenario 3 evaluation.

4.1 Performance measures and the average computation times (510×585 pixels on GPU) of HSV gradient, HOG, HSV & HOG and edge features.

5.1 The RPSO configuration parameters. These configurations are used in all of the studied TSF cases.

5.2 Reference model configurations. Parameters are selected empirically.

5.3 The performance of OSA-CNN and I-CNN versus other models, with 3-min aggregation level. The data used in this evaluation are the measured traffic data provided by the PeMS dataset for detector 1006210, collected between October 1, 2009 and November 30, 2009. The results of the other methods are taken from Chen et al. [2012].

5.4 The performance of OSA-CNN and I-CNN versus other models, with 5-min aggregation level. The data used in this evaluation are the measured traffic data provided by the PeMS dataset for detector 1006210, collected between October 1, 2009 and November 30, 2009. The results of the other methods are taken from Chen et al. [2012].

5.5 The performance of OSA-CNN and I-CNN versus other models, with 10-min aggregation level. The data used in this evaluation are the measured traffic data provided by the PeMS dataset for detector 1006210, collected between October 1, 2009 and November 30, 2009. The results of the other methods are taken from Chen et al. [2012].

5.6 The performance of OSA-CNN and I-CNN versus other models, with 15-min aggregation level. The data used in this evaluation are the measured traffic data provided by the PeMS dataset for detector 1006210, collected between October 1, 2009 and November 30, 2009. The results of the other methods are taken from Chen et al. [2012].

5.7 The performance of OSA-CNN and I-CNN with 30-min and 60-min aggregation levels. The data used in this evaluation are the measured traffic data provided by the PeMS dataset for detector 1006210, collected between October 1, 2009 and November 30, 2009.

5.8 The multi-step performance of OSA-CNN over short- and long-term prediction.

5.9 The number of NN3 forecasting competition time series with respect to the length and seasonality of each time series.

Chapter 1

Introduction

The most prominent challenges (abbreviation for Challenge x: Chl.x) for prediction in general, and particularly for classification endeavors in the frame of data mining and machine learning, are related to the six following contexts, which may be encountered either separately or partially or fully combined: Chl.CI: strong class imbalance; Chl.CO: strong class overlap (resulting in high nonlinearity in prediction); Chl.ON: outliers and noise; Chl.LC: learning complexity; Chl.SP: the presence of some (external/internal) coupling/modulating stochastic process; and Chl.LG: low out-of-sample performance (low generalization). It is well known that all these classification challenges constitute serious obstacles for robust machine-learning concepts under real-world classification scenarios and contexts. It is also well known from the relevant literature that only very few concepts are found which try to address simultaneously more than one of these named hard challenges under real-world application contexts.

In the frame of various sensor data processing and/or classification tasks in intelligent transportation systems (ITS), one encounters various classification situations where the above-mentioned challenges are simultaneously faced, which makes them extremely hard to solve, especially by traditional approaches from the relevant state-of-the-art. Hereby, besides the basic requirements related to a high-quality classification despite the harsh conditions, one additionally has to cope with a series of computational requirements related to resource-consumption efficiency, namely low memory usage (on embedded systems) paired with a relatively high computational speed in order to satisfy real-time constraints. In most situations, classification and prediction/forecasting are needed in the frame of online operations and optimization in automated-vehicle contexts.
In most realistic (real-world) situations, these classification and prediction/forecasting tasks are performed under strong stochastic and nonlinear conditions. It has been found that in such prediction/forecasting cases, classification can be involved despite such harsh prediction contexts, whereby the given prediction and/or forecasting problem is transformed into a classification one, which then, however, simultaneously faces nearly all of the six above-mentioned classification challenges.

1.1 General thesis objectives and research questions

The focus of this thesis is to reliably and robustly solve hard classification and prediction problems encountered in the frame of four real case studies of relevance for ITS related sensor data processing. Overall, we consider two major scenarios: (1) classification; and (2) prediction and forecasting.

In Scenario 1 we consider two use cases:

Case Study 1: Classification in the context of a car/truck detection task involving only a single presence sensor in the frame of advanced traffic management systems (ATMS) in ITS.

Case Study 2: Classification in the context of a raindrop detection task in the frame of low-level image processing for advanced driver assistance systems (ADAS).

And in Scenario 2 we also consider two use cases:

Case Study 3: Prediction in the context of a traffic flow related time-series forecast.

Case Study 4: Prediction in the context of various business time-series forecasts.

Independently of the four case studies cited above, the following general research questions are to be solved in the frame of this doctoral thesis:

Research Question 1 (RQ1): What are the main limitations of the related state-of-the-art regarding the addressing of the six above-mentioned theoretical challenges for classification/prediction, considered separately and/or combined?

Research Question 2 (RQ2): How far can a neuro-computing based classification concept involving Cellular Neural Networks (CNN) reliably solve the six identified theoretical challenges faced by classification? The general assessment should consider both the classification performance and the computational requirements encountered in most practical real-world case studies.

Research Question 3 (RQ3): How far can a neuro-computing based time-series forecast concept involving CNN reliably solve the six identified theoretical challenges faced by prediction? The general assessment should consider both the prediction performance and the computational requirements encountered in most practical case studies.


Furthermore, and this is specific to each of the four fixed case studies, the following additional research questions should be addressed:

Research Question 4 (RQ4): Conduct an appropriate comprehensive systems engineering of a CNN based classifier concept for each of the underlying case studies. Thereby, solve also the appropriate features selection issues. Also, identify and address the classification challenges encountered in each given case study.

Research Question 5 (RQ5): For each case study, define and justify the comprehensive performance metrics imposed by the practical context and conduct both a concept validation and extensive stress-tests through appropriate benchmarking whenever possible.

1.2 Overall research methodology and brief presentation of the novel neuro-computing (CNN) based concepts

For all the considered complex and extremely challenging practical situations, a novel CNN based comprehensive classification/prediction concept has been developed and validated. Hereby, appropriate features selection schemes have been efficiently conceptually integrated.

CNN is a very powerful type of neural network. A CNN integrates the architecture of both cellular automata and artificial neural networks and is essentially an array of locally coupled nonlinear cells (realizable in hardware by electrical circuits), which is capable of processing a large amount of information in parallel and in real time, i.e. with an extremely high processing speed; see Chua and Yang [1988a], where the CNN is called a "supercomputer on chip", as well as the works on the reprogrammable CNN and the CNN universal machine as an analogic array computer by Roska and Chua [1993, 1994].

Here, CNN is used as a universal system modeler, and an appropriate related architecture for solving classification and/or prediction problems is tuned. Hereby, the CNN processor works as a robust nonlinear discrete black-box transfer function, whereby the inputs are the features or parts of a time series and the output is either the predicted class or the predicted part of a time series. Overall, an appropriate CNN based classifier/predictor is developed and tuned for each of the considered real-world case studies, which differ from each other in a series of significant aspects. In the systems engineering of the CNN classifier we have three major steps: (a) identification and validation of the appropriate features or system inputs and design of an efficient way to extract them; (b) design and fine-tuning of a case-study related customized CNN architecture, which is optimized for the given case study;


and (c) the training concept for the CNN processor: generally, the training involved in a first generation an adapted and optimized variant of PSO (particle swarm optimization), and in later generations a reservoir-computing inspired training concept combined with architecture optimization. For all considered case studies, a careful concept validation, extensive stress-testing, and in most cases extensive benchmarking were thoroughly conducted.
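The first-generation training step mentioned above (a PSO variant tuning the CNN parameters against a task loss) can be illustrated with a minimal, generic PSO sketch. The swarm size, coefficients and the `sphere` objective below are stand-in assumptions, not the adapted variant actually used in the thesis:

```python
import random

def pso_minimize(loss, dim, n_particles=20, iters=100, seed=1):
    """Plain particle swarm optimization (illustrative sketch only)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # per-particle best positions
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5                   # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = loss(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val

# Stand-in objective: in the actual framework this would be the CNN classification loss
# evaluated for a candidate template/bias parameter vector.
def sphere(p):
    return sum(x * x for x in p)
```

With e.g. `pso_minimize(sphere, 3)` the swarm drives the stand-in loss close to zero; in the thesis context the parameter vector would hold the CNN templates and bias instead.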

1.3

Short presentation of the main case studies

In this doctoral thesis, the classification/prediction problem under difficult conditions is addressed in a series of practical real-world situations of real sensor data processing in intelligent transportation systems (ITS):

• Car/truck detection using a single presence sensor: here, we face a classification endeavor where the challenges Chl.CI, Chl.CO, Chl.LC and Chl.LG appear simultaneously.

• Raindrop detection in an image: here, we face a classification task embedded in a complex low-level image processing context for visual advanced driver assistance systems (ADAS), where the challenges Chl.CO, Chl.ON and Chl.LG are the most prominent.

• Traffic flow prediction: here, the task is to solve a hard prediction/forecast problem; one thereby effectively faces all six challenges listed above, but mostly Chl.CO, Chl.SP and Chl.LG.

• Business time-series prediction: an online business time-series benchmark consisting of different real case studies. These datasets have been utilized by over 60 researchers to evaluate their prediction models. This case also effectively faces all six challenges listed above, but mostly Chl.CO, Chl.SP and Chl.LG.

1.4

A comprehensive summary of the quintessence of the results obtained in this thesis

The quintessence of the results obtained in this thesis can comprehensively be summarized as follows (details are provided and appropriately explained in the remaining chapters of this work):

• The state-of-the-art does not offer reliable comprehensive concepts which satisfactorily address the challenges at stake, especially when these appear in combination (i.e. more than two challenges) and under hard real-life application contexts and constraints.

• It has been solidly demonstrated that a conceptual platform based on cellular neural networks (CNN) offers the potential and the proven capability of addressing all faced challenges while also fulfilling the constraints and requirements of real-life application contexts. Chapters 3, 4 and 5 of this thesis provide details related to specific real-world case studies which were solidly solved.

• A comprehensive systems engineering of the CNN based methodological platform concept has been developed, validated and illustrated through its involvement and related tuning for at least three significantly different real-world hard case studies. All issues of relevance, amongst others those regarding global systems modeling, feature identification, selection and extraction, architecture design and optimization, and training, have been satisfactorily solved and confirmed through extensive solid validation processes.

• The results obtained are very significant both theoretically and practically, as real-world scenarios have been considered, and confirm the superiority of the neuro-computing based conceptual platform for solving the issues at stake when compared to counterparts from the latest relevant state-of-the-art (literature).

• In its best configuration, the CNN processor system framework used for classification and/or prediction, as developed and validated in this thesis work, integrates both reservoir-computing and echo-state neuro-computing inspirations and is therefore rapidly trainable, even online in real-time whenever needed, through an additional functional block implementing a CNN based on-the-fly ultrafast matrix inversion concept.
Hereby, all requirements related to implementability on embedded platforms are fully satisfied: memory efficiency, very high processing speed, and extremely flexible implementability on various target embedded platforms (e.g. multi-core ARM processors, FPGA, DSP, FPAA, GPU, etc.).

1.5

Quality and significance of the results, dissemination status

The results obtained during the research of this doctoral thesis have already been either partially published, or related submissions are under review, in the following publication venues (i.e. international journals, conferences, book chapters, patents):

• An already granted EU patent (published in 2014), by the company Swarco Traffic System GmbH (Germany), under the title "Quality determination in data acquisition", Patent No. EP2790165, https://data.epo.org/publication-server/rest/v1.0/publication-dates/20141015/patents/EP2790165NWA1/document.pdf

• A submitted EU/German patent (submitted in 2015), by the company SWARCO TRAFFIC SYSTEMS GmbH (Germany), under the title "Steuervorrichtung und Steuerverfahren für eine Verkehrssteuer-Anlage, und Steuersystem".

• A journal paper submission (under review since May 2015) in the highly ranked (Class A+) journal Transportation Research Part C: Manuscript ID TRC-D-15-00320; title: "Soft Radial Basis Cellular Neural Network (SRB-CNN) based Robust Low-Cost Truck Detection using a Single Presence Detection Sensor".

• A published journal paper (2016): Al Machot, Fadi, Mouhannad Ali, Ahmad Haj Mosa, Christopher Schwarzlmüller, Markus Gutmann, and Kyandoghere Kyamakya. "Real-time raindrop detection based on cellular neural networks for ADAS." Journal of Real-Time Image Processing (2016): 1-13.

• A journal paper submission (under review since May 10th, 2016) in the highly ranked (Class A+) journal IEEE Transactions on Neural Networks and Learning Systems: Manuscript ID TNNLS-2016-P-6504; title: "Online Self-Adaptive Cellular Neural Network Architecture for Robust Time-Series Forecast".

In addition, the following conference papers and/or book chapters of direct relevance to the core issues of this thesis have already been published:

• Ahmad Haj Mosa, Kyandoghere Kyamakya, Mouhannad Ali, Fadi Al Machot, Jean Chamberlain Chedjou, "Neurocomputing-based Matrix Inversion Involving Cellular Neural Networks: Black-box Training Concept", in Proceedings of the 8th GI Workshop Autonomous Systems 2015, Vol. 842, 119-132, VDI Verlag GmbH, October 2015.
• Ahmad Haj Mosa, Jean Chamberlain Chedjou, Mouhannad Ali, Kyandoghere Kyamakya, "Input Variant Particle Swarm Optimization for Solving Ordinary and Partial Differential Equations with Constraints", in Proceedings of the International Symposium on Theoretical Electrical Engineering (ISTET 2013), 2-2, June 2013.

• Mouhannad Ali, Fadi Al Machot, Ahmad Haj Mosa, Patrik Grausberg, Nkiediel Alain Akwir, Baraka Olivier Mushage, Kyandoghere Kyamakya, "A Review of Object Classification for Video Surveillance Systems", in Proceedings of the 7th GI Workshop Autonomous Systems 2014, Vol. 835, 197-208, VDI Verlag GmbH, 2014.

• Mouhannad Ali, Ahmad Haj Mosa, Fadi Al Machot, Kyandoghere Kyamakya, "A Novel Real-Time EEG-Based Emotion Recognition System for Advanced Driver Assistance Systems (ADAS)", in Proceedings of the INDS'15, 9-13, Shaker Verlag GmbH, July 2015.

• Mouhannad Ali, Fadi Al Machot, Ahmad Haj Mosa, Kyandoghere Kyamakya, "A novel EEG-based emotion recognition approach for e-healthcare applications", in Proceedings of the 31st Annual ACM Symposium on Applied Computing, 162-164, January 2016.

Further conference papers and/or book chapters of low or only indirect relevance to the core issues of this thesis have already been published:

• Al Machot, Fadi, Ahmad Haj Mosa, Alireza Fasih, Christopher Schwarzlmüller, Mouhannad Ali, and Kyandoghere Kyamakya. "A Novel Real-Time Emotion Detection System for Advanced Driver Assistance Systems". In Autonomous Systems: Developments and Trends, 267-276. Springer Berlin Heidelberg, 2012.

• Ahmad Haj Mosa, Mouhannad Ali, and Kyandoghere Kyamakya. "A Computerized Method to Diagnose Strabismus Based on a Novel Method for Pupil Segmentation." In Proceedings of the International Symposium on Theoretical Electrical Engineering (ISTET 2013), University of West Bohemia in Pilsen, Faculty of Electrical Engineering, June 2013.

• Ahmad Haj Mosa, Mouhannad Ali, Fadi Al Machot, Kyandoghere Kyamakya, "A Computerized Method to Diagnose Strabismus", in Proceedings of the 7th GI Workshop Autonomous Systems 2014, Vol. 835, 209-220, VDI Verlag GmbH, 2014.

• Ahmad Haj Mosa, Mouhannad Ali, Jean Chamberlain Chedjou, Kyandoghere Kyamakya, "OD Matrix Estimation of a Complex Junction based on Particle Swarm Optimization", in Proceedings of the 13th International Conference on Innovative Internet Community Systems and the International Workshop on Autonomous Systems, Vol. 826, 282-288, VDI Verlag GmbH, June 2013.

1.6

Thesis outlook: short descriptions of the remaining thesis chapters

The remaining part of this thesis consists of the following five chapters, each of which is briefly summarized below.


1.6.1

Chapter 2 (State-of-the-art review)

In this chapter, a brief survey of the relevant state-of-the-art is given, showing possible limitations of the concepts from the literature in view of the tasks under investigation in this thesis. In addition, for each of the considered case studies we present some details of the relevant related works; full details are deferred to the respective chapters 3, 4 and 5.

1.6.2

Chapter 3 (Case Study 1: Truck detection involving a single presence sensor)

This chapter handles the first case study, which considers the real-world context of truck detection involving a single presence traffic sensor. The hard related classification problem is described, and a comprehensive CNN based solution concept is engineered and validated through extensive validation tests involving real field data.

1.6.3

Chapter 4 (Case Study 2: Raindrop detection for visual sensors in ADAS)

This chapter handles the second case study, which considers the real-world context of visual sensors in ADAS. An essential goal of advanced driver assistance systems (ADAS) is to provide the driver with important information about the state of the vehicle. The visual perception of a driver may be affected when severe weather conditions such as rain block regions of the windshield. Therefore, the automated detection of raindrops is of significant importance for video-based ADAS. The related video preprocessing steps make the detection of raindrops a time-critical process, in which the improvement of the image quality and an accurate decision are required in real-time. This chapter presents an approach for real-time raindrop detection which uses cellular neural networks (CNN) as a core processor integrated with support vector machines (SVM). The major idea is to prove the possibility of using the support vectors as CNN control templates. The advantage of CNN is its ultra-fast processing capability on any platform (on either CPU or embedded platforms). The proposed approach is capable of detecting raindrops that might negatively affect the vision of the driver. Different visual features were extracted to evaluate and compare the performance of the proposed approach against other approaches.


1.6.4

Chapter 5 (Case Studies 3 and 4: Time-series forecast)

This chapter handles two case studies: the first is related to the forecasting of traffic flow data in ITS (case study 3), and the second to the forecasting of business data (case study 4). For both case studies, the hard related prediction problem is described, and a comprehensive CNN based solution concept is engineered and validated through extensive validation tests involving real field data. Hereby, the prediction problem is transformed, in the conceptual process, into an appropriate classification problem.

1.6.5

Chapter 6 (Conclusion and outlook)

In this chapter, the quintessence of this work's core results is presented. Besides a review of the main challenges addressed and the related research questions, a comprehensive summary of the solutions found is given. The significance of the results obtained is also briefly underlined. Finally, a short outlook is discussed, indicating some potential further research avenues based on the results of this thesis.


Chapter 2

Some fundamentals and a review of the state-of-the-art

A review of the relevant state-of-the-art reveals that the six classification challenges described in Chapter 1 (Chl.CI, Chl.CO, Chl.ON, Chl.LC, Chl.SP and Chl.LG) are not sufficiently and robustly solved, especially in combination, and also when additionally considering practical computational requirements. A short selection amongst the most recent works confirms that the issues at stake are not sufficiently and convincingly solved, amongst others when considering the application to real-world cases. In this chapter, we briefly survey the relevant state-of-the-art, showing possible limitations of the concepts from the literature in view of the tasks under investigation in this thesis. In addition, for each of the considered case studies we present the relevant related works in detail.

2.1

Classification under difficult conditions: state-of-the-art review

The six challenges addressed in this thesis are amongst the most crucial issues in the field of pattern recognition, and many studies have been conducted on this problem. To solve these challenges, researchers have investigated three main strategies. The first strategy manipulates the feature space by introducing different mapping techniques. The most promising contributions are summarized in the following:

• Stacked Sparse Auto-Encoders (SSAE). This is an unsupervised method that has been extensively used in the field of visual and acoustic pattern recognition. It proposes a deep neural network architecture with a layer-wise (layer by layer) learning technique. The aim is to build high-level features that deliver the best mapping of the raw features. It has proved to be promising according to many contributions Hinton and Salakhutdinov [2006]; Lee et al. [2006]; Olshausen et al. [1996] and most recently Le [2013], which shows the importance of SSAE. The latter work evaluated the method on the ImageNet visual object dataset and improved the detection rate by at least 5% compared to other related works. However, it also shows the learning complexity (see Chl.SP) of this method: 16,000 processing cores were needed.

• Feature de-correlation and whitening methods such as Principal Component Analysis (PCA) Le [2013], Independent Component Analysis (ICA) Bartlett et al. [2002] and most recently Reconstruction Independent Component Analysis (RICA) Le et al. [2011]. Feature whitening is considered to be a very important preprocessing step for all classification problems, namely for those related to Chl.ON and Chl.LC.

• Hybrid methods that separate the feature space into non-overlapped, purely overlapped and uncertainty regions Vorraboot et al. [2015]. These methods attempt to solve the class imbalance and overlap problems (Chl.CI and Chl.CO).

• Manipulating the class distribution to overcome the imbalance problem (see Chl.CI). Such a method is useful whenever the imbalance is caused by insufficient measurement samples. However, in many cases the imbalance rather reflects the real distribution of the studied data; thus, changing the balance may cause a low generalization property Drummond et al. [2003]; Laurikkala [2001].

The second strategy focuses on the decision function. The main contributions can be summarized in the following:

• Probabilistic decision functions. They fit classification problems in the presence of stochasticity (see Chl.SP). The most common related methods are: the Naive Bayes classifier Rish [2001], Bayesian Quadratic Discriminant Analysis Srivastava et al. [2007], and Probabilistic Neural Networks Lin et al. [1997]. Such statistical methods require a huge amount of data (see Chl.LC) and fail in the presence of high nonlinearity (see Chl.CO) in the feature space.
• Margin classifiers are methods which impose a margin between the data samples and the decision function. A common example of such classifiers is the Support Vector Machine, which has been successfully utilized in many classification problems Kecman [2001]; Rätsch et al. [2007].

• Ensemble classification algorithms combine a set of weak classifiers into a strong classifier. In other words, they fuse the outputs of different classifiers to improve the performance. Bagging and boosting are the two common fusing techniques. In bagging, the original data set is uniformly divided into random overlapping subsets, and a weak classifier is trained for each subset. The final decision can then be made by either averaging or voting over the weak classifiers' outputs Breiman [1996]; Sahu et al. [2011]. Random forest is a promising example of bagging, which has proved to overcome two of the challenges (Chl.CO and Chl.LG) Ham et al. [2005]; Pal [2005]. Boosting, on the other hand, also builds on weak classifiers, however on the original data set rather than on random subsets. Boosting is an iterative process: at the initial iteration all samples get uniform weights, then at each iteration a weak classifier (e.g. linear) is learned. The samples that are correctly classified get a relatively low weight compared to the misclassified samples, so that the next weak classifiers focus more on the samples that were previously misclassified. There are many techniques that use boosting, such as AdaBoost, LPBoost, TotalBoost and LogitBoost. Similarly to bagging, boosting also delivers a relatively good performance when facing two of the challenges (Chl.CO and Chl.LG) Zhou [2012]. Despite the encouraging performance of ensemble classifiers, they do experience complexity in learning (Chl.LC).

• Discrete or continuous recurrent neural networks (RNN) are the best candidates when the studied data samples are temporally correlated (e.g. visual tracking). An RNN provides a practical memory that supports the decision making of the classifier/predictor. Long short-term memory RNNs He et al. [2015], echo state networks Skowronski and Harris [2007] and cellular neural networks Gacsadi and Szolgay [2005] are praised models in this regard.

The last and third approach targets the learning algorithm. The recent contributions are summarized in the following:

• Di Martino et al. (2013) propose a novel approach Martino et al. [2013] whose novelty lies in coping with the imbalance problem without changing the distribution of the original classes. They meet that goal by introducing a training algorithm that maximizes the F-measure.
However, the reported F-measure, in the case of a highly imbalanced data set, did not exceed 50%.

• The class imbalance and overlap problems are usually a result of weak features as well as a poor mapping of the data. Both problems can be mitigated by using highly nonlinear models (e.g. multilayer neural networks), which may increase the performance but at the same time increase the learning complexity. The extreme learning machine (ELM) has been proposed to solve this problem Huang et al. [2006]. ELM trains a neural network by first generating random hidden-layer weights (the hidden layer must have a large number of neurons); the output layer is then trained using a simple least-squares fit. This technique is extremely fast and performs well. The good performance is delivered by the large number of neurons, which improves the nonlinearity capability. The RNN version of ELM is mainly realized in the echo state network, which shows a remarkable performance in dealing with prediction and classification problems Jaeger [2001]. ELM has also been utilized successfully to learn stacked auto-encoders (deep neural networks) Tang et al. [2016].
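The ELM recipe just described (random hidden layer, least-squares readout) can be sketched in plain Python. The layer size, activation, weight ranges and ridge term below are illustrative assumptions, not values from any cited work:

```python
import math, random

def solve(A, b):
    """Tiny Gaussian elimination with partial pivoting for Ax = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def elm_train(us, ys, n_hidden=30, ridge=1e-6, seed=0):
    """Random tanh hidden layer; output weights by regularized least squares."""
    rng = random.Random(seed)
    W = [(rng.uniform(-2, 2), rng.uniform(-1, 1)) for _ in range(n_hidden)]  # (weight, bias)
    H = [[math.tanh(w * u + b) for (w, b) in W] for u in us]                 # hidden outputs
    # Normal equations: (H^T H + ridge * I) beta = H^T y
    n_samples = len(us)
    HtH = [[sum(H[s][i] * H[s][j] for s in range(n_samples))
            + (ridge if i == j else 0.0) for j in range(n_hidden)]
           for i in range(n_hidden)]
    Hty = [sum(H[s][i] * ys[s] for s in range(n_samples)) for i in range(n_hidden)]
    beta = solve(HtH, Hty)
    return lambda u: sum(bi * math.tanh(w * u + b) for bi, (w, b) in zip(beta, W))
```

Fitting e.g. the scalar target y = u² on samples from [-1, 1] then only requires solving one small linear system, which is what makes ELM training so fast compared to iterative backpropagation.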

2.2

Critical summary of the state-of-the-art, as represented by the above sample of related literature

In the following we summarize the critical drawbacks of the related state-of-the-art:

• Very rarely, if at all, are more than two (amongst the six hard) classification challenges systematically addressed simultaneously.

• The better performing approaches appear to combine neural networks, support vector machines and kernel methods.

• The universality issue is still a dilemma, especially in time-varying systems in which an adaptive decision function is required.

• Recurrent neural networks have been successfully involved in prediction and classification problems.

• The extreme learning machine is a very promising learning technique in terms of learning complexity and universality.

• CNN has been involved for classification in some isolated cases and performed very well, but its general potential to address all identified challenges simultaneously has not been investigated yet.

2.3

Cellular neural networks basics

CNN is a universal system modeler expressed by a system of differential equations that captures the relationship between close/neighboring nonlinear units. CNN was suggested by Chua and Yang in the year 1988 Chua and Yang [1988a]. It incorporates good features of both artificial neural networks (ANN) and cellular automata (CA); this makes it more promising compared to both earlier approaches. However, despite its ability to integrate the qualities of both CA and ANN, it differs from them by its nonlinear dynamical


representation of the coupling amongst cells, associated with its property of local connectivity. Due to its parallelism capability, CNN has a significant processing capacity and provides easy and flexible implementability in software, in hardware (even as an analog circuit), or in hybrid constellations of these target platforms. More importantly, CNN can be embraced as part of digital platforms (see the CNN chips Kinget and Steyaert [1995]; Roska and Chua [1993]) or emulated on top of digital platforms like FPGA and GPU Ho et al. [2008]; Nagy and Szolgay [2003]. Recently, reconfigurable analog platforms like the FPAA (field programmable analog array, Dudek and Hicks [2000]) have offered a further realization alternative.

[Figure 2.1: The cellular neural network architecture as provided in Chua and Yang [1988b]; the state model of a single cell (control template and inputs, bias, output nonlinearity, and feedback from the other cells through the feedback template) is also given.]

The generally proposed state equation of a CNN cell is given in (2.1) Chua and Yang [1988b]:

\frac{dx_i(t)}{dt} = -x_i(t) + \sum_{k=1}^{n} a_{k,i}\, y_k(t) + \sum_{k=1}^{m} b_{k,i}\, u_k(t) + I \qquad (2.1)

where x_i(t) is the cell state and

y_i(t) = f(x_i(t)) = \frac{1}{2}\big( |x_i(t) + 1| - |x_i(t) - 1| \big) \qquad (2.2)

represent respectively the current system state and the output of cell i; u_k(t) is the current k-th input value, A = (a_{1,1}, \dots, a_{n,n}) is the feedback template, B = (b_{1,1}, \dots, b_{m,n}) is the control template, and I is the cell bias; n and m are the numbers of cells and inputs, respectively. In Fig. 2.1, the CNN architecture is illustrated and a graphical representation of (2.1) is given. Generally, CNN is utilized as a continuous-time oscillator that is driven/excited by an external input, but discrete-time CNN model variants (DT-CNN) are also available Destri [1996]; Harrer and Nossek [1992]. For the continuous-time CNN, the differential equation (2.1) needs to be solved in order to obtain the value of each cell state; in the DT-CNN case, a corresponding difference equation is considered instead.
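As an illustration of how the cell dynamics behave, the state equation of a single cell can be integrated numerically, e.g. with a simple forward-Euler step. This is a toy sketch with hand-picked templates and step size, not the solver used in the thesis:

```python
def f(x):
    # Piecewise-linear output nonlinearity of Eq. (2.2)
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))

def simulate_cell(a_self, b, u, bias, x0=0.0, dt=0.01, steps=2000):
    """Forward-Euler integration of a single CNN cell, Eq. (2.1),
    keeping only the self-feedback term a_self for simplicity."""
    x = x0
    for _ in range(steps):
        dx = -x + a_self * f(x) + sum(bk * uk for bk, uk in zip(b, u)) + bias
        x += dt * dx
    return x, f(x)
```

With no feedback (a_self = 0) and a constant weighted input of 0.5, the state settles at the equilibrium x = 0.5 of dx/dt = -x + 0.5, and the cell output is likewise 0.5, since f is the identity inside [-1, 1].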

2.3.1

CNN based classification in the current literature: principles, pros and cons

The exploitation of CNN as a classifier is still limited but possesses a huge potential. Most related research utilizes CNN in the frame of texture and visual object recognition Milanova and Büker [2000]; Shou and Lin [2004]; Szirányi and Csapodi [1998], in which CNN shows a remarkable performance. However, the architectures proposed in those works can only be employed in the scope of visual pattern recognition. Another example of a CNN classifier is suggested by Kim et al. [2003], who present a method that takes advantage of combining CNN and dynamic programming. However, the complexity of the proposed classification method is proportional to the size of the data set.

2.3.2

CNN use for black-box systems science

When the problem under consideration is sufficiently sophisticated, or one possesses insufficient information about it, it is hard to model it using physical laws following the white-box approach. The system should then be modeled as a black box in which the system parameters are estimated using solely the measurements and/or observations of given/selected inputs and their respective outputs van den Bosch and van der Klauw [1994]. CNN can be used to realize a robust black-box modeling system that reliably learns even strongly nonlinear behavior of a complex system from only its measured input/output data pairs. One example of using CNN as a black-box modeling instrument is introduced by Alver Şahin et al. [2011], where CNN is proposed as a robust forecast modeling instrument for the analysis and prediction of missing data. Object detection/classification is one field where black-box modeling is prevalent, because in most cases only knowledge of the objects' attributes and related labels is available. Consequently, most related techniques like support vector machines (SVM), artificial neural networks (ANN), and Naive Bayes (NB) should be considered black-box models Pappa and Freitas [2009]. CNN black-box object detection/classification shows a remarkably competitive performance according to many published related works Milanova and Büker [2000]; Perfetti et al. [2007]; Tang [2009]. However, there are two properties of traditional CNN architectures that partially make them incompatible with most pattern detection cases (and, in general, most black-box modeling cases like clustering and time-series analysis). These two drawbacks are successfully addressed and solved by the novel CNN architecture concept presented in this thesis. They are the following:

• In traditional CNN architectures, the input and output spaces generally have identical dimensions Chua and Roska [2002]. This property is compatible with visual pattern recognition, in which both the feature space and the class space are images of the same size. In contrast, it is inappropriate for most detection/classification problems, where input and output spaces generally have different dimensions. Furthermore, when either the input or the output dimension is relatively small (as in the cases studied in this thesis), the related model cannot sufficiently benefit from CNN's valuable characteristic of multiple interconnected cells.

• In the traditional CNN model, the single nonlinear part is the output part. Thus, it is thereby hard to model imbalanced and overlapped input data spaces, as observed in many detection problems.

2.4

Echo state network

ESN is a novel type of RNN that has produced excellent performance in the forecasting of nonlinear or non-Gaussian dynamical sequences. ESN was originally designed by Jaeger in 2001 with the objective of using an RNN with a large number of sparsely connected neurons, from which valuable information carried by the inputs can be better extracted Jaeger [2001]. Compared to other neural network concepts, ESN has the clear merit of enclosing finer details regarding the contribution of the past, in a form that reflects the recent past best. This becomes possible due to its large number of neurons along with the recurrent connections, which give it a short-term memory. Therefore, the challenge of finding the best time-delay of the inputs (i.e. how much history to consider in the input vector) is no longer an issue, unlike in previous neural network concepts. Another advantage worth mentioning is that ESN is simple to train, since only the output weights are learned; this avoids many of the training difficulties common to other recurrent neural networks. ESN has the following configuration, which is also visualized in Fig. 2.2:

• Its first layer, called the reservoir, consists of nonlinear neurons. Virtually all neurons in this layer are connected with each other and with themselves (self-feedback).


[Figure 2.2: The echo state network architecture, with input, reservoir and output layers. The input layer feeds the reservoir, which has random internal connections between its inner cells; the reservoir feeds the output layer, which in turn feeds back into the reservoir.]

• The network inputs are connected to each neuron of the reservoir layer.

• The output layer consists of linear regressors connected to the outputs of all reservoir neurons.

• The network is trained by first randomly initializing both the inner reservoir weights and the input-layer weights; secondly, the reservoir-to-output weights are trained using a regularized least-squares method.

Despite many desirable features, traditional ESN concepts experience two main limitations. The first is the issue of ill-conditioned solutions, usually related to linear regression or recursive least squares, which may lead to poor outcomes due to large output weights that affect the universality and the stability of the network. The second is the problem of input uncertainty: uncertainty in the inputs is a very common case in industrial time-series prediction, where noise is a serious issue to be addressed Lukoševičius [2012].
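The ESN configuration above can be sketched in plain Python: a fixed random reservoir updated with a tanh state equation, on top of which only a linear readout would be trained (the readout training, a least-squares fit, is omitted here). All sizes and scalings are illustrative assumptions:

```python
import math, random

def make_esn(n_in, n_res, scale=0.9, seed=0):
    """Random input and reservoir weights; the reservoir matrix is crudely
    normalized by its maximum absolute row sum toward the echo-state property."""
    rng = random.Random(seed)
    W_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_res)]
    W = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)] for _ in range(n_res)]
    norm = max(sum(abs(v) for v in row) for row in W)
    W = [[scale * v / norm for v in row] for row in W]
    return W_in, W

def run_reservoir(W_in, W, inputs):
    """Drive the reservoir with an input sequence and collect the state at
    each step; a linear readout trained on these states completes the ESN."""
    x = [0.0] * len(W)
    states = []
    for u in inputs:
        drive = [sum(wij * uj for wij, uj in zip(row, u)) for row in W_in]
        recur = [sum(wik * xk for wik, xk in zip(row, x)) for row in W]
        x = [math.tanh(d + r) for d, r in zip(drive, recur)]
        states.append(list(x))
    return states
```

Because the row-sum normalization bounds the recurrent gain below 1, the states stay inside (-1, 1) and fade once the input is removed, which is precisely the short-term memory of the recent past discussed above.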

17

2. Support Vector Machine with Radial Basis Function

2.5

Support vector machine with radial basis function

The SVM is a classifier that separates a set of objects into classes so that the distance between the class borders is as large as possible. The idea of the SVM is to separate the two classes with a hyperplane such that the minimal distance between the elements of both classes and the hyperplane is maximal. In general, training data are not always linearly separable, due to noise or to the class distribution, so it is not possible to separate the classes with a single linear hyperplane. Hence, because of the nonlinearity of the training data, a kernel function is generally needed that implicitly transforms the data into a higher-dimensional space. To reduce the computational cost, positive definite kernel functions are used, e.g. polynomial kernels and the Radial Basis Function (RBF) Vapnik et al. [1997].
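The kernel trick mentioned above can be made concrete with the RBF kernel K(x, z) = exp(-γ‖x − z‖²): a kernel SVM's decision function is a weighted sum of kernel evaluations against the support vectors. The sketch below (with hypothetical support vectors, coefficients and γ) only illustrates that functional form, not a trained SVM:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def svm_decision(x, support_vectors, coeffs, bias, gamma=1.0):
    """Kernel-SVM decision value: sum_i coeff_i * K(sv_i, x) + bias,
    where each coeff_i bundles the dual weight alpha_i and the label y_i."""
    return bias + sum(c * rbf_kernel(sv, x, gamma)
                      for sv, c in zip(support_vectors, coeffs))
```

Note that K(x, x) is always 1 and the kernel decays with distance, so each support vector influences the decision only locally; γ controls how local that influence is.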


Chapter 3

Case Study 1: Truck detection

This chapter presents a general concept for robust truck detection involving one single presence sensor (e.g. an inductive loop, but also any other presence sensor) at a signalized traffic junction. This case study faces namely the challenges Chl.CI, Chl.CO and Chl.LG. To overcome them, a novel CNN based framework is developed. The proposed framework combines the powerful features of CNN (see Section 2.3) with the high nonlinearity of radial basis functions (RBF). The oscillation behavior of CNN makes it more robust for classification problems with uncertainty (here mainly due to the class overlap, Chl.CO): the decision of the CNN is made over several propagations/iterations, which mimics the human way of decision making (think, then decide). This feature is superior in comparison to classical classification methods, in which the decision is made in one iteration. The second feature of CNN is its memory, thanks to its feedback connections. Such a practical memory is important when dealing with temporally correlated inputs, as in this case, where the truck/car patterns change with the traffic situation (congested/non-congested traffic); the provided memory helps the CNN to implicitly recognize the traffic situation. Thus, the memory feature is important for Chl.CO, Chl.LG and Chl.CI. Adding, on top, the high nonlinearity of RBFs makes our proposed model a promising model for truck detection. Regarding truck detection, two operation modes are distinguished: (a) during green traffic-light phases, and (b), a much more challenging case, during red traffic-light phases. First, it is shown how challenging the underlying classification task is, mainly due to strongly overlapped classes which cannot be easily divided by simple hyperplanes.
Then, the proposed framework is validated and extensively benchmarked against a selection of the best representatives of the current related state-of-the-art classification methods. To benchmark the concept, the same features are used for all selected classifiers. The superiority of the novel CNN based classifier is thereby underscored, as it outperforms the other ones. This novel SRB-CNN based concept does satisfactorily fulfill the hard industrial requirements regarding robustness, low cost, high processing speed, low memory consumption, and the capability to be deployed in low-cost embedded systems. The work presented in this chapter has been submitted for publication (status: under review) to the journal Transportation Research Part C: Emerging Technologies.

3.1 Background and motivation

In modern adaptive and optimized road traffic control concepts, highly reliable traffic sensing at junctions is a critical building block of the sensor system. In view of the highly congested urban streets, the pressure to optimize traffic control in order to ensure a high throughput of the traffic system is higher than ever de Laski and Parsonson [1985]. Practitioners operating advanced traffic management systems currently call for an accuracy of at most one second regarding the dynamic sizing of time-splits amongst the different phase groups at a junction Wilshire [1985]. For the sake of optimality, one should also be able to differentiate between normal cars and trucks or truck-similar big vehicles like buses. Trucks and big vehicles, as is well known and considered in related microscopic traffic models Sun [2000], do behave differently (see related kinematics and dynamics) in and at a junction due to their much bigger mass and length compared to a normal car. Thus, reliable and accurate truck detection consequently enables optimized traffic control at a junction. Considering practical cost constraints in real system implementations, any system offering high performance at a relatively low cost is very welcome. Therefore, the following criteria will apply for fairly judging a good truck detection sensing concept: (a) highest possible performance; (b) high robustness and reliability in both green and red traffic light phases; and (c) low cost, due amongst other things to involving only one single sensor and low resource consumption on embedded platforms. This work mainly focuses on presence sensors, which are the most pervasive ones in current traffic management systems. A presence sensor generates a rectangular pulse when it is crossed by a vehicle; the pulse width is called the occupancy time and correlates to some degree with the vehicle speed.
The most widely used sensors of this type are the inductive loop detectors. Further, more recent presence sensors include earth's magnetic field sensors, radars, and even camera/video based sensors de Laski and Parsonson [1985]. Although the concept developed in this case study is applicable to all presence sensors, it uses field data collected from inductive loops in a real city traffic management context in Austria. Recently, advanced traffic management systems (ATMS) have occupied an important spot within the field of intelligent transportation systems. ATMS are viewed as a low-cost way to improve safety, increase traffic flow throughput, and reduce fuel consumption and environmental costs, compared to the alternative of the extremely costly construction of new and/or broader city roads or highways. Truck detection is a highly relevant and essential sensing task within ATMS, though it is not very challenging if one has simultaneous access to a variety of different and meaningful (but generally relatively costly) sensor data, such as data from video surveillance, a pair of longitudinally consecutive inductive loop detectors, laser scanning, or data about the weight of vehicles. If one is confronted with the task of classifying vehicles by using solely a single inductive loop detector (for economic reasons), which generally implies that the only data available are the occupancy times of the vehicles and their relative/successive headways, the task becomes a non-trivial one requiring elaborate and well adapted approaches due to the lack of versatile sensor data. There are already rather trivial but more costly truck-detection concepts using two consecutive loops separated by a fixed/known distance Kim and Coifman [2013]; Minge [2012]; Wu and Coifman [2014]. This chapter does however address the scientifically more challenging scenario where only one single presence sensor (e.g. an inductive loop) is involved in the truck detection endeavor.

3.1.1 Presence detectors

In microscopic traffic measurement, many different types of sensors can be used to detect the presence of vehicles: cameras, radars (incl. lidar and ultrasound), and inductive loop detectors (via analysis of their analog signals). Common approaches to detecting vehicles in a dynamic traffic environment can be classified as being either intrusive or non-intrusive. In principle, intrusive ones are potentially more robust and reliable under any weather condition. According to a non-standard definition, the term intrusive refers to equipment that is installed either beneath or on the surface of a roadway, such as pneumatic road tubes, piezoelectric sensors, or magnetic or inductive loops Kell et al. [2006]. Non-intrusive technologies commonly refer to equipment which is used to observe the traffic from a remote location in a passive manner, such as the manual counting of vehicles, passive and active infrared detection, passive magnetic detection, microwave radar sensors, ultrasonic and passive acoustic sensors, as well as video image detection.

3.1.2 Truck detection

Vehicle classification in the context of traffic sensing and analysis is primarily concerned with the differentiation of the two major vehicle types which comprise the bulk of today's traffic (although trucks themselves may be sub-classified into more than a dozen sub-classes). These major types are normal passenger cars and truck-like vehicles such as trucks and buses. These vehicle types have their own significant properties, such as vehicle dimensions (primarily vehicle length and vehicle height), vehicle mass, maximum vehicle speed, and maximum vehicle acceleration/deceleration profiles, which make it possible to distinguish them from normal cars quite accurately. Nevertheless, sensors that can measure all these attributes together (length, height, mass, etc.) are very expensive. At the same time, low-cost sensors like a single inductive loop sense only trivial/simple vehicle-dynamics related attributes (i.e. occupancy time and headway) of the passing vehicles. We should underscore here a first challenge of this case study, namely that of using these simple attributes to tackle a challenging vehicle classification endeavor. As will be seen in the following sections and sub-sections (see e.g. Fig. 3.1), the classification task under these conditions faces strong overlaps and imbalance amongst the classes. It is well known from the classification related state-of-the-art Sun et al. [2006] that a reliable, robust, low-cost and ultrafast classification under such hard conditions, if successful, must be considered close to the cutting edge. Therefore, an extensive benchmarking against a selection of the best representatives of the related state-of-the-art is undertaken in this case study to demonstrate the superiority of the novel concept.

3.1.3 Classification in presence of imbalanced and overlapping classes

In the ATMS context, class imbalance and overlap are very common problems, due amongst other things to the time-varying quality of the produced sensor data, which is affected by the time-varying surrounding physical conditions (e.g. weather, traffic, and lighting conditions), and to the significantly higher number of majority objects (i.e. normal passenger cars) against a smaller number of minority objects (i.e. truck-like vehicles). Fig. 3.1 illustrates the strong overlap amongst the classes; hereby, different feature pairs are considered for illustrative purposes (more details on the features used are provided in Section 3.4 below). Fig. 3.2 illustrates the strong imbalance between the four classes car at green (GC), truck at green (GT), truck at red (RT) and car at red (RC).

3.2 Related works for truck detection

In this section, we briefly survey the relevant state-of-the-art and show possible limitations of those concepts from the literature in view of the task under investigation in this chapter. In the last couple of decades, several sensor types have been developed for vehicle detection. In conjunction with their development, relevant techniques for vehicle classification were proposed. One research team investigated the use of acoustic signals for vehicle classification Nooralahiyan et al. [1997a,b]. Other researchers proposed vision-based vehicle classification techniques Nooralahiyan et al. [1997b]; Zangenehpour et al. [2015]. However, these techniques (acoustic and vision) rely on signals whose accuracy is highly sensitive to environmental conditions and the traffic state. Range sensors can also be used for vehicle classification, as presented in Harlow and Peng [2001]. However, the utilization of such sensors is still rare, especially at urban junctions.


Figure 3.1: Scatter plots of truck-like versus passenger-car samples at red light. x1, x2, x3, x4 represent the four proposed features; see Section 3.4.

Figure 3.2: Class imbalance between car at green (GC), truck at green (GT), truck at red (RT) and car at red (RC).

The vehicle speed is an efficient feature that has been widely employed in vehicle classification. In the case of a dual loop, the vehicle speed can be measured easily, as proposed by Wu and Coifman [2014]. In this method, both vehicle length and speed are estimated by using two loop detectors separated by a small distance. However, the utilization of dual loops is limited by the scope of the vehicle detector, especially in urban areas. Single loop detectors are more commonly used for the purpose of traffic detection; nevertheless, the accurate estimation of the vehicle speed is then difficult. A group of researchers have used the mean or the median vehicle speed instead of the direct speed measurement Coifman and Ergueta [2003]; Wang and L.NANCY [2004]. A more recent work has proposed to estimate the individual vehicle length by using the actuation data Coifman and Kim [2009]. The drawback of this method, as Coifman et al. pointed out, is its limitation in the case of congestion. In 2013, Jeng et al. used the features of the Haar wavelet combined with a KNN clustering method while using a single inductive loop detector. Despite the high accuracy (94 %) reached by this method, the concept requires a high amount of data, which results in a complex data acquisition and related complex data processing Jeng et al. [2013]. An artificial neural network (ANN) based concept using analog loop detector signals has also been presented as a robust vehicle classifier Ki and Baik [2006]. In 2010, Meta et al. proposed an improved ANN classifier, with a pre-filtering of analog signals and feature enhancement based on principal component analysis (PCA) Meta and Cinsdikici [2010]. The use of these pre-cited methods is, however, explicitly limited to inductive loop detectors (ILDs) placed on highways. One should hereby notice that the traffic dynamics on a highway is different from that at a city road junction; there is generally no red light phase on the highway. In addition, the pre-cited concepts generally do not consider the correlation of the target presence signal with the respective successor and predecessor signals. In 2014, Liu and Sun proposed a vehicle classification method considering both the signal occupation time and the time-gap between consecutive vehicles as efficient features Liu and Sun [2014]. However, such features alone do not sufficiently allow an efficient separation of the vehicle classes, an issue that is mainly addressed in this thesis.

3.3 The proposed truck detection process

In this section, we present our novel concept for a robust truck detection at a city road junction with traffic signals. Four distinct cases will be identified with respect to the vehicle type (car or truck-like) and the vehicle’s dynamic state (as either in green traffic signal phase, i.e. the vehicle is passing over the detector, or in red traffic signal phase, i.e. the vehicle has stopped over the detector).

CLASSIFIER 1 distinguishes between a STOP-indication and the absence of any hindrance. CLASSIFIER 2 distinguishes between passenger cars and truck-like vehicles in the STOP-situation. CLASSIFIER 3 distinguishes between passenger cars and truck-like vehicles in the non-STOP-situation. The resulting classes are: Class 1: PASSENGER CAR in the STOP-situation; Class 2: TRUCK-LIKE VEHICLE in the STOP-situation; Class 3: PASSENGER CAR in the non-STOP-situation; Class 4: TRUCK-LIKE VEHICLE in the non-STOP-situation.

Figure 3.3: The three classifiers scheme.

These four classes are depicted in Fig. 3.4. We propose three classifiers, the first of which checks whether a vehicle is in the red or in the green traffic signal phase. Then, one of two successive classifiers checks whether the vehicle is a passenger car or a truck-like vehicle. Fig. 3.3 shows how the three classifiers determine the appropriate classification situation.
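The three-classifier cascade can be sketched as follows. This is an illustrative stand-in, not the thesis implementation; the classifier objects and label strings are hypothetical placeholders.

```python
def classify_vehicle(sample, phase_clf, stop_clf, pass_clf):
    # phase_clf plays the role of CLASSIFIER 1 (STOP vs. non-STOP);
    # stop_clf and pass_clf play CLASSIFIER 2 and CLASSIFIER 3 respectively.
    if phase_clf(sample) == "STOP":
        return "Class 1" if stop_clf(sample) == "CAR" else "Class 2"
    return "Class 3" if pass_clf(sample) == "CAR" else "Class 4"
```

Only two of the three classifiers are ever evaluated per sample: the phase decision routes the sample to exactly one car-vs-truck classifier.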

3.4 Features selection and extraction

Identifying appropriate features that can be used within a given classification process is always a challenging task. Inappropriate features may lead to overlapping, poorly separable classes. Since this case study deals with event-based square signals, selecting features is a difficult task Pohjalainen et al. [2015], because the only available data are the occupancy duration of a vehicle passing over an inductive loop and the corresponding time-stamps. Therefore, highly effective features with high separability are difficult to select. However, by considering three consecutive vehicle signals, a total of four features are determined to realize the classification task.


Figure 3.4: Figures (a) and (b) present the first two classes and show a passenger car and a truck-like vehicle, respectively, standing still in front of a stop indication and over an inductive loop (red traffic light phase). Figures (c) and (d) present the third and fourth classes, namely a passenger car and a truck-like vehicle passing over an inductive loop without any hindrance at their free-flow speed (green traffic light phase).

3.4.1 Vehicle occupancy time occtar

The occupancy reflects the duration it takes a vehicle to pass through the inductive loop area. The occupancy duration is proportional to the length of the passing vehicle, which results in long occupancy durations for trucks, buses or caravans and short occupancy durations for passenger cars. Long occupancy durations also result from a vehicle standing still directly within the inductive loop area, such as a vehicle waiting in front of a red traffic light.

3.4.2 First derivatives fd and second derivatives sd

These two features are treated in one sub-section because they are correlated. The idea behind these two features is to measure the change in occupancy durations between three consecutive vehicles passing over the same detector; see Fig. 3.5. To do so, the first and second derivatives of the occupancy time at the target signal are calculated. The formulas given in (3.1) and (3.2) determine the first and the second derivative using the finite difference approximation Thomas [1995]:

fd = \frac{occ_{tar} - occ_{pred}}{ts_{tar} - ts_{pred}}   (3.1)

sd = \frac{occ_{pred} - 2\,occ_{tar} + occ_{succ}}{(ts_{tar} - ts_{pred}) \cdot (ts_{succ} - ts_{tar})}   (3.2)

where occ_{tar}, occ_{pred} and occ_{succ} indicate the occupancy times of the target, predecessor and successor vehicles respectively, and ts_{tar}, ts_{pred} and ts_{succ} indicate the time-stamps of the target, predecessor and successor vehicles respectively.

Figure 3.5: Figure (a) shows the occupancy difference pattern of a car-in-the-middle pattern. Figure (b) shows the occupancy difference pattern of a truck-in-the-middle pattern. α1 and α2 indicate the related forward and backward slopes, which are correlated with the first and second derivatives.
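Equations (3.1) and (3.2) translate directly into code. A minimal sketch (the function and argument names are ours, not from the thesis):

```python
def derivative_features(occ_pred, occ_tar, occ_succ, ts_pred, ts_tar, ts_succ):
    # first derivative (3.1): occupancy change from predecessor to target
    fd = (occ_tar - occ_pred) / (ts_tar - ts_pred)
    # second derivative (3.2): finite-difference curvature over the triple
    sd = (occ_pred - 2.0 * occ_tar + occ_succ) / (
        (ts_tar - ts_pred) * (ts_succ - ts_tar))
    return fd, sd
```

A truck between two cars produces a large positive fd and a strongly negative sd, matching the spike pattern of Fig. 3.5(b).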

3.4.3 Linear divergence ld

This feature measures the orthogonal divergence of the target vehicle's occupancy time from the interpolation line between the successor and predecessor occupancies; see Fig. 3.6 and Fig. 3.7. The formula given in (3.3) determines the value of this feature:

ld = occ_{tar} - \hat{occ}_{tar}   (3.3)

where

\hat{occ}_{tar} = \frac{occ_{succ} + occ_{pred}}{2}   (3.4)

is the predicted occupancy time, which lies in the middle of the line connecting the successor and predecessor occupancies.
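Equations (3.3) and (3.4) can likewise be sketched in a couple of lines (names are assumptions for illustration):

```python
def linear_divergence(occ_pred, occ_tar, occ_succ):
    occ_hat = (occ_succ + occ_pred) / 2.0   # predicted occupancy, Eq. (3.4)
    return occ_tar - occ_hat                # divergence from the line, Eq. (3.3)
```

A truck in the middle of two cars yields a large positive ld, while three similar vehicles yield a value near zero.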

Figure 3.6: Figure (a) shows the occupancy time orthogonal divergence ld of a car-in-the-middle pattern. Figure (b) shows the ld of a truck-in-the-middle pattern.

Figure 3.7: Linear divergence scenario of a truck-like vehicle (bigger ld values) versus a passenger car (smaller ld values).

3.5 The novel cellular neural networks based classification concept

In Section 2.3.2 we addressed two main drawbacks of classical CNN in the case of classification and prediction. In the following, we address the proposed adaptations of the CNN model to overcome these drawbacks:

• The input and output spaces in traditional CNN architectures generally have identical dimensions Chua and Roska [2002]. To overcome this first drawback, we propose a novel CNN architecture in which input and output spaces may have non-equal dimensions and are located in different layers. Further, an inner/hidden n × n CNN layer whose dimension is fully independent of both input and output layers is introduced. Through this inner layer, we ensure that we always profit from the multiple-interconnected-cells property. At the same time, the inner/hidden CNN layer is flexible and can work with different input and output dimensions. Further, the new architecture includes, in addition to the ordinary feedback found in traditional CNN architectures, a global feedback from the output layer to all CNN cells in the inner layer. This new feedback link maintains and ensures the oscillatory property of the CNN. The new architecture is illustrated in Fig. 3.8.

• In the traditional CNN model, the only nonlinear part is the output function. We overcome this by introducing nonlinear state-input links (see Fig. 3.8). The nonlinearity is realized by using multiple Gaussian Radial Basis Functions (GRBF) Yee and Haykin [1995]. The GRBF is a nonparametric, universal function approximator Park and Sandberg [1991] that can be used for high-dimensional and arbitrarily distributed data. Since the GRBF is a distance-based model, the dependency between adjacent points is greater than the dependency between far-away points, a property that is very useful in the case of overlapping data Wendland [2006].

Consequently, our new CNN architecture has the following properties:

• It consists of an input layer, an output layer and one or more hidden layers.
• The input layer size depends on the problem's number of inputs (in this case study we have four inputs from four attributes).
• The output layer size depends on the problem's number of outputs (in this case study we have two outputs for two classes). We call each cell in the output layer a global output, to differentiate it from the hidden cell outputs.
• The hidden layer is a locally connected cellular neural network layer. As with a regular multi-layer perceptron neural network, there is no defined rule that specifies the required number of hidden cells, and the complexity depends on the studied problem. Two key parameters describe this layer: a) the size in terms of number of cells ro × co; and b) the neighborhood size (or region of interest) around any given cell (see green colored cells in Fig. 3.8).
• Each input is connected to all hidden cells through both linear and nonlinear links.
• Each global output cell has a feed-forward connection from all hidden cells.
• There is a feedback link from each global output to all hidden cells.
• The inputs are time invariant.
• The global outputs and the hidden cell outputs oscillate from an initial condition to target fixed points.

Figure 3.8: The novel Soft Radial Basis Cellular Neural Network (SRB-CNN) architecture suggested in this case study.

According to the properties addressed above, the hidden cell state equation (of a 3 × 3 neighborhood CNN) is given by

\frac{dx_{i,j}(t)}{dt} = -x_{i,j}(t) + \sum_{k=-1}^{1} \sum_{l=-1}^{1} a_{i,j,k,l}\, y_{i+k,j+l}(t) + \sum_{k=1}^{n} b_{i,j,k}\, u_k + \sum_{k=1}^{n} \sum_{l=1}^{s} c_{i,j,k,l}\, r_{k,l} + \sum_{k=1}^{g} d_{i,j,k}\, v_k(t) + \beta_{i,j}   (3.5)

where n and g are respectively the number of inputs and the number of global outputs; A_{i,j} = (a_{i,j,-1,-1} ... a_{i,j,1,1}) is the feedback template; B_{i,j} = (b_{i,j,1} ... b_{i,j,n}) is the control template; C_{i,j} = (c_{i,j,1,1} ... c_{i,j,n,s}) is the nonlinear control template; D_{i,j} = (d_{i,j,1} ... d_{i,j,g}) is the global feedback template; \beta_{i,j} is the (i,j)th cell bias; and x_{i,j}(t) is the actual state of the (i,j)th cell. Further,

y_{i,j}(t) = tansig(x_{i,j}) = \frac{2}{1 + e^{-2 x_{i,j}(t)}} - 1   (3.6)

is a hyperbolic tangent sigmoid transfer function Vogl et al. [1988] and u_k is the kth external input. The term r_{k,l} is the GRBF given as

r_{k,l} = e^{-\frac{\|u_k - \hat{u}_l\|^2}{2 \sigma_l^2}}   (3.7)

where \hat{u}_l is the lth GRBF center, \sigma_l defines the lth GRBF width and s is the number of basis functions Yee and Haykin [1995]. And last but not least,

\nu_k(t) = logsigmoid\Big(\sum_{i=1}^{ro} \sum_{j=1}^{co} \omega_{i,j,k}\, y_{i,j}(t) + \varepsilon_k\Big) = \frac{1}{1 + e^{-\big(\sum_{i=1}^{ro} \sum_{j=1}^{co} \omega_{i,j,k}\, y_{i,j}(t) + \varepsilon_k\big)}}   (3.8)

is the global output function, where ro × co is the size of the inner CNN layer and thus represents the total number of CNN cells, \omega_{i,j,k} is the feed-forward template that connects the (i,j)th cell with the kth output, and \varepsilon_k is an output bias.
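The dynamics (3.5)-(3.7) can be simulated with a simple forward-Euler integration. The following is a minimal sketch under several stated assumptions: scalar inputs (so the GRBF norm reduces to a squared difference, as for the four scalar features here), zero-padded boundary cells, and template arrays of shape (ro, co, ...); it is not the thesis implementation.

```python
import numpy as np

def tansig(x):
    # hyperbolic tangent sigmoid, Eq. (3.6)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def grbf(u, centers, widths):
    # r[k, l] = exp(-(u_k - c_l)^2 / (2 * sigma_l^2)), Eq. (3.7), scalar inputs
    d2 = (u[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def euler_step(x, u, v, A, B, C, D, beta, centers, widths, dt=0.01):
    # one forward-Euler step of the state equation (3.5);
    # zero boundary condition for the edge cells is our assumption
    ro, co = x.shape
    y = tansig(x)
    yp = np.pad(y, 1)                 # pad outputs with zeros for the 3x3 window
    r = grbf(u, centers, widths)
    dx = -x + beta
    for i in range(ro):
        for j in range(co):
            dx[i, j] += np.sum(A[i, j] * yp[i:i + 3, j:j + 3])  # feedback term
            dx[i, j] += B[i, j] @ u                             # linear control
            dx[i, j] += np.sum(C[i, j] * r)                     # GRBF control
            dx[i, j] += D[i, j] @ v                             # global feedback
    return x + dt * dx
```

Iterating euler_step (and recomputing the global outputs (3.8) from y after each step) produces the oscillation toward fixed points shown in Fig. 3.11.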

3.5.1 SRB-CNN templates design

Recently, various methods have been proposed for CNN template design. They can be divided into two classes. The first one is the gradient-based class of methods; in this category, Mirzai et al. (1998) proposed a template design technique based on back-propagation Mirzai et al. [1998]. The second group is based on evolutionary computation; in this group, the genetic algorithm is one of the most commonly used optimization methods for designing CNN templates Fasih et al. [2008]; SanthoshKumar et al. [2007]. Another more recent technique is Particle Swarm Optimization (PSO) Su et al. [2011]; Wei and Billings [2008]. So far, all proposed training concepts for CNN template design are supervised methods. In SRB-CNN, the existence of the Radial Basis Functions splits the training process into two phases. The first one is an unsupervised phase in which the RBF centers and RBF widths are estimated. The second phase is then a regular supervised training phase in which the CNN weights or templates are trained. In the first phase, we use the K-Means clustering method Lloyd [2006] to estimate the GRBF centers. Then, we estimate the related GRBF widths by using the heuristic process proposed by Saha and Keeler [1990]. Thereby, the width of the lth basis function is estimated with respect to the Euclidean distance between its center \hat{u}_l and the nearest neighbor center \hat{u}_j, weighted by a recommended overlap constant r = 1, such that

\sigma_l = r \cdot \|\hat{u}_l - \hat{u}_j\|   (3.9)

In the second phase, we train the SRB-CNN using PSO. The basic idea behind PSO is to make the optimizer rely on the collaboration between many particles, thereby creating the so-called swarm. These particles search in different directions across the search space in order to find the optimum solution. PSO consists of a number of iterations. At each iteration it, the particle p has a position vector P_p and a velocity vector V_p. The velocity vector updates the particle position towards a new direction. This direction is influenced by the best position the related particle has reached and the best position among all particles Kennedy and Eberhart [1995]. This update process is performed using the following equations:

V_{p,it} = \varpi V_{p,it-1} + \phi_1 rd_1 (L_{p,it-1} - P_{p,it-1}) + \phi_2 rd_2 (L_{global,it-1} - P_{p,it-1})   (3.10)

P_{p,it} = P_{p,it-1} + V_{p,it}   (3.11)

where rd_1, rd_2 are uniform random numbers in the range [0,1]; \phi_1 and \phi_2 are the acceleration factors Shi and Eberhart [1998]; \varpi is the inertia weight; L_{p,it-1} is the best position of the particle p; and L_{global,it-1} is the best position among all particles. In SRB-CNN, the position of particle p at iteration it collects all templates and biases:

P_{p,it} = [(A_{1,1} \dots A_{ro,co})_{p,it}\; (B_{1,1} \dots B_{ro,co})_{p,it}\; (C_{1,1} \dots C_{ro,co})_{p,it}\; (D_{1,1} \dots D_{ro,co})_{p,it}\; (\omega_{1,1,1} \dots \omega_{ro,co,g})_{p,it}\; (\varepsilon_1 \dots \varepsilon_g)_{p,it}]   (3.12)

Various published works have proposed modifications of the original PSO in order to enhance its performance. In this thesis, we use both the time-varying inertia weight Shi and Eberhart [1999], see (3.13), and the time-varying acceleration factors PSO version Ratnaweera et al. [2004], see (3.14) and (3.15):

\varpi = (\varpi_1 - \varpi_2) \times \frac{MAXITER - it}{MAXITER} + \varpi_2   (3.13)

\phi_1 = (\phi_{1,f} - \phi_{1,i}) \times \frac{it}{MAXITER} + \phi_{1,i}   (3.14)

\phi_2 = (\phi_{2,f} - \phi_{2,i}) \times \frac{it}{MAXITER} + \phi_{2,i}   (3.15)

where MAXITER is the maximum allowed number of iterations. The pairs [\varpi_1, \varpi_2], [\phi_{1,f}, \phi_{1,i}] and [\phi_{2,f}, \phi_{2,i}] represent the range of the inertia weight, the first acceleration factor and the second acceleration factor respectively. The SRB-CNN optimization process is inspired by Fornarelli and Giaquinto [2009]; Su et al. [2011] and is illustrated in Fig. 3.10, with the related block diagram in Fig. 3.9. Finally, the corresponding fitness function is given by

G(P_{p,it}) = \sum_{i=1}^{g} \sum_{j=1}^{ns} (\nu_{i,j} - \hat{\nu}_{i,j}(T))^2   (3.16)

where g is the number of outputs; ns is the number of training samples; \nu_{i,j} is the desired ith output of the jth sample; \hat{\nu}_{i,j} is the SRB-CNN ith output of the jth sample; and T is the time specified for the SRB-CNN to converge to an equilibrium.

Figure 3.9: The SRB-CNN template optimization process: general principle. u is the input, \nu is the actual output, and \hat{\nu}(T) is the SRB-CNN output at the equilibrium.
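A compact PSO with the time-varying coefficients (3.13)-(3.15) can be sketched as follows. The hyperparameter values and ranges are illustrative assumptions (the commonly recommended ones from the cited PSO literature), and the fitness argument would be (3.16) evaluated after letting the SRB-CNN settle for the time T:

```python
import random

def pso_minimize(fitness, dim, n_particles=20, max_iter=100,
                 w1=0.9, w2=0.4, phi1=(2.5, 0.5), phi2=(0.5, 2.5),
                 lo=-1.0, hi=1.0):
    # phi1/phi2 are (initial, final) acceleration factors, Eqs. (3.14)-(3.15)
    P = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in P]
    best_fit = [fitness(p) for p in P]
    g = min(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for it in range(1, max_iter + 1):
        w = (w1 - w2) * (max_iter - it) / max_iter + w2        # Eq. (3.13)
        f1 = (phi1[1] - phi1[0]) * it / max_iter + phi1[0]     # Eq. (3.14)
        f2 = (phi2[1] - phi2[0]) * it / max_iter + phi2[0]     # Eq. (3.15)
        for p in range(n_particles):
            for d in range(dim):
                rd1, rd2 = random.random(), random.random()
                V[p][d] = (w * V[p][d]
                           + f1 * rd1 * (best_pos[p][d] - P[p][d])
                           + f2 * rd2 * (g_pos[d] - P[p][d]))  # Eq. (3.10)
                P[p][d] += V[p][d]                             # Eq. (3.11)
            f = fitness(P[p])
            if f < best_fit[p]:
                best_pos[p], best_fit[p] = P[p][:], f
                if f < g_fit:
                    g_pos, g_fit = P[p][:], f
    return g_pos, g_fit
```

The decreasing inertia and first acceleration factor shift the swarm from exploration toward exploitation, while the increasing second factor pulls the particles toward the global best late in the run.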

3.5.2 Our novel soft radial basis cellular neural network based concept

For the truck detection case, we use the SRB-CNN with the following properties:

• Four input cells (as we use four features).
• Two outputs (as we have two output classes for each classifier).
• The SRB-CNN outputs have a zero initial value. The output of the desired target class must oscillate from 0 and converge to 1, while the others must converge to 0. Fig. 3.11 shows an example in which one of the two classes is selected (Class 1).
• The inner CNN layer has the size 5 × 5.
• The interconnected neighborhood size is 3 × 3.
• Five radial basis centers per input.

Figure 3.10: The SRB-CNN template optimization process (PSO based): related block diagram. [Flowchart steps: START; random initialization of the population positions and velocities; oscillation of the SRB-CNN for a duration T; evaluation of the fitness function in (3.16); determination of the best local and global positions; update of the positions and velocities using Eqs. (3.10)-(3.11); loop until the stopping criterion is met; selection of the best global position (templates); END.]

Figure 3.11: Figure (a) shows the state oscillation of the 25 inner cells and their convergence at time t = 5. Figure (b) shows the output oscillation of the 25 inner cells and their convergence at time t = 5. Figure (c) shows the corresponding oscillation of the global output cells (in this example).

3.6 Other classification methodology and competing concepts

In this section, we describe a selection of some of the best-known classification concepts from the literature, which will be benchmarked against our novel CNN based classifier for the truck detection classification task using a single presence detector. We have selected the following classifiers: (a) the support vector machine with radial basis function Vapnik et al. [1997], (b) an artificial neural network based classifier Yegnanarayana [2009], (c) the naive Bayes classifier Zhang [2004], and (d) a decision tree classifier Safavian and Landgrebe [1991]. Each of these classifiers (except the SVM, which is explained in Chapter 2) is briefly described in the following sub-sections.

3.6.1 Artificial neural network

Artificial Neural Networks are a computational tool inspired by natural neurons that receive signals. A neuron is activated if the received signal is strong enough (i.e. it surpasses a particular threshold). A neural network consists of a combination of neurons, where every neuron is considered as a system with many inputs and one output. The neuron has two phases: the training phase and the usage phase. In the training phase, the neuron can be trained to be activated (or not) for particular input patterns. In the usage phase, when a taught input pattern is identified at the input, its related output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the activation rule is used to determine whether to fire or not Yegnanarayana [2009].

3.6.2

Naive Bayes

Naive Bayes is based on Bayes' theorem, a theorem of probability theory originally stated by the Reverend Thomas Bayes. It can be seen as a way of understanding how the probability that a hypothesis is true is affected by a new piece of evidence. Naive Bayes is used as a classifier for pattern recognition that assumes that the value of a particular feature is independent of the value of any other feature. An advantage of Naive Bayes is that, in contrast to an ANN, it does not need an enormous amount of data to perform well as a classifier Zhang [2004].
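The independence assumption makes the classifier a simple product of per-feature likelihoods. A minimal sketch for categorical features follows; the toy data and feature encodings are invented for illustration, not the thesis dataset:

```python
from collections import Counter, defaultdict

def train_nb(samples):
    """samples: list of (feature_tuple, label) with categorical feature values."""
    label_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(int)  # (label, feature_index, value) -> count
    for feats, label in samples:
        for i, v in enumerate(feats):
            feat_counts[(label, i, v)] += 1
    return label_counts, feat_counts, len(samples)

def predict_nb(model, feats):
    label_counts, feat_counts, n = model
    best_label, best_p = None, -1.0
    for label, c in label_counts.items():
        p = c / n  # prior P(label)
        for i, v in enumerate(feats):
            # naive independence assumption: P(feats | label) = prod_i P(feats[i] | label)
            p *= feat_counts[(label, i, v)] / c
        if p > best_p:
            best_label, best_p = label, p
    return best_label

# Invented toy data: (occupancy bin, duration bin) -> vehicle class
model = train_nb([((0, 1), "car"), ((0, 1), "car"), ((1, 1), "truck")])
print(predict_nb(model, (0, 1)))  # car
```

A production implementation would additionally smooth the counts (e.g. Laplace smoothing) to avoid zero probabilities for unseen feature values.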

3.6.3

Decision tree

A decision tree classifier is one of the early approaches that converted look-up table rules into optimal decision trees. The major idea is to divide a complex decision into a combination of several simple decisions, assuming that the final solution obtained


this way would be the optimal solution. There are different types of decision trees (ID3, C4.5, CART, SLIQ, SPRINT, BFTree) Safavian and Landgrebe [1991].

3.7

Data collection and preparation, training and classifiers implementation

In this section, we briefly discuss all issues related to the appropriate data preparation, cleaning and further important pre-processing for all benchmarked classifiers. Further, the related training procedures are shortly described before implementation-related information is provided for all concerned classifiers.

3.7.1

Data collection

This section explains how the data were collected, cleaned and provided to all classifiers. In total, six inductive loops from an Austrian city have been considered for collecting traffic data (real field data). All loops have the same geometric dimensions of 3 × 4 meters but different relative spatial positions on the road, as shown in Table 3.1. The speed of a vehicle passing over each loop varies between 0 kph (vehicle stopped over the loop) and 70 kph. The traffic data of each loop has been collected over one hour. In order to verify the vehicle types (passenger, truck-like), video reference data were captured in parallel for each loop while collecting the inductive loop traffic data. The data verification has been done using the video reference data. After the data has been collected from each loop, we have used the video-signal synchronization tool (see Fig. 3.12) in order to check the type of vehicles passing over the loops. Table 3.2 presents the total number of vehicles, the number of cars and the number of trucks at each loop. The numbers of cars and trucks are categorized based on the red and green traffic phases. After the raw data has been collected, for each observed sample, the four proposed features (occtar, fd, sd and ld) are extracted and labeled with their corresponding class (based on the video observations). Then we apply the following preprocessing steps on the resulting data Baron [2013]: 1) outlier removal based on the interquartile method; 2) data normalization to the range [0, 1]. We used the Weka data mining software Hall et al. [2009] to perform the preprocessing steps. The last step is to split the data into three sets as follows: 50 % for training, 10 % for validation and 40 % for testing.
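The preprocessing steps above can be sketched in plain Python as follows. This is an illustration only: the thesis uses Weka's filters, whose interquartile filter has more options, and the quartile computation here is a simple index-based approximation:

```python
def iqr_bounds(values):
    """Tukey-style outlier fences based on the interquartile range (1.5 * IQR)."""
    v = sorted(values)
    q1, q3 = v[len(v) // 4], v[(3 * len(v)) // 4]
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def remove_outliers(samples, feature_index):
    """Drop samples whose given feature falls outside the interquartile fences."""
    lo, hi = iqr_bounds([s[feature_index] for s in samples])
    return [s for s in samples if lo <= s[feature_index] <= hi]

def normalize(values):
    """Min-max normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def split_50_10_40(data):
    """Split into 50 % training, 10 % validation and 40 % testing, as used here."""
    n_train, n_val = int(0.5 * len(data)), int(0.1 * len(data))
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]

print(normalize([0, 5, 10]))  # [0.0, 0.5, 1.0]
```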

36

3. Data collection

Table 3.1: The different relative loop positions

Loop  Description
1     Straight with traffic light
2     Right with traffic light
3     Left with traffic light
4     Straight and right with traffic light
5     Straight and left with traffic light
6     Straight
7     Right
8     Left
9     Straight and right
10    Straight and left

(The original table also contains a "Sign" column with graphical road signs.)

3.7.2

Features evaluation

Features evaluation is important to see how effective the used features are for the intended classification task. The aim of this section is to show that although the feature quality may in some cases (see Classifier 3) be weak, our novel SRB-CNN still provides a very good performance. This extensive evaluation is done based on the following three measures Kumar and Sree [2014]:
• Information Gain (IG) attribute evaluation Novakovic [2009]
• Gain Ratio (GR) attribute evaluation Hall and Smith [1998]
• Symmetrical Uncertainty (SU) attribute evaluation Gayathri [2014]
All of the above-named feature quality evaluation measures have been applied to the three used classifiers (see Fig. 3.3) and the results obtained are illustrated in Table 3.3.
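For a discretized feature, all three measures reduce to simple entropy computations. The sketch below illustrates their definitions; the toy data is invented and is not the thesis data:

```python
from math import log2
from collections import Counter

def entropy(values):
    """Shannon entropy of a list of discrete values, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def info_gain(feature, labels):
    """IG = H(class) - H(class | feature), for a discretized feature column."""
    n = len(labels)
    h_cond = sum(
        feature.count(v) / n * entropy([l for f, l in zip(feature, labels) if f == v])
        for v in set(feature)
    )
    return entropy(labels) - h_cond

def gain_ratio(feature, labels):
    """GR = IG / H(feature): penalizes features with many distinct values."""
    hf = entropy(feature)
    return info_gain(feature, labels) / hf if hf else 0.0

def symmetrical_uncertainty(feature, labels):
    """SU = 2 * IG / (H(feature) + H(class)): normalized to [0, 1]."""
    denom = entropy(feature) + entropy(labels)
    return 2 * info_gain(feature, labels) / denom if denom else 0.0

# A feature that perfectly separates the classes scores 1.0 on all three measures
f, y = [0, 0, 1, 1], ["car", "car", "truck", "truck"]
print(info_gain(f, y), gain_ratio(f, y), symmetrical_uncertainty(f, y))  # 1.0 1.0 1.0
```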


Figure 3.12: The video-signal synchronization tool

Table 3.2: The selected data distribution

Loop    Cars at       Cars at     Truck-like at  Truck-like at  Total Number
Number  Green Phase   Red Phase   Green Phase    Red Phase      of Vehicles
1       424           228         160            40             852
2       190           110         13             7              320
3       80            120         7              4              211
4       356           186         104            66             712
5       310           193         22             23             548
6       681           0           197            0              878
7       105           0           9              0              114
8       50            0           4              0              54
9       311           0           98             0              409
10      286           0           15             0              301
Total   2793          837         629            140            4399


Table 3.3: The features evaluation. The range is between 0 (bad) and 1 (perfect)

         Classifier1              Classifier2              Classifier3
Measure  occtar fd   sd   ld      occtar fd   sd   ld      occtar fd   sd   ld
IG       0.79   0.70 0.65 0.54    0.84   0.76 0.65 0.54    0.14   0.06 0.04 0.08
GR       0.29   0.26 0.21 0.16    0.27   0.26 0.22 0.17    0.13   0.20 0.11 0.05
SU       0.42   0.38 0.31 0.28    0.41   0.38 0.31 0.28    0.15   0.12 0.05 0.11

Table 3.4: The K-Means configuration parameters

Parameter                 Value
Number of Centers         5
Initialization Algorithm  K-Means++ Arthur and Vassilvitskii [2007]
Max Iteration             100
Distance Measure          Squared Euclidean distance

3.7.3

Appropriate training and implementation for SRB-CNN classifier

Our proposed SRB-CNN classifier is trained in two steps:
• An unsupervised training phase estimates the GRBF function parameters. Hereby, we first apply K-Means clustering using Matlab 2013b MATLAB [2013] with the configuration parameters shown in Table 3.4. The clustering is done over all four input data dimensions to select the corresponding radial basis centers uˆj. Then we estimate the GRBF width l using (3.9).
• The second SRB-CNN training stage involves a PSO based supervised learning. The SRB-CNN is trained using the algorithm schematically described in Fig. 3.9 and Fig. 3.10.
The implementation of the training algorithms has been done using Matlab MATLAB [2013]. The differential equation system describing the SRB-CNN is solved using the Adams-Bashforth-Moulton PECE solver Shampine and Gordon [1975] (ode113 in Matlab) with a fixed step size (0.1) and an integration range from 0 to 5. The PSO configuration parameters used in the CNN training process are presented in Table 3.5.
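The unsupervised stage can be sketched as follows. This is a from-scratch illustration of K-Means with K-Means++ seeding and squared Euclidean distances (matching Table 3.4); the thesis itself uses Matlab's implementation, and the width formula (3.9) is not reproduced here:

```python
import random

def kmeans_pp_init(points, k, rng):
    """K-Means++ seeding: each new center is drawn with probability proportional
    to the squared distance from the nearest already-chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centers)
              for pt in points]
        r, acc = rng.random() * sum(d2), 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(pt)
                break
    return centers

def kmeans(points, k, iters=100, seed=0):
    """Lloyd iterations with squared-Euclidean assignment (Table 3.4 settings)."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            j = min(range(k),
                    key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            clusters[j].append(pt)
        centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers

centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(sorted(centers))  # the two cluster means: [(0.0, 0.5), (10.0, 10.5)]
```

The resulting centers play the role of the radial basis centers uˆj; the widths would then be derived from them via (3.9).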


Table 3.5: The PSO configuration parameters

Parameter                      Value
Swarm Size                     24
MAXITER                        500
ω1                             0.9
ω2                             0.4
c1,f                           2.5
c1,i                           0.5
c2,f                           2.5
c2,i                           0.5
SRB-CNN oscillation range      [0, 5]
SRB-CNN oscillation step size  0.1
Stopping Criteria              G_TrainingSet(P_p,it) ≤ 1e−10 or G_ValidationSet(P_p,it) ≤ 1e−10

Table 3.6: The Weka classifiers configuration parameters

Classifier     Type                   Parameters
RBSVM          C-SVC                  KernelType = radial basis function, eps = 0.001, gamma = 0.0001
Nbayes         NaiveBayes -k          Using Kernel Estimator
Decision Tree  RandomForest           default
ANN            Multilayer Perceptron  hidden-neurons = 20, hidden-layers = 1, learningrate = 0.1, momentum = 0.2

3.7.4

Appropriate training and implementation for the other classifiers besides SRB-CNN

For all other classifiers explained in Sections 2.5, 3.6.1, 3.6.2 and 3.6.3, the Weka data mining software Hall et al. [2009] is used to train and test them. In Weka, the dataset must be provided in a special Weka file format called "arff". In Table 3.6, we present the type and configuration parameters for all the classifiers used in this case study.


Table 3.7: Scenario 1 Evaluation

Classifier     Accuracy %  Sensitivity %  Specificity %  Precision %  Recall %  F-measure %
SRB-CNN        98.6        99.7           95.3           98.6         99.7      99.1
RBSVM          90.7        93.9           80.4           93.9         93.9      93.9
NBayes         88.4        92.4           75.6           92.4         92.4      92.4
Decision Tree  97.3        98.7           92.7           97.8         98.7      98.2
ANN            97.0        98.7           91.7           97.5         98.7      98.1

3.8

Comparative performance evaluation of all considered classifiers

This section presents a summary of all extensive experiments. Basically, we use both ROC curves Bradley [1997] and a list of metrics presented in Powers [2011] to show all facets of the respective performance of all considered classifiers for the task at hand. Before discussing the various results obtained, we should recall that there are three fully different basic scenarios for the detection task:
Scenario 1: The related classifier needs only to decide whether the input signal is in the green or red traffic signal phase. This classification task is relatively easy.
Scenario 2: The traffic light is in red mode and all vehicles are stopping over the detector. This is the most challenging case, resulting in both the highest class overlap and the highest imbalance.
Scenario 3: The traffic light is in green mode and all vehicles are moving. In this mode, the classification task is a bit easier, because the level of class overlap is smaller than in Scenario 2.

3.8.1

Evaluation of all considered classifiers

We use the following evaluation metrics: Receiver Operating Characteristic (ROC) curves Bradley [1997], Accuracy, Sensitivity, Specificity, Recall and F-measure Powers [2011]. The evaluation is done for all classifiers using the testing dataset (out of sample); the validation and training sets are used only in the considered classifiers' learning process. As illustrated in Tables 3.7, 3.8 and 3.9, the SRB-CNN classifier significantly outperforms the other ones. Furthermore, in the most challenging case of the red traffic light phase, the SRB-CNN classifier shows a remarkable lead compared to the other classifiers, which perform poorly in this case, almost like random classifiers.
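For reference, the metrics reported in Tables 3.7-3.9 derive from the binary confusion matrix as follows; the example counts are invented for illustration:

```python
def metrics(tp, fp, tn, fn):
    """Binary-classification metrics (in %) from confusion-matrix counts."""
    accuracy    = 100 * (tp + tn) / (tp + fp + tn + fn)
    sensitivity = 100 * tp / (tp + fn)   # identical to recall
    specificity = 100 * tn / (tn + fp)
    precision   = 100 * tp / (tp + fp)
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f_measure

# Invented example: 8 trucks detected, 2 missed, 1 false alarm, 9 correct rejections
acc, sens, spec, prec, f1 = metrics(tp=8, fp=1, tn=9, fn=2)
print(acc, sens, spec)  # 85.0 80.0 90.0
```

Note that with this definition sensitivity and recall coincide, which is why those two columns agree in the tables.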


Table 3.8: Scenario 2 Evaluation

Classifier     Accuracy %  Sensitivity %  Specificity %  Precision %  Recall %  F-measure %
SRB-CNN        98.1        98.5           91.8           99.4         98.5      98.9
RBSVM          94.5        96.1           73.5           97.9         96.1      97.0
NBayes         92.5        94.6           67.9           97.2         94.6      95.9
Decision Tree  95.5        97.7           69.8           97.4         97.7      97.6
ANN            95.4        97.4           71.7           97.6         97.4      97.5

Table 3.9: Scenario 3 Evaluation

Classifier     Accuracy %  Sensitivity %  Specificity %  Precision %  Recall %  F-measure %
SRB-CNN        95.7        98.0           90.5           96.0         98.0      97.0
RBSVM          62.9        73.5           38.1           73.5         73.5      73.5
NBayes         47.1        36.7           71.4           75.0         36.7      49.3
Decision Tree  74.3        85.7           47.6           79.2         85.7      82.4
ANN            81.4        100.0          38.1           79.0         100.0     88.3

3.9

Chapter summary

Truck detection is a very important sensing task in truck management, especially in the context of the fine optimization of future high-precision adaptive traffic control concepts at urban road junctions. Thus, a reliable and low-cost concept for truck detection involving a single presence detector (presence detectors like inductive loops are the most pervasive ones in traffic management) is particularly interesting in view of the real cost constraints faced by traffic system operators. This chapter has presented a comprehensive concept for robust and reliable truck detection using solely one single presence sensor (e.g. an inductive loop, but also any other presence sensor) at a signalized traffic junction. Hereby, two operation modes are distinguished: (a) during green traffic light phases, and (b) a much more challenging case, during red traffic light phases. First, it is shown how difficult the underlying classification task is due to strongly overlapping classes, which cannot be easily separated by simple hyper-planes. Then, a novel cellular neural/nonlinear networks (CNN) based concept is developed, validated and extensively benchmarked against a selection of the best representatives of the current related state-of-the-art classification concepts (namely support vector machines with radial basis function, artificial neural networks, naive Bayes, and decision trees). All competing classifiers use the same features, and the superiority of the novel CNN based concept is thereby underscored, as it strongly outperforms the other ones.


This novel CNN based concept satisfactorily fulfills the hard industrial requirements regarding robustness, low cost, high processing speed, low memory consumption and the capability to be deployed in low-cost embedded systems. It should be noticed that the novel concept of this chapter is applicable to any type of presence detector independently of the involved technology, as long as the pulse occupancy time is proportional to the vehicle length. This chapter also demonstrates the strong capability of the CNN based classifier to cope with challenging classification tasks facing strong class overlap as well as class imbalance. In a subsequent general and theoretical work, the SRB-CNN classifier presented here will be extensively evaluated using publicly available reference databases for benchmarking regarding class overlap and/or class imbalance.


Chapter 4

Case Study 2: Drop detection

In the previous chapter, the classical CNN has been adapted and extended in order to better fit and solve classification problems. The challenges Chl.CO, Chl.LG and Chl.CI are solved by combining the good features of CNNs and RBFs. The combination is realized by adding more nonlinear terms to the CNN state equation, which models the relation between states and inputs nonlinearly. The nonlinear modeling is done using multiple Gaussian Radial Basis Functions (GRBF). In this chapter, the nonlinear coupling between the states and inputs of a CNN processor system is introduced in a different way. Multiple GRBF functions are still used; however, the localization of the corresponding centers is done by involving the radial basis support vector machine (RBFSVM) method. In other words, the GRBF centers represent the support vectors, which maximize the margin between the studied classes. In addition to the three challenges solved by SRB-CNN, by integrating SVM, the learning complexity challenge (Chl.LC) is also solved, since SVM is usually trained using quadratic programming Cottle et al. [2009], a very efficient method to overcome the local/global minimum optimization problem. This new CNN system is evaluated on a special case study related to the detection of raindrops from images of a car front window (in the context of ADAS driver assistance systems). It is a classification problem under difficult conditions. The results of this case study have already been published in the Journal of Real-Time Image Processing Al Machot et al. [2016]. In the following sections, we present a brief description of the concept along with the quintessence of the obtained results. Further details can be found in the published paper; see the following web link: http://link.springer.com/article/10.1007/s11554-016-0569-z. Essentially, this chapter presents a comprehensive summary of this paper.


4.1

The modification of CNN through using RBFSVM (radial basis support vector machines)

This section focuses on the use of support vectors as CNN templates. It is especially explained why it is correct to use the support vectors as templates, and the right time to use them is further investigated. Furthermore, the conditions that are a pre-requisite to this use are also enumerated. These are important features needed in advanced driver assistance systems (ADAS). ADAS systems play, amongst others, the role of providing the driver with important information concerning the environmental situation around the vehicle.

4.1.1

Use of support vectors as the CNN control templates

In traditional approaches to CNN-based image processing, two techniques are used to design the CNN templates. The first technique follows optimization-based supervised learning using both input and related output samples Chandler et al. [1999]. In the second technique, the CNN control templates represent a predetermined image-processing filtering template (e.g., an averaging smoothing template or a Sobel template), while the associated feedback template contains zeros except for the center element, which represents the nonlinear self-feedback. In this approach, the intention is not to define new image-processing filters; it is rather to accelerate the image-processing task (by using its predefined kernel in the form of a CNN control template) via a CNN processor. In this case, the acceleration is due to the inherently parallel paradigm of CNN, which can therefore operate much faster Chua and Roska [2002]. Furthermore, in the field of signal/image processing, the term kernel denotes a function that estimates the correlation or the similarity between two variables. Following the same strategy, the kernel used in SVM-based visual object detection Chua and Roska [2002] also looks for the similarity between the inputs and the estimated support vectors, using convolution methods as well. This fact was the inspiration for the use of support vectors as centers for the radial basis functions in the CNN model. This last described method is different from the one used in the SRB-CNN (soft radial basis CNN), which uses a K-Means clustering method to determine the centers. In SVM, the support vectors are located in a way that maximizes the margin between the classes in presence. According to many studies Vapnik et al. [1997], this improves the universality of the corresponding classifier. The SVM decision function is expressed as follows:

f(u) = sgn(f_svm(u)) = sgn( \sum_{i=1}^{m} \alpha_i L_i K(u_i, u) + b )    (4.1)

where u_i is the i-th support vector, m is the number of support vectors, u is the feature vector, α_i is the Lagrange multiplier coefficient, K(·) is the kernel function (in this chapter, we use the radial basis kernel), L_i is the class label (raindrops or non-raindrops) and b is a bias. After (4.1) has been trained and the support vectors are located, the related 3 × 3 neighborhood SVM-CNN state equation is given by

dx_{i,j}(t)/dt = −x_{i,j}(t) + \sum_{k=−1}^{1} \sum_{l=−1}^{1} a_{i,j,k,l} y_{i+k,j+l}(t) + f_svm(u_{i,j}(t)) + z_{i,j}    (4.2)

where A_{i,j} = (a_{i,j,−1,−1} … a_{i,j,1,1}) is the feedback template, f_svm(u) is the SVM decision function defined in Eq. (4.1), z_{i,j} is the (i, j)-th cell bias, and x_{i,j}(t) is the actual state of the (i, j)-th cell.
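A numerical sketch of Eqs. (4.1)-(4.2) for a single cell follows. It uses forward-Euler integration for readability; the neighbor feedback is collapsed into a scalar self-feedback a, and the support vectors, multipliers and γ are invented toy values, not a trained model:

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Radial basis kernel K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def f_svm(u, svs, alphas, labels, b, gamma=0.5):
    """The decision value of Eq. (4.1) before the sign: sum_i alpha_i L_i K(u_i, u) + b."""
    return sum(a * l * rbf_kernel(sv, u, gamma)
               for a, l, sv in zip(alphas, labels, svs)) + b

def saturate(x):
    """Standard CNN output nonlinearity y = 0.5 * (|x + 1| - |x - 1|)."""
    return 0.5 * (abs(x + 1) - abs(x - 1))

def simulate_cell(u, svs, alphas, labels, b, a=0.0, z=0.0, h=0.1, t_end=5.0):
    """Forward-Euler integration of one decoupled cell of Eq. (4.2):
    dx/dt = -x + a * y(x) + f_svm(u) + z."""
    x, drive = 0.0, f_svm(u, svs, alphas, labels, b)
    for _ in range(int(t_end / h)):
        x += h * (-x + a * saturate(x) + drive + z)
    return saturate(x)

# Toy model with one "raindrop" support vector at the origin (label +1)
y = simulate_cell(u=(0.0,), svs=[(0.0,)], alphas=[1.0], labels=[1], b=0.0)
print(y)  # close to +1: the cell settles on the raindrop side
```

With zero feedback the state simply relaxes toward the SVM drive term; in the full system of Eq. (4.2) the feedback template additionally couples each decision to its 3 × 3 neighborhood.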

4.1.2

The use-cases for using support vectors as CNN templates

In the field of image processing, the pixels of an image can usually be categorized into several distinct classes. Consequently, if we have a set of images in which the pixels have been assigned to specific class labels, we can extract corresponding image features that help to distinguish between pixels of different classes. Therefore, in any visual object detection scenario where possible features are known (e.g. edge strength, pixel intensity, or color-based features), our proposed approach can be used effectively for such classification cases.

4.1.3

General description of the proposed approach for raindrops detection

There are two phases, namely the following:
• The off-line phase, where the support vectors are located during the construction phase of the models to classify the raindrops.
• The online phase, which involves the real-time detection of the raindrops.
In both of the above listed phases, a pre-processing step consisting of image filtering and image enhancement is performed. In the off-line phase, the CNN templates are produced from the support vectors, and in the online phase the real-time raindrop classification is performed by using a CNN processor system.

4.1.3.1

Pre-processing phase

A pre-processing step is essential for ensuring a robust image analysis (in this case drop detection) in the computer vision system. In both phases of the raindrop detection process, image filtering and image enhancement are used for contrast improvement. The role of image filtering is the removal of any weather-related/induced noise, whereas image enhancement improves the contrast and additionally reduces the noise. The filters used are as follows:
• A CNN based contrast enhancement and noise removal technique Gacsádi et al. [2005]
• A CNN based median filter to remove the rain dots (dots are relatively small compared to drops) from the image Karacs et al. [2010]

Figure 4.1: The RBF-SVM-CNN architecture as suggested in this case study. In the image plane, each pixel contains the feature extracted from the corresponding original image pixel. The red pixel is the tested pixel; the green pixels are the 3 × 3 neighbors. The CNN plane contains the CNN states/pixels. As shown, the tested pixel in the CNN plane (in red) is connected to all 3 × 3 neighbors in the image plane. A graphical representation of the RBF-SVM-CNN is also presented, in which u1 … un represent the pixels in the image features plane and y1 … yn represent the corresponding CNN outputs.

4.1.3.2

SVM training in the off-line phase

The first stage consists of capturing/collecting images in rainy weather situations. An emphasis is placed on high image quality. The resolution needs to be 510 × 585 pixels. Then follows the extraction of many single raindrops of different sizes. A complete bounding box surrounding a raindrop is extracted, which is eventually stored in a separate file. The non-raindrop images are required to be formed for the establishment


of sets of negative classes to be used in the training of the SVM classification. Various features are picked from the raindrops, and those features are characterized by differences in the neutrality of the highway/road background. The extracted features are then used in the training of the support vector machine. These features are:
• The edge features, which represent the distances between the center and the contours of the raindrops Kotoulas and Andreadis [2005].
• The color features, represented by the V component of the Hue, Saturation and Value color space (HSV) Acharya and Ray [2005].
• The histograms of oriented gradients (HOG) features Dalal and Triggs [2005].
• Wavelet features Tong et al. [2004].

4.1.3.3

Online raindrops detection

Here, Eq. (4.2) is used. Figure 4.1 presents the related architecture, in which the image plane contains the image pixels. Each pixel has the value of its corresponding feature. The pixel in red is the tested pixel. In this figure, a 3 × 3 CNN is used. All the green neighbors are considered in the decision for the red pixel. A graphical representation of Eq. (4.2) is also presented. The inputs u1 … un contain the feature values of the 3 × 3 neighboring pixels. The outputs y1 … yn contain the decision values of the 3 × 3 neighboring pixels. This phase results in binary images in which the raindrops are segmented as foreground, while all other pixels are discarded as background. The indices of the raindrops can be used in the reconstruction of the original color image.

4.2

Experimental setup and results obtained

In this section, a summary of the experimental results obtained is presented. The images are acquired through a digital camera located below the rearview mirror inside a real car. A total of 315 images (510 × 585 pixels each), containing 7054 raindrops, were acquired during different rainy weather traffic conditions. Of these, 189 images have been taken into account for training and 126 for testing. All raindrops have been annotated by using GIMP 2.6.8.

4.2.1

Performance and evaluation

As this case study is a classification problem, the following evaluation metrics are used: sensitivity, specificity and accuracy (Hand [2009]). The evaluation is done based on different combinations of the proposed features. Table 4.1 shows the related classification


performances besides the average computation times for processing each image (on a GPU platform).

Figure 4.2: The raindrops detection results of various experiments, where our SVM-CNN method has been using different sets of features.

Table 4.1: Performance measures and the average computation times (510 × 585 pixels on GPU) of HSV gradient, HOG, HSV & HOG, wavelet and edge features.

Feature        Specificity  Sensitivity  Accuracy  Process time in s
HOG            98.68%       94.32%       98.25%    0.565
HSV gradient   98.79%       67.60%       95.66%    0.488
HSV & HOG      98.75%       65.44%       96.1%     0.684
Wavelet        97.90%       40.45%       93.23%    0.676
Edge features  96.83%       56.43%       93.54%    0.563

A visual presentation of the classification performance is shown in Fig. 4.2, in which the tested image is found in the first column. The other columns present the detected drops using different feature combinations. A conclusion drawn after analyzing both Table 4.1 and Fig. 4.2 is that the use of HOG features gives the best classification performance, with an accuracy value of 98.25%.

4.3

Chapter summary

In this chapter, we have introduced a novel SVM-CNN model, which combines the good universality feature of RBFSVM with the high processing speed of CNN in one platform.


This novel proposed model has been evaluated on a raindrop detection case, which is an important application case for ADAS. Related works, which generally do not involve CNN, have been found to be either less performing or to not sufficiently address the real-time constraints. In addition, most of the existing systems are not easy to implement in hardware. The proposed SVM-CNN concept shows very promising results obtained by processing real field images. The details of the comprehensive work summarized in this chapter have already been published in the Journal of Real-Time Image Processing; the paper can be found under the following web link: http://link.springer.com/article/10.1007/s11554-016-0569-z.


Chapter 5

Case Studies 3 & 4: Time-series forecast

This chapter presents a further extended and adapted version of the CNN model in order to fit and efficiently solve time-series forecasting (TSF) problems. TSF usually faces all six of the addressed challenges. The contribution of this chapter is to first reduce the learning complexity, increase the nonlinearity modeling, cope with stochastic processes and increase the universality of the model. To achieve that, a novel CNN-ESN system is developed. In this case, the high nonlinearity is achieved through the possibly huge number of cells the ESN allows, rather than through RBFs. Another aspect to be considered is the requested adaptive model to cope with time-varying systems, which brings us to our proposed framework. This proposed framework mimics the human mind's biological two-systems thinking model. Our mind makes decisions/calculations using two connected systems. A first system, System 1, the so-called intuitive system, makes decisions based on our experience. A second system, System 2, the controller system, controls the decisions of System 1 by either modifying or trusting them. Similarly to the human mind's two-systems model, this chapter proposes an artificial framework consisting of two cellular neural network (CNN) systems. The first CNN processor represents the intuitive system, and we call it Intuitive-CNN. The second CNN processor represents the controller system, which is called Controller-CNN. Both are connected within a general framework that we name OSA-CNN. The proposed framework is extensively tested, validated and benchmarked against the best state-of-the-art related methods, while involving real field time-series data. Multiple scenarios are considered: traffic flow data extracted from the PeMS traffic database Caltrans and the 111 time-series collected from the so-called NN3 competitions NN3.
The novel OSA-CNN concept remarkably outperforms the state-of-the-art competing methods regarding both performance and universality. The work presented in this chapter has been submitted for publication (status: under review) to the journal IEEE Transactions on Neural Networks and Learning Systems.


5.1

Background and motivation

The importance of time-series in science and technology is extremely high nowadays. It is a fact that TSF is widely utilized in several areas such as engineering, science, and finance Crone and Graffeille [2004]; Zhang and Kline [2007]; Zhang et al. [1998]. The high interest in time-series analysis and forecasting is motivated by the need for understanding and predicting the future of various technical, social, and natural systems. In this perspective, several forecasting methods have been developed, some of which rely on either linear or nonlinear models. One of these promising forecasting concepts involves artificial neural networks (ANN). Over the past decades, time-series forecasting has been predominantly implemented using statistical analysis based methods. One prominent example is the autoregressive integrated moving average (ARIMA) Ahmed and Cook [1979]; Mills [1990], which has been used by scientists for several years. However, regarding both the emerging technologies and the high complexity of much recent data, artificial neural networks have been preferred and have become a serious processing alternative in time-series forecasting. The fact that artificial neural networks are characterized by a clearly superior performance in classification and regression problems in various practical domains has made them a preferred method for time-series forecasting Lv et al. [2015]. Forecasting is without any doubt a major challenge that characterizes most global data analysis. That is, an accurate and timely flow of information is needed for individuals or experts to make the relevant decisions regarding a certain phenomenon under observation. It can be easier to make a decision based on data or knowledge about the future evolution of the system. However, big data can be very challenging due to the uncertainty of the related variables Pijanowski et al. [2002].
Thus, time-series forecasting is helpful for making a calculated decision and predicting the future trend of the variables. The phenomena that produce time-series can sometimes be unknown, and the information available for forecasting is by and large limited to the past values of the time-series Pereira Salazar et al. [2014]. Therefore, it is essential and vital to utilize the most relevant number of previous values, termed the lag, when undertaking a time-series forecast. This is the gap that ANN is mostly developed to close.
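The lag idea can be made concrete: each forecasting input is a sliding window of the previous `lag` observations, and the target is the next value. This is a generic sketch of such a lag embedding, not the exact input encoding used later for OSA-CNN:

```python
def lag_matrix(series, lag):
    """Build (inputs, targets) pairs: the previous `lag` values predict the next one."""
    inputs  = [series[i:i + lag] for i in range(len(series) - lag)]
    targets = series[lag:]
    return inputs, targets

X, y = lag_matrix([1, 2, 3, 4, 5], lag=2)
print(X)  # [[1, 2], [2, 3], [3, 4]]
print(y)  # [3, 4, 5]
```

Any regression model (statistical or neural) can then be fit on the pairs (X, y); choosing the lag well is precisely the difficulty discussed above.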

5.2

Related works

Compared to earlier works, especially to some of the existing statistical forecasting methods, neural network approaches have proven several outstanding characteristics, which include the following: (a) artificial neural networks can use both linear and nonlinear data; (b) artificial neural networks are non-parametric, that is, an ANN does not need an explicit underlying model compared to other forecasting methods; and (c) ANN is very


flexible and universal and hence can be used for more complicated and complex data structures/models. The above listed characteristics make ANN regarded as among the best current methods/concepts for time-series forecasting. Several empirical studies have shown a significantly superior performance of ANN over other statistical forecasting tools. The superiority measure/benchmarking has been mainly based on a single or a small set of time-series. A special and very prominent feature of ANN is its ability to combine both long- and short-term forecasting approaches to provide a better system model with the highest performance potential Li and Yeh [2002]. A special type of ANN, the recurrent neural network (RNN), has been a breakthrough in the scientific field related to time-series forecasting. Through recurrent neural networks, data analysis and forecasting have taken tremendous steps ahead, and hence enhanced scientific decision making has been enabled. However, RNN also faces several limitations, which have been revealed by several studies. For instance, according to Yan [2012], in some cases ANNs have depicted some inferiority regarding performance when compared to traditional statistical methods. The inconsistency of the performance of artificial neural networks, as revealed by several studies, is attributed to the inherent requirement of a sufficient number of training samples needed to ensure that the ANN can be adequately trained. Also, several real-world applications occur during a short time period, making it inadequate for artificial neural networks to reveal the underlying pattern and structure. This leads to poor prediction performance for RNNs. In this regard, and in order to increase the universality performance of ANN, Rahman (2016) Rahman et al. [2016] and Weizhong (2012) Yan [2012] suggested ensemble architectures of multilayer perceptron (MLP) neural networks and generalized regression neural networks (GRNN), respectively.
Both presented a promising universality performance by fusing multiple ANN predictors. Another trending type of ANN is the echo state network (ESN) Ilies et al. [2007]; Jaeger [2001]; Lukoševičius [2012], which benefits from its large hidden layer to model any complex nonlinear time series well. It is also very fast to train, thanks to the random generation of the hidden-layer weights followed by a simple least-squares fit of the output layer.

5.3 Background knowledge

In this section, we briefly provide background information related to the four basic techniques that are involved as building blocks in the design of the OSA-CNN (online self-adaptive cellular neural network) architecture concept, which is described later in this chapter.


5.3.1 The two-systems model of cognitive processes

The difference between conscious and unconscious processes (of the human mind) in decision-making is one of the key questions for which social scientists have conducted many studies to find appropriate answers. In the 2011 best-selling book Thinking, Fast and Slow, the Nobel Prize winner Daniel Kahneman addressed that question by concluding that there are two different systems the brain uses to form thoughts Kahneman [2011]. These two systems are the following:

• System 1: a fast, intuitive, unconscious and emotion-based decision-making system.

• System 2: a slow, conscious and calculation-based decision-making system.

In other words, System 1 involves both our intuition and past experiences. System 1 may give a correct decision if the related case is neither critical nor new to the system. If the case is critical, new or not yet experienced by the brain, System 1 will have a high likelihood of arriving at a wrong decision. In that case, System 2 interferes with and controls System 1 in order to ensure a more accurate outcome. This model of thinking inspired us to build an artificial framework for time-series prediction that mimics the two-systems human cognitive model. Our novel framework likewise consists of two systems. Similarly to the human mind, System 1 of OSA-CNN predicts the future value of the time-series, while System 2 of OSA-CNN controls System 1 and drives it to deliver a better performance. More details about the OSA-CNN artificial framework are addressed later in this chapter.

5.3.2 Model reference neural network adaptive control (MRNNAC)

Neural networks have been widely used in process control applications. One of the well-performing architectures was proposed by Kasparian and Batur [1998], who introduced what is called the Model Reference Neural Network Adaptive Control (MRNNAC). Their proposed architecture consists of two connected networks. The first network (a Process Model Network) models the dynamical process of the plant, while the second network (a Controller Network) controls the first one and, thus, drives it to a target point. Both networks are generally trained using one of the popular dynamic neural network training techniques Kasparian et al. [1994]; Davidon [1976]. In the following, we summarize the configuration and then the training of both networks.

• Process Model Network:

– It is used to predict the next process output.



Figure 5.1: The OSA-CNN framework. The Intuitive-System takes the historical values to predict the future value of a time series. The Controller-System reads and compares the historical values with the corresponding prediction values and manipulates the Intuitive-System output to improve the future performance.

– The current and delayed plant input, along with the delayed output, are the inputs to this network.

– A training set containing plant input-output measurements is used to train this network.

• Controller Network:

– It is used to drive the process plant to a target set-point following a reference model, which models a particular trajectory that the plant output should follow to converge to the targeted set-point.

– The target set-point input (the reference value that the plant output should follow), along with the delayed output and inputs of the plant, are the inputs to this network.

– Once the process model network is trained, the controller network is trained by connecting it with the process model network. Then, a time-series of a varying set-point value, together with its corresponding reference model, is used to train the controller. In this training phase, only the controller network weights need to be learned, while the process model network weights remain fixed from the first training phase.
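The two-phase training described above can be sketched with a toy example. A fixed (already trained) linear model stands in for the process-model network, and only the controller weights are adapted, here by a simple numerical gradient descent rather than the dynamic training techniques cited above. All names and coefficient values are hypothetical, chosen only for illustration:

```python
import random

# Phase 1 result: a process-model network that is already trained.
# A toy linear model with FIXED weights stands in for it here.
A_PLANT, B_PLANT = 0.6, 0.3          # assumed illustrative values

def process_model(u, y_prev):
    """Predicts the next plant output from input u and previous output."""
    return A_PLANT * u + B_PLANT * y_prev

def tracking_error(w, setpoint, steps=30):
    """Run controller + frozen process model; sum squared set-point error."""
    y, err = 0.0, 0.0
    for _ in range(steps):
        u = w[0] * setpoint + w[1] * y   # toy linear controller network
        y = process_model(u, y)          # process-model weights stay fixed
        err += (y - setpoint) ** 2
    return err

# Phase 2: learn ONLY the controller weights (numerical gradient descent).
random.seed(0)
w = [random.uniform(-0.1, 0.1), random.uniform(-0.1, 0.1)]
lr, eps = 0.005, 1e-4
for _ in range(1500):
    grad = []
    for i in range(2):
        wp = list(w)
        wp[i] += eps
        grad.append((tracking_error(wp, 1.0) - tracking_error(w, 1.0)) / eps)
    w = [wi - lr * g for wi, g in zip(w, grad)]

print(tracking_error(w, 1.0))            # small residual tracking error
```

The key point mirrored from MRNNAC is that the second training phase back-propagates (here: differentiates numerically) through the frozen process model, so only the controller improves.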


5.4 OSA-CNN

OSA-CNN is built by integrating: a) a continuous-time nonlinear CNN oscillator system; b) the echo-state property and the easy training approach of ESN; and c) the theory of the two-systems model of thinking, together into one framework. This combination of advantages makes OSA-CNN a powerful architecture. As presented in Fig. 5.1, OSA-CNN has two connected systems, mimicking the two-systems model of human cognitive processes: System 1 is the process model (Intuitive-System) and System 2 is the controller model (Controller-System). Both systems are integrated similarly to MRNNAC. The role of the Intuitive-System is to predict the future value of a time-series given its historical records. The role of the Controller-System is then to assess the recent prediction performance of the Intuitive-System by comparing it with the corresponding real values; accordingly, it controls the Intuitive-System to improve the future performance. In other words, the Controller-System drives the Intuitive-System in such a way that the recent prediction error is minimized. In the following, we address the data preparation needed for OSA-CNN along with a review of modeling details.

5.4.1 OSA-CNN data preparation

Given a time series v(t), the forecasting goal is to predict v(t) while considering the h historical values v(t − k_1), ..., v(t − k_h), as shown in Fig. 5.2(a), where k_i is a delay or lag. For modeling purposes, we treat the original and lagged series as separate quantities, denoted by v_{k0}(t) and v_{k1}(t) to v_{kh}(t) respectively. The original and lagged time-series are then transformed into series of square pulses (i.e. step functions), each delayed by one time-lag of the time series, as follows (see Fig. 5.2(b)):

• We take each two consecutive samples of the original series, say v_{k0}(1) and v_{k0}(2), and interpolate them with a square pulse of length Λ (the interpolation size, i.e. the time-distance between two samples of the original time-series), which starts at v_{k0}(1) and ends at v_{k0}(2).

• This square pulse has the constant value v_{k0}(2) over the whole interval of size Λ between lag-time (1) and lag-time (2).

The aim of this interpolation process is to let the system (driven by the controller part) perform several prediction trials of the same target and eventually converge to the best possible prediction value. The resulting signals (i.e. square-pulse series) are denoted by vs_{k0}(ti) and vs_{k1}(ti) to vs_{kh}(ti) respectively; note that each step of these signals has the length Λ. In the next step, as in MRNNAC, we need to obtain the reference series of the pulse series. A reference series is the same as the pulse series, except that it consists of smooth steps instead of sharp square steps.


Figure 5.2: (a) An example of a time series v_{k0}(t) with two marked records v_{k0}(t) and v_{k0}(t−1). (b) The transformation of v_{k0}(t) into the square-pulse series (red solid) and the smooth reference series (brown dotted), constructed by solving (5.1) using the Matlab ode113 solver. It also shows the corresponding new positions of v_{k0}(t) and v_{k0}(t−1) from (a), represented by vr_{k0}(ti) and vr_{k0}(ti−Λ) respectively, where Λ is the interpolation size, i.e. the number of interpolation points.

The resulting smooth path (see vr_j(ti) in Equation (5.1)) mimics the desired path that the plant should follow. The transformation from square into smooth steps is generally done by using the following reference model, which is well known in the related literature Kasparian et al. [1994]; Kingravi et al. [2012]; Rossomando et al. [2011]; Shuzhi et al. [1998]; Wei [2007]:

d²vr_j(ti)/dti² = −α vr_j(ti) − ρ dvr_j(ti)/dti + γ vs_j(ti)    (5.1)

where j = k_0 to k_h, indexing the original and lagged square-step series; α, ρ and γ are the feedback, damping and control coefficients that characterize the shape of the desired smooth path. These coefficients are selected empirically; the selected values are presented in the evaluation Section 5.6. Equation (5.1) is a second-order differential equation that needs to be solved. Given a sequence of square pulses vs_j(ti) and zero initial conditions (at the very beginning of the sequence), we solve (5.1) using the Matlab MATLAB [2013] Adams-Moulton solver ode113 Hairer et al. [2000] with a variable step size between 0.05 and 0.1. The solution then gives the smooth reference-step series vr_j(ti). Eventually, we obtain two sets of signals, which are used in the training and testing processes of OSA-CNN:

• The OSA-CNN input (square-pulse) signals vs_{k1}(ti), ..., vs_{kh}(ti): the step series of the historical values that are used as inputs for the prediction model.

• The OSA-CNN output (smooth-pulse) signal vr_{k0}(ti): the reference series of the future values that need to be predicted. The output of OSA-CNN is denoted by v̂r_{k0}(ti).

The task of OSA-CNN is to minimize the difference between the input signals vs_{k1}(ti), ..., vs_{kh}(ti) and the history of the recent prediction values v̂r_{k1}(ti), ..., v̂r_{kh}(ti).
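The whole data-preparation pipeline (square-pulse interpolation followed by the reference model of Equation (5.1)) can be sketched as follows. A fixed-step RK4 integrator replaces MATLAB's variable-step ode113, and the coefficient values α, ρ, γ are illustrative, not the values selected in the evaluation:

```python
def to_square_pulses(samples, interp_size):
    """Interpolate consecutive samples into square pulses: between v(m)
    and v(m+1), hold the constant value v(m+1) for `interp_size`
    integration points (the interpolation size Lambda in the text)."""
    pulses = []
    for m in range(len(samples) - 1):
        pulses.extend([samples[m + 1]] * interp_size)
    return pulses

def reference_model(vs, alpha=1.0, rho=2.0, gamma=1.0, dt=0.05):
    """Smooth the square pulses with the reference model
    vr'' = -alpha*vr - rho*vr' + gamma*vs, integrated with fixed-step
    RK4 (the thesis uses MATLAB's variable-step ode113 instead)."""
    def f(vr, dvr, u):
        return dvr, -alpha * vr - rho * dvr + gamma * u

    vr, dvr, out = 0.0, 0.0, []      # zero initial conditions
    for u in vs:
        k1 = f(vr, dvr, u)
        k2 = f(vr + 0.5 * dt * k1[0], dvr + 0.5 * dt * k1[1], u)
        k3 = f(vr + 0.5 * dt * k2[0], dvr + 0.5 * dt * k2[1], u)
        k4 = f(vr + dt * k3[0], dvr + dt * k3[1], u)
        vr += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        dvr += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        out.append(vr)
    return out

series = [0.0, 1.0, 1.0]                 # a tiny series: two lag intervals
vs = to_square_pulses(series, 400)       # square-pulse input series
vr = reference_model(vs)                 # smooth reference series
```

With these (critically damped) toy coefficients the smooth series rises without overshoot toward each pulse level, which is exactly the role of the reference trajectory here.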


5.4.2 OSA-CNN model

As mentioned above, OSA-CNN consists of two connected systems, the Controller-System and the Intuitive-System. Each of these two systems consists of a hidden layer and an output layer. The hidden layer is an ordinary cellular neural network as presented in Sub-section 2.3; the output layer is simply a linear layer. Accordingly, OSA-CNN consists of four connected layers: Controller CNN (C-CNN), Controller Linear Regression (C-LR), Intuitive CNN (I-CNN) and Intuitive Linear Regression (I-LR) models. Figure 5.3 presents these four layers. As shown there, the C-CNN is fed by the h lagged square-pulse series vs_{k1}(ti), ..., vs_{kh}(ti), representing the recent h real values, by the h lagged predicted values v̂r_{k1}(ti), ..., v̂r_{kh}(ti), and by the Controller-System output feedback u(ti). The C-CNN feeds forward into the linear layer C-LR, which performs a linear mapping (weighted sum) to the Controller-System output. The I-CNN has two inputs: again the h lagged square-pulse series vs_{k1}(ti), ..., vs_{kh}(ti), and the Intuitive-System output feedback v̂r_{k0}(ti). The I-CNN feeds forward into the linear layer I-LR, which performs a weighted sum of the I-CNN outputs. In addition, the control action from C-LR to I-LR is realized as a nonlinear adaptive bias added to the weighted sum in I-LR. In this way, the Controller-System adapts the Intuitive-System output by regulating the biases of the I-LR layer. In the following, we address the modeling equations of both the Controller-System and the Intuitive-System. Equations (5.2), (5.3) and (5.4) give the state and output models of C-CNN and C-LR, respectively:

dx_i^{c-cnn}(ti)/dti = −x_i^{c-cnn}(ti) + Σ_{j=1}^{n} ω_{1i,j}^{c-cnn} y_j^{c-cnn}(ti) + Σ_{j=1}^{h} ω_{2i,j}^{c-cnn} vs_{kj}(ti) + Σ_{j=1}^{h} ω_{3i,j}^{c-cnn} v̂r_{kj}(ti) + Σ_{j=1}^{n} ω_{4i,j}^{c-cnn} u_j(ti) + ε_i^{c-cnn}    (5.2)

y_i^{c-cnn}(ti) = (1/2) (|x_i^{c-cnn}(ti) + 1| − |x_i^{c-cnn}(ti) − 1|)    (5.3)

u_1(ti) = Σ_{i=1}^{n} ω_{1,i}^{c-lr} y_i^{c-cnn}(ti) + ε_1^{c-lr}
u_2(ti) = Σ_{i=1}^{n} ω_{2,i}^{c-lr} y_i^{c-cnn}(ti) + ε_2^{c-lr}
⋮
u_n(ti) = Σ_{i=1}^{n} ω_{n,i}^{c-lr} y_i^{c-cnn}(ti) + ε_n^{c-lr}    (5.4)
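The cell dynamics of Equations (5.2) and (5.3) can be sketched in miniature. The saturating output function and one explicit-Euler step of the state equation are shown below; the template values, the single-cell configuration and the fixed-step integration are toy assumptions (the thesis integrates the states with ode113):

```python
def satlins(x):
    """CNN cell output (Eq. 5.3): y = 0.5*(|x + 1| - |x - 1|),
    i.e. a symmetric saturating linear function clipped to [-1, 1]."""
    return 0.5 * (abs(x + 1) - abs(x - 1))

def cnn_state_step(x, y, vs, vrh, u, W1, W2, W3, W4, bias, dt=0.05):
    """One explicit-Euler step of the C-CNN state equation (Eq. 5.2).

    x, y   : current cell states / outputs (length n)
    vs     : lagged square-pulse inputs    (length h)
    vrh    : lagged prediction feedbacks   (length h)
    u      : controller output feedback    (length n)
    W1..W4 : feedback/control templates; bias: per-cell bias
    """
    n = len(x)
    new_x = []
    for i in range(n):
        dx = (-x[i]
              + sum(W1[i][j] * y[j] for j in range(n))
              + sum(W2[i][j] * vs[j] for j in range(len(vs)))
              + sum(W3[i][j] * vrh[j] for j in range(len(vrh)))
              + sum(W4[i][j] * u[j] for j in range(n))
              + bias[i])
        new_x.append(x[i] + dt * dx)
    return new_x, [satlins(v) for v in new_x]
```

The `-x[i]` leakage term keeps each cell state bounded, while `satlins` limits every cell output to [−1, 1], the standard CNN output nonlinearity.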



Figure 5.3: The OSA-CNN four-layer model, with its two modes: the online operating mode (solid-line connections) and the offline training mode (dotted lines). In the operating mode, the first layer is the Controller CNN (C-CNN), connected to the Controller linear layer (C-LR); the third layer is the Intuitive CNN (I-CNN), connected to the Intuitive linear layer (I-LR). The controller output u(ti) adapts the I-LR biases with respect to the difference between the real historical values vs_{k1}(ti), ..., vs_{kh}(ti) and the corresponding predicted values v̂r_{k1}(ti), ..., v̂r_{kh}(ti). In addition, four blocks visualize the models of the C-CNN cell state, the I-CNN cell state, C-LR and I-LR. In the offline training mode, PSO reads the prediction error and adapts the C-LR weights accordingly.


where x_i^{c-cnn}(ti) and y_i^{c-cnn}(ti) respectively represent the state and the output of the i-th C-CNN cell; u_i(ti) is the i-th C-LR output; n and h are respectively the number of C-CNN cells and the number of input lags; W_1^{c-cnn} = (ω_{11,1}^{c-cnn} ... ω_{1n,n}^{c-cnn}) is the C-CNN feedback template; W_2^{c-cnn} = (ω_{21,1}^{c-cnn} ... ω_{2n,h}^{c-cnn}), W_3^{c-cnn} = (ω_{31,1}^{c-cnn} ... ω_{3n,h}^{c-cnn}) and W_4^{c-cnn} = (ω_{41,1}^{c-cnn} ... ω_{4n,n}^{c-cnn}) are the C-CNN control templates; W_i^{c-lr} = (ω_{i,1}^{c-lr} ... ω_{i,n}^{c-lr}) is the i-th C-LR template; ε^{c-cnn} = (ε_1^{c-cnn} ... ε_n^{c-cnn}) and ε^{c-lr} = (ε_1^{c-lr} ... ε_n^{c-lr}) are biases. Equations (5.5), (5.6) and (5.7) give the state and output models of the I-CNN and I-LR, respectively:

dx_i^{i-cnn}(ti)/dti = −x_i^{i-cnn}(ti) + Σ_{j=1}^{n} ω_{1i,j}^{i-cnn} y_j^{i-cnn}(ti) + Σ_{j=1}^{h} ω_{2i,j}^{i-cnn} vs_{kj}(ti) + ω_{3i}^{i-cnn} v̂r_{k0}(ti) + ε_i^{i-cnn}    (5.5)

y_i^{i-cnn}(ti) = (1/2) (|x_i^{i-cnn}(ti) + 1| − |x_i^{i-cnn}(ti) − 1|)    (5.6)

v̂r_{k0}(ti) = Σ_{i=1}^{n} (ω_i^{i-lr} y_i^{i-cnn}(ti) + u_i(ti))    (5.7)

where x_i^{i-cnn}(ti) and y_i^{i-cnn}(ti) respectively represent the state and the output of the i-th I-CNN cell; n is the number of I-CNN cells; W_1^{i-cnn} = (ω_{11,1}^{i-cnn} ... ω_{1n,n}^{i-cnn}) is the I-CNN feedback template; W_2^{i-cnn} = (ω_{21,1}^{i-cnn} ... ω_{2n,h}^{i-cnn}) and W_3^{i-cnn} = (ω_{31}^{i-cnn} ... ω_{3n}^{i-cnn}) are the I-CNN control templates; W^{i-lr} = (ω_1^{i-lr} ... ω_n^{i-lr}) is the I-LR template; ε^{i-cnn} = (ε_1^{i-cnn} ... ε_n^{i-cnn}) is a bias. To solve the OSA-CNN state equations we again use the Matlab MATLAB [2013] Adams-Moulton solver ode113 Hairer et al. [2000] with a variable step size between 0.05 and 0.1. In the next section, we provide more details about the OSA-CNN configuration and training.
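The adaptive-bias readout of Equation (5.7), where the controller action u_i(ti) enters the weighted sum as a per-cell bias, can be sketched as follows (all numeric values are illustrative only):

```python
def i_lr_output(y_icnn, w_ilr, u):
    """I-LR readout (Eq. 5.7): the prediction is a weighted sum of the
    I-CNN cell outputs, with the controller action u_i entering as a
    per-cell adaptive bias added inside the sum."""
    return sum(w * y + ui for w, y, ui in zip(w_ilr, y_icnn, u))

# With zero control action this reduces to a plain linear readout:
print(i_lr_output([0.5, 0.25], [2.0, 2.0], [0.0, 0.0]))   # -> 1.5
```

A nonzero u shifts the prediction directly, which is how the Controller-System nudges the Intuitive-System output toward lower recent error.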

5.5 OSA-CNN learning

The training of OSA-CNN proceeds in three main phases. First, the C-CNN and I-CNN feedback and control templates and the biases are generated randomly; the random template generators must ensure the echo-state and CNN stability properties. Second, the I-LR is trained using a regularized least-squares method. In the last phase, the C-LR templates are trained using recursive particle swarm optimization (RPSO). In the following, we explain the three training phases.
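The second phase (regularized least squares for the linear readout) can be illustrated in miniature. The two-weight closed form below and the regularization value `lam` are assumptions for the sketch, not the thesis configuration:

```python
def ridge_2d(X, t, lam=1e-2):
    """Regularized least squares for a 2-weight linear readout:
    solve (X^T X + lam*I) w = X^T t via an explicit 2x2 inverse.
    (Stand-in for the I-LR training step; lam is an assumed value.)"""
    a = sum(x[0] * x[0] for x in X) + lam
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X) + lam
    r0 = sum(x[0] * ti for x, ti in zip(X, t))
    r1 = sum(x[1] * ti for x, ti in zip(X, t))
    det = a * d - b * b
    return ((d * r0 - b * r1) / det, (a * r1 - b * r0) / det)

# Targets generated by w = (2, -1); ridge recovers them approximately
X = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
t = [2.0 * x0 - 1.0 * x1 for x0, x1 in X]
print(ridge_2d(X, t))
```

In OSA-CNN the rows of X would be the I-CNN cell outputs collected over the training series and t the reference values; the regularizer keeps the readout weights small, which is the standard ESN recipe for a stable one-shot training step.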


5.5.1 Echo state cellular neural network

Training a CNN means determining, through an appropriate optimization process, the feedback templates, the control templates, and the biases; in doing so, one identifies the CNN state equation. Classical groups of learning techniques (see Section 3.5.1) are time-consuming and may break down when dealing with a high-dimensional CNN (i.e. when the CNN has a large number of cells). In ESN, high dimensionality is no longer an issue, thanks to the random generation of the state feedback templates. In OSA-CNN, we use the same method as in ESN. Following the methods suggested by Lukoševičius [2012], the feedback templates, control templates and biases of both C-CNN and I-CNN are generated randomly as described in the following:

• W_1^{c-cnn} (an n × n matrix) and W_1^{i-cnn} (n × n) are generated as normally distributed sparse symmetric matrices with N(0, 1) entries at a prescribed sparseness level, and then divided by their own largest absolute eigenvalue. In this way, both conditions (sparsity and spectral radius
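The random template generation just described can be sketched as follows. The sparsity level and the use of power iteration (valid for symmetric matrices) are illustrative choices for this sketch, not the thesis configuration:

```python
import random

def random_reservoir(n, sparsity=0.3, seed=1):
    """Generate a sparse symmetric N(0,1) feedback template and divide
    it by its largest absolute eigenvalue (estimated by power
    iteration), so its spectral radius becomes 1. Shrinking it further
    below 1 would then enforce the echo state property."""
    rng = random.Random(seed)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):            # fill upper triangle, mirror it
            if rng.random() < sparsity:
                W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)
    # Power iteration: for symmetric W this converges to max |eigenvalue|
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(300):
        w = [sum(W[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return W                     # all-zero draw: nothing to scale
        v = [x / norm for x in w]
    rho = abs(sum(v[i] * sum(W[i][j] * v[j] for j in range(n))
                  for i in range(n)))    # Rayleigh quotient magnitude
    return [[W[i][j] / rho for j in range(n)] for i in range(n)]
```

Because the matrix is symmetric, its spectral radius equals its largest absolute eigenvalue, so a single power iteration followed by a Rayleigh quotient is enough to normalize it.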