Speech Prediction Using Higher Order Neural Networks

Abir Jaafar Hussain, Ahmed J. Jameel, Dhiya Al-Jumeily and Rozaida Ghazali

Ahlia University, Gosi Complex, First Floor, P.O. Box 10878, Kingdom of Bahrain
Liverpool John Moores University, School of Computing and Mathematical Sciences, Byrom Street, Liverpool, L3 3AF, UK
Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia
[email protected]

Abstract: In this paper, we present the use of higher order neural networks for the prediction of speech signals. Various neural network structures have been used in our experiments, including the functional link neural network and the pi-sigma neural network. Extensive experimentation was carried out to evaluate the performance of the higher order networks on the speech prediction task. Simulation results indicate that the functional link network achieved a higher signal-to-noise ratio than the pi-sigma neural network.

Index Terms: Speech time series, Functional Link Network, Higher Order Neural Networks, Pi-Sigma Network.

1. INTRODUCTION
This paper investigates the application of Higher Order Neural Networks (HONNs) as forecasting tools to predict nonlinear and nonstationary speech signals. We explore the prediction capability of two types of HONNs: the Functional Link Network [1] and the Pi-Sigma Network [2]. The aim of this paper is to investigate the use of higher order neural networks as speech time series predictors with a parsimonious structure that maintains good generalization capability. Much successful research has been carried out in the area of neural networks; unfortunately, not all of it can be used in real-time applications, because the size of the network can be so large that it prevents the solution from being used in the real world [6]. An unsuitable network topology may increase the training time and is likely to decrease the generalization capability of the network itself [6], [7]. Selecting the optimum network structure is therefore important, since an undersized network will usually fail to approximate the underlying function,



whereas an oversized network will have a large number of weights and can lead to over-fitting, which results in poor generalization [8]. HONNs, which have only one layer of trainable weights, can help to speed up the training process. These networks have some advantages over the Multilayer Perceptron (MLP): they have a simpler architecture and require fewer weights to learn the underlying function [1], [6], [9]. This potentially reduces the number of required training parameters and, as a result, they can learn faster since each iteration of the training procedure takes less time [10]. This makes them suitable for complex problems where the ability to retrain or adapt to new data in real time is critical [10]-[12]. HONNs are also attractive because ordinary feedforward networks cannot avoid the problem of slow learning, especially when highly complex nonlinear problems are involved [13]. Higher order terms, or product units, in HONNs can increase the information capacity of a neural network in comparison to networks that utilise summation units only. The representational power of higher order terms can help solve complex problems with significantly smaller networks whilst maintaining fast learning [6].

2. THE NETWORKS

A. Functional Link Network (FLN)
The FLN was first introduced by Giles and Maxwell [1]. It naturally extends the family of feedforward network structures by introducing nonlinear enhancements of the input patterns [14]. These enhancement nodes act as supplementary inputs to the network. The FLN calculates products of the network inputs at the input layer, while at the output layer the summation of the weighted inputs is calculated. The FLN can use higher order correlations of the input components to perform nonlinear mappings using only a single layer of units. Since the architecture is simpler,

it is expected to reduce the computational cost of the training stage whilst maintaining good approximation performance [15]. A single node in the FLN model can receive information from more than one node through a single weighted link. The higher order weights, which connect the high order terms of the input products to the upper nodes, simulate the interaction among several weighted links. For this reason, the FLN can greatly enhance the information capacity, and more complicated data can be learned [10], [11], [15]-[17]. The FLN can often learn more quickly than conventional approaches, although this is highly dependent on the specific problem and implementation design. Fei and Yu [18] showed that the FLN has a more powerful approximation capability than a conventional backpropagation network and that it is a good model for system identification [11], [15], [16]. Cass and Radl [10] investigated the FLN for process optimization and found that it can train much faster than the MLP without sacrificing computational capability, which makes it more suitable for process modeling applications, where the ability to retrain or adapt to new data in real time is critical. Another study constructed an FLN that responds invariantly under geometric transformations of the input space [17]; the model has the advantage of inherent invariance and learns only the desired signal. Fig. 1 shows an example of a third-order FLN with three external (raw) inputs x1, x2 and x3, and four higher order inputs which act as supplementary inputs to the network.

Fig. 1. FLN of order 3.

For a third-order FLN, the output is determined as follows:

y = \sigma\left(w_0 + \sum_{i} w_i x_i + \sum_{i,j} w_{ij} x_i x_j + \sum_{i,j,k} w_{ijk} x_i x_j x_k\right) \qquad (1)

where \sigma is a nonlinear transfer function and w_0 is the adjustable threshold. Unfortunately, the FLN suffers from an explosion of weights, whose number increases exponentially with the number of inputs. Therefore, normally only up to second- or third-order networks are considered in practice [7], [19].
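To make equation (1) concrete, the following is a minimal sketch of a third-order FLN forward pass. The paper gives no code, so the language (Python with NumPy), the function names fln_features and fln_output, and the choice of a logistic transfer function are our assumptions; note also that a practical FLN, as in Fig. 1, often uses only a selected subset of the higher order product terms rather than the full expansion generated here.

```python
import numpy as np
from itertools import combinations_with_replacement

def fln_features(x, order=3):
    """Expand the raw inputs into product terms up to the given order.

    Returns [1, x_i, x_i*x_j, x_i*x_j*x_k, ...]; the leading 1 carries the
    threshold w_0 of equation (1).
    """
    terms = [1.0]
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(len(x)), degree):
            terms.append(np.prod(x[list(idx)]))
    return np.array(terms)

def fln_output(x, w, order=3):
    """Single layer of trainable weights applied to the expanded inputs,
    followed by a nonlinear transfer function (a logistic sigmoid is assumed)."""
    z = fln_features(x, order)
    return 1.0 / (1.0 + np.exp(-np.dot(w, z)))

# Example: three raw inputs, third-order expansion, weights drawn from [-0.5, 0.5]
x = np.array([0.2, -0.5, 0.7])
w = np.random.uniform(-0.5, 0.5, size=fln_features(x).shape[0])
print(fln_output(x, w))
```

The sketch makes the weight-explosion problem visible: for three inputs the third-order expansion already produces 20 trainable parameters, and the count grows combinatorially with the number of inputs.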

B. Pi-Sigma Network (PSN)
The PSN was introduced by Shin and Ghosh [2] to overcome the problem of weight explosion in the FLN. It is a feedforward network with a single 'hidden' layer of summing units and product units in the output layer [20], [21]. The PSN uses products of sums of the input components instead of the sums of products used in the FLN.

Fig. 2. PSN of K-th order: an input layer x_1, ..., x_N; a hidden layer of linear summing units h_1, ..., h_K with adjustable weights w_ji; fixed weights from the summing units to an output layer consisting of a single product unit with transfer function F, which produces the output y.

In contrast to the FLN, the number of free parameters in the PSN increases only linearly with the order of the network. For that reason, the PSN can overcome the problem of weight explosion that occurs in the FLN whilst still maintaining its fast learning and powerful mapping capability. The PSN is able to learn in a stable manner even with fairly large learning rates [20]. The use of linear summing units makes the convergence analysis of the learning rules for the PSN more accurate and tractable. Previous research found that it is a good model for various applications. Shin and Ghosh [22] investigated the applicability of the PSN for shift, scale and rotation invariant pattern recognition; results for both function approximation and classification were extremely encouraging when compared to backpropagation achieving a similar quality of solution. Ghosh and Shin


[2] argued that the PSN requires less memory (weights and nodes) and at least two orders of magnitude fewer computations than the MLP for a similar performance level over a broad class of problems. Fig. 2 shows a Pi-Sigma Network with a single output. The output of the Pi-Sigma Network is computed as follows:

y = \sigma\left(\prod_{j=1}^{K} h_j\right), \qquad h_j = \sum_{i=1}^{N} w_{ji} x_i + w_{j0} \qquad (2)

where w_{ji} are the adjustable weights, x = (x_1, ..., x_N) is the input vector, K is the number of summing units, N is the number of input nodes, and \sigma is a nonlinear transfer function. Both the Functional Link Network and the Pi-Sigma Network have demonstrated a competent ability to solve many scientific and engineering problems [10]-[12], [15], [17]; however, these networks have not been widely and extensively applied to time series forecasting.
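As an illustration of the computation above, here is a minimal sketch of the forward pass of a K-th order PSN with a single output. The paper provides no implementation, so the language (Python/NumPy), the function name psn_output, the sigmoid transfer function and the example initialization range are assumptions on our part; only the weights and thresholds of the summing layer are trainable, while the weights into the product unit are fixed, as indicated in Fig. 2.

```python
import numpy as np

def psn_output(x, W, b):
    """Forward pass of a K-th order Pi-Sigma Network with a single output.

    x : (N,)   input vector
    W : (K, N) adjustable weights of the K linear summing units
    b : (K,)   adjustable thresholds of the summing units
    """
    h = W @ x + b                          # hidden layer: K linear sums h_j
    net = np.prod(h)                       # output layer: single product unit
    return 1.0 / (1.0 + np.exp(-net))      # assumed sigmoid transfer function

# Example: a third-order PSN (K = 3) predicting from a window of N = 5 samples
rng = np.random.default_rng(0)
N, K = 5, 3
x = rng.uniform(-1.0, 1.0, N)              # e.g. five past speech samples
W = rng.uniform(-0.5, 0.5, (K, N))         # one of the initialization ranges used
b = rng.uniform(-0.5, 0.5, K)
print(psn_output(x, W, b))
```

Because only the K summing units carry trainable parameters, the parameter count is K(N + 1), i.e. linear in the order K, which is the property contrasted with the FLN above.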

3. SPEECH TIME SERIES PREDICTION
Time series forecasting takes an existing series of data and forecasts the next incoming values of the series; that is, it is the process of predicting future values from current values. The prediction of a time series is synonymous with modeling the underlying physical mechanism responsible for its generation. Most physical signals encountered in practice have two distinct properties: nonlinearity and nonstationarity. The speech signal is an example of a physical time series that exhibits both properties, and prediction plays a key role in the modeling and coding of speech signals [23]. The production of a speech signal is known to be the result of a dynamic process that is both nonlinear and nonstationary.

In these experiments, a male voice counting from one to ten in Arabic is used. The speech signal is divided into three data sets: the training, the validation and the test sets. Each network is trained with the incremental backpropagation learning algorithm; early stopping with a maximum of 3000 epochs was utilized, and the average performance over 5 trials is reported. The learning rate was selected between 0.05 and 0.5 and the momentum term was experimentally selected between 0.4 and 0.9. Two sets of random weight initializations are employed, in the ranges [-0.5, 0.5] and [-1, 1]. The Functional Link Network and Pi-Sigma Network models are trained with orders (degrees) of up to five. The prediction performance of our networks is evaluated using two statistical metrics, as shown in Table II.

TABLE II. PERFORMANCE METRICS AND THEIR CALCULATIONS

Metric   Calculation
NMSE     normalized mean squared error between the actual and the predicted signal
SNR      signal-to-noise ratio of the prediction, in dB


In these calculations, n is the total number of data patterns, and the remaining terms represent the actual and the predicted output values.
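For reference, commonly used definitions of these two metrics, stated in terms of n and of the actual and predicted output values (denoted here y_i and \hat{y}_i; this notation and the exact form of the expressions are our assumptions rather than quotations from Table II), are:

\mathrm{NMSE} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{SNR} = 10 \log_{10}\!\left( \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \right) \ \mathrm{dB},

where \bar{y} denotes the mean of the actual signal.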

4. RESULTS
In this section, the simulation results for the prediction of the speech signal using higher order neural networks are presented. Table III shows the simulation results for the various neural network architectures, while Fig. 3 shows part of the predicted and the original speech signal.

Table III. SIMULATION RESULTS FOR THE PREDICTION OF THE SPEECH SIGNAL.

Network   SNR (dB)   NMSE
MLP       20.00      0.799
FLN       27         0.416
PSN       26.52      0.466

Although the PSN is not a universal approximator, it can still learn the underlying problem relatively well, even with a smaller network size compared to the MLP. The presence of only a single layer of adaptive weights in HONNs should allow fast training; however, the results obtained for the average number of epochs used do not clearly provide evidence of this. This is probably due to the characteristics of the speech signal, which exhibits dynamic behaviour over time and is not easy to learn. The results also showed that in most cases the HONNs have a smaller network topology with a higher signal-to-noise ratio in comparison to the MLP.


Fig. 3. Part of the predicted and the original signal using (a) MLP, (b) FLN and (c) PSN neural networks.

The simulation results demonstrated that the results of all models are comparable in terms of the SNR.

5. CONCLUSION
In this paper, the simulation results for the prediction of the speech signal time series using two HONN models (the Functional Link Network and the Pi-Sigma Network) were presented. The results obtained from the experiments showed that the HONNs outperformed the Multilayer Perceptron (MLP). The enhanced performance in the prediction of the speech time series using HONNs is due to the networks' robustness, which results from the reduced number of free weights compared with the MLP. The parsimonious representation of high order terms in HONNs results in the ability to accurately model higher-order interactions for long-term forecasting. We note that Ridge Polynomial Networks (RPNs) [9] are formed in a well-regulated structure by adding Pi-Sigma Networks of different degrees. RPNs preserve all the advantages of Pi-Sigma Networks and have the same approximation capability as ordinary multivariate polynomials [9]. Further work will investigate the use of the Ridge Polynomial Network.



6. REFERENCES




[1] C. L. Giles and T. Maxwell, "Learning, invariance and generalization in high-order neural networks," Applied Optics, vol. 26, no. 23, pp. 4972-4978, 1987.
[2] Y. Shin and J. Ghosh, "The Pi-Sigma Network: an efficient higher-order neural network for pattern classification and function approximation," Proceedings of the International Joint Conference on Neural Networks, Seattle, WA, vol. 1, pp. 13-18, July 1991.
[3] C. L. Dunis and M. Williams, "Modelling and trading the EUR/USD exchange rate: do neural network models perform better?," Derivatives Use, Trading and Regulation, vol. 8, no. 3, pp. 211-239, 2002.
[4] R. Schwaerzel, "Improving the prediction accuracy of time series by using multi-neural network systems and enhanced data preprocessing," M.Sc. thesis, The University of Texas at San Antonio, 1996.
[5] E. A. Plummer, "Time series forecasting with feed-forward neural networks: guidelines and limitations," M.Sc. thesis, University of Wyoming, 2000. Available: http://www.karlbranting.net/papers/plummer/Paper_7_12_00.htm
[6] L. R. Leerink, C. L. Giles, B. G. Horne, and M. A. Jabri, "Learning with product units," in G. Tesauro, D. Touretzky, and T. Leen, eds., Advances in Neural Information Processing Systems 7, MIT Press, Cambridge, MA, pp. 537-544, 1995.
[7] G. Thimm, "Optimization of high order perceptrons," Swiss Federal Institute of Technology (EPFL), 1995.
[8] S. Lawrence and C. L. Giles, "Overfitting and neural networks: conjugate gradient and backpropagation," Proceedings of the International Joint Conference on Neural Networks, Como, Italy, IEEE Computer Society, pp. 114-119, 2000.
[9] Y. Shin and J. Ghosh, "Ridge Polynomial Networks," IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 610-622, 1995.
[10] R. Cass and B. Radl, "Adaptive process optimization using functional-link networks and evolutionary algorithm," Control Engineering Practice, vol. 4, no. 11, pp. 1579-1584, 1996.
[11] Y. H. Pao and S. M. Phillips, "The functional link net and learning optimal control," Neurocomputing, vol. 9, pp. 149-164, 1995.
[12] E. Artyomov and O. Yadid-Pecht, "Modified high-order neural network for pattern recognition," Pattern Recognition Letters, 2004.
[13] A. S. Chen and M. T. Leung, "Regression neural network for error correction in foreign exchange forecasting and trading," Computers & Operations Research, vol. 31, pp. 1049-1068, 2004.
[14] R. Durbin and D. E. Rumelhart, "Product units: a computationally powerful and biologically plausible extension to back-propagation networks," Neural Computation, vol. 1, pp. 133-142, 1989.
[15] L. Mirea and T. Marcu, "System identification using functional-link neural networks with dynamic structure," 15th IFAC Triennial World Congress, Barcelona, Spain, 2002.
[16] M. Y. Chow and J. Teeter, "Reduced-order functional link neural network for HVAC thermal system identification and modeling," Department of Electrical and Computer Engineering, North Carolina State University, 1997.
[17] C. L. Giles, R. D. Griffin, and T. Maxwell, "Encoding geometric invariances in higher-order neural networks," Neural Information Processing Systems, American Institute of Physics, pp. 301-309, 1988.
[18] G. Fei and Y. L. Yu, "A modified Sigma-Pi BP network with self-feedback and its application in time series analysis," Proceedings of the 5th International Conference, vol. 2243-508F, pp. 508-515, 1994.
[19] T. Kaita, S. Tomita and J. Yamanaka, "On a higher-order neural network for distortion invariant pattern recognition," Pattern Recognition Letters, vol. 23, pp. 977-984, 2002.
[20] J. Ghosh and Y. Shin, "Efficient higher-order neural networks for function approximation and classification," International Journal of Neural Systems, vol. 3, no. 4, pp. 323-350, 1992.
[21] Y. Shin and J. Ghosh, "Approximation of multivariate functions using ridge polynomial networks," Proceedings of the International Joint Conference on Neural Networks, vol. 2, pp. 380-385, 1992.
[22] Y. Shin and J. Ghosh, "Computationally efficient invariant pattern recognition with higher order Pi-Sigma Networks," The University of Texas at Austin, 1992.
[23] S. Haykin and L. Li, "Nonlinear adaptive prediction of nonstationary signals," IEEE Transactions on Signal Processing, vol. 43, no. 2, 1995.