Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2014, Article ID 262791, 7 pages http://dx.doi.org/10.1155/2014/262791
Research Article
Implementation of a Tour Guide Robot System Using RFID Technology and Viterbi Algorithm-Based HMM for Speech Recognition

Neng-Sheng Pai, Hua-Jui Kuang, Ting-Yuan Chang, Ying-Che Kuo, and Chun-Yuan Lai

Department of Electrical Engineering, National Chin-Yi University of Technology, No. 57, Section 2, Zhongshan Road, Taiping District, Taichung 41170, Taiwan

Correspondence should be addressed to Neng-Sheng Pai; [email protected]

Received 3 January 2014; Accepted 1 March 2014; Published 24 March 2014

Academic Editor: Her-Terng Yau

Copyright © 2014 Neng-Sheng Pai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

This paper applied speech recognition and RFID technologies to turn an omnidirectional mobile robot into a robot with voice control and guide introduction functions. For speech recognition, the speech signals were captured by short-time processing. The speaker first recorded isolated words for the robot to create a speech database of specific speakers. After preprocessing of this speech database, the cepstrum and delta-cepstrum feature parameters were obtained using the linear predictive coefficient (LPC). The Hidden Markov Model (HMM) was then used for model training on the speech database, and the Viterbi algorithm was used to find an optimal state sequence as the reference sample for speech recognition. The trained reference model was loaded into the industrial computer on the robot platform, and the user spoke the isolated words to be tested. The test input was processed in the same way and compared with the trained reference models; the model whose path, found using the Viterbi algorithm, had the maximum total probability was taken as the recognition result. Finally, the speech recognition and RFID systems were implemented on the omnidirectional mobile robot and tested in an actual environment to demonstrate their feasibility and stability.
1. Introduction

Early speech recognition methods calculated the dissimilarity between the characteristic values of the input signal and the characteristic values in a database and took the entry with the minimum difference as the recognition result. However, this method recognizes poorly when talking speeds differ. Some scholars later proposed dynamic time warping (DTW) to improve the recognition effect [1, 2]. Given two speech segments to be compared, this method extracts their short-time feature parameters; that is, each segment is separated into a string of frames, and a group of parameters is determined from each frame. Comparing two segments of speech thus becomes comparing two sequences of feature parameters. DTW adjusts the speech length to reduce the errors caused by differences in speech time span.

After DTW, the Artificial Neural Network (ANN) and HMM algorithms were proposed for recognition systems. The ANN is often used in the artificial intelligence domain [3, 4]. It does not need a mathematical model of the system for experimental data modeling or for highly complex recognition of images, letters, or sound, so it can stand in for a system model. By learning which outputs are generated by which inputs, it can achieve a good recognition effect after repeated training. However, the ANN updates its weights and biases iteratively, and the large amount of calculation consumes considerable computer resources. The HMM uses a statistical probability model to describe pronunciation [5–7]. A continuous state transition in a Markov model can be regarded as the phonation of a short speech segment, so a string of connected HMMs represents a segment of speech. The HMM has been the method most commonly used in speech
recognition in recent years. This paper uses the HMM as the speech recognition core.

Figure 1: Robot system hardware link. (Block diagram of the user side, the computer side, and the omnidirectional mobile robot side; labeled blocks include RFID tags, wireless handset, RFID reader, industrial computer, Bluetooth dongle and Bluetooth serial adapter, PIC microcontrollers, encoder board, motor encoder, ultrasonic sensor, PWM motor driver, DC motor, power circuit, 12 V and 24 V lead acid batteries, and a 24 V 500 W power inverter.)
2. Hardware Design

The voice controlled guide type omnidirectional mobile robot is steered by voice and is equipped with an RFID guide system, infrared image tracking, and ultrasonic obstacle avoidance [8]. The proposed robot system consists of three subsystems, shown in Figure 1: the omnidirectional mobile robot side, the computer side, and the user side. The Peripheral Interface Controller (PIC) microcontroller is the core of the omnidirectional mobile robot side; its main functions are signal processing for the peripheral devices and motor drive control for the three wheels. The computer side uses an industrial computer to handle the speech recognition calculation, the RFID guide system, and infrared image tracking. The user side uses a wireless headset and an active RFID tag as the voice control equipment.
3. HMM-Based Speech Recognition System

3.1. Preprocessing and Feature Parameter Extraction. The speech signals are preprocessed before speech recognition. Preprocessing consists of sampling, framing, endpoint detection, preemphasis, and windowing. After preprocessing, the feature parameters are extracted for the subsequent recognition calculation. In this paper, the Linear Predictive Coefficient (LPC) is used to derive the cepstrum and delta-cepstrum, which serve as the most important feature parameters. Linear prediction rests on the observation that, during pronunciation, the amplitude of a sampling point is related to the amplitudes of adjacent sampling points. Let $S(n)$ denote the sampled speech sequence, so that $S(n)$ is the sample value at time $n$, and let $\hat{S}(n)$ be the predicted value of $S(n)$. Because there must be an error between the predicted value and the actual value, the prediction error $e(n)$ is expressed as

$$e(n) = S(n) - \hat{S}(n) = S(n) - \sum_{j=1}^{l} a_j S(n-j), \tag{1}$$
where $a_j$ is the linear predictive coefficient and $l$ is the order of the linear prediction. The coefficients $a_j$ are adjusted until the squared error of (1) is minimized, which yields the optimal linear predictive coefficients. The autocorrelation of the signal is determined first, and the linear predictive coefficients are then obtained from it using the Durbin algorithm. After the LPC is determined, the cepstrum coefficients are derived from it [9]. The cepstrum separates the vocal tract model from the excitation signal, so the vocal tract parameters can be calculated more precisely and the speech spectrum characteristics can be controlled. The cepstrum coefficient $\hat{c}_j$ is determined from the linear predictive coefficient $a_j$, where $l$ is the order of the linear prediction, as follows:

$$\begin{aligned}
\hat{c}_1 &= a_1, \\
\hat{c}_j &= a_j + \sum_{k=1}^{j-1} \left(1 - \frac{k}{j}\right) a_k\, \hat{c}_{j-k}, \quad 1 < j \le l, \\
\hat{c}_j &= \sum_{k=1}^{p} \left(1 - \frac{k}{j}\right) a_k\, \hat{c}_{j-k}, \quad j > l.
\end{aligned} \tag{2}$$

The upper limit $p$ of the last sum is the prediction order, that is, $p = l$, since $a_k$ is defined only for $1 \le k \le l$.

Figure 2: System operation flow of voice controlled guide type omnidirectional mobile robot. (Flowchart: the active RFID tag transmits a signal, and the RFID reader continues detection until it receives a tag signal; the tag ID is then recognized. If a fixed tag is read, the speech recognition system asks whether to introduce the site and, on Yes, the site is introduced. If the tag in the user's hand is read, the user enters a verbal direction command, which the speech recognition system passes to robot movement control.)

Figure 3: Omnidirectional mobile robot.

In a practical environment, external noise affects the received speech, so that the tone in the spectrum is disturbed and distorted. The delta-cepstrum can reduce this noise effect. The delta-cepstrum parameter $\Delta\hat{c}_j(t)$ is shown in (3), where $\tau$ is the frame offset, running from the $K$ preceding frames ($\tau = -K$) to the $K$ following frames ($\tau = K$). The cepstrum and delta-cepstrum parameters are used together as the feature parameters for recognition:

$$\Delta\hat{c}_j(t) = \frac{d\hat{c}_j(t)}{dt} = \frac{\sum_{\tau=-K}^{K} \tau \cdot \hat{c}_j(t+\tau)}{\sum_{\tau=-K}^{K} \tau^{2}}. \tag{3}$$
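As an illustration of the feature extraction chain in (1)–(3), the following is a minimal Python sketch, not the authors' implementation. The function names, the default window half-width `k_half`, and the clamping of frame indices at the ends of the utterance are assumptions made for this example; the paper does not state how edge frames are handled or which value of $K$ is used.

```python
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..order."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Durbin recursion: coefficients a_1..a_l of (1) from r[0..l].

    Assumes r[0] > 0 (a non-silent frame). Returns the coefficients
    and the remaining prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[j] holds a_j
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstrum coefficients c_1..c_n from a_1..a_l, following (2)."""
    l = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for j in range(1, n_ceps + 1):
        acc = a[j - 1] if j <= l else 0.0
        for k in range(1, min(j - 1, l) + 1):
            acc += (1.0 - k / j) * a[k - 1] * c[j - k]
        c[j] = acc
    return c[1:]


def delta_cepstrum(ceps: Sequence[Sequence[float]], k_half: int = 2) -> List[List[float]]:
    """Delta-cepstrum per frame, following (3).

    Frame indices outside the utterance are clamped to the first or
    last frame (an assumption of this sketch).
    """
    n_frames = len(ceps)
    denom = sum(tau * tau for tau in range(-k_half, k_half + 1))
    out = []
    for t in range(n_frames):
        row = []
        for j in range(len(ceps[t])):
            num = sum(tau * ceps[min(max(t + tau, 0), n_frames - 1)][j]
                      for tau in range(-k_half, k_half + 1))
            row.append(num / denom)
        out.append(row)
    return out
```

For one windowed frame `x`, a feature vector would be built as `a, _ = durbin(autocorrelation(x, l), l)` followed by `lpc_to_cepstrum(a, 15)`, with the delta-cepstrum computed afterwards over the sequence of per-frame cepstrum vectors.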
Figure 4: User using speech to control robot to move forward and turn left. (a) User commands robot to move forward. (b) User commands robot to stop. (c) User commands robot to turn left. (d) Robot turns left and moves on.

3.2. HMM and Training Reference Model

3.2.1. Build the Initial Model. The frames in the audio part of a segment of speech are divided evenly among the states according to the preset number of HMM states, and the feature vectors of the frames in each state are used to calculate the mean value $M_j$ and variance $\mathrm{Var}_j$, as shown in (4), where $A$ is a state of the HMM, $i$ is the frame index, $j$ is the feature parameter index, $T$ is the number of frames in the state, and $q$ is the number of cepstrum and delta-cepstrum feature parameters. This paper uses 15 cepstrum and 15 delta-cepstrum coefficients as characteristic values, so $q$ is 30:

$$\begin{aligned}
M_j &= \frac{\sum_{i=1}^{T} A(i,j)}{T}, \quad 1 \le j \le q, \\
\mathrm{Var}_j &= \frac{\sum_{i=1}^{T} \left[A(i,j)\right]^{2}}{T} - \left(M_j\right)^{2}, \quad 1 \le j \le q.
\end{aligned} \tag{4}$$
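The initial model of (4) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `init_state_stats` and the floor-based choice of segment boundaries are assumptions, since the paper only says that frames are divided evenly among the states.

```python
from typing import List, Sequence, Tuple


def init_state_stats(
    frames: Sequence[Sequence[float]], n_states: int
) -> List[Tuple[List[float], List[float]]]:
    """Split frames evenly among states and return (mean, variance) per state.

    frames[i][j] plays the role of A(i, j) in (4) once the frames have
    been assigned to a state.
    """
    n_frames = len(frames)
    if n_frames < n_states:
        raise ValueError("need at least one frame per state")
    q = len(frames[0])
    # Assumed boundary rule: state s owns frames [s*T//N, (s+1)*T//N).
    bounds = [s * n_frames // n_states for s in range(n_states + 1)]
    stats = []
    for s in range(n_states):
        segment = frames[bounds[s]:bounds[s + 1]]
        t = len(segment)
        mean = [sum(f[j] for f in segment) / t for j in range(q)]
        var = [sum(f[j] ** 2 for f in segment) / t - mean[j] ** 2
               for j in range(q)]
        stats.append((mean, var))
    return stats
```

With four one-dimensional frames `[[0.0], [2.0], [10.0], [14.0]]` and two states, the first state receives the first two frames and the second state the last two. The same mean and variance formulas apply unchanged when the frame-to-state assignment later comes from a Viterbi alignment instead of an even split.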
Figure 5: Robot guide experiment. (a) User commands robot to move forward. (b) Robot detects tag and asks user whether he needs any introduction to the place or not. (c) User says YES. (d) Robot plays video.

3.2.2. Viterbi Algorithm. To relate frames to HMM states more accurately, this paper uses a Gaussian probability density function [10] to determine the similarity between a state and a frame. A higher probability value indicates a higher similarity between the frame and the state, as shown in (5), where $G_i(x_T)$ is the probability value of similarity between the feature vector $x_T$ and state $i$, $d$ is the feature vector dimension, $\tau_{R_i}$ is the mean value of state $i$, and $R_i$ is the covariance matrix of the density function:

$$G_i(x_T) = \frac{1}{\sqrt{(2\pi)^{d}\,|R_i|}} \exp\left\{-\frac{1}{2}\left(x_T - \tau_{R_i}\right)^{T} R_i^{-1} \left(x_T - \tau_{R_i}\right)\right\}. \tag{5}$$
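Equation (5) scores one frame against one state, and the Viterbi search in Steps 1 to 4 of this subsection, (7)–(10), chains those scores along the best state path. The sketch below illustrates both pieces under assumptions that are not stated in the paper: the covariance $R_i$ is taken to be diagonal (consistent with the per-parameter variances of (4)), and all probabilities are handled as logarithms to avoid numerical underflow on long utterances. Function names are hypothetical, and states and frames are indexed from 0.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """Logarithm that maps a zero probability to -inf."""
    return math.log(p) if p > 0.0 else NEG_INF


def log_gauss(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log of (5) for a diagonal covariance matrix with entries var[j]."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi(
    log_pi: Sequence[float],
    log_a: Sequence[Sequence[float]],
    log_b: Sequence[Sequence[float]],
) -> Tuple[float, List[int]]:
    """Best log probability and state path.

    log_b[t][i] is the log score of frame t under state i.
    """
    n_frames, n_states = len(log_b), len(log_pi)
    delta = [[NEG_INF] * n_states for _ in range(n_frames)]
    psi = [[0] * n_states for _ in range(n_frames)]
    # Step 1: initialization.
    for i in range(n_states):
        delta[0][i] = log_pi[i] + log_b[0][i]
    # Step 2: recursion; products become sums in the log domain.
    for t in range(1, n_frames):
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: delta[t - 1][i] + log_a[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] + log_a[best_i][j] + log_b[t][j]
    # Step 3: termination.
    last = max(range(n_states), key=lambda i: delta[-1][i])
    # Step 4: path backtracking.
    path = [last]
    for t in range(n_frames - 1, 0, -1):
        path.append(psi[t][path[-1]])
    path.reverse()
    return delta[-1][last], path
```

A recognizer built on this sketch would fill `log_b` with `log_gauss` scores for each frame and state, run `viterbi` once per reference model, and keep the model with the highest returned score.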
The HMM can be represented by

$$\lambda = \{\pi, A, B, S, V\}, \tag{6}$$

where $S = \{s_1, s_2, \ldots, s_N\}$ is the set of states, $N$ is the number of states, $V$ is the set of observed results, $\pi = \{\pi_i\}$ is the initial state probability, $A = \{a_{ij}\}$ is the state transition probability, $B = \{b_j(O_t)\}$ is the state observation probability, $O = \{O_1, O_2, \ldots, O_T\}$ is the observation sequence, and $T$ is the sequence length. The Gaussian probability density function determines the probability value between frame and state. The HMM has many possible paths for state transition, and the path with the maximum total probability value among all possible paths must be found. This paper uses the Viterbi algorithm [11, 12], as shown in (7)–(10), where $\delta_t(i)$ is the maximum probability, over all state paths, of being in state $i$ at time $t$, $\psi_t(i)$ is the preceding state on the best path that reaches state $i$ at time $t$, $p$ is the final probability value of the Viterbi algorithm, and $S_T$ is the final state of the optimal state sequence.

Step 1. Initialization:

$$\delta_1(i) = \pi_i\, b_i(O_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N. \tag{7}$$

Step 2. Recursion:

$$\begin{aligned}
\delta_{t+1}(j) &= \max_{1 \le i \le N}\left[\delta_t(i) \cdot a_{ij}\right] \cdot b_j(O_{t+1}), \\
\psi_{t+1}(j) &= \arg\max_{1 \le i \le N}\left[\delta_t(i) \cdot a_{ij}\right], \quad 1 \le t \le T-1;\ 1 \le j \le N.
\end{aligned} \tag{8}$$

Step 3. Termination:

$$p = \max_{1 \le i \le N}\left[\delta_T(i)\right], \quad S_T = \arg\max_{1 \le i \le N}\left[\delta_T(i)\right]. \tag{9}$$

Step 4. Path backtracking:

$$S_t = \psi_{t+1}(S_{t+1}), \quad t = T-1, T-2, T-3, \ldots, 1. \tag{10}$$

3.2.3. Reevaluation. After the Viterbi algorithm yields a new relationship between states and frames, the mean value and variance of each state are updated, and the Gaussian density function is used again to determine the updated probability between state and frame. The new total probability value is then obtained using the Viterbi algorithm. The update continues until the maximum total probability value converges; the result is the trained reference model.

3.3. Speech Recognition. The required commands are trained into models, which serve as the reference database for speech recognition. During recognition, the feature parameters are determined by the same procedure as before. The input is compared against the reference models in the database using the Viterbi algorithm to determine the probability value of each model and find the optimal state sequence. Time warping of the speech signal is handled automatically when the sequence of frames is aligned to the state sequence. The key point in the training procedure is to identify the correspondence between frames and states, which is updated by repeated Viterbi path backtracking until the path with the maximum total probability is found. The most important step in the recognition procedure is to compare the input against the trained reference models and select the model with the maximum total probability value.

4. Experiment Results

Table 1: Recognition rates for the speaker dependent and speaker independent.

Speaker dependent            Recognition rates    Speaker independent          Recognition rates
Chun-Yuan                    96.7%                Jason                        66.7%
Jian-Min                     93.3%                Ian                          73.3%
Yi-Chung                     90%                  Andy                         90%
Wei                          96.7%                Momo                         83.3%
Hung-Hui                     93.3%                Apple                        50%
Average recognition rates    94%                  Average recognition rates    74.7%
Figure 2 shows the system operation flow of the voice controlled guide type omnidirectional mobile robot. In the RFID guide system, the reader captures the tag data and then, according to the tag's ID code, either retrieves the environmental information associated with that tag or starts up the speech function. Figure 3 shows the proposed omnidirectional mobile robot. The robot was placed in an actual environment and tested with various moving actions (forward, backward, turn left, turn right, stop, and turn back). Speaker-dependent and speaker-independent voice control were each tested by five users, and the resulting speech recognition rates are shown in Table 1. Figure 4 shows a user controlling the robot by speech to move forward and turn left. Figure 5 shows a user controlling the robot by speech to move forward; when the robot receives the classroom tag while passing the classroom, the user can answer Yes or No to choose whether to access detailed information on the site. The site is introduced in video format, so that the user can become acquainted with the environment quickly.
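The decision flow of Figure 2 can be summarized in a few lines of code. The sketch below is only an illustration: the function name `guide_step`, the tag identifiers, the action labels, and the choice to return to tag detection after a No answer or an unrecognized command are all assumptions, and `recognize` stands in for the speech recognition system of Section 3.

```python
from typing import Callable, Dict, Optional, Tuple

# Movement commands tested in the experiments.
MOVE_COMMANDS = {"forward", "backward", "turn left", "turn right", "stop", "turn back"}


def guide_step(
    tag_id: Optional[str],
    site_videos: Dict[str, str],
    user_tag_id: str,
    recognize: Callable[[], str],
) -> Tuple[str, Optional[str]]:
    """One pass through the flow: returns (action, detail)."""
    if tag_id is None:
        # No tag signal received: the reader continues detection.
        return ("keep_detecting", None)
    if tag_id in site_videos:
        # Fixed tag: ask the user whether to introduce the site.
        if recognize() == "yes":
            return ("introduce_site", site_videos[tag_id])
        return ("keep_detecting", None)
    if tag_id == user_tag_id:
        # Tag in the user's hand: take a verbal direction command.
        command = recognize()
        if command in MOVE_COMMANDS:
            return ("move", command)
    return ("keep_detecting", None)
```

For example, `guide_step("classroom", {"classroom": "classroom.mp4"}, "user", lambda: "yes")` selects the site introduction, while the same call with the user's tag and a recognized "forward" selects a movement command.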
5. Conclusions

This paper used an HMM-based speech recognition method to build a voice controlled guide type omnidirectional mobile robot. The main convenience of voice control is that the robot can be operated without manual input, which makes it more user-friendly. The guide system based on RFID technology lets users learn about an unfamiliar environment quickly. Finally, the robot movement experiment and the robot guide system experiment demonstrated the feasibility and stability of this voice controlled guide type omnidirectional mobile robot.
Conflict of Interests

The authors declare no conflict of interests.

Acknowledgment

The financial support of this research by the National Science Council of Taiwan, under Grant no. NSC-100-2221-E-167-004, is greatly appreciated.
References

[1] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43–49, 1978.
[2] C. Kim and K.-D. Seo, “Robust DTW-based recognition algorithm for hand-held consumer devices,” IEEE Transactions on Consumer Electronics, vol. 51, no. 2, pp. 699–709, 2005.
[3] D. P. Morgan and C. L. Scofield, Eds., Neural Networks and Speech Processing, Kluwer Academic Publishers, 1991.
[4] C.-F. Juang, C.-T. Chiou, and C.-L. Lai, “Hierarchical singleton-type recurrent neural fuzzy networks for noisy speech recognition,” IEEE Transactions on Neural Networks, vol. 18, no. 3, pp. 833–843, 2007.
[5] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[6] S. Yoshizawa, N. Wada, N. Hayasaka, and Y. Miyanaga, “Scalable architecture for word HMM-based speech recognition and VLSI implementation in complete system,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 53, no. 1, pp. 70–77, 2006.
[7] J.-H. Im and S.-Y. Lee, “Unified training of feature extractor and HMM classifier for speech recognition,” IEEE Signal Processing Letters, vol. 19, no. 2, pp. 111–114, 2012.
[8] S. F. Huang, Design and Implementation of an Autonomous Following Omni-Directional Mobile Robot, National Digital Library of Theses and Dissertations, Taipei, Taiwan, 2008.
[9] Y. Yuan, P. Zhao, and Q. Zhou, “Research of speaker recognition based on combination of LPCC and MFCC,” in Proceedings of the IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS ’10), pp. 765–767, Xiamen, China, October 2010.
[10] L. Liu and J. He, “On the use of orthogonal GMM in speaker recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’99), pp. 845–848, Phoenix, Ariz, USA, March 1999.
[11] C. C. Wen, Ed., Multimedia Applications for Speech Recognition System, National Digital Library of Theses and Dissertations, Taipei, Taiwan, 2008.
[12] D. F. Tseng, “Robust decoding for convolutionally coded systems impaired by memoryless impulsive noise,” IEEE Transactions on Communications, vol. 61, pp. 4640–4652, 2013.