Hindawi Publishing Corporation Discrete Dynamics in Nature and Society Volume 2015, Article ID 826812, 8 pages http://dx.doi.org/10.1155/2015/826812
Research Article

Learning Bounds of ERM Principle for Sequences of Time-Dependent Samples

Mingchen Yao, Chao Zhang, and Wei Wu
School of Mathematical Sciences, Dalian University of Technology, Dalian, Liaoning 116024, China

Correspondence should be addressed to Chao Zhang; [email protected]

Received 17 July 2015; Revised 26 October 2015; Accepted 1 November 2015

Academic Editor: Juan R. Torregrosa

Copyright © 2015 Mingchen Yao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many generalization results in learning theory are established under the assumption that samples are independent and identically distributed (i.i.d.). However, numerous learning tasks in practical applications involve time-dependent data. In this paper, we propose a theoretical framework to analyze the generalization performance of the empirical risk minimization (ERM) principle for sequences of time-dependent samples (TDS). In particular, we first present the generalization bound of the ERM principle for TDS. By introducing some auxiliary quantities, we then give a further analysis of the generalization properties and the asymptotic behaviors of the ERM principle for TDS.
1. Introduction

Let $\mathcal{X} \subset \mathbb{R}^I$ and $\mathcal{Y} \subset \mathbb{R}^J$ be an input space and the corresponding output space, respectively. Define $\mathcal{Z} := \mathcal{X} \times \mathcal{Y} \subset \mathbb{R}^{I+J}$. Many classical results of statistical learning theory are built under the assumption that the samples $\{\mathbf{z}_n\} := \{(\mathbf{x}_n, \mathbf{y}_n)\}$ are independently drawn from an identical distribution on $\mathcal{Z}$, the so-called i.i.d.-sample assumption; see, for example, [1-5]. However, theoretical results resting on the i.i.d.-sample assumption may not be valid for (or cannot be directly used in) non-i.i.d. scenarios.

There has been much research interest in the theoretical analysis of learning processes for time-dependent samples. Zhang and Tao [6, 7] study the generalization properties of ERM-based learning processes for time-dependent samples drawn from Lévy processes and continuous-time Markov chains, respectively. Zou et al. [8] establish an exponential bound on the rate of relative uniform convergence for the ERM algorithm with dependent observations and derive the related generalization bounds for mixing sequences. Jiang [9] introduces a triplex inequality based on Hoeffding's inequality (cf. [10]) and then obtains probability bounds for uniform deviations in a very general framework, which is suitable for cases with unbounded loss functions and dependent samples. Zou et al. [11] present a novel Markov
sampling algorithm to generate uniformly ergodic Markov chain samples from a given dataset. Xu et al. [12] adopt techniques similar to those of [11] to develop an error bound for an online SVM classification algorithm, which achieves higher learning performance than classical random sampling methods.

In this paper, we are mainly concerned with the generalization performance of the ERM principle for sequences of time-dependent samples (TDS), which are independently and repeatedly observed from an undetermined stochastic process at fixed time points. The generalization performance of such a learning process is affected by the following factors: the number of independent observation sequences, the time-dependence among samples in a sequence, the number of samples in a sequence, and the observation time points. By introducing some auxiliary quantities, we propose a new framework to analyze the generalization properties of the learning process. In particular, we first show that the generalization bound of this learning process can be decomposed into four parts, $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$, which are related to the aforementioned factors. We then analyze the properties of these quantities. By imposing some mild conditions, we further obtain upper bounds of the first three quantities $\Phi_1$, $\Phi_2$, and $\Phi_3$, respectively. Finally,
some techniques of statistical learning theory are applied to bound the quantity $\Phi_4$, namely, the uniform entropy number, the relevant deviation inequality, and the symmetrization inequality for independent sequences of TDS.

Different from the previous works [6, 7], we make no specific assumption on the distribution of the stochastic process being sampled; in contrast, those works require that samples be observed from Lévy processes and continuous-time Markov chains, respectively. Moreover, instead of the function classes in the previous works, whose functions are evaluated on $\mathcal{Z}$ only, we consider a function class consisting of functions evaluated on $\mathcal{Z}$ and the time interval $[T_1, T_2]$. Furthermore, the samples considered in the previous works [6-8] form a single series of data observed from one certain stochastic process, while this paper discusses a learning process based on multiple independent sequences of TDS observed from a stochastic process at fixed time points. Likewise, the works [9, 11, 12] only consider the case of a single sequence of TDS, while this paper studies the learning process for multiple sequences of TDS, where each sequence corresponds to one trajectory (sample path) of a stochastic process. Therefore, our results are more general than the previous ones.

The rest of the paper is organized as follows. In Section 2, we introduce the notions and notation used in the paper and then exhibit the decomposition of the generalization bound of the ERM principle for TDS. In Section 3, we bound the four auxiliary quantities $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$ and present the main results of the paper. The last section concludes the paper.
2. Problem Setup

In this section, we formalize the main research issue of this paper and then show the decomposition of the generalization error of the ERM principle for TDS.

2.1. ERM Principle and Generalization Bounds. For any $t \ge 0$, let $\mathbf{x}_t \in \mathcal{X}$ and $\mathbf{y}_t \in \mathcal{Y}$ be the $t$-time input and the corresponding output, respectively. Denote $\mathbf{z}_t := (\mathbf{x}_t, \mathbf{y}_t) \in \mathbb{R}^{I+J}$ ($t \ge 0$) and assume that $\{\mathbf{z}_t\}_{t \ge 0}$ is an undetermined stochastic process with a countable state space. Consider a function class $\mathcal{G} \subset \mathcal{Y}^{\mathcal{X} \times \mathcal{T}}$ consisting of functions evaluated on the input space $\mathcal{X}$ and the time interval $\mathcal{T} = [T_1, T_2]$, and write $T := T_2 - T_1$. We would like to find a function $g^* \in \mathcal{G}$ such that, for any input $\mathbf{x}_t \in \mathcal{X}$ ($t \in \mathcal{T}$), the corresponding output $\mathbf{y}_t$ can be predicted as accurately as possible. A natural criterion for choosing $g^*$ is that it achieve the lowest expected risk over $\mathcal{G}$:

$$g^* = \arg\min_{g \in \mathcal{G}} \mathrm{E}(\ell \circ g) = \arg\min_{g \in \mathcal{G}} \frac{1}{T} \int_{T_1}^{T_2}\!\! \int \ell\big(g(\mathbf{x}_t, t), \mathbf{y}_t\big)\, dP_t\, dt, \tag{1}$$

where $g(\mathbf{x}_t, t)$ is a function of the $t$-time input $\mathbf{x}_t$ and the time point $t$ (here, $\mathbf{x}_t$ in $g(\mathbf{x}_t, t)$ is just an input value of the functional $g(\cdot, t)$), $\ell \circ g$ is the composition of $\ell$ and $g$, and $P_t$ stands for the $t$-time distribution of the stochastic process $\{\mathbf{z}_t\}_{t \ge 0}$ on $\mathcal{Z}$.

However, the distribution of $\{\mathbf{z}_t\}_{t \ge 0}$ is unknown, and it is difficult to obtain the target function $g^*$ by directly minimizing the expected risk $\mathrm{E}(\ell \circ g)$. Instead, the empirical risk minimization (ERM) principle provides a solution scheme for this issue. For $1 \le k \le K$, let $\mathbf{S}_N^{[k]} := \{\mathbf{z}_{t_n}^{[k]}\}_{n=1}^{N}$ be the $k$th sequence of samples observed from the stochastic process $\{\mathbf{z}_t\}_{t \ge 0}$ at the fixed time points $T_1 \le t_1 < t_2 < \cdots < t_N \le T_2$, and let the sequences $\mathbf{S}_N^{[k]}$ ($1 \le k \le K$) be independent of each other. The ERM principle aims to minimize the empirical risk over $\mathcal{G}$:

$$\widehat{g} = \arg\min_{g \in \mathcal{G}} \widehat{\mathrm{E}}(\ell \circ g) = \arg\min_{g \in \mathcal{G}} \frac{1}{KN} \sum_{k=1}^{K} \sum_{n=1}^{N} \ell\big(g(\mathbf{x}_{t_n}^{[k]}, t_n), \mathbf{y}_{t_n}^{[k]}\big), \tag{2}$$

and the solution $\widehat{g}$ is regarded as an estimate of the expected solution $g^*$ with respect to the sample sequences $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$.

For convenience, we further define the loss function class

$$\mathcal{F} := \big\{(\mathbf{z}_t, t) \mapsto \ell\big(g(\mathbf{x}_t, t), \mathbf{y}_t\big) : g \in \mathcal{G}\big\} \tag{3}$$

and call $\mathcal{F}$ the function class in the rest of this paper. Given $K$ sample sequences $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$ and taking $f = \ell \circ g$, the expected risk (1) and the empirical risk (2) can be written briefly as

$$\mathrm{E}f = \mathrm{E}(\ell \circ g) = \frac{1}{T} \int_{T_1}^{T_2}\!\! \int f(\mathbf{z}_t, t)\, dP_t\, dt, \tag{4}$$

$$\widehat{\mathrm{E}}f = \widehat{\mathrm{E}}(\ell \circ g) = \frac{1}{KN} \sum_{k=1}^{K} \sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n). \tag{5}$$
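As a toy numerical sketch of this setup (not part of the paper; the process, the loss, and names such as `a_star` and `empirical_risk` are illustrative assumptions), the following snippet draws $K$ independent sequences at the same fixed time points and minimizes empirical risk (2) over a one-parameter class by grid search:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative process on T = [0, 1]: y_t = a* . x_t . t + noise, observed as
# K independent sequences at the same fixed time points t_1, ..., t_N.
a_star, K, N = 2.0, 50, 20
t = np.linspace(0.05, 0.95, N)                        # fixed time points
x = rng.normal(size=(K, N))                           # inputs  x_{t_n}^{[k]}
y = a_star * x * t + 0.1 * rng.normal(size=(K, N))    # outputs y_{t_n}^{[k]}

def empirical_risk(a):
    """Empirical risk (2)/(5): squared loss averaged over all K*N samples."""
    return np.mean((a * x * t - y) ** 2)

# ERM by grid search over the one-parameter class G = {g_a(x, t) = a.x.t}.
grid = np.linspace(0.0, 4.0, 401)
a_hat = grid[np.argmin([empirical_risk(a) for a in grid])]
print(a_hat)   # close to a_star = 2.0
```

With $KN = 1000$ samples, the empirical minimizer $\widehat a$ lands close to the risk minimizer $a^*$, which is the behavior the generalization bounds below quantify.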
Similar to classical statistical learning theory [13], the main issue of this paper is to discuss whether the empirical solution $\widehat{g}$ provided by the ERM principle performs as well as the expected solution $g^*$. To this end, the supremum

$$\sup_{f \in \mathcal{F}} |\mathrm{E}f - \widehat{\mathrm{E}}f|, \tag{6}$$

called the generalization bound of the ERM principle for TDS $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$, plays an important role in analyzing the generalization performance of the above learning process.

2.2. Relationship with Some Time-Dependent Problems. At the end of this section, we show that many time-dependent problems reduce to the aforementioned ERM-based learning process, for example, the estimation of information channels and functional linear models.

2.2.1. Estimation of Information Channels. Since the functions in $\mathcal{F}$ are evaluated on both the real input space $\mathcal{X}$ and the time interval $\mathcal{T}$, the framework proposed in this paper can describe the inherent characteristics of time-dependent problems more precisely than the learning framework given in the previous works [6, 7], where the function class is only evaluated
on $\mathcal{X}$. Certainly, the time-dependent problem mentioned in [14], for example, the estimation of an information channel, is also covered by this learning setting. Different from the previous one, the setting considered in this paper is also suitable for analyzing the performance of estimating a dynamic information channel, which is of the following form: $\mathbf{y}_t = \mathbf{H}(t)\mathbf{x}_t + \mathbf{n}(t)$, where $\mathbf{H}(t)$ and $\mathbf{n}(t)$, both changing with time, are the channel matrix and the noise vector, respectively. The corresponding function class $\mathcal{H}$ is formalized as

$$\mathcal{H} := \big\{(\mathbf{x}, t) \mapsto \mathbf{H}(t)\mathbf{x} + \mathbf{n}(t) : \mathbf{H}(t) \in \mathbb{R}^{J \times I},\ \mathbf{n}(t) \in \mathbb{R}^{J}\big\}. \tag{7}$$

2.2.2. Functional Linear Models. Moreover, functional data classification or regression with functional linear models also fits the learning setting considered in this paper. Denoting

$$\mathbf{B}(t) := (\beta_{ji}(t))_{J \times I}, \qquad \boldsymbol{\alpha}(t) := (\alpha_1(t), \dots, \alpha_J(t))^\top, \qquad \boldsymbol{\epsilon}(t) := (\epsilon_1(t), \dots, \epsilon_J(t))^\top, \tag{8}$$

the following are the most frequently used functional linear models mentioned by [15]:

(i) The model with a scalar input and a functional output: $\mathbf{y}(t) = \mathbf{B}(t)\mathbf{x} + \boldsymbol{\epsilon}(t)$, which corresponds to the function class, for any $t \in \mathcal{T}$,

$$\mathcal{G}(\cdot, t) = \big\{\mathbf{x} \mapsto \mathbf{B}(t)\mathbf{x} + \boldsymbol{\epsilon}(t) : \beta_{ji}(t), \epsilon_j(t) \in \mathbb{R}^{\mathcal{T}}\big\}. \tag{9}$$

(ii) The model with a functional input and a scalar output: $\mathbf{y} = \boldsymbol{\alpha} + \int_{\Lambda_t} \mathbf{B}(t)\mathbf{x}(t)\, dt + \boldsymbol{\epsilon}$, which corresponds to the function class, for any $t \in \mathcal{T}$,

$$\mathcal{G}(\cdot, t) = \Big\{\mathbf{x}(t) \mapsto \boldsymbol{\alpha} + \int_{\Lambda_t} \mathbf{B}(t)\mathbf{x}(t)\, dt + \boldsymbol{\epsilon} : \beta_{ji}(t) \in \mathbb{R}^{\mathcal{T}};\ \boldsymbol{\alpha}, \boldsymbol{\epsilon} \in \mathbb{R}^{J}\Big\}. \tag{10}$$

(iii) The two models with a functional input and a functional output:

(a) one model is $\mathbf{y}(t) = \boldsymbol{\alpha}(t) + \mathbf{B}(t)\mathbf{x}(t) + \boldsymbol{\epsilon}(t)$, which corresponds to the function class, for any $t \in \mathcal{T}$,

$$\mathcal{G}(\cdot, t) = \big\{\mathbf{x}(t) \mapsto \boldsymbol{\alpha}(t) + \mathbf{B}(t)\mathbf{x}(t) + \boldsymbol{\epsilon}(t) : \beta_{ji}(t), \alpha_j(t), \epsilon_j(t) \in \mathbb{R}^{\mathcal{T}}\big\}; \tag{11}$$

(b) the other is $\mathbf{y}(t) = \boldsymbol{\alpha}(t) + \int_{\Lambda_t} \mathbf{B}(s, t)\mathbf{x}(s)\, ds + \boldsymbol{\epsilon}(t)$, which corresponds to the function class, for any $t \in \mathcal{T}$,

$$\mathcal{G}(\cdot, t) = \Big\{\mathbf{x}(t) \mapsto \boldsymbol{\alpha}(t) + \int_{\Lambda_t} \mathbf{B}(s, t)\mathbf{x}(s)\, ds + \boldsymbol{\epsilon}(t) : \beta_{ji}(s, t) \in \mathbb{R}^{\mathcal{T} \times \mathcal{T}};\ \alpha_j(t), \epsilon_j(t) \in \mathbb{R}^{\mathcal{T}}\Big\}. \tag{12}$$

In the above models, the loss function $\ell$ is usually chosen to be the mean square error, and some functional data algorithm is then used to find the function that minimizes empirical risk (2) over the function class $\mathcal{F}$. We refer to [16] for more details on functional linear models.

2.3. Analysis of Generalization Performance. Different from the i.i.d. learning setting, and regardless of the distribution characteristics of $\{\mathbf{z}_t\}_{t \ge 0}$, the generalization performance of the aforementioned learning process is also affected by the following factors:

(i) the time-dependence among samples in one sequence;
(ii) the number $K$ of independent sequences of samples;
(iii) the number $N$ of samples in one sequence;
(iv) the fixed time points $t_1, \dots, t_N$, at which the samples $\mathbf{z}_{t_1}, \dots, \mathbf{z}_{t_N}$ are observed.

The main concern of this paper is to quantitatively analyze the influence of these factors on the generalization performance and then achieve the generalization bound of the ERM principle for TDS.

Proposition 1. Denote $\Lambda^* := \{\tau_0^*, \tau_1^*, \dots, \tau_N^*\} \subset [T_1, T_2]$ as an ordered set with $T_1 = \tau_0^* < \tau_1^* < \cdots < \tau_N^* < T_2$ and $\Delta\tau_n^* := \tau_n^* - \tau_{n-1}^*$ ($1 \le n \le N$). If the set $\Lambda^*$ achieves the supremum

$$\sup_{f \in \mathcal{F},\ \Lambda \subset [T_1, T_2]} \Bigg| \sum_{n=1}^{N} \frac{\Delta\tau_n}{T} \int f(\mathbf{z}_{\tau_n}, \tau_n)\, dP_{\tau_n} - \frac{1}{KN} \sum_{k=1}^{K} \sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n) \Bigg|, \tag{13}$$

then there holds

$$\sup_{f \in \mathcal{F}} |\mathrm{E}f - \widehat{\mathrm{E}}f| \le \Phi_1 + \Phi_2 + \Phi_3 + \Phi_4, \tag{14}$$

where the quantities $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$ are, respectively, defined by

$$\Phi_1 := \sum_{n=1}^{N} \Bigg( \bigg|\frac{\Delta\tau_n^*}{T} - \frac{1}{N}\bigg| \sup_{f \in \mathcal{F}} \bigg|\int f(\mathbf{z}_{\tau_n^*}, \tau_n^*)\, dP_{\tau_n^*}\bigg| \Bigg); \tag{15}$$

$$\Phi_2 := \sup_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} \bigg|\int f(\mathbf{z}_{\tau_n^*}, \tau_n^*)\, dP_{\tau_n^*} - \int f(\mathbf{z}_{t_n}, \tau_n^*)\, dP_{t_n}\bigg|; \tag{16}$$
$$\Phi_3 := \sup_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} \bigg|\int f(\mathbf{z}_{t_n}, \tau_n^*)\, dP_{t_n} - \int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n}\bigg|; \tag{17}$$

$$\Phi_4 := \sup_{f \in \mathcal{F}} \Bigg| \frac{1}{N} \sum_{n=1}^{N} \int f(\mathbf{z}, t_n)\, dP_{t_n} - \frac{1}{KN} \sum_{k=1}^{K} \sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n) \Bigg|. \tag{18}$$
This result implies that the behavior of the generalization bound can be described by the sum of the quantities $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$. In the next section, we discuss the properties of each of these quantities in turn.
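As a small numerical illustration of one of these quantities (an illustrative assumption throughout: the toy nonstationary process, the finite class, and the name `phi4` are not from the paper), one can estimate $\Phi_4$ for a finite function class and watch the sup deviation shrink as the number of sequences $K$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy nonstationary process: z_t ~ Normal(t, 1), with the finite class
# f(z, t) = z*t and f(z, t) = z^2*t standing in for F.
N = 8
t = np.linspace(0.1, 0.9, N)
funcs = [lambda z, t: z * t, lambda z, t: z ** 2 * t]
# (1/N) sum_n E f(z_{t_n}, t_n), available in closed form here:
exact = [np.mean(t ** 2), np.mean((1 + t ** 2) * t)]

def phi4(K):
    z = t + rng.normal(size=(K, N))          # K independent sequences
    emp = [np.mean(f(z, t)) for f in funcs]  # (1/(KN)) sum_k sum_n f
    return max(abs(e - x) for e, x in zip(emp, exact))

print(phi4(10), phi4(10000))   # the sup deviation shrinks as K grows
```

This mirrors the role the number of sequences $K$ plays in the bounds of the next section.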
3. Analysis of Relevant Quantities

As mentioned above, regardless of the distribution characteristics of $\{\mathbf{z}_t\}_{t \ge 0}$, there are four factors affecting the generalization performance of the ERM learning process for TDS: the time-dependence, the number of TDS sequences, the sample number, and the observation time points; these are related to the quantities $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$, respectively. Moreover, two mild conditions are imposed for the following discussion:

(C1) Assume that there exists a constant $C_1$ such that, for all $t \in \mathcal{T}$ and $f \in \mathcal{F}$,

$$\int f(\mathbf{z}_t, t)\, dP_t < C_1. \tag{19}$$

(C2) Assume that each $f \in \mathcal{F}$ is differentiable with respect to the time $t$ and that there exists a constant $C_2$ such that, for any $\mathbf{z} \in \mathcal{Z}$, $t \in \mathcal{T}$, and $f \in \mathcal{F}$,

$$\bigg|\frac{\partial f(\mathbf{z}, t)}{\partial t}\bigg| \le C_2. \tag{20}$$

The former requires that any function $f \in \mathcal{F}$ have a bounded expectation with respect to the distribution of $\{\mathbf{z}_t\}_{t \ge 0}$ at any time $t \in \mathcal{T}$, and the latter requires that all functions in $\mathcal{F}$ have a bounded first-order partial derivative with respect to $t$.

3.1. Upper Bound of Quantity $\Phi_1$. Under Condition (C1) and recalling (15), we can bound the quantity $\Phi_1$ as follows:

$$\Phi_1 \le C_1 \sum_{n=1}^{N} \bigg|\frac{\Delta\tau_n^*}{T} - \frac{1}{N}\bigg|, \tag{21}$$

which implies that $\Phi_1$ is affected by the choice of the specific time sequence $\Lambda^* = \{\tau_0^*, \tau_1^*, \dots, \tau_N^*\}$ achieving supremum (13). Note that the time sequence $\Lambda^*$ achieving supremum (13) may not be unique; in that case, we define

$$\Lambda^\natural := \arg\min_{\Lambda^*} \sum_{n=1}^{N} \bigg|\frac{\Delta\tau_n^*}{T} - \frac{1}{N}\bigg|. \tag{22}$$

It can be observed that if the time points $\tau_1^*, \dots, \tau_N^*$ are such that all the increments $\Delta\tau_n^*$ ($1 \le n \le N$) have the same length, the sampling-time error $\Phi_1$ equals zero. From the probabilistic perspective, this means that $\Lambda^\natural$ is the candidate sequence that is closest to the uniform distribution; in other words, the more uniformly the time points in $\Lambda^\natural$ are distributed, the closer the quantity $\Phi_1$ is to zero. On the other hand, for the stochastic process $\{\mathbf{z}_t\}_{t \ge 0}$, the uniform distribution of the time points $\tau_0^*, \tau_1^*, \dots, \tau_N^*$ corresponds to the case in which the distributions $P_t$ are identical for all $t \ge 0$, that is, to a single common probability distribution.
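The partition discrepancy appearing in bound (21) is easy to compute directly; the following sketch (names and example partitions are illustrative) confirms that it vanishes exactly for a uniform partition:

```python
import numpy as np

# Sketch of bound (21): Phi_1 <= C_1 * sum_n |Delta tau_n*/T - 1/N|.
# The sum vanishes exactly when the partition of [T_1, T_2] is uniform.
T1, T2, N = 0.0, 1.0, 10
T = T2 - T1

def partition_discrepancy(taus):
    """sum_n |Delta tau_n / T - 1/N| for an ordered partition tau_0..tau_N."""
    deltas = np.diff(taus)
    return float(np.sum(np.abs(deltas / T - 1.0 / N)))

uniform = np.linspace(T1, T2, N + 1)       # equally spaced time points
interior = np.random.default_rng(1).uniform(T1, T2, N - 1)
skewed = np.sort(np.concatenate(([T1], interior, [T2])))

print(partition_discrepancy(uniform), partition_discrepancy(skewed))
```

The uniform partition yields zero (up to floating-point error), while any skewed partition yields a strictly positive discrepancy, matching the discussion above.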
3.2. Upper Bound of Quantity $\Phi_2$. Denote

$$\mathcal{F}_t := \{f(\cdot, t) : f \in \mathcal{F}\}, \qquad D_{\mathcal{F}_t}(P_t, P_s) := \sup_{f(\cdot, t) \in \mathcal{F}_t} \bigg|\int f(\mathbf{z}, t)\, dP_t - \int f(\mathbf{z}, t)\, dP_s\bigg|, \tag{23}$$

which is the so-called integral probability metric (IPM) between the two distributions $P_t$ and $P_s$ with respect to the function class $\mathcal{F}_t$. The IPM plays an important role in measuring the discrepancy between two distributions in probability theory, and we refer to [17, 18] for more details on IPMs. According to (16), we then obtain

$$\Phi_2 \le \frac{1}{N} \sum_{n=1}^{N} D_{\mathcal{F}_{\tau_n^*}}(P_{t_n}, P_{\tau_n^*}). \tag{24}$$
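A plug-in estimate of the IPM just defined can be computed from samples when the supremum is taken over a small finite class (an illustrative assumption; a genuine IPM takes the supremum over all of $\mathcal{F}_t$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Plug-in sketch of the IPM (23) over a small finite function class:
# D(P, Q) = sup_f | E_P f - E_Q f |, estimated from samples of P and Q.
funcs = [np.sin, np.cos, np.tanh, lambda z: z, lambda z: z ** 2]

def empirical_ipm(sample_p, sample_q):
    """sup over the finite class of |mean_P f - mean_Q f|."""
    return max(abs(np.mean(f(sample_p)) - np.mean(f(sample_q))) for f in funcs)

same = empirical_ipm(rng.normal(size=5000), rng.normal(size=5000))
shifted = empirical_ipm(rng.normal(size=5000), rng.normal(1.0, 1.0, 5000))
print(same, shifted)   # near zero for identical laws, clearly positive otherwise
```

The estimate is near zero when both samples come from the same law and clearly positive under a mean shift, which is exactly the behavior that drives bound (24).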
In particular, according to (23), if the stochastic process $\{\mathbf{z}_t\}_{t \ge 0}$ has an identical distribution at every time point, the quantity $\Phi_2$ equals zero.

3.3. Upper Bound of Quantity $\Phi_3$. According to (17), it follows from Condition (C2) and the mean value theorem that

$$\Phi_3 = \sup_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} \bigg|\int f(\mathbf{z}_{t_n}, \tau_n^*)\, dP_{t_n} - \int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n}\bigg| = \sup_{f \in \mathcal{F}} \frac{1}{N} \sum_{n=1}^{N} \bigg|\int \frac{\partial f}{\partial t}(\mathbf{z}_{t_n}, \xi_n)\,(\tau_n^* - t_n)\, dP_{t_n}\bigg| \le \frac{C_2}{N} \sum_{n=1}^{N} |\tau_n^* - t_n|, \tag{25}$$

where $\xi_n$ lies between $\tau_n^*$ and $t_n$. According to (25), the quantity $\Phi_3$ can be bounded by the sum of the discrepancies between the supremum time points $\tau_n^*$ and the fixed observation time points $t_n$. Recalling (13), the time points $\{\tau_n^*\}_{n=1}^{N}$
are determined by the selection of $\{t_n\}_{n=1}^{N}$. It is noteworthy that the time points $t_1, \dots, t_N$ usually cannot coincide with $\tau_1^*, \dots, \tau_N^*$.

3.4. Upper Bound of Quantity $\Phi_4$. Similar to classical statistical learning theory, the task of bounding $\Phi_4$ can be divided into three steps: a complexity measure of function classes, a deviation inequality, and a symmetrization inequality. Based on Azuma's inequality, we first obtain a suitable deviation inequality for TDS and then the relevant symmetrization inequality. Using the uniform entropy number, we finally bound the quantity $\Phi_4$ in the sense of probability.

3.4.1. Deviation Inequality. Let $\mathbf{S}_N = \{\mathbf{z}_{t_n}\}_{n=1}^{N}$ be an observation sequence of the stochastic process $\{\mathbf{z}_t\}_{t \ge 0}$ with a countable state space $\mathcal{Z}$. Following [19], a filtration associated with $\mathbf{S}_N$ can be built as follows:

(i) let $\Omega_N$ be the collection of all subsets of $\mathcal{Z}^N$;
(ii) set $\Omega_0 = \{\emptyset, \mathcal{Z}^N\}$;
(iii) for any $1 \le n \le N-1$, let $\Omega_n$ be the $\sigma$-algebra generated by $\{\mathbf{z}_{t_1}, \dots, \mathbf{z}_{t_n}\}$.

Then there naturally holds

$$\Omega_0 \subseteq \Omega_1 \subseteq \cdots \subseteq \Omega_N. \tag{26}$$

According to Azuma's inequality [19, 20], since the observation sequences $\mathbf{S}_N^{[1]}, \dots, \mathbf{S}_N^{[K]}$ are independent of each other, we obtain the following result.

Lemma 2. Given a function $F : \mathcal{Z}^N \to \mathbb{R}$, set $F := \sum_{n=1}^{N} f(\mathbf{z}_{t_n}, t_n)$ and, for any $1 \le n \le N$, define the associated martingale differences

$$d_n(F) = \mathrm{E}\{F(\mathbf{S}_N) \mid \Omega_n\} - \mathrm{E}\{F(\mathbf{S}_N) \mid \Omega_{n-1}\}. \tag{27}$$

Then, for any $\xi > 0$,

$$\Pr\Bigg\{\bigg|\frac{1}{K}\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]}) - \mathrm{E}\{F(\mathbf{S}_N)\}\bigg| > \xi\Bigg\} \le 2\exp\Bigg(-\frac{K\xi^2}{2\sum_{n=1}^{N}|d_n(F)|^2}\Bigg). \tag{28}$$
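A Monte Carlo sanity check of an Azuma-type bound of this shape can be run on a toy dependent sequence (all choices below — the Markov chain, the flip probability, and the martingale-difference bound `c` — are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each sequence is a {-1, +1} Markov chain with flip probability 0.3, and
# F(S_N) = sum_n z_{t_n}.  A flipped state at step n influences step m >= n
# with weight 0.4^(m - n), so the martingale differences satisfy
# |d_n(F)| <= c := 2 / (1 - 0.4)  (a conservative bound).
K, N, trials, xi = 40, 10, 5000, 3.0
c = 2 / (1 - 0.4)

flips = rng.random((trials, K, N)) < 0.3           # time dependence via flips
z = np.cumprod(np.where(flips, -1, 1), axis=2)     # chains started at z = +1
F = z.sum(axis=2)                                  # F(S_N) for each sequence
means = F.mean(axis=1)                             # (1/K) sum_k F(S_N^[k])

# Use the grand mean over all trials as a proxy for E{F(S_N)}.
tail = np.mean(np.abs(means - means.mean()) > xi)
bound = 2 * np.exp(-K * xi**2 / (2 * N * c**2))
print(tail, "<=", bound)
```

The empirical tail stays (comfortably) below the exponential bound, as a Lemma-2-style inequality predicts; the looseness is typical of Azuma-type bounds on dependent data.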
3.4.2. Symmetrization Inequality. Symmetrization inequalities are mainly used to replace the expected risk by an empirical risk computed on another sample set that is independent of the given sample set but has the same distribution. In this manner, certain complexity measures can be incorporated into the resulting bounds. For clarity of presentation, we introduce a useful notion for the following discussion. Given a sample sequence $\mathbf{S}_N = \{\mathbf{z}_{t_n}\}_{n=1}^{N}$, consider another sample sequence $\mathbf{S}'_N = \{\mathbf{z}'_{t_n}\}_{n=1}^{N}$ independent of $\mathbf{S}_N$. If $\mathbf{z}_{t_n}$ has the same distribution as $\mathbf{z}'_{t_n}$ for every $1 \le n \le N$, the sequence $\mathbf{S}'_N$ is called the ghost sample sequence of $\mathbf{S}_N$. The following result is a direct extension of the classical symmetrization result, Lemma 2 of [21].

Theorem 3. Assume that $\mathcal{F}$ is a function class consisting of functions with range $[a, b]$ and that $\{\mathbf{z}_t\}_{t \ge 0}$ is a stochastic process with a countable state space. Let $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$ and $\{\mathbf{S}'^{[k]}_N\}_{k=1}^{K}$ be drawn from $\{\mathbf{z}_t\}_{t \ge 0}$ in the time interval $[T_1, T_2]$. Then, for any $\xi > 0$ and any $K \ge 8(b-a)^2/\xi^2$, one has

$$\Pr\Bigg\{\sup_{f \in \mathcal{F}} \bigg|\frac{1}{K}\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]}) - \mathrm{E}\{F(\mathbf{S}_N)\}\bigg| > \xi\Bigg\} \le 2\Pr\Bigg\{\sup_{f \in \mathcal{F}} \frac{1}{K}\bigg|\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]}) - \sum_{k=1}^{K} F(\mathbf{S}'^{[k]}_N)\bigg| > \frac{\xi}{2}\Bigg\}, \tag{29}$$

where $F(\mathbf{S}_N^{[k]}) := \sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)$.
3.4.3. Approximation Error $\Phi_4$. The following is the definition of the covering number, and we refer to [22] for details.

Definition 4. Let $\mathcal{F}$ be a function class and let $d$ be a metric on $\mathcal{F}$. For any $\xi > 0$, the covering number of $\mathcal{F}$ at radius $\xi$ with respect to the metric $d$, denoted by $\mathcal{N}(\mathcal{F}, \xi, d)$, is the minimum size of a cover of radius $\xi$.

The uniform entropy number (UEN) is a variant of the covering number, and we refer to [22] for details as well. Equipping $\mathcal{F}$ with the empirical $\ell^p(\mathbf{S}_N)$ metric ($p > 0$), the UEN is defined as

$$\ln \mathcal{N}_p(\mathcal{F}, \xi, N) := \sup_{\mathbf{S}_N} \ln \mathcal{N}\big(\mathcal{F}, \xi, \ell^p(\mathbf{S}_N)\big). \tag{30}$$

Based on the uniform entropy number, the upper bound of $\Phi_4$ can be obtained as follows.

Theorem 5. Assume that $\mathcal{F} \subset \mathcal{Y}^{\mathcal{X} \times \mathcal{T}}$ with $\mathcal{T} = [T_1, T_2]$ is a function class such that, for any $f \in \mathcal{F}$ and any $t \in \mathcal{T}$, the functional $f(\cdot, t)$ is bounded with range $[a, b]$. Let $\{\mathbf{z}_t\}_{t \ge 0}$ be a stochastic process with a countable state space, let $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$ be $K$ independent observation sequences of $\{\mathbf{z}_t\}_{t \ge 0}$ at the time points $t_1, t_2, \dots, t_N$, let $\{\mathbf{S}'^{[k]}_N\}_{k=1}^{K}$ be the ghost samples of $\{\mathbf{S}_N^{[k]}\}_{k=1}^{K}$, and denote $\mathbf{S}_{2N}^{[k]} := \{\mathbf{S}_N^{[k]}, \mathbf{S}'^{[k]}_N\}$ for any $1 \le k \le K$. Then, for any $\xi > 0$ and any $K \ge 8(b-a)^2/\xi^2$, one has

$$\Pr\Bigg\{\sup_{f \in \mathcal{F}} \bigg|\frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n} - \frac{1}{KN}\sum_{k=1}^{K}\sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)\bigg| > \xi\Bigg\} \le 8\mathcal{N}_1\Big(\mathcal{F}, \frac{\xi}{8}, 2KN\Big) \exp\Bigg(-\frac{K\xi^2 N^2}{128\sum_{n=1}^{N}|d_n(F)|^2}\Bigg). \tag{31}$$
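Covering numbers of the kind used in Theorem 5 can be approximated for a concrete class with a greedy $\epsilon$-net (a sketch under illustrative assumptions: the `tanh` parameter family, the sample, and the greedy construction are all choices made here, and a greedy net only upper-bounds the minimum cover size):

```python
import numpy as np

rng = np.random.default_rng(0)

# Empirical l1 metric over a fixed sample: d(f, g) = mean_i |f(z_i) - g(z_i)|.
z = rng.uniform(-1, 1, size=200)                       # empirical sample
# Function class: f_a(z) = tanh(a * z) for a grid of parameters a.
evals = np.array([np.tanh(a * z) for a in np.linspace(0.1, 5.0, 100)])

def greedy_cover_size(evals, eps):
    """Greedy eps-net: repeatedly pick an uncovered function as a center."""
    uncovered = np.ones(len(evals), dtype=bool)
    centers = 0
    while uncovered.any():
        i = np.argmax(uncovered)                       # first uncovered function
        dist = np.mean(np.abs(evals - evals[i]), axis=1)
        uncovered &= dist > eps                        # cover the eps-ball around i
        centers += 1
    return centers

for eps in (0.5, 0.1, 0.02):
    print(eps, greedy_cover_size(evals, eps))          # cover size grows as eps shrinks
```

The cover size grows as the radius shrinks; it is the logarithm of this growth, taken uniformly over samples as in (30), that enters the exponent trade-off in bound (31).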
Compared to the classical generalization results for i.i.d. samples (cf. Theorem 2.3 of [22]), time-dependent data are involved in the analysis of upper bound (31). This result also differs from the generalization bound of [7] because more than one sequence of time-dependent samples is
investigated.

Denote the right-hand side of (31) by $\varepsilon$. Under the notation of Theorem 5, we immediately obtain that, with probability at least $1 - \varepsilon$,

$$\sup_{f \in \mathcal{F}} \frac{1}{N}\Bigg|\mathrm{E}\{F(\mathbf{S}_N)\} - \frac{1}{K}\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]})\Bigg| \le \sqrt{\frac{128\big(\ln\big(8\mathcal{N}_1(\mathcal{F}, \xi/8, 2KN)\big) - \ln \varepsilon\big)\sum_{n=1}^{N}|d_n(F)|^2}{KN^2}}, \tag{32}$$

which explicitly shows the upper bound of $\Phi_4$. Finally, according to the upper bounds of $\Phi_i$ ($i = 1, 2, 3, 4$), (6), and Proposition 1, we can bound $\sup_{f \in \mathcal{F}} |\mathrm{E}f - \widehat{\mathrm{E}}f|$ as follows: with probability at least $1 - \varepsilon$,

$$\sup_{f \in \mathcal{F}} |\mathrm{E}f - \widehat{\mathrm{E}}f| \le C_1 \sum_{n=1}^{N}\bigg|\frac{\Delta\tau_n^*}{T} - \frac{1}{N}\bigg| + \frac{1}{N}\sum_{n=1}^{N} D_{\mathcal{F}_{\tau_n^*}}(P_{t_n}, P_{\tau_n^*}) + \frac{C_2}{N}\sum_{n=1}^{N}|\tau_n^* - t_n| + \sqrt{\frac{128\big(\ln\big(8\mathcal{N}_1(\mathcal{F}, \xi/8, 2KN)\big) - \ln \varepsilon\big)\sum_{n=1}^{N}|d_n(F)|^2}{KN^2}}. \tag{33}$$

From result (33), we find that the generalization performance of the ERM principle for TDS is affected by the following factors:

(i) the choice of the observation time points $t_1, t_2, \dots, t_N$;
(ii) the number $K$ of TDS sequences;
(iii) the length $N$ of the sequences;
(iv) the properties of the martingale differences $\{d_n(F)\}$;
(v) the complexity of the function class.

Meanwhile, the quantitative relationship among these factors is given explicitly by the above result as well.

4. Conclusion

In this paper, we propose a new framework to analyze the generalization properties of the ERM principle for sequences of TDS. By introducing four auxiliary quantities $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$, we give a detailed insight into the interactions among the complexity of function classes, the size of samples, and the dependence among samples. To achieve the upper bound of $\Phi_4$, we develop the relevant deviation inequality and symmetrization inequality for sequences of TDS. This work is an extension of the classical techniques of statistical learning theory. In future work, we will consider relaxing conditions (C1) and (C2) and will further investigate the properties of ERM-based learning processes for TDS sequences that are not observed at fixed observation time points.

Appendices

A. Proof of Proposition 1

According to (3), (4), and (5), we have

$$\begin{aligned}
\sup_{f \in \mathcal{F}} |\mathrm{E}f - \widehat{\mathrm{E}}f| &= \sup_{f \in \mathcal{F}} \Bigg|\frac{1}{T}\int_{T_1}^{T_2}\!\!\int f(\mathbf{z}_t, t)\, dP_t\, dt - \frac{1}{KN}\sum_{k=1}^{K}\sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)\Bigg| \\
&\le \sup_{f \in \mathcal{F},\ \Lambda \subset [T_1, T_2]} \Bigg|\sum_{n=1}^{N}\frac{\Delta\tau_n}{T}\int f(\mathbf{z}_{\tau_n}, \tau_n)\, dP_{\tau_n} - \frac{1}{KN}\sum_{k=1}^{K}\sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)\Bigg| \\
&\le \sup_{f \in \mathcal{F}} \Bigg|\sum_{n=1}^{N}\frac{\Delta\tau_n^*}{T}\int f(\mathbf{z}_{\tau_n^*}, \tau_n^*)\, dP_{\tau_n^*} - \frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{\tau_n^*}, \tau_n^*)\, dP_{\tau_n^*}\Bigg| \\
&\quad + \sup_{f \in \mathcal{F}} \Bigg|\frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{\tau_n^*}, \tau_n^*)\, dP_{\tau_n^*} - \frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n}\Bigg| \\
&\quad + \sup_{f \in \mathcal{F}} \Bigg|\frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n} - \frac{1}{KN}\sum_{k=1}^{K}\sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)\Bigg|,
\end{aligned} \tag{A.1}$$

where $\Lambda^*$ is the set achieving supremum (13). The first term on the right-hand side is bounded by $\Phi_1$ and the last term is $\Phi_4$; splitting the middle term by inserting $\frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{t_n}, \tau_n^*)\, dP_{t_n}$ and applying the triangle inequality once more yields $\Phi_2 + \Phi_3$, which leads to result (14). This completes the proof.

B. Proof of Theorem 5
Given a time sequence $t_1, \dots, t_N$, let $\mathbf{S}_N = \{\mathbf{z}_{t_1}, \dots, \mathbf{z}_{t_N}\}$. For any $f \in \mathcal{F}$, define a function $F : \mathcal{Z}^N \to \mathbb{R}$:

$$F(\mathbf{S}_N) = \sum_{n=1}^{N} f(\mathbf{z}_{t_n}, t_n). \tag{B.1}$$

Then, let $\{\sigma_k\}_{k=1}^{K}$ be independent Rademacher random variables, that is, independent $\{-1, 1\}$-valued random variables taking either value with equal probability. Given $\{\sigma_k\}_{k=1}^{K}$ and $\mathbf{S}_{2N}$, we denote

$$\vec{\sigma} := (\sigma_1, \dots, \sigma_K, -\sigma_1, \dots, -\sigma_K)^\top \tag{B.2}$$

and, for any $f \in \mathcal{F}$,

$$\vec{F}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]}) := \big(F(\mathbf{S}_N^{[1]}), \dots, F(\mathbf{S}_N^{[K]}), F(\mathbf{S}'^{[1]}_N), \dots, F(\mathbf{S}'^{[K]}_N)\big)^\top. \tag{B.3}$$

According to (5) and Theorem 3, given any $\xi > 0$, we have, for any $K \ge 8(b-a)^2/\xi^2$,

$$\begin{aligned}
&\Pr\Bigg\{\sup_{f \in \mathcal{F}} \bigg|\frac{1}{K}\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]}) - \mathrm{E}\{F(\mathbf{S}_N)\}\bigg| > \xi\Bigg\} \\
&\le 2\Pr\Bigg\{\sup_{f \in \mathcal{F}} \frac{1}{K}\bigg|\sum_{k=1}^{K} F(\mathbf{S}_N^{[k]}) - \sum_{k=1}^{K} F(\mathbf{S}'^{[k]}_N)\bigg| > \frac{\xi}{2}\Bigg\} \\
&= 2\Pr\Bigg\{\sup_{f \in \mathcal{F}} \frac{1}{K}\bigg|\sum_{k=1}^{K} \sigma_k\big(F(\mathbf{S}_N^{[k]}) - F(\mathbf{S}'^{[k]}_N)\big)\bigg| > \frac{\xi}{2}\Bigg\} \\
&= 2\Pr\Bigg\{\sup_{f \in \mathcal{F}} \frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{F}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{4}\Bigg\}.
\end{aligned} \tag{B.4}$$

For any given $\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]}$, set $\Delta$ to be a $\xi/8$-radius cover of $\mathcal{F}$ with respect to the $\ell^1(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})$ norm. Since $\mathcal{F}$ is composed of bounded functions with range $[a, b]$, we assume that the same holds for any $h \in \Delta$. According to the triangle inequality, if $f_0$ is the function that achieves

$$\sup_{f \in \mathcal{F}} \frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{F}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{4}, \tag{B.5}$$

there must be an $h_0 \in \Delta$ that satisfies

$$\frac{1}{2K}\sum_{k=1}^{K}\Big(\big|F_0(\mathbf{S}'^{[k]}_N) - H_0(\mathbf{S}'^{[k]}_N)\big| + \big|F_0(\mathbf{S}_N^{[k]}) - H_0(\mathbf{S}_N^{[k]})\big|\Big) < \frac{\xi}{8} \tag{B.6}$$

and meanwhile

$$\frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{H}_0(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{8}. \tag{B.7}$$

Therefore, we arrive at

$$\Pr\Bigg\{\sup_{f \in \mathcal{F}} \frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{F}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{4}\Bigg\} \le \Pr\Bigg\{\sup_{h \in \Delta} \frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{H}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{8}\Bigg\}. \tag{B.8}$$

On the other hand, given any $\xi > 0$, we also have, for any $K \ge 8(b-a)^2/\xi^2$,

$$\begin{aligned}
&\Pr\Bigg\{\sup_{h \in \Delta} \frac{1}{2K}\Big|\big\langle\vec{\sigma}, \vec{H}(\mathbf{S}_{2N}^{[1]}, \dots, \mathbf{S}_{2N}^{[K]})\big\rangle\Big| > \frac{\xi}{8}\Bigg\} \\
&= \Pr\Bigg\{\sup_{h \in \Delta} \frac{1}{K}\bigg|\sum_{k=1}^{K}\big(H(\mathbf{S}'^{[k]}_N) - H(\mathbf{S}_N^{[k]})\big)\bigg| > \frac{\xi}{4}\Bigg\} \\
&\le \sum_{h \in \Delta} \Pr\Bigg\{\bigg|\mathrm{E}\{H(\mathbf{S}_N)\} - \frac{1}{K}\sum_{k=1}^{K} H(\mathbf{S}_N^{[k]})\bigg| + \bigg|\mathrm{E}\{H(\mathbf{S}_N)\} - \frac{1}{K}\sum_{k=1}^{K} H(\mathbf{S}'^{[k]}_N)\bigg| > \frac{\xi}{4}\Bigg\} \\
&\le 2\sum_{h \in \Delta} \Pr\Bigg\{\bigg|\mathrm{E}\{H(\mathbf{S}_N)\} - \frac{1}{K}\sum_{k=1}^{K} H(\mathbf{S}_N^{[k]})\bigg| > \frac{\xi}{8}\Bigg\} \\
&\le 4\mathcal{N}_1\Big(\mathcal{F}, \frac{\xi}{8}, 2KN\Big)\exp\Bigg(-\frac{K\xi^2}{128\sum_{n=1}^{N}|d_n(F)|^2}\Bigg).
\end{aligned} \tag{B.9}$$

The last inequality of (B.9) is derived from definition (30) and Lemma 2. The combination of (B.4), (B.8), and (B.9) leads to the result: given any $\xi > 0$, there holds, for any $K \ge 8(b-a)^2/\xi^2$,

$$\Pr\Bigg\{\sup_{f \in \mathcal{F}} \bigg|\frac{1}{N}\sum_{n=1}^{N}\int f(\mathbf{z}_{t_n}, t_n)\, dP_{t_n} - \frac{1}{KN}\sum_{k=1}^{K}\sum_{n=1}^{N} f(\mathbf{z}_{t_n}^{[k]}, t_n)\bigg| > \xi\Bigg\} \le 8\mathcal{N}_1\Big(\mathcal{F}, \frac{\xi}{8}, 2KN\Big)\exp\Bigg(-\frac{K\xi^2 N^2}{128\sum_{n=1}^{N}|d_n(F)|^2}\Bigg). \tag{B.10}$$

This completes the proof.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Projects 11401076, 61473328, 11171367, and 61473059.
References
[1] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] E. Parrado-Hernández, A. Ambroladze, J. Shawe-Taylor, and S. Sun, "PAC-Bayes bounds with data dependent priors," Journal of Machine Learning Research, vol. 13, no. 1, pp. 3507–3531, 2012.
[3] L. Oneto, A. Ghio, S. Ridella, and D. Anguita, "Local Rademacher complexity: sharper risk bounds with and without unlabeled samples," Neural Networks, vol. 65, pp. 115–125, 2015.
[4] X. Liu, S. Lin, J. Fang, and Z. Xu, "Is extreme learning machine feasible? A theoretical assessment (Part I)," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 1, pp. 7–20, 2015.
[5] S. Lin, X. Liu, J. Fang, and Z. Xu, "Is extreme learning machine feasible? A theoretical assessment (Part II)," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 1, pp. 21–34, 2015.
[6] C. Zhang and D. Tao, "Risk bounds of learning processes for Lévy processes," Journal of Machine Learning Research, vol. 14, no. 1, pp. 351–376, 2013.
[7] C. Zhang and D. Tao, "Generalization bounds of ERM-based learning processes for continuous-time Markov chains," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 12, pp. 1872–1883, 2012.
[8] B. Zou, L. Li, and Z. Xu, "The generalization performance of ERM algorithm with strongly mixing observations," Machine Learning, vol. 75, no. 3, pp. 275–295, 2009.
[9] W. Jiang, "On uniform deviations of general empirical risks with unboundedness, dependence, and high dimensionality," Journal of Machine Learning Research, vol. 10, pp. 977–996, 2009.
[10] W. Hoeffding, "Probability inequalities for sums of bounded random variables," Journal of the American Statistical Association, vol. 58, no. 301, pp. 13–30, 1963.
[11] B. Zou, Y. Y. Tang, Z. Xu, L. Li, J. Xu, and Y. Lu, "The generalization performance of regularized regression algorithms based on Markov sampling," IEEE Transactions on Cybernetics, vol. 44, no. 9, pp. 1497–1507, 2014.
[12] J. Xu, Y. Y. Tang, B. Zou, Z. Xu, L. Li, and Y. Lu, "The generalization ability of online SVM classification based on Markov sampling," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 3, pp. 628–639, 2015.
[13] P. L. Bartlett, S. Mendelson, and P. Philips, "Local complexities for empirical risk minimization," in Learning Theory, pp. 270–284, Springer, 2004.
[14] A. Sutivong, M. Chiang, T. M. Cover, and Y.-H. Kim, "Channel capacity and state estimation for state-dependent Gaussian channels," IEEE Transactions on Information Theory, vol. 51, no. 4, pp. 1486–1495, 2005.
[15] J. Ramsay, Functional Data Analysis, Wiley, 2005.
[16] G. M. James, J. Wang, and J. Zhu, "Functional linear regression that's interpretable," The Annals of Statistics, vol. 37, no. 5A, pp. 2083–2108, 2009.
[17] B. K. Sriperumbudur, K. Fukumizu, A. Gretton, B. Schölkopf, and G. R. G. Lanckriet, "On the empirical estimation of integral probability metrics," Electronic Journal of Statistics, vol. 6, pp. 1550–1599, 2012.
[18] A. Müller, "Integral probability metrics and their generating classes of functions," Advances in Applied Probability, vol. 29, no. 2, pp. 429–443, 1997.
[19] L. A. Kontorovich and K. Ramanan, "Concentration inequalities for dependent random variables via the martingale method," The Annals of Probability, vol. 36, no. 6, pp. 2126–2158, 2008.
[20] K. Azuma, "Weighted sums of certain dependent random variables," Tohoku Mathematical Journal, Second Series, vol. 19, no. 3, pp. 357–367, 1967.
[21] O. Bousquet, S. Boucheron, and G. Lugosi, "Introduction to statistical learning theory," in Advanced Lectures on Machine Learning, pp. 169–207, Springer, 2004.
[22] S. Mendelson, "A few notes on statistical learning theory," in Advanced Lectures on Machine Learning, vol. 2600 of Lecture Notes in Computer Science, pp. 1–40, Springer, Berlin, Germany, 2003.