Supporting Text S2 for Reinforcement Learning on Slow Features of High-Dimensional Input Streams

Robert Legenstein (1), Niko Wilbert (2), and Laurenz Wiskott (2,3)

(1) Institute for Theoretical Computer Science, Graz University of Technology, Austria
(2) Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany
(3) Institut für Neuroinformatik, Ruhr-Universität Bochum, Germany

S2: Derivation of the Policy-Gradient Update Rule

In the following we provide a derivation which shows that the policy-gradient update rules (9) and (10) given in the main text perform gradient ascent on the reward signal R(t). Consider several neurons that influence the reward signal through their output; in our setup, the reward R(t) depends directly on the output of the neurons at time t. The weights should change in the direction of the gradient of the reward signal, which is given by the chain rule as

\[
\frac{\partial R(t)}{\partial u_{ik}}
\;=\; \frac{\partial R(t)}{\partial a_i(t)}\, \frac{\partial a_i(t)}{\partial u_{ik}}
\;=\; \frac{\partial R(t)}{\partial a_i(t)}\, b_{ik}(t).
\tag{1}
\]
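As a brief illustration of the second equality: if the somatic activation had, for example, the linear-plus-noise form below (an assumption for illustration only, not a restatement of equation (6) of the main text), the factor ∂a_i(t)/∂u_{ik} would reduce to b_{ik}(t):

\[
a_i(t) \;=\; \sum_{k'} u_{ik'}\, b_{ik'}(t) \,+\, \xi_i(t)
\qquad\Longrightarrow\qquad
\frac{\partial a_i(t)}{\partial u_{ik}} \;=\; b_{ik}(t).
\]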

In equation (1), a_i(t) is the total somatic input to neuron i at time t; see equation (6) in the main text. We assume that the noise ξ is independently drawn at each time and for every neuron, with zero mean and variance μ². Hence we have ⟨ξ_i(t)⟩ = 0 and ⟨ξ_i(t) ξ_j(t′)⟩ = μ² δ_ij δ(t − t′), where δ_ij denotes the Kronecker delta, δ(t − t′) denotes the Dirac delta, and ⟨·⟩ denotes the average over the random variable ξ, i.e., an average over trials with the same input but different noise. We further assume that the reward depends deterministically on the activations of a set of neurons. The deviation of the reward R(t) from the reward R_0(t) obtained without the noise term can be approximated as linear in the noise for small noise,

\[
R(t) - R_0(t) \;\approx\; \sum_k \frac{\partial R(t)}{\partial a_k(t)}\, \xi_k(t).
\tag{2}
\]

Multiplying this equation with ξ_i(t) and averaging over different realizations of the noise, we obtain the correlation between the reward at time t and the noise signal at neuron i,

\[
\big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle
\;\approx\; \sum_k \frac{\partial R(t)}{\partial a_k(t)}\, \big\langle \xi_k(t)\, \xi_i(t) \big\rangle
\;=\; \mu^2\, \frac{\partial R(t)}{\partial a_i(t)}.
\tag{3}
\]

The last equality follows from the assumption that the noise signal is temporally and spatially uncorrelated. Hence, the derivative of the reward signal with respect to the activation of neuron i is

\[
\frac{\partial R(t)}{\partial a_i(t)}
\;\approx\; \frac{1}{\mu^2}\, \big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle.
\tag{4}
\]
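A minimal numerical sketch (not from the paper) of the estimator in equation (4): for a toy quadratic reward and small Gaussian noise, correlating the reward fluctuation with the injected noise and dividing by μ² recovers the analytic derivative ∂R/∂a_i. All names and parameter values below are illustrative assumptions.

```python
# Numerical check of equation (4) on a toy reward (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # number of neurons (toy choice)
mu = 0.01                                # noise standard deviation, small so eq. (2) holds
a0 = rng.standard_normal(n)              # noise-free activations at a fixed time t
target = rng.standard_normal(n)

def reward(a):
    # Toy reward R(a) = -||a - target||^2; stands in for the true R(t).
    return -np.sum((a - target) ** 2, axis=-1)

R0 = reward(a0)                          # reward without the noise term
grad_true = -2.0 * (a0 - target)         # analytic dR/da_i for the toy reward

trials = 200_000
xi = mu * rng.standard_normal((trials, n))          # zero-mean noise, variance mu^2
dR = reward(a0 + xi) - R0                           # R(t) - R_0(t) per trial
grad_est = (dR[:, None] * xi).mean(axis=0) / mu**2  # equation (4)

print(np.abs(grad_est - grad_true).max())           # small for small mu and many trials
```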

Note also that ⟨R_0(t) ξ_i(t)⟩ = 0, so the actual choice of the baseline R_0 does not matter in the mean. However, the baseline can affect the variance of the estimate. The baseline chosen in the learning rule estimates the average reward for zero noise, since the inputs are temporally correlated whereas the noise is not. Using this result in equation (1), we obtain

\[
\frac{\partial R(t)}{\partial u_{ik}}
\;\approx\; \frac{1}{\mu^2}\, \big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle\, b_{ik}(t).
\tag{5}
\]
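To spell out why equation (5) implies gradient ascent: suppose, for illustration, that the weight update is proportional to the instantaneous quantity inside the average (this is an assumption here; the precise rule is equation (9) of the main text). Its average over the noise then points along the gradient of the reward:

\[
\Delta u_{ik} \;=\; \eta\, \big(R(t) - R_0(t)\big)\, \xi_i(t)\, b_{ik}(t)
\qquad\Longrightarrow\qquad
\langle \Delta u_{ik} \rangle \;\approx\; \eta\, \mu^2\, \frac{\partial R(t)}{\partial u_{ik}},
\]

i.e., the expected weight change is a positively scaled gradient-ascent step on R(t).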

We further note that for a small learning rate, or if the input changes slowly compared to the noise signal, the weight vector is self-averaging and we can neglect the average in equation (5). By symmetry, we find that

\[
\frac{\partial R(t)}{\partial b_{ik}}
\;\approx\; \frac{1}{\mu^2}\, \big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle\, u_{ik}.
\]

We thus obtain

\[
\frac{\partial R}{\partial w_{ijk}}
\;=\; \frac{\partial R}{\partial b_{ik}}\, \frac{\partial b_{ik}}{\partial w_{ijk}}
\;\approx\; \frac{\partial b_{ik}}{\partial w_{ijk}}\, \frac{1}{\mu^2}\, \big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle\, u_{ik}
\;=\; \frac{1}{\mu^2}\, u_{ik}\, \big\langle \big(R(t) - R_0(t)\big)\, \xi_i(t) \big\rangle\, \dot f_{ik}(t)\, x_j(t).
\tag{6}
\]
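The following toy sketch puts equations (5) and (6) together as online updates. The architecture is assumed for illustration only: branch outputs b_ik = f(Σ_j w_ijk x_j) with a tanh nonlinearity (so the factor written as ḟ_ik(t) is taken to be the derivative of f at the branch input), a linear soma with additive exploration noise, a toy quadratic reward, and a running average as the baseline R_0(t). None of these choices are taken from the paper.

```python
# Toy online node-perturbation updates suggested by equations (5) and (6).
# All architectural details and constants here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_neurons, n_branches, n_inputs = 3, 4, 10
u = 0.1 * rng.standard_normal((n_neurons, n_branches))            # somatic weights u_ik
w = 0.1 * rng.standard_normal((n_neurons, n_branches, n_inputs))  # branch weights w_ijk
mu, eta = 0.05, 0.01                      # noise std and learning rate (toy values)
target = rng.standard_normal(n_neurons)   # toy reward: R = -||a - target||^2
R_bar = 0.0                               # running estimate of the baseline R_0(t)

for t in range(5000):
    x = rng.standard_normal(n_inputs)                 # input x_j(t)
    s = np.einsum('ikj,j->ik', w, x)                  # branch inputs
    b = np.tanh(s)                                    # branch outputs b_ik(t)
    xi = mu * rng.standard_normal(n_neurons)          # exploration noise xi_i(t)
    a = np.einsum('ik,ik->i', u, b) + xi              # somatic activations a_i(t)
    R = -np.sum((a - target) ** 2)                    # toy reward R(t)

    # Correlate the reward fluctuation with the noise, as in (5) and (6).
    err = (R - R_bar) / mu**2
    f_prime = 1.0 - b**2                              # derivative of tanh at the branch input
    u += eta * err * xi[:, None] * b                              # ~ dR/du_ik, eq. (5)
    w += eta * err * (xi[:, None] * u * f_prime)[:, :, None] * x  # ~ dR/dw_ijk, eq. (6)

    R_bar += 0.05 * (R - R_bar)                       # slow running-average baseline
```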