Conditioning in Reinforcement Learning Setup

Hi,
I am working on a reinforcement learning problem in which the model runs in a loop until the time runs out.
Something like this:

with pm.Model():
    timeLeft = 7  # say
    while timeLeft > 0:
        # sample from a distribution and compute an outcome based on the defined action
        # store the outcome in an array
        timeLeft = timeLeft - 1

After I come out of the model, I get an array of outcomes of length 7.
I am interested in conditioning this array of outcomes on the actual observed array and looking at the posterior distribution of the latent variables.

How do I write the code for the conditioning part and the inference part?

If there are any existing issues that solve my problem, please point me to them.

Thanks

Conditioning usually works as an input to a logp function. For example, if the observed data come from a Gaussian distribution, you can pass the array of outcomes as mu and the actual data as observed.

If I understand you correctly, in the while loop you simulate a data array and you would like to match it with the actual observations? It sounds to me like you don't have a likelihood function, in which case the inference can be done using approximate Bayesian computation (not available yet; we are implementing it as one of the GSOC projects this summer). A workaround for now is to assign a Gaussian likelihood with a small sigma.
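
For example, a minimal sketch of that workaround (the rate variable and the one-line deterministic "simulator" are hypothetical stand-ins for your simulation loop, and actual_data is a random stand-in for your real observations):

import numpy as np
import pymc3 as pm

actual_data = np.random.normal(loc=5.0, size=7)  # stand-in for your real length-7 observations

with pm.Model():
    # hypothetical latent variable whose posterior we are after
    rate = pm.Gamma('rate', alpha=2.0, beta=1.0)
    # hypothetical deterministic simulator output, one value per time step
    simulated = rate * np.arange(1, 8)
    # a Gaussian likelihood with a small sigma ties the simulated values to the data
    y = pm.Normal('y', mu=simulated, sigma=0.1, observed=actual_data)
    trace = pm.sample(1000)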

Thanks for your help.

Yes, you understood my problem completely.

I gave the Gaussian likelihood a shot, but it is throwing an error. I used this specific line:

y = pm.Normal('y', mu=simulated_results, sigma=0.2, observed=obs)

Here simulated_results is an array of length 7, and obs is of length 7 as well.

Code:

with pm.Model():
    timeLeft = 7
    ac = 0
    calc_reach = []
    maxi = 3400
    while timeLeft > 0:
        theta1 = pm.Beta('theta1', alpha=2, beta=3)
        theta2 = pm.Beta('theta2', alpha=4, beta=20)
        old = theta1 * ac
        new = theta2 * maxi
        res = old + new
        ac = new
        timeLeft = timeLeft - 1
        calc_reach.append(res)

    y = pm.Normal('y', mu=calc_reach, sigma=0.2, observed=obs)

Error: Variable name theta1_logodds__ already exists.

I am not sure what it means. Furthermore, if I define theta1 and theta2 above the while loop, like this:

with pm.Model():
    timeLeft = 7
    ac = 0
    calc_reach = []
    maxi = 3400
    theta1 = pm.Beta('theta1', alpha=2, beta=3)
    theta2 = pm.Beta('theta2', alpha=4, beta=20)
    while timeLeft > 0:
        old = theta1 * ac
        new = theta2 * maxi
        res = old + new
        ac = new
        timeLeft = timeLeft - 1
        calc_reach.append(res)

    y = pm.Normal('y', mu=calc_reach, tau=0.2, observed=obs)

    start = pm.find_MAP()  # find a good starting point
    step = pm.Slice()  # instantiate the MCMC sampling algorithm
    trace = pm.sample(10000, step, start=start, progressbar=False)
Error: Input 0 of the graph (indices start from 0), used to compute Elemwise{neg,no_inplace}(winning_logodds__), was not provided and not given a value. Use the Theano flag exception_verbosity='high', for more information on this error.

I think the core problem lies in the likelihood statement. Not sure though.

The first error means the model is trying to register a second random variable under the name theta1: every pass through the while loop calls pm.Beta('theta1', ...) again, and variable names must be unique within a model. You should rewrite the while loop either as a matrix operation or as a theano.scan. There are some examples of using theano.scan in the docs:
http://docs.pymc.io/notebooks/PyMC3_tips_and_heuristic.html
and in the example folder.

You can also get some inspiration from the timeseries distribution implementation, where theano.scan is used as well.
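
For concreteness, here is a minimal sketch of what a scan version of your model might look like. It assumes each step is meant to compute res = theta1 * ac + theta2 * maxi and carry new forward as the next ac (i.e. that reach in the snippets above was meant to be res); obs is a random stand-in for your real observations:

import numpy as np
import pymc3 as pm
import theano
import theano.tensor as tt

obs = np.random.normal(loc=800.0, size=7)  # stand-in for your real length-7 observations
maxi = 3400
n_steps = 7

with pm.Model():
    theta1 = pm.Beta('theta1', alpha=2, beta=3)
    theta2 = pm.Beta('theta2', alpha=4, beta=20)

    def step(ac, th1, th2):
        # one iteration of the original while loop
        new = th2 * maxi
        res = th1 * ac + new
        return new, res  # new becomes the next ac; res is collected

    (ac_seq, calc_reach), updates = theano.scan(
        step,
        outputs_info=[tt.constant(0.0), None],  # ac starts at 0; res is not fed back
        non_sequences=[theta1, theta2],
        n_steps=n_steps,
    )

    y = pm.Normal('y', mu=calc_reach, sigma=0.2, observed=obs)
    trace = pm.sample(1000)

Note that in this particular model new = theta2 * maxi is the same at every step, so calc_reach could also be built with plain vectorized tensor operations and no scan at all; scan is the general tool for loops where each step genuinely depends on the previous one.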