Looping in pymc3/theano

Hi All,

I am trying to implement a model that involves looping but am having a hard time implementing this in pymc3.

The basic (unrolled loop) model would look something like this:

with pm.Model() as model:
    p0 = pm.Beta("p0",1,1)
    p1 = pm.Binomial("n1",popsize,p0)/popsize
    p2 = pm.Binomial("n2",popsize,p1)/popsize
    r0 = pm.Binomial("r0",sample_size,p0,observed=[10,14,12])
    rT = pm.Binomial("rT",sample_size,p2,observed=[7,16,13])

Now, the question is how to loop the p1 , p2,… part so that I don’t have to explicitly write down all the binomial sampling steps (I obviously would like to have an arbitrary number of these)?

Thanks in advance for any help!


Doing something like this:

with pm.Model() as model:
    p0 = pm.Beta("p0",1,1,shape=1)
    def evolve(prev_p):
        return pm.Binomial.dist(popsize,prev_p).random()/popsize
    result,updates = theano.scan(evolve,
    p2 = result[-1]
    r0 = pm.Binomial("r0",sample_size,p0,observed=[10,14,12])
    rT = pm.Binomial("rT",sample_size,p2,observed=[7,16,13])

results in ValueError: Cannot resolve inputs for ['p0[t]'].
Variations of this also result in errors…

Hi Tiago,
Could you add some more context? I might be able to help if I had a tangible use case in mind.

Thanks @jbuddy_13.

The idea is to simulate a finite population evolving under genetic drift for several generations. You have a population of N individuals, of which initially n_0 have a certain gene (so, with frequency p_0 = n_0/N). Every new generation, a binomial sample of the previous generation is drawn, resulting in a new gene frequency p_{t} = n_{t}/N, where n_t \sim Bin(N,p_{t-1}).

the model that I presented in the original post is a simplification to illustrate my difficulty. The actual idea is to use information from sub-samples of the initial and final population to estimate parameters of a slightly more complicated model that I have not included here (namely, selection, which would change the way we calculate p_{t+1}.

Does this help?

I figured out that the problem is this line:

def evolve(prev_p):
   return pm.Binomial.dist(popsize,prev_p).random()/popsize

In particular, theano does not like the use of pm.Binomial.dist()… A deterministic function works fine.

So the question is, How do i get random numbers inside a function that will be used by theano.scan() ?

I wouldn’t use scan for this, and generating random numbers is also not the right way.

I didn’t look too closely into what you want but I think this idea of generating a Gaussian random-walk might help:

pm.Normal('x', 0, sigma, shape=100).cumsum()

So you can just generate a vector that would be each initialization, and then e.g. sum the up to get random walk.

thanks @twiecki. Although I am not sure this would solve it: What I would like is a series of binomial samples for which the p parameter of each draw depends on the previous draw. Because of this, I thought scan() would be right way of going about it.

How would you go about implementing this?

This is something that is not well supported in PyMC just yet. In the next major release it will be straightforward to use Scans to specify such kind of models.

Thanks @ricardoV94 . Looign forward to the new version. In any case, I ended up doing it without scan(), just creating a list of variables in a for loop. It’s probably much slower than the scan solution, as it ends up creating many variables that will be estimated, but it works.

I am leaving it here or reference, as it may help others:

ps =[ pm.Binomial("n0",popsize,p0_,shape=data.shape[0])/popsize ]

for n in range(1,24):
  nextp = ps[n-1]
  ps.append(pm.Binomial("n"+str(n),popsize,nextp,shape=data.shape[0])/popsize )

Importantly, you need to specify the shape, when you use Binomials like this…

The issue with that approach is that you can only do a very limited number of steps due to theano recursion limit. You can try to do that loop for 500 steps and you’ll see it (I think the limit is way less than 500).

But if that’s enough for you that’s great!

for now it is enough, but I would certainly like to have an arbitrary number of steps.

Can you think of any other way with the current version of pymc3? Maybe using random streams?