Using the PyTensor random number generator with scan in the Simulator class

Hello,

[Note: see below for updates, some of the problems here may be resolved]

I want to use the Simulator class to set up a Bayesian ABC for a large Markov chain model. By large, let's say the state space is on the order of a couple thousand raised to the power of 10 (and not sparsely connected), so the DiscreteMarkovChain model is a no-go. I don't even know whether this ABC approach is statistically feasible, but I wanted to try it on some simulated data. However, I ran into an issue, possibly due to my misunderstanding of how the PyTensor RNG should be used. The full code is below. When I run it I get a "setting an array element with a sequence" error with a not very helpful stack trace. So I added a line that evals the output of the Simulator (see the commented-out part), which still led to the same error, but the stack trace now shows that the problem occurs at

next_state = rng.multinomial(N, p)

After playing around with it more and trying different things, I think I was able to pin down where this actually happens:


File ~/miniconda3/envs/ACORG/lib/python3.10/site-packages/pytensor/tensor/random/op.py:401, in RandomVariable.perform(self, node, inputs, outputs)
    398     rng = copy(rng)
    400 outputs[0][0] = rng
--> 401 outputs[1][0] = np.asarray(
    402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

Update: After digging into this a little more, I suspect I am not using scan correctly in conjunction with Simulator. In the first step self.rng_fn has args = [[0], [1]], but at the point where the error is raised, args is

[array([ 2.25634373,  1.18186176, 88.82739814,  2.10452112,  1.81074076]), array([0.02283933]), array([    0,   100, 10000,   100,    10]), array([11], dtype=int8)]

So this is starting to feel more like not using scan correctly in the context of Simulator. Note also that _WFM_sim works fine on its own: it returns an ncycles x ndim array when called and evalled by itself.

Full code (it has no syntax errors and runs fine, though I could not get the forum to highlight it properly):


import pymc as pm
import numpy as np
import pytensor as pt


def _calculate_P_pyt(state, f, m, nvariants):
  '''
  calculate the transition from state to the next state
  '''

  denom = state*f

  M = pt.tensor.stack([m for _ in range(nvariants)])
  M = pt.tensor.tile(M, (1,nvariants))/(nvariants-1)
  mask = np.eye(nvariants, dtype=bool)
  M = M[mask].set(1-m)

  return pt.tensor.sum(M*denom, axis=1)/pt.tensor.sum(denom)


def _WFM_sim(rng, f, m, initial_state, nsims, size=None):
  '''
  given an initial state, generate a markov chain of length nsims
  '''

  nvariants = initial_state.size
  N = initial_state.sum()
  #rng = pt.shared(rng)

  def transition(*args):

    state,  f, m = args

    p = _calculate_P_pyt(initial_state, f, m, nvariants)

    next_state = rng.multinomial(N, p)

    # my other attempt, in this case uncomment pt.shared(rng) above
    #next_state = pm.Multinomial.dist(N, p, rng=rng)

    return next_state

  result, updates = pt.scan(transition,
                            outputs_info=initial_state,
                            non_sequences=[f, m],
                            n_steps=nsims)

  return pt.tensor.flatten(result)


def WFM_ABC(initial_state, data):

  nsims = data.shape[0]

  with pm.Model():
    log10_m = -1-pm.HalfNormal("log10_m", 3)
    log_f = pm.LogNormal("log_f", 0, 1, size=initial_state.size)

    f = pm.math.exp(log_f)
    m = 10**log10_m


    s = pm.Simulator("s", _WFM_sim, params=(f, m, initial_state, nsims),
                     epsilon=1, observed=data.flatten())

    # uncommenting the next line gives a more informative error stack
    # s.eval()

    idata = pm.sample_smc()
    idata.extend(pm.sample_posterior_predictive(idata))


# rows are the observed states in a Markov chain starting with i=1
obs_data = np.array([[    0,   123,  9990,    85,    12],
                     [    0,    99, 10012,    93,     6],
                     [    1,   127,  9958,   117,     7],
                     [    0,   109,  9969,   120,    12],
                     [    0,   125,  9973,    99,    13],
                     [    1,   116,  9982,   103,     8],
                     [    0,    88, 10022,    89,    11],
                     [    1,   107,  9995,   101,     6],
                     [    0,   123,  9982,    99,     6],
                     [    0,    99, 10006,    92,    13]])

initial_state = np.array([    0,   100, 10000,   100,    10])

WFM_ABC(initial_state, obs_data)

This is a major enough update that I decided to add it as a new message. About the error that occurs at

File ~/miniconda3/envs/ACORG/lib/python3.10/site-packages/pytensor/tensor/random/op.py:401, in RandomVariable.perform(self, node, inputs, outputs)
    398     rng = copy(rng)
    400 outputs[0][0] = rng
--> 401 outputs[1][0] = np.asarray(
    402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

I have identified that this happens for the Simulator RV. There are actually two problems here. One is shape related, so I won't dwell on that (it is also not the cause of the error above). The other is that I assumed, as in normal sampling, that everything is done tensorially and the returned result should be a tensor. That does not seem to be the case for the Simulator class: the dtype in the code above is float, whereas self.rng_fn returns a tensor, hence the "setting an array element with a sequence" error. I fixed the shape and dtype issues by changing _WFM_sim as follows (basically adding an eval at the end, although it seems I could have done everything in numpy too):

def _WFM_sim(rng, f, m, initial_state, nsims, size=None):

  nvariants = initial_state.size
  N = initial_state.sum()
  rng = pt.shared(rng)
  initial_state = initial_state[0]


  def transition(*args):

    state,  f, m = args

    f = f.flatten()
    m = m.flatten()[0]

    p = _calculate_P_pyt(initial_state, f, m, nvariants)
    next_state = pm.Multinomial.dist(N, p, rng=rng)

    return next_state

  result, updates = pt.scan(transition,
                            outputs_info=initial_state,
                            non_sequences=[f, m],
                            n_steps=nsims)

  return result.eval()

Now the code does not give any errors when evaluating the simulator RV, though it suspiciously returns an array of identical states, so I will need to think on that a bit:

array([[  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.],
       [  11.,   39., 9860.,  278.,   22.]])

However, the Simulator still does not run to completion; it gives the following error (I can post the full stack if that would be helpful; it happens inside run_chains):

ValueError: Input dimension mismatch: (input[%i].shape[%i] = %lld, input[%i].shape[%i] = %lld)
Apply node that caused the error: Composite{sqr((i0 - i1))}(s{[[0.0000e+ ... 4000e+01]]}, Simulator_s_rv{"(),(),(),()->()"}.out)
Toposort index: 8
Inputs types: [TensorType(float64, shape=(10, 5)), TensorType(float64, shape=(10, 5))]
Inputs shapes: [(10, 5), (9, 5)]
Inputs strides: [(40, 8), (40, 8)]
Inputs values: ['not shown', 'not shown']
Inputs type_num: [12, 12]
Outputs clients: [[Sum{axes=None}(Composite{sqr((i0 - i1))}.0)]]

This one is a bit harder to diagnose because the stack trace just points inside the run_chains function. But the error suggests it happens when computing the distance between the simulated and observed data, given the Composite{sqr((i0 - i1))} node. I will see if I can fix it. Changing sum_stat to "mean" seems to make the chains run forever without any progress; I suppose that is because of the slow eval. I will rewrite it in numpy or compile it tomorrow to see if that improves anything.

Makes me wonder if I should just drop the Simulator and write everything with potentials instead.

OK, so after numpy-fying the relevant parts of the code I was able to get it to run, more or less (sorry for using the forum as a journal, I am just posting updates so that people don't spend time on something that may already be solved!). The estimates are not great, so I am trying to figure out why that is (any insight is welcome; perhaps what I am doing is not feasible?), but in case anyone is interested, the full code is below.

One question I have is:

In this case my observables are multinomial counts totalling roughly 10000. If I use the raw counts in the Simulator for the distance comparison, the sampler gets stuck (the posteriors are fixed numbers). So I set epsilon to the sum of initial_state, which effectively makes it a comparison of proportions. However, it then finishes quite quickly and the estimates aren't great, which makes me wonder whether this is something that happens with sample_smc. I also tried epsilon=1000 instead of the sum of initial_state, and played with the thresholds; sometimes it takes more time but it does not improve the estimates much. Should I instead try something with potentials?
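One thing I may try is to make the comparison explicitly on proportions rather than tuning epsilon against raw counts, i.e. have the simulator return row-normalized proportions and pass the observed proportions too. Just a sketch on top of the numpy _WFM_sim below (_WFM_sim_props is only an illustrative wrapper):

def _WFM_sim_props(rng, f, m, initial_state, nsims, size=None):
  # same simulation as _WFM_sim below, but return row-normalized proportions
  counts = _WFM_sim(rng, f, m, initial_state, nsims, size=size)
  return counts / counts.sum(axis=1, keepdims=True)

# and in the model, with the observed data normalized the same way:
# pm.Simulator("obs", _WFM_sim_props, params=(f, m, initial_state, nsims),
#              epsilon=0.01, observed=obs / obs.sum(axis=1, keepdims=True))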

As an example, with parameters:

f =  np.array([0.67786154, 1.89172801, 0.84634297, 0.63536372, 1.19762663])
m = 1e-4

I get (using epsilon=1000 and defaults for everything else; also, don't mind the log10_m, since I actually use -2 - log10_m inside the model, I could have named that better!)

             mean     sd  hdi_3%  hdi_97%  ...
f[0]        0.970  0.087   0.804    1.127
f[1]        1.546  0.094   1.376    1.727
f[2]        0.724  0.046   0.638    0.808
f[3]        0.968  0.085   0.817    1.137
f[4]        0.966  0.082   0.817    1.123
log10_m[0]  0.936  0.592   0.017    1.976

It seems to get it roughly right but not quite. Note that if you check the model, there is an invariance with respect to f: f and a*f for any a > 0 give the same results (because of the division inside _calculate_P). I don't think that is the issue here, though, because the sd values are low, and for instance the fitted f[0] is larger than the real value while f[1] is lower, f[2] is lower, f[3] is larger, f[4] is lower, etc. Fixing the value of m, or using more observables, does not change the result.
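For the record, the invariance is easy to check numerically with the _calculate_P helper from the code below (illustrative values only):

state = np.array([0, 100, 10000, 100, 10])
f_test = np.array([0.7, 1.9, 0.85, 0.64, 1.2])
m_test = np.full((1, state.size), 1e-4)

p1 = _calculate_P(state, f_test, m_test, state.size)
p2 = _calculate_P(state, 3.0*f_test, m_test, state.size)  # rescale f by a positive constant
print(np.allclose(p1, p2))  # True: the scaling cancels in the division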

import pymc as pm
import numpy as np
import pytensor as pt
import arviz as az

def _WFM_sim(rng, f, m, initial_state, nsims, size=None):

  nvariants = initial_state.size
  N = initial_state.sum()

  # for some reason the Simulator class expands dimensions and turns constants
  # into arrays, so we need to deal with that
  initial_state = initial_state[0]
  nsims = nsims.flatten()[0]
  state_history = [initial_state]
  f = f.flatten()

  for i in range(nsims):
    P = _calculate_P(state_history[-1], f, m, nvariants)
    state_history.append(rng.multinomial(N, P))

  out = np.array(state_history)[1:,:]

  return out


def _calculate_P(state, f, m, nvariants):
  '''
  m is assumed to be of size 1 x nvariants
  f and state of size nvariants
  '''

  denom = state*f

  M = m.copy()
  M = np.tile(M, (nvariants,1))/(nvariants-1)
  mask = np.eye(nvariants, dtype=bool)
  M[mask] = 1 - m[0,:]

  return np.sum(M*denom,axis=1)/np.sum(denom)

def WFM_ABC(data):

  initial_state = data[0,:]
  obs = data[1:,:]
  nsims = obs.shape[0]

  with pm.Model():
    log10_m = -2-pm.HalfNormal("log10_m", 1, size=1)
    f = pm.LogNormal("f", 0, 0.1, size=initial_state.size)

    m = pt.tensor.tile(10**log10_m[:,None], (1,initial_state.size))

    pm.Simulator("obs", _WFM_sim, params=(f, m, initial_state, nsims),
                 epsilon=1000, observed=obs)

    idata = pm.sample_smc()

  return idata



seed = 0
f =  np.array([0.67786154, 1.89172801, 0.84634297, 0.63536372, 1.19762663])
m = 1e-4
rng = np.random.default_rng(0)
initial_state = np.array([[100, 100, 10000, 100, 100]])
nobs = 20

obs_data=\
_WFM_sim(rng, f, np.array([[m for _ in range(f.size)]]),
         initial_state, np.array([nobs]))

idata = WFM_ABC(obs_data)

print(az.summary(idata))

IIRC the Simulator is not supposed to have PyTensor code inside, but numpy? I could be misremembering.
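i.e. the function gets called with a numpy Generator and plain numpy arrays, and should return a numpy array with the same shape as the observed data. A minimal sketch along the lines of the SMC-ABC example in the docs (toy model, not yours):

import numpy as np
import pymc as pm

def normal_sim(rng, mu, sigma, size=None):
  # rng is a np.random.Generator; mu and sigma arrive as numpy arrays
  return rng.normal(mu, sigma, size=size)

data = np.random.default_rng(0).normal(0.5, 1.0, size=1000)

with pm.Model():
  mu = pm.Normal("mu", 0, 1)
  sigma = pm.HalfNormal("sigma", 1)
  pm.Simulator("obs", normal_sim, params=(mu, sigma), epsilon=1, observed=data)
  idata = pm.sample_smc()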

Thanks for the reply. Yes, in the third message above I did realize that and converted it to numpy, so now the sampler runs but the estimates are not great. In summary, what it does differently from the DiscreteMarkovChain is:

Because the state space is very large, using transfer matrices and pushing the initial distribution of states forward is not feasible. So I generate a single trajectory from a single starting state (my initial state is fixed, i.e. the initial distribution of states is a Dirac delta) and use that trajectory in the Simulator with an error term.

The estimates I get are not great, and I am wondering if that is because I am generating a single trajectory. If I generate multiple trajectories and take the average error over them, maybe it will work better; see the sketch below. Or maybe, because the state space is large, this is simply not a feasible approach, I don't know. In any case, I am trying to understand the best way to model Markov chains with large state spaces without resorting to something like a diffusion approximation and solving SDEs instead.
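If I do go the multiple-trajectory route, the idea would be something like this (sketch only; ntraj is just an illustrative knob, not something Simulator requires):

def _WFM_sim_avg(rng, f, m, initial_state, nsims, size=None, ntraj=10):
  # average several independent trajectories instead of comparing a single one,
  # to reduce the noise in the distance that Simulator computes
  trajs = [_WFM_sim(rng, f, m, initial_state, nsims) for _ in range(ntraj)]
  return np.mean(trajs, axis=0)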

The Simulator approach doesn't have so many guarantees. Perhaps @aloctavodia has some pointers as to whether it should theoretically do well in your case, or poorly as it seems to be doing.

I also tried doing it with a Potential and calculated everything in PyTensor. It gives roughly the same results (see the code below). I suppose it is roughly similar to SMC, at least in terms of its cost function. And because of the Multinomial.dist involvement, I can't use the NUTS sampler (it complains that the gradient for that operator is not defined).

I think what I am trying to do is unreasonable because I am generating a single trajectory. Whereas the transfer matrix approach in DiscreteMarkovChain propagates the whole probability distribution at each step, I am basically generating a single sample from those distributions at each step and comparing it to my observations. Perhaps I should at least generate multiple trajectories, but if the state space is large I wonder whether trying to mimic the effect of the transfer matrix with just a few trajectories is reasonable.

import pymc as pm
import numpy as np
import pytensor as pt
import arviz as az


def _calculate_P_pyt(state, f, m, nvariants):

  denom = state*f

  M = pt.tensor.stack([m for _ in range(nvariants)])
  M = pt.tensor.tile(M, (nvariants,1))/(nvariants-1)
  mask = np.eye(nvariants, dtype=bool)
  M = M[mask].set(1-m)

  return pt.tensor.sum(M*denom, axis=1)/pt.tensor.sum(denom)

def _WFM_sim_pyt(rng, f, m, initial_state, nsims, size=None):

  nvariants = initial_state.size
  N = initial_state.sum()
  rng = pt.shared(rng)

  def transition(*args):

    state,  f, m = args

    p = _calculate_P_pyt(state, f, m, nvariants)
    next_rng, next_state = pm.Multinomial.dist(N, p, rng=rng).owner.outputs

    return next_state, {rng: next_rng}

  result, updates = pt.scan(transition,
                            outputs_info=initial_state,
                            non_sequences=[f, m],
                            n_steps=nsims)
  return result


def WFM_ABC(initial_state, obs, seed=None):


  obs_props = obs/obs.sum(axis=1)[:,None]

  nsims = obs.shape[0]

  with pm.Model():

    _log10_m = pm.HalfNormal("_log10_m", 2)
    log10_m = -2-_log10_m

    f = pm.LogNormal("f", 0, 1, size=initial_state.size)

    counts = _WFM_sim_pyt(rng, f, 10**log10_m, initial_state, nsims)

    fit_props = counts/counts.sum(axis=1)[:,None]

    pm.Potential("pot", -0.5*pm.math.sum((fit_props-obs_props)**2))

    idata = pm.sample()

  return idata

seed = 0
f =  np.array([0.67786154, 1.89172801, 0.84634297, 0.63536372, 1.19762663])
m = 1e-4
rng = np.random.default_rng(seed)
initial_state = np.array([100, 100, 10000, 100, 100])
nobs = 50

obs_data=\
_WFM_sim_pyt(rng, f, m,
             initial_state, nobs).eval()

idata = WFM_ABC(initial_state, obs_data)

print(az.summary(idata))

Is it infeasible to marginalize the DiscreteMarkovChain? That's what we would recommend (you never actually want to sample it). pm.marginalize works with chains with lags=1.

Well, if my transfer matrix is unfeasibly large then I can't construct it, and so I can't construct the DiscreteMarkovChain, unless I misunderstood your suggestion. Imagine for instance a case where the state space is a 10-dimensional positive-integer lattice with entries up to 10000. The state space has 10000^10 = 10^40 elements, and the transfer matrix has the square of that, about 10^80 entries…

The transfer matrix is used to construct the probability distribution over states at each step, but since I cannot do that, the only tool I have is a routine that generates individual trajectories from the Markov chain and uses those trajectories somehow…

I have to check the code you use to simulate draws from that process, but before doing that: is your system mostly sparse, so that you could model it with a sparse transition matrix?

No, in fact it is quite the opposite of sparse. Jumps from one state to another are modelled with multinomials, so anything can go anywhere. Of course, you could try to truncate it below some threshold. But with such a large state space, would sparse matrices even help?

It is actually the Wright-Fisher model with fitness and mutation (a very commonly used model, but generally with 2 variants). The state space I used in the code is as follows: there are six variants, each variant can have counts up to some fixed N (say on the order of thousands), and the sum over variants is also equal to N. So the state space is roughly N^6.

Transition is calculated here:

def transition(*args):

    state,  f, m = args

    p = _calculate_P_pyt(state, f, m, nvariants)
    next_rng, next_state = pm.Multinomial.dist(N, p, rng=rng).owner.outputs

    return next_state, {rng: next_rng}

_calculate_P_pyt just calculates the probability vector; the transition to any other state is then a multinomial draw.
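Written out (this is just what _calculate_P_pyt above does), for k variants with state s:

p_i = ((1 - m) * s_i * f_i + (m / (k - 1)) * sum_{j != i} s_j * f_j) / sum_j (s_j * f_j)

and the next state is then drawn as Multinomial(N, p).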

I'm probably missing something, but I would think that if you can derive p at each time step you can also marginalize it. Not with pm.marginalize, because that assumes a constant transition probability, but the algorithm itself doesn't assume that?


It is more likely that I am missing something; I am not experienced with marginalizing parameters out of models. Are you suggesting integrating the variable p out of the model, or something else? In any case, I will have a look at this page and have a think about it, thanks.

You would marginalize state out of the model.

Here is an example with the DiscreteMarkovChain (which you never want to sample; it exists solely so you can marginalize it afterwards): pymc-extras/tests/model/marginal/test_distributions.py at 7d62c53139419df8f9d7ef89b69ec07befbbad4d · pymc-devs/pymc-extras · GitHub

In that case the chain is marginalized with the forward algorithm, which has complexity on the order of P.shape[-1] * steps (or something like that), not P.shape[-1] ** steps as a naive enumeration would.
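Roughly, the idea is to push the state distribution forward one step at a time, so the cost is per-step polynomial in the number of states rather than exponential in the number of steps. A toy sketch with time-varying transition matrices (purely illustrative, tiny state space):

import numpy as np

K, steps = 4, 20
rng = np.random.default_rng(0)
# one transition matrix per step (rows sum to 1), just random for illustration
Ps = rng.dirichlet(np.ones(K), size=(steps, K))

alpha = np.zeros(K)
alpha[0] = 1.0             # initial distribution: a Dirac at state 0
for t in range(steps):
  alpha = alpha @ Ps[t]    # one K x K matrix-vector product per step
# alpha is now the marginal distribution over states after `steps` steps,
# obtained in O(steps * K**2) instead of enumerating K**steps paths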


Thanks, I need to sit down and think about it for a while; I will come back once I have a better understanding.

The code for the logp is here; it's highly unreadable, I would say, but it touches on some things you may want to be aware of: compute stuff on the log scale, try to use vectorization… pymc-extras/pymc_extras/model/marginal/distributions.py at 7d62c53139419df8f9d7ef89b69ec07befbbad4d · pymc-devs/pymc-extras · GitHub