Reconciliation via Poisson Sampling

Hello, everyone.

I am trying to run some models that perform a kind of reconciliation over multiple levels. The idea is to have a base level (a.k.a. lvl3) which is Poisson distributed with a given mean (lvl3). This base level is then aggregated through the S matrix to generate the observed data of the next level up. The idea came from this paper: https://arxiv.org/pdf/2207.09322.pdf. The implementation follows:

import numpy as np
import pymc as pm

lvl3 = np.array([0,1,1,0,1,1,2,1])
lvl2 = np.array([1,1,2,3])
lvl1 = np.array([4,6])
lvl0 = np.array([10])

S_2 = np.array([[1,1,0,0,0,0,0,0],[0,0,1,1,0,0,0,0],[0,0,0,0,1,1,0,0],[0,0,0,0,0,0,1,1]])
S_1 = np.array([[1,1,0,0],[0,0,1,1]])
S_0 = np.array([1,1])

with pm.Model() as model:
    l3 = pm.Poisson('l3', mu=lvl3, shape=lvl3.shape)
    l2 = pm.Poisson('l2', mu=lvl2, shape=lvl2.shape, observed=S_2@l3)
    l1 = pm.Poisson('l1', mu=lvl1, shape=lvl1.shape, observed=S_1@l2)
    Y = pm.Poisson('Y', mu=lvl0[0], observed=S_0@l1)
    trace = pm.sample(draws=5000, tune=3000)
>>> trace.posterior.l3.mean(['chain', 'draw']).values
# array([0.     , 1.0997 , 1.078  , 0.     , 0.9671 , 1.0333 , 2.16785, 1.0635 ])

Even though I can sample (and the results are fairly reasonable), it takes ages when we scale the base level to, say, one thousand items. So, my first question is: is there a way to sample discrete models faster? I noticed that external samplers do not support discrete models.

A second approach that I tried was to use the following model:

lvl3 = np.array([0,1,1,0,1,1,2,1])
lvl2 = np.array([1,1,2,3])
lvl1 = np.array([4,6])
lvl0 = np.array([10])

S_2 = np.array([[1,1,0,0,0,0,0,0],[0,0,1,1,0,0,0,0],[0,0,0,0,1,1,0,0],[0,0,0,0,0,0,1,1]])
S_1 = np.array([[1,1,0,0],[0,0,1,1]])
S_0 = np.array([1,1])
                   
with pm.Model() as model:
    l3 = pm.Poisson("l3", mu=lvl3, shape=lvl3.shape)
    l2 = pm.Potential("l2", pm.logp(pm.Poisson.dist(mu=lvl2, shape=lvl2.shape), S_2@l3))
    l1 = pm.Potential("l1", pm.logp(pm.Poisson.dist(mu=lvl1, shape=lvl1.shape), S_1@l2))
    Y = pm.Potential("Y", pm.logp(pm.Poisson.dist(mu=lvl0[0]), S_0@l1))    
    
    posterior = pm.sample(draws=5000, tune=3000)

I tried using an exponential distribution instead of a Poisson distribution on l3 (mainly to check the speed); sampling became almost immediate with the numpyro sampler running in parallel, but the results were odd. To check whether there was some caveat with this implementation using pm.Potential, I tried running it with only the Poisson, and it failed. The error follows:

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'l3': array([0, 1, 1, 0, 1, 1, 2, 1])}

Logp initial evaluation results:
{'l3': -6.31, 'l2': -4.8, 'l1': -inf, 'Y': -inf}
You can call `model.debug()` for more details.

Then, my second question: is there an alternative approach that would speed up sampling while still giving reasonable results?

Your first model shouldn’t manage to run, because you have RVs in the observations, which PyMC doesn’t allow. What version of PyMC are you using?

If it’s a recent one, I suspect data @ rv is returning some weird constant numpy array (without ever invoking PyTensor), and the results you got were only accidentally good. You can print the result of that operation to confirm.

Calling pt.matmul(data, rv) should correctly cast data to a tensor, and then PyMC will error out when you try to use it in observed.
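For example, a quick check along those lines (just a sketch, reusing S_2 and l3 from your first model, with pt being pytensor.tensor):

print(S_2 @ l3)            # inspect what numpy's @ actually returns here
print(pt.matmul(S_2, l3))  # for comparison: a proper symbolic PyTensor variable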

Hello, Ricardo.

>>> pm.__version__
# '5.10.2'

I changed it as you proposed, but it is still an up-and-running example.

>>> import pytensor.tensor as pt
>>> pt.matmul(S_2, l3)
# DropDims{axis=1}.0

And for the complete application:

lvl3 = np.array([0,1,1,0,1,1,2,1])
lvl2 = np.array([1,1,2,3])
lvl1 = np.array([4,6])
lvl0 = np.array([10])

S_2 = np.array([[1,1,0,0,0,0,0,0],[0,0,1,1,0,0,0,0],[0,0,0,0,1,1,0,0],[0,0,0,0,0,0,1,1]])
S_1 = np.array([[1,1,0,0],[0,0,1,1]])
S_0 = np.array([1,1])

with pm.Model() as model:
    l3 = pm.Poisson('l3', mu=lvl3, shape=lvl3.shape)
    l2 = pm.Poisson('l2', mu=lvl2, shape=lvl2.shape, observed=pt.matmul(S_2, l3))
    l1 = pm.Poisson('l1', mu=lvl1, shape=lvl1.shape, observed=pt.matmul(S_1, l2))
    Y = pm.Poisson('Y', mu=lvl0[0], observed=pt.matmul(S_0, l1))
    trace = pm.sample(draws=5000, tune=3000)

trace.posterior.l3.mean(['chain', 'draw']).values
# array([0.     , 1.06295, 1.10495, 0.     , 1.0249 , 1.0223 , 2.1539 , 1.0237 ])

Seems like our check is dumb: in this case there is a cast from float to integer (since Poisson is discrete, but matmul returns a float), and PyMC thinks that’s fine. If you replace your Poisson with a Normal you will see the error raised.
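For instance, something like this sketch (only the distribution is swapped) is expected to fail:

with pm.Model():
    l3 = pm.Poisson("l3", mu=lvl3, shape=lvl3.shape)
    # With a continuous likelihood there is no int cast hiding the dependency,
    # so PyMC should complain that the observed data depends on a random variable
    l2 = pm.Normal("l2", mu=lvl2, observed=pt.matmul(S_2, l3))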

In your case the logp seems to come out fine though.

I would still suggest you go with model 2 that uses Potential, since you are directly interfering with the logp component of the model. We may batch the observed in the future.

How should I fix the problem with the second implementation? As it stands, I can’t get it to run… :confused:

Speed-wise nothing obvious comes to mind, but the l3 entries with mu=0 are not well behaved, since the only value they can generate is 0. That means you can get rid of those, right?

Your problem with the Potential model is that you are using l2 and l1 coming from the Potential, but those are just the logps, not the values like you had before. You want to use l2 = S_2 @ l3 in what you called the l1 Potential:

with pm.Model() as model:
    l3 = pm.Poisson("l3", mu=lvl3, shape=lvl3.shape)
    # Deterministics hold the aggregated *values* at each level
    l2 = pm.Deterministic("l2", S_2 @ l3)
    l1 = pm.Deterministic("l1", S_1 @ l2)
    l0 = pm.Deterministic("l0", S_0 @ l1)
    # Potentials add the Poisson log-likelihood of each aggregated level
    l2_pot = pm.Potential("l2_pot", pm.logp(pm.Poisson.dist(mu=lvl2, shape=lvl2.shape), l2))
    l1_pot = pm.Potential("l1_pot", pm.logp(pm.Poisson.dist(mu=lvl1, shape=lvl1.shape), l1))
    l0_pot = pm.Potential("l0_pot", pm.logp(pm.Poisson.dist(mu=lvl0[0]), l0))
    
model.point_logps()  # no -inf should show up here
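After that, sampling as in your first attempt should work, for example:

with model:
    trace = pm.sample(draws=5000, tune=3000)

trace.posterior.l3.mean(['chain', 'draw'])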

Works like a charm!! Thanks a lot, Ricardo.

One last question :yum:. How can I speed up this sampling? I know that discrete models cannot be sampled using numpyro, for example, but it takes ages with hundreds or thousands of data points. Would I need to change to some continuous model to benefit from faster sampling methods?

I would suggest getting rid of the zeros. If you have many of them in your larger dataset, that reduces the number of parameters considerably. You know their posterior is exactly zero anyway.
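Something along these lines should work (just a sketch, reusing the arrays from above; nonzero, lvl3_nz, S_2_nz and reduced_model are names I made up for illustration):

nonzero = lvl3 > 0        # items with mu=0 can only ever be 0, so drop them
lvl3_nz = lvl3[nonzero]
S_2_nz = S_2[:, nonzero]  # drop the matching columns of the aggregation matrix

with pm.Model() as reduced_model:
    l3 = pm.Poisson("l3", mu=lvl3_nz, shape=lvl3_nz.shape)
    l2 = pm.Deterministic("l2", S_2_nz @ l3)
    l1 = pm.Deterministic("l1", S_1 @ l2)
    l0 = pm.Deterministic("l0", S_0 @ l1)
    l2_pot = pm.Potential("l2_pot", pm.logp(pm.Poisson.dist(mu=lvl2, shape=lvl2.shape), l2))
    l1_pot = pm.Potential("l1_pot", pm.logp(pm.Poisson.dist(mu=lvl1, shape=lvl1.shape), l1))
    l0_pot = pm.Potential("l0_pot", pm.logp(pm.Poisson.dist(mu=lvl0[0]), l0))

The dropped items contribute exactly zero to every aggregate, so the sums higher up the hierarchy are unchanged.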

Otherwise, you may need to implement a better sampler for the Poisson variable. The Metropolis sampler doesn’t even know a Poisson can’t take negative values, so it will be proposing useless values many times. This is, however, something I would only suggest as a last resort.

Otherwise, discrete variables are just trickier to sample; I’m afraid you just have to throw compute at it. You can try to use the JAX backend for the PyMC sampler in case you have a GPU. That could be faster.

You can do that by wrapping your call to pm.sample like this:

import pytensor

with model, pytensor.config.change_flags(mode="JAX"):
    pm.sample()

However, I am not sure whether that will work with multiple chains in parallel, so you may need to revert to a single chain.
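For example, a single-chain run would look roughly like this:

with model, pytensor.config.change_flags(mode="JAX"):
    trace = pm.sample(draws=5000, tune=3000, chains=1, cores=1)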