I should have linked you this blog post that goes deeper into how this dummy stuff works. I also tried to explain it here, maybe that will help?

The dummy model needs to contain everything required to compute the nodes you specify in `var_names`. In my example, `var_names` was only `beta_dist`, which needs to know `beta`, which (looking at the DAG) needs to know `beta_bar`, `sigma_bar`, and `z_beta`. If you instead asked for a variable deeper into the DAG, like `count`, you'd have to specify everything upstream of that, which would include `sigma` and `mu`.
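Concretely, a dummy model for my example could look roughly like this. The non-centered parameterization, the priors, and the 20-group coords are placeholder assumptions; match whatever your primary model actually does, keeping the variable names identical:

```python
# A minimal sketch of the dummy model, assuming a non-centered parameterization
# and 20 groups. The actual priors and shapes must match your primary model;
# the names are what sample_posterior_predictive uses to find the draws.
import pymc as pm

coords = {"group": range(20), "group2": range(20)}

with pm.Model(coords=coords) as dummy_model:
    # Everything upstream of beta_dist has to be defined here, under the same
    # names as in the primary model.
    beta_bar = pm.Normal("beta_bar")
    sigma_bar = pm.HalfNormal("sigma_bar")
    z_beta = pm.Normal("z_beta", dims="group")
    beta = pm.Deterministic("beta", beta_bar + sigma_bar * z_beta, dims="group")

    # The statistic of interest: pairwise differences between the betas.
    pm.Deterministic(
        "beta_dist", beta[:, None] - beta[None, :], dims=("group", "group2")
    )
```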
While the variables are just pytensor symbols on a computation graph – that is, before you compile and run the model – you don't have to reason about the batch dimensions (chains and draws). You only have to think about the core dimensions. On the graph, `beta` is just a `(20,)` array, so `beta[:, None]` is a `(20, 1)` column vector. Pytensor's whole raison d'être is to then handle the vectorization. Just focus on the shapes as you would understand them if you were writing out your model on a piece of paper. Or better yet, use `dims` absolutely everywhere, and reason about the named dimensions instead of the shapes.
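For instance, a standalone pytensor sketch of that shape reasoning (not tied to any particular model):

```python
# On the graph you only ever see the core dims, never the (chain, draw) batch dims.
import pytensor.tensor as pt

beta = pt.vector("beta", shape=(20,))      # what beta looks like on the graph
pairwise = beta[:, None] - beta[None, :]   # broadcasts to a (20, 20) matrix
print(pairwise.type.shape)                 # (20, 20)
```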
Actually, there's nothing special about `sample_posterior_predictive`. If all you wanted were pairwise differences between the betas, you could just do it with the `(4, 1000, 20)` posterior, with something like `idata.posterior.beta.values[..., :, None] - idata.posterior.beta.values[..., None, :]`. Here's an example where I first tried to do this, then realized it was dumb and overly complicated and did it elegantly with `sample_posterior_predictive` in a dummy model.
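A sketch of that direct approach, assuming `beta` has dims `(chain, draw, group)` with shape `(4, 1000, 20)` and `idata` is the trace from your primary model:

```python
# Pairwise differences computed directly from the posterior draws.
beta = idata.posterior["beta"].values            # (4, 1000, 20)
diffs = beta[..., :, None] - beta[..., None, :]  # (4, 1000, 20, 20)
```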
Exactly the same way you put it into your model. It should be a 1-to-1 map. Copy+paste everything you need up until you compute `beta` in your primary model, then use that `beta` to compute some statistic of interest (`beta_dist` in my example), then call `pm.sample_posterior_predictive(idata, var_names=[statistic_of_interest])`. If you get stuck I'm happy to look at specific code.
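With the dummy model sketched earlier, that last step would look something like this (again assuming `idata` holds the posterior from your primary model):

```python
# Recompute the statistic from the existing posterior draws; nothing is resampled.
with dummy_model:
    pp = pm.sample_posterior_predictive(idata, var_names=["beta_dist"])

# The requested variable lands in the posterior_predictive group,
# with shape (chain, draw, group, group2).
beta_dist = pp.posterior_predictive["beta_dist"]
```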
As for size, you're just doing pairwise subtraction, so it should scale pretty well. If it gets really bad, you could try to compute only the unique pairs with some clever indexing, since the distance matrix is symmetric.
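If you do go that route, here is a minimal sketch with `np.triu_indices`, assuming the same `(4, 1000, 20)` posterior as above:

```python
# Keep only the unique pairs instead of the full pairwise matrix.
import numpy as np

i, j = np.triu_indices(20, k=1)             # the 190 unique (i < j) pairs
beta = idata.posterior["beta"].values       # (4, 1000, 20)
unique_diffs = beta[..., i] - beta[..., j]  # (4, 1000, 190)
```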