2N+2 parameter inference: sampling all parameters as one?

Hello, I am using PyMC v5.11.0. I am interested in performing an inference on the variables a, b, and an ensemble of {c} and {d} variables. That is, 2N + 2 variables in total. I have a uniform prior on all variables, except the {c} ones.

dimension_names = ['Dim ’ + str(i) for i in range(N)]

with pm.Model(coords={“vec1”: dimension_names,
“vec2”: dimension_names}) as model:

a = pm.Uniform('a', lower=0.1, upper=1)
b = pm.Uniform('b', lower=-2, upper=1)

c = pm.LogNormal('c', mu=-3.7, sigma=0.6, dims="vec1")
d = pm.Uniform('d', lower=-0.005, upper=0.005, dims="vec2")

mean1 = f(a, b, c, d)
mean2 = g(c, d)
mean3 = h(a, b, d)

obs1 = pm.MvNormal('obs1', mu=mean1, cov=cov1, observed=d1) 
obs2 = pm.MvNormal('obs2', mu=mean2, cov=cov2, observed=d2) 
obs3 = pm.MvNormal('obs3', mu=mean3, cov=cov3, observed=d3) 

trace = pm.fit(300)
trace_samples = trace.sample(3000)  

f, g, h represent some functions.

However, the {c} and {d} posteriors I recover from trace_samples appear to be different samples from the same distribution for all N variables. My expectation is that each of {c}, {d} is centered at different values. At the same time, the posteriors on a, b, are Gaussians centered on the average of the uniform prior interval. I attach a screenshot of the results. I am using fit instead of sample to speed up the inference / fit according to this.

I think that the former problem points to an implementation issue, so I was wondering what I may be doing wrong. I implemented the above according to this.


Can you provide the functions f,g,h or even better a full sample code with simulated data

Not really, because this part of a larger pipeline that I don’t have the right to share. I’ve reduced it to the bare minimum. I presume you suspect it’s not an implementation issue, but rather a modelling one? What is your hypothesis?

Well in order to understand why all the components of c and d seem to same posterior distribution, I would run the model and try to understand what is going on. c and d seems to be initialized correctly as far as I can see. Maybe the functions are symmetric with respect to different components of c which could result in this?

Some things I can suggest:

  • Try to add stuff to these function that creates more asymmetry between components of c and d (like maybe just an expression that depends on c[0] etc) and see if that changes anything.

  • Use az.plot_posterior on maybe a subset of c to see really that their posteriors are identical (you seem to be using some custom plotting function which can often be the source of error). You can do it via

az.plot_posterior(trace.posterior[["c"]].sel({"group": [0,1]}))
  • You can also sample pairwise differences between components of c and d to see if on average they are different or not via stuff like:
pm.Deterministic("difc", c[:,None] - c[:,None].transpose())
  • Obviously also do az.plot_pcc (after pm.sample_posterior_predictive) to see if your model is fitting reasonably well.

  • Can you generate simulated data with c and d having different components? So you can check whether or not you recover the true parameters.

1 Like

Thank you very much for the prompt reply! You were right that the custom plotting function was the problem. Now that I replaced it with plot_posterior, I can see that the posteriors are indeed different.

However, that is only when I use cores=4. When I change to cores=1 with

pm.sample(1, tune=1000, target_accept=0.95, random_seed=42, cores=1)
I get identical-looking posteriors for all a,b,{c} and {d}, and two sampling chains instead of one.

This is by far suboptimal, but it’s the simplest case on which I can run pm.sample in order to then run plot_posterior in a reasonable amount of time for now. To my understanding, that produces 1 sample per chain, 2N+2 chains, 1000 tune samples per chain, and all of those chains run sequentially and not in parallel. At the same time, when I increase n_samples to 500, keeping cores=1, then the posteriors are different again.

So I don’t really see how cores may be affecting the shape of the posterior. Is there something wrong with my understanding of those variables?

If chains is not supplied then it is either 2 or equal to number of cores which ever is higher. When you set cores=1 and chain=None then that is just 2 chains, drawing a single sample for every parameter (after the tune step which discards 1000). I would not expect sensical result from a chain where draws=1. The default is 1000.

https://www.pymc.io/projects/docs/en/v4.4.0/api/generated/pymc.sample.html

If you are having speed problems, that could be due to many things, bad choice of priors, complicated model which is hard to sample, lots of parameters or lots of data points. You can always try to speed it up by trying numpyro

But if it is better to play around with your priors etc first to see if you can speed it up. For instance, I would suggest using chol for MvNormals instead of cov:

https://www.pymc.io/projects/docs/en/stable/api/distributions/generated/pymc.MvNormal.html

You can also try approximation techniques such as ADVI if your data size is the problem:

1 Like