Issues with the shape parameter

Hi,

I am running into some issues when using the shape parameter in a model. The issues might be arising from my misunderstanding of how the parameter can be used. Hopefully someone can clear up some things for me.

I have a number of curves, N, that I would like to fit separately. Instead of writing a for loop to run N separate models, I am using the shape parameter to generate the RVs. The following is an example of what I am doing, but for a linear fit (for simplicity):

import numpy as np
import pymc as pm  # or: import pymc3 as pm, depending on your version

# generate synthetic data: N identical copies of a noisy line y = 5x + 2
N = 10
x = np.arange(0, 20)
y = x * 5 + 2 + np.random.rand(20) * 5

x = np.tile(x, (N, 1))  # shape (N, 20)
y = np.tile(y, (N, 1))  # shape (N, 20)

with pm.Model() as model:
    # one slope and one intercept per curve
    a = pm.Normal('a', mu=0, sigma=1, shape=N)
    b = pm.Normal('b', mu=0, sigma=1, shape=N)
    like = pm.Normal('like', mu=a * x.T + b, sigma=5, observed=y.T)

My understanding, perhaps incorrect, is that the N different a's and b's would be independent of each other, and that the samples drawn would also be independent for each (a, b) pair.

However, I am running into the issue that if my N curves contain one set of curve data that is bad and leads to a badly behaved posterior (e.g. it causes divergences), then all the samples for the other curves also seem to be affected (e.g. the number of effective samples drops). At least this is what the ArviZ summary function seems to be reporting.

I would be grateful if someone could shed some light on this for me.

You're right that the elements of the a and b random variables in the above model are independent of each other.
Your model code looks fine too.
Is the ess just bad for some of the elements, maybe?
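
For example, something along these lines (assuming idata is the InferenceData returned by pm.sample) would show the ESS and r-hat separately for each element of a and b:

import arviz as az

# ess_bulk / ess_tail / r_hat are reported per element of the shape-N
# variables, so you can see which of the N fits is actually problematic
summary = az.summary(idata, var_names=['a', 'b'])
print(summary[['ess_bulk', 'ess_tail', 'r_hat']])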

Generally I'd recommend that you think about what exactly "bad curve data" means. Outliers in the observations? Dramatically different slope or intercept? Or a different shape?
More often than not this is something you can encode into the model, for example by choosing a StudentT likelihood instead of a Normal, or choosing prior distributions that allow for such "outliers" as part of the data-generating process.
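
For instance, a robust version of the model above could swap the Normal likelihood for a StudentT. This is only a sketch, reusing x, y and N from your example; the Gamma prior on nu is just one common choice, not the only option:

with pm.Model() as robust_model:
    a = pm.Normal('a', mu=0, sigma=1, shape=N)
    b = pm.Normal('b', mu=0, sigma=1, shape=N)
    # nu (degrees of freedom) controls how heavy the tails are
    nu = pm.Gamma('nu', alpha=2, beta=0.1)
    like = pm.StudentT('like', nu=nu, mu=a * x.T + b, sigma=5, observed=y.T)

The heavier tails mean occasional outlying observations pull the posterior around much less than with a Normal likelihood.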

Thanks for your answer.

No, it returns lower numbers of effective samples for all values of a and b, and increases the r-hat. When I sample without the 'bad' data set, all is well (although in some cases the effective number of samples is greater than the number of samples...).

I know what the bad data is. I am trying to measure a particular feature that appears in 99% of the data sets; however, in the other 1% there is only noise. These noise-only sets lead to divergences and sampling issues.

If the model sampled well for the 99% and only gave issues for the 1%, then I am fine with that and would ignore the poor results at the end, filtering by r-hat or ess.

I did eye-ball the traces and I thought they looked okay. I guess I should calculate the auto-correlation myself to see if there is a difference.
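
Something like this (assuming idata holds my trace) should show the per-element autocorrelation and ESS without me computing it by hand:

import arviz as az

# one autocorrelation panel per element of a and b, chains combined
az.plot_autocorr(idata, var_names=['a', 'b'], combined=True)
# numerical ESS per element, for comparison
print(az.ess(idata, var_names=['a', 'b']))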

If that fails, I may just turn to using shared variables and sampling each series individually.

I think the problem is that, while the models are independent and I think the results of fitting them together or separately should not change, the sampling is not independent. When you fit the models separately the sampler works on a 2-dimensional space, whereas when you fit them together the sampler works on a 2N-dimensional space.

This makes sampling harder and, as you can see, divergences and sampling issues are then shared. If one of the models can't be fitted and the trajectory diverges, what gets rejected is not just the two a, b values of that model but all 2N a, b values in that sample. So, as I said, I think the results should be the same, but for that you need the sampler to converge, which is much harder in the second case: you are converting N fits over small-dimensional spaces, only one of which has bad geometry, into a single fit on a high-dimensional space with bad geometry.

Thanks for that @OriolAbril. It had crossed my mind that it was due to the sampling being in a 2N-dimensional space. Thanks for confirming.

I had run some tests, and when all the data is 'good' the 2N-dimensional case gives reasonable results.

I guess my two options are to filter the bad data out before fitting, or to switch to a shared-variable approach.
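
For reference, here is roughly what I think the shared-variable version would look like with pm.Data (untested sketch, reusing the x, y and N arrays from the first post):

with pm.Model() as single_model:
    x_d = pm.Data('x_d', x[0])
    y_d = pm.Data('y_d', y[0])
    a = pm.Normal('a', mu=0, sigma=1)
    b = pm.Normal('b', mu=0, sigma=1)
    pm.Normal('like', mu=a * x_d + b, sigma=5, observed=y_d)

# swap in one series at a time and sample it in its own 2-dimensional space
traces = []
for i in range(N):
    with single_model:
        pm.set_data({'x_d': x[i], 'y_d': y[i]})
        traces.append(pm.sample())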