I have a model where I am estimating 84 parameters.
The problem I am having is that the NUTS sampler seems to converge (I get 0 divergences), but not to the correct value.
I first used the default init `jitter+adapt_diag`, but that gave worse results, so I tried `advi_map`, which gave better results but, as you will see, still not the wanted result.
I will show you the diagnostic of the trace for one of the parameters I am estimating:
The true value for this parameter is 163; as you can see below, the posterior distribution for the variable has long tails, but the most probable value is around 63.
The energy plot for all samples for all parameters can be seen below:
The marginal energy and energy transition distributions look similar to me, although the energy transition is higher.
Here you can see the MCMC mean of the logged samples of one of the parameters, with the logged true value in grey (5.095). As you can see, the NUTS sampler seems to converge, at least judging by this plot, but not to the correct value…
Thank you for your inputs!
Can I ask: Why is pm.Uniform discouraged?
The data I am working with are count data from a nuclear experiment. The code I have presented here is a correction code for the detector response (called `resmat` in the code). The signal hitting the detector gets transformed by the detector response, so the output signal is different from the original signal; this code corrects for that. The response `resmat` is a matrix, and the problem is: `reco = truth @ resmat`. The response matrix is not invertible, so there are many possible solutions, and we are only interested in the positive ones, since negative counts are "unphysical" when looking at count data. That is why I use a uniform distribution, to set a lower bound of zero for all parameters. A normal distribution allows for negative solutions, and for some of the parameters the negative solutions are the most probable ones.
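To make the ill-posedness concrete, here is a minimal sketch with hypothetical numbers (not the real detector response): a singular response matrix admits several different non-negative truths that produce exactly the same reconstructed spectrum.

```python
import numpy as np

# Hypothetical 3x3 detector response with two identical rows,
# so it is rank-deficient and not invertible.
resmat = np.array([
    [0.8, 0.2, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
])

# Two different non-negative "truth" spectra...
truth_a = np.array([100.0, 50.0, 30.0])
# ...differing by a vector in the left null space of resmat,
# here (1, -1, 0) since rows 0 and 1 of resmat are identical.
truth_b = truth_a + 20.0 * np.array([1.0, -1.0, 0.0])

# ...give exactly the same reconstructed spectrum.
reco_a = truth_a @ resmat
reco_b = truth_b @ resmat
```

Both `truth_a` and `truth_b` are entirely non-negative, so restricting the support to positive values alone does not make the solution unique; the prior still has to carry real information.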
But I am very open to suggestions if there is any other priors that can be used in this case.
Hi @Hrima89,
To be a bit more specific, I'd say it's less PyMC's implementation of the uniform (i.e. `pm.Uniform`) than uniform priors in general that are discouraged.
In short, they induce a nasty geometry to sample from, and we actually almost always have some information about our analysis. In addition, a flat prior is not actually non-informative.
Thank you for all the suggestions!
I made modifications to my model using a Gamma prior where I expect the counts to be close to or zero, and normal distribution where I expect peaks (many counts). This works much better than my previous model!
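One way to see why this mixed choice works well (a sketch with hypothetical prior parameters, not the actual model values): for a bin whose true count is near zero, a Normal prior places non-trivial probability mass on negative, unphysical counts, while a Gamma prior with the same mean has support only on the non-negative reals.

```python
from scipy import stats

# Hypothetical prior for a bin whose true count is close to zero.
normal_prior = stats.norm(loc=5.0, scale=5.0)
gamma_prior = stats.gamma(a=1.0, scale=5.0)  # mean 5, support [0, inf)

# Probability mass assigned to negative counts by each prior.
p_negative_normal = normal_prior.cdf(0.0)  # roughly 16% below zero
p_negative_gamma = gamma_prior.cdf(0.0)    # exactly zero
```

For bins with a clear peak at large counts, the posterior is far from zero anyway, so a Normal prior there causes no trouble and is easy to parameterize from the expected peak height.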
You could probably also use the shape argument to make things much easier and faster. Something like this:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

mu = np.array([
    [10, 20],
    [100, 2],
    [20, 20],
])
sd = np.array([
    [0.1, 0.2],
    [20, 0.1],
    [10, 10],
])
# We observed vector products with those vectors
vectors = np.random.randn(2, 10) ** 2
observations = np.random.randint(1, 100, size=(3, 10))

with pm.Model() as model:
    matrix = pm.Gamma('matrix', mu=mu, sd=sd, shape=mu.shape)
    eta = tt.dot(matrix, vectors)
    pm.Poisson('y', mu=eta, observed=observations)
```
About the uniform: It's not that anything is wrong with a uniform prior per se, but only that it is a prior that beginners often use in places where it isn't a good choice. It can be tempting to think something like "I guess it is larger than x and probably smaller than y, and I don't really know what's going on, so I'll just use a uniform(x, y)". In cases like that it's pretty much always better to use a distribution whose support is the entire set the parameter could in theory be in, but that puts only a small probability far away from (x, y).
If the support of the parameter really should be (x, y), and values outside are impossible, and if there really is no reason to think that some values are more plausible than others, a uniform is fine.
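The "soft bounds" idea above can be sketched numerically (with hypothetical bounds, not values from this model): center a Normal inside the guessed interval with the interval edges at roughly two standard deviations, so most mass stays inside (x, y) but the density never drops to exactly zero.

```python
from scipy import stats

# Hypothetical "I guess it's somewhere in here" bounds.
x, y = 0.0, 200.0

# Soft alternative to Uniform(x, y): a Normal centered in the interval,
# with (x, y) at roughly +/- 2 standard deviations.
soft = stats.norm(loc=(x + y) / 2.0, scale=(y - x) / 4.0)

# About 95% of the mass stays inside (x, y), but unlike the uniform,
# the density outside is small rather than exactly zero, so the sampler
# sees smooth geometry instead of hard walls at the edges.
p_outside = soft.cdf(x) + soft.sf(y)
```

If the lower bound of zero really is hard (as with counts), a distribution supported on the non-negative reals, such as a Gamma or HalfNormal, plays the same role without ever allowing unphysical values.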