NUTS sampler converges to wrong value

Hi,

I have a model where I am estimating 84 parameters.
The problem is that the NUTS sampler seems to converge (I get 0 divergences), but not to the correct value.

Here is the code I use for sampling:

step = mc.NUTS()
trace = mc.sample(2000, step=step, init='advi_map', tune=1000, cores=2,
                  chains=3, nuts_kwargs=None,
                  discard_tuned_samples=True,
                  progressbar=True)

I first used the default jitter+adapt_diag initialization, but that gave worse results, so I tried advi_map, which gave better results, though, as you will see, still not the desired ones.

Here is the trace diagnostic for one of the parameters I am estimating:

The true value for this parameter is 163, and as you can see below, the posterior distribution has long tails, but the most probable value is around 63.

The energy plot for all samples for all parameters can be seen below:
The marginal energy and energy transition distributions look similar to me, though the energy transition is higher.

Here you can see the MCMC mean of the logged samples for one of the parameters, with the logged true value in grey (5.095). As you can see, the NUTS sampler seems to converge, at least judging by this plot, but not to the correct value…

Lastly, here is the summary for some of the parameters I am trying to estimate:
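For reference, a minimal sketch of how such diagnostics can be produced with standard PyMC3 calls, assuming trace is the result of the mc.sample call above:

import pymc3 as mc

mc.traceplot(trace)       # per-parameter posterior and sampling trace
mc.energyplot(trace)      # marginal energy vs. energy transition
print(mc.summary(trace))  # mean, sd, HPD intervals, n_eff, Rhat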

How do I proceed? I know something is wrong, but I am unsure how to fix it…

Hi, are you sure that your model is well defined (not multimodal) and models what it should?

Can you share your model?

Good question! I actually suspect it might be multimodal…

Here is the model:

raw_py is my data:

import numpy as np
import pymc3 as mc
import theano

# Per-bin prior bounds: zero below, a data-driven upper bound above.
prior_l = np.zeros(len(raw_py))
prior_u = raw_py / (eff * pFE)

trace_out = []
resmat = resp
truthdim = len(resmat)
model = mc.Model()

with model:
    # One Uniform prior per truth bin, bounded by prior_l and prior_u.
    truthprior = []
    priormethod = getattr(mc, 'Uniform')

    for bin, (l, u) in enumerate(zip(prior_l, prior_u)):
        name = 'truth%d' % bin
        args = dict(name=name, lower=l, upper=u)
        prior = priormethod(**args)
        truthprior.append(prior)

    truth = mc.math.stack(truthprior)

    # Forward model: fold the truth through the detector response matrix.
    tresmat = np.array(resmat)
    reco = theano.dot(truth, tresmat)
    out = reco

    unfolded = mc.Poisson('unfolded', mu=out,
                          observed=np.array(raw_py))

    step = mc.NUTS()
    trace = mc.sample(2000, step=step, init='advi_map', tune=1000, cores=2,
                      chains=3, nuts_kwargs=None,
                      discard_tuned_samples=True,
                      progressbar=True)

If you need more information just ask!

Hi Hrima

Using pm.Uniform is generally discouraged; would it be possible to use pm.Normal instead?

Something like this should sample better

prior_l = np.zeros(len(raw_py))
prior_u = raw_py/(eff*pFE)

prior_mean = 0.5 * (prior_l + prior_u)
prior_sigma = (prior_u - prior_l) / 5.

trace_out = []
resmat = resp
truthdim = len(resmat)
model = mc.Model()    
    
with model:
    truth_raw = mc.Normal('truth_raw', 0, 1, shape=len(prior_l))
    truth = mc.Deterministic('truth', prior_mean + truth_raw * prior_sigma)

    tresmat = np.array(resmat)

    reco = theano.dot(truth, tresmat)
    out = reco

    unfolded = mc.Poisson('unfolded', mu=out, observed=np.array(raw_py))  

    trace = mc.sample(draws=2000, tune=1000, cores=2, chains=3)

The prior on truth will then be centered on the midpoint of prior_l and prior_u, with the bounds sitting 2.5 standard deviations from the mean, so roughly 0.6% of the prior mass falls below prior_l and another 0.6% above prior_u.
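To sanity-check that tail figure (a quick sketch using scipy, which is not part of the original code):

from scipy.stats import norm

# sigma = (prior_u - prior_l) / 5 puts the bounds 2.5 sigma from the mean,
# so each tail holds the normal CDF evaluated at -2.5.
print(norm.cdf(-2.5))  # ~0.0062, i.e. roughly 0.6% of the mass per tail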


Hi,

Thank you for your inputs!
Can I ask: Why is pm.Uniform discouraged?

The data I am working with are count data from a nuclear experiment. The code I have presented here corrects for the detector response (called resmat in the code). The signal hitting the detector gets transformed by the detector response, so the output signal differs from the original signal, and this code corrects for that. The response resmat is a matrix, and the problem is reco = truth@resmat. The response matrix is not invertible, so there are many possible solutions, and we are only interested in the positive ones, since negative counts are ‘unphysical’ for count data. That is why I use a uniform distribution: to set a lower bound of zero for all parameters. A normal distribution allows negative solutions, and for some of the parameters the negative solutions are the most probable ones.
But I am very open to suggestions if there are any other priors that could be used in this case.
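To illustrate the degeneracy with a toy example (not the actual response matrix): a singular resmat maps distinct non-negative truth vectors to the same expected counts, so the likelihood alone cannot tell them apart:

import numpy as np

# Toy rank-1 response matrix: truth is not identifiable from reco.
resmat = np.array([[1.0, 2.0],
                   [1.0, 2.0]])

truth_a = np.array([100.0, 50.0])
truth_b = np.array([50.0, 100.0])  # a different non-negative solution

# Both candidate truths give exactly the same expected counts.
print(truth_a @ resmat)  # [150. 300.]
print(truth_b @ resmat)  # [150. 300.]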

Hi @Hrima89,
To be a bit more specific, I’d say it’s less PyMC’s implementation of the uniform (i.e. pm.Uniform) than uniform priors in general that are discouraged.
In short, they induce a nasty geometry to sample from, and we almost always have some information about our analysis. In addition, a flat prior is not the same as a non-informative prior.

In your case, if you’re trying to enforce a positivity constraint, you can look into Gamma distributions, or Exponential and HalfNormal (usually used for scale parameters). Here are recommendations for good prior choices.
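As a rough sketch of what that could look like for this model (the mu/sd choices below are illustrative placeholders, and prior_u, resmat, and raw_py are the variables from the earlier posts):

import numpy as np
import pymc3 as mc

with mc.Model() as model:
    # Positive-support prior per bin; mass near zero is still allowed.
    truth = mc.Gamma('truth', mu=prior_u / 2.0, sd=prior_u / 2.0,
                     shape=len(prior_u))

    reco = mc.math.dot(truth, np.array(resmat))
    unfolded = mc.Poisson('unfolded', mu=reco, observed=np.array(raw_py))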

Hope this helps :vulcan_salute:


Thank you for all the suggestions!
I made modifications to my model, using a Gamma prior where I expect the counts to be at or close to zero, and a Normal distribution where I expect peaks (many counts). This works much better than my previous model!
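For later readers, one possible way to combine the two priors per bin is with a boolean mask (a sketch; peak_bins is a hypothetical mask marking where peaks are expected, and the mu/sd values are placeholders):

import numpy as np
import pymc3 as mc
import theano.tensor as tt

# Hypothetical mask: True where a peak (many counts) is expected.
peak_bins = np.array([False, True, False, True])
n_bins = len(peak_bins)

with mc.Model() as model:
    # Gamma keeps the near-empty bins positive and near zero;
    # Normal covers the high-count peak bins.
    background = mc.Gamma('background', mu=5.0, sd=5.0,
                          shape=int((~peak_bins).sum()))
    peaks = mc.Normal('peaks', mu=200.0, sd=50.0,
                      shape=int(peak_bins.sum()))

    # Scatter the two pieces back into a single truth vector.
    truth = tt.zeros(n_bins)
    truth = tt.set_subtensor(truth[(~peak_bins).nonzero()[0]], background)
    truth = tt.set_subtensor(truth[peak_bins.nonzero()[0]], peaks)
    truth = mc.Deterministic('truth', truth)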

You could probably also use the shape argument to make things much easier and faster. Something like this:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

mu = np.array([
    [10, 20],
    [100, 2],
    [20, 20],
])

sd = np.array([
    [0.1, 0.2],
    [20, 0.1],
    [10, 10],
])

# We observed vector products with those vectors
vectors = np.random.randn(2, 10) ** 2

observations = np.random.randint(1, 100, size=(3, 10))

with pm.Model() as model:
    matrix = pm.Gamma('matrix', mu=mu, sd=sd, shape=mu.shape)
    eta = tt.dot(matrix, vectors)
    pm.Poisson('y', mu=eta, observed=observations)

About the uniform: it’s not that anything is wrong with a uniform prior per se, but it is a prior that beginners often use in places where it isn’t a good choice. It can be tempting to think something like “I guess it is larger than x and probably smaller than y, and I don’t really know what’s going on, so I’ll just use a uniform(x, y)”. In cases like that it is almost always better to use a distribution whose support is the whole set of values the parameter could in theory take, but that puts only a small probability far away from (x, y).
If the support of the parameter really should be (x, y), values outside are impossible, and there really is no reason to think that some values are more plausible than others, then a uniform is fine.
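A tiny sketch of that contrast (x and y stand in for the guessed bounds):

import pymc3 as pm

x, y = 0.0, 10.0  # the range the parameter is guessed to lie in

with pm.Model():
    # Hard bounds: zero density outside (x, y), awkward edge geometry.
    hard = pm.Uniform('hard', lower=x, upper=y)
    # Soft alternative: full support, little mass far outside (x, y).
    soft = pm.Normal('soft', mu=(x + y) / 2, sd=(y - x) / 4)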
