 # Posterior always at lower limit of uniform prior

Hello PyMC3 community,

First, thanks for all of the support. I’ve learned a lot about bayesian modeling from reading through these posts and pymc3 docs and examples.

I’ll try to describe my problem as briefly as possible.

I’m modeling a set of differential equations that time-dependent forcing conditions and several (~5-10) time-invariant model parameters. I want to infer the likelihood of the parameter values by comparing the model solution to real-world data. I assume that the errors between the model solutions and the data are gaussian and independent for each timestep.

The forward model is written in fortran imported as a module with f2py. Therefore I’m using the pm.Potential class (following examples from the “black box” model approach) with a least squares likelihood function.

I have 7 model parameters in addition to the error variance term that I attempt to infer. I’ve experimented using both uninformative, uniform prior distributions and normal distributions that reflect reasonable ranges of the parameter values. I’m using the metropolis sampler with 14 chains.

I’m finding that one of the parameters, “pm” seems to get “stuck” towards the lower range of whatever uniform prior that I assign. If I manually adjust the parameter to a higher value I find that the model will have a better rmse.

Here is the sample traceplot:

And here are two plots to illustrate further.

1. The first shows the model solutions (from the traceplot shown above) for the posterior trace compared with the observed data (red). The “pm” parameter has a prior uniform range between .4 and 1.6. 2. Same as the first, but from another range where the prior on pm is set between .8 and 1.2: You can see that #2 does a much better job matching the data. The fact that the posterior is so close to the lower limit of the prior seems odd to me. It seems like sampler should be able to “find” this region of better fit.

Does anyone have an idea why I might see this behavior? Are there any model adjustments that I would try making? Any pointers of any kind are appreciated!

It’s worth mentioning that the the ‘number of effective’ samples is also low (<10-20%) for some of these experiments.

And this is the code and hyperparameters (in yml format) that I’m using. I can post a reproducible example but you would need to compile a fortran module…

``````def sls_like(func, parameters, data):
# run the forward model with the new parameters
model = func(parameters[:-1])['qchan']
sigma = parameters[-1]
warm_up = 0
model = model[warm_up:]
data = data[warm_up:]
return -np.sum(0.5 * np.log(2 * np.pi * sigma ** 2) + ((data - model) / sigma) ** 2)

class likelihood_evaluate(tt.Op):
itypes = [tt.dvector] # expects a vector of parameter values when called
otypes = [tt.dscalar] # outputs a single scalar value (the log likelihood)

def __init__(self, func, loglike, obs_data):
# add inputs as class attributes
self.func = func
self.likelihood = loglike
self.obs_data = obs_data

def perform(self, node, inputs, outputs):
# the method that is used when calling the Op
parameters, = inputs  # this will contain my variables

# call the log-likelihood function
logl = self.likelihood(self.func, parameters, self.obs_data)
outputs = np.array(logl)

def main(func,
dict_of_parameters,
obs_q,
tracename="trace.pkl",
priorname="prior.pkl"):

# create the logliklihood function
logl = likelihood_evaluate(func, sls_like, obs_q)
#logl = likelihood_evaluate(func, wls_like, obs_q)

with pm.Model() as model:

# This list collects the model paramters that are passed in by "dict of parameters"
# this is just a convenience
list_of_parameters = []
for key in dict_of_parameters.keys():
p0 = dict_of_parameters.get(key)

if p0 == "normal":
p1 = pm.Normal(key, mu=p0, sigma=p0)

if p0 == "uniform":
p1 = pm.Uniform(key, lower=p0, upper=p0)

# if p0 == "HalfNormal":
#     p1 = pm.Uniform(key, lower=p0, upper=p0)

# append it to the list ..
list_of_parameters.append(p1)

# Append the "sigma" parameter to the end of the list. this is the sigma
# in the error model
list_of_parameters.append(pm.Uniform("sigma", lower=.1, upper=1.0))

parameters = tt.as_tensor_variable(list_of_parameters)
pmpot = pm.Potential('loglike', logl(parameters))
prior = pm.sample_prior_predictive(samples=7000)

with model:
step1 = pm.Metropolis(scaling=.5)
trace = pm.sample(step = step1, chains=14, tune=500, discard_tuned_samples=True)
posterior_predictive = pm.sample_posterior_predictive(trace,  var_names=list(dict_of_parameters.keys()))

with open(tracename, 'wb') as buff:
pickle.dump(trace, buff)

with open('posterior_predictive.pkl', 'wb') as buff:
pickle.dump(posterior_predictive, buff)

with open(priorname, 'wb') as buff:
pickle.dump(prior, buff)

pm.traceplot(trace)
plt.show()
print(pm.summary(trace))
``````

c:

• normal
• 1.0
• .1
ks:
• uniform
• 1
• 20.0
ku:
• uniform
• 1.0
• 20.0
lowercasen:
• normal
• 1.0
• .1
sm1max:
• normal
• 700
• 25.0
sm2max:
• normal
• 300
• 25.0
pm:
• uniform
• .4
• 1.6

This is the traceplot for the 2nd plot, for what it’s worth

I’ve answered part of my question. Since I was discarding the tuned_samples, none of the higher “pm” parameter values were showing up in the trace – but I think they were there and just discarded during the tuning period.

But, I still don’t see why the “pm” value is immediately dropping to the floor since this seems to be a less skillfull region (I’m trying to find the right plot to show this). This plot shows the abrupt decrease from the starting value of 1 to the bottom of the lower prior limit for one of the chains (x axis shows the first 100 samples from the chain; y axis is the parameter value for “pm”). I don’t have any guesses as to why that particular parameter is heading toward the left edge of your prior. However, I might suggest widening the prior on that parameter (e.g., impossibly wide to figure out exactly where the sample wants to settle) and then use some of the approaches outlined here to get a better handle on why (that page seems to focus exclusively on divergences, but the techniques are more general than that). I tend to find pair plots and parallel plots are particularly helpful in hunting down weird sampling behavior because they allow you to view the sampling process in multiple dimensions.

2 Likes

@cluhmann Thanks for the reply! Widening the parameter space made the posterior estimate go even lower.

I took a few steps backwards and tried applying the find_MAP() function. It gave me totally wonky results that do not fit the data. At the same time I can apply the Powell/Nelder-mead using scipy and find something reasonable.

So I think there must be something wrong with how I’ve configured the model on the PyMC3 side. I still don’t fully understand the pm.Potential class so maybe it’s something I’ve done there.