# Abnormal posterior using a non-linear external model

Hi everyone, I built a fuel cell model and got some sensitive parameters for that model by calculating sensitivity coefficients. The model can be simplified as Current = f_model(potential).
Now I want to get the posterior PDFs of these sensitive parameters based on a set of experiment data (potential-current).

Firstly I defined a f_model(theta,data) function

def current(theta,data):    ***#theta is sensitive parameters, data is the experiment data of current***
potential = [0.01629,   0.12868,0.2325,  0.34037, 0.45276,  0.57327] #the experiment data of potential
...   ***#based on different potential I got different curr (simulation results)***

sigma = 0.5

logp = -len(data)*np.log(np.sqrt(2.0 * np.pi) * sigma)
logp += -np.sum((data[k]*1000 - curr*0.1)** 2.0) / (2.0 * sigma ** 2.0)
return logp     ***#then I defined a likelihood function. ***


the rest code are:

Feel sorry that it may be hard to rebuild my code because my model depend on many python libraries.
When I set a single pair of potential-current like potential = [0.01629] current data = np.array([0.00882])
result is：

when I set potential = [0.01629, 0.12868,0.2325, 0.34037, 0.45276, 0.57327] current data = np.array([0.00882, 0.09581, 0.19992,0.29675, 0.38142, 0.47337])
result is：

The relationship between potential and current is :

test is experiment data I used in code and simulation is unoptimized model simulation.

1. If I just sample one parameter based on one experiment point like potential = [0.01629] current data = np.array([0.00882]) the posterior PDFs seems good. So I think there maybe no code bug?
2. If I sample four parameters based on one experiment or all experiment points. The results are really abnormal.

Can you provide a MWE that we can run?

Hi, since my custom likelihood function has many other dependencies such as a chemistry python library called CANTERA. So it’s hard to provide a MWE.

Now I have two specific questions:
When I calculate likelihood function:

logp = -len(data)np.log(np.sqrt(2.0 * np.pi) * sigma)
logp += -np.sum((data[k]1000 - curr0.1)
* 2.0) / (2.0 * sigma ** 2.0)

Because of the unit of curr and data[k] , I get different orders of magnitude when I calculate logp.

if the unit of curr and data[k] is A(Ampere). logp is very small, the order of magnitude is 0.01 to 1. The posterior distribution is almost the same as a prior distribution

if the unit of curr and data[k] is mA(1000mA = 1A). logp is very large, the order of magnitude is up to 10e6. The mean of the posterior distribution changes, while the standard deviation remains the same compared to the prior distribution.

1.What order of magnitude for logp is reasonable?
2. Is there any possibility about why the standard deviation of the posterior distribution has not changed?

Shouldn’t logp values should be strictly negative? Is that just a typo?

A logp of -1e7 basically means the value is impossible. Are you adjusting your priors when you adjust the scale of the input data?

Yes, it is negative. It’s a typo.

I adjust the scale of output data (both simulation and experiment data). The input data and priors remain no change.

When I just consider the uncertainty of one parameter about the model, (other parameters are constant). The result seems good. The standard deviation of the posterior distribution is less than 0.1.

But when I consider the uncertainty of four parameters about the model together. The standard deviation of the posterior distribution is 0.49 nearly no change ( prior 0.5).

You need to re-scale the priors if you re-scale the output data. This is almost certainly why you get very low log probabilities. If you set a Normal(0, 1) prior on a parameter that needs to be 10,000, the regularization from the excessively tight prior will wash away the gain in the likelihood term and the model will fail to update.

If you get back your priors as posteriors, it means that the model is not identified. This could be a bug, or it could be something deeper. It’s impossible to say without seeing the model. I’d say fixing your priors (by doing prior predictive simulations) is a good first step.

I do not know much about the practical calculation inside pymc3 after the likelihood function and priors are defined. So I can’t understand why the gain will be wash away or model will not be updated. Could you please provide me with some links to relevant information?

If I define a prior Normal(0, 0.5) for a parameter “Ox”.
The original value of “Ox” is 85600,000. I will re-calculate the true value of “Ox” base on prior Normal(0, 0.5) to ensure the 95% confidence interval is roughly (85600,000/2, 85600,000*2) and then pass it to model, get model output, calculate likelihood function. Am I Wrong?

Thanks for your advice, I will have a look at prior predictive simulations.

I don’t fully understand your example with Ox. Typically I don’t think of parameters as being observed, so it’s odd to me to say that you have a true values for it. Also the 95% confidence interval for a Normal(0, 0.5) is [-0.82, 0.82], so the logp of 85600 under that prior is basically -inf (scipy says -1e10) .

Remember that the posterior probability of a model parameter is a combination of the likelihood of the data given you model, plus the prior likelihood of the parameters under consideration. Let’s say we want to just fit the mean of a normal distribution to data, given a prior for \mu (i’ll fix \sigma = 10 for the whole example):

data = array([1001.61756508,  990.70212629,  987.0993676 , 1008.14709909,
1001.97173387, 1003.18130839,  994.49818489,  989.82015916,
1000.32589588,  998.04014088])


We choose a prior for the mean whose scale is not suitable for for this data, \mu \sim N(0, 1). From Bayes law, we know that:

P(\mu | data) \propto P(data | \mu) P(\mu)

Or, in logs:

\log P(\mu | data) \propto \log P(data | \mu) + \log P(\mu)

So if we choose a “reasonable” \mu given our prior, let’s say \mu = 0, we will have:

\log P(data | \mu = 0) = -49788.61
\log P(\mu = 0) = -0.91
\log P(\mu | data) \propto \log P(data | \mu) + \log P(\mu) = -49788.61 - 0.91 = -49789.52

As you can see, the likelihood term, \log P(data | \mu), is very negative, because it’s basically impossible that a N(0, 10) distribution generated this sample of data. On the other hand, \log P(\mu = 0) is very large, because 0 is right at the mode of the prior distribution.

Now what happens if we choose the correct \mu=1000 for this sample? We end up with:

\log P(data | \mu = 1000) = -34.58
\log P(\mu = 1000) = -500000.91
\log P(\mu | data) \propto \log P(data | \mu) + \log P(\mu) = -34.58 - 500000.91 = -500035.49

The model actually prefers the \mu=0 solution to the correct solution! The likelihood term has improved a lot – the probably of seeing this data given \mu=1000 increased by a factor of 1000, but we judge the probability of seeing \mu = 1000 given our prior belief that \mu \sim N(0, 1) to be wildly impossible. This is the “drowning out” effect that I mentioned in my previous post. If we try to fit this model in PyMC, we will not learn the correct mean:

import pymc as pm
import arviz as az

with pm.Model() as m:
mu = pm.Normal('mu', 0, 1)
y_hat = pm.Normal('y_hat', mu=mu, sigma=10, observed = data)
idata = pm.sample()

az.plot_trace(idata)


You can see that the posterior trace has shifted towards the true value of \mu=1000, but the solution is extremely over regularized, because our prior on \mu was so mis-matched to the scale of the data.

Prior predictive simulations would reveal this problem by showing us that our model never generates data on the scale of 1000. If we knew that this was the scale of the data – say we were studying the weight of elephants or whatever – this would be our signal to revise our priors to include the relevant scientific information.