# Observed data with uncertainty described by an irregular distribution

Stated more simply, maybe all you want to do is define a and b in terms of c,d and your single data points and then simply draw values from a and b as such (note that I changed your function to make the inverses easy to compute)

``````import pymc as pm
import matplotlib.pyplot as plt

def fun1(a, b, c, d):
return 2*a + b + 2*c + d

def fun2(a, b, c, d):
return a + b + 2*c + 2*d

def fun_inv(out1, out2, c, d):

val1 = out1 - 2*c - d # 2*a + b
val2 = out2 - 2*c - 2*d #a + b

a = val1 - val2
b = val2 - a

return a, b

mean_c = 7
mean_d = 4

data = [ 326, 1011]

with pm.Model() as model:
c = pm.Normal("c_sd", mean_c, 10)
d = pm.Normal("d_sd", mean_d, 10)

a, b = fun_inv(data[0], data[1], c, d)

draws = pm.draw([a, b], draws=10000)

fig,ax = plt.subplots(1, 2)
ax[0].hist(draws[0])
ax[1].hist(draws[1])

``````

I constructed this example with weight just to show how inverse uncertainty quantification works. If you want an actual engineering application, I recommend a paper, " Inverse Uncertainty Quantification using the Modular Bayesian Approach based on Gaussian Process, Part 2: Application to TRACE". After you google it, you can find it on Arxiv, or you can directly go to [1801.09261] Inverse Uncertainty Quantification using the Modular Bayesian Approach based on Gaussian Process, Part 2: Application to TRACE and download the pdf. Of course, in real problems, there are multiple uncertain input parameters and multiple measurements

What do you mean by an irregular distribution?

There is one orthodox way to not update variables in a Bayesian setting which is to marginalize them.

An irregular distribution would be something like this:

The distribution is made from data_with_uncertainty_1:

``````# Generate c and d distributions
c = np.random.uniform(5, 9, 1000000)
d = np.random.uniform(2, 6, 1000000)
# Constants a and b
a = 16
b = 14
# Generate data with uncertainty
data_with_uncertainty_1 = 3*a*a + 2*b*b + c + d**2 + np.random.normal(0, 1, 1000000)
``````

âcâ and âdâ are model input parameters that are not designated for Bayesian updating, so their uncertainty would be propagated to âdataâ uncertainty. np.random.normal(0, 1, 1000000) describes uncertainty with which âdataâ was measured.
So, my problem is that I cannot put such a distribution into anything like âY_obs = pm.Normal(âY_obsâ, mu=mu, sigma=1, observed=data)â (in place of âsigmaâ to describe data[0] uncertainty) because there is no way of fitting this distribution or even transforming and fitting into any distribution parametrizable with ÎŒ and Ï.
The problem is even harder when I have three data points with distributions:

``````data_with_uncertainty_1 = 3*a*a + 2*b*b + c + d**2 + np.random.normal(0, 1, 1000000)
data_with_uncertainty_2 = a**(3/2) + b*b + c*2 + d + np.random.normal(0, 1, 1000000)
data_with_uncertainty_3 = a**(5/2) + 4*b**(3/2) + c*2 + 2*d + np.random.normal(0, 1, 1000000)
``````

because these are correlated but not in a way that would fit into MvNormal.

because there is no way of fitting this distribution or even transforming and fitting into any distribution parametrizable with ÎŒ and Ï.

Thatâs a strong claim. Many compound distributions like BetaBinomial also have a latent random component that is marginalized out. Other common distributions can also be conceptualized as implicit marginalization/convolutions. A studentT is a Mixture of Normals with a latent InverseGamma standard deviation: An Animation of the t Distribution as a Mixture of Normals | Rasmus BĂ„Ă„th's blog

In none of these cases is the unobserved random component âsubject to bayesian inferenceâ.

Your distribution doesnât seem so crazy that itâs certain it canât have a closed form solution like the BetaBinomial or StudentT do. PyMC wonât be able to derive it for you automatically or offer a pre-built one, but thatâs another matter.

If a closed form solution eludes you there are other alternatives like Simulator as you found out but also approaches like Artificial Network approximations trained on many synthetic datasets: Likelihood Approximations with Neural Networks in PyMC - PyMC Labs

It may be useful to find a precise mathematical meaning to what you are calling âirregular distributionsâ. You may be mixing the random representation of a model with the probability measure representation. PyMC usually looks like itâs about the former, until you ask it to do posterior inference in which case it needs the second.

3 Likes

In your real problem do you also have at most second powers of c and d? Because that is actually chi2 distribution so you can write the uncertainty 1,2,3 above as mixture of normals and chi2 distributions. I am still not %100 how your model works but in case all you want is to express the formulas above as pdfs that is easy.

1 Like

Sure, I can look closer if the mixture of chi2 and normals would be applicable here, but it was most important for me to establish that if parametrization is not possible, then NUTS and Metropolis and related methods wonât work (unless noise is relatively small)

1 Like