Observed data with uncertainty described by an irregular distribution

Stated more simply, maybe all you want to do is define a and b in terms of c, d and your single data points, and then simply draw values from a and b, like this (note that I changed your functions to make the inverses easy to compute):

import pymc as pm
import matplotlib.pyplot as plt


def fun1(a, b, c, d):
  return 2*a + b + 2*c + d

def fun2(a, b, c, d):
  return a + b + 2*c + 2*d

def fun_inv(out1, out2, c, d):

  val1 = out1 - 2*c - d    # = 2*a + b
  val2 = out2 - 2*c - 2*d  # = a + b

  a = val1 - val2  # (2*a + b) - (a + b)
  b = val2 - a

  return a, b


mean_c = 7
mean_d = 4

data = [326, 1011]

with pm.Model() as model:
    c = pm.Normal("c", mean_c, 10)
    d = pm.Normal("d", mean_d, 10)


    a, b = fun_inv(data[0], data[1], c, d)

    draws = pm.draw([a, b], draws=10000)

fig, ax = plt.subplots(1, 2)
ax[0].hist(draws[0])
ax[1].hist(draws[1])
plt.show()
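As a quick sanity check (my addition, not part of the original answer), you can verify with plain Python that fun_inv exactly recovers a and b from the two outputs when there is no noise:

```python
# Forward model: two observables as linear functions of a, b, c, d
def fun1(a, b, c, d):
    return 2*a + b + 2*c + d

def fun2(a, b, c, d):
    return a + b + 2*c + 2*d

# Inverse: solve the 2x2 linear system for a and b, given c and d
def fun_inv(out1, out2, c, d):
    val1 = out1 - 2*c - d    # = 2*a + b
    val2 = out2 - 2*c - 2*d  # = a + b
    a = val1 - val2          # (2a + b) - (a + b)
    b = val2 - a
    return a, b

a_true, b_true, c, d = 16.0, 14.0, 7.0, 4.0
out1 = fun1(a_true, b_true, c, d)
out2 = fun2(a_true, b_true, c, d)
a_rec, b_rec = fun_inv(out1, out2, c, d)
print(a_rec, b_rec)  # recovers 16.0 and 14.0
```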

I constructed this example just to show how inverse uncertainty quantification works. If you want an actual engineering application, I recommend the paper "Inverse Uncertainty Quantification using the Modular Bayesian Approach based on Gaussian Process, Part 2: Application to TRACE", which you can find on arXiv as 1801.09261 and download as a PDF. Of course, in real problems there are multiple uncertain input parameters and multiple measurements.

What do you mean by an irregular distribution?

There is one orthodox way to avoid updating variables in a Bayesian setting, which is to marginalize them out.
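To make that concrete, here is a minimal NumPy/SciPy sketch (my own illustration, using the model from this thread): the marginal likelihood of an observation y is the conditional likelihood averaged over draws of the nuisance inputs c and d.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

rng = np.random.default_rng(0)

# Nuisance inputs with known distributions (not subject to updating)
c = rng.uniform(5, 9, 100_000)
d = rng.uniform(2, 6, 100_000)

def marginal_loglik(y, a, b, sigma=1.0):
    # log p(y | a, b) = log E_{c,d}[ p(y | a, b, c, d) ],
    # approximated by a Monte Carlo average over the c, d draws
    mu = 3*a*a + 2*b*b + c + d**2
    return logsumexp(norm.logpdf(y, loc=mu, scale=sigma)) - np.log(len(c))

# One noiseless observation generated at the true a=16, b=14
y_obs = 3*16**2 + 2*14**2 + 7 + 4**2
print(marginal_loglik(y_obs, 16, 14))  # comparatively high
print(marginal_loglik(y_obs, 15, 14))  # much lower: the model cannot reach y_obs
```

The c and d samples are never updated; they are only averaged over, which is exactly what marginalization means operationally.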

An irregular distribution would be something like this:
[image: histogram of the data_with_uncertainty_1 distribution]
The distribution is made from data_with_uncertainty_1:

import numpy as np

# Generate c and d distributions
c = np.random.uniform(5, 9, 1000000)
d = np.random.uniform(2, 6, 1000000)
# Constants a and b
a = 16
b = 14
# Generate data with uncertainty
data_with_uncertainty_1 = 3*a*a + 2*b*b + c + d**2 + np.random.normal(0, 1, 1000000)

“c” and “d” are model input parameters that are not designated for Bayesian updating, so their uncertainty is propagated into the uncertainty of “data”. np.random.normal(0, 1, 1000000) describes the uncertainty with which “data” was measured.
So, my problem is that I cannot put such a distribution into anything like “Y_obs = pm.Normal("Y_obs", mu=mu, sigma=1, observed=data)” (in place of “sigma”, to describe the uncertainty of data[0]), because there is no way of fitting this distribution, or even transforming it and fitting it into any distribution parametrizable with μ and σ.
The problem is even harder when I have three data points with distributions:

data_with_uncertainty_1 = 3*a*a + 2*b*b + c + d**2 + np.random.normal(0, 1, 1000000)
data_with_uncertainty_2 = a**(3/2) + b*b + c*2 + d + np.random.normal(0, 1, 1000000)
data_with_uncertainty_3 = a**(5/2) + 4*b**(3/2) + c*2 + 2*d + np.random.normal(0, 1, 1000000)

because these are correlated but not in a way that would fit into MvNormal.
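To illustrate the point numerically (my own check, using the three formulas above): the shared c and d draws induce strong correlation between the observables, yet the marginals are not Gaussian, e.g. the d**2 term makes the first one right-skewed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a, b = 16, 14
c = rng.uniform(5, 9, 1_000_000)
d = rng.uniform(2, 6, 1_000_000)

y1 = 3*a*a + 2*b*b + c + d**2 + rng.normal(0, 1, c.size)
y2 = a**1.5 + b*b + c*2 + d + rng.normal(0, 1, c.size)
y3 = a**2.5 + 4*b**1.5 + c*2 + 2*d + rng.normal(0, 1, c.size)

# The shared c, d draws correlate the three observables...
print(np.corrcoef([y1, y2, y3]))
# ...but the marginals are not Gaussian: y1 is right-skewed via d**2
print(stats.skew(y1))
```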

because there is no way of fitting this distribution or even transforming and fitting into any distribution parametrizable with μ and σ.

That’s a strong claim. Many compound distributions, like the BetaBinomial, also have a latent random component that is marginalized out, and other common distributions can be conceptualized as implicit marginalizations/convolutions. A StudentT is a mixture of Normals with a latent InverseGamma variance: An Animation of the t Distribution as a Mixture of Normals | Rasmus Bååth's blog
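That scale-mixture fact is easy to check numerically (my sketch, not from the linked post): drawing the variance from an InverseGamma(ν/2, ν/2) and the observation from a Normal with that variance yields, marginally, a t distribution with ν degrees of freedom.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
nu = 5
n = 200_000

# Latent variance: InverseGamma(nu/2, scale=nu/2)
var = stats.invgamma(a=nu/2, scale=nu/2).rvs(size=n, random_state=rng)
# Conditional Normal draws, each with its own latent variance
x = rng.normal(0.0, np.sqrt(var))

# Marginally, x should follow a StudentT with nu degrees of freedom
ks = stats.kstest(x, stats.t(df=nu).cdf)
print(ks.statistic)  # small: the samples match the t distribution
```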

In none of these cases is the unobserved random component “subject to Bayesian inference”.

Your distribution doesn’t seem so crazy that it’s certain it can’t have a closed form solution like the BetaBinomial or StudentT do. PyMC won’t be able to derive it for you automatically or offer a pre-built one, but that’s another matter.

If a closed-form solution eludes you, there are other alternatives, like Simulator (as you found out), but also approaches like artificial neural network approximations trained on many synthetic datasets: Likelihood Approximations with Neural Networks in PyMC - PyMC Labs

It may be useful to pin down a precise mathematical meaning for what you are calling “irregular distributions”. You may be mixing up the random-variable representation of a model with the probability-measure representation. PyMC usually looks like it’s about the former, until you ask it to do posterior inference, in which case it needs the latter.

In your real problem, do you also have at most second powers of c and d? Because the square of a normal variable actually follows a chi-squared distribution, so you could write uncertainties 1, 2, and 3 above as mixtures of normal and chi-squared distributions. I am still not 100% sure how your model works, but in case all you want is to express the formulas above as pdfs, that is easy.
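For the record, the chi-squared connection assumes the input is Normal (in the example above c and d are Uniform, where it would not apply directly): if d ~ Normal(0, 1), then d**2 follows a chi²(1) distribution, which is easy to confirm numerically.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
d = rng.normal(0.0, 1.0, 200_000)

# The square of a standard normal is chi-squared with 1 degree of freedom
ks = stats.kstest(d**2, stats.chi2(df=1).cdf)
print(ks.statistic)  # small: d**2 matches chi2(1)
```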

Sure, I can look closer at whether a mixture of chi-squared and normal distributions would be applicable here, but it was most important for me to establish that if parametrization is not possible, then NUTS, Metropolis, and related methods won’t work (unless the noise is relatively small).
