Model Calibration - Estimating parameters with an underlying distribution

Hello everyone,

relatively new user here, checking out the capabilities of PyMC and looking for advice.
Currently I’m working with a simple model and trying to calibrate its parameters from data. This has worked fine as long as the parameters are deterministic at their core (i.e. have no aleatoric uncertainty). However, I’m struggling with parameters that have an underlying distribution of interest.

Example - Deflection Model

To make things clearer, we define a basic model with three inputs that calculates the maximum deflection of a beam that is fixed at one end, with a force acting on the free end:
f(\mathbf{x}, \theta)=\frac{x_1x_2^3}{3I\theta}
where x_1 is the force [kN], x_2 the length [m], \theta the Young’s Modulus [GPa] and I the second moment of area which is fixed at I=320mm^4.

To create the scenario, we assume that \mathbf{x} are observable inputs and we only have to infer \theta from data. Further, \theta is expected to have unknown unit-to-unit variation with \theta\sim\mathcal{N}(210, 7^2). This distribution is the quantity of interest and we want to estimate \mu_\theta and \sigma_\theta.

The model f(\mathbf{x}, \theta) serves as both the ground truth and the computational model, so there is no model error, provided the calibration is conducted properly.

import numpy as np
import arviz as az
import pymc as pm

def f_deflection(force, length, e):
    return (force * (length * 1000) ** 3) / (30 * e * 1000 * 320.)

To calibrate the model we create N samples from the following process:
y_{exp}=f(x)+\epsilon\qquad \epsilon\sim\mathcal{N}(0, 0.1^2)

RNG = np.random.default_rng(42)
bounds = [[1, 1.5], [0.5, 2.0]]  # calibration domain

N = 5000  # number of experiments
exp_noise = 0.1  # std of experiment noise (known)
theta_gt_mu = 210  # mu of interest (unknown)
theta_gt_std = 7.0  # std of interest (unknown)
exp_theta = RNG.normal(theta_gt_mu, theta_gt_std, N)

x1 = RNG.uniform(1.0, 1.5, N)  # force [kN]
x2 = RNG.uniform(0.5, 2, N)  # length [m]
x_exp = np.stack([x1, x2], axis=1)

y_exp = f_deflection(x_exp[:, 0], x_exp[:, 1], exp_theta)
y_exp = y_exp + RNG.normal(size=N) * exp_noise 
# here it could be possible that y_exp gets negative, however this 
# shouldn't affect our example
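As a quick numpy-only sanity check of that last comment (replicating the seed and draw order of the snippet above), we can verify the sample moments of exp_theta and count how many simulated deflections actually come out negative; the bounds here are loose and purely illustrative:

```python
import numpy as np

RNG = np.random.default_rng(42)
N = 5000
exp_noise = 0.1

def f_deflection(force, length, e):
    return (force * (length * 1000) ** 3) / (30 * e * 1000 * 320.)

# same draw order as in the snippet above
exp_theta = RNG.normal(210, 7.0, N)
x1 = RNG.uniform(1.0, 1.5, N)
x2 = RNG.uniform(0.5, 2, N)

y_exp = f_deflection(x1, x2, exp_theta) + RNG.normal(size=N) * exp_noise

# a small fraction of short, low-force beams ends up with a negative
# "measured" deflection, because the noise dwarfs the tiny true value
n_negative = int((y_exp < 0).sum())
```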

The following PyMC code is used to calibrate the model:

with pm.Model() as basic_model:

    # Priors for unknown distribution parameters
    mu_theta = pm.Normal("mu", mu=190.0, sigma=30)
    sigma_theta = pm.HalfNormal("sigma", sigma=4)

    theta = pm.Normal("theta", mu=mu_theta, sigma=sigma_theta)

    # Expected value of outcome
    mu_delta = f_deflection(x_exp[:, 0], x_exp[:, 1], theta)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=mu_delta, sigma=exp_noise, observed=y_exp)

with basic_model:
    data = pm.sample(10000)
az.plot_trace(data, combined=True)

I would expect that with larger N the epistemic uncertainty would decrease, so that we would eventually obtain \mathbb{E}[\mu_\theta\vert\mathcal{D}]\approx210 and \mathbb{E}[\sigma_\theta\vert\mathcal{D}]\approx7 with low variance, where \mathcal{D} is the data used for calibration.

Marginal Posteriors

What we can see is that only \mu is quite well estimated. Additionally, the plots indicate that we get a lot of divergences.

So far I have tested different prior distributions without success. I suspect it’s a fundamental problem that I’ve overlooked.

Any further advice is very welcome.

Thanks in advance
Sven

Hello Sven :slight_smile:

Always nice to get questions with a nice reproducible example!

I think the problem you are facing comes from the fact that you are trying to estimate mu_theta and sigma_theta, but you only have information about a single draw from that distribution, so you can hardly learn anything about the parameters of that distribution, and you end up with a posterior that is very non-Gaussian.

This is mathematically well defined, but it gives the sampler a very hard time, which is why you see the divergences. You could try to work around this by finding a parametrization in which the posterior is easier to sample (or increase target_accept to something like 0.99 to get at least something that approximates the true posterior a bit better), but I’m not sure how much you can gain by finding the posterior of a distribution for which you only ever observed a single draw. Are you sure that’s what you want to estimate?

> I would expect that with larger N the epistemic uncertainty would decrease, so that we would eventually obtain E[μθ|D]≈210 and E[σθ|D]≈7 with low variance, where D is the data used for calibration.

I don’t think that’s what’s going to happen here. If you increase the amount of data, you get a better estimate of that single draw from the theta distribution, but you still only ever work with one draw, so you can never meaningfully learn the parameters of the distribution itself.
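Here is a numpy-only sketch of that point (my own construction, not PyMC; the seed is arbitrary). It compares the residual spread after fitting a single best-fit theta in two scenarios: one shared draw for the whole data set versus one draw per observation. Only in the second case does anything beyond the measurement noise remain for sigma_theta to explain:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
N = 5000
x1 = rng.uniform(1.0, 1.5, N)   # force [kN]
x2 = rng.uniform(0.5, 2.0, N)   # length [m]

def f_deflection(force, length, e):
    return (force * (length * 1000) ** 3) / (30 * e * 1000 * 320.)

# y is linear in 1/theta: y = c / theta + eps, with c = f(x, 1)
c = f_deflection(x1, x2, 1.0)
eps = rng.normal(size=N) * 0.1

# Scenario A: one shared draw of theta for all observations
theta_shared = rng.normal(210.0, 7.0)
y_shared = c / theta_shared + eps

# Scenario B: an independent draw of theta per observation
theta_per_obs = rng.normal(210.0, 7.0, N)
y_per_obs = c / theta_per_obs + eps

def resid_std(y):
    """Std of the residuals after the best scalar fit y ~ c * beta."""
    beta = (c @ y) / (c @ c)
    return float(np.std(y - c * beta))

# With a shared theta the residual spread collapses to the measurement
# noise (~0.1), leaving nothing to inform sigma_theta; with
# per-observation draws the unit-to-unit spread survives in the residuals.
```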


Or do you maybe mean something like this, where we use a different draw for theta for each observation?

    theta = pm.Normal("theta", mu=mu_theta, sigma=sigma_theta, shape=len(x_exp))

    # Expected value of outcome
    mu_delta = f_deflection(x_exp[:, 0], x_exp[:, 1], theta)

Hello aseyboldt,

thank you for your detailed answer. I had missed the shape parameter, and this solves my problem!

In my example dataset every observation y_{exp} is created with a different realization from \theta.

The fixed model now looks like this:

with pm.Model() as basic_model:

    # Priors for unknown distribution parameters
    mu_theta = pm.Normal("mu", mu=190.0, sigma=30)
    sigma_theta = pm.HalfNormal("sigma", sigma=4)

    # create a batch with N independent draws
    theta = pm.Normal("theta", mu=mu_theta, sigma=sigma_theta, shape=len(x_exp))

    # Expected value of outcome
    mu_delta = f_deflection(x_exp[:, 0], x_exp[:, 1], theta)

    # Likelihood
    Y_obs = pm.Normal("Y_obs", mu=mu_delta, sigma=exp_noise, observed=y_exp)

Now, with the new model, we see that \sigma_\theta is estimated far better than before and we get no more divergences (here I used only N=500, in contrast to the N=5000 before).
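As an MCMC-free cross-check that the per-observation data really does identify \sigma_\theta, here is a rough moment-matching sketch (my own construction, not part of the PyMC model above): since y=c/\theta+\epsilon with c=f(\mathbf{x}, 1), the delta method gives \mathrm{Var}(y\vert\mathbf{x})\approx(c/\mu_\theta^2)^2\sigma_\theta^2+\sigma_\epsilon^2, which can be fitted by least squares:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 5000
exp_noise = 0.1

def f_deflection(force, length, e):
    return (force * (length * 1000) ** 3) / (30 * e * 1000 * 320.)

theta = rng.normal(210.0, 7.0, N)   # one draw per observation
x1 = rng.uniform(1.0, 1.5, N)
x2 = rng.uniform(0.5, 2.0, N)
c = f_deflection(x1, x2, 1.0)       # so y = c / theta + eps
y = c / theta + rng.normal(size=N) * exp_noise

# Step 1: mu from a least-squares fit of y ~ c * (1 / mu)
beta = (c @ y) / (c @ c)
mu_hat = 1.0 / beta

# Step 2: delta method: Var(y | x) ~ (c / mu^2)^2 * sigma^2 + exp_noise^2,
# so regress the excess squared residuals on (c / mu^2)^2 through the origin
resid2 = (y - c * beta) ** 2
g2 = (c / mu_hat ** 2) ** 2
sigma2_hat = ((resid2 - exp_noise ** 2) @ g2) / (g2 @ g2)
sigma_hat = float(np.sqrt(sigma2_hat))
```

This is only a rough point estimate, but it lands close to the ground truth (\mu_\theta=210, \sigma_\theta=7), which confirms that the data now carries the information the hierarchical model needs.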

Note: I tried to edit my original post to fix some minor errors, but this was not possible. The units corresponding to the parameters of the first formula should be: force [N], length [mm], and Young’s Modulus [MPa].

Best regards
Sven
