Uncertainty Propagation according to GUM

jakevinolo · May 12, 2024, 9:46am

Hello everyone,
I may be asking a question that has already been answered somewhere, but I cannot clearly see in the previous posts, recommended books and videos that my issue is addressed. If that is the case, I would appreciate a pointer to the specific post/resource. But maybe PyMC cannot be used in this way?
Otherwise, here is my problem:

I have a metrological model of evaluation in the following form (it’s not the PyMC-model):
Y = X1 - X2
There are two input quantities, namely X1 and X2, and the primary measurement data on each input quantity is distributed according to a Poisson pmf with means x1_measure/x2_measure.

According to the Guide to the Expression of Uncertainty - Supplement 1, one could propagate the Poisson pmfs through the model of evaluation by drawing random samples from each Poisson pmf and inserting them into the model. This results in a probability density for the output variable values y. (This is, for example, what mcerp does.)

But I want to use the Posterior densities of the values of X1 and X2 in the model of evaluation, assuming a certain type of Prior density for the values of each quantity.
For Example:
X1: Prior = rectangular density, Likelihood = Poisson pmf, mean = x1_measure
X2: Prior = rectangular density, Likelihood = Poisson pmf, mean = x2_measure
→ Applying Bayes Theorem:
X1: Posterior = const * Likelihood * Prior and so forth for X2.

Then I want to sample from the Posterior density of X1 and X2. The sampled values are used in the evaluation model to estimate the posterior density of the values y.

(I had hoped to directly access the posterior of Y from PyMC.)

Afterwards, I need the posterior predictive density of Y, because I want to calculate quantiles from it.

I experimented with different examples from the PyMC website and recommended books, but was unsuccessful. Here is an example that illustrates my approach to this problem; however, I suspect that it is not implemented correctly in PyMC. Can somone help me?
Thanks a lot and best regards.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

# Initialize random number generator
RANDOM_SEED = 82345
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")

# True parameter values
x1_true = 2
x2_true = 1

# Size of dataset
trials = 1000

# Predictor variable
x1_data = np.random.poisson(x1_true, trials)
x2_data = np.random.poisson(x2_true, trials)

# Simulate output variable
y_data = x1_data-x2_data

# Show correlation plot
fig, axes = plt.subplots(1, 2, sharex=True, figsize=(10, 4))
axes[0].scatter(x1_data, y_data, alpha=0.6)
axes[1].scatter(x2_data, y_data, alpha=0.6)
axes[0].set_ylabel("y")
axes[0].set_xlabel("x1")
axes[1].set_xlabel("x2");

print(f"Running on PyMC v{pm.__version__}")

# Define pymc models
input_quantity1 = pm.Model()
input_quantity2 = pm.Model()
output_quantity = pm.Model()

with input_quantity1:
    # Priors for unknown model parameters    
    x1_prior = pm.Uniform('x1_prior', lower=0, upper=100)
    
    # Expected value of outcome    
    x1 = x1_prior
    
    # Likelihood (sampling distribution) of observations
    x1_obs = pm.Poisson("x1_obs", mu=x1, observed=x1_data)
    
    # Posterior sampling    
    x1_sample = pm.sample(1000, chains=4, cores=4)
    
with input_quantity2:
    # Priors for unknown model parameters    
    x2_prior = pm.Uniform('x2_prior', lower=0, upper=100)
    
    # Expected value of outcome    
    x2 = x2_prior
    
    # Likelihood (sampling distribution) of observations
    x2_obs = pm.Poisson("x2_obs", mu=x2, observed=x2_data)
    
    # Posterior sampling
    x2_sample = pm.sample(1000, chains=4, cores=4)

# Output of statistical data    
az.plot_trace(x1_sample)
print(az.summary(x1_sample))
az.plot_posterior(x1_sample)

az.plot_trace(x2_sample)
print(az.summary(x2_sample))
az.plot_posterior(x2_sample)
 
# Extract posterior density data
x1_posterior_liste = x1_sample.posterior['x1_prior'].values
x2_posterior_liste = x2_sample.posterior['x2_prior'].values

x1_posterior = np.array(x1_posterior_liste[0])  # Switch to numpy array
x2_posterior = np.array(x2_posterior_liste[0])

y_posterior_data = x1_posterior - x2_posterior

# Doing statistics on the y posterior data
mean = np.mean(y_posterior_data )
median = np.median(y_posterior_data )
std_dev = np.std(y_posterior_data )

print(f'Mean: {mean}')
print(f'Median: {median}')
print(f'Standard deviation: {std_dev}')

# Try to get the posterior of y values
with output_quantity:
    # Priors for unknown model parameters
    y_prior = pm.Uniform('y_prior', lower=0, upper=400)
    y_unc_prior = pm.Uniform('y_unc', lower=0, upper=400)
    
    # Expected value of outcome    
    y = y_prior
    
    # Likelihood (sampling distribution) of observations
    y_obs = pm.Normal("y_obs", mu=y, sigma=y_unc_prior, observed=y_posterior_data)
    
    # Posterior sampling   
    y_sample = pm.sample(1000, chains=4, cores=4)
    
# Output of statistical data     
az.plot_trace(y_sample)
print(az.summary(y_sample))
az.plot_posterior(y_sample)

jessegrabowski · May 12, 2024, 12:23pm

The difference of two Poisson RVs is distributed by the Skellam distribution. We have an implementation in pymc-experimental which you could directly use to model your data. Something like:

import pymc as pm
import pymc_experimental as pmx

 with pm.Model() as m:
    mu1 = pm.HalfNormal('mu1', 1)
    mu2 = pm.HalfNormal('mu2', 1)
    y_hat = pmx.distributions.Skellam('y_hat', mu1, mu2, observed=data)

jakevinolo · May 12, 2024, 5:10pm

Thanks for your reply.
The reason why I do not use the Skellam distribution is that my full model of evaluation is much more complicated:
Y = (X1/(X2X3)) * (X4/X5 - X6X7 - X8/X9)

In the full model mentioned above, only X4 and X8 are from Poisson distributions. All other variables are continuous with distributions such as rectangular, triangular, and normal. To simplify things at the beginning, I started with the model Y = X1-X2.

At first, I wanted to investigate whether the posterior probability density function, obtained by updating a sufficiently broad rectangular prior with a Poisson probability mass function as the likelihood, remains a Poisson distribution.

However, as far as I understand, PyMC provides the posterior for the parameter mu of the Poisson pmf in my example

    # Likelihood (sampling distribution) of observations
    x1_obs = pm.Poisson("x1_obs", mu=x1, observed=x1_data)

(See the full example in the end of the post.)

In this scenario, the standard deviation (sd) of the posterior appears to be very small due to the central limit theorem (?).


Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [x1_prior]
Sampling 4 chains, 0 divergences ━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 / 0:00:18
[Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 19 seconds.
           mean     sd  hdi_3%  hdi_97%  ...  mcse_sd  ess_bulk  ess_tail  r_hat
x1_prior  2.086  0.046   1.999     2.17  ...    0.001    1546.0    2893.0    1.0

Consequently, the obtained posterior does not correspond with the assumed prior. Based on the expectation value of the Poisson pmf denoted as N, the posterior should have a standard deviation of sqrt(N), but the PyMC output is much smaller. So, I conclude that my (pseudo) deterministic part of the PyMC model is wrong:

# Deterministic propagation
y = x1

What aspect am I misunderstanding?

Just to clarify my thought process: ideally, I would obtain the desired posterior density directly from the Markov formula. But it seems that the concept behind PyMC’s functioning is different, isn’t it? (I was hoping that I could formulate my model evaluation as a kind of regression task in PyMC.)

In my next step regarding my PyMC tests, I then attempted to implement the model Y = X1-X2 as described in the initial post.

####################################################

import arviz as az
import numpy as np
import pymc as pm

# Initialize random number generator
RANDOM_SEED = 82345
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")

# True parameter values
x1_true = 2

# Size of dataset
trials = 1000

# Predictor variable
x1_data = np.random.poisson(x1_true, trials)

input_quantity1 = pm.Model()

with input_quantity1:
    # Priors for unknown model parameters    
    x1 = pm.Uniform('x1_prior', lower=0, upper=100)
    
    # Deterministic propagation
    y = x1
    
    # Likelihood (sampling distribution) of observations
    x1_obs = pm.Poisson("x1_obs", mu=y, observed=x1_data)
    
    # Posterior sampling    
    x1_sample = pm.sample(1000, chains=4, cores=4)
    
az.plot_trace(x1_sample)
print(az.summary(x1_sample))
az.plot_posterior(x1_sample)

Topic		Replies	Views
Making predictions on model inputs based on observed outputs Questions	3	738	March 18, 2021
Propagation of uncertainty v5 modeling	7	716	February 1, 2023
Poisson likelihood over a Gaussian process Questions	4	1367	November 13, 2019
Odd results in model prediction using pymc.sample_posterior_predictive v5 linear_model , modeling	9	1206	September 24, 2022
Issue getting started with simple example Questions	1	717	November 18, 2018

Uncertainty Propagation according to GUM

Related topics