Hello everyone,
I may be asking a question that has already been answered somewhere, but I cannot clearly see in the previous posts, recommended books and videos that my issue is addressed. If that is the case, I would appreciate a pointer to the specific post/resource. But maybe PyMC cannot be used in this way?
Otherwise, here is my problem:
I have a metrological model of evaluation in the following form (it’s not the PyMC-model):
Y = X1 - X2
There are two input quantities, namely X1 and X2, and the primary measurement data on each input quantity is distributed according to a Poisson pmf with means x1_measure/x2_measure.
According to the Guide to the Expression of Uncertainty - Supplement 1, one could propagate the Poisson pmfs through the model of evaluation by drawing random samples from each Poisson pmf and inserting them into the model. This results in a probability density for the output variable values y. (This is, for example, what mcerp does.)
But I want to use the Posterior densities of the values of X1 and X2 in the model of evaluation, assuming a certain type of Prior density for the values of each quantity.
For Example:
X1: Prior = rectangular density, Likelihood = Poisson pmf, mean = x1_measure
X2: Prior = rectangular density, Likelihood = Poisson pmf, mean = x2_measure
→ Applying Bayes Theorem:
X1: Posterior = const * Likelihood * Prior and so forth for X2.
Then I want to sample from the Posterior density of X1 and X2. The sampled values are used in the evaluation model to estimate the posterior density of the values y.
(I had hoped to directly access the posterior of Y from PyMC.)
Afterwards, I need the posterior predictive density of Y, because I want to calculate quantiles from it.
I experimented with different examples from the PyMC website and recommended books, but was unsuccessful. Here is an example that illustrates my approach to this problem; however, I suspect that it is not implemented correctly in PyMC. Can somone help me?
Thanks a lot and best regards.
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm
# Initialize random number generator
RANDOM_SEED = 82345
rng = np.random.default_rng(RANDOM_SEED)
az.style.use("arviz-darkgrid")
# True parameter values
x1_true = 2
x2_true = 1
# Size of dataset
trials = 1000
# Predictor variable
x1_data = np.random.poisson(x1_true, trials)
x2_data = np.random.poisson(x2_true, trials)
# Simulate output variable
y_data = x1_data-x2_data
# Show correlation plot
fig, axes = plt.subplots(1, 2, sharex=True, figsize=(10, 4))
axes[0].scatter(x1_data, y_data, alpha=0.6)
axes[1].scatter(x2_data, y_data, alpha=0.6)
axes[0].set_ylabel("y")
axes[0].set_xlabel("x1")
axes[1].set_xlabel("x2");
print(f"Running on PyMC v{pm.__version__}")
# Define pymc models
input_quantity1 = pm.Model()
input_quantity2 = pm.Model()
output_quantity = pm.Model()
with input_quantity1:
# Priors for unknown model parameters
x1_prior = pm.Uniform('x1_prior', lower=0, upper=100)
# Expected value of outcome
x1 = x1_prior
# Likelihood (sampling distribution) of observations
x1_obs = pm.Poisson("x1_obs", mu=x1, observed=x1_data)
# Posterior sampling
x1_sample = pm.sample(1000, chains=4, cores=4)
with input_quantity2:
# Priors for unknown model parameters
x2_prior = pm.Uniform('x2_prior', lower=0, upper=100)
# Expected value of outcome
x2 = x2_prior
# Likelihood (sampling distribution) of observations
x2_obs = pm.Poisson("x2_obs", mu=x2, observed=x2_data)
# Posterior sampling
x2_sample = pm.sample(1000, chains=4, cores=4)
# Output of statistical data
az.plot_trace(x1_sample)
print(az.summary(x1_sample))
az.plot_posterior(x1_sample)
az.plot_trace(x2_sample)
print(az.summary(x2_sample))
az.plot_posterior(x2_sample)
# Extract posterior density data
x1_posterior_liste = x1_sample.posterior['x1_prior'].values
x2_posterior_liste = x2_sample.posterior['x2_prior'].values
x1_posterior = np.array(x1_posterior_liste[0]) # Switch to numpy array
x2_posterior = np.array(x2_posterior_liste[0])
y_posterior_data = x1_posterior - x2_posterior
# Doing statistics on the y posterior data
mean = np.mean(y_posterior_data )
median = np.median(y_posterior_data )
std_dev = np.std(y_posterior_data )
print(f'Mean: {mean}')
print(f'Median: {median}')
print(f'Standard deviation: {std_dev}')
# Try to get the posterior of y values
with output_quantity:
# Priors for unknown model parameters
y_prior = pm.Uniform('y_prior', lower=0, upper=400)
y_unc_prior = pm.Uniform('y_unc', lower=0, upper=400)
# Expected value of outcome
y = y_prior
# Likelihood (sampling distribution) of observations
y_obs = pm.Normal("y_obs", mu=y, sigma=y_unc_prior, observed=y_posterior_data)
# Posterior sampling
y_sample = pm.sample(1000, chains=4, cores=4)
# Output of statistical data
az.plot_trace(y_sample)
print(az.summary(y_sample))
az.plot_posterior(y_sample)