Background:
I am working on a 2nd-degree polynomial regression model for, let’s say, predicting rainfall in a city using atmospheric pressure as a predictor. My dataset comes from various stations, some of which employ old barometers/rain gauges with low precision. This results in different standard deviations across my data for both X and Y.
Objective:
- Model uncertainties in both X and Y in a Bayesian framework.
- Accommodate the fact that data might not come from the same generating distribution.
Current Approach:
x1in = np.array([1.0, 2.0, 3.0, 4.0])
x1inerr = np.array([0.1, 0.2, 0.3, 0.1])
x2in2 = x1in**2
yin = np.array([1.1, 3.1, 8.0, 19])
yinerr = np.array([0.1, 0.1, 0.1, 0.1])
with pm.Model() as model:
x1 = pm.MutableData("x1", x1in)
x1err = pm.MutableData("x1err", x1inerr)
x2 = pm.Deterministic("x2", x1**2)
x2err = pm.Deterministic("x2err", np.abs(2 * x1 * x1err))
y = pm.MutableData("y", yin)
yerr = pm.MutableData("yerr", yinerr)
beta0 = pm.Normal("beta0", 0, 1)
beta1 = pm.Normal("beta1", 0, 1)
beta2 = pm.Normal("beta2", 0, 1)
# Latent true values
x1_true = pm.Normal("x1_true", mu=x1, sigma=x1err, shape=x1.size)
x2_true = pm.Normal("x2_true", mu=x2, sigma=x2err, shape=x2.size)
mu_est = pm.Deterministic("mu", beta0 + x1_true * beta1 + x2_true * beta2)
obs = pm.Normal("obs", mu_est, sigma=yerr, observed=y, shape=x1.shape)
idata = pm.sample()
Questions:
- Modeling with Different Standard Deviations:
Given the background information about the varying precision of barometers/rain gauges, how can I best model this situation within the Bayesian framework? - Posterior Predictions:
With my current approach I am facing difficulties in making posterior predictions. My attempt:
xnew = np.arange(0, 6.2, 0.2)
xerrnew = np.ones_like(xnew) * 0.1
with model:
pm.set_data({"x1": xnew, "x1err": xerrnew})
y_test = pm.sample_posterior_predictive(idata)
However, I am receiving an error related to shape misalignment. I noticed similar issues discussed in these threads:
- Odd results in model prediction using PyMC
sample_posterior_predictive
- Shape issues with
sample_posterior_predictive
in PyMC5 - Get out-of-sample posterior predictive for mean
However, I couldn’t find any thread that shows how to ensure the correct dimension/shape when dealing with additional sigma
information for both X and Y.