Dear community,
I have a linear regression problem in which I want to account for error propagation: the x variable carries uncertainties, and each y value to be fitted is really a distribution of values with associated probabilities.
The predictions I get from pm.sample_posterior_predictive(trace_final) do not seem reasonable.
Looking at all the fitted lines (essentially the sampled intercepts and slopes), shown as green lines (4000 samples) in the figure below together with the median of the MCMC-sampled lines (red line), the model seems to have found a strong linear relationship between x and y. I was therefore expecting the median of the predictions from pm.sample_posterior_predictive(trace_final) to follow a roughly linear relationship with the new x values; instead, the function behaves as if the model were constant and only predicts a single small value.
I am confused whether this behavior is actually correct or whether I am missing something.
The model predicts the same mean and the same standard deviation for all the new x values, even though the graph below shows a clear linear relationship between x and y; I was therefore expecting an increasing mean and a small standard deviation for the new predicted y values.
Just as a reference, the mean predicted y value is currently about 0.4 and the standard deviation is about 3.9.
Any comment or advice is highly appreciated!
Many thanks!
Ali
P.S. This is the code just as a reference
#### modeling part using the MutableData so that I could predict new y values given new x values
x = x_values
y = mag

with pm.Model() as model_final:
    x_final = pm.MutableData("x_final", x)
    y_mag = pm.MutableData("y_mag", y)

    err_odr = pm.HalfNormal('err_odr', 5.)
    err_param = pm.HalfNormal('err_param', 1.)

    # latent "true" x values and the measurement model for the observed x
    x_lat = pm.Normal('x_mu', 0, 5., shape=x_final.shape[0])
    x_obs = pm.Normal('x_obs', mu=x_lat, sigma=err_odr, observed=x_final, shape=x_final.shape[0])

    # linear model and likelihood for y
    offset = pm.Normal('Intercept', 0, err_param)
    slope = pm.Normal('Slope', 0, err_param)
    y_pred = pm.Deterministic('y_pred', offset + slope * x_lat)
    y_obs = pm.Normal('y', mu=y_pred, sigma=err_odr, observed=y_mag)

    trace_final = pm.sample(4000, tune=2500, chains=8, cores=8, init='jitter+adapt_diag', random_seed=42)
### prediction part
with model_final:
    pm.set_data({"x_final": [-2, -1, 1.5, 2.5, 4, 5, 7]})
    pred = pm.sample_posterior_predictive(trace_final)
### This is how I look at the std and mean values for each x_final value prediction
pred.posterior_predictive['y'].mean(('chain', 'draw'))
pred.posterior_predictive['y'].std(('chain', 'draw'))
Your x_lat inference will be completely ignored in posterior predictive because it depends on MutableData (this is the case for any variables with mutable dependencies). PyMC assumes this distribution may have changed completely since sampling and just resamples it from the prior.
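One way to see this concretely is to ask sample_posterior_predictive to also return the latent x (registered under the name "x_mu") and compare its spread with the posterior; a minimal sketch:
with model_final:
    # explicitly request x_mu so it shows up in the posterior predictive group
    ppc_check = pm.sample_posterior_predictive(trace_final, var_names=["x_mu", "y"])

# posterior draws of x_mu are informed by the observed x...
print(trace_final.posterior["x_mu"].std(("chain", "draw")))
# ...while the posterior-predictive draws come from the Normal(0, 5) prior
print(ppc_check.posterior_predictive["x_mu"].std(("chain", "draw")))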
Thank you for your reply!
From what you said, I understand (I'm a machine learning practitioner migrating to PyMC) that the training "fit", which is analogous to the posterior distributions learned during PyMC's fitting process, should carry over; but in my code, PyMC ignores these learned posteriors for the test data and falls back to the priors for prediction. How could I fix this so that it only uses the learned posterior distributions?
In other words, how would you suggest modifying the code so that I can still account for the uncertainty in x and use the model to predict the y values for new data points?
Many thanks!
Ali
Usually one does not add uncertainty to the predictors (or at least not individual predictor-specific uncertainty). The common GLM model looks like:
with pm.Model() as m:
    x = pm.MutableData("x", x_data)  # x_data: your observed predictor values
    intercept = pm.Normal("intercept")
    slope = pm.Normal("slope")
    error = pm.HalfNormal("error")
    mu = intercept + x * slope
    pm.Normal("y", mu=mu, sigma=error, observed=y_data)  # y_data: your observed outcomes
After you do inference you can change x and new ys will be resampled. Note that no unobserved variables depend on x, so they are not resampled from the prior. In your case, once you change x, the old variables inferred during sampling are no longer relevant (or potentially no longer relevant).
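For reference, prediction with this kind of model usually follows the set_data pattern below (a sketch; x_new stands in for the new predictor values):
with m:
    trace = pm.sample()  # fit intercept, slope and error

    pm.set_data({"x": x_new})  # x_new: placeholder for the new predictor values
    ppc = pm.sample_posterior_predictive(trace)

# y is redrawn for the new x, while intercept, slope and error keep their
# posterior draws, because none of them depend on the MutableData container.
print(ppc.posterior_predictive["y"].mean(("chain", "draw")))
If x_new has a different length than the training data, you may also need to give the likelihood an explicit shape that follows x (e.g. shape=x.shape) so the predicted y resizes with the new input.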
Thank you for the reply! I think the biggest advantage of the Bayesian approach over standard ML analysis is precisely that we can attach uncertainty to each of the predictors and estimate how the error propagates through the function we specify. So when I deploy this model, it should accept a measured x value along with its uncertainty.
Currently I am using another Python package, mcrep, to work around this, but the caveat is that the posterior distributions of the slope, intercept, and x values have to be assumed, even though these might not be the true distributions in reality.
Do you have any other suggestion for still accounting for the individual uncertainties of each x value while keeping it MutableData, so that I could predict the y values in PyMC itself?
Just make the uncertainty learnable, instead of trying to infer x (which you knew anyway):
with pm.Model() as m:
    x = pm.MutableData("x", [...])
    avg_x_noise = pm.HalfNormal("avg_x_noise")
    x_noisy = pm.Normal("x_noisy", mu=x, sigma=avg_x_noise, shape=x.shape)
    ...
Then inference will learn how much it can trust the input x, because even though x_noisy will be resampled in posterior predictive, avg_x_noise won't.
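Prediction can then reuse the same set_data pattern (assuming the "..." above is filled in with the intercept/slope/likelihood part and the model has been sampled into trace):
with m:
    pm.set_data({"x": x_new})  # x_new: placeholder for the new measured x values
    ppc = pm.sample_posterior_predictive(trace)

# x_noisy is redrawn as Normal(x_new, avg_x_noise) around the new inputs, while
# avg_x_noise (and the intercept and slope) keep their learned posterior draws.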
Thank you for the reply, I really appreciate it! I used your suggestion and modified my code; it now looks like the code below.
The problem is that it does not converge: I get 1911 divergences out of 2000 samples, even with target_accept=0.99. Any thoughts on this issue?
x = x_values
y = mag

with pm.Model() as model_final_mod:
    # x_final = pm.MutableData("x_final", x)
    x_final = pm.MutableData("x", x)
    avg_x_noise = pm.HalfNormal("avg_x_noise")
    x_noisy = pm.Normal("x_noisy", mu=x_final, sigma=avg_x_noise, shape=x_final.shape)
    # y_mag = pm.MutableData("y_mag", y)

    err_odr = pm.HalfNormal('err_odr', 5.)
    err_param = pm.HalfNormal('err_param', 1.)
    offset = pm.Normal('Intercept', 0, err_param)
    slope = pm.Normal('Slope', 0, err_param)
    y_pred = pm.Deterministic('y_pred', offset + slope * x_noisy)
    y_obs = pm.Normal('y', mu=y_pred, sigma=err_odr, observed=y)

    trace_final_mod = pm.sample(2000, tune=2500, chains=8, cores=8, init='jitter+adapt_diag',
                                random_seed=42, target_accept=0.99)
Does it get better if you remove err_param? Otherwise I would imagine your data may not actually follow a linear relationship, and your model then has too many variables it can use to explain the lack of fit.
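Concretely, removing err_param here would mean giving the intercept and slope fixed prior scales instead of a shared, learned one, for example (1.0 is just a placeholder value):
offset = pm.Normal('Intercept', 0, 1.0)  # fixed prior scale instead of err_param
slope = pm.Normal('Slope', 0, 1.0)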
Thank you, Ricardo! It actually reduced the number of divergences and it works just fine now!
Best!
Ali