Hi everyone,
I wanted to apply Bayesian linear regression to data where the predictor variable is log-scaled, and then visualize the uncertainty in the mean, using az.plot_hdi() as well. However, az.plot_hdi() doesn't give me the results I would have expected.
Here are my attempts to understand it:
Linear set-up, based on the example: GLM: Linear regression — PyMC 5.9.1 documentation
I started by calling az.plot_hdi() in addition to the az.plot_lm() used there.
Basically this worked, and I got the result I would have expected:
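For reference, the call looked roughly like this (a sketch using the variable names from the linked example, i.e. x, idata, and the y_model variable added to the posterior there):

import arviz as az
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# az.plot_lm as in the example, with az.plot_hdi drawn on top of the same axis
az.plot_lm(idata=idata, x=x, y="y", num_samples=30, axes=ax, y_model="y_model")
az.plot_hdi(x, idata.posterior["y_model"], hdi_prob=0.95, fill_kwargs={"label": "95% HDI"}, ax=ax)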
Then I wanted to work with the logarithmic data and changed the example set-up a little bit:
import numpy as np
import pandas as pd

rng = np.random.default_rng(8927)

size_ln = 50  # smaller dataset for more uncertainty around the mean
true_intercept_ln = 1
true_slope_ln = 2

x_ln = rng.lognormal(size=size_ln)  # instead of np.linspace(), for nicer data spacing on the log scale
# y = a + b*log10(x)
true_regression_line_ln = true_intercept_ln + true_slope_ln * np.log10(x_ln)
# add noise like in the example
y_ln = true_regression_line_ln + rng.normal(scale=0.5, size=size_ln)

data_ln = pd.DataFrame(dict(x=x_ln, y=y_ln))
With the following slightly changed model (logarithm in the likelihood):
from pymc import HalfCauchy, Model, Normal, sample

with Model() as model_ln:  # model specifications in PyMC are wrapped in a with-statement
    # Define priors
    sigma = HalfCauchy("sigma", beta=10)
    intercept = Normal("Intercept", 0, sigma=20)
    slope = Normal("slope", 0, sigma=20)

    # Define likelihood: mu is linear in log10(x)
    likelihood = Normal("y", mu=intercept + slope * np.log10(x_ln), sigma=sigma, observed=y_ln)

    # Inference!
    # draw 3000 posterior samples using NUTS sampling
    idata_ln = sample(3000)
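The plotting calls below use a y_model variable; as in the linked example, it is added to the posterior after sampling, here with log10(x) since that is what the model uses (assuming xarray is imported as xr):

import xarray as xr

# mean function per posterior sample, evaluated at the observed x values
idata_ln.posterior["y_model"] = (
    idata_ln.posterior["Intercept"] + idata_ln.posterior["slope"] * xr.DataArray(np.log10(x_ln))
)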
For the result, I wanted to use the same kind of figure as above. Since for my use cases we usually show the x-axis log-scaled, I tried that too; it is displayed on the left side. On the right side I kept the axis linearly scaled and plotted log10 of the x values instead (reduced code below).
fig, ax = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

# log: plot against x and log-scale the x-axis
az.plot_lm(idata=idata_ln, x=x_ln, y="y", num_samples=30, axes=ax[0], y_model="y_model", y_kwargs={"marker": "x", "markersize": 2}, y_model_mean_kwargs={"color": "green", "lw": 2}, y_model_fill_kwargs={"alpha": 0.01})
az.plot_hdi(x_ln, idata_ln.posterior["y_model"], color="red", fill_kwargs={"label": "95% HDI"}, hdi_prob=0.95, ax=ax[0])
ax[0].set_xscale("log")

# linear: plot against log10(x) on a linear x-axis
az.plot_lm(idata=idata_ln, x=np.log10(x_ln), y="y", num_samples=30, axes=ax[1], y_model="y_model", y_kwargs={"marker": "x", "markersize": 2}, y_model_mean_kwargs={"color": "green", "lw": 2}, y_model_fill_kwargs={"alpha": 0.01})
az.plot_hdi(np.log10(x_ln), idata_ln.posterior["y_model"], color="red", fill_kwargs={"label": "95% HDI"}, hdi_prob=0.95, ax=ax[1])
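One detail I wondered about (just a guess on my side): az.plot_hdi smooths the band by default (smooth=True), and that interpolation works on the raw x values, so it might behave differently for x_ln than for np.log10(x_ln). Switching it off should rule that out:

# same band without the default interpolation/smoothing, for comparison
az.plot_hdi(x_ln, idata_ln.posterior["y_model"], hdi_prob=0.95, smooth=False, color="blue", fill_kwargs={"label": "95% HDI, smooth=False"}, ax=ax[0])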
I would have expected a similar result in both panels. Therefore I tried to work with the posterior directly myself, and there the two versions do look similar:
ida = az.extract(idata_ln, group="posterior", combined=True, num_samples=30)
x = np.array([0.1, 10])  # two points suffice, since the model is linear in log10(x)

fig, ax = plt.subplots(1, 2, figsize=(12, 5), sharey=True)

# log
for i in range(30):
    ax[0].plot(x, ida["Intercept"][i].values + ida["slope"][i].values * np.log10(x), color="black", lw=0.5)
mu = idata_ln.posterior["Intercept"].mean().values + idata_ln.posterior["slope"].mean().values * np.log10(x)
ax[0].plot(x, mu, color="green", lw=2)
ax[0].set_xscale("log")

# linear
for i in range(30):
    ax[1].plot(np.log10(x), ida["Intercept"][i].values + ida["slope"][i].values * np.log10(x), color="black", lw=0.5)
ax[1].plot(np.log10(x), mu, color="green", lw=2)
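As another cross-check, the band can be computed directly with az.hdi on a dense grid and drawn with fill_between, bypassing az.plot_hdi entirely (a sketch; x_grid, post, mu_grid, and band are names I introduce here):

# evaluate the posterior mean function on a dense grid and take the HDI per grid point
x_grid = np.logspace(-1, 1, 200)                 # 0.1 ... 10, dense on the log scale
post = az.extract(idata_ln, group="posterior")   # flattens chains and draws
mu_grid = post["Intercept"].values[:, None] + post["slope"].values[:, None] * np.log10(x_grid)
band = az.hdi(mu_grid[None, ...], hdi_prob=0.95)  # az.hdi expects (chain, draw, ...); returns shape (200, 2)
ax[0].fill_between(x_grid, band[:, 0], band[:, 1], color="red", alpha=0.3, label="95% HDI (manual)")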
I worked with:
PyMC v5.6.1
ArviZ v0.16.1
Maybe I have a fundamental misunderstanding of this transformation set-up, or I am not using the methods appropriately. I really hope someone can find my mistake(s).
Thanks in advance!