ArviZ plot_hdi for a logarithmic predictor variable in log scale

I’m not 100% sure what your intended output is, but maybe knowing a bit about how to do DIY plot_lm and plot_hdi will be helpful?

To make nice plots, it can be helpful to swap out the data you fit the model on with sorted dummy data. An example of this is shown in the data containers how-to.

In your case I made some small changes to your model:

  • I set x_data to be a pm.MutableData, so it can be swapped out later.
  • I linked the observed variable’s shape to the shape of the input data.
  • I saved mu as a pm.Deterministic, because I knew I’d want to use it later.

I also re-defined x_ln to be draws from rng.normal instead of draws from rng.lognormal, because otherwise the variable should really have been called x_exp, and it was driving me nuts to take the log of a variable whose name already says log.
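For reference, here's roughly the setup I'm assuming. The original post defines rng, x_ln, and y_ln; the true parameter values below are made up for illustration, only the variable names match the rest of the post:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data -- the intercept/slope/noise values here are
# invented, only the names (x_ln, y_ln) match the code below.
n = 50
x_ln = rng.normal(loc=0.0, scale=1.0, size=n)  # plain normal draws, per the change above
true_intercept, true_slope = 1.0, 2.0
y_ln = true_intercept + true_slope * x_ln + rng.normal(scale=0.5, size=n)
```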

Here’s the model after my changes:

with pm.Model() as model_ln:  # model specifications in PyMC are wrapped in a with-statement
    x_data = pm.MutableData('x', x_ln)
    
    # Define priors
    sigma = pm.HalfCauchy("sigma", beta=10)
    intercept = pm.Normal("Intercept", 0, sigma=20)
    slope = pm.Normal("slope", 0, sigma=20)

    mu = pm.Deterministic('mu', intercept + slope * x_data)
    
    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sigma=sigma, observed=y_ln,
                           shape=x_data.shape[0])

    # Inference!
    # draw 3000 posterior samples using NUTS sampling
    idata_ln = pm.sample(3000)

Now you have the parameter estimates. Next, we want to make an evenly-spaced grid of data and generate predictions on that. Use pm.set_data:

with model_ln:    
    x_grid = np.linspace(x_ln.min(), x_ln.max(), 100)
    pm.set_data({'x':x_grid})
    idata_spaced = pm.sample_posterior_predictive(idata_ln, predictions=True)

Now we’re ready to plot. We need the mean and HDIs of idata_spaced. I don’t like to use az.extract here, because az.hdi requires the chain and draw dimensions, which extract flattens away.

One trick I like with ax.fill_between is to use *hdi.values.T. The DataArray returned by az.hdi has dimensions (y_obs_dim_1, hdi_low_high), with shape (n_obs, 2). So if you transpose and unpack it, you get two arrays: one for the lower bound and one for the upper bound, which is exactly what ax.fill_between expects. I also use this “trick” to get rng.choice to sample some regression lines for me (choice samples from the left-most dimension if you give it an array).
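As a standalone illustration of both tricks, here's a minimal NumPy sketch (no model needed; the arrays are stand-ins for what az.hdi and the stacked predictions would give you):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the (n_obs, 2) array az.hdi returns: column 0 is the lower
# bound, column 1 is the upper bound.
hdi_like = np.column_stack([np.zeros(5), np.ones(5)])

# Transpose to (2, n_obs) and unpack: lower and upper come out as two
# arrays, which is exactly the signature ax.fill_between(x, lower, upper) wants.
lower, upper = hdi_like.T
print(lower)  # [0. 0. 0. 0. 0.]
print(upper)  # [1. 1. 1. 1. 1.]

# rng.choice samples along the left-most axis by default, so a
# (n_samples, n_obs) array yields whole rows -- i.e. whole regression lines.
lines = rng.normal(size=(1000, 5))
subset = rng.choice(lines, size=30)
print(subset.shape)  # (30, 5)
```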

mu = idata_spaced.predictions.y.mean(dim=['chain', 'draw'])
hdi = az.hdi(idata_spaced.predictions).y

# I use .stack here, could also use `az.extract` to do the stacking
reg_line_samples = rng.choice(idata_spaced.predictions.stack(sample=['chain', 'draw']).mu.values.T,
                              size=30)

fig, ax = plt.subplots(1, 2, figsize=(14,4))
for axis, scale in zip(fig.axes, ['linear', 'log']):
    axis.plot(x_grid, mu, label='Posterior Mean', zorder=100, lw=2)
    axis.fill_between(x_grid, *hdi.values.T, alpha=0.25, label='HDI 94%')
    axis.scatter(x_ln, y_ln, label='Observed')
    for sample in reg_line_samples:
        line, = axis.plot(x_grid, sample, color='0.5', lw=0.5)

    # Trick to set only 1 entry in the legend (if you set label in the loop you'll get 30)
    line.set_label('Posterior Regression Sample')
    axis.set(xscale=scale, title=scale.title())
    
ax[0].legend()
plt.show()

Output:

(figure: two panels, “Linear” and “Log”, each showing the observed data, posterior mean, 94% HDI band, and 30 posterior regression samples)
