Model.plot_prior() wants to plot 200 priors for mu with a Normal likelihood?

Dear Bayesians,

having 200 observations in this Bambi model:

model = bmb.Model(
    "order_amount ~ 0 + age_group + numerical_predictor",
    family="gaussian",
    data=df_data
)

when I try

model.plot_priors(var_names=["mu"]);

Bambi plots 40 priors, starting like this:

Why isn’t it just plotting one single KDE for mu? It probably has something to do with the fact that it is modelled as a Deterministic?

Best regards
Matthias

There are 200 mus, as indicated in the graph; arviz caps the plot at 40.

Could you please explain why there are 200 mus (in the prior) and how they are generated?

I suspect mu is something like intercept + coefficients @ x, and the extra dimensions come from x
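That suspicion can be checked with a small NumPy sketch (hypothetical numbers and column layout, not the actual Bambi internals): mu is the design matrix applied to each draw of the coefficients, so there is one mu per observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_draws = 200, 1000

# hypothetical design matrix for "0 + age_group + numerical_predictor":
# two age_group dummy columns plus one numeric column
is_group_a = rng.integers(0, 2, size=n_obs)
X = np.column_stack([
    is_group_a,               # age_group == "A"
    1 - is_group_a,           # age_group == "B"
    rng.normal(size=n_obs),   # numerical_predictor
])

beta = rng.normal(size=(n_draws, 3))  # one coefficient vector per prior draw
mu = beta @ X.T                       # shape (n_draws, n_obs)
print(mu.shape)                       # (1000, 200)
```

Each of the 200 columns of `mu` is one of the per-observation prior distributions that get their own KDE.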

The other priors look like this:

So what you are saying is that, because mu is modelled as a Deterministic, Bambi repeats the following mechanism 200 times:

  • draw a value from one of the 2 age_group priors
  • draw a value from the numerical_predictor prior
  • add these two values
  • repeat this n times and create a KDE from it (how big is n per KDE?)

(In my example, there is no intercept, because I switched it off:

order_amount ~ 0 + age_group + numerical_predictor)

I tried to reconstruct the calculation of the results["prior"]["mu"] values. Would you agree that it happens roughly like this?

import numpy as np
import pandas as pd

n_samples = 1000

age_group_prior = results["prior"]["age_group"].values.flatten()
numerical_predictor_prior = results["prior"]["numerical_predictor"].values.flatten()

# sample from the priors
age_group_samples = np.random.choice(age_group_prior, size=n_samples, replace=True)
numerical_predictor_samples = np.random.choice(numerical_predictor_prior, size=n_samples, replace=True)

# sample from observed predictor data
numerical_predictor_data = df_data["numerical_predictor"].sample(n_samples, replace=True).values

# calculate values for mu
res = age_group_samples + numerical_predictor_samples * numerical_predictor_data

pd.DataFrame(res).hist(bins=50)

The histogram looks like one of the 40 histograms plotted by model.plot_priors(), so this should be the way it happens. I'm not completely sure about it, though.

Yes, but you don’t sample from df_data["numerical_predictor"]; it is used deterministically. What is happening is something like this:

import numpy as np
import pymc as pm

predictor = np.linspace(-1, 1, 200)
with pm.Model(coords={"obs": range(len(predictor))}) as m:
    coefficient = pm.Normal("coefficient", 0, 1)
    mu = pm.Deterministic("mu", coefficient * predictor, dims="obs")
    idata = pm.sample_prior_predictive(draws=1000)

pm.plots.plot_posterior(idata, group="prior")

With numpy, the way you generate a similar thing is to take 1000 draws of coefficient and then outer-multiply with the predictor to get mu draws with shape (1000, 200)
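The outer-multiplication described above can be sketched like this (plain NumPy, mirroring the PyMC snippet rather than reproducing its internals):

```python
import numpy as np

rng = np.random.default_rng(42)
predictor = np.linspace(-1, 1, 200)

# 1000 prior draws of the scalar coefficient ~ Normal(0, 1)
coefficient_draws = rng.normal(0, 1, size=1000)

# outer product: every coefficient draw times every predictor value
mu_draws = np.outer(coefficient_draws, predictor)
print(mu_draws.shape)  # (1000, 200)
```

Row k of `mu_draws` is the whole mu vector implied by the k-th coefficient draw; column i collects the 1000 draws of the i-th observation's mu.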


Every order_amount has a different mu, because every order_amount has a different value of age_group and numerical_predictor. What you are seeing is the same as if you did e.g. res.predict(X) in statsmodels (or predict mu in Stata).

Given my model:

      Formula: order_amount ~ 0 + age_group + numerical_predictor
        Family: gaussian
          Link: mu = identity
  Observations: 200
        Priors: 
    target = mu
        Common-level effects
            age_group ~ Normal(mu: [0. 0.], sigma: [74.4983 74.4983])
            numerical_predictor ~ Normal(mu: 0.0, sigma: 4.7408)
        
        Auxiliary parameters
            sigma ~ HalfStudentT(nu: 4.0, sigma: 14.8997)

I adjusted your code like this:

with pm.Model(coords={"obs": range(len(df_data))}) as m:
    coefficient = pm.Normal("coefficient", 0, 4.7408)
    age_group = pm.Normal("age_group", 0, 74.4983)
    mu = pm.Deterministic("mu", age_group + coefficient * df_data["numerical_predictor"].to_numpy(), dims="obs")
    idata = pm.sample_prior_predictive(draws=1000)

pm.plots.plot_posterior(idata, group="prior", var_names=["mu"]);

and the result pretty much looks like the output of model.plot_priors

Thank you!

Sorry for one last question here, just to make it completely clear to me what happens during the plotting. Would you agree that one of these 40 mu plots could be created by something like this?

What I want to understand is whether the plotting itself (in contrast to the generation of the idata["prior"] group with its mu data variable) has a sampling part of its own, i.e. whether it samples from this mu data variable and then estimates the KDE that is plotted.

No, the plotting is not doing any sampling. It is also definitely not raveling mu and mixing different dimensions.

Then how are the values for one of these mu KDE plots taken from idata["prior"]["mu"]? Something like this?

idata["prior"]["mu"][:, :, i] # for the i'th mu kde plot?

i.e.

import seaborn as sns

sns.kdeplot(pd.DataFrame(idata["prior"]["mu"][:, :, i].values.T))

given the shape

idata["prior"]["mu"].shape
(1, 500, 200)

Yes, although arviz works directly on the InferenceData, with the batch dimensions handled in a vectorized manner.

Under the hood, the 500 samples of each of the 200 mus are used separately to sketch the 200 KDE plots (of which arviz only bothers to plot the first 40 by default).
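A minimal sketch of that per-observation selection, using a random array as a stand-in for idata["prior"]["mu"] (the real object is an xarray DataArray, but the indexing logic is the same):

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for idata["prior"]["mu"].values: (chain, draw, obs) = (1, 500, 200)
mu = rng.normal(size=(1, 500, 200))

# the i-th KDE is built from the 500 draws of observation i alone --
# no raveling across the obs dimension
i = 0
values_for_kde = mu[:, :, i].ravel()
print(values_for_kde.shape)  # (500,)
```

Repeating this for i = 0..199 gives the 200 per-observation KDEs, of which only the first 40 are drawn by default.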
