How to account for multiple betas in example documentation?

How would I account for for multiple beta parameters in the toy example :Prior and Posterior Predictive Checks — PyMC 4.4.0 documentation

In this situation, where I do not just have b, but b1, b2, … etc.

post = idata.posterior
mu_pp = post["a"] + post["b"] * xr.DataArray(predictor_scaled, dims=["obs_id"]

It depends on whether each coeffcient was a separate parameter or if you have a parameter array (e.g., betas=pm.Normal("betas, dims="obs_id")). If they are separate, you could just do something along these lines:

mu_pp = (
        post["a"] + 
        post["b1"] * xr.DataArray(predictor_1, dims=["obs_id"] +
        post["b2"] * xr.DataArray(predictor_2, dims=["obs_id"] +
        post["b3"] * xr.DataArray(predictor_3, dims=["obs_id"]

Following that example, however, it would be easier to just save the predicted means like this:

with pm.Model() as model_1:
    a = pm.Normal("a", 0.0, 0.5)
    b = pm.Normal("b", 0.0, 1.0)

    mu = pm.Deterministic("mu", a + b * predictor_scaled)
    sigma = pm.Exponential("sigma", 1.0)

    pm.Normal("obs", mu=mu, sigma=sigma, observed=outcome_scaled)
    idata = pm.sample_prior_predictive(samples=50, random_seed=rng)

And then you should be able to do something like this:

plt.scatter(predictor_scaled, post["mu"].mean(("chain", "draw")))

Not 100% on any of that code running as-is, but should get you started. If not, feel free to ask.

Thanks, super helpful! Could you recommend a procedure if the betas were all chained together, say:

with pm.Model() as model_1:
    a = pm.Normal("a", 0.0, 10.0)
    betas = pm.Normal('betas', mu=[40, 25, 55], sigma=5, dims='features')

Or should we stick to making discrete entries for each parameter (to make our lives easier later?)

Also in your example, where does idata come into play; it seems as an unused variable.

idata was in there just because it was leftover from the example notebook. Here is a little example that uses a parameter array of coefficients. It could be simplified further, but maybe it helps.

import pymc as pm
import numpy as np
import xarray as xr
import arviz as az
import matplotlib.pyplot as plt

nObs = 100
nPredictors = 3

xs = np.random.random(size = (nPredictors,nObs))
obs = 1 +, [1,2,3])

with pm.Model() as model:
    a = pm.Normal("a", 0.0, 10.0)
    betas = pm.Normal('betas', mu=0, sigma=5, shape=nPredictors)
    mu = pm.Deterministic("mu", a +, xs))
    pm.Normal("obs", mu=mu, sigma=1, observed=obs)
    idata = pm.sample()

post = idata.posterior
a = az.extract(idata.posterior, var_names="a", combined=True)
b = az.extract(idata.posterior, var_names="betas", combined=True)
mu_pp = np.tile(a,(nObs,1)) +, b)

plt.scatter(obs, np.mean(mu_pp, axis=1))
1 Like

Very cool – excited to play with this!

1 Like

I want to add the xarray version of this too if you don’t mind, which gets the same result as the numpy code above:

Provided you label your dimensions in the model:

with pm.Model() as model:
    x = pm.ConstantData("x", xs, dims=("predictor", "obs_id"))
    a = pm.Normal("a", 0.0, 10.0)
    betas = pm.Normal('betas', mu=0, sigma=5, dims="predictor")
    mu = pm.Deterministic("mu", a +, xs), dims="obs_id")
    pm.Normal("obs", mu=mu, sigma=1, observed=obs, dims="obs_id")
    idata = pm.sample()

You can then do:

a = az.extract(idata, var_names="a")
b = az.extract(idata, var_names="betas")

mu_pp = a +["x"], b, dims="predictor")

plt.scatter(obs, mu_pp.mean("sample"))

Note the definition of a and b would also work if used in the code above, extract defaults to returning combined variables from the posterior (unless a different group is specified).

Later, instead of indicating what dimension to reduce with the tiling, transposing or using integer axis ids, we do so by name. We reduce the predictor dimension in the dot product, and the sample dimension in the mean, so we are only left with the obs_id dim which is the one we want to plot.

1 Like