Custom naming of prefixed output variables

Is there a way of renaming the output variables when there are multiple variables with the same prefix?

For example, if I am inferring the parameter mu with shape=3, my pm.summary call outputs, as expected:

       mean    sd  mc_error  hpd_2.5  hpd_97.5    n_eff  Rhat
mu_0  -0.19  0.44      0.01    -1.00      0.70   972.71   1.0
mu_1  -0.19  0.40      0.01    -0.93      0.63  1196.89   1.0
mu_2   0.18  0.38      0.01    -0.59      0.90   841.64   1.0

Is there a way of renaming these with custom suffixes from a list of strings, e.g. ['X', 'Y', 'Z'], which would output mu_X, mu_Y, mu_Z as opposed to the default 0, 1, 2?

Or can you give them new names entirely once the sampling is completed, so when calling pm.summary or pm.traceplot you can map the labels so mu_0 is displayed as X, mu_1 is Y, and mu_2 is Z?

BUMP: Came here wanting to ask the same question. Working in the context of a GLM with a vector of beta coefficients, it would be very useful to be able to rename these variables according to column names of a design matrix.

PyMC3 uses another library, ArviZ, for diagnostics, stats and plots. You get finer control if you use the ArviZ functions directly. In particular, you want to set the dims and coords for an InferenceData object; please see the ArviZ documentation.

Plots will display the coord name. Showing the coord name within the summary table is a work in progress.
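In the meantime, since pm.summary returns a pandas DataFrame, one possible workaround is to relabel its index after the fact. This is only a minimal sketch: the DataFrame below is a stand-in built from the numbers in the question (not a real pm.summary call), the suffix list is hypothetical, and it assumes the index labels look like mu_0, mu_1, mu_2 as in the table above.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pm.summary(trace);
# the values are copied from the table in the question.
summary = pd.DataFrame(
    {"mean": [-0.19, -0.19, 0.18], "sd": [0.44, 0.40, 0.38]},
    index=["mu_0", "mu_1", "mu_2"],
)

# Hypothetical custom suffixes, in the same order as the elements of mu.
suffixes = ["X", "Y", "Z"]

# Map each default label (mu_0, mu_1, ...) to its custom label (mu_X, mu_Y, ...).
mapping = {f"mu_{i}": f"mu_{s}" for i, s in enumerate(suffixes)}

# rename returns a new DataFrame; labels not found in the mapping are left unchanged.
renamed = summary.rename(index=mapping)
print(renamed)
```

This only changes the labels of the printed table. It does not change the variable names stored in the trace, so plots would still need dims and coords.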


Thanks. I think this is zooming in on what I’m after, although I found the explanation of dims and coords in the docs rather concise and abstract, and I’ve not been able to figure out how it works. I’d be grateful if you could expand a bit on how to use those arguments.

Minimum working example

import pymc3 as pm
import arviz as az
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({"group": ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"],
                     "sex": ["m", "m", "m", "w", "w", "w", "m", "m", "m", "w", "w", "w"],
                     "participant": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                     "y": [1, 2, 3, 5, 6, 7, 0, 1, 2, 3, 4, 5]})

formula = "y ~ group + sex + C(participant)"
y, X = patsy.dmatrices(formula, data)
labels = X.design_info.column_names
y, X = np.asarray(y), np.asarray(X)
n_rows, n_cols = X.shape

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, sd=10, shape=(n_cols,1))
    η = pm.math.dot(X,beta)
    sd = pm.HalfNormal("sd", sd=1)
    y_obs = pm.Normal("y_obs", mu=η , sd=sd, observed=y)
    prior = pm.sample_prior_predictive()
    trace = pm.sample()

then

pm_data = az.from_pymc3(trace=trace, prior=prior)
az.plot_forest(pm_data)

which gives a forest plot with the default beta labels.

Ideally I’d be able to overwrite those beta values with labels which are

['Intercept',
 'group[T.b]',
 'sex[T.w]',
 'C(participant)[T.2]',
 'C(participant)[T.3]']

I tried a few things, but I just can’t figure out how to use the dims and coords arguments in az.from_pymc3.

This would work:

dims = {'beta':['label_names']}
coords = {'label_names':labels}
idata = az.from_pymc3(trace=trace, prior=prior, coords=coords, dims=dims)
az.plot_forest(idata);

I tend to see the InferenceData object named idata, so I use that here, but of course you can keep pm_data.
Your beta variable can be one-dimensional rather than two-dimensional. ArviZ still works with the two-dimensional version: it automatically assigns names for the second dimension. Usually, though, when I use dims and coords I specify all the dimensions of each variable.
In more complicated models you might end up with two or more variables for each sub-population, for example an attack and a defence rating for each team in a sports model. In that case you give each variable the same dimension and provide the label names for that dimension just once.

dims = {'team_attack':['team_names'], 'team_defence':['team_names']}
coords = {'team_names':team_names}
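
To see the shared dimension in action without fitting a model, here is a rough sketch that builds an InferenceData object from random numbers with az.from_dict. The team names and the draws are made up for illustration, and it assumes an ArviZ version in which az.from_dict accepts posterior, dims and coords keyword arguments, the same generation of ArviZ that provides az.from_pymc3.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical team labels, used only for this illustration.
team_names = ["Reds", "Blues", "Greens"]

# Fake posterior draws with shape (chain, draw, team); stand-ins for real samples.
posterior = {
    "team_attack": rng.normal(size=(2, 100, 3)),
    "team_defence": rng.normal(size=(2, 100, 3)),
}

# Both variables point at the same named dimension,
# and the labels for that dimension are given once in coords.
dims = {"team_attack": ["team_names"], "team_defence": ["team_names"]}
coords = {"team_names": team_names}
idata = az.from_dict(posterior=posterior, dims=dims, coords=coords)

# The labelled dimension can now be used to select by name rather than by index.
blues_attack = idata.posterior["team_attack"].sel(team_names="Blues")
print(blues_attack.shape)
```

With real samples, the same dims and coords dictionaries would be passed to az.from_pymc3 as in the beta example above.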

Thanks, that’s very useful.

And yes, it works with beta = pm.Normal("beta", mu=0, sd=10, shape=n_cols) :+1:

Looking forward to when it works with az.summary(idata) also :slight_smile:
