Custom naming of prefixed output variables

cwilliam · May 29, 2019, 2:29pm

Is there a way of renaming the output variables when there are multiple variables with the same prefix?

For example, if I am inferring the parameter mu with shape=3, my pm.summary call outputs, as expected:
       mean    sd  mc_error  hpd_2.5  hpd_97.5    n_eff  Rhat
mu_0  -0.19  0.44      0.01    -1.00      0.70   972.71   1.0
mu_1  -0.19  0.40      0.01    -0.93      0.63  1196.89   1.0
mu_2   0.18  0.38      0.01    -0.59      0.90   841.64   1.0

Is there a way of renaming these with custom suffixes from a list of strings, e.g. ['X', 'Y', 'Z'], which would output mu_X, mu_Y, mu_Z, as opposed to the default 0, 1, 2?

Or can you give them new names entirely once the sampling is completed, so when calling pm.summary or pm.traceplot you can map the labels so mu_0 is displayed as X, mu_1 is Y, and mu_2 is Z?

drbenvincent · May 22, 2020, 11:15am

BUMP: Came here wanting to ask the same question. Working in the context of a GLM with a vector of beta coefficients, it would be very useful to be able to rename these variables according to column names of a design matrix.

nkaimcaudle · May 22, 2020, 11:50pm

PyMC3 uses another library, ArviZ, for diagnostics, stats and plots. You get finer control if you use the ArviZ functions directly. In particular, you want to set the dims and coords for an InferenceData object; please see here.

Plots will display the coord names; showing the coord names within the summary table is a work in progress.
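
In the meantime, one possible workaround for the table: pm.summary returns a pandas DataFrame, so the rows can be relabelled after the fact. This is only a sketch. The DataFrame below is a hand-built stand-in for the real summary (values copied from the first post, two columns only), and the mu_0 style row names match the output shown there; newer versions may label rows differently (e.g. mu[0]), so adjust the keys accordingly.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pm.summary(trace);
# values copied from the table in the first post (subset of columns).
summary = pd.DataFrame(
    {"mean": [-0.19, -0.19, 0.18], "sd": [0.44, 0.40, 0.38]},
    index=["mu_0", "mu_1", "mu_2"],
)

# Custom suffixes, in the same order as the default integer indices.
suffixes = ["X", "Y", "Z"]

# Build a mapping from the default row names to the custom ones:
# "mu_0" -> "mu_X", "mu_1" -> "mu_Y", "mu_2" -> "mu_Z".
new_names = {f"mu_{i}": f"mu_{s}" for i, s in enumerate(suffixes)}

# rename returns a new DataFrame; the original summary is left untouched.
renamed = summary.rename(index=new_names)
print(renamed)
```

This only changes the displayed table, not the trace itself, so plots still need the dims and coords route.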

drbenvincent · May 26, 2020, 9:51am

Thanks. I think this is zooming in on what I’m after, although I found the explanation in the docs about dims and coords rather concise and abstract, and I’ve not been able to figure out how it works. I’d be grateful if you could expand a bit on how to use those arguments.

Minimum working example

import pymc3 as pm
import arviz as az
import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({"group": ["a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b"],
                     "sex": ["m", "m", "m", "w", "w", "w", "m", "m", "m", "w", "w", "w"],
                     "participant": [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
                     "y": [1, 2, 3, 5, 6, 7, 0, 1, 2, 3, 4, 5]})

formula = "y ~ group + sex + C(participant)"
y, X = patsy.dmatrices(formula, data)
labels = X.design_info.column_names
y, X = np.asarray(y), np.asarray(X)
n_rows, n_cols = X.shape

with pm.Model() as model:
    beta = pm.Normal("beta", mu=0, sd=10, shape=(n_cols,1))
    η = pm.math.dot(X,beta)
    sd = pm.HalfNormal("sd", sd=1)
    y_obs = pm.Normal("y_obs", mu=η , sd=sd, observed=y)
    prior = pm.sample_prior_predictive()
    trace = pm.sample()

then

pm_data = az.from_pymc3(trace=trace, prior=prior)
az.plot_forest(pm_data)

which gives a forest plot with the default labels for the beta coefficients.

Ideally I’d be able to replace those default beta labels with the design matrix column names, which are

['Intercept',
 'group[T.b]',
 'sex[T.w]',
 'C(participant)[T.2]',
 'C(participant)[T.3]']

I tried a few things, but I just can’t figure out how to use the dims and coords arguments in az.from_pymc3.

nkaimcaudle · May 26, 2020, 10:31am

This would work

dims = {'beta':['label_names']}
coords = {'label_names':labels}
idata = az.from_pymc3(trace=trace, prior=prior, coords=coords, dims=dims)
az.plot_forest(idata);

I tend to see the InferenceData object named idata, so I use that, but of course you can use pm_data.
Your beta variable can be one-dimensional rather than two-dimensional. ArviZ will still work with the two-dimensional version; it will auto-assign the names for the second dimension. But usually when I use dims and coords I specify all the dimensions of each variable.
In more complicated models you might end up with two or more variables for each sub-population, say an attack and a defence rating for each team in sports. In this case you’d give each variable the same dimension and provide the label names for that dimension just once.

dims = {'team_attack':['team_names'], 'team_defence':['team_names']}
coords = {'team_names':team_names}

drbenvincent · May 26, 2020, 10:50am

Thanks, that’s very useful.

And yes, it works with beta = pm.Normal("beta", mu=0, sd=10, shape=n_cols)

Looking forward to when it works with az.summary(idata) also

ceebeelee · July 27, 2023, 12:36am

az.from_pymc3 does not exist anymore. I am trying to solve the same problem (I think) with more recent packages (pymc 5.6?)
I actually have a trace/idata (?) object returned from pm.sample.
I would like to rename beta[0], beta[1], beta[2], etc. to give them different names (or replace the integer index with a text suffix).

jessegrabowski · July 27, 2023, 4:12am

This discussion is quite out of date. You can now pass named dimensions to your model via the coords argument of pm.Model. See here for an example with some discussion.
