Is there a way to change the labels of the distribution of coefficients coming out? For example, my sales forecast model is based on a hierarchy of 72 departments within one store. Instead of having ‘intercept_0, intercept_1, etc.’ on the trace plots and trace_to_dataframe output, is there a way to change those to ‘intercept_dept_a, intercept_dept_b, etc’?
This would make communicating actual results easier for business purposes. Thanks.
Apparently there are no built-in arguments that you can give to traceplot
to change the displayed labels from the variable’s name to something else.
Maybe @colcarroll knows a way, possibly using arviz.
If in the end there are no built-in ways, you can either:
- Get a DataFrame from the trace, rename the indexes as you like, and then hand it over to arviz.plot_trace
- Add deterministics to your model with your desired names.
Wow, this is exactly a use case for arviz. There’s an open PR to make this work automatically for you, but right now, I would first pip install arviz, and then (I’m guessing based on what you provided):
```python
import arviz as az
import pymc3 as pm

with my_model:
    ...
    trace = pm.sample()

data = az.from_pymc3(trace=trace,
                     coords={'departments': list_of_department_names},
                     dims={'intercept': ['departments']})
az.plot_trace(data, var_names='intercept')
```
Thanks.
I ran the code and it’s been churning for about 3 hours now. To give a little more context, here is my model:
```python
with pm.Model() as sales_model:
    # define the priors
    mu_a = pm.Normal('mu_a', mu=0, sd=10)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    # mu_d = pm.Normal('mu_d', mu=0, sd=10)
    # sigma_d = pm.HalfCauchy('sigma_d', 5)
    alpha = pm.Normal('intercept', mu=mu_a, sd=sigma_a, shape=n_dept)
    beta_2 = pm.Normal('IsHoliday_T', mu=0, sd=20)
    beta_3 = pm.Normal('Week', mu=0, sd=20)
    s = pm.HalfCauchy('sd', 5)
    # define the likelihood
    mu = alpha[dept_idx] + beta_2 * train['IsHoliday_True'].values + beta_3 * train['Week'].values
    y = pm.StudentT('sales', nu=5, mu=mu, sd=s,
                    observed=train['Weekly_Sales'], shape=train['Weekly_Sales'].shape)
    trace = pm.sample(draws=5000, cores=1, init='advi', progressbar=True)

print(sales_model.check_test_point())
trace_plot = pm.traceplot(trace[1000:], grid=True)
ppc = pm.plot_posterior(trace[1000:])
summary = pm.summary(trace)
posterior = pm.trace_to_dataframe(trace)
```
Here is the arviz code I ran:
```python
import arviz as az

with sales_model:
    data = az.from_pymc3(trace=trace,
                         coords={'departments': dept_names},
                         dims={'intercept': ['departments']})
az.plot_trace(data, var_names='intercept')
```
Is it normal to churn for that long?
Definitely should not churn for that long. Usually it is under 10s, unless I am feeding it funny data.
I should ask, how big is n_dept? The plot will make one subplot for each value in there, and matplotlib stops when you have more than, say, 100 subplots…
Can you share the output of data.posterior.intercept?
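If the subplot count turns out to be the problem, one workaround is to plot only a subset of the coordinate values via the `coords` argument of `az.plot_trace`. A sketch, with random numbers standing in for your posterior:

```python
import numpy as np
import arviz as az

dept_names = ['dept_{}'.format(i) for i in range(77)]  # placeholder labels
# Fake posterior with the same layout as yours: (chain, draw, department)
data = az.from_dict(posterior={'intercept': np.random.randn(2, 100, 77)},
                    coords={'departments': dept_names},
                    dims={'intercept': ['departments']})
# Restrict the plot to the first five departments instead of all 77
axes = az.plot_trace(data, var_names='intercept',
                     coords={'departments': dept_names[:5]})
```

The same `coords` selection works with your real `data` object from `az.from_pymc3`.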
There are 77 unique departments. I’ll have to rerun the model, which takes a few hours because I can’t get multiprocessing to work on Windows. I’ll post when it’s done, but I can tell you there are intercept_0…intercept_76.
Once you rerun the model, you should be able to save it as a netCDF file after the az.from_pymc3 call: data.to_netcdf("my_data.nc"). This is built on HDF5, and is a nice way of storing pymc3 traces on disk.
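For example, a round trip sketched with a toy InferenceData (az.from_pymc3 returns the same kind of object):

```python
import numpy as np
import arviz as az

# Toy posterior standing in for the real az.from_pymc3 output
data = az.from_dict(posterior={'intercept': np.random.randn(2, 100, 3)})
data.to_netcdf('my_data.nc')             # write the trace to disk
reloaded = az.from_netcdf('my_data.nc')  # read it back later without resampling
```

That way a multi-hour sampling run only has to happen once.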