Hey everybody,
I’m trying to model the following (synthetically created) data with a simple hierarchical regression model, where p = 0 means that no predictor p is available for that particular y (the value 0 is my own coding and could be any other value). However, the data where p = 0 still carry information, since those observations have a significantly lower value of the dependent variable.
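For reference, here is a minimal sketch of how data with this structure might be generated (the column names clr, p and group match the model below; the coefficients, group sizes and noise levels are made up purely for illustration):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Group 0: p is observed and has a linear effect on clr (made-up coefficients)
n0 = 200
p0 = rng.uniform(0.5, 2.0, size=n0)
clr0 = 0.1 + 0.3 * p0 + rng.normal(0, 0.1, size=n0)

# Group 1: p is not available and coded as 0; clr is systematically lower
n1 = 200
clr1 = -0.2 + rng.normal(0, 0.1, size=n1)

data_all = pd.concat([
    pd.DataFrame({'clr': clr0, 'p': p0, 'group': 0}),
    pd.DataFrame({'clr': clr1, 'p': 0.0, 'group': 1}),
], ignore_index=True)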
The model I came up with so far is as follows:
import pymc3 as pm

with pm.Model() as hierarchical_model:
    # Priors on parameters (one intercept and one slope per group)
    alpha = pm.Normal('alpha', mu=0.1, sd=2, shape=2)
    beta = pm.Normal('beta', mu=0, sd=2, shape=2)
    # Linear predictor for y
    clr_est = alpha[data_all.group] + beta[data_all.group] * data_all.p
    # Model error
    eps = pm.HalfCauchy('eps', 5)
    # Data likelihood
    like = pm.Normal('like', mu=clr_est, sd=eps, observed=data_all.clr)
The inference I get looks reasonable, except that the slope parameter beta is also estimated in the group where p = 0. As a side effect, this seems to make the sampler converge more slowly.
Problem description
In the group of observations where p = 0, the slope parameter carries no useful information and I’d rather omit it from the model. However, I’m stuck at writing down a proper model for that case, since the parameter still needs to be estimated in the other group.
I would really appreciate any advice!
Best
Thies_DS
Is it possible to get rid of the data with p = 0 before letting pymc fit the data?
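For example, a minimal sketch of that (assuming data_all is a pandas DataFrame with the columns used above):

# Keep only the rows where p is actually observed; note that this throws away
# the information carried by the p = 0 group.
data_obs = data_all[data_all.p != 0].reset_index(drop=True)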
Why not just drop the beta term for that particular group?
import numpy as np
import pymc3 as pm

with pm.Model() as mod:
    # Separate intercepts for the observed-p and missing-p observations
    offset = pm.Normal('offset', 0, 1.)
    offset_missing = pm.Normal('offset_mis', 0, 1.)
    beta = pm.Normal('beta', 0, 0.5)
    err = pm.HalfNormal('error', 0.5)
    err_missing = pm.HalfNormal('error_uk', 0.5)
    # Likelihood for the rows where p is observed (intercept + slope)
    y_obs = pm.Deterministic('y_obs_', offset + beta * data.p.values[np.isfinite(data.p.values)])
    y_lik = pm.Normal('y_lik_', y_obs, err, observed=data.y.values[np.isfinite(data.p.values)])
    # Likelihood for the rows where p is missing (intercept only)
    mis_lik = pm.Normal('mis_lik_', offset_missing, err_missing, observed=data.y.values[np.isnan(data.p.values)])
    tr = pm.sample(1000, tune=750, chains=6, cores=2)
Hey @chartl,
thanks for the suggestion. Can you elaborate a little bit more on your approach? I cannot reproduce your results with my data.
Sorry for the delay, I should have added it in my first post:
synthetic_dataset1.csv (24.2 KB)
Switch on group rather than on missingness:
with pm.Model() as mod:
    # Hierarchical scales shared by the two intercepts
    o_scale = pm.HalfNormal('offset_scale', 0.1)
    offset = pm.Normal('offset', 0, o_scale)
    offset_missing = pm.Normal('offset_mis', 0, o_scale)
    be_scale = pm.HalfNormal('be_scale', 0.1)
    beta = pm.Normal('beta', 0, be_scale)
    err = pm.HalfNormal('error', 0.25)
    err_missing = pm.HalfNormal('error_uk', 0.25)
    # Group 0: p observed, intercept + slope
    y_obs = pm.Deterministic('y_obs_', offset + beta * data.p.values[data.group.values == 0])
    y_lik = pm.Normal('y_lik_', y_obs, err, observed=data.clr.values[data.group.values == 0])
    # Group 1: p coded as 0, intercept only
    mis_lik = pm.Normal('mis_lik_', offset_missing, err_missing, observed=data.clr.values[data.group.values == 1])
    tr = pm.sample(500, tune=1000, chains=4, cores=4, init='advi+adapt_diag')
def mytrace(trace):
    # Plot only the named parameters, skipping the '_'-suffixed deterministic/likelihood nodes
    vn = [x for x in trace.varnames if x[-1] != '_']
    pm.traceplot(trace, vn);

mytrace(tr)
That basically solves my problem, many thanks!
Two last questions:
- Why is y_obs modeled as a parameter, although you wrapped it in pm.Deterministic?
- Is this inference model scalable to large datasets?
Best
- Why is y_obs modeled as a parameter, although you wrapped it in pm.Deterministic?
pm.Deterministic doesn’t make y_obs a free parameter; it just records that transformed quantity in the trace. It probably would be better called y_pred or mu_y.
- Is this inference model scalable to large datasets?
Sure, but it would be clunky to implement in this way if you had several informatively-missing variables. For the more general case, I might write a helper function to generate a coefficient vector like so:
import theano.tensor as tt

def make_coef(dat, missing_idx, name='beta'):
    # Indicator vector: 1 where the predictor is observed, 0 where it is missing
    coef = tt.ones_like(dat)
    coef = tt.set_subtensor(coef[missing_idx], 0)
    # One slope per predictor; give each one a unique name when stacking several
    beta = pm.Normal(name, 0., 1.)
    return beta * coef
And then stack these.
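For instance, a minimal sketch of how that stacking might look (the predictor names p1 and p2, the NaN coding for missing values, and the extra name argument to make_coef are assumptions for illustration):

import numpy as np

# Hypothetical predictor columns that can be informatively missing (coded as NaN)
predictors = ['p1', 'p2']

with pm.Model() as general_mod:
    offset = pm.Normal('offset', 0., 1.)
    mu = offset
    for name in predictors:
        x = data[name].values
        missing_idx = np.where(np.isnan(x))[0]
        # Contribution is beta * x where p is observed and 0 where it is missing
        mu = mu + make_coef(np.nan_to_num(x), missing_idx, name=name + '_beta') * np.nan_to_num(x)
    # A separate offset per missingness pattern could be added analogously
    err = pm.HalfNormal('error', 0.5)
    y_lik = pm.Normal('y_lik', mu, err, observed=data.clr.values)
    tr_general = pm.sample(1000, tune=1000)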