Hi,
I am trying to solve a silly indexing issue and realize I’m not sure I understand pymc3 syntax. Suppose I want to model a y that is normally distributed, and have predictors x0 and x1. x1 has only sometimes been measured, but I do not wish to impute it. Instead, I want to define two likelihoods for y. The two likelihoods would have different linear models, the less complex one where x1 is missing has, instead of x1 as a predictor, an additional residual uncertainty (sigma). Thus, when complete data is available, I want to model y like so:
and in the case I only know x0:
In stan I can do this like so (with N observations, and x1_measured an indicator vector):
// likelihood, which we only evaluate conditionally
if(run_estimation==1){
for (i in 1:N) {
if (x1_measured[i]==1) y[i] ~ normal(mu[i] , sigma ) ;
if (x1_measured[i]==0) y[i] ~ normal(mu[i] , sigma+sigma_x1 ) ; // more uncertainty when non-measured
}
I tried this approach in pymc3 (after defining priors etc, idx_without_m is an array of indeces):
with pm.Model() as model:
# define priors...(not shown)
mu_with = a + beta_x0*x0 + beta_x1*x1
mu_without = a + beta_x0 * x0
y_likelihood_with = pm.Normal("y_with",mu_with, sigma, observed=np.array(dw)[idx_with_m])
y_likelihood_without = pm.Normal("y_lik_without",mu_without, sigma+sigma_m, observed=np.array(dw)[idx_without_m])
Is this “canonical”? Should I used Theano shared variables etc?