Hierarchical Model Weibull Regression


I am trying to build a hierarchical regression model whose target variable is Weibull-distributed. The input is a user-specific feature vector and the target is a continuous variable.

A smaller version of the dataset has ~5k users.

from theano import shared
import theano.tensor as tt
import pymc3 as pm

# shared variables so the inputs can be swapped out later for prediction
X_group = shared(train_userid.astype(int))
X_ = shared(training_data_norm)

with pm.Model() as model:

    # hyperpriors for the user-level intercepts and slopes
    mu_a = pm.Normal('mu_a', mu=0., sigma=100)
    sigma_a = pm.HalfNormal('sigma_a', 5.)
    mu_b = pm.Normal('mu_b', mu=0., sigma=100)
    sigma_b = pm.HalfNormal('sigma_b', 5.)
    k = pm.HalfNormal('k', 5)
    # one intercept and one slope per user
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=n_users + 1)
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=n_users + 1)
    # softplus link keeps the Weibull scale positive;
    # tt.nnet.softplus is the numerically stable log(1 + exp(x))
    lamda = tt.nnet.softplus(a[X_group] + b[X_group] * X_[:, 4])
    # Likelihood (sampling distribution) of observations
    runningtime_obs = pm.Weibull('runningtime_obs', alpha=k,
                                 beta=lamda, observed=y_train)
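The softplus link, log(1 + exp(x)), needs care here: computed naively, exp(x) overflows for large x and the resulting inf propagates straight into the logp and gradient. A small numpy check (illustrative values only) of the naive form versus the stable one, np.logaddexp (tt.nnet.softplus in Theano):

```python
import numpy as np

def softplus_naive(x):
    # log(1 + exp(x)): exp(x) overflows for large x, giving inf
    return np.log(1.0 + np.exp(x))

def softplus_stable(x):
    # the same function computed as logaddexp(0, x); never overflows
    return np.logaddexp(0.0, x)

x = np.array([-5.0, 0.0, 5.0, 800.0])
with np.errstate(over="ignore"):
    naive = softplus_naive(x)
stable = softplus_stable(x)
print(naive)   # last entry is inf
print(stable)  # last entry stays finite (~800)
```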

I am sampling using:

with model:
    start = pm.find_MAP()
    trace = pm.sample(200, tune=200, target_accept=.9, start=start)

The logp and grad go to inf quickly. The same model without the hierarchy works well. Any suggestions or recommendations?


  • Do you get some warning / divergences from NUTS after sampling? This is usually very helpful to understand what’s going wrong.
  • Initializing your sampler at the MAP is pretty much discouraged now.
  • Have you done some prior predictive checks? The prior stds on your hierarchical means, mu_a and mu_b, seem super huge. With hierarchical models, regularizing priors are usually very important.
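To make the prior-scale point concrete, here is a quick numpy sketch (hypothetical, but mirroring the priors above) of the Weibull scale implied by sd=100 versus sd=1 on the hierarchical mean:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

def implied_scale(prior_sd):
    # draw the hierarchical mean, then a user-level intercept,
    # then push it through the softplus link to get the Weibull scale
    mu_a = rng.normal(0.0, prior_sd, size=n)
    sigma_a = np.abs(rng.normal(0.0, 5.0, size=n))  # HalfNormal(5.)
    a = rng.normal(mu_a, sigma_a)
    return np.logaddexp(0.0, a)  # stable log(1 + exp(a))

wide = implied_scale(100.0)   # the sd=100 prior from the model
tight = implied_scale(1.0)    # a regularized alternative
print(np.quantile(wide, 0.95))   # huge implied scales
print(np.quantile(tight, 0.95))  # much more moderate
```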

Hope this helps :vulcan_salute:

Thanks Alex for replying.

  • I didn’t get any divergence warnings during sampling.
  • I tried both variants (with and without MAP initialization); with MAP gave better results.
  • You are right, the prior stds were set very high. I reduced them to 1 and got improvements.
  • I also scaled the target variable to the [0, 1] range, as you suggested in another thread. That also resulted in an improvement.
  • I also changed the grouping in the hierarchy: I now define groups by the hour of the day, since I saw good variation in the target with respect to hour of the day.
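The scaling I used is essentially this (illustrative numbers, not the real data):

```python
import numpy as np

# hypothetical raw running times; a stand-in for y_train
y_train = np.array([120.0, 45.0, 3600.0, 300.0, 15.0])

# dividing by the max maps the target into (0, 1] and keeps every value
# strictly positive, which matters because the Weibull density is zero
# (logp = -inf) at y = 0 when the shape parameter exceeds 1
y_scaled = y_train / y_train.max()
print(y_scaled.min(), y_scaled.max())
```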

But after convergence, the results from the hierarchical model are worse than those from the non-hierarchical model. Is this expected?


That’s already a good sign! Necessary (but not sufficient) to get reliable posterior estimates.

I wouldn’t use MAP to initialize, unless you have a strong reason for doing it. Also, remember to update your PyMC3 install to the brand new 3.9.1 if you can. The NUTS initialization was improved again.

It depends on what you mean by “results”. If you’re talking about in-sample posterior predictions, this is expected and actually desirable: the hierarchical model is there to tame overfitting, so it will tend to trade some in-sample fit for better out-of-sample predictions.
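To see the trade-off concretely, here is a toy numpy sketch (a normal-normal setup with hypothetical numbers, not your Weibull model) comparing unpooled per-group means against partially pooled, shrunken estimates on held-out data:

```python
import numpy as np

def pooling_gap(seed, n_groups=24, n_obs=3, sigma2=4.0, tau2=1.0):
    # out-of-sample MSE of per-group means minus partially pooled means
    rng = np.random.default_rng(seed)
    true = rng.normal(0.0, np.sqrt(tau2), size=n_groups)
    noise = np.sqrt(sigma2)
    train = true[:, None] + rng.normal(0.0, noise, size=(n_groups, n_obs))
    test = true[:, None] + rng.normal(0.0, noise, size=(n_groups, 500))
    no_pool = train.mean(axis=1)  # a separate mean per group
    # partial pooling: precision-weighted shrinkage toward the grand mean
    w = (n_obs / sigma2) / (n_obs / sigma2 + 1.0 / tau2)
    partial = w * no_pool + (1.0 - w) * train.mean()
    mse = lambda est: np.mean((test - est[:, None]) ** 2)
    return mse(no_pool) - mse(partial)

gaps = [pooling_gap(seed) for seed in range(20)]
print(np.mean(gaps))  # positive: partial pooling predicts better out of sample
```

The per-group means always fit the training data at least as well (they minimize each group’s in-sample error by construction), yet the shrunken estimates win on the test data, which is exactly the pattern described above.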