Hello everyone!
I have come across some unexpectedly high memory usage whilst running a hierarchical model - so much so that my AWS Linux container (12GB of memory) is running out of memory.
Set-up:
2.1m observations
Params by tree level:
Level 0 - 3 params
Level 1 - 151 params
Level 2 - 576 params
Level 3 - 4587 params
Level 4 - 158199 params
Using float32, not float64
Running on CPU, not GPU
Using Theano
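As a quick sanity check on the parameter side, some back-of-the-envelope arithmetic (purely illustrative, using the counts above) suggests the posterior trace itself should be modest:

# Rough size of the posterior trace for one chain (float32 = 4 bytes per value)
level_params = [3, 151, 576, 4587, 158199]
total_params = sum(level_params)            # 163,516 parameters
trace_mb = total_params * 300 * 4 / 1e6     # 300 draws, float32
print(total_params, f"{trace_mb:.0f} MB")   # 163516, ~196 MB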
Code snippet:
with pm.Model() as bayesian_model:
    # Level 0: bounded (truncated) Normal coefficients
    b_0 = pm.Bound(pm.Normal, lower=hbr_lb, upper=hbr_ub)('b_0', mu=hbr_b_mu, sd=hbr_b_sig, shape=level_0_count)
    # Levels 1-4: each level's mean is indexed from its parent level's coefficients
    b_1 = pm.Bound(pm.Normal, lower=hbr_lb[level_1_link_zero - 1], upper=hbr_ub[level_1_link_zero - 1])('b_1', mu=b_0[level_1_link_prev - 1], sd=hbr_b_sig[level_1_link_zero - 1], shape=level_1_count)
    b_2 = pm.Bound(pm.Normal, lower=hbr_lb[level_2_link_zero - 1], upper=hbr_ub[level_2_link_zero - 1])('b_2', mu=b_1[level_2_link_prev - 1], sd=hbr_b_sig[level_2_link_zero - 1], shape=level_2_count)
    b_3 = pm.Bound(pm.Normal, lower=hbr_lb[level_3_link_zero - 1], upper=hbr_ub[level_3_link_zero - 1])('b_3', mu=b_2[level_3_link_prev - 1], sd=hbr_b_sig[level_3_link_zero - 1], shape=level_3_count)
    b_4 = pm.Bound(pm.Normal, lower=hbr_lb[level_4_link_zero - 1], upper=hbr_ub[level_4_link_zero - 1])('b_4', mu=b_3[level_4_link_prev - 1], sd=hbr_b_sig[level_4_link_zero - 1], shape=level_4_count)

    eps = pm.HalfCauchy('eps', beta=1)

    # Likelihood: each observation picks up its leaf-level coefficient
    Y_est = b_4[level_4 - 1] * X
    Y_like = pm.Normal('Y_like', mu=Y_est, sd=eps, observed=Y)

    idata = pm.sample(
        draws=300,
        tune=500,
        cores=1,
        chains=1,
        return_inferencedata=True,
        idata_kwargs={'log_likelihood': True})
As you can see, I am only taking 300 draws in total (a single chain) and 500 tuning steps.
I am also returning the log likelihood, since I need it for a later WAIC calculation. My understanding is that the log-likelihood array would be 1 chain * 300 draws * 2.1m observations * 4 bytes ≈ 2.5GB, so it's large but can't explain the container running out of memory (12GB).
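Spelled out (the float32 dtype for the stored pointwise log likelihood is an assumption on my part):

# Expected size of the pointwise log-likelihood array: (chains, draws, observations)
chains, draws, n_obs = 1, 300, 2_100_000
ll_gb = chains * draws * n_obs * 4 / 1e9   # 4 bytes per float32 value
print(f"{ll_gb:.2f} GB")                   # ~2.52 GB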
Are there other large data structures being kept in memory that could account for the high consumption, or is it perhaps an issue with the number of parameters I'm estimating (spread over several levels)?
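For reference, one way to inspect the actual in-memory size of each group after sampling (a sketch, assuming ArviZ's xarray-backed InferenceData, whose Datasets expose .nbytes):

# Print the in-memory footprint of every group in the returned InferenceData
for group in idata.groups():
    ds = getattr(idata, group)
    print(f"{group}: {ds.nbytes / 1e9:.2f} GB")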
Many thanks in advance!