I have the following hierarchical model. The ADVI initialization converges at 33% and NUTS runs okay (in some instances I get 1 divergence/‘effective samples is smaller than 200 for some parameters’). The challenge is the runtime for NUTS. I understand this can run slow, but my model appears to be running quite a bit slower than others I’ve seen in examples, etc. It runs at ~1.7s/draw and takes ~4 hours to run on 4 cores of a Google cloud VM running linux.This is my first model in PyMC. Does anyone have any suggestions? The model is of real estate prices, with dwelling attributes, community attributes, and attributes of larger spatial areas. I was running it with 2000 draws but having some memory issues. I compare it against pooled/unpooled models. There are 35,879 observations.
xbarB = df.groupby('clusterID')['branch'].mean().values
xbarE = df.groupby('clusterID')['entropy'].mean().values
xbarA = df.groupby('ADA')['house_area'].mean().values
with pm.Model() as contextual_effect:
# Priors
sigma_aC = pm.HalfNormal('sigma_aC', 16)
sigma_aA = pm.HalfNormal('sigma_aA', 277)
# Cluster branch model for slope
gammaC = pm.Normal('gammaC', mu=0., sd=1, shape=10)
# Branch and entropy model for intercept
mu_aC = pm.Deterministic('mu_aC', gammaC[0] + gammaC[1]*size2 + gammaC[2]*size3 + gammaC[3]*size4 + gammaC[4]*size5 +gammaC[5]*size6
+ gammaC[6]*size7 + gammaC[7]*size8 + gammaC[8]*xbarB[clusterID] + gammaC[9]*xbarE[clusterID])
# Cluster variation not explained by cluster model
eps_aC = pm.Normal('eps_aC', mu=0, sd=sigma_aC, shape=nclusters)
aC = pm.Deterministic('aC', mu_aC + eps_aC[clusterID])
# ADA model for slope
gammaA = pm.Normal('gammaA', mu=0., sd=1, shape=4)
# ADA model for intercept
mu_aA = pm.Deterministic('mu_aA', gammaA[0] + gammaA[1] * branch_measure + gammaA[2] * entropy_measure + gammaA[3]*xbarA[ADA])
# # ADA variation not explained by entropy model
eps_aA = pm.Normal('eps_aA', mu=0, sd=sigma_aA, shape=nADA)
aA = pm.Deterministic('aA', mu_aA + eps_aA[ADA])
# Model error
sigma_y = pm.Uniform('sigma_y', lower=0, upper=100)
# Individual level slopes
b_area = pm.Normal('b_area', mu=0.004, sd=0.000018)
b_year = pm.Normal('b_year', mu=1.651504, sd=0.03)
b_house2 = pm.Normal('b_house2', mu=-2.378035, sd=0.1)
b_house3 = pm.Normal('b_house3', mu=-1.557363, sd=0.1)
b_house4 = pm.Normal('b_house4', mu=-1, sd=0.1)
# Expected value
y_hat = aC + aA + b_area * area \
+ b_year * year + b_house2 * house2 + b_house3 * house3 + b_house4 * house4
# Data likelihood
y_like = pm.Normal('y_like', mu=y_hat, sd=sigma_y, observed=price)
pm.model_to_graphviz(contextual_effect)
with contextual_effect:
contextual_effect_trace = pm.sample(1000, init='advi+adapt_diag')