Hierarchical Model - Slow Sampling

I have the following hierarchical model. The ADVI initialization converges at 33%, and NUTS runs okay (in some instances I get 1 divergence and the 'effective samples is smaller than 200 for some parameters' warning). The challenge is the runtime for NUTS. I understand it can run slowly, but my model appears to run quite a bit slower than others I've seen in examples: ~1.7 s/draw, taking about 4 hours on 4 cores of a Google Cloud VM running Linux. This is my first model in PyMC, so any suggestions are welcome. The model is of real estate prices, with dwelling attributes, community attributes, and attributes of larger spatial areas. I was running it with 2000 draws but had some memory issues. I compare it against pooled/unpooled models. There are 35,879 observations.

import pymc3 as pm

# Group-level means used as contextual covariates (df and the index/covariate
# arrays used below are defined earlier in the script)
xbarB = df.groupby('clusterID')['branch'].mean().values
xbarE = df.groupby('clusterID')['entropy'].mean().values
xbarA = df.groupby('ADA')['house_area'].mean().values

with pm.Model() as contextual_effect:

    # Priors
    sigma_aC = pm.HalfNormal('sigma_aC', 16)
    sigma_aA = pm.HalfNormal('sigma_aA', 277)

    # Cluster branch model for slope
    gammaC = pm.Normal('gammaC', mu=0., sd=1, shape=10)

    # Branch and entropy model for intercept
    mu_aC = pm.Deterministic('mu_aC', gammaC[0] + gammaC[1]*size2 + gammaC[2]*size3
                             + gammaC[3]*size4 + gammaC[4]*size5 + gammaC[5]*size6
                             + gammaC[6]*size7 + gammaC[7]*size8
                             + gammaC[8]*xbarB[clusterID] + gammaC[9]*xbarE[clusterID])

    # Cluster variation not explained by cluster model
    eps_aC = pm.Normal('eps_aC', mu=0, sd=sigma_aC, shape=nclusters)
    aC = pm.Deterministic('aC', mu_aC + eps_aC[clusterID])

    # ADA model for slope
    gammaA = pm.Normal('gammaA', mu=0., sd=1, shape=4)

    # ADA model for intercept
    mu_aA = pm.Deterministic('mu_aA', gammaA[0] + gammaA[1]*branch_measure
                             + gammaA[2]*entropy_measure + gammaA[3]*xbarA[ADA])

    # ADA variation not explained by entropy model
    eps_aA = pm.Normal('eps_aA', mu=0, sd=sigma_aA, shape=nADA)
    aA = pm.Deterministic('aA', mu_aA + eps_aA[ADA])

    # Model error
    sigma_y = pm.Uniform('sigma_y', lower=0, upper=100)

    # Individual-level slopes
    b_area = pm.Normal('b_area', mu=0.004, sd=0.000018)
    b_year = pm.Normal('b_year', mu=1.651504, sd=0.03)
    b_house2 = pm.Normal('b_house2', mu=-2.378035, sd=0.1)
    b_house3 = pm.Normal('b_house3', mu=-1.557363, sd=0.1)
    b_house4 = pm.Normal('b_house4', mu=-1, sd=0.1)

    # Expected value
    y_hat = (aC + aA + b_area*area + b_year*year
             + b_house2*house2 + b_house3*house3 + b_house4*house4)

    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sd=sigma_y, observed=price)

pm.model_to_graphviz(contextual_effect)

with contextual_effect:
    contextual_effect_trace = pm.sample(1000, init='advi+adapt_diag')

A few points, although I’m not sure how much they will help with the sampling speed.

  • gammaC looks like a vector of slope parameters on covariates. Can you change mu_aC to be a dot product, or at least part of it? That would make the code easier to read. The first column of the design matrix, corresponding to gammaC[0], could be a column of 1s (see the sketch after this list).

  • sigma_y needs a more regularising prior than a Uniform, e.g. pm.Exponential, pm.Gamma, or pm.HalfNormal.

  • Can you standardise the covariates? area, for instance, is on a much larger scale than the rest, which is why b_area is so much smaller. If possible, standardise the covariates to zero mean and unit standard deviation before modelling. One advantage of this is that all your slope parameters can then share the same (or very similar) priors, e.g. pm.Normal(0, 0.5). A sketch combining these three points follows below.
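
Something along these lines might work (a minimal sketch only: the design matrix X, the standardisation of the two contextual columns, and the HalfNormal scale of 10 are my assumptions, not code from your model):

import numpy as np
import pymc3 as pm

# Assumed layout: size2..size8, xbarB[clusterID] and xbarE[clusterID] are all
# per-observation arrays, stacked into one design matrix with a leading
# column of 1s for the intercept (this plays the role of gammaC[0])
X = np.column_stack([np.ones(len(df)), size2, size3, size4, size5, size6,
                     size7, size8, xbarB[clusterID], xbarE[clusterID]])
# Standardise the non-dummy columns to zero mean and unit sd
X[:, 8:] = (X[:, 8:] - X[:, 8:].mean(axis=0)) / X[:, 8:].std(axis=0)

with pm.Model() as contextual_effect:
    # One shared prior for the coefficient vector; the dot product replaces
    # the long sum of gammaC[i] * covariate terms
    gammaC = pm.Normal('gammaC', mu=0., sd=0.5, shape=X.shape[1])
    mu_aC = pm.Deterministic('mu_aC', pm.math.dot(X, gammaC))

    # A more regularising prior on the observation noise than Uniform(0, 100)
    sigma_y = pm.HalfNormal('sigma_y', 10)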

In terms of workflow, I would reduce the number of data points while developing the model: you can use df.sample(5000), tweak the model on this subset, and only then run on the entire dataset. You should also look at pair plots to see which parameters are highly correlated with one another.
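
For example (a sketch; ArviZ is assumed to be available for the pair plot, and random_state is only there for reproducibility):

import arviz as az

# Develop on a 5,000-row subset first
df_small = df.sample(5000, random_state=0)

# ... rebuild xbarB/xbarE/xbarA and the model from df_small, then:
with contextual_effect:
    trace_small = pm.sample(1000, init='advi+adapt_diag')

# Pair plot of a few parameters to spot strong posterior correlations
# (keep var_names short, otherwise the grid of panels gets unwieldy)
idata = az.from_pymc3(trace_small, model=contextual_effect)
az.plot_pair(idata, var_names=['gammaA', 'sigma_aC', 'sigma_aA', 'sigma_y'])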

I might have misread the model, but it seems like gammaC[0] and gammaA[0] both act as the global average/intercept? If so, they will be perfectly anti-correlated and will dramatically slow down sampling.
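
If that is the case, dropping one of them should help, e.g. (a sketch that keeps the global intercept in gammaC and leaves gammaA with only the three slope terms):

gammaA = pm.Normal('gammaA', mu=0., sd=1, shape=3)
mu_aA = pm.Deterministic('mu_aA', gammaA[0] * branch_measure
                         + gammaA[1] * entropy_measure
                         + gammaA[2] * xbarA[ADA])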


Thanks for the suggestions.

  • I changed to a dot product
  • I am testing these priors out
  • Most of the covariates were standardized, but I made some modifications here

I reduced the data to 10% of the observations and ran it again.

You are correct. I did have redundant intercepts! Thanks for the input. It now runs at 2-4 draws/s rather than 1-2 seconds/draw, which is certainly an improvement.

Good to hear, let us know if you have further issues.

On the technical side, Theano does not always pick up the best/optimised BLAS version; this depends a lot on the system and the installation. For me, launching the script with THEANO_FLAGS='blas.ldflags=-lblas' python script.py speeds up processing significantly. You can find more info (and additional tweaks) here: http://deeplearning.net/software/theano/troubleshooting.html
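
If you prefer to set the flag from inside the script rather than on the command line (a sketch; the value is the same flag as above and has to be set before Theano is first imported):

import os

# Must run before anything imports theano (pymc3 imports it on load)
os.environ["THEANO_FLAGS"] = "blas.ldflags=-lblas"

import pymc3 as pm  # imported only after the flag is set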
