Hierarchical Model - Slow Sampling

I have the following hierarchical model. The ADVI initialization converges at 33%, and NUTS runs okay (in some instances I get 1 divergence and the 'effective samples is smaller than 200 for some parameters' warning). The challenge is the runtime for NUTS. I understand it can run slowly, but my model appears to run quite a bit slower than others I've seen in examples: ~1.7 s/draw, taking about 4 hours on 4 cores of a Google Cloud VM running Linux. This is my first model in PyMC, so any suggestions are welcome. The model is of real estate prices, with dwelling attributes, community attributes, and attributes of larger spatial areas. I was running it with 2000 draws but had some memory issues. I compare it against pooled/unpooled models. There are 35,879 observations.

import pymc3 as pm

# Group-level means used as contextual covariates (df and the index/covariate
# arrays used below are defined earlier in the script)
xbarB = df.groupby('clusterID')['branch'].mean().values
xbarE = df.groupby('clusterID')['entropy'].mean().values
xbarA = df.groupby('ADA')['house_area'].mean().values

with pm.Model() as contextual_effect:

    # Priors
    sigma_aC = pm.HalfNormal('sigma_aC', 16)
    sigma_aA = pm.HalfNormal('sigma_aA', 277)

    # Cluster branch model for slope
    gammaC = pm.Normal('gammaC', mu=0., sd=1, shape=10)

    # Branch and entropy model for intercept
    mu_aC = pm.Deterministic('mu_aC', gammaC[0] + gammaC[1]*size2 + gammaC[2]*size3
                             + gammaC[3]*size4 + gammaC[4]*size5 + gammaC[5]*size6
                             + gammaC[6]*size7 + gammaC[7]*size8
                             + gammaC[8]*xbarB[clusterID] + gammaC[9]*xbarE[clusterID])

    # Cluster variation not explained by cluster model
    eps_aC = pm.Normal('eps_aC', mu=0, sd=sigma_aC, shape=nclusters)
    aC = pm.Deterministic('aC', mu_aC + eps_aC[clusterID])

    # ADA model for slope
    gammaA = pm.Normal('gammaA', mu=0., sd=1, shape=4)

    # ADA model for intercept
    mu_aA = pm.Deterministic('mu_aA', gammaA[0] + gammaA[1]*branch_measure
                             + gammaA[2]*entropy_measure + gammaA[3]*xbarA[ADA])

    # ADA variation not explained by entropy model
    eps_aA = pm.Normal('eps_aA', mu=0, sd=sigma_aA, shape=nADA)
    aA = pm.Deterministic('aA', mu_aA + eps_aA[ADA])

    # Model error
    sigma_y = pm.Uniform('sigma_y', lower=0, upper=100)

    # Individual-level slopes
    b_area = pm.Normal('b_area', mu=0.004, sd=0.000018)
    b_year = pm.Normal('b_year', mu=1.651504, sd=0.03)
    b_house2 = pm.Normal('b_house2', mu=-2.378035, sd=0.1)
    b_house3 = pm.Normal('b_house3', mu=-1.557363, sd=0.1)
    b_house4 = pm.Normal('b_house4', mu=-1, sd=0.1)

    # Expected value
    y_hat = (aC + aA + b_area*area + b_year*year
             + b_house2*house2 + b_house3*house3 + b_house4*house4)

    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sd=sigma_y, observed=price)

pm.model_to_graphviz(contextual_effect)

with contextual_effect:
    contextual_effect_trace = pm.sample(1000, init='advi+adapt_diag')

A few points, although I’m not sure how much they will help with the sampling speed.

  • gammaC looks like a vector of slope parameters on covariates. Can you change mu_aC to be a dot product, or at least part of it? That would make the code easier to read. The first column of the design matrix, corresponding to gammaC[0], could be a column of 1s (see the sketch after this list).

  • sigma_y needs a more regularising prior than a Uniform, e.g. pm.Exponential, pm.Gamma, or pm.HalfNormal.

  • Can you standardise the covariates? area, for instance, is on a much larger scale than the rest, which is why b_area is so much smaller. If possible, standardise the covariates to zero mean and unit standard deviation before modelling. One advantage of this is that all your slope parameters can then share the same (or very similar) priors, e.g. pm.Normal(0, 0.5). A sketch combining these three points follows below.
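
Something along these lines might work (a minimal sketch only: the design matrix X, the standardisation of the two contextual columns, and the HalfNormal scale of 10 are my assumptions, not code from your model):

import numpy as np
import pymc3 as pm

# Assumed layout: size2..size8, xbarB[clusterID] and xbarE[clusterID] are all
# per-observation arrays, stacked into one design matrix with a leading
# column of 1s for the intercept (this plays the role of gammaC[0])
X = np.column_stack([np.ones(len(df)), size2, size3, size4, size5, size6,
                     size7, size8, xbarB[clusterID], xbarE[clusterID]])
# Standardise the non-dummy columns to zero mean and unit sd
X[:, 8:] = (X[:, 8:] - X[:, 8:].mean(axis=0)) / X[:, 8:].std(axis=0)

with pm.Model() as contextual_effect:
    # One shared prior for the coefficient vector; the dot product replaces
    # the long sum of gammaC[i] * covariate terms
    gammaC = pm.Normal('gammaC', mu=0., sd=0.5, shape=X.shape[1])
    mu_aC = pm.Deterministic('mu_aC', pm.math.dot(X, gammaC))

    # A more regularising prior on the observation noise than Uniform(0, 100)
    sigma_y = pm.HalfNormal('sigma_y', 10)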

In terms of workflow, I would reduce the number of data points while developing the model: you can use df.sample(5000), tweak the model on this subset, and only then run on the entire dataset. You should also look at pair plots to see which parameters are highly correlated with one another.
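
For example (a sketch; ArviZ is assumed to be available for the pair plot, and random_state is only there for reproducibility):

import arviz as az

# Develop on a 5,000-row subset first
df_small = df.sample(5000, random_state=0)

# ... rebuild xbarB/xbarE/xbarA and the model from df_small, then:
with contextual_effect:
    trace_small = pm.sample(1000, init='advi+adapt_diag')

# Pair plot of a few parameters to spot strong posterior correlations
# (keep var_names short, otherwise the grid of panels gets unwieldy)
idata = az.from_pymc3(trace_small, model=contextual_effect)
az.plot_pair(idata, var_names=['gammaA', 'sigma_aC', 'sigma_aA', 'sigma_y'])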

I might have misread the model, but it seems like gammaC[0] and gammaA[0] both act as the global average/intercept? If so, they will be perfectly anti-correlated and will dramatically slow down sampling.
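
If that is the case, dropping one of them should help, e.g. (a sketch that keeps the global intercept in gammaC and leaves gammaA with only the three slope terms):

gammaA = pm.Normal('gammaA', mu=0., sd=1, shape=3)
mu_aA = pm.Deterministic('mu_aA', gammaA[0] * branch_measure
                         + gammaA[1] * entropy_measure
                         + gammaA[2] * xbarA[ADA])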


Thanks for the suggestions.

  • I changed to a dot product
  • I am testing these priors out
  • Most of the covariates were standardized, but I made some modifications here

I reduced the data to 10% of the observations and ran it again.

You are correct. I did have redundant intercepts! Thanks for the input. It now runs at 2-4 draws/s rather than 1-2 seconds/draw, which is certainly an improvement.

Good to hear, let us know if you have further issues.

On the technical side, Theano does not always pick up the best/optimised BLAS version; this depends a lot on the system and the installation. For me, launching the script with THEANO_FLAGS='blas.ldflags=-lblas' python script.py speeds up processing significantly. You can find more info (and additional tweaks) here: http://deeplearning.net/software/theano/troubleshooting.html
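
If you prefer to set the flag from inside the script rather than on the command line (a sketch; the value is the same flag as above and has to be set before Theano is first imported):

import os

# Must run before anything imports theano (pymc3 imports it on load)
os.environ["THEANO_FLAGS"] = "blas.ldflags=-lblas"

import pymc3 as pm  # imported only after the flag is set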
