Hierarchical Model - Slow Sampling (1.8 sec per draw) - Primary suspect is how the multi-level hierarchy is coded

Hello PyMC3 community,

I’m coming to you with another question about hierarchical modeling for regression. My problem is that sampling is taking way too long. When I try to sample 2000 draws on 4 chains with 500 tuning steps, it takes nearly 5 hours to run, which will make my future plan of cross-validation take an unreasonable amount of time.

I believe the reason my sampling is taking so long is related to the hierarchical structure I placed on my intercept term. My idea was that my intercept term should be informed by county-level, state-level, and country-level observations; something along the lines of:

a_county = a_state_mean + county_level_noise
a_state = a_country_mean + state_level_noise
a_country = global_mean + noise

where the idea is that if I get an observation from a new county, then I should at least have the state mean as a hopefully good enough starting point for making a prediction. The code for my model is as follows:

with pm.Model(coords=coords) as work_in_progress:

    # Hyper Priors: Country
    sigma_a_country = pm.HalfNormal("sigma_a_country", 3.)
    mu_a_country = pm.Normal("mu_a_country", mu=9., sigma=2.5)

    # Country coefficients
    a_country_offset = pm.Normal("a_country_offset", mu=0., sigma=1., dims="country")
    a_country = pm.Deterministic("a_country", mu_a_country + sigma_a_country*a_country_offset)

    # Hyper Priors: State
    sigma_a_state = pm.HalfNormal("sigma_a_state", 3.)

    # State coefficients
    a_state_offset = pm.Normal("a_state_offset", mu=0., sigma=1., dims="state")
    a_state = pm.Deterministic("a_state", a_country[country] + sigma_a_state*a_state_offset[state])

    # Hyper Priors: County
    sigma_a_county = pm.HalfNormal("sigma_a_county", 3.)

    # County coefficients
    a_county_offset = pm.Normal("a_county_offset", mu=0., sigma=1., dims="county")
    a_county = pm.Deterministic("a_county", a_state[state] + sigma_a_county*a_county_offset[county])

    # Model error
    epsilon = pm.HalfCauchy("eps", 5.0)

    target_estimate = a_county[county]

    # Data likelihood
    target_like = pm.TruncatedNormal("target_likelihood",
                                     mu=target_estimate,
                                     sigma=epsilon,
                                     observed=target.values,
                                     dims="obs_id", lower=0.)

When I visualize this model with pm.model_to_graphviz(work_in_progress), I think the issue is that both a_state and a_county sit at the observation level, and I’m not sure how to fix that.

I’m new to PyMC3/Bayesian modeling, so maybe my thought process is wrong, but in my mind they should both be like a_country: each contained in its own box of dimension 35 and 187 respectively, with the likelihood being the only thing at the observation level.
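In code terms, I imagine something like the following sketch, where state_to_country and county_to_state are hypothetical length-35 and length-187 index arrays mapping each state to its country and each county to its state:

a_state = pm.Deterministic("a_state", a_country[state_to_country] + sigma_a_state*a_state_offset, dims="state")
a_county = pm.Deterministic("a_county", a_state[county_to_state] + sigma_a_county*a_county_offset, dims="county")

but I haven’t been able to figure out how to build those index arrays correctly.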

The closest similar question I could find was here, but I’m not sure how that individual fixed their intercept terms. Any feedback would be greatly appreciated.


For some context: if I reduce the intercept term to just the country level, the same sampling setup takes only 34 minutes.
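For reference, the country-only version is just the model above with the state and county levels dropped; roughly (reusing the same coords, country index, and target from the model above):

with pm.Model(coords=coords) as country_only:

    sigma_a_country = pm.HalfNormal("sigma_a_country", 3.)
    mu_a_country = pm.Normal("mu_a_country", mu=9., sigma=2.5)
    a_country_offset = pm.Normal("a_country_offset", mu=0., sigma=1., dims="country")
    a_country = pm.Deterministic("a_country", mu_a_country + sigma_a_country*a_country_offset)

    epsilon = pm.HalfCauchy("eps", 5.0)
    target_estimate = a_country[country]

    target_like = pm.TruncatedNormal("target_likelihood",
                                     mu=target_estimate,
                                     sigma=epsilon,
                                     observed=target.values,
                                     dims="obs_id", lower=0.)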

I have a solution that reduced the sampling time tremendously. I’ll provide the code along with some toy data so that it’s easier to follow:

import arviz as az
import numpy as np
import pandas as pd
import pymc3 as pm

toy_data = pd.DataFrame(
    {
        "region": ["West"] * 25 + ["East"] * 10,
        "state": ["CA"] * 15 + ["NV"] * 10 + ["NY"] * 10,
        "county": ["Los Angeles"] * 10 + ["San Diego"] * 3 + ["Orange"] * 2 + ["Clark"] * 5 + ["Washoe"] * 5 + ["Kings"] * 5 + ["Queens"] * 5,
        "y": (
            list(2. + 0.1 * np.random.randn(10))
            + list(2.5 + 0.1 * np.random.randn(3))
            + list(3. + 0.1 * np.random.randn(2))
            + list(5. + 0.1 * np.random.randn(5))
            + list(5.5 + 0.1 * np.random.randn(5))
            + list(8. + 0.1 * np.random.randn(5))
            + list(8.5 + 0.1 * np.random.randn(5))
        ),
    }
)

toy_data["state_region"] = toy_data["state"] + "_" + toy_data["region"]
toy_data["state_region"] = toy_data["state_region"].astype("category")
toy_data["county_state_region"] = toy_data["county"] + "_" + toy_data["state"] + "_" + toy_data["region"]
toy_data["county_state_region"] = toy_data["county_state_region"].astype("category")


toy_data["region"] = toy_data["region"].astype("category")
toy_data["state"] = toy_data["state"].astype("category")
toy_data["county"] = toy_data["county"].astype("category")

# Map each state to the code of its region; sorting by the child category
# keeps the rows aligned with the category codes used for indexing below
region_index = (
    toy_data[["region", "state", "state_region"]]
    .drop_duplicates()
    .sort_values("state")["region"]
    .cat.codes
)

# Map each county to the code of its state
state_index = (
    toy_data[["region", "state", "county", "county_state_region"]]
    .drop_duplicates()
    .sort_values("county")["state"]
    .cat.codes
)
county = toy_data["county"].cat.codes

number_of_regions = toy_data["region"].nunique()
number_of_states = toy_data["state"].nunique()
number_of_counties = toy_data["county"].nunique()

with pm.Model() as toy_model:

    # Hyper Priors: Region
    sigma_a_region = pm.HalfNormal("sigma_a_region", 1.)
    mu_a_region = pm.Normal("mu_a_region", mu=5., sigma=2.5)
    a_region_offset = pm.Normal("a_region_offset", 0., 1., shape=number_of_regions)

    # Region coefficients
    a_region = pm.Deterministic("a_region", mu_a_region + sigma_a_region*a_region_offset)

    # Hyper Priors: State
    sigma_a_state = pm.HalfNormal("sigma_a_state", 2.)
    a_state_offset = pm.Normal("a_state_offset", 0., 1., shape=number_of_states)

    # State coefficients, each centered on its region's intercept
    a_state = pm.Deterministic("a_state", a_region[region_index.values] + sigma_a_state*a_state_offset)

    # Hyper Priors: County
    sigma_a_county = pm.HalfNormal("sigma_a_county", 2.)
    a_county_offset = pm.Normal("a_county_offset", 0., 1., shape=number_of_counties)

    # County coefficients, each centered on its state's intercept
    a_county = pm.Deterministic("a_county", a_state[state_index.values] + sigma_a_county*a_county_offset)

    # Model error
    epsilon = pm.HalfNormal("eps", 5.)

    yield_estimate = a_county[county.values]  # + pm.math.dot(X[modeling_columns].values, beta)

    # Data likelihood
    yield_like = pm.Normal("yield_likelihood",
                           mu=yield_estimate,
                           sigma=epsilon,
                           observed=toy_data.y,
                           )

With this model structure, a_state and a_county are no longer down at the observation level, which seems to have tremendously boosted the sampling rate. Also, I changed the eps prior from a HalfCauchy to a HalfNormal, which has also sped up sampling.
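For completeness, this is roughly the sampling call I’m using (default NUTS settings, nothing exotic):

with toy_model:
    idata = pm.sample(draws=2000, tune=500, chains=4, return_inferencedata=True)

az.summary(idata, var_names=["a_region", "a_state", "a_county"])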

You could fit this toy model very easily with Bambi as follows (no data processing needed):

import arviz as az
import bambi as bmb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
toy_data = pd.DataFrame(
    {
        "region":["West"] * 25 + ["East"] * 10, 
        "state": ["CA"] * 15 + ["NV"] * 10 + ["NY"] * 10,
        "county":["Los Angeles"] * 10 + ["San Diego"] * 3 + ["Orange"] * 2 + ["Clark"] * 5 + ["Washoe"] * 5 + ["Kings"] * 5 + ["Queens"] * 5,
        "y": (
            list(2 + 0.1 * np.random.randn(10))
            + list(2.5 + 0.1 * np.random.randn(3))
            + list(3 + 0.1 * np.random.randn(2))
            + list(5 + 0.1 * np.random.randn(5))
            + list(5.5 + 0.1 * np.random.randn(5))
            + list(8 + 0.1 * np.random.randn(5))
            + list(8.5 + 0.1 * np.random.randn(5))
        )
    }
)
intercept_prior = bmb.Prior("Normal", mu=0, sigma=1)
group_specific_sd = bmb.Prior("HalfNormal", sigma=1)
group_specific_prior = bmb.Prior("Normal", mu=0, sigma=group_specific_sd)

sigma_prior = bmb.Prior("HalfNormal", sigma=0.5)

priors = {
    "Intercept": intercept_prior, 
    "group_specific": group_specific_prior, 
    "sigma": sigma_prior
}
# (1 | state/county) means you want a group-specific intercept for each state
# and each county within each state.
model_bmb = bmb.Model("y ~ (1|state/county)", toy_data, priors=priors)
idata = model_bmb.fit(target_accept=0.9)
az.plot_trace(idata, backend_kwargs={"tight_layout":True})
fig = plt.gcf()
fig.set_facecolor("w")
fig.savefig("posterior.png", dpi=300)

Note there are some divergences, but that also happens with the PyMC model. I think that’s because we’re using a hierarchical component for the state when there are only 3 states, and because the sample sizes for some counties are very low (only 2 observations in one case, for example).
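If you want to quantify them, the divergence count can be read straight off the InferenceData returned by fit (ArviZ stores a boolean diverging flag per draw):

int(idata.sample_stats["diverging"].sum())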

EDIT: Since county names are unique within states here, you can also write "y ~ (1|state) + (1|county)".
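That alternative is built the same way, for example:

model_alt = bmb.Model("y ~ (1|state) + (1|county)", toy_data, priors=priors)
idata_alt = model_alt.fit(target_accept=0.9)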

Thank you for posting! I’ll try using the Bambi package.

Hi @tcapretto. I’m not sure these models are doing the same thing, because graphviz shows a very different model structure:

model_bmb.build()
gv = pm.model_to_graphviz(model_bmb.backend.model)
gv.format = "png"
gv.render(filename="model_graph")

# Display the graph inline as well
pm.model_to_graphviz(model_bmb.backend.model)