Sampling a specific country from the hierarchical model example

In the GLM: Hierarchical Linear Regression article a hierarchical regression model is used to predict radon levels in different counties.

By using pm.sample the parameter’s distributions for each country can be fitted. Now that I know the parameters, how can I sample the model for a specific county only?


Full details

Full details. Code from article:

import matplotlib.pyplot as plt
import numpy as np
import pymc3 as pm
import pandas as pd
import theano

data = pd.read_csv(pm.get_data('radon.csv'))
data['log_radon'] = data['log_radon'].astype(theano.config.floatX)
county_names = data.county.unique()
county_idx = data.county_code.values

n_counties = len(data.county.unique())

with pm.Model() as hierarchical_model:
    # Hyperpriors for group nodes
    mu_a = pm.Normal('mu_a', mu=0., sd=100**2)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=100**2)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Intercept for each county, distributed around group mean mu_a
    # Above we just set mu and sd to a fixed value while here we
    # plug in a common group distribution for all a and b (which are
    # vectors of length n_counties).
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=n_counties)
    # Intercept for each county, distributed around group mean mu_a
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=n_counties)

    # Model error
    eps = pm.HalfCauchy('eps', 5)

    radon_est = a[county_idx] + b[county_idx] * data.floor.values

    # Data likelihood
    radon_like = pm.Normal('radon_like', mu=radon_est, sd=eps, observed=data.log_radon)

Then to fit the model and determine parameter distributions:

with hierarchical_model:
    hierarchical_trace = pm.sample(draws=2000, n_init=1000)

How can I now sample the model for a specific county only, for example AITKIN?

I’m assuming you mean ‘get the posterior distribution for a specific county only, after having performed inference using all counties.’ (The alternative would be to perform inference using just the observed data from a specific county, in which case the hierarchical model wouldn’t make sense).

a and b are 1-D arrays n_counties long. So you just want to get the samples for all counties, and choose only those for the county you are interested in:

a_samples = hierarchical_trace.get_values('a', thin=5)
aitkin_idx = 0  # this is a guess; the point is, each county has a unique integer index
aitkin_a_samples = a_samples[:, aitkin_idx]

That would give me estimates for the parameters a and b but not radon_like (log_radon) for a specific county

Getting log radon from a and b is just addition:

a_samples = hierarchical_trace.get_values('a', thin=5)
b_samples = hierarchical_trace.get_values('b', thin=5)
aitkin_idx = 0  # this is a guess; the point is, each county has a unique integer index
aitkin_a_samples = a_samples[:, aitkin_idx]
aitkin_b_samples = b_samples[:, aitkin_idx]

Posterior distribution of mean log radon in Aitkin, for houses w/ and w/o basements:

aitkin_log_radon_w_basement = aitkin_a_samples 
aitkin_log_radon_wo_basement = aitkin_a_samples + aitkin_b_samples
1 Like