Hierarchical model with a predictor measured with uncertainty at a different level than the rest of the data

I have a multilevel model that estimates within-state effects as well as an overall effect. My data is at the individual level: each row is a user, with a state id, a variable X1, and a boolean indicating whether they converted.

I have another predictor, X2, that 1) is measured with error and 2) lives at the state level, not the individual level, so I only have 51 point estimates (DC included) and their 51 standard errors.
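To make the structure concrete, this is roughly what I'm after (just a sketch with illustrative names, using the 51 state estimates X2_obs and their standard errors X2_SE, not my real code):

with pm.Model() as sketch:
    # latent "true" state-level X2, one value per state
    X2_true = pm.Normal('X2_true', mu=0, sd=10, shape=51)
    # observed state estimates, with their known standard errors as the noise scale
    pm.Normal('X2_obs', mu=X2_true, sd=X2_SE, observed=X2_obs)
    # ...X2_true[state_id] would then multiply a state-level slope in each user's linear predictor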

I'm having trouble actually wiring this into my model - any ideas? I tried following this tutorial https://github.com/aloctavodia/Statistical-Rethinking-with-Python-and-PyMC3/blob/master/Chp_14.ipynb but I'm not sure what I'm missing. Code below:

with pm.Model() as m3:
    ## hyperpriors
    sigma_a = pm.Exponential('sigma_a', 1)
    mu_a = pm.Normal('mu_a', 0, 1.5)

    sigma_X1 = pm.Exponential('sigma_X1', 1)
    mu_X1 = pm.Normal('mu_X1', 0, 1.5)

    # prior
    # noncentering
    a_offset = pm.Normal('a_offset', mu=0, sd=1, shape=len(state_dct))
    a = pm.Deterministic("a", mu_a + a_offset * sigma_a)   

    bX1_offset = pm.Normal('bX1_offset', mu=0, sd=1, shape=len(state_dct))
    bX1 = pm.Deterministic("bX1", mu_X1 + bX1_offset * sigma_X1)

    # latent "true" X2 values
    X2_est = pm.Flat('X2_est', shape=len(data))

    # hyperpriors for the X2 slope
    sigma_X2 = pm.Exponential('sigma_X2', 1)
    mu_X2 = pm.Normal('mu_X2', 0, 1.5)

    bX2_offset = pm.Normal('bX2_offset', mu=0, sd=1, shape=len(state_dct))
    bX2 = pm.Normal('bX2', mu_X2 + bX2_offset * sigma_X2)


    # equations
    p = pm.invlogit(a[state_id] + bX1[state_id]*(data['X1']) + bX2[state_id2]*X2_est)

    conversions = pm.Binomial('conversions', 1, p, observed = data['conversion'])

    obs = pm.Normal('X2_obs', X2_est, X2_SE, 
            observed=X2_obs)

    start = dict(X2_est=X2_obs)

    # sampling
    prior = pm.sample_prior_predictive()
    trace = pm.sample(1000, tune=1000, start=start, njobs=4)
    posterior_predictive = pm.sample_posterior_predictive(trace)

The error I’m getting is the following:

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 0, got 1 with shape (51,).

And the shapes of the variables I used, for clarity:

  • X2_obs: (51,)
  • X2_SE: (51,)
  • state_id: (3000,)
  • state_id2: (51,)
  • len(state_dct): 51
  • len(data): 3000

I did the best I could to get your model's shapes right, but please double-check that I haven't misinterpreted your intentions for each variable. I had to replace the Flat distribution, because you can't draw random samples from it and therefore can't sample from the prior. I also changed some of the indexing to get the shapes right - there were just a few mix-ups between the variables with shape (51,) and those with shape (3000,). While I was at it, I defined bX2 with pm.Deterministic, like a and bX1, so the non-centered parameterization is consistent.

import pymc3 as pm
import numpy as np

X2_obs=np.ones(51)
X2_SE=np.ones(51)
state_id  = np.random.choice(51, size=3000)
state_id2 = np.random.choice(51, size=3000)  # per-row state index (length 3000), not length 51
state_dct = np.ones(51)
data_X1 = np.ones(3000)
data_conversion = np.ones(3000)

with pm.Model() as m3:
    ## hyperpriors
    sigma_a = pm.Exponential('sigma_a', 1)
    mu_a = pm.Normal('mu_a', 0, 1.5)

    sigma_X1 = pm.Exponential('sigma_X1', 1)
    mu_X1 = pm.Normal('mu_X1', 0, 1.5)

    # prior
    # noncentering
    a_offset = pm.Normal('a_offset', mu=0, sd=1, shape=len(state_dct))
    a = pm.Deterministic("a", mu_a + a_offset * sigma_a)   

    bX1_offset = pm.Normal('bX1_offset', mu=0, sd=1, shape=len(state_dct))
    bX1 = pm.Deterministic("bX1", mu_X1 + bX1_offset * sigma_X1)

    # latent "true" state-level X2 (wide Normal instead of Flat, so the prior can be sampled)
    X2_est = pm.Normal('X2_est', mu=0, sd=50, shape=51)

    # hyperpriors for the X2 slope
    sigma_X2 = pm.Exponential('sigma_X2', 1)
    mu_X2 = pm.Normal('mu_X2', 0, 1.5)

    bX2_offset = pm.Normal('bX2_offset', mu=0, sd=1, shape=len(state_dct))
    bX2 = pm.Deterministic('bX2', mu_X2 + bX2_offset * sigma_X2)  # non-centered, matching a and bX1


    # equations
    p = pm.invlogit(a[state_id] + bX1[state_id]*(data_X1) + bX2[state_id2]*X2_est[state_id2])

    conversions = pm.Binomial('conversions', 1, p, observed=data_conversion)

    # measurement-error model: observed state estimates centered on the latent X2, with known SEs
    obs = pm.Normal('X2_obs', X2_est, X2_SE, observed=X2_obs)

    start = dict(X2_est=X2_obs)

    # sampling
    prior = pm.sample_prior_predictive()
    trace = pm.sample(1000, tune=1000, start=start, chains=4)
    posterior_predictive = pm.sample_posterior_predictive(trace)
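Once the model actually builds, one quick way to confirm that every variable has the shape you intended (and to catch problems before sampling) is to print the test-value shapes of the free variables and evaluate the log-probability at the test point - a small check against the m3 model above, nothing specific to this model:

for rv in m3.free_RVs:
    print(rv.name, rv.tag.test_value.shape)  # shape Theano will use for each free variable
print(m3.check_test_point())  # per-variable log-prob at the test point; nan or -inf flags a problem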