Index for hierarchical model

Hi I am having problems with indexing a hierarchical model and have read the examples and searched for previous questions.

Below is working code that reproduces problem. There are 3 expts; each expt contains 1000 data points. How do I index them all? Thanks in advance!

import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_drops = 1000

s1 = np.random.normal(35,7,n_drops)
s2 = np.random.normal(40,6,n_drops)
s3 = np.random.normal(45,4,n_drops)
s = np.stack((s1,s2,s3),axis=0)

mu1 = np.mean(s1)*np.ones(n_drops)
mu2 = np.mean(s2)*np.ones(n_drops)
mu3 = np.mean(s3)*np.ones(n_drops)
mu = np.stack((mu1,mu2,mu3),axis=0)

sd1 = np.std(s1)*np.ones(n_drops)
sd2 = np.std(s2)*np.ones(n_drops)
sd3 = np.std(s3)*np.ones(n_drops)
sd = np.stack((sd1,sd2,sd3),axis=0)

######### Non-hierarchical works ##############################
tau_true = 1.5
active_true = s > mu + tau_true * sd
Y_true = np.multiply(s, active_true)
Y_true = np.sum(Y_true,axis=1)

with pm.Model() as model_c:
ϵ = pm.HalfCauchy(‘ϵ’, 50)
tau_ = pm.Normal('tau_ ', mu = 0, sd = 2)

active = s > mu + tau_ * sd

μ = pm.Deterministic('μ', ((s*active)).sum(axis=1))
Y = pm.Normal('Y', mu=μ, sd=ϵ, observed=Y_true)

trace_c = pm.sample(cores=1)

pm.traceplot(trace_c);
print(pm.summary(trace_c))

######### Hierarchical indexing problem ##############################

The 3 datasets belong to 2 diff categories

type_idx = np.array((0,0,1))

tau_true = np.array([1.5, 1.5, 2.0]).reshape(3,1)
active_true = s > mu + tau_true * sd
Y_true = np.multiply(s, active_true)
Y_true = np.sum(Y_true,axis=1)

with pm.Model() as model_d:
ϵ = pm.HalfCauchy(‘ϵ’, 50)
tau_ = pm.Normal('tau_ ', mu = 0, sd = 2, shape=2)

active = s > mu + tau_[type_idx] * sd
## get error here 
## ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 1000)

μ = pm.Deterministic('μ', ((s*active)).sum(axis=1))
Y = pm.Normal('Y', mu=μ, sd=ϵ, observed=Y_true)

trace_d = pm.sample(cores=1)

pm.traceplot(trace_d);
print(pm.summary(trace_d))

Hi hope someone can chime in here. In the Radon example simply index each number with its category (county) works. However I have an array of 1000 numbers for each experiment. How do I index each of these 1000 numbers for their category?

Thank you in advance!

You just need to reshape your type_idx into shape (3,1) instead of (3,):

type_idx = np.array((0,0,1)).reshape(3,1)

That being said, in both your models you get a ton of divergences, potentially leading to biased estimates (even though it does capture your tau_true well). You might want to look at reformulating your model a bit. This is probably down to the fact that your noise term is infinitesmal, on the order of 10^-11, so your observations aren’t really Normal, though this could be a quirk of your toy data here.

1 Like

Thank you BioGoertz,

It works now! After adding simulated noise to Y_true there are almost no more divergences. Thanks!

1 Like

Great!