Hi I’m trying to uprgrade to pymc5 in order to try out nutpie but am struggling to understand how to convert the model which used theano/aesera shared variables to the new pm.Data container. My model is something like this
age = data_df.age_num.astype(int).values
age_sh = shared(age)
n_age = len(set(age))
region = data_df.region
n_region = len(set(region))
observed_data = data_df.obs
def hierarchical_normal(name, shape, μ=0.0):
Δ = pm.Normal("Δ_{}".format(name), 0.0, 1, shape=(shape, 4)) # 4 is num parties
σ = pm.HalfCauchy("σ_{}".format(name), 1)
return pm.Deterministic(name, μ + Δ * σ)
with pm.Model() as model:
α_region = hierarchical_normal("state", n_region)
β0 = pm.Normal(
"β0",
0.0,
0.5,
shape=(1, 4),
)
η = β0 + α_age[age_sh ]
p = pm.math.softmax(η, axis=1)
obs = pm.Multinomial("O", n_, p, shape=(len(observed_data ), 4), observed=observed_data )
Currently this returns:
TypeError: TensorType does not support iteration. Maybe you are using builtins.sum instead of aesara.tensor.math.sum? (Maybe .max?)
Is there a simple way to make this work in pymc5? If not how should I write this using the pm.Data ?
Thanks
Rhys
Is that example complete? I don’t see where α_age
is defined and α_region
doesn’t seem to be used at all.
Yes sorry I’ve made a mistake when trying to make a minimal example, (I can’t seem to be able to edit the original post) the code should be
age = data_df.age_num.astype(int).values
age_sh = shared(age)
n_age = len(set(age))
region = data_df.region
region_sh = shared(region)
n_region = len(set(region))
observed_data = data_df.obs
def hierarchical_normal(name, shape, μ=0.0):
Δ = pm.Normal("Δ_{}".format(name), 0.0, 1, shape=(shape, 4)) # 4 is num parties
σ = pm.HalfCauchy("σ_{}".format(name), 1)
return pm.Deterministic(name, μ + Δ * σ)
with pm.Model() as model:
α_region = hierarchical_normal("state", n_region)
α_age = hierarchical_normal("age", n_age)
β0 = pm.Normal(
"β0",
0.0,
0.5,
shape=(1, 4),
)
η = β0 + α_age[age_sh ] + α_region[region_sh ]
p = pm.math.softmax(η, axis=1)
obs = pm.Multinomial("O", n_, p, shape=(len(observed_data ), 4), observed=observed_data )
There are going to be several different ways to handle this, but it’s going to impact both your model and your hierarchical_normal()
function. The simplest way would be to just do this kind of thing:
age = data_df.age_num.astype(int).values
age_idx,ages = pd.factorize(data_df["age_num"])
n_age = len(ages)
region_idx,regions = pd.factorize(data_df["region"])
n_region = len(regions)
and then index into your parameter arrays like this:
η = β0 + α_age[age_idx] + α_region[region_idx]
However, you can definitely utilize pm.MutableData
objects if the flexibility would be of use to you. I would also recommend utilizing the dimension/coordinate functionality to have more meaningful index labels. Check here and here.