TypeError in comparison when using pm.Data

brendan · June 13, 2021, 5:28pm

Hi all,

I have been reading through the documentation, trying to figure out how to correctly use pm.Data so I can pass in data for training. I keep having the following issue.

Code:

with pm.Model() as model:
    Site_prior = pm.Dirichlet("Site_prior", a=np.array([1,1,1,1,1]))
    Site_obs = pm.Data("Site_obs", X_train["Site_Code"]) # X_train["Site_Code"] takes on values 0,1,2,3,4
    Site = pm.Categorical("Site", p=Site_prior, observed=Site_obs)
    v = pm.math.switch(pm.math.eq(Site, 0), 0, 10)
    y = pm.Normal("y", mu=v, sigma=1, observed=y_train)
    pm.sample(10)

Error:

The error when converting the test value to that variable type:
TensorType(int32, vector) cannot store a value of dtype int64 without risking loss of precision. If you do not mind this loss, you can: 1) explicitly cast your data to int32, or 2) set "allow_input_downcast=True" when calling "function".

I do not understand what is causing this, because the code works fine if I pass the data to observed directly instead of using pm.Data. I’m sure I’m making some very basic mistake with this, as I am new to pymc3. If anyone could help, I would be very appreciative!

ckrapu · June 17, 2021, 2:42pm

There’s likely a processing step involved with observed that automatically casts the data or manages its type before the kind of error you’ve seen can come up. Can you work around this by doing Site_obs = pm.Data("Site_obs", X_train["Site_Code"].astype('int32'))?
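
In case it helps, here is a minimal sketch of that cast using NumPy only, so it runs without a model. The helper name to_int32_codes and the sample site_codes array are made up for illustration; in your case the input would be X_train["Site_Code"]. It checks that the values fit in int32 before casting, so the downcast cannot silently change them.

```python
import numpy as np


def to_int32_codes(values):
    """Cast integer category codes to int32, refusing values that would not fit."""
    arr = np.asarray(values)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"expected integer codes, got dtype {arr.dtype}")
    limits = np.iinfo(np.int32)
    if arr.size and (arr.min() < limits.min or arr.max() > limits.max):
        raise OverflowError("values do not fit in int32")
    return arr.astype(np.int32)


# Hypothetical stand-in for X_train["Site_Code"], stored as int64 like the
# data in the error message above.
site_codes = np.array([0, 1, 2, 3, 4, 0, 2], dtype=np.int64)
site_codes_32 = to_int32_codes(site_codes)

print(site_codes.dtype, "->", site_codes_32.dtype)

# Inside the model, the cast array would then be passed to pm.Data:
#     Site_obs = pm.Data("Site_obs", site_codes_32)
```

With site codes in the range 0 to 4 the range check is only a safeguard, but it keeps the cast explicit instead of relying on allow_input_downcast.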
