Usage of "pm.Data" leads to: "TypeError: expected type_num 5 (NPY_INT32) got 7"

Hey everyone,

I’m trying to build a model using the pm.Data interface so that I can easily swap out the data for out-of-sample predictions.
Additionally, I fit a distribution over the input data so that, if an input is missing at prediction time, I can sample it from the fitted posterior.
However, I get a Theano dtype error when I use pm.Data instead of a plain pandas/NumPy vector. Casting the vector to the expected np.int32 did not help either.

I’m not sure whether this is a bug or a mistake on my side ^^.

Code example:

import pymc3 as pm
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "alpha": [1, 2, 3],
        "beta": [4, 5, 6]
    }
)

with pm.Model() as model:
    alpha_data = pm.Data("alpha_data", df["alpha"].astype(np.int32).values)
    alpha_prior = pm.HalfNormal('alpha_prior', sigma=10)
    alpha = pm.Poisson("alpha", mu=alpha_prior, observed=alpha_data) # TypeError: expected type_num 5 (NPY_INT32) got 7
    # alpha = pm.Poisson("alpha", mu=alpha_prior, observed=df["alpha"].astype(np.int32).values) # WORKS

    # GLM
    intercept = pm.HalfNormal("Intercept", sigma=20)
    x_coeff = pm.HalfNormal("coeffs", sigma=20)
    eq = pm.Deterministic("eq", intercept + x_coeff*alpha)
    beta_data = pm.Data("beta_data", df["beta"].values)

    y = pm.Poisson("beta", mu=eq, observed=beta_data) # WORKS

with model:
    trace = pm.sample(4000, tune=2000, chains=2, return_inferencedata=False)

I’m sorry for the formatting, I struggle with this interface ^^.

Error:

TypeError: expected type_num 5 (NPY_INT32) got 7
Apply node that caused the error: Elemwise{Composite{Switch((EQ(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3) * EQ(Cast{int64}(i4), i3)), i3, Switch(Cast{int8}((GE(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3) * GE(Cast{int64}(i4), i3))), ((Switch(EQ(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3), Switch(EQ(Cast{int64}(i4), i3), i5, i6), (Cast{int64}(i4) * log(Composite{(i0 + (i1 * i2))}(i0, i1, i2)))) - scalar_gammaln((i7 + Cast{int64}(i4)))) - Composite{(i0 + (i1 * i2))}(i0, i1, i2)), i6))}}(Elemwise{exp,no_inplace}.0, Elemwise{exp,no_inplace}.0, alpha, TensorConstant{(1,) of 0}, beta_data, TensorConstant{(1,) of 0.0}, TensorConstant{(1,) of -inf}, TensorConstant{(1,) of 1})
Toposort index: 6
Inputs types: [TensorType(float64, (True,)), TensorType(float64, (True,)), TensorType(int32, vector), TensorType(int8, (True,)), TensorType(int32, vector), TensorType(float32, (True,)), TensorType(float32, (True,)), TensorType(int64, (True,))]
Inputs shapes: [(1,), (1,), (3,), (1,), (3,), (1,), (1,), (1,)]
Inputs strides: [(8,), (8,), (8,), (1,), (4,), (4,), (4,), (8,)]
Inputs values: [array([15.95769122]), array([15.95769122]), array([1, 2, 3]), array([0], dtype=int8), array([4, 5, 6], dtype=int32), array([0.], dtype=float32), array([-inf], dtype=float32), array([1])]
Outputs clients: [[Sum{acc_dtype=float64}(Elemwise{Composite{Switch((EQ(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3) * EQ(Cast{int64}(i4), i3)), i3, Switch(Cast{int8}((GE(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3) * GE(Cast{int64}(i4), i3))), ((Switch(EQ(Composite{(i0 + (i1 * i2))}(i0, i1, i2), i3), Switch(EQ(Cast{int64}(i4), i3), i5, i6), (Cast{int64}(i4) * log(Composite{(i0 + (i1 * i2))}(i0, i1, i2)))) - scalar_gammaln((i7 + Cast{int64}(i4)))) - Composite{(i0 + (i1 * i2))}(i0, i1, i2)), i6))}}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag ‘optimizer=fast_compile’. If that does not work, Theano optimizations can be disabled with ‘optimizer=None’.
HINT: Use the Theano flag ‘exception_verbosity=high’ for a debugprint and storage map footprint of this apply node.

Is it possible that pm.Data casts the type automatically to int64?
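
In case it helps with reading the error: as far as I can tell, the two type_num values are NumPy's C-level type numbers, so they can be decoded with NumPy alone. Below is a minimal sketch of that check. The helper describe_type_num is a name I made up for illustration, it is not part of NumPy, Theano or PyMC3. It looks up integer dtypes by their C character codes so the result does not depend on how the platform aliases int32/int64.

```python
import numpy as np

# C-level signed integer types, addressed by character code:
# signed char, short, int, long, long long.
C_INT_CODES = ["b", "h", "i", "l", "q"]


def describe_type_num(type_num):
    """Hypothetical helper: list the dtypes whose C type number is type_num."""
    return [np.dtype(code) for code in C_INT_CODES if np.dtype(code).num == type_num]


expected = describe_type_num(5)  # what the compiled node was built for
received = describe_type_num(7)  # what the array it was handed actually is

print("expected:", expected, "itemsize:", expected[0].itemsize)
print("received:", received, "itemsize:", received[0].itemsize)

# The strides in the error output can be compared the same way:
# an int64 vector steps 8 bytes per element, an int32 vector 4.
values = np.array([1, 2, 3])
print(values.astype(np.int64).strides, values.astype(np.int32).strides)
```

Type number 5 is C int (4 bytes, i.e. int32) and 7 is C long, which is 8 bytes (int64) on 64-bit Linux but may be 4 bytes on Windows. In the "Inputs types" line the input named alpha is declared as int32, while its stride in "Inputs strides" is 8, so it looks like the observed alpha values arrive as 64-bit integers at the node that computes the likelihood of beta.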