Confused about building a compound distribution hierarchical Bayes model with continuous and categorical inputs in pymc3


I am trying to build a somewhat complicated hierarchical Bayesian model in pymc3 using several predictors. I think my issue stems from a lack of understanding on my part of how to properly build these models up. I either get the error:

    TypeError: For compute_test_value, one input test value does not have the requested type. The error when converting the test value to that variable type: Wrong number of dimensions: expected 0, got 1 with shape (53724,).

or I set the shape parameter for N and X to the length of the data set and run out of RAM (this seems like the wrong approach anyway, so I am not surprised).

My original strategy was to build a model that predicts total cost as predicted claim counts multiplied by predicted average cost per claim, where the claim counts and average costs follow their own respective distributions (Zero-Inflated Negative Binomial and Gamma, respectively). Each is built up from a linear combination of weights over several predictors from my data set. Am I fundamentally approaching this incorrectly?

with Model() as model:
    # Priors
    psi_N = Uniform('psi_N', lower=0, upper=1)
    alpha_N = HalfCauchy('alpha_N', beta=1)
    sigma_X = HalfCauchy('sigma_X', beta=1)
    mu_N_noise = StudentT('mu_N_noise', nu=3, mu=0, sd=1)
    mu_N_MT = StudentT('mu_N_MT', nu=3, mu=0, sd=1, shape=num_MT)
    mu_N_V = StudentT('mu_N_V', nu=3, mu=0, sd=1, shape=num_V)
    mu_N_G = StudentT('mu_N_G', nu=3, mu=0, sd=1, shape=num_G)
    mu_N_A = StudentT('mu_N_A', nu=3, mu=0, sd=1, shape=num_A)
    mu_N_RA = StudentT('mu_N_RA', nu=3, mu=0, sd=1, shape=num_RA)
    mu_N_TI = StudentT('mu_N_TI', nu=3, mu=0, sd=1, shape=num_TI)
    mu_N_RS = StudentT('mu_N_RS', nu=3, mu=0, sd=1)
    mu_N_MM = StudentT('mu_N_MM', nu=3, mu=0, sd=1)
    mu_N_IP = StudentT('mu_N_IP', nu=3, mu=0, sd=1)
    mu_N_OP = StudentT('mu_N_OP', nu=3, mu=0, sd=1)
    mu_N_P = StudentT('mu_N_P', nu=3, mu=0, sd=1)
    mu_N_O = StudentT('mu_N_O', nu=3, mu=0, sd=1)
    mu_N_Rx = StudentT('mu_N_Rx', nu=3, mu=0, sd=1)
    mu_X_noise = StudentT('mu_X_noise', nu=3, mu=0, sd=1)
    mu_X_MT = StudentT('mu_X_MT', nu=3, mu=0, sd=1, shape=num_MT)
    mu_X_V = StudentT('mu_X_V', nu=3, mu=0, sd=1, shape=num_V)
    mu_X_G = StudentT('mu_X_G', nu=3, mu=0, sd=1, shape=num_G)
    mu_X_A = StudentT('mu_X_A', nu=3, mu=0, sd=1, shape=num_A)
    mu_X_RA = StudentT('mu_X_RA', nu=3, mu=0, sd=1, shape=num_RA)
    mu_X_TI = StudentT('mu_X_TI', nu=3, mu=0, sd=1, shape=num_TI)
    mu_X_RS = StudentT('mu_X_RS', nu=3, mu=0, sd=1)
    mu_X_MM = StudentT('mu_X_MM', nu=3, mu=0, sd=1)
    mu_X_IP = StudentT('mu_X_IP', nu=3, mu=0, sd=1)
    mu_X_OP = StudentT('mu_X_OP', nu=3, mu=0, sd=1)
    mu_X_P = StudentT('mu_X_P', nu=3, mu=0, sd=1)
    mu_X_O = StudentT('mu_X_O', nu=3, mu=0, sd=1)
    mu_X_Rx = StudentT('mu_X_Rx', nu=3, mu=0, sd=1)

    # Deterministic combination of variates using priors to generate the primary variables
    mu_N = Deterministic('mu_N', mu_N_noise + mu_N_MT[MetalTier] + mu_N_V[Variant] + mu_N_G[Gender] + mu_N_A[AgeBand] + mu_N_RA[RatingArea] + mu_N_TI[TobaccoIndicator] +
                         mu_N_RS * RS17 + mu_N_MM * MM + mu_N_IP * IP_Count + mu_N_OP * OP_Count + mu_N_P * Prof_Count + mu_N_O * Other_Count + mu_N_Rx * Rx_Count)
    mu_X = Deterministic('mu_X', mu_X_noise + mu_X_MT[MetalTier] + mu_X_V[Variant] + mu_X_G[Gender] + mu_X_A[AgeBand] + mu_X_RA[RatingArea] + mu_X_TI[TobaccoIndicator] +
                         mu_X_RS * RS17 + mu_X_MM * MM + mu_X_IP * IP_Avg + mu_X_OP * OP_Avg + mu_X_P * Prof_Avg + mu_X_O * Other_Avg + mu_X_Rx * Rx_Avg)

    # Primary variables, Frequency (N) and Severity (X)
    N = ZeroInflatedNegativeBinomial('N', psi=psi_N, mu=mu_N, alpha=alpha_N)#, shape=len(x_train))
    X = Gamma('X', mu=mu_X, sd=sigma_X)#, shape=len(x_train))

    # Model error
    sigma_y = Normal('sigma_y', mu=0, sd=1e3)

    # Expected value
    y_hat = N * X

    # Data likelihood
    y_like = Normal('y_like', mu=y_hat, sd=sigma_y, observed=y_train.values)

I expect the model to compile without error and without running out of RAM (on a 64GB machine). Do I need to split this into two separate models?

In pymc3, distributions cannot infer their shape from their parameters. Because the categorical indexing makes mu_N and mu_X vectors with one entry per data row, while N and X default to scalar shape, you get exactly the "expected 0, got 1 with shape (53724,)" mismatch. You need to explicitly provide the shapes of N and X, e.g. shape=len(y_train).
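
To see where the length-53724 dimension comes from: indexing a per-category coefficient vector by a per-row category array already expands it to one value per data row, so mu_N and mu_X are full-length vectors before N and X are ever created. A minimal NumPy sketch of that expansion (the names and sizes here are made up for illustration):

```python
import numpy as np

n_rows = 8      # stands in for the 53724 rows in the real data
num_MT = 3      # hypothetical number of MetalTier categories

mu_N_MT = np.zeros(num_MT)                        # one coefficient per category
MetalTier = np.random.randint(0, num_MT, n_rows)  # per-row category index

# Fancy indexing expands the coefficients to one value per data row,
# so any expression containing this term is a length-n_rows vector.
mu_N = mu_N_MT[MetalTier]
print(mu_N.shape)  # (8,)
```

A distribution whose mu is built this way therefore needs a matching explicit shape (e.g. shape=len(y_train)) rather than the scalar default.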