Problems when changing the likelihood slightly

evenmm · August 31, 2023, 10:14am

Hi all. I’ve been successfully running a hierarchical Bayesian model for repeated measurements, then changed the likelihood model and got a long error message I don’t understand.

In my problem, each patient has an unequal number of repeated measurements, the values and times are stored in the 1 dimensional arrays Y_flat_no_nans and t_flat_no_nans. The structure of which measurements belong to which patients is stored in group_id, a list of patient ids.
Additionally, each patient has a parameter set: psi, rho_r, rho_s and pi_r.
To get the mean of the likelihood from the parameters, group_id is used to index into the parameter arrays.

The change that caused the error was to change the likelihood from two populations r and s growing and decreasing independently:
M(t_{ij}; \psi_i, \pi_i^r, \rho_i^r, \rho_i^s) = \psi_i^0 \Big( \pi_i^r \exp\Big(\rho_i^r t_{ij}\Big) + (1-\pi_i^r) \exp \Big(\rho_i^s t_{ij}\Big) \Big), which in vector form is:

mu_Y = psi[group_id] * (pi_r[group_id]*np.exp(rho_r[group_id]*t_flat_no_nans) + (1-pi_r[group_id])*np.exp(rho_s[group_id]*t_flat_no_nans))

to a model where the two populations interact:
M(t_{ij}; \psi_i, \pi_i^r, \rho_i^r, \rho_i^s) = \psi_i^0 \Big( \pi_i^r \exp\Big(\rho_i^r \psi_i^0(1-\pi_i^r) (t_{ij} + \frac{1 - e^{\rho_i^s t_{ij}}}{\rho_i^s}) \Big) + (1-\pi_i^r) \exp \Big(\rho_i^s t_{ij}\Big) \Big), which is:

mu_Y = psi[group_id] * (pi_r[group_id]*np.exp(rho_r[group_id] * psi[group_id] * (1-pi_r[group_id]) * (t_flat_no_nans + (1 - np.exp(rho_s[group_id]*t_flat_no_nans)/(rho_s[group_id])))) + (1-pi_r[group_id])*np.exp(rho_s[group_id] * t_flat_no_nans))

Here’s the code; I’ll add the error message below.

from utilities import *
import arviz as az
import pymc as pm
##############################
# Function argument shapes: 
# X is an (N_patients, P) shaped pandas dataframe
# patient dictionary contains N_patients patients in the same order as X

def linear_model(X, patient_dictionary):
    df = pd.DataFrame(columns=["patient_id", "mprotein_value", "time"])
    for ii in range(len(patient_dictionary)):
        patient = patient_dictionary[ii]
        mprot = patient.Mprotein_values
        times = patient.measurement_times
        for jj in range(len(mprot)):
            single_entry = pd.DataFrame({"patient_id":[ii], "mprotein_value":[mprot[jj]], "time":[times[jj]]})
            df = pd.concat([df, single_entry], ignore_index=True)
    group_id = df["patient_id"].tolist()
    Y_flat_no_nans = np.array(df["mprotein_value"].tolist())
    t_flat_no_nans = np.array(df["time"].tolist())

    N_patients, P = X.shape
    P0 = int(P / 2) # A guess of the true number of nonzero parameters is needed for defining the global shrinkage parameter
    X_not_transformed = X.copy()
    X = X.T
    yi0 = np.zeros(N_patients)
    for ii in range(N_patients):
        yi0[ii] = patient_dictionary[ii].Mprotein_values[0]

    with pm.Model(coords={"predictors": X_not_transformed.columns.values}) as linear_model:
        # Observation noise (std)
        sigma_obs = pm.HalfNormal("sigma_obs", sigma=1)

        # alpha
        alpha = pm.Normal("alpha",  mu=np.array([np.log(0.002), np.log(0.002), np.log(0.5/(1-0.5))]),  sigma=1, shape=3)

        # beta (with horseshoe priors):
        # Global shrinkage prior
        tau_rho_s = pm.HalfStudentT("tau_rho_s", 2, P0 / (P - P0) * sigma_obs / np.sqrt(N_patients))
        tau_rho_r = pm.HalfStudentT("tau_rho_r", 2, P0 / (P - P0) * sigma_obs / np.sqrt(N_patients))
        tau_pi_r = pm.HalfStudentT("tau_pi_r", 2, P0 / (P - P0) * sigma_obs / np.sqrt(N_patients))
        # Local shrinkage prior
        lam_rho_s = pm.HalfStudentT("lam_rho_s", 2, dims="predictors") # shape = (P,) without predictors
        lam_rho_r = pm.HalfStudentT("lam_rho_r", 2, dims="predictors")
        lam_pi_r = pm.HalfStudentT("lam_pi_r", 2, dims="predictors")
        c2_rho_s = pm.InverseGamma("c2_rho_s", 1, 0.1)
        c2_rho_r = pm.InverseGamma("c2_rho_r", 1, 0.1)
        c2_pi_r = pm.InverseGamma("c2_pi_r", 1, 0.1)
        z_rho_s = pm.Normal("z_rho_s", 0.0, 1.0, dims="predictors")
        z_rho_r = pm.Normal("z_rho_r", 0.0, 1.0, dims="predictors")
        z_pi_r = pm.Normal("z_pi_r", 0.0, 1.0, dims="predictors")
        # Shrunken coefficients
        beta_rho_s = pm.Deterministic("beta_rho_s", z_rho_s * tau_rho_s * lam_rho_s * np.sqrt(c2_rho_s / (c2_rho_s + tau_rho_s**2 * lam_rho_s**2)), dims="predictors")
        beta_rho_r = pm.Deterministic("beta_rho_r", z_rho_r * tau_rho_r * lam_rho_r * np.sqrt(c2_rho_r / (c2_rho_r + tau_rho_r**2 * lam_rho_r**2)), dims="predictors")
        beta_pi_r = pm.Deterministic("beta_pi_r", z_pi_r * tau_pi_r * lam_pi_r * np.sqrt(c2_pi_r / (c2_pi_r + tau_pi_r**2 * lam_pi_r**2)), dims="predictors")

        # Latent variables theta
        omega = pm.HalfNormal("omega",  sigma=1, shape=3) # Patient variability in theta (std)
        theta_rho_s = pm.Normal("theta_rho_s", mu= alpha[0] + pm.math.dot(beta_rho_s, X), sigma=omega[0]) # Individual random intercepts in theta to confound effects of X
        theta_rho_r = pm.Normal("theta_rho_r", mu= alpha[1] + pm.math.dot(beta_rho_r, X), sigma=omega[1]) # Individual random intercepts in theta to confound effects of X
        theta_pi_r  = pm.Normal("theta_pi_r",  mu= alpha[2] + pm.math.dot(beta_pi_r, X),  sigma=omega[2]) # Individual random intercepts in theta to confound effects of X

        xi = pm.HalfNormal("xi", sigma=1)
        log_psi = pm.Normal("log_psi", mu=np.log(yi0), sigma=xi, shape=N_patients)
        psi = pm.Deterministic("psi", np.exp(log_psi))

        # Transformed latent variables 
        rho_s = pm.Deterministic("rho_s", -np.exp(theta_rho_s))
        rho_r = pm.Deterministic("rho_r", np.exp(theta_rho_r))
        pi_r  = pm.Deterministic("pi_r", 1/(1+np.exp(-theta_pi_r)))

        # Old, works fine:
        #mu_Y = psi[group_id] * (pi_r[group_id]*np.exp(rho_r[group_id]*t_flat_no_nans) + (1-pi_r[group_id])*np.exp(rho_s[group_id]*t_flat_no_nans))

        # New, raising an error:
        mu_Y = psi[group_id] * (pi_r[group_id]*np.exp(rho_r[group_id] * psi[group_id] * (1-pi_r[group_id]) * t_flat_no_nans + (1 - np.exp(rho_s[group_id]*t_flat_no_nans)/(rho_s[group_id])))) + (1-pi_r[group_id])*np.exp(rho_s[group_id] * t_flat_no_nans))

        # Likelihood (sampling distribution) of observations
        Y_obs = pm.Normal("Y_obs", mu=mu_Y, sigma=sigma_obs, observed=Y_flat_no_nans)
    return linear_model

And this is the error message, in two parts:

ricardoV94 · August 31, 2023, 11:34am

The error message suggests you have either nan sigma or nu in at least one of your HalfStudentT distributions. Can those P or P0 be zero when casted to int?

As a sanity check I would suggest you use keyword arguments nu and sigma when defining the distribution. There was a bug sometime ago when not passing the argument via keyword.

evenmm · August 31, 2023, 1:37pm

Thanks for the reply Ricardo. How could you tell it was in the HalfStudentT?
P or P0 will never be zero, I have made sure of that.
I’ve added keyword arguments to the HalfStudentT and InverseGamma distributions, yet I still get a similar error message.

Best,
Even

Topic		Replies	Views
Bayesian multilevel model indexing problems Questions	1	509	June 4, 2021
Question about multi level structure for likelihood v5 modeling	2	248	June 16, 2023
First attempt at a hierarchical model with custom likelihood Questions	1	697	September 18, 2018
Ideas for reparameterizing models/changing priors to avoid divergences Questions	3	907	December 5, 2019
Error when trying mixture distribution for likelihood version agnostic modeling	1	338	October 20, 2022

Problems when changing the likelihood slightly

Related topics