Sampling options with many observations and a HurdleGamma likelihood

Hi,

I am developing a model with a pm.HurdleGamma likelihood. I have priors on the alpha and beta parameters of the HurdleGamma, but not on the psi parameter. When I sample with a small subset of my data (2500 observations), it samples well and the results look good to me. However, it takes about 2.5 min to complete. I have tried the different samplers available (numpyro, blackjax), but 100% of my samples are divergences. I suspect this has to do with my likelihood function, which I’m reluctant to change at this point because it fits my data quite well.

My issue is that my full dataset has millions of records, and it is not feasible for me to have a sampling process that takes hours and hours to complete. Am I out of options here? I am actually quite happy with the posteriors, and would like to extend the model to the rest of my dataset, but it feels like there’s no way forward for me with this model. I’m relatively new to Bayesian modeling and pymc, so I’m open to any and all suggestions or critiques!

Thanks

@pymc-bot

There are no relevant documents available to address your query about sampling options with many observations and a HurdleGamma likelihood. However, based on general knowledge, when dealing with divergences in sampling, especially with complex likelihoods like HurdleGamma, you might consider the following approaches:

  1. Reparameterization: Sometimes, reparameterizing the model can help reduce divergences. This involves changing the parameterization of your model to make it more stable numerically.

  2. Tuning: Increase the number of tuning steps. This allows the sampler more time to adapt to the posterior distribution.

  3. Step Size Adjustment: Manually adjusting the step size can sometimes help. A smaller step size might reduce divergences at the cost of longer sampling times (see the sketch after this list).

  4. Prior Specification: Ensure that your priors are not too tight or too vague, as this can affect the sampling process. Even though you mentioned not having a prior on the psi parameter, consider if a weakly informative prior might help.

  5. Alternative Samplers: If you haven’t already, try using different samplers or algorithms that might be more robust to the specific challenges of your model.

  6. Model Simplification: Simplifying the model, if possible, can sometimes help identify the source of the problem.

If these suggestions do not resolve the issue, it might be helpful to consult with others who have experience with similar models or to provide more details about the model structure and data.
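For concreteness, suggestions 2 and 3 map directly onto pm.sample arguments. A minimal sketch, assuming model is your existing HurdleGamma model (the numbers are only illustrative):

import pymc as pm

with model:
    idata = pm.sample(
        tune=2000,           # more warmup iterations so the sampler can adapt (suggestion 2)
        target_accept=0.95,  # adapts toward a smaller step size, often reducing divergences (suggestion 3)
    )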

For more information, visit https://www.pymc.io/

This answer was generated by the PyMC Assistant bot in response to your command. Please let us know if it was helpful!


I’ve tried adjusting some of the arguments for the samplers, but I still get 100% divergences with numpyro and blackjax.

  1. Is there something about my model that I should consider reparameterizing, specifically to get the numpyro or blackjax samplers to work? Are there other samplers available that I am not aware of?
  2. When you say model simplification, what methods would you suggest for simplifying a model with a HurdleGamma likelihood?

I use pm.HurdleGamma (predominantly in Bambi) often in my work. The native pm.HurdleGamma distribution has some issues: JAX won’t work, and Nutpie will work but is frightfully slow because it drops into object mode. For your 2,500 observations, see how it samples with the following argument in pm.sample: compile_kwargs={'mode':'FAST_RUN'}. How does that compare to your 2.5 min run? Millions of records? Even if it ran OK, scaling up to your target data would be a challenge. I would recommend ADVI, especially minibatch ADVI. Another option would be to fire up PyMC Experimental and use pm.fit(method='pathfinder') instead of pm.sample.
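To make the minibatch ADVI suggestion concrete, here is a minimal sketch of the mechanics. It uses a placeholder Normal likelihood rather than the hurdle model, and the DataFrame/column name and batch size are assumptions; the key pieces are pm.Minibatch on the observed data and total_size on the likelihood so the minibatch log-likelihood is rescaled to the full dataset:

import numpy as np
import pymc as pm

y = np.asarray(df["observed_value"])    # full observed vector (hypothetical column name)
y_mb = pm.Minibatch(y, batch_size=500)  # stream 500 rows per gradient step

with pm.Model() as mb_model:
    mu = pm.Normal("mu", mu=0, sigma=2)
    sigma = pm.HalfNormal("sigma", sigma=2)
    # total_size rescales the minibatch log-likelihood to the full dataset size
    pm.Normal("obs", mu=mu, sigma=sigma, observed=y_mb, total_size=len(y))

    approx = pm.fit(n=30_000, method="advi")  # mean-field ADVI
    idata_advi = approx.sample(1_000)         # draws from the fitted approximation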

Finally, you could just do a custom HurdleGamma. I have done that and gotten good results. That also then opens up JAX, Blackjax, Nutpie and Bayeux and its samplers to you.

Do you have some example data that I could try a custom model with?

1 Like

Thank you for the informative response. You’ve given me a few things to think about. I tried compile_kwargs={'mode':'FAST_RUN'} but didn’t get any improvements. I will have to do some reading up on ADVI and pathfinder. I do have a few questions if you wouldn’t mind:

  1. When you say custom HurdleGamma, do you mean implementing a sort of two-step model that has a Bernoulli or Binomial and then a Gamma? I had thought of this somewhere along the way, but the HurdleGamma seemed like the easier option for implementation (until I hit this wall). Seems to me at this point like this option would be the path of least resistance.
  2. I know very little about Bambi right now, but does it use the same HurdleGamma as pymc? If so, how are you getting past those issues with sampling? Or, is it just that you are using a more manageable amount of data?
  3. Is there a sampler of those 4 you listed which you’ve found to be most suitable for your use cases?

I’ve attached a sample dataset here with the following:
record: unique to every row
group: each group ID has ~30-50 records
observed_value: observed data for each record
I really appreciate your help!

hurdlegamma_sample.csv (35.5 KB)

No problem.

  1. Yes
  2. Yes
  3. Nutpie is fast as hell on CPU, but the new numba-based PyMC sampler is no slouch. If you use a GPU, then JAX will be amazing at the scale of your data. It looks like MLX is on its way for Apple Silicon GPUs; Jax Metal has gone nowhere for months.
1 Like

For CPU sampling of huge models that do not involve extremely high precision calculations (thinking mostly about matrix decomposition here), you can set pytensor.config.floatX = 'float32' for some “free” speedups as well (you might have to fuss with setting .astype throughout your model though; I think @ricardoV94 was working on some helpers to make this easier).
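A minimal sketch of that float32 setup, assuming y is your observed array (the explicit dtype cast is the kind of .astype fussing mentioned above):

import numpy as np
import pytensor
import pymc as pm

pytensor.config.floatX = "float32"      # set before building the model

y32 = np.asarray(y, dtype="float32")    # cast observed data to match floatX
with pm.Model() as m:
    mu = pm.Normal("mu", mu=0.0, sigma=1.0)
    pm.Normal("obs", mu=mu, sigma=1.0, observed=y32)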

Pedantic sidebar that nobody asked for: there’s no PyMC numba sampler. When you set compile_kwargs={'mode':'NUMBA'}, pytensor compiles a numba.njit decorated logp function, which is then called inside the same old PyMC NUTS sampler, which is implemented in pure python. Nutpie in numba mode also calls the same jit compiled logp function, but then uses a NUTS sampler written in rust to do the sampling loop. Nutpie also has a smarter tuning strategy, so you can get away with fewer warmup steps, which also translates into speedup.
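For concreteness, the two paths described above look like this (a sketch, assuming model is already defined):

with model:
    # pytensor compiles a numba.njit logp, but the NUTS loop is PyMC's pure-Python one
    idata_numba = pm.sample(compile_kwargs={"mode": "NUMBA"})

    # same jit-compiled logp, but the sampling loop runs in nutpie's Rust NUTS implementation
    idata_nutpie = pm.sample(nuts_sampler="nutpie")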

Setting aside the actual logp and gradients of your model, NUTS isn’t actually a computationally heavy algorithm. That’s why we can get away with a boring old python implementation.

Regarding MLX, the package is very deep-learning focused and does not have many of the specialized statistics functions we need (like gammainc, used by the HurdleGamma!). I’m also very excited by the possibility of an MLX backend, but it’s going to be a heavy lift to get 100% coverage. Help wanted :slight_smile:

2 Likes

I like pedantic sidebars. Very informative.

2 Likes

With the help of ChatGPT, I was able to write the model(s). I have a fixed-effects model (observed_value ~ 1 + group) and a hierarchical model (observed_value ~ 1 + (1 | group)). The fixed-effects model samples in ~11.8 seconds with nutpie, and the hierarchical model samples in ~16.7 seconds with nutpie (for the 2500 rows of data you sent). Numpyro/JAX fails to sample, and the native PyMC sampler is way slower. I got no divergences in either model.

I would share the full trace plot, but it takes like 10 min to render. Hopefully this helps get you where you want to go.

The fixed effects model:

import arviz as az
import pandas as pd
import polars as pl
import pymc as pm
import pytensor.tensor as pt
from scipy.stats import gamma

data_csv = "hurdlegamma_sample.csv"

df = (pl
     .read_csv(data_csv, schema_overrides={'group':pl.String})
     .drop('record')
     .with_columns(group = pl.concat_str(pl.lit('group'), pl.col.group.str.zfill(2), separator="_"))
     .sort(by='group')
     .to_pandas()
     )

# Encode categorical variable as integer index
df["group_code"] = pd.Categorical(df["group"]).codes
group_names = pd.Categorical(df["group"]).categories

# Coordinates for the model
coords = {"group": group_names, "obs_id": df.index}

with pm.Model(coords=coords) as model:
   # Data containers linked to df
   group_idx = pm.Data("group_idx", df["group_code"].values, dims="obs_id")
   y_obs = pm.Data("y_obs", df["observed_value"].values, dims="obs_id")

   # Priors for logit(p) in the hurdle component
   intercept_logit = pm.Normal("intercept_logit", mu=0, sigma=2)
   beta_logit = pm.Normal("beta_logit", mu=0, sigma=2, dims="group")

   # Priors for log(mu) in the gamma component
   intercept_gamma = pm.Normal("intercept_gamma", mu=0, sigma=2)
   beta_gamma = pm.Normal("beta_gamma", mu=0, sigma=2, dims="group")

   # Shared shape parameter for gamma
   alpha = pm.HalfNormal("alpha", sigma=2)

   # Compute probabilities and means
   logit_p = intercept_logit + beta_logit[group_idx]
   p = pm.Deterministic("p", pm.math.sigmoid(logit_p), dims="obs_id")

   log_mu = intercept_gamma + beta_gamma[group_idx]
   mu = pm.Deterministic("mu", pm.math.exp(log_mu), dims="obs_id")

   # Define log-likelihood for hurdle-gamma
   def hurdle_gamma_logp(value, p_, mu_, alpha_):
       is_zero = pt.eq(value, 0)
       beta_ = alpha_ / mu_
       gamma_rv = pm.Gamma.dist(alpha=alpha_, beta=beta_)
       gamma_logp = pm.logp(gamma_rv, value)
       return pt.switch(is_zero, pt.log1p(-p_), pt.log(p_) + gamma_logp)


   # Custom likelihood using positional args (required when only logp is defined)
   HurdleGamma = pm.CustomDist(
       "HurdleGamma",
       p, mu, alpha,
       observed=y_obs,
       logp=hurdle_gamma_logp,
       dims="obs_id"
   )

   # Sample from the posterior
   idata_fixed_effects = pm.sample(nuts_sampler='nutpie')

The hierarchical model:

import arviz as az
import pandas as pd
import polars as pl
import pymc as pm
import pytensor.tensor as pt
from scipy.stats import gamma

data_csv = "hurdlegamma_sample.csv"

df = (pl
      .read_csv(data_csv, schema_overrides={'group':pl.String})
      .drop('record')
      .with_columns(group = pl.concat_str(pl.lit('group'), pl.col.group.str.zfill(2), separator="_"))
      .sort(by='group')
      .to_pandas()
      )

# Encode categorical variable as integer index
df["group_code"] = pd.Categorical(df["group"]).codes
group_names = pd.Categorical(df["group"]).categories

# Coordinates for the model
coords = {"group": group_names, "obs_id": df.index}

with pm.Model(coords=coords) as model:
    # Data containers
    group_idx = pm.Data("group_idx", df["group_code"].values, dims="obs_id")
    y_obs = pm.Data("y_obs", df["observed_value"].values, dims="obs_id")

    # Global intercepts
    intercept_logit = pm.Normal("intercept_logit", mu=0, sigma=2)
    intercept_gamma = pm.Normal("intercept_gamma", mu=0, sigma=2)

    # Random effects for group (logit link)
    sigma_logit_group = pm.HalfNormal("sigma_logit_group", sigma=1)
    re_logit_group = pm.Normal("re_logit_group", mu=0, sigma=sigma_logit_group, dims="group")

    # Random effects for group (gamma link)
    sigma_gamma_group = pm.HalfNormal("sigma_gamma_group", sigma=1)
    re_gamma_group = pm.Normal("re_gamma_group", mu=0, sigma=sigma_gamma_group, dims="group")

    # Gamma shape (shared)
    alpha = pm.HalfNormal("alpha", sigma=2)

    # Linear predictors with random intercepts
    logit_p = intercept_logit + re_logit_group[group_idx]
    p = pm.Deterministic("p", pm.math.sigmoid(logit_p), dims="obs_id")

    log_mu = intercept_gamma + re_gamma_group[group_idx]
    mu = pm.Deterministic("mu", pm.math.exp(log_mu), dims="obs_id")

    # Custom hurdle-gamma log-likelihood
    def hurdle_gamma_logp(value, p_, mu_, alpha_):
        is_zero = pt.eq(value, 0)
        beta_ = alpha_ / mu_
        gamma_rv = pm.Gamma.dist(alpha=alpha_, beta=beta_)
        gamma_logp = pm.logp(gamma_rv, value)
        return pt.switch(is_zero, pt.log1p(-p_), pt.log(p_) + gamma_logp)

    # Use positional arguments only (required)
    HurdleGamma = pm.CustomDist(
        "HurdleGamma",
        p, mu, alpha,
        observed=y_obs,
        logp=hurdle_gamma_logp,
        dims="obs_id"
    )

    # Sample from the posterior
    idata_hierarchical = pm.sample(nuts_sampler='nutpie')

Interesting -- I had just been looking at a model (graphviz below) to get group means for psi, beta, and alpha (again, I’m fairly new at this, so I’m not sure that what I’m doing is exactly correct, but it seems to be working at first glance). My plan was to add the group-level random effects in, I just hadn’t gotten there yet. I was able to increase the data while simultaneously decreasing the sampling time, and there were 0 divergences. The only hiccup was that since I completely decoupled the Bernoulli from the Gamma, if a group doesn’t have any “success” events, I don’t get any estimate on the Gamma portion (noted by groups on the left being 287 and the right being 250). Potentially there is a way for me to leverage your CustomDist in what I’ve done; I think with everything I know now I’ve at least got a couple of things to try. Also open to feedback if anything in my graphviz looks whacky!

I resolved my own issue with this one by changing the dims on beta, alpha, and mu from non_zero_groups to groups and now the groups with 0 successes have mu informed by the population’s data, as I intended. Still sampling well with 0 divergences. I believe I have a basis for moving forward (still open to feedback/suggestions with the graphviz) but thank you to everyone who chimed in, especially @zweli !

2 Likes

@eklim0407 do you mind sharing the pymc code?

Sure, this should run for you with the previously shared csv. My only issue right now is that each group will have different-length observed values for the Bernoulli and the Gamma, since the Bernoulli has all records and the Gamma has only the “successful” records. So when I use sample_posterior_predictive, I get different-length arrays for y_bernoulli and y_gamma, and I’m not sure how to handle that to get an HDI for each group. I think I could just use the values from the posterior trace with numpy or scipy (see the sketch after the model code), but I’d prefer to use sample_posterior_predictive if possible.

import polars as pl
import pymc as pm

df = pl.read_csv(
    "sample_data.csv",
    schema_overrides={"group": pl.String, "record": pl.String}
)

gamma_obs = df.filter(pl.col("observed_value") > 0).select("observed_value").to_numpy().squeeze()
# binary hurdle indicator: 1 if the record's observed value is non-zero, 0 otherwise
bernoulli_obs = (df.select("observed_value").to_numpy().squeeze() > 0).astype(int)

def factorize(dataset, col_name):
    dataset = dataset.with_columns(pl.col(col_name).cast(pl.Categorical).to_physical().alias("idx"))
    categories = dataset.select([col_name, "idx"]).unique().sort("idx").select(col_name).to_numpy().squeeze()
    idx = dataset.select("idx").to_numpy().squeeze()

    return idx, categories


group_idx, groups = factorize(df, "group")
record_idx, records = factorize(df, "record")
nonzero_group_idx, non_zero_groups = factorize(df.filter(pl.col("observed_value") > 0), "group")
nonzero_record_idx, nonzero_records = factorize(df.filter(pl.col("observed_value") > 0), "record")
coords = {
    "groups": groups,
    "records": records,
    "non_zero_groups": non_zero_groups,
    "non_zero_records": nonzero_records,
}

with pm.Model(coords=coords) as model:

    # HYPERPRIORS ON ALPHA
    a_alpha = pm.Gamma("a_alpha", alpha=26.4, beta=3.25)
    a_beta = pm.Gamma("a_beta", alpha=26.4, beta=3.25)
    # ALPHA
    alpha = pm.Gamma("alpha", alpha=a_alpha, beta=a_beta, dims="groups") 
    
    # HYPERPRIORS ON BETA
    b_alpha = pm.Gamma("b_alpha", alpha=26.4, beta=3.25) 
    b_beta = pm.Gamma("b_beta", alpha=26.4, beta=3.25)  
    # BETA
    beta = pm.Gamma("beta", alpha=b_alpha, beta=b_beta, dims="groups") 
    
    # HYPERPRIORS ON PSI
    p_alpha = pm.Gamma("p_alpha", alpha=2.78, beta=7.37)
    p_beta = pm.Gamma("p_beta", alpha=22.2, beta=24)
    # PSI
    psi = pm.Beta("psi", alpha=p_alpha, beta=p_beta, dims="groups") 
    
    # MU CALC
    mu = pm.Deterministic("mu", alpha / beta, dims="groups")
    
    y_bernoulli = pm.Bernoulli(
        "y_bernoulli",
        p=psi[group_idx],
        observed=bernoulli_obs,
        dims="records"
    )
    
    y_gamma = pm.Gamma(
        "y_gamma",
        alpha=alpha[nonzero_group_idx],
        beta=beta[nonzero_group_idx],
        observed=gamma_obs,
        dims="non_zero_records"
    )

pm.model_to_graphviz(model)
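One hedged option for the HDI question above, skipping sample_posterior_predictive entirely: since mu is a group-level Deterministic (the gamma-component mean), you can take its HDI straight from the posterior draws (assuming you sample this model first):

import arviz as az

with model:
    idata = pm.sample(nuts_sampler="nutpie")

# 94% HDI of the per-group gamma mean, taken directly from the posterior
mu_hdi = az.hdi(idata, var_names=["mu"])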
1 Like