PyMC v4 not returning 'observed_data' in InferenceData object

Hello,

I tried running PyMC v4.0 with JAX on a GPU. Sampling with pm.sampling_jax.sample_numpyro_nuts works and gives a posterior similar to PyMC3's; however, the returned InferenceData has no ‘observed_data’ group.

I noticed in the release notes that the ‘inner workings have not been refactored’ for mixture distributions. In this case, I am using negative binomial and zero-inflated negative binomial distributions.

Could the missing ‘observed_data’ be due to the mixture distribution? If so, are there any workarounds, or a timeline for when to expect a fix?

Thanks.


Thanks for reporting. I opened an issue on our repo: "observed_data missing from jax samplers InferenceData" (Issue #5586, pymc-devs/pymc on GitHub).


Hey,

I’m sorry you are running into this problem. Could you share a minimal example that reproduces the issue so I can investigate it further?

Sorry it took so long to get this to you - it looks like they closed the ticket.

Please see the following simulated data:
process_data.csv (103.8 KB)

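import numpy as np
import pandas as pd
import pymc as pm
import pymc.sampling_jax  # must be imported explicitly before using pm.sampling_jax

# Load the simulated data attached above
df = pd.read_csv('process_data.csv')

# Test treatments
tot_count = df['value'].values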

process_idx, process = df['process'].factorize(sort=True)

COORDS = {
    'obs_id': np.arange(len(process_idx)),
    'process': process,
}

with pm.Model(coords = COORDS) as example_model:

    ###
    # Test treatments
    # Data
    count = pm.Data('total_count', tot_count, dims='obs_id')

    # Hyperprior for the rate of mu
    mu_lam = pm.Exponential('mu_lam', lam=0.3)

    # Hyperprior for the rate of alpha
    a_lam = pm.Exponential('a_lam', lam=2)

    # Prior: mean count per process
    mu = pm.Exponential('mu', lam=mu_lam, dims='process')

    # Prior: dispersion per process
    alpha = pm.Exponential('alpha', lam=a_lam, dims='process')

    # Treatment likelihood
    obs = pm.NegativeBinomial('obs', mu=mu[process_idx], alpha=alpha[process_idx], observed=count, dims='obs_id')
    
    # TRACE

    idata_trace = pm.sampling_jax.sample_numpyro_nuts(tune=3_000, draws=5_000, target_accept=0.95)

    idata_trace.extend(
        pm.sample_posterior_predictive(
            idata_trace,
            var_names=['mu', 'alpha', 'obs'],
        )
    )

    idata_trace.extend(
        pm.sample_prior_predictive(1_000)
    )

The trace returned does not include ‘observed_data’:

[image: screenshot of the returned InferenceData]

There is this warning as well:

/home/user/anaconda3/envs/pymc-dev-py39-gpu/lib/python3.9/site-packages/pymc/backends/arviz.py:58: UserWarning: Could not extract data from symbolic observation obs
  warnings.warn(f"Could not extract data from symbolic observation {obs}")

This is not an issue with the sample_numpyro function, though. The observed_data group is generated from the model and should always be present in the result: if you look at the results independently, it should be in the idata returned by the posterior, posterior predictive, and prior sampling calls. It is probably an issue with the negative binomial.

It is also strange that no prior_predictive group is present. Does the negative binomial have a random method? Is obs a variable in the returned posterior_predictive group?
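
For anyone who wants to check each result independently, as suggested above, here is a minimal sketch. The helper `missing_groups` and the `FakeInferenceData` stand-in are hypothetical names made up for illustration; the only thing assumed about a real result is that it exposes a `groups()` method returning the group names, as ArviZ's InferenceData does. The stand-in is only there so the snippet runs without PyMC installed.

```python
# Groups one would expect after posterior, posterior predictive and prior sampling.
EXPECTED_GROUPS = (
    "posterior",
    "posterior_predictive",
    "prior",
    "prior_predictive",
    "observed_data",
)


def missing_groups(idata, expected=EXPECTED_GROUPS):
    """Return the expected group names that `idata` does not contain.

    Only relies on `idata.groups()` returning an iterable of group names.
    """
    present = set(idata.groups())
    return [name for name in expected if name not in present]


class FakeInferenceData:
    """Hypothetical stand-in that only mimics `.groups()` (not a real ArviZ object)."""

    def __init__(self, *names):
        self._names = list(names)

    def groups(self):
        return list(self._names)


# Mimic a result that has a posterior but no observed_data group.
jax_result = FakeInferenceData("posterior", "sample_stats")
print(missing_groups(jax_result, expected=("posterior", "observed_data")))
```

With the model above, the same helper could be called on the object returned by each of the three sampling calls separately, before any `extend`, to see which call fails to produce observed_data or prior_predictive.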

Yep, I can reproduce this error. The issue has been re-opened so you can follow the progress there.

I think we’ve fixed this with the most recent commit. Could you confirm that it works for you?