pm.DensityDist issue (pymc3 vs pymc5)

Hi,

I am attempting to utilize the continuous zero-inflated exponential likelihood distribution using pm.DensityDist with:

def  logp_zero_inflated_exponential(value, psi, mu):
     return tt.switch(tt.eq(value, 0), tt.log(psi), tt.log1p(-psi) - mu * value)

psi = pm.Beta('psi', 1, 1)
z  = pm.HalfNormal('Z', sigma=500)
a  = pm.HalfNormal('α', sigma=1000) # weak informative prior (we know it should be positive) 
b  = pm.Normal('β', mu=0, sigma=1000)

ET_I_est = f_ET_I(SM1, SM2, a, b, dt)
mu = z*np.abs(dsm) - ET_I_est

Y_obs = pm.DensityDist('Y_obs', logp_zero_inflated_exponential, observed={'value': y_obs, 'psi': psi, 'mu': mu})

This is working fine with the version 3, however, when I use it with
Y_obs = pm.DensityDist('Y_obs', logp_zero_inflated_exponential, observed=np.array([y_obs, psi, mu]))
in the version 5, I got the following error:

    860                 Y_obs = pm.DensityDist('Y_obs', logp_zero_inflated_exponential, 
    861                                        observed={'value': y_obs, 'psi': psi, 'mu': mu})
    862             elif pmv == 5:
    863                 Y_obs = pm.DensityDist('Y_obs', logp_zero_inflated_exponential, 
--> 864                                        observed=np.array([y_obs, psi, mu]))
    865 # sampling or fit
    866 rng = 321

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

I might be missing something with the Version 5 API. If you can provide me with instructions on how to use the Version 5 API correctly, that would be really helpful. If it works, I won’t need to use pymc3. By the way, the conda-forge pymc3 library installed on macOS works just fine with the mkl-service library. I’m not sure why the conda-forge pymc3 library for Linux comes with a numpy version that is not 1.22.

Additional information:

I use different logp_zero_inflated_exponential function for pymc3 (pmv == 3) and pymc5:

if pmv == 3:
  def logp_zero_inflated_exponential(value, psi, mu):
      return tt.switch(tt.eq(value, 0), tt.log(psi), tt.log1p(-psi) - mu * value)

if pmv == 5:
   def logp_zero_inflated_exponential(value, psi, mu):
      if value == 0:
        return pm.math.log(psi)
      else:
        return pm.math.log1p(-psi) - mu * value

Thank you for your help.

The API has changed, you can check the new documentation here: pymc.CustomDist — PyMC dev documentation

Hi Ricardo,
I have tried the updated API and tried to modify the previous pymc3 code accordingly, but I had no luck. Could you please help me modify my previous pymc3 code to the pymc5 version?
Thank you for your time.

If will be easier for me or others to help if you share what you tried.

Sure.

I am trying to create a model with observed y_data that includes a lot of zeros and some positive values. Therefore, I am considering using a zero-inflated exponential likelihood distribution.

Here are all the details of the code:

r, SM1, SM2, and dt are np.array data.

 def logp_zero_inflated_exponential(value: TensorVariable, psi: TensorVariable, mu: TensorVariable) -> TensorVariable:
      if value == 0:
        return np.log(psi)
      else:
        return np.log1p(-psi) - mu * value
with pm.Model() as model:

z  = pm.HalfNormal('Z', sigma=500)
a  = pm.HalfNormal('α', sigma=1000)
b  = pm.Normal('β', mu=0, sigma=1000)

psi = pm.Beta('psi', 1, 1)

y_obs = r
t_est = a*(SM1**b + SM2**b)*dt/2
mu = z*np.abs(dsm) - t_est
Y_obs = pm.CustomDist('Y_obs', psi, mu, dist=logp_zero_inflated_exponential, observed=y_obs)

obj_optimizer = pm.adam(learning_rate=0.01)
with model:
  fit_advi = pm.fit(method = pm.ADVI(), n = 200000, callbacks=[pm.callbacks.CheckParametersConvergence(diff='relative')], obj_optimizer=obj_optimizer, progressbar=False)
  idata    = fit_advi.sample(2000)

When I ran this code, I got the following error:

ValueError: Incompatible Elemwise input shapes [(451,), (0,)]

Previously, when I used PyMC3, I was able to successfully run similar code with different logp_zero_inflated_exponential functions and pm.DensityDist.

def logp_zero_inflated_exponential(value, psi, mu):
  return tt.switch(tt.eq(value, 0), tt.log(psi), tt.log1p(-psi) - mu * value)

Y_obs = pm.DensityDist('Y_obs', logp_zero_inflated_exponential, 
                                               observed={'value': y_obs, 'psi': psi, 'mu': mu})

If you have any alternative approaches for dealing with a large number of zeros in the observed data while still considering the positive values of y_obs, please feel free to suggest them.

Thank you for your help.