Hello PyMC Team,
I have encountered a problem when trying to fit a model with pymc.DensityDist while implementing a zero-inflated log-normal distribution for a hierarchical model. My outcome is real-valued with a large number of exact zeros, and this approach has worked well for me so far. The model is fairly complex and has evolved from using MCMC to now using ADVI. Everything was working relatively well until I tried to introduce minibatch training. Here is a simplified version of the model, with simulated data, that executes without producing a warning.
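For context, the density I am implementing places probability pi on an exact zero and the remaining 1 - pi mass on a log-normal, so the log-likelihood is log(pi) when y = 0 and log(1 - pi) + logp_LogNormal(y | mu, sigma) when y > 0; this is what likelihood_lognorm below computes.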
import pytensor.tensor as pt
import pymc as pm
from scipy import stats
from scipy.special import expit as logistic
import numpy as np
RANDOM_NUM = 1234
POP_SIZE = 10
NUM_OBS = 500
def likelihood_lognorm(value, pi, mu, sigma):
    # log(pi) for exact zeros; log(1 - pi) + log-normal logp otherwise
    return pt.switch(
        pt.gt(value, 0),
        pt.log1p(-pi) + pm.logp(pm.Lognormal.dist(mu=mu, sigma=sigma), value),
        pt.log(pi),
    )
def ziflnorm_random(pi, mu, sigma, rng, size=None):
    # zero indicator from a Bernoulli(pi); positive part from a log-normal
    # (note: stats.binom.rvs(1, pi, size) would pass size as loc, so use rng)
    bern = rng.binomial(1, pi, size=size)
    lnorm = rng.lognormal(mu, sigma, size)
    return (1 - bern) * lnorm
rng = np.random.default_rng(RANDOM_NUM)
# simulated data
effects = stats.norm.rvs(loc=0, scale=1, size=POP_SIZE)
which_member = rng.choice(POP_SIZE, size=NUM_OBS)
zero_obs = stats.bernoulli.rvs(p=logistic(effects[which_member]))
score = (1 - zero_obs) * rng.lognormal(mean=0, sigma=1, size=NUM_OBS)
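As a quick sanity check (not part of the model, and the parameter values below are arbitrary), both helper functions behave as I expect:

# the logp should return log(pi) for an exact zero, and the sampler should
# produce roughly a pi fraction of exact zeros (pi = 0.3 here, arbitrarily)
logp_vals = likelihood_lognorm(np.array([0.0, 1.5]), 0.3, 0.0, 1.0).eval()
assert np.isclose(logp_vals[0], np.log(0.3))
draws = ziflnorm_random(0.3, 0.0, 1.0, rng=np.random.default_rng(0), size=10_000)
assert abs((draws == 0).mean() - 0.3) < 0.02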
# model without minibatch training
with pm.Model() as m_woMB:
    # priors (non-centered parameterization; z_a is shared by both linear models)
    z_a = pm.Normal("alpha_zo", shape=POP_SIZE)
    alpha_bar_zo = pm.Normal("alpha_bar_zo")
    sigma_zo = pm.Exponential("sigma_zo", 10)
    alpha_zo = alpha_bar_zo + z_a * sigma_zo
    alpha_bar_nz = pm.Normal("alpha_bar_nz")
    sigma_nz = pm.Exponential("sigma_nz", 10)
    alpha_nz = alpha_bar_nz + z_a * sigma_nz
    sigma = pm.Exponential("sigma", 0.5)
    # zero model: probability of an exact zero
    pi = pm.math.invlogit(alpha_zo[which_member])
    # non-zero model: log-normal location
    mu = pm.math.exp(alpha_nz[which_member])
    pm.DensityDist(
        "obs",
        pi, mu, sigma,
        rng=rng,
        logp=likelihood_lognorm,
        random=ziflnorm_random,
        observed=score,
        total_size=score.shape,
    )
    pm.fit()
Here is the same model with the same simulated data, this time provided via pm.Minibatch, which produces the inconsistent-graph warning shown below:
mdata, sdata = pm.Minibatch(which_member, score, batch_size=100)
# model with minibatch training
with pm.Model() as m_wMB:
    # priors (identical to the model above)
    z_a = pm.Normal("alpha_zo", shape=POP_SIZE)
    alpha_bar_zo = pm.Normal("alpha_bar_zo")
    sigma_zo = pm.Exponential("sigma_zo", 10)
    alpha_zo = alpha_bar_zo + z_a * sigma_zo
    alpha_bar_nz = pm.Normal("alpha_bar_nz")
    sigma_nz = pm.Exponential("sigma_nz", 10)
    alpha_nz = alpha_bar_nz + z_a * sigma_nz
    sigma = pm.Exponential("sigma", 0.5)
    # zero model, now indexed by the minibatched membership
    pi = pm.math.invlogit(alpha_zo[mdata])
    # non-zero model
    mu = pm.math.exp(alpha_nz[mdata])
    pm.DensityDist(
        "obs",
        pi, mu, sigma,
        rng=rng,
        logp=likelihood_lognorm,
        random=ziflnorm_random,
        observed=sdata,
        total_size=score.shape,
    )
    pm.fit()
UserWarning: RNG Variable RNG(<Generator(PCG64) at 0x165D90F20>) has multiple clients. This is likely an inconsistent random graph.
I read that a previous encounter with this warning was believed to be the result of using CustomDist, which led me to suspect that my use of pymc.DensityDist could be part of the issue; in that earlier instance, however, pymc.Minibatch does not appear to have been involved in the model's construction. I am also aware of a more recent issue posted on the discussion board that uses pymc.Minibatch with pymc.Censored for the outcome (not pymc.DensityDist).
In case it is helpful, here are the relevant libraries I am using from conda list:
pymc 5.16.1 pypi_0 pypi
pymc-experimental 0.1.1 pypi_0 pypi
pytensor 2.23.0 pypi_0 pypi
python 3.10.6 hc14f532_0_cpython conda-forge
I am concerned that, although the model can still be built despite the warning, this is an indication that I should not trust the predictions I get from it. Making predictions is the purpose of my model, so I would like to understand what might be leading to this warning.
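For completeness, the one change I am considering, based on my reading of those threads (I have not yet verified that it avoids the warning), is to stop passing my own rng into pymc.DensityDist and let PyMC manage the RNG itself, along these lines:

def ziflnorm_random(pi, mu, sigma, rng=None, size=None):
    # rng is supplied by PyMC when it draws from the distribution
    bern = rng.binomial(1, pi, size=size)
    lnorm = rng.lognormal(mu, sigma, size)
    return (1 - bern) * lnorm

pm.DensityDist(
    "obs",
    pi, mu, sigma,          # note: no rng=... argument here
    logp=likelihood_lognorm,
    random=ziflnorm_random,
    observed=sdata,
    total_size=score.shape,
)

Any help that you can provide on this matter is greatly appreciated.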