I have a hierarchical model in which the observed values range from 0 to 100 and the likelihood is Gamma distributed. To handle occurrences of 0, I am using a zero-inflated mixture model: a zero occurs with Bernoulli probability, and when the data is non-zero it is Gamma distributed. The distribution class is presented below.
I am having trouble sampling from the posterior of this model. There are three levels involved, including the observation level, and to diagnose the issue I have set weakly informative Half-Normal priors on the Gamma parameters for the two levels above the likelihood.
I am unable to initialize NUTS when I specify ADVI as the initialization method. Somewhere between iteration 1 and 2000, I receive a `FloatingPointError` stating that NaN occurred in optimization. I know this is happening because the inference fit is receiving a value of infinity in the step prior, but I am unable to figure out why this is occurring.
If I use the default initialization (`jitter+adapt_diag`) and parameterize the Gamma in terms of alpha and beta, I am able to sample from the posterior with 500 tuning samples and 1,500 draws.
The problem occurs when I switch the Gamma to the mu/sd parameterization. Halfway through tuning, the sampler throws `Bad initial energy: The model might be misspecified`. I don't know how to diagnose this error, but I don't believe the model is misspecified.
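For reference, one way to localize this kind of error (a minimal sketch, assuming the PyMC3 version in use provides `model.check_test_point()`) is to print each free variable's log-probability at the model's starting point; a `-inf` or `nan` entry identifies the offending variable. With the model defined below in scope:

```python
# Print the logp of every free variable at the test point of the model
# built below; -inf or nan flags the variable behind 'Bad initial energy'.
with model:
    print(model.check_test_point())
```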
I need help figuring out why ADVI won't work as the initialization method, and why the mu/sd parameterization can't be sampled. I would like to work with the mu/sd parameterization, since it is easier to place prior beliefs on these values than on the alpha/beta parameters.
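For concreteness, the mu/sd parameterization is just moment matching on the Gamma (the same conversion used in `get_alpha_beta` below): alpha = mu² / sd² and beta = mu / sd². For example, a hypothetical prior belief of mean 50 and standard deviation 20 on the 0-100 scale translates to:

```python
mu, sd = 50.0, 20.0        # hypothetical prior beliefs on the 0-100 scale
alpha = mu ** 2 / sd ** 2  # shape = 6.25
beta = mu / sd ** 2        # rate  = 0.125
```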
Test data is below. In this particular test data there are no occurrences of 0, so I expect the pi values (the probability of a 0 in the Bernoulli trials) to be essentially 0.
anonymized_test_data.csv (76.2 KB)
Here is the minimal example code needed to reproduce the issue.
```python
import numpy as np
import pandas as pd
import pymc3 as pm
import theano.tensor as tt
from scipy import stats
from pymc3.distributions import distribution

data = pd.read_csv('anonymized_test_data.csv')

hierarchy = {
    'level_1': {
        'association': {'parent': 'network_group', 'child': 'network'}
    },
    'level_2': {
        'association': {'parent': 'network', 'child': 'network_program_id'}
    },
    'level_3': {
        'association': {'parent': 'network_program_id', 'child': 'observation'}
    }
}

# Build a parent/child lookup table for each level of the hierarchy.
for level in hierarchy:
    association = hierarchy[level]['association']
    parent = association['parent']
    child = association['child']
    subset = data[[parent, child]].drop_duplicates().reset_index(drop=True)
    subset = subset.rename(columns={parent: 'parent', child: 'child'})
    hierarchy[level]['data'] = subset

# Attach the observed values to the lowest level of the hierarchy.
observed = pd.merge(data[['observation', 'y']], hierarchy['level_3']['data'],
                    left_on='observation', right_on='child')
```
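One assumption worth making explicit: the fancy indexing used in the model below (e.g. `alpha[...parent.values]`) only works if each `parent` column holds zero-based integer codes into the parent level's parameter array. A quick sanity check (a sketch; if the ids are raw labels, they would need to be recoded with something like `pd.factorize` first):

```python
# Verify that each parent column can be used directly as an integer index.
for level in hierarchy:
    parents = hierarchy[level]['data'].parent.values
    assert np.issubdtype(parents.dtype, np.integer), level
    assert parents.min() >= 0, level
```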
```python
class ZeroInflatedGamma(pm.Continuous):
    """
    Zero-inflated Gamma log-likelihood.

    Parameters
    ----------
    pi : float
        Expected proportion of zero occurrences (0 < pi < 1).
    alpha : float
        Shape parameter (alpha > 0).
    beta : float
        Rate parameter (beta > 0).
    mu : float
        Alternative parameter: mean of the Gamma component (mu > 0).
    sd : float
        Alternative parameter: standard deviation of the Gamma component (sd > 0).
    """

    def __init__(self, alpha=None, beta=None, mu=None, sd=None, pi=None, *args, **kwargs):
        super(ZeroInflatedGamma, self).__init__(*args, **kwargs)
        alpha, beta = self.get_alpha_beta(alpha=alpha, beta=beta, mu=mu, sd=sd)
        self.alpha = alpha
        self.beta = beta
        self.pi = pi = tt.as_tensor_variable(pi)
        self.gamma = pm.Gamma.dist(alpha, beta)

    def get_alpha_beta(self, alpha=None, beta=None, mu=None, sd=None):
        # Moment matching: a Gamma with mean mu and standard deviation sd
        # has shape alpha = mu^2 / sd^2 and rate beta = mu / sd^2.
        if (alpha is not None) and (beta is not None):
            pass
        elif (mu is not None) and (sd is not None):
            alpha = mu ** 2 / sd ** 2
            beta = mu / sd ** 2
        else:
            raise ValueError('Incompatible parameterization. Either use '
                             'alpha and beta, or mu and sd to specify '
                             'distribution.')
        return alpha, beta

    def random(self, point=None, size=None, repeat=None):
        alpha, beta, pi = distribution.draw_values(
            [self.alpha, self.beta, self.pi], point=point
        )
        gamma = distribution.generate_samples(
            stats.gamma.rvs, alpha, scale=1. / beta, dist_shape=self.shape, size=size
        )
        # Zero out each Gamma draw with probability pi.
        sampled = gamma * (np.random.random(np.squeeze(gamma.shape)) < (1 - pi))
        return distribution.reshape_sampled(sampled, size, self.shape)

    def logp(self, value):
        # Mixture log-density: log(pi) at zero, log(1 - pi) + Gamma logp otherwise.
        return tt.switch(
            tt.gt(value, 0),
            tt.log1p(-self.pi) + self.gamma.logp(value),
            tt.log(self.pi)
        )
```
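As a quick spot check that the class behaves as intended (a sketch comparing the mu/sd path against the equivalent scipy density at a single point):

```python
# Compare the custom logp (mu/sd path) against scipy's Gamma log-density.
zig = ZeroInflatedGamma.dist(mu=5.0, sd=2.0, pi=0.1)
a, b = 5.0 ** 2 / 2.0 ** 2, 5.0 / 2.0 ** 2  # alpha = 6.25, beta = 1.25
expected = np.log(1 - 0.1) + stats.gamma.logpdf(3.0, a, scale=1.0 / b)
print(zig.logp(3.0).eval(), expected)  # the two values should agree
```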
```python
with pm.Model() as model:
    # Network-level parameters
    level_1_param_0_0 = pm.HalfNormal(
        'level_1_param_0_0',
        sd=100,
        # sd=level_0_param_0_0[hierarchy['level_1']['data'].parent.values],
        shape=hierarchy['level_1']['data'].child.nunique(),
        testval=1
    )
    level_1_param_1_0 = pm.HalfNormal(
        'level_1_param_1_0',
        sd=100,
        # sd=level_0_param_1_0[hierarchy['level_1']['data'].parent.values],
        shape=hierarchy['level_1']['data'].child.nunique(),
        testval=1
    )

    # Priors for the probability that a Bernoulli trial results in a zero
    level_1_param_2_0 = pm.Normal(
        'level_1_param_2_0',
        mu=0,
        sd=25,
        shape=hierarchy['level_1']['data'].child.nunique(),
        testval=1
    )
    level_1_param_2_1 = pm.HalfNormal(
        'level_1_param_2_1',
        sd=5,
        shape=hierarchy['level_1']['data'].child.nunique(),
        testval=1
    )

    # Priors for program_id
    alpha = pm.HalfNormal(
        'alpha',
        sd=level_1_param_0_0[hierarchy['level_2']['data'].parent.values],
        shape=hierarchy['level_2']['data'].child.nunique()
    )
    beta = pm.HalfNormal(
        'beta',
        sd=level_1_param_1_0[hierarchy['level_2']['data'].parent.values],
        shape=hierarchy['level_2']['data'].child.nunique()
    )

    # Non-centered parameterization for the logit of pi
    mu_pi_offset = pm.Normal(
        'mu_pi_offset',
        mu=0,
        sd=1,
        shape=hierarchy['level_2']['data'].child.nunique()
    )
    mu_pi = pm.Deterministic(
        'mu_pi',
        (level_1_param_2_0[hierarchy['level_2']['data'].parent.values] +
         mu_pi_offset * level_1_param_2_1[hierarchy['level_2']['data'].parent.values])
    )
    pi = pm.Deterministic('pi', pm.math.sigmoid(mu_pi))

    # Likelihood function
    y = ZeroInflatedGamma(
        'y',
        alpha=alpha[hierarchy['level_3']['data'].parent.values],
        beta=beta[hierarchy['level_3']['data'].parent.values],
        pi=pi[hierarchy['level_3']['data'].parent.values],
        observed=observed.y
    )

with model:
    trace = pm.sample(1500, tune=500, progressbar=True, chains=1, njobs=4,
                      nuts_kwargs={'target_accept': 0.85})
```
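To separate the ADVI failure from NUTS itself, ADVI can also be run on its own (a sketch using PyMC3's `pm.fit`); if this alone reproduces the `FloatingPointError`, the problem lies in the variational optimization rather than in the NUTS initialization:

```python
# Fit ADVI directly; a FloatingPointError here would confirm the failure
# happens during variational optimization, before NUTS ever starts.
with model:
    approx = pm.fit(n=30000, method='advi')
```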