Bad initial energy after checking test point

I am working on a hierarchical Bayesian model that can be seen as a logistic regression. Specifically, there are four variables in my model: x_a, x_b, p_a, and p_b. The sigmoid function in my model is
P(A) = \dfrac{1}{1+\exp\left(-\epsilon \big(u(x_a)-u(x_b) + \Phi(\psi(p_a) - \psi(p_b))\big)\right)},
where
u(x) = \dfrac{1}{\alpha}(1-\exp(-\alpha x)),
\psi(p) = p^\gamma,
\Phi(\psi(p_a) - \psi(p_b)) = \dfrac{\rho}{\beta}\left(1-\exp\Big(-\beta\big(\frac{\psi(p_a) - \psi(p_b)}{\theta}\big)^\theta\Big)\right).

The intervals for each parameter are \epsilon \in [0, +\infty), \alpha \in (0, +\infty), \gamma \in (0,1], \rho \in (0, +\infty), \beta \in (0, +\infty), \theta \in (1, +\infty).
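For reference, the choice rule above can be sketched in plain Python (standard library only) to sanity-check parameter values by hand before building the PyMC3 model. The helper names (u, psi, phi, p_choose_a) and the parameter values in the usage line are my own illustrations, not part of the model code:

```python
import math

def u(x, alpha):
    """Utility function u(x) = (1 - exp(-alpha * x)) / alpha, with alpha > 0."""
    return (1.0 - math.exp(-alpha * x)) / alpha

def psi(p, gamma):
    """Probability weighting psi(p) = p ** gamma, with gamma in (0, 1]."""
    return p ** gamma

def phi(d, rho, beta, theta):
    """Phi(d) = rho/beta * (1 - exp(-beta * (d / theta) ** theta)), with theta > 1.

    Note: (d / theta) ** theta is only real-valued for d >= 0 when theta
    is non-integer, so negative weighted-probability differences need care.
    """
    return rho / beta * (1.0 - math.exp(-beta * (d / theta) ** theta))

def p_choose_a(xa, xb, pa, pb, epsilon, alpha, gamma, rho, beta, theta):
    """P(A) = sigmoid(epsilon * (u(xa) - u(xb) + Phi(psi(pa) - psi(pb))))."""
    diff = u(xa, alpha) - u(xb, alpha) + phi(psi(pa, gamma) - psi(pb, gamma),
                                             rho, beta, theta)
    return 1.0 / (1.0 + math.exp(-epsilon * diff))

# Illustrative values: identical options should give P(A) = 0.5
print(p_choose_a(1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 0.8, 1.0, 1.0, 2.0))  # → 0.5
```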

My code is

import pymc3 as pm
import theano.tensor as tt

# The u() function: u(x) = (1 - exp(-alpha * x)) / alpha
def utility(alpha, x):
    return (1 - tt.exp(-alpha * x)) / alpha

# The \psi() function: psi(p) = p ** gamma
def spf(gamma, op):
    return op ** gamma

# The \Phi() function
def atr_diff(rho, beta, theta, spf_diff):
    return rho / beta * (1 - tt.exp(-beta * (spf_diff / theta) ** theta))

with pm.Model() as model1:
    # Non-centered hierarchical prior for gamma: per-subject offsets around a group mean
    mu_gamma = pm.Normal('mu_gamma', mu=0, sigma=1)
    sigma_gamma = pm.Uniform('sigma_gamma', 0.01, 40)
    gamma_phi_s_offset = pm.Normal('gamma_phi_s_offset', mu=0, sigma=1, shape=n_subject)
    gamma_phi_s = pm.Deterministic('gamma_phi_s', mu_gamma + gamma_phi_s_offset * sigma_gamma)

    # Map the latent normal through the standard normal CDF so gamma_s lies in (0, 1)
    gamma_log_s = pm.Normal.dist(mu=0, sigma=1).logcdf(gamma_phi_s)
    gamma_s = pm.Deterministic('gamma_s', tt.exp(gamma_log_s))

    # Lognormal priors keep alpha, rho, beta, theta, and epsilon positive
    mu_alpha = pm.Uniform('mu_alpha', -5, 5)
    sigma_alpha = pm.Uniform('sigma_alpha', 0.01, 40)
    alpha_s = pm.Lognormal('alpha_s', mu=mu_alpha, sigma=sigma_alpha, shape=n_subject)

    mu_rho = pm.Normal('mu_rho', mu=0, sigma=10)
    sigma_rho = pm.Uniform('sigma_rho', 0.01, 100)
    rho_s = pm.Lognormal('rho_s', mu=mu_rho, sigma=sigma_rho, shape=n_subject)

    mu_beta = pm.Normal('mu_beta', mu=0, sigma=10)
    sigma_beta = pm.Uniform('sigma_beta', 0.01, 100)
    beta_s = pm.Lognormal('beta_s', mu=mu_beta, sigma=sigma_beta, shape=n_subject)

    mu_theta = pm.Normal('mu_theta', mu=0, sigma=10)
    sigma_theta = pm.Uniform('sigma_theta', 0.01, 1000)
    theta_s = pm.Lognormal('theta_s', mu=mu_theta, sigma=sigma_theta, shape=n_subject)

    mu_epsilon = pm.Normal('mu_epsilon', mu=0, sigma=10)
    sigma_epsilon = pm.Uniform('sigma_epsilon', 0.01, 100)
    epsilon_s = pm.Lognormal('epsilon_s', mu=mu_epsilon, sigma=sigma_epsilon, shape=n_subject)

    # Model predictions; theta_s + 1 shifts the lognormal support from (0, +inf) to (1, +inf)
    utility_diff = utility(alpha_s[subject_idx], xa) - utility(alpha_s[subject_idx], xb)
    sprob_diff = spf(gamma_s[subject_idx], pa) - spf(gamma_s[subject_idx], pb)
    diff = utility_diff + atr_diff(rho_s[subject_idx], beta_s[subject_idx], theta_s[subject_idx] + 1, sprob_diff)

    # Choice probability and Bernoulli likelihood
    p = pm.math.invlogit(epsilon_s[subject_idx] * diff)
    likelihood = pm.Bernoulli('likelihood', p=p, observed=choice)
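As a side note on the gamma_s construction: mapping the unconstrained gamma_phi_s through the standard normal CDF keeps each gamma_s inside (0, 1). This can be checked in plain Python with math.erf, which (as I understand it) computes the same quantity as exp(pm.Normal.dist(0, 1).logcdf(z)):

```python
import math

def std_normal_cdf(z):
    # Standard normal CDF via the error function:
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Any real latent value maps into (0, 1), matching the gamma in (0, 1] requirement
for z in (-3.0, 0.0, 3.0):
    print(z, std_normal_cdf(z))
```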

The result of model1.check_test_point() is

mu_gamma -0.92
sigma_gamma_interval__ -1.39
gamma_phi_s_offset -2.76
mu_alpha_interval__ -1.39
sigma_alpha_interval__ -1.39
alpha_s_log__ -11.74
mu_rho -3.22
sigma_rho_interval__ -1.39
rho_s_log__ -14.49
mu_beta -3.22
sigma_beta_interval__ -1.39
beta_s_log__ -14.49
mu_theta -3.22
sigma_theta_interval__ -1.39
theta_s_log__ -21.40
mu_epsilon -3.22
sigma_epsilon_interval__ -1.39
epsilon_s_log__ -14.49
likelihood -208.95
Name: Log-probability of test_point, dtype: float64

There is no -inf or nan in this result, but I still got a "Bad initial energy" error when I used NUTS. This is an essential part of my Master's thesis, so please help me. Thank you!

You can try changing the init method in pm.sample(), for instance to init="adapt_diag". The default initialization method adds random jitter around the test point, which can lead to bad initial energy even when the test point itself is valid.

Keep in mind that this might be a symptom of a misspecified model, so ignoring it might not be the best strategy. In any case, you can try changing the init method and see if it runs.

Thank you for your reply; your suggestion is really helpful. However, after I changed the init method, a new error was raised:

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 110, in run
self._start_loop()
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 160, in _start_loop
point, stats = self._compute_point()
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 191, in _compute_point
point, stats = self._step_method.step(self._point)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
apoint, stats = self.astep(array)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 130, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 231, in raise_ok
raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV epsilon_s_log__.ravel()[0] is zero.
The derivative of RV epsilon_s_log__.ravel()[1] is zero.
The derivative of RV epsilon_s_log__.ravel()[2] is zero.
The derivative of RV sigma_epsilon_interval__.ravel()[0] is zero.
The derivative of RV mu_epsilon.ravel()[0] is zero.
The derivative of RV theta_s_log__.ravel()[0] is zero.
The derivative of RV theta_s_log__.ravel()[1] is zero.
The derivative of RV theta_s_log__.ravel()[2] is zero.
The derivative of RV sigma_theta_interval__.ravel()[0] is zero.
The derivative of RV mu_theta.ravel()[0] is zero.
The derivative of RV beta_s_log__.ravel()[0] is zero.
The derivative of RV beta_s_log__.ravel()[1] is zero.
The derivative of RV beta_s_log__.ravel()[2] is zero.
The derivative of RV sigma_beta_interval__.ravel()[0] is zero.
The derivative of RV mu_beta.ravel()[0] is zero.
The derivative of RV rho_s_log__.ravel()[0] is zero.
The derivative of RV rho_s_log__.ravel()[1] is zero.
The derivative of RV rho_s_log__.ravel()[2] is zero.
The derivative of RV sigma_rho_interval__.ravel()[0] is zero.
The derivative of RV mu_rho.ravel()[0] is zero.
The derivative of RV alpha_s_log__.ravel()[0] is zero.
The derivative of RV alpha_s_log__.ravel()[1] is zero.
The derivative of RV alpha_s_log__.ravel()[2] is zero.
The derivative of RV sigma_alpha_interval__.ravel()[0] is zero.
The derivative of RV mu_alpha_interval__.ravel()[0] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[0] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[1] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[2] is zero.
The derivative of RV sigma_gamma_interval__.ravel()[0] is zero.
The derivative of RV mu_gamma.ravel()[0] is zero.
"""

As you mentioned, my model might be misspecified. However, I am a beginner in Bayesian modeling. Could you recommend any relevant blogs or literature that might be helpful?

I would suggest you build the model in small steps to understand where it starts to fail.

For some more general resources on Bayesian modeling you can check:

https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

Thank you for your recommendation. I will seriously read the materials you suggested.