Bad initial energy after checking test point

I am working on a hierarchical Bayesian model that can be seen as a logistic regression. Specifically, there are four variables in my model: x_a, x_b, p_a, and p_b. The sigmoid function in my model is
P(A) = \dfrac{1}{1+\exp\left(-\epsilon \big(u(x_a)-u(x_b) + \Phi(\psi(p_a) - \psi(p_b))\big)\right)},
where
u(x) = \dfrac{1}{\alpha}(1-\exp(-\alpha x)),
\psi(p) = p^\gamma,
\Phi(\psi(p_a) - \psi(p_b)) = \dfrac{\rho}{\beta}\left(1-\exp\Big(-\beta\big(\frac{\psi(p_a) - \psi(p_b)}{\theta}\big)^\theta\Big)\right).

The intervals for each parameter are \epsilon \in [0, +\infty), \alpha \in (0, +\infty), \gamma \in (0,1], \rho \in (0, +\infty), \beta \in (0, +\infty), \theta \in (1, +\infty).
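For reference, the choice rule above can be sketched in plain Python (standard library only) to sanity-check parameter values by hand before building the PyMC3 model. The helper names (u, psi, phi, p_choose_a) and the parameter values in the usage line are my own illustrations, not part of the model code:

```python
import math

def u(x, alpha):
    """Utility function u(x) = (1 - exp(-alpha * x)) / alpha, with alpha > 0."""
    return (1.0 - math.exp(-alpha * x)) / alpha

def psi(p, gamma):
    """Probability weighting psi(p) = p ** gamma, with gamma in (0, 1]."""
    return p ** gamma

def phi(d, rho, beta, theta):
    """Phi(d) = rho/beta * (1 - exp(-beta * (d / theta) ** theta)), with theta > 1.

    Note: (d / theta) ** theta is only real-valued for d >= 0 when theta
    is non-integer, so negative weighted-probability differences need care.
    """
    return rho / beta * (1.0 - math.exp(-beta * (d / theta) ** theta))

def p_choose_a(xa, xb, pa, pb, epsilon, alpha, gamma, rho, beta, theta):
    """P(A) = sigmoid(epsilon * (u(xa) - u(xb) + Phi(psi(pa) - psi(pb))))."""
    diff = u(xa, alpha) - u(xb, alpha) + phi(psi(pa, gamma) - psi(pb, gamma),
                                             rho, beta, theta)
    return 1.0 / (1.0 + math.exp(-epsilon * diff))

# Illustrative values: identical options should give P(A) = 0.5
print(p_choose_a(1.0, 1.0, 0.5, 0.5, 1.0, 0.5, 0.8, 1.0, 1.0, 2.0))  # → 0.5
```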

My code is

import pymc3 as pm
import theano.tensor as tt

# The u() function: u(x) = (1 - exp(-alpha * x)) / alpha
def utility(alpha, x):
    return (1 - tt.exp(-alpha * x)) / alpha

# The \psi() function: psi(p) = p ** gamma
def spf(gamma, op):
    return op ** gamma

# The \Phi() function
def atr_diff(rho, beta, theta, spf_diff):
    return rho / beta * (1 - tt.exp(-beta * (spf_diff / theta) ** theta))

with pm.Model() as model1:
    # Non-centered hierarchical prior for gamma: per-subject offsets around a group mean
    mu_gamma = pm.Normal('mu_gamma', mu=0, sigma=1)
    sigma_gamma = pm.Uniform('sigma_gamma', 0.01, 40)
    gamma_phi_s_offset = pm.Normal('gamma_phi_s_offset', mu=0, sigma=1, shape=n_subject)
    gamma_phi_s = pm.Deterministic('gamma_phi_s', mu_gamma + gamma_phi_s_offset * sigma_gamma)

    # Map the latent normal through the standard normal CDF so gamma_s lies in (0, 1)
    gamma_log_s = pm.Normal.dist(mu=0, sigma=1).logcdf(gamma_phi_s)
    gamma_s = pm.Deterministic('gamma_s', tt.exp(gamma_log_s))

    # Lognormal priors keep alpha, rho, beta, theta, and epsilon positive
    mu_alpha = pm.Uniform('mu_alpha', -5, 5)
    sigma_alpha = pm.Uniform('sigma_alpha', 0.01, 40)
    alpha_s = pm.Lognormal('alpha_s', mu=mu_alpha, sigma=sigma_alpha, shape=n_subject)

    mu_rho = pm.Normal('mu_rho', mu=0, sigma=10)
    sigma_rho = pm.Uniform('sigma_rho', 0.01, 100)
    rho_s = pm.Lognormal('rho_s', mu=mu_rho, sigma=sigma_rho, shape=n_subject)

    mu_beta = pm.Normal('mu_beta', mu=0, sigma=10)
    sigma_beta = pm.Uniform('sigma_beta', 0.01, 100)
    beta_s = pm.Lognormal('beta_s', mu=mu_beta, sigma=sigma_beta, shape=n_subject)

    mu_theta = pm.Normal('mu_theta', mu=0, sigma=10)
    sigma_theta = pm.Uniform('sigma_theta', 0.01, 1000)
    theta_s = pm.Lognormal('theta_s', mu=mu_theta, sigma=sigma_theta, shape=n_subject)

    mu_epsilon = pm.Normal('mu_epsilon', mu=0, sigma=10)
    sigma_epsilon = pm.Uniform('sigma_epsilon', 0.01, 100)
    epsilon_s = pm.Lognormal('epsilon_s', mu=mu_epsilon, sigma=sigma_epsilon, shape=n_subject)

    # Model predictions; theta_s + 1 shifts the lognormal support from (0, +inf) to (1, +inf)
    utility_diff = utility(alpha_s[subject_idx], xa) - utility(alpha_s[subject_idx], xb)
    sprob_diff = spf(gamma_s[subject_idx], pa) - spf(gamma_s[subject_idx], pb)
    diff = utility_diff + atr_diff(rho_s[subject_idx], beta_s[subject_idx], theta_s[subject_idx] + 1, sprob_diff)

    # Choice probability and Bernoulli likelihood
    p = pm.math.invlogit(epsilon_s[subject_idx] * diff)
    likelihood = pm.Bernoulli('likelihood', p=p, observed=choice)
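As a side note on the gamma_s construction: mapping the unconstrained gamma_phi_s through the standard normal CDF keeps each gamma_s inside (0, 1). This can be checked in plain Python with math.erf, which (as I understand it) computes the same quantity as exp(pm.Normal.dist(0, 1).logcdf(z)):

```python
import math

def std_normal_cdf(z):
    # Standard normal CDF via the error function:
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Any real latent value maps into (0, 1), matching the gamma in (0, 1] requirement
for z in (-3.0, 0.0, 3.0):
    print(z, std_normal_cdf(z))
```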

The result of model1.check_test_point() is

mu_gamma -0.92
sigma_gamma_interval__ -1.39
gamma_phi_s_offset -2.76
mu_alpha_interval__ -1.39
sigma_alpha_interval__ -1.39
alpha_s_log__ -11.74
mu_rho -3.22
sigma_rho_interval__ -1.39
rho_s_log__ -14.49
mu_beta -3.22
sigma_beta_interval__ -1.39
beta_s_log__ -14.49
mu_theta -3.22
sigma_theta_interval__ -1.39
theta_s_log__ -21.40
mu_epsilon -3.22
sigma_epsilon_interval__ -1.39
epsilon_s_log__ -14.49
likelihood -208.95
Name: Log-probability of test_point, dtype: float64

There is no -inf or nan in this result, but I still got a "Bad initial energy" error when I used NUTS. This is an essential part of my Master's thesis, so please help me. Thank you!

You can try changing the init method in pm.sample(), for instance to init="adapt_diag". The default initialization method adds random jitter around the test point, which can lead to bad initial energy even when the test point itself is valid.

Keep in mind that this might be a symptom of a misspecified model, so ignoring it might not be the best strategy. In any case, you can try changing the init method and see if it runs.

Thank you for your reply; your suggestion is really helpful. However, after I changed the init method, a new error was raised:

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 110, in run
self._start_loop()
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 160, in _start_loop
point, stats = self._compute_point()
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 191, in _compute_point
point, stats = self._step_method.step(self._point)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
apoint, stats = self.astep(array)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 130, in astep
self.potential.raise_ok(self._logp_dlogp_func._ordering.vmap)
File "/Users/sherlockli/anaconda3/lib/python3.7/site-packages/pymc3/step_methods/hmc/quadpotential.py", line 231, in raise_ok
raise ValueError('\n'.join(errmsg))
ValueError: Mass matrix contains zeros on the diagonal.
The derivative of RV epsilon_s_log__.ravel()[0] is zero.
The derivative of RV epsilon_s_log__.ravel()[1] is zero.
The derivative of RV epsilon_s_log__.ravel()[2] is zero.
The derivative of RV sigma_epsilon_interval__.ravel()[0] is zero.
The derivative of RV mu_epsilon.ravel()[0] is zero.
The derivative of RV theta_s_log__.ravel()[0] is zero.
The derivative of RV theta_s_log__.ravel()[1] is zero.
The derivative of RV theta_s_log__.ravel()[2] is zero.
The derivative of RV sigma_theta_interval__.ravel()[0] is zero.
The derivative of RV mu_theta.ravel()[0] is zero.
The derivative of RV beta_s_log__.ravel()[0] is zero.
The derivative of RV beta_s_log__.ravel()[1] is zero.
The derivative of RV beta_s_log__.ravel()[2] is zero.
The derivative of RV sigma_beta_interval__.ravel()[0] is zero.
The derivative of RV mu_beta.ravel()[0] is zero.
The derivative of RV rho_s_log__.ravel()[0] is zero.
The derivative of RV rho_s_log__.ravel()[1] is zero.
The derivative of RV rho_s_log__.ravel()[2] is zero.
The derivative of RV sigma_rho_interval__.ravel()[0] is zero.
The derivative of RV mu_rho.ravel()[0] is zero.
The derivative of RV alpha_s_log__.ravel()[0] is zero.
The derivative of RV alpha_s_log__.ravel()[1] is zero.
The derivative of RV alpha_s_log__.ravel()[2] is zero.
The derivative of RV sigma_alpha_interval__.ravel()[0] is zero.
The derivative of RV mu_alpha_interval__.ravel()[0] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[0] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[1] is zero.
The derivative of RV gamma_phi_s_offset.ravel()[2] is zero.
The derivative of RV sigma_gamma_interval__.ravel()[0] is zero.
The derivative of RV mu_gamma.ravel()[0] is zero.
"""

As you mentioned, my model might be misspecified. However, I am a beginner in Bayesian modeling. Could you recommend any relevant blogs or literature that might be helpful?

I would suggest you build the model in small steps to understand where it starts to fail.

For some more general resources on Bayesian modeling you can check:

https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html

Thank you for your recommendation. I will seriously read the materials you suggested.