Log probability negative infinity

Hello,

I am running an experiment on simulated data to estimate the causal effect of a treatment on an outcome (e.g. tutoring on graduation rates). People with higher income are more likely to get tutored and are also more likely to graduate. If we had access to the income variable, we could adjust for it in the analysis. However, in many situations we only have access to a proxy, such as Free/Reduced Lunch (FRL): FRL is true if household income is less than $35,000, and false otherwise.
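For reference, here is a minimal sketch of the kind of data-generating process I am simulating. The coefficients are illustrative placeholders, not the exact values from my notebook; only the gamma parameters match the model below:

import numpy as np
import pandas as pd
from scipy.special import expit as logistic

rng = np.random.RandomState(42)
n = 1000

# Household income, gamma-distributed (alpha/beta as in the model below).
socioeconomic_status = rng.gamma(10.57, 1 / 0.00015102481121898598, size=n)

# Higher income -> more likely to be tutored (illustrative coefficients).
tutoring = rng.binomial(1, logistic(socioeconomic_status * 2e-5 - 1.5))

# FRL is a hard threshold on income.
free_reduced_lunch = (socioeconomic_status < 35000).astype(int)

# Graduation depends on both tutoring and income (illustrative coefficients).
graduation = rng.binomial(1, logistic(tutoring * 0.5 + socioeconomic_status * 1e-5))

df = pd.DataFrame({
    'socioeconomic_status': socioeconomic_status,
    'tutoring': tutoring,
    'free_reduced_lunch': free_reduced_lunch,
    'graduation': graduation,
})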


I tried to model this as follows:

import pymc3 as pm

# `logistic` must operate on theano tensors inside the model;
# pm.math.sigmoid is the standard logistic function.
logistic = pm.math.sigmoid

with pm.Model() as model:
    gamma_alpha = 10.57
    gamma_beta = 0.00015102481121898598

    # Priors on the regression coefficients and intercept.
    tutoring_to_grad_coeff = pm.Normal('tutoring_coeff', mu=0, sigma=0.5)
    socioecon_to_grad_coeff = pm.Normal('socioecon_coeff', mu=0, sigma=0.00001)
    socio_to_tutoring_coeff = pm.Normal('socio_to_tutoring_coeff', mu=0, sigma=0.00001)
    tutoring_alpha = pm.Normal('tutoring_alpha', mu=0, sigma=0.5)

    # Observed income: the confounder.
    socioeconomic_status = pm.Gamma(
        'socioeconomic_status',
        alpha=gamma_alpha,
        beta=gamma_beta,
        shape=df.shape[0],
        observed=df['socioeconomic_status']
    )

    # Income -> tutoring.
    p_tutoring = logistic(socioeconomic_status * socio_to_tutoring_coeff + tutoring_alpha)
    tutoring = pm.Binomial('tutoring', n=1, p=p_tutoring, observed=df['tutoring'], shape=df.shape[0])

    # Income -> free/reduced lunch (the proxy).
    p_free_reduced_lunch = logistic(socioeconomic_status - 35000)

    # Commenting out the line below makes the model run, but leads to the
    # incorrect inference that tutoring decreases graduation rates.
    pm.Binomial(
        'free_reduced_lunch',
        n=1,
        p=p_free_reduced_lunch,
        observed=df['free_reduced_lunch'],
        shape=df.shape[0]
    )

    # Tutoring and income -> graduation.
    p_graduation = logistic(tutoring * tutoring_to_grad_coeff + socioeconomic_status * socioecon_to_grad_coeff)
    graduation = pm.Binomial('graduation', n=1, p=p_graduation, observed=df['graduation'], shape=df.shape[0])

    traces = pm.sample(init='adapt_diag', tune=1000)

which leads to:

Auto-assigning NUTS sampler...
Initializing NUTS using adapt_diag...
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [tutoring_alpha, socio_to_tutoring_coeff, socioecon_coeff, tutoring_coeff]
Sampling 4 chains, 0 divergences:   0%|          | 0/6000 [00:00<?, ?draws/s]
/Users/eugaddan/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/numpy/core/fromnumeric.py:3118: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
(warning repeated once per chain)
Sampling 4 chains, 0 divergences:   0%|          | 0/6000 [00:01<?, ?draws/s]
Bad initial energy, check any log probabilities that are inf or -inf, nan or very small:
free_reduced_lunch   -inf
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/eugaddan/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 160, in _start_loop
    point, stats = self._compute_point()
  File "/Users/eugaddan/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/parallel_sampling.py", line 191, in _compute_point
    point, stats = self._step_method.step(self._point)
  File "/Users/eugaddan/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py", line 247, in step
    apoint, stats = self.astep(array)
  File "/Users/eugaddan/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/step_methods/hmc/base_hmc.py", line 144, in astep
    raise SamplingError("Bad initial energy")
pymc3.exceptions.SamplingError: Bad initial energy
"""

The above exception was the direct cause of the following exception:

SamplingError                             Traceback (most recent call last)
SamplingError: Bad initial energy

The above exception was the direct cause of the following exception:

ParallelSamplingError                     Traceback (most recent call last)
<ipython-input-111-486f4a828c9e> in <module>
     35 
     36 
---> 37     traces = pm.sample(init='adapt_diag', tune=1000)

~/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    467         _print_step_hierarchy(step)
    468         try:
--> 469             trace = _mp_sample(**sample_args)
    470         except pickle.PickleError:
    471             _log.warning("Could not pickle model, sampling singlethreaded.")

~/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/sampling.py in _mp_sample(draws, tune, step, chains, cores, chain, random_seed, start, progressbar, trace, model, **kwargs)
   1057         try:
   1058             with sampler:
-> 1059                 for draw in sampler:
   1060                     trace = traces[draw.chain - chain]
   1061                     if trace.supports_sampler_stats and draw.stats is not None:

~/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/parallel_sampling.py in __iter__(self)
    392 
    393         while self._active:
--> 394             draw = ProcessAdapter.recv_draw(self._active)
    395             proc, is_last, draw, tuning, stats, warns = draw
    396             if self._progress is not None:

~/miniconda3/envs/causal-inf-experiments/lib/python3.7/site-packages/pymc3/parallel_sampling.py in recv_draw(processes, timeout)
    295             else:
    296                 error = RuntimeError("Chain %s failed." % proc.chain)
--> 297             raise error from old_error
    298         elif msg[0] == "writing_done":
    299             proc._readable = True

ParallelSamplingError: Bad initial energy

One can reproduce the issue by looking at this Binder.

Any ideas on what I should try next?

Thank you!
Edderic

Try normalizing/rescaling the predictors. Most likely some predictors are on a large scale, so the logistic transformation saturates to exactly 0 or 1 (a numerical issue) while the corresponding observation is the opposite value, which makes the log probability -inf.
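For example, a sketch using your column names:

ses = df['socioeconomic_status']
# Standardize so the logistic input stays in a numerically safe range.
df['socioeconomic_status_std'] = (ses - ses.mean()) / ses.std()

Then use the standardized column in the model. The coefficient priors can be on a unit scale as well, and you can rescale the inferred coefficients afterwards if you need them on the original scale.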


I concur with @junpenglao. To see what might be wrong with your model, you can run prior predictive sampling line by line, looking at the output values at each step:

Take one sample from the _coeff variables plus tutoring_alpha and socioeconomic_status, then inject them into the downstream distributions to get a sample from those too. This has helped me understand the problem 100% of the time in the past.
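For example, a quick sketch with NumPy/SciPy using your parameter values:

import numpy as np
from scipy.special import expit as logistic

rng = np.random.RandomState(0)

# One draw from each prior, mirroring the model above.
socio_to_tutoring_coeff = rng.normal(0, 0.00001)
tutoring_alpha = rng.normal(0, 0.5)
socioeconomic_status = rng.gamma(10.57, 1 / 0.00015102481121898598, size=5)

print(logistic(socioeconomic_status * socio_to_tutoring_coeff + tutoring_alpha))
# -> values comfortably inside (0, 1)

print(logistic(socioeconomic_status - 35000))
# -> exactly 0.0 or 1.0; this is where the -inf comes from

pm.sample_prior_predictive can automate this once you know roughly where to look.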


Thanks @junpenglao and @rlouf. It does seem like it was a numerical issue.

logistic(50000)

gives 1.0.
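That saturated value is exactly what produces the -inf: when p is exactly 1 and the observation is 0 (or vice versa), the Bernoulli log probability is log(0). For instance:

from scipy import stats

stats.bernoulli.logpmf(0, 1.0)  # -inf: observing 0 is impossible when p == 1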

I got it to run without the negative-infinity issue by modifying the logistic transformation from socioeconomic status to free/reduced lunch, dividing by 200,000:

p_free_reduced_lunch = logistic((socioeconomic_status - 35000) / 200000)

However, that doesn’t really capture the definition of free/reduced lunch. Is there a way to keep the original definition, i.e. free/reduced lunch is 1 if socioeconomic status < 35,000 and 0 otherwise? I looked at the parameters for Binomial and it states that p must satisfy 0 < p < 1. Is there something I can do to p to enforce that constraint? Maybe I could write a modified_logistic function so that when logistic would return 1 we instead return 0.99999, and when it would return 0 we instead return 0.00001. Does that make sense? Or do you think there is a simpler way?
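In code, what I have in mind is roughly this (just a sketch):

eps = 1e-5

# Clamp saturated probabilities away from exactly 0 and 1.
p_free_reduced_lunch = pm.math.clip(
    logistic(socioeconomic_status - 35000), eps, 1 - eps
)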

Thanks a lot!
Edderic

Hi,

It is difficult for me to answer without knowing the meaning of the variables you are using. It does seem strange to me that you would have to subtract and divide the value inside the logistic to get a good result, but then again, I don’t know your problem.

Maybe if you explained it, we could help?