Why am I getting inf or nan likelihood?

Initial evaluation results:
p_stickbreaking__ -1.50
z -2999.21
mu -26.74
sigma -12760002.76
Y_obs -inf
Name: Log-probability of test_point, dtype: float64

How can I tell where the mistake is?

Hi @asya,

Can you share the model? It’s not possible to give you feedback with the error message alone.

Okay, so I'm just trying things out.
The model might be complete bullshit, but I would just like to understand where the mistake in my logic is 🙂

import numpy as np
import pymc3 as pm

K = 3        # number of components (matches the three testvals)
N = len(y)   # y: the observed data, scaled to (0, 1), defined elsewhere

with pm.Model() as gmm:

    # Prior over the mixture weights (and hence over z)
    p = pm.Dirichlet('p', a=np.array([1.] * K))

    # z is the component that each data point is sampled from
    z = pm.Categorical('z', p=p, shape=N)

    # Gamma priors on the Beta parameters
    mu = pm.Gamma('mu', 1, 5, shape=K, testval=[1000, 1500, 2000])
    sigma = pm.Gamma('sigma', 1, 4, shape=K, testval=[1000, 1500, 2000])

    # Beta likelihood, with parameters picked per data point by z
    Y_obs = pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y)

and sampling:

with gmm:

    # sampling algorithms: NUTS for the continuous parameters,
    # Gibbs for the discrete z
    step1 = pm.NUTS(vars=[p, mu, sigma])
    step2 = pm.CategoricalGibbsMetropolis(vars=[z])

    trace = pm.sample(draws=3000, step=[step1, step2])

Are you sure your observed data is valid for a Beta likelihood? Are all observed values strictly between 0 and 1?
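
A quick way to check (a minimal sketch, where y is your observed array):

import numpy as np

y = np.asarray(y)
# the Beta support is the open interval (0, 1); exact 0s or 1s give -inf
print(((y > 0) & (y < 1)).all())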

Besides that, I am not sure how the z is supposed to work. What distinguishes z=0 from z=1 or z=2?

Yes, all values in the array are between 0 and 1. I checked with scipy that each vector in this array mostly follows a beta distribution before scaling to the (0, 1) range, and after scaling it is either a weibull, norm, genextreme, or gamma distribution.

z is sampled from a categorical distribution with probabilities p, which are drawn from a Dirichlet (currently with all alphas equal to 1, as I'm not sure what alphas to put here).

The use of a categorical random variable for indexing could be the cause of the trouble.

The problem is that the gradient of the likelihood changes depending on the value of z. This means that the computation graph depends on z, and that's a problem.

If you really need to define your model like that, you'll have to dig into how switch/ifelse works in Theano, or use a different distribution.
I've never done something like this myself…
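
Something along these lines, maybe (an untested sketch, hard-coding K = 3 and reusing z, mu, sigma, y from the model above):

import theano.tensor as tt

# pick each point's alpha/beta with nested tt.switch instead of fancy indexing
alpha_z = tt.switch(tt.eq(z, 0), mu[0],
          tt.switch(tt.eq(z, 1), mu[1], mu[2]))
beta_z = tt.switch(tt.eq(z, 0), sigma[0],
         tt.switch(tt.eq(z, 1), sigma[1], sigma[2]))

Y_obs = pm.Beta('Y_obs', alpha=alpha_z, beta=beta_z, observed=y)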

What would be a more usual choice of distribution here, if not a Dirichlet feeding a categorical?

I think mathematically your model (including the distributions) is fine.

But the line pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y) is mathematically the same as one of those big curly-bracket (piecewise) expressions, and AFAIK that is not something you can build with slicing.
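
Written out for K = 3, my reading of that line is roughly:

$$
Y_i \sim \begin{cases}
\text{Beta}(\alpha = \mu_0,\ \beta = \sigma_0) & \text{if } z_i = 0 \\
\text{Beta}(\alpha = \mu_1,\ \beta = \sigma_1) & \text{if } z_i = 1 \\
\text{Beta}(\alpha = \mu_2,\ \beta = \sigma_2) & \text{if } z_i = 2
\end{cases}
$$

In fact, even this simple example does not work: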

import theano.tensor as tt

# NB: "int32" here is the variable *name*, not the dtype,
# so z is actually a float vector -- and float indexing fails
z = tt.vector("int32")
x = tt.matrix("float32")
y = x[z]

I think it’s possible with tt.switch, but I couldn’t get an example to work right away. Someone with more Theano skill might.
Also, maybe I’m just completely wrong on this and mu[z] is not actually the problem.
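
For what it's worth, with an actual integer vector the indexing does seem to compile (a quick check of mine, separate from the model above):

import numpy as np
import theano
import theano.tensor as tt

z = tt.ivector("z")                  # a genuine int32 vector
x = tt.matrix("x")
f = theano.function([x, z], x[z])    # advanced indexing compiles fine

f(np.arange(6).reshape(3, 2).astype(theano.config.floatX),
  np.array([0, 2], dtype="int32"))   # returns rows 0 and 2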

I think the indexing is fine. This model runs (with a lot of divergences, as I would expect):

import numpy as np
import pymc3 as pm

K = 3
N = 10
y = np.random.rand(N)

with pm.Model() as m:
    # Prior over z
    p = pm.Dirichlet('p', a=np.array([1.] * K))

    # z is the component that the data point is being sampled from
    z = pm.Categorical('z', p=p, shape=N)

    # Gamma prior
    mu = pm.Gamma('mu', 1, 5, shape=K, testval=[1000, 1500, 2000])
    sigma = pm.Gamma('sigma', 1, 4, shape=K, testval=[1000, 1500, 2000])

    # Beta likelihood
    Y_obs = pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y)

@asya Do you see what could be different in your version?

Why did you go with those testvals? They seem very opinionated.
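
For reference (a quick check of my own): in PyMC3's shape/rate parameterization, Gamma(1, 5) is an Exponential with rate 5, i.e. mean 0.2, so a testval of 1000 sits essentially nowhere under that prior:

import scipy.stats as st

# pm.Gamma(alpha=1, beta=5) equals scipy's gamma(a=1, scale=1/5), mean 0.2
st.gamma.logpdf(1000, a=1, scale=1 / 5)   # ~ -4998.4: essentially impossible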

The model still doesn't make sense to me. Are you trying to find K latent clusters in your data? If so, you will have to deal with label switching, since you are using the exact same priors for all K mu and sigma.

I mostly just have difficulty in general understanding why some of my models don't work. For example, this one also throws an error:

import pymc3 as pm
import numpy as np


syn = np.array([0, 113, 42, 78, 125, 234, 393, 874, 407, 439, 297, 322, 246, 434, 797, 365, 202, 554, 303, 564, 392, 243, 237, 123, 206, 254, 243, 309, 388, 551, 810, 554, 741, 716, 483, 428, 267, 198, 190, 196, 387, 248, 249, 348, 857, 260, 296, 451, 339, 461, 426, 481, 1038])


with pm.Model() as model:
    alpha = pm.Exponential('alpha', lam=5)
    beta = pm.Exponential('beta', lam=5)

    g = pm.Gamma('g', alpha=alpha, beta=beta, observed=syn)

    trace = pm.sample(return_inferencedata=True, cores=1)

it gives me this error:

I think the observed zero might be the problem. The docs suggest the distribution's support does not include zero: Continuous — PyMC 5.10.0 documentation
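
A quick check on the data (sketch):

# pm.Gamma's support is x > 0, so the single 0 in syn drives the likelihood to -inf
print((syn > 0).all())   # False -- syn[0] == 0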

More generally, you can always try to reduce the number of variables / observations until things start to work, and rebuild from there. That will help you catch where the problems are coming from.
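
One helper while doing that: check_test_point() prints the per-variable table like the one at the top of this thread (here using the model from the last example):

# per-variable log-probability at the initial point; a -inf row pinpoints the culprit
print(model.check_test_point())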

Okay, thank you!