Why am I getting inf or nan likelihood?

Initial evaluation results:
p_stickbreaking__ -1.50
z -2999.21
mu -26.74
sigma -12760002.76
Y_obs -inf
Name: Log-probability of test_point, dtype: float64

How can I tell where the mistake is?

Hi @asya,

Can you share the model? It’s not possible to give you feedback with the error message alone.

Okay, so I'm just trying things out.
The model might be complete bullshit, but I would just like to understand where the mistake in my logic is 🙂

import numpy as np
import pymc3 as pm

K = 3        # number of components (matches the three testvals)
N = len(y)   # y: the observed data, scaled to (0, 1), defined elsewhere

with pm.Model() as gmm:

    # Prior over the mixture weights (and hence over z)
    p = pm.Dirichlet('p', a=np.array([1.] * K))

    # z is the component that each data point is sampled from
    z = pm.Categorical('z', p=p, shape=N)

    # Gamma priors on the Beta parameters
    mu = pm.Gamma('mu', 1, 5, shape=K, testval=[1000, 1500, 2000])
    sigma = pm.Gamma('sigma', 1, 4, shape=K, testval=[1000, 1500, 2000])

    # Beta likelihood, with parameters picked per data point by z
    Y_obs = pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y)

and sampling:

with gmm:

    # sampling algorithms: NUTS for the continuous parameters,
    # Gibbs for the discrete z
    step1 = pm.NUTS(vars=[p, mu, sigma])
    step2 = pm.CategoricalGibbsMetropolis(vars=[z])

    trace = pm.sample(draws=3000, step=[step1, step2])

Are you sure your observed data is valid for a Beta likelihood? Are all observed values strictly between 0 and 1?
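
A quick way to check (a minimal sketch, where y is your observed array):

import numpy as np

y = np.asarray(y)
# the Beta support is the open interval (0, 1); exact 0s or 1s give -inf
print(((y > 0) & (y < 1)).all())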

Besides that, I am not sure how the z is supposed to work. What distinguishes z=0 from z=1 or z=2?

Yes, all values in the array are between 0 and 1. I checked with scipy that each vector in this array mostly follows a beta distribution before scaling to the (0, 1) range, and after scaling it is either a weibull, norm, genextreme, or gamma distribution.

z is sampled from a categorical distribution with probabilities p, which are drawn from a Dirichlet (currently with all alphas equal to 1, as I'm not sure what alphas to put here).

The use of a categorical random variable for indexing could be the cause of the trouble.

The problem is that the gradient of the likelihood changes depending on the value of z. This means that the computation graph depends on z, and that's a problem.

If you really need to define your model like that, you'll have to dig into how switch/ifelse works in Theano, or use a different distribution.
I've never done something like this myself…
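
Something along these lines, maybe (an untested sketch, hard-coding K = 3 and reusing z, mu, sigma, y from the model above):

import theano.tensor as tt

# pick each point's alpha/beta with nested tt.switch instead of fancy indexing
alpha_z = tt.switch(tt.eq(z, 0), mu[0],
          tt.switch(tt.eq(z, 1), mu[1], mu[2]))
beta_z = tt.switch(tt.eq(z, 0), sigma[0],
         tt.switch(tt.eq(z, 1), sigma[1], sigma[2]))

Y_obs = pm.Beta('Y_obs', alpha=alpha_z, beta=beta_z, observed=y)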

What would be a more usual choice of distribution here, if not a Dirichlet feeding a categorical?

I think mathematically your model (including the distributions) is fine.

But the line pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y) is mathematically the same as one of those big curly-bracket (piecewise) expressions, and AFAIK that is not something you can build with slicing.
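
Written out for K = 3, my reading of that line is roughly:

$$
Y_i \sim \begin{cases}
\text{Beta}(\alpha = \mu_0,\ \beta = \sigma_0) & \text{if } z_i = 0 \\
\text{Beta}(\alpha = \mu_1,\ \beta = \sigma_1) & \text{if } z_i = 1 \\
\text{Beta}(\alpha = \mu_2,\ \beta = \sigma_2) & \text{if } z_i = 2
\end{cases}
$$

In fact, even this simple example does not work: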

import theano.tensor as tt

# NB: "int32" here is the variable *name*, not the dtype,
# so z is actually a float vector -- and float indexing fails
z = tt.vector("int32")
x = tt.matrix("float32")
y = x[z]

I think it’s possible with tt.switch, but I couldn’t get an example to work right away. Someone with more Theano skill might.
Also, maybe I’m just completely wrong on this and mu[z] is not actually the problem.
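
For what it's worth, with an actual integer vector the indexing does seem to compile (a quick check of mine, separate from the model above):

import numpy as np
import theano
import theano.tensor as tt

z = tt.ivector("z")                  # a genuine int32 vector
x = tt.matrix("x")
f = theano.function([x, z], x[z])    # advanced indexing compiles fine

f(np.arange(6).reshape(3, 2).astype(theano.config.floatX),
  np.array([0, 2], dtype="int32"))   # returns rows 0 and 2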

I think the indexing is fine. This model runs (with a lot of divergences, as I would expect):

import numpy as np
import pymc3 as pm

K = 3
N = 10
y = np.random.rand(N)

with pm.Model() as m:
    # Prior over z
    p = pm.Dirichlet('p', a=np.array([1.] * K))

    # z is the component that the data point is being sampled from
    z = pm.Categorical('z', p=p, shape=N)

    # Gamma prior
    mu = pm.Gamma('mu', 1, 5, shape=K, testval=[1000, 1500, 2000])
    sigma = pm.Gamma('sigma', 1, 4, shape=K, testval=[1000, 1500, 2000])

    # Beta likelihood
    Y_obs = pm.Beta('Y_obs', alpha=mu[z], beta=sigma[z], observed=y)

@asya Do you see what could be different in your version?

Why did you go with those testvals? They seem very opinionated.
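
For reference (a quick check of my own): in PyMC3's shape/rate parameterization, Gamma(1, 5) is an Exponential with rate 5, i.e. mean 0.2, so a testval of 1000 sits essentially nowhere under that prior:

import scipy.stats as st

# pm.Gamma(alpha=1, beta=5) equals scipy's gamma(a=1, scale=1/5), mean 0.2
st.gamma.logpdf(1000, a=1, scale=1 / 5)   # ~ -4998.4: essentially impossible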

The model still doesn't make sense to me. Are you trying to find K latent clusters in your data? If so, you will have to deal with label switching, since you are using the exact same priors for all K mu and sigma.

I mostly just have difficulty in general understanding why some of my models don't work. For example, this one also throws an error:

import pymc3 as pm
import numpy as np


syn = np.array([0, 113, 42, 78, 125, 234, 393, 874, 407, 439, 297, 322, 246, 434, 797, 365, 202, 554, 303, 564, 392, 243, 237, 123, 206, 254, 243, 309, 388, 551, 810, 554, 741, 716, 483, 428, 267, 198, 190, 196, 387, 248, 249, 348, 857, 260, 296, 451, 339, 461, 426, 481, 1038])


with pm.Model() as model:
    alpha = pm.Exponential('alpha', lam=5)
    beta = pm.Exponential('beta', lam=5)

    g = pm.Gamma('g', alpha=alpha, beta=beta, observed=syn)

    trace = pm.sample(return_inferencedata=True, cores=1)

it gives me this error:

I think the observed zero might be the problem. The docs suggest the distribution's support does not include zero: Continuous — PyMC 5.10.0 documentation
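
A quick check on the data (sketch):

# pm.Gamma's support is x > 0, so the single 0 in syn drives the likelihood to -inf
print((syn > 0).all())   # False -- syn[0] == 0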

More generally, you can always try to reduce the number of variables / observations until things start to work, and rebuild from there. That will help you catch where the problems are coming from.
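
One helper while doing that: check_test_point() prints the per-variable table like the one at the top of this thread (here using the model from the last example):

# per-variable log-probability at the initial point; a -inf row pinpoints the culprit
print(model.check_test_point())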

Okay, thank you!