# Best way to define a custom discrete distribution to use as a latent variable?

I want to use a mixture of multivariate Bernoulli distributions as an intermediate discrete latent variable in a hierarchical model. I figured it would probably be easiest to do this as a custom distribution. Thankfully someone else had already written the logp function, and it works great when the discrete variable is the observed variable.

``````python
import theano.tensor as tt
import pymc3 as pm

def bernoulli_loglh(theta, weights):
    def _logp(value):
        value = value > 0.5
        value_neg = 1 - value
        logtheta = tt.log(theta)
        neglogtheta = tt.log(1 - theta)

        # N*K matrix of likelihood contributions from X_i with theta_k
        betas = tt.tensordot(value, logtheta, axes=[1, 1]) \
            + tt.tensordot(value_neg, neglogtheta, axes=[1, 1])

        # Form alphas, a K*N matrix with the component weights included
        alphas = (betas + tt.log(weights)).T

        # Take the LSE column-wise to get an N-vector; subtracting and
        # re-adding alphas[0, :] stabilizes the exponentials. This gives
        # the total likelihood across all components for each X_i
        lse_clust = pm.math.logsumexp(alphas - alphas[0, :], axis=0) + alphas[0, :]

        # Final overall sum
        return tt.sum(lse_clust)
    return _logp
``````
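For sanity-checking the logp outside of Theano, here is a pure-NumPy version of the same computation (my own sketch, not from the original post; the function name is hypothetical):

``````python
import numpy as np

def bernoulli_mixture_loglh_np(value, theta, weights):
    """NumPy reference for the Theano logp above (hypothetical helper).

    value:   (N, D) binary data matrix
    theta:   (K, D) Bernoulli probabilities per mixture component
    weights: (K,)   mixture weights, summing to 1
    """
    value = (value > 0.5).astype(float)
    # (N, K) log-likelihood of each row under each component
    betas = value @ np.log(theta).T + (1 - value) @ np.log(1 - theta).T
    # Add the log mixture weights
    alphas = betas + np.log(weights)
    # Stabilized log-sum-exp across components (subtract the row max)
    m = alphas.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.exp(alphas - m).sum(axis=1))
    return lse.sum()
``````

Comparing this against the Theano graph on a few random inputs is a cheap way to catch broadcasting or transposition bugs in the tensordot calls.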

The problem is that when I use it as a latent variable, I get continuous values during sampling, even when I explicitly request Metropolis-Hastings sampling. This makes sense, because the domain of the variable is never specified.

One hacky workaround I tried was to apply the `value > 0.5` thresholding everywhere the variable appeared, but I didn't really get good results with this, and ultimately I'd want something that works better.

How are you using `bernoulli_loglh` in your model? Via `pm.DensityDist`? One way to get only integer values is to create a Distribution class that inherits from `pm.Discrete`, but all that does is tell the Metropolis sampler to round its proposed values before evaluating the logp. You can achieve the same thing if you round the value inside your `_logp` and then save the rounded accepted values in a `pm.Deterministic`. Something like:

``````python
with pm.Model() as m:
    my_dist = pm.DensityDist('my_dist', bernoulli_loglh(...))
    # tt.round keeps the expression symbolic; np.round would not
    # reliably work on a Theano tensor
    my_values = pm.Deterministic('my_values', tt.round(my_dist))
``````

The same applies if you are using a `pm.Potential`.
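To see why rounding inside the logp works, here is a quick one-dimensional illustration in plain NumPy (my own sketch; the function name and `p` are hypothetical):

``````python
import numpy as np

def bernoulli_logp_rounded(v, p=0.7):
    """logp of a Bernoulli(p), evaluated at round(v).

    Rounding first makes the density piecewise constant: every
    continuous proposal in [0, 0.5) is scored as the state 0, and
    every proposal in (0.5, 1] as the state 1, so the sampler
    effectively only ever 'sees' the two discrete states.
    """
    x = np.round(v)  # snap the continuous proposal to {0, 1}
    return x * np.log(p) + (1 - x) * np.log(1 - p)
``````

This is exactly what a `pm.Discrete` subclass achieves by rounding the Metropolis proposal before evaluation; the `pm.Deterministic` in the snippet above then just records those rounded values in the trace.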