Fitting joint distributions using custom distribution as part of mixture

jpm · December 19, 2019, 8:09pm

I’m trying to understand how to fit a joint distribution in pymc3.

Here’s a toy model to illustrate my question. Suppose I have data y[2,N] drawn from a mixture model, with some mixing proportion, where the mixture component y_i = k determines two things:

The mean of the Normal distribution that y[0,i] is sampled from (either theta1 or theta2).
The rate of the Poisson distribution that y[1,i] is sampled from (either rate1 or rate2).

How do I implement such a model in pymc3?

My attempt is below (marginalizing out cluster assignments). I may be going down the wrong track since I don’t understand how to dealing with data where each observation has elements drawn from multiple distributions. The current code gives the error “too many indices for array” at line “normal_data = y[0,i]”.

def likelihood(theta, rate):
    def logp(y):
        normal_data = y[0,:]
        poisson_data = y[1,:]
        return (pm.Poisson.dist(rate).logp(poisson_data) + 
                pm.Normal.dist(theta, 1).logp(normal_data))
    return logp

basic_model = pm.Model()

with basic_model:
   theta1 = pm.Uniform('theta1', 0, 1)
   theta2 = pm.Uniform('theta2', 0, 1)
   rate1 = pm.Exponential('rate1', 1)  
   rate2 = pm.Exponential('rate2', 1)
   mix_proportion = pm.Uniform('lambda', 0, 1)
   custom = pm.DensityDist('custom', likelihood(theta1, theta2))
   y_obs = pm.Mixture('y_obs', mix_proportion, 
                               [custom(theta1, rate1), 
                                custom(theta2, rate2)], 
                                observed=data)

Thanks!

junpenglao · December 20, 2019, 1:14am

Since in this toy example the two dimension of y are independent, I suggest you to separate the 2 and write 2 mixtures, each with observed y[0, :] and y[1, :], and then try combining them.

jpm · December 20, 2019, 5:13pm

Thanks for the advice.

I may not have been clear, but the two dimensions of y are not independent - they’re coupled through the cluster assignment. That is, y[0, i] and y[1,i] are assumed to have the same cluster assignment, and are thus only conditionally independent given the cluster assignment: if y[0,i] is sampled from Normal(theta1) then y[1,i] is sampled from Poisson(rate1) not Poisson(rate2).

jpm · January 13, 2020, 7:39pm

Hoping for more advice on this question…

I’m realising in retrospect that asking it a couple of days before Christmas may have resulted in not many people seeing it.

Thanks!

Topic		Replies	Views
How to improve fit of Poisson Mixture Model in PyMC3? Questions	20	4987	May 24, 2018
Drawing values from custom distribution Questions	8	952	September 14, 2022
Mixture model estimation Questions	3	446	May 26, 2020
Mixture model with variable number of component distributions Questions	2	1548	November 29, 2018
Create Custom Joint Distribution Questions	9	2565	May 26, 2019

Fitting joint distributions using custom distribution as part of mixture

Related topics