How to include data in DensityDist?

Suppose I have a generalized zero-inflated Poisson model:

\text{Pr}(k \mid \phi, \alpha, \lambda) = \begin{cases} \text{something}, & \text{if } k = 0 \\ \text{something}, & \text{if } k = 1, 2, \dots \end{cases}
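
For reference, the plain zero-inflated Poisson is one concrete instance of this shape (the generalized version also involves \alpha); with \phi as the extra probability mass at zero, it fills in the cases as

\text{Pr}(k \mid \phi, \lambda) = \begin{cases} \phi + (1 - \phi) e^{-\lambda}, & \text{if } k = 0 \\ (1 - \phi) \dfrac{\lambda^{k} e^{-\lambda}}{k!}, & \text{if } k = 1, 2, \dots \end{cases}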

I have to define the logarithm of the likelihood. Fortunately, I can use some theano functions:

def logp_(phi, alp, lam, k):
    return  # something using theano functions such as
            # theano.tensor.log or theano.tensor.exp
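
For instance, with the plain zero-inflated Poisson above standing in for the unspecified pmf (so alp goes unused in this simplification), the function could look like this:

import theano.tensor as tt

def logp_(phi, alp, lam, k):
    # Stand-in implementation: a plain zero-inflated Poisson, where
    # phi is the extra probability mass at k = 0. The generalized
    # form would also bring alp into play here.
    logp_zero = tt.log(phi + (1 - phi) * tt.exp(-lam))
    logp_pos = (tt.log(1 - phi) - lam
                + k * tt.log(lam) - tt.gammaln(k + 1))
    return tt.switch(tt.eq(k, 0), logp_zero, logp_pos)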

Now, my data are frequencies of movements of something: for k = 0 I have a frequency of 182, for k = 1 I have 41, and so on. Let's put that in a variable:

import numpy as np
counts = np.array([182,  41,  12,  2,  2,  0,  0,  1])

The documentation says that I should use DensityDist for this, along these lines:

import pymc3 as pm

with pm.Model() as model_p:
    # priors for phi, alpha and lamda
    # phi =
    # alpha =
    # lamda =

    gen_poisson = pm.DensityDist('gen_poisson', logp=logp_,
        observed=dict(phi=phi, alp=alpha, lam=lamda))

As you may see, the function logp_ needs four arguments, but only three have been passed. What about k? I don't know. How should I include my data, counts, in DensityDist? I don't think this makes sense:

gen_poisson = pm.DensityDist('gen_poisson', logp=logp_,
    observed=dict(phi=phi, alp=alpha, lam=lamda, k=counts))

Remember that each entry of counts is a frequency. The usage has to be similar to this:

with pm.Model() as z_poisson:
    # priors for psi and theta
    # psi = 
    # theta = 

    # counts should be normalized before this line
    y = pm.ZeroInflatedPoisson('y', psi, theta, observed=counts) 

After reading these questions: Model definition, variable size and custom likelihood and theano functions and densitydist, I still don't know how to include my data in DensityDist. What should I do?

That last snippet, with k=counts included in the observed dict, should be correct.

You only have to make sure that your logp_ uses tt.switch to handle k == 0 differently, and you should be good to go.
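
For example, here is a minimal end-to-end sketch with placeholder priors (the question leaves them unspecified, so these are purely illustrative choices):

import pymc3 as pm

with pm.Model() as model_p:
    # Hypothetical priors; substitute whatever fits your domain.
    phi = pm.Beta('phi', alpha=1., beta=1.)
    alpha = pm.HalfNormal('alpha', sigma=1.)
    lamda = pm.HalfNormal('lamda', sigma=10.)

    gen_poisson = pm.DensityDist('gen_poisson', logp=logp_,
        observed=dict(phi=phi, alp=alpha, lam=lamda, k=counts))

    trace = pm.sample()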

Alternatively, you can define your own Distribution instead of using DensityDist. To do so, you should write down something like this:

import theano.tensor as tt

class MyDistribution(pm.Discrete):
    def __init__(self, phi, alp, lam, *args, **kwargs):
        # Let pm.Discrete handle shape, dtype, testval, etc.
        super().__init__(*args, **kwargs)
        self.phi = tt.as_tensor_variable(phi)
        self.alp = tt.as_tensor_variable(alp)
        self.lam = tt.as_tensor_variable(lam)

    def random(self, point=None, size=None):
        raise NotImplementedError('You have to implement this yourself')

    def logp(self, value):
        phi = self.phi
        alp = self.alp
        lam = self.lam
        # value holds the observed counts; reuse the same
        # implementation as logp_
        return logp_(phi, alp, lam, value)

And this will work just like any other distribution in pymc3, whether you pass observed or not.
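
Usage would then mirror the built-in distributions (again with purely illustrative priors):

with pm.Model() as model_c:
    phi = pm.Beta('phi', alpha=1., beta=1.)
    alpha = pm.HalfNormal('alpha', sigma=1.)
    lamda = pm.HalfNormal('lamda', sigma=10.)

    y = MyDistribution('y', phi, alpha, lamda, observed=counts)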

Isn’t that weird?

I mean, k takes values 0, 1, 2, 3, ... and the output is the probability that a given number of events happens. For example, if I plot counts normalized as probabilities, the result is:

[bar plot of the empirical probabilities for k = 0, 1, 2, ...]
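
(That plot can be reproduced from counts directly; a minimal sketch:)

import matplotlib.pyplot as plt

probs = counts / counts.sum()  # normalize frequencies to probabilities
plt.bar(np.arange(len(counts)), probs)
plt.xlabel('k')
plt.ylabel('empirical probability')
plt.show()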

With the model I showed previously, for k=2 the result might be 0.03 or 0.01. For the first few values of k, the model should output something like this:

\text{Pr}(k=0 \mid \text{something for } \phi, \alpha, \lambda) = 0.75
\text{Pr}(k=1 \mid \text{something for } \phi, \alpha, \lambda) = 0.20
\text{Pr}(k=2 \mid \text{something for } \phi, \alpha, \lambda) = 0.03

With this code:

gen_poisson = pm.DensityDist('gen_poisson', logp=logp_,
    observed=dict(phi=phi, alp=alpha, lam=lamda, k=counts))

I am saying that k takes values of 182 or 41, which is not right. I have to say that I am confused by this.

If you call counts.shape[0] "n", you're saying that you observe n independent draws from your distribution, the first two of which were 182 and 41. These observations constrain the potential values of \phi, \alpha and \lambda to those which make your observations reasonably likely.
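
If, instead, what you actually observed was 182 zeros, 41 ones, and so on, then (assuming that reading of your data, which the thread doesn't settle) you would expand the frequencies into individual draws before passing them in:

# Expand frequencies into one entry per observed event:
# 182 zeros, 41 ones, 12 twos, ... (240 values in total)
k_observed = np.repeat(np.arange(len(counts)), counts)

gen_poisson = pm.DensityDist('gen_poisson', logp=logp_,
    observed=dict(phi=phi, alp=alpha, lam=lamda, k=k_observed))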