# Mixture model estimation

Let’s assume I have a variable Z that is the sum of a standard normal variable X and a Bernoulli variable Y.

\begin{equation} Z = X + Y, \text{where } X\sim N(0,1) \text{ and } Y\sim B(p) \end{equation}

I have observations of Z and I want to estimate p.

Is it possible to define a PyMC estimation problem without having to specify the distributional properties of Z, other than that it is the sum X + Y?


OK, I managed to get the density of Z. It is
\begin{equation} f_Z(x) = p\,f_N(x; 1, 1) + (1 - p)\,f_N(x; 0, 1) \end{equation}
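As a quick sanity check (pure NumPy/SciPy, no PyMC involved — the true p = 0.2 here is just an illustrative value), the mixture density can be compared against a histogram of simulated draws of Z:

```python
import numpy as np
from scipy.stats import norm, bernoulli

# simulate Z = X + Y with an illustrative true p = 0.2
np.random.seed(0)
n = 200_000
z = np.random.normal(size=n) + bernoulli.rvs(0.2, size=n)

# empirical density of Z via a normalized histogram
hist, edges = np.histogram(z, bins=100, range=(-4, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# mixture density f_Z(x) = p*N(x; 1, 1) + (1-p)*N(x; 0, 1)
p = 0.2
f_z = p * norm.pdf(centers, loc=1) + (1 - p) * norm.pdf(centers, loc=0)

max_abs_err = np.max(np.abs(hist - f_z))
print(max_abs_err)  # small: the histogram tracks the mixture density
```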
How do I correctly write a custom logp function that can be evaluated against my data?
I tried below but getting an error.

```python
from scipy.stats import norm, bernoulli
import numpy as np
import pymc3 as pm

data = np.random.normal(size=500) + bernoulli.rvs(0.2, size=500)

with pm.Model() as model:
    p = pm.Uniform('p', 0.0001, 0.4)

    def logp(value):
        return np.log(p * norm.pdf(value, loc=1) + (1 - p) * norm.pdf(value, loc=0)).sum()

    pst = pm.DensityDist('pst', logp, observed=data)
    trace = pm.sample(50000)
```

OK, I solved the issue myself. It fails because the `value` passed into `logp` is a symbolic Theano tensor, and SciPy’s `norm.pdf` only accepts numeric arrays. I’m not sure if this is expected. Writing a custom normal density function by hand solved the issue.
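A minimal sketch of the kind of hand-written density that works. Because it uses only elementary operations, the same expression also runs on the symbolic tensors pymc3 passes into `logp`, once `np.exp` is swapped for `pm.math.exp`:

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Normal density from elementary operations only. SciPy's norm.pdf
    # calls compiled code that cannot accept symbolic Theano tensors;
    # this expression can, once np.exp is replaced by pm.math.exp.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# agrees with SciPy on plain NumPy input
x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(normal_pdf(x, mu=1.0), norm.pdf(x, loc=1)))  # True
```

Inside the model, the custom `logp` then becomes `pm.math.log(p * normal_pdf(value, 1.0) + (1 - p) * normal_pdf(value, 0.0)).sum()` with the `pm.math.exp` variant of the density.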

In this case you can also use `pm.NormalMixture` directly.