Let’s assume I have a variable Z that is the sum of a standard normal variable X and a Bernoulli variable Y:
\begin{equation}
Z = X + Y, \quad \text{where } X \sim N(0,1) \text{ and } Y \sim \mathrm{Bernoulli}(p)
\end{equation}
I have observations of Z and I want to estimate p.
Is it possible to define a PyMC estimation problem without having to specify the distributional properties of Z, other than that it is the sum X + Y?
OK, I managed to derive the density of Z. It is
\begin{equation}
f_Z(x) = p\, f_N(x; 1, 1) + (1 - p)\, f_N(x; 0, 1)
\end{equation}
where $f_N(x; \mu, \sigma)$ is the normal density.
How do I correctly write a custom logp function that can be evaluated against my data?
I tried the code below, but I’m getting an error.
```python
from scipy.stats import norm, bernoulli
import numpy as np
import pymc3 as pm

data = np.random.normal(size=500) + bernoulli.rvs(0.2, size=500)

with pm.Model() as model:
    p = pm.Uniform('p', 0.0001, 0.4)

    def logp(value):
        return np.log(p * norm.pdf(value, loc=1) + (1 - p) * norm.pdf(value, loc=0)).sum()

    pst = pm.DensityDist('pst', logp, observed=data)
    trace = pm.sample(50000)
```
OK, I solved the issue myself. It fails because PyMC3 can’t work with SciPy’s density function: the logp is evaluated symbolically on Theano tensors, which scipy.stats.norm.pdf doesn’t accept. Writing a custom normal density function out of Theano operations solved the issue.
In this case you can also use pm.NormalMixture directly.