To start with, I can tell you what is wrong with your code. In scipy, when you do
r = np.linspace(0,5,1000)
y = scipy.stats.lognorm.pdf(r, 1,0) * (1 - 0.5*(scipy.stats.norm.pdf(r,2, 0.2)))
this combines the probability density functions of the two random variables evaluated at the same points (the result is no longer a proper probability density, though). What you did in pymc, i.e.
def selection_d(mu: TensorVariable,
                phi: TensorVariable,
                x: TensorVariable,
                tau: TensorVariable,
                epsilon: TensorVariable,
                size: TensorVariable) -> TensorVariable:
    return pm.Lognormal.dist(mu, phi) * (1 - x * pm.Normal.dist(tau, epsilon))
is not equivalent to that. What this random variable does, I think, is sample the normal and the lognormal independently and plug the draws into the operation you defined. To see this with a simpler example, try sampling
pm.Normal.dist(0,1) - pm.Normal.dist(0,1)
This does not give you a (non-probability) distribution that is 0 everywhere, but rather a normal distribution with variance 2, whereas the following is 0 everywhere:
scipy.stats.norm.pdf(r, 0, 1) - scipy.stats.norm.pdf(r, 0, 1)
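As a quick sanity check (just a sketch), you can draw from the first expression with pm.draw and look at the sample variance, which should land near 2, while the scipy difference is identically 0:
import numpy as np
import pymc as pm
import scipy.stats

# draws from the difference of two independent Normal(0, 1) variables
diff = pm.Normal.dist(0, 1) - pm.Normal.dist(0, 1)
samples = pm.draw(diff, draws=10_000)
print(samples.var())  # close to 2, not 0

# the difference of two identical pdfs evaluated at the same points is exactly 0
r = np.linspace(-5, 5, 1000)
print(np.abs(scipy.stats.norm.pdf(r, 0, 1) - scipy.stats.norm.pdf(r, 0, 1)).max())  # 0.0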
Now, I have actually never worked with custom distributions before, but the reference page for CustomDist:
https://www.pymc.io/projects/docs/en/v5.10.3/api/distributions/generated/pymc.CustomDist.html
suggests that what you want can be achieved by defining the logp of your distribution and then using sample, since the logp is what gets evaluated at the observed data points:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import pymc as pm
import scipy.stats
import arviz as az
from pytensor.tensor import TensorVariable
s = 1
mu = 0
x = 0.5
tau=2
eps=0.2
r = np.linspace(0,5,1000)
y = scipy.stats.lognorm.pdf(r, s,mu) * (1 - x*(scipy.stats.norm.pdf(r,tau,eps)))
z = scipy.stats.lognorm.pdf(r,s,mu)
plt.plot(r, y,'r-', lw=5, alpha=0.6, label='selection pdf')
plt.plot(r,z,'g',lw=5,alpha=0.6,linestyle='--',label='Lognorm pdf')
plt.fill_between(r,y,z,where=(z>y),label='Missing Portion')
plt.legend()
def normal_pdf(x, mu, s):
    # pdf (not logpdf) of Normal(mu, s), written with pymc.math ops
    return 1 / (s * pm.math.sqrt(2 * np.pi)) * pm.math.exp(-0.5 * ((x - mu) / s) ** 2)

def logp(value: TensorVariable, s: TensorVariable, mu: TensorVariable,
         tau: TensorVariable, eps: TensorVariable, x: TensorVariable) -> TensorVariable:
    # log of lognorm.pdf(value) * (1 - x * norm.pdf(value))
    return pm.LogNormal.logp(value, mu, s) + pm.math.log(1 - x * normal_pdf(value, tau, eps))
with pm.Model() as sel_model:
    s = 1
    mu = 0
    x = 0.5
    tau = 2
    eps = 0.2
    sel_dist = pm.CustomDist('sel_dist',
                             s,
                             mu,
                             tau,
                             eps,
                             x,
                             logp=logp,
                             )
    draws = pm.sample(draws=10000)

q = az.plot_posterior(draws)
q.set_xlim([0, 5])
which gives the following plot:
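If you want to double-check the sampled draws against the analytical curve from the top of the answer, something along these lines should work (just a sketch; y above is not a normalized density, so it is rescaled by its numerical integral before comparing):
# overlay the (normalized) analytical selection pdf on the MCMC draws
samples = draws.posterior["sel_dist"].values.flatten()
plt.figure()
plt.hist(samples, bins=100, density=True, alpha=0.5, label='MCMC draws')
plt.plot(r, y / np.trapz(y, r), 'r-', lw=2, label='normalized selection pdf')
plt.xlim(0, 5)
plt.legend()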
Note that I have used pm.LogNormal.logp since there is some boilerplate involved in dealing with the value 0, and I have defined the pdf of the normal myself because I need its pdf rather than its logpdf (the log is only taken after the pdf is scaled by x and subtracted from 1). If you compare this with the formula you use, which is:
pm.logp(a,value)*(1-x*pm.logp(b,value))
you will notice that the log has not been applied quite correctly. You have something like p = p1*(1 - x*p2), so taking the log gives log(p) = log(p1) + log(1 - x*p2), not a product of the two logps, so I suspect that is another source of error in your code.
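Here is a quick numerical illustration of that identity (a sketch with made-up numbers):
import numpy as np

p1, p2, x = 0.3, 0.4, 0.5
p = p1 * (1 - x * p2)

print(np.log(p))                          # the correct log-probability
print(np.log(p1) + np.log(1 - x * p2))    # same thing: log(p1) + log(1 - x*p2)
print(np.log(p1) * (1 - x * np.log(p2)))  # what multiplying the logps computes: different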
But as I said, I have never worked with custom dists before, so I don't really know things like when you need to define the dist and when not, when you need to define logp and when not, and how that affects sample vs draw; you can play around and see. Also, perhaps the following is useful:
https://www.pymc.io/projects/docs/en/stable/contributing/implementing_distribution.html