I’m attempting to fit a custom distribution to a set of data. I’ve read related questions that were answered using the as_op decorator, but I was unable to get that approach working. Instead I’m trying to follow the approach outlined here: https://docs.pymc.io/Probability_Distributions.html#custom-distributions, which suggests defining a Python function that returns the log-likelihood of the distribution and passing that function to DensityDist() (see the “exponential survival function” example in the link above, which I was able to run just fine).
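For reference, that docs example looks roughly like this (reproduced from memory, so the exact names may differ); the key point is that its log-likelihood is written with theano.tensor operations rather than numpy/scipy:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Toy data in the same shape as the docs' exponential survival example
failure = np.array([0., 1.])
value = np.array([1., 0.])

with pm.Model() as example_model:
    lam = pm.Exponential('lam', 1.0)

    # Log-likelihood built from theano tensor ops (closes over the RV `lam`)
    def logp(failure, value):
        return tt.sum(failure * tt.log(lam) - lam * value)

    exp_surv = pm.DensityDist('exp_surv', logp, observed={'failure': failure, 'value': value})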
However, my particular custom distribution (which is not analytically tractable and therefore must be normalized numerically) is proportional to the product of the CDF of the exponentially modified Gaussian (EMG) distribution and the survival function of the normal distribution.
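In other words, the (unnormalized) density I’m trying to fit is

$$p(x \mid \mu_0, \mu_1) \;\propto\; F_{\mathrm{EMG}}(x;\, \mu_0, \sigma_0, \tau_0) \cdot S_{\mathcal{N}}(x;\, \mu_1, \sigma_1),$$

where $F_{\mathrm{EMG}}$ is the EMG CDF, $S_{\mathcal{N}}$ is the normal survival function, and the normalization constant is computed numerically over a grid.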
My model code is as follows (note: the observed data samples is a numpy array of int32 values):
with pm.Model() as mod:
    mu0 = pm.Uniform('mu0', lower=-250, upper=250)
    mu1 = pm.Uniform('mu1', lower=750, upper=1250)

    def model_logp(x):
        sig0 = 100
        tau0 = 100
        sig1 = 100

        # Unnormalized distribution value
        unscaled = sp.stats.exponnorm.cdf(x, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(x, mu1, sig1)

        # Compute normalization factor by integrating over entire distribution
        xmin = int(min(mu0 - 10*sig0, mu1 - 10*sig1))
        xmax = int(max(mu0 + 10*np.sqrt(sig0**2 + tau0**2), mu1 + 10*sig1))
        xfull = np.array(range(xmin, xmax))
        norm_factor = sum(sp.stats.exponnorm.cdf(xfull, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(xfull, mu1, sig1))

        # Scale distribution to true pdf
        pdf = unscaled/norm_factor

        # Compute log-likelihood
        out = np.log(pdf)
        return out

    model_eval = pm.DensityDist('model_eval', model_logp, observed={'x': samples})

    step = pm.Metropolis()
    trace = pm.sample(1000, step, cores=1)
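For what it’s worth, the same expression evaluates without trouble on a plain numpy array outside of the model, so the scipy calls themselves seem fine on ordinary arrays; a quick standalone check along these lines (with placeholder parameter values) works:

import numpy as np
import scipy.stats as st

# Placeholder values purely for illustration
mu0, sig0, tau0 = 0.0, 100.0, 100.0
mu1, sig1 = 1000.0, 100.0

x = np.arange(-1500, 2500)  # integer grid spanning both components

unscaled = st.exponnorm.cdf(x, tau0/sig0, mu0, sig0) * st.norm.sf(x, mu1, sig1)
pdf = unscaled / unscaled.sum()  # numerical normalization over the grid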
For the sake of simplifying the question, I’ve hard-coded several of the parameters (sig0, tau0, sig1), but in principle I will be fitting those parameters as well. In the code above, you can see I’ve used the scipy.stats implementations of the EMG CDF and normal SF. The code returns the following error:
TypeError Traceback (most recent call last)
<ipython-input-4-134823b3a983> in <module>
50 return out
51
---> 52 model_eval = pm.DensityDist('model_eval', model_logp, observed={'x': samps})
53
54 step = pm.Metropolis()
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
45 total_size = kwargs.pop('total_size', None)
46 dist = cls.dist(*args, **kwargs)
---> 47 return model.Var(name, dist, data, total_size)
48 else:
49 raise TypeError("Name needs to be a string but got: {}".format(name))
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size)
939 with self:
940 var = MultiObservedRV(name=name, data=data, distribution=dist,
--> 941 total_size=total_size, model=self)
942 self.observed_RVs.append(var)
943 if var.missing_values:
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\model.py in __init__(self, name, data, distribution, total_size, model)
1541 self.missing_values = [datum.missing_values for datum in self.data.values()
1542 if datum.missing_values is not None]
-> 1543 self.logp_elemwiset = distribution.logp(**self.data)
1544 # The logp might need scaling in minibatches.
1545 # This is done in `Factor`.
<ipython-input-4-134823b3a983> in model_logp(x)
33
34 # Unnormalized distribution value
---> 35 unscaled = sp.stats.exponnorm.cdf(x, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(x, mu1, sig1)
36
37 # Compute normalization factor by integrating over entire distribution
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\scipy\stats\_distn_infrastructure.py in cdf(self, x, *args, **kwds)
1824 x = np.asarray((x - loc)/scale, dtype=dtyp)
1825 cond0 = self._argcheck(*args) & (scale > 0)
-> 1826 cond1 = self._open_support_mask(x, *args) & (scale > 0)
1827 cond2 = (x >= np.asarray(_b)) & cond0
1828 cond = cond0 & cond1
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\scipy\stats\_distn_infrastructure.py in _open_support_mask(self, x, *args)
901 def _open_support_mask(self, x, *args):
902 a, b = self._get_support(*args)
--> 903 return (a < x) & (x < b)
904
905 def _rvs(self, *args):
c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\theano\tensor\var.py in __bool__(self)
89 else:
90 raise TypeError(
---> 91 "Variables do not support boolean operations."
92 )
93
TypeError: Variables do not support boolean operations.
Broadly speaking, I’m wondering whether the error is the result of not writing the model_logp() function in “pure Python”, given that the traceback breaks at the line where I call the scipy.stats methods (or, very likely, I’m making some obvious, simple mistake). What am I doing wrong here, and what would be the appropriate way to implement this custom distribution? Thanks for the help!