TypeError: Variables do not support boolean operations

I’m attempting to fit a custom made distribution to a set of data. I read related questions that were answered using the as_op decorator, however I was unable to implement this approach. Instead I’m attempting to follow the prescribed approach outlined here: https://docs.pymc.io/Probability_Distributions.html#custom-distributions
which suggests defining a Python function that returns the log-likelihood for the distribution and providing this function as an input to the DensityDist() method (see the “exponential survival function” in the link above, which I was able to run just fine).

However, my particular custom distribution (which is not analytically solvable and therefore must be normalized numerically) is proportional to the product of the cdf of the exponentially modified gaussian distribution and the survival function of the normal distribution. My model code is as follows (note: the observed data samples is a numpy array of int32 values):

with pm.Model() as mod:
    mu0 = pm.Uniform('mu0', lower=-250, upper=250)
    mu1 = pm.Uniform('mu1', lower=750, upper=1250)
    def model_logp(x):
        sig0 = 100
        tau0 = 100
        sig1 = 100
        # Unnormalized distribution value
        unscaled = sp.stats.exponnorm.cdf(x, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(x, mu1, sig1)
        # Compute normalization factor by integrating over entire distribution
        xmin = int(min(mu0 - 10*sig0, mu1 - 10*sig1))
        xmax = int(max(mu0 + 10*np.sqrt(sig0**2 + tau0**2), mu1 + 10*sig1))
        xfull = np.array(range(xmin, xmax))
        norm_factor = sum(sp.stats.exponnorm.cdf(xfull, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(xfull, mu1, sig1))
        # Scale distribution to true pdf
        pdf = unscaled/norm_factor
        # Compute log-likelihood
        out = np.log(pdf)
        return out
    model_eval = pm.DensityDist('model_eval', model_logp, observed={'x': samples})
    step = pm.Metropolis()
    trace = pm.sample(1000, step, cores=1)

For the sake of simplifying the question, I’ve hard coded several of the parameters (sig0, tau0, sig1), but in principle, I will be fitting those parameters as well. In the code above, you can see I’ve used the scipy.stats implementation of the EMG cdf and Normal sf. The code returns the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-134823b3a983> in <module>
     50         return out
---> 52     model_eval = pm.DensityDist('model_eval', model_logp, observed={'x': samps})
     54     step = pm.Metropolis()

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\distributions\distribution.py in __new__(cls, name, *args, **kwargs)
     45             total_size = kwargs.pop('total_size', None)
     46             dist = cls.dist(*args, **kwargs)
---> 47             return model.Var(name, dist, data, total_size)
     48         else:
     49             raise TypeError("Name needs to be a string but got: {}".format(name))

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\model.py in Var(self, name, dist, data, total_size)
    939             with self:
    940                 var = MultiObservedRV(name=name, data=data, distribution=dist,
--> 941                                       total_size=total_size, model=self)
    942             self.observed_RVs.append(var)
    943             if var.missing_values:

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\pymc3\model.py in __init__(self, name, data, distribution, total_size, model)
   1541         self.missing_values = [datum.missing_values for datum in self.data.values()
   1542                                if datum.missing_values is not None]
-> 1543         self.logp_elemwiset = distribution.logp(**self.data)
   1544         # The logp might need scaling in minibatches.
   1545         # This is done in `Factor`.

<ipython-input-4-134823b3a983> in model_logp(x)
     34         # Unnormalized distribution value
---> 35         unscaled = sp.stats.exponnorm.cdf(x, tau0/sig0, mu0, sig0) * sp.stats.norm.sf(x, mu1, sig1)
     37         # Compute normalization factor by integrating over entire distribution

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\scipy\stats\_distn_infrastructure.py in cdf(self, x, *args, **kwds)
   1824         x = np.asarray((x - loc)/scale, dtype=dtyp)
   1825         cond0 = self._argcheck(*args) & (scale > 0)
-> 1826         cond1 = self._open_support_mask(x, *args) & (scale > 0)
   1827         cond2 = (x >= np.asarray(_b)) & cond0
   1828         cond = cond0 & cond1

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\scipy\stats\_distn_infrastructure.py in _open_support_mask(self, x, *args)
    901     def _open_support_mask(self, x, *args):
    902         a, b = self._get_support(*args)
--> 903         return (a < x) & (x < b)
    905     def _rvs(self, *args):

c:\users\jstanley\miniconda3\envs\dd\lib\site-packages\theano\tensor\var.py in __bool__(self)
     89         else:
     90             raise TypeError(
---> 91                 "Variables do not support boolean operations."
     92             )

TypeError: Variables do not support boolean operations.

Broadly speaking, I’m wondering if the error is the result of me not writing the model_logp() function using “pure Python”, given the break at line where I call the scipy.stats methods (or it’s very likely I’m making some obvious, simple mistake). What am I doing wrong here and what would be the appropriate way of implementing this custom distribution? Thanks for the help!

1 Like